#hardware

1 messages · Page 4 of 1

craggy ferry
#

so i turned it off until i can make a much more isolated agent

crystal cedar
# craggy ferry so i turned it off until i can make a much more isolated agent

thanks, i am completely new to this but will read up. thanks for prevous tip on why long context, understand it stores all calculated vectors to save on compute (so essentially the matrix in a way).
did u hear of seymore cash? anthropic vending bot dealt with mischievious prompts by having other agent reviewing whatever first agent wants to do; think its the best shot at dealing with prompt injection but yes terrifying subject.

craggy ferry
#

that does just mitigate the risk, not remove it

#

now, you just have to get agent 1 to generate a prompt injection for agent 2

crystal cedar
#

yea true. but was thinking second agent could be bestowed with hightened state of awareness, expecting any prompt to prompt injection, treating it accordingly

#

sort of your paranoid friend

craggy ferry
#

that's all just mitigations

#

you can't solve this with more layers of llms

crystal cedar
#

i'm not sure but i'm cautiously optimistic it is the best shot

craggy ferry
#

what you can do is have the agent output a workflow that you parse with good old fashioned code

#

and then validate that it doesn't do anything weird with more code

#

Then you've made the problem much more tractable because we already know how to review code for security flaws and we can actually do something about it

#

and it's a deterministic system

#

gotta give those cpus something to do

crystal cedar
#

i just know of two kinds of prompt injections - the one step (ignore all instructions / execute malicious code) and for that your solution might be better.
I was thinking of the grooming kind of multi step prompting, where a sequence of prompts has an event not discernable from the individual prompts

craggy ferry
#

the solution i just described stops that, too

#

because in the end your agent has to tell me what it wants to do in a format which i'm not parsing with an llm

crystal cedar
#

well i will happily admit that i worry more than i understand 😄

craggy ferry
#

you should worry about people who want to just stack llms on top of each other and go "is this anything?"

#

i mean someone should try it

#

mitigations are good, swiss cheese defense is good

crystal cedar
#

true

brave cedar
#

have you heard anything about this setup? I'm considering something similar. My only concern is RAM

final atlas
#

Is anyone consistently using a 30b local model with openclaw? How are the results compared to using say opus 4.6 or codex 5.3? I'm trying to decide between building out a machine to do a 30b model with and just using a $200 codex 5.3 sub, I know opus 4.6 is probably not feasible cost wise since they block the sub from 3rd party use

#

From coding use I find opus to be a lot more personable compared to codex so Im worried it'll be bad for openclaw

brisk shale
craggy ferry
# final atlas Is anyone consistently using a 30b local model with openclaw? How are the result...

I really like glm-4.7-flash, it fits in my 48gb card at Q6 with two 200k context windows. I actually barely ever ask Opus/Sonnet anything from this setup, but I usually try to have one of those models (since sonnet4.6) do a final review pass over a plan my glm comes up with.

I also have been experimenting with a 14b model as an executor to maybe get some more tokens in there, since I'm kind of running up against my max.

#

I get about 70-100 tps output and 400-2000 tps prompt input with glm this way

final atlas
lost remnant
#

I don’t understand the hype behind mac minis. they can’t run effective local models and everyone uses API credits with them anyways

crystal cedar
lost remnant
#

if only I could prompt my llm to 1 million dollars ….

crystal cedar
#

rpi5 8gb vs mac mini base model 16gb? if i had a rpi 8gb i would still consider whether i could use a small local model. inference is painfully slow, but for some things maybe that don't matter. api use expensive if you PAYG for tokens, and subscription models not made for bots.

crystal cedar
final atlas
#

what if I wanted to build a machine that actually can run the best open source models though

#

I mean there's the mac mini M4 pro at unified mem 64GB for 2 grand, but im assuming you can build the same for cheaper

crystal cedar
#

so reckon good time to watch and learn

final atlas
#

are the mac products supposed to be better or cheaper at running LLMs?

#

compared to like building a machine yourself

crystal cedar
#

well problem right now is unprecedented scramble for ram, google ramageddon, with prices shooting up. what you want is not normal ram but vram, typically in graphics cards for gaming rigs. those are expensive too. macs have one crucial advantage in unified memory basically making it almost as good as vram if i get it right

final atlas
#

how come I don't see anyone saying they run openclaw with the $200 openai sub

#

is codex 5.3 just horrible for openclaw?

crystal cedar
#

i don't know, depends what you want to i guess. all the cool things seem to happen with the latest and best models and people have been able to use their subscriptions for that for a while. seems now people get banned left and right because terms of service expect user to be human not human who uses computer to prompt to kingdom come

final atlas
#

I thought anthropic straight up does not allow anyone to use their max subscription for 3rd party stuff, like it won't even connect

crystal cedar
#

well it was called clawdbot because it just hooked up to claude right, not familiar with details, but sounds like people were using subscriptions and got banned

craggy ferry
crystal cedar
#

for starters, an ocd like respect for json formatting

proper trench
#

I connected a few days ago on pro even

final atlas
#

Yeah just figured this out in the general channel, I guess it still works fine, just that some select few people are getting banned. Apparently they're somewhat vaguely okay with solo devs using their subscriptions for 3rd party stuff

humble holly
jagged rune
final atlas
final atlas
waxen scaffold
#

Is Anthropic cracking down on openclaw API usage?

woeful mauve
#

Has anyone really got any local models to work effectively? I'm a bit constricted on just 12GB GPU, I tired something small like llama-3.2-3b-instruct, Qwen3-8B and they can't handle gog or other tools reliably I'm finding.

nemotron-3-nano works way more reliably but with it spilling over into RAM, I'm pretty limited to the context window plus it runs terribly slow.

ornate fractal
#

hi, What do you think about using a hybrid model? I have Minimax Cloud, and for local use I have QWEN 2.5 14b Coder. I have a gaming laptop with 4GB of VRAM and 24GB of RAM (I plan to upgrade to 48GB).

#

I also have a MacBook Pro M1 with 8GB of RAM, could that be more useful?

crystal pike
#

Just ordered cheapest version of Mac mini m4 (16GB, 256GB SSD) after playing with the cloudfare moltworker. Ordered and shipped with 2 days out. Not bad, thought there would be more of a delay

tender anvil
undone mauve
#

because eu based hetzner cax11 servers start at 3.99 USD (4gb ram, 40gb ssd, 2 core arm cpu)

hidden kelp
#

Howdy. I am hoping to run openclaw with local llms and have these in the use: 1) linux desktop+rtx3090 2) macbook pro M2 MAX 3) (game) windows PC with rtx5090 ... what would good sensible way to utilize those 3 for local llm with openclaw?

undone mauve
hidden kelp
undone mauve
hidden kelp
#

most important question being is it even worth to try

undone mauve
# hidden kelp I would rather hear experience from real people with similar HW what are their e...

look at https://github.com/LMCache/LMCache

dual 3090s if they have an nvlink bridge along with tiered caching is important, but you aren't going to be running anything higher quality than dirt cheap llm models you can cheaply/freely use from openrouter/nvidia nim api, only benefit at that point would be for things like vector similarity but that's also dirt cheap from voyage.

use a frontier class main model (ex. Opus/Sonnet 4.6) and offload subagents to models that can run with dual 3090s and tiered caching.

#

maybe the 5090 based system is better, but i'd only use it if you don't plan to make this a 24/7 based system, same with the macbook unless the macbook has- LOTS of ram, but i don't recall the M2 Max being higher than 96gb, but it doesn't mean it'll outdo a setup with tiered caching and much faster tps performance

craggy ferry
#

now that i fixed the main caching bug every time i trigger a rebuild of the prompt it makes me sad

#

guess it's time to burn more clod code credit to contribute refined tokens to the repo

quasi ether
#

Hey does anyone know

#

How to set up Mac cluster for OPENCLAW?

crystal pike
crystal pike
crystal cedar
tired plover
#

Thanks for the heads-up @crystal cedar I’m going to try minimax m2.5 as it had better agentic performance but lacks the reasoning like gpt might test both for my scenario 🙂

crystal cedar
tired plover
#

I was going back and forth with opus4.6 on that, maybe like 5-6 hours now about all the requirements and things, TLDR; current models are not fine tuned and there is much more coming, all depends on your Capacity but with 128GB you should be able to run these model for better or worse on local hardware

NEW ⭐ Qwen 3.5 397B/17B ~100GB 1.58b ⚠️
1 MiniMax M2.5 230B/10B ~101GB Q3 ✅
2 GPT-OSS-120B 120B/5B ~75GB Q4 ✅
3 GLM-4.5-Air 106B/12B ~60GB Q4 ✅
4 Devstral 2 123B dense ~75GB Q4 ✅

crystal cedar
#

Devstral is a bit newer isn't it, and a smaller version of GLM 5 might come. You have to run Minimax in Q3

#

I've been playing around with much smaller models, have no idea what to run on 128GB

quaint escarp
tired plover
#

Heavy Q hahaha

crystal cedar
#

You could always double down and get a second DGX 😄

tired plover
#

Opus told me to start with Minimax and then Gpt

tired plover
crystal cedar
tired plover
#

With spark I could try to fine tune a small model but I don’t have data

#

My data scientist friend just deployed his and wants to start to train a model but takes weeks or month till finished

bitter cipher
#

the issue is that larger context windows need more memory and slow down inference as well.

#

e.g. 128k context ads another 30gb or so.

tired plover
#

Nobody has data on my tasks…

crystal cedar
# tired plover My data scientist friend just deployed his and wants to start to train a model b...

I realized today you might end up having made the call of a lifetime - mac minis sold out here and there because people thought buying one and open source software would get you a digital slave making 10K a day while you sleep. Subsequently, people realized hey also need claude subscription and started using it without any regard for tos. As a result, banning and now we are seeing the "banned from the gpt" (to the tune of "born in the USA") phase. That it term will be followed by people realizing that PAYG API is expensive, but that they can have their own supercompute at home for a few grand. Once that realization kicks in, people will end up buying up all the DGXs overnight. Meanwhile I am waiting for the new Mac Studios. Feels a bit like that meme about the guy running to catch a plane which is taking off.

#

If RAM can sell out. and Mac minis can sell out. and there are what 10000x fewer DGXs around... it takes very little for them to sell out. Think toilet paper and covid.

#

And remember this. All it takes could be a new model, surprisingly fit for openclaw on something like the DGX.

tired plover
#

Might be half the truth, with the Sunday calls here on the discord there are already Chinese labs involved ready to take all the people who have macminis and hurting wallets, there must be and will be room for both sides, more or less the patriots and security people (company’s) will choose to buy these or bigger machines, average joe will switch to cheap plans and PAYG API

#

Because everybody knows but accept that their data will be feeded into mother china haha

#

100% agree on local fine tuned models, then systems will sell out amazingly quick but then it will get even smaller for phones to run

crystal cedar
#

with a dgx spark around, you could prolly run one version of openclaw on your own phone and use the dgx as a server, seems its good at parallel jobs

tired plover
crystal cedar
#

i mean once security issues etc reaches a level where you are comfortable

#

yea skipping telegram

#

pretty cool, using your phone to prompt your supercompute to vibe code

tired plover
#

Future shines bright… but even though I don’t have it yet (arrives tomorrow) I already have buyer regret because I have the feeling it’s not enough memory…

crystal cedar
#

maybe you can vibe code the app yourself 😄

tired plover
crystal cedar
#

i wouldn't feel bad if i were you - ramageddon creates incentive to excel on what people have, and models keep getting better

#

so probably increased interest in all kinds of smaller models

#

not sure 128GB qualifies as small tho

tired plover
crystal cedar
#

that is an absolutely amazing thought. but by then waiting time for dgx will be 10 years 😄

#

and all i can do is cry about it in the shower, asking myself why i didn't get one

tired plover
#

I mean, realistically it will only get worse for next 12 month, if you think that openclaw is worth something, might be smart to pull the trigger 🤷‍♂️

crystal cedar
#

yea i think it will. security/privacy nightmare but also best thing in 50 years.

tired plover
#

For me it’s security reasons on my task and I want to integrate in my life without selling my data, without that I could live on free breadcrumbs from Chinese labs 😅

crystal cedar
#

maybe you can ask ai to create bogus data - if your data leaks, nobody will know what is of value 😄

tired plover
#

Don’t want to open up about my job etc but I have insight into IT and supply chain, everybody says start of 2027 it should get better but by then many people wait for this moment to start buying again, my personal opinion is that we will have rough 2-3 years with these problems and it mostly gets worse

tired plover
crystal cedar
#

can use a small model 😄

tired plover
crystal cedar
#

i had a look at nvidia site, they had a dumbed down quickstart. surprisingly they suggested WSL and lm studio or ollama. on my RAM deprived gear, going with lubuntu and llama.cpp to squeeze out what i could. not sure if it matters for the spark.

tired plover
#

Always matters if you want to expand context window, the more the better

crystal cedar
#

well i'm off to dinner now, but thanks for the chat. cool that there are a few people with dgx getting early impressions of oc on dgx. i'll prolly fomo and get one too in a few days

tired plover
#

You can always dm or ping me 🙂

crystal cedar
quartz zinc
#

Please continue to AGI the IoT possibilities.

azure spruce
#

is it really Mac Mini or no party? haha

#

i have just failed miserably on an old surface pro i use a Macpro thinking of jusst biting the bullet and buying a mac mini

#

can anyone assist

hybrid wharf
#

Buy a Mac Studio. They are better.

vital girder
#

I have a Mac mini M4 that I use as a server. Should I run Claw Bot on a virtual machine locally, or would it be better to use a cloud VM provider like Hostinger

hybrid wharf
#

Either is fine. I would recommend podman on local..

vital girder
hybrid wharf
craggy ferry
#

It’s not a vm. Use a real vm for openclaw

hybrid wharf
#

Sigh, why? It isolates the file system and uses your main compute resources. There is no such thing as a "real vm". They all work in different ways.

verbal sigil
crystal cedar
still rampart
crystal cedar
#

everyone is new to this, gotta keep an open mind. i probably get things wrong all the time.

still rampart
#

I keep going back and forth on the dgx

crystal cedar
# still rampart I call it failing forwards

there are a couple of versions of it - the asus gx 10 is priced around 3K right now, might be good value. alternatives are the amd ai 395+ and 128 unified memory or maybe mac mini m4 pro or studio with 128gb unified memory or wait for the m5 processors due out in a few months

#

if you're considering something with 128gb might want to keep an eye out for a potential deepseek r2 release. if its announced and it is extraordinarily good, it could be a big thing also for dgx demand

still rampart
#

Whatever it is I just won't buy an apple. At the moment they seem the best buy, but that will change

crystal cedar
still rampart
#

I'm east coast US

crystal cedar
#

ah ok, well good knows is its 3K USD for you guys and thats even cheaper 😄

#

dell also has one, not sure what it retails for over there

#

dell precision pro max? search for gb10.

#

i'm really hoping there will be some kind of announcment on the upcoming macs very soon

still rampart
#

First I've seen the max+ 395. That's 128 unified like the spark?

crystal cedar
#

i'm pretty sure it is but don't hold me to it. AMD, dedicated AI processor, comes with 128GB and then probably a graphics card too

#

gaming rig

#

AMD site says *The Ryzen™ AI MAX+ 395 is available today with system memory options ranging from 32GB all the way up to 128GB of unified memory – out of which up to 96GB can be converted to VRAM *

still rampart
#

I got an Olares One, 5090mobile in it, only 32gb vram. Had fomo and jumped on the Kickstarter

#

I was just looking that up

crystal cedar
#

right now 128 might not be enough to run things like the latest kimi, but i'm willing to make a bet that something new could come in the next months that causes run for the 128gb segment

#

speculating of course. my gamble was to wait for the mac announcement to see what the new studios are like and then decide what to buy.

craggy ferry
craggy ferry
crystal cedar
craggy ferry
#

I think in six months local models that fit in 128g will probably be competitive with like opus 4.5

verbal sigil
crystal cedar
verbal sigil
still rampart
craggy ferry
#

Can’t stand sending my entire literally everything to anyone else so welp

still rampart
crystal cedar
#

hey @verbal sigil check out epoch.ai/data-insights/consumer-gpu-model-gap - excerpt: *Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago. *

#

GPQA improvment as function of time looks linear.

#

well up until now at least

#

but fun chart

verbal sigil
#

I have my trusty 3090

#

Good times

#

when model were 7millions

#

not billions

crystal cedar
#

Lots is happening when it comes to small (sub 3B) models too tho - lfm2, nanbeige 4.1

verbal sigil
#

BTW, I have created a benchmark for small models - whether they can do the bootstrapping successfully

#

I suggest to all local small model users to hatch models by specifying what to do during the bootstrapping

#

something like "consult the bootstrap file, update the soul etc and remove the bootstrap"

crystal cedar
#

wow

#

now all you need is a virus and you have some kind of neumann probe

#

clawifying the whole planet 😄

native valve
crystal cedar
# native valve henry do you think i could download clawdbot onto my macbook e?

well the software it self you can download on pretty basic hardware, problem is for it to work you need access to advanced ai. people used subscriptions for that, but now it seems bot use is not allowed so that route is blocked. what is left is either to pay for the use in other ways or get advanced hardware. right now both options look prohibitively expensive (but that might change). if you're interested look around see what other people are doing and learn from their mistakes and wins

#

if you really want to try it out, don't use your regular computer for it - see if you have old gear that you don't need, wipe it clean of private stuff, and put it on a guest network for starters, assuming it might get hacked by someone. if that happens at least you're hopefully not leaking any sensitive data

#

good time to learn something new about it every day now, see what other people are trying.

quartz zinc
#

Can people distill a robotics AI model to control all the motors just through AI (real-time)?

crystal cedar
#

an alternative route could be to try to use local ai, i.e. run some small kind of ai on the machine itself (or as a server on a diffrent pc). it won't be as "smart" as claude, but could still help out with somethings like summarize emails if you have a lot, or watch homepages, while you learn more about how this thing actually works.

native valve
#

the mac mini looks amazing. sadly, my savings are tapped dry

vital girder
#

does anyone know how can i add my api key i deleted my api key by mistake and im not sure how to add my new api key

green mortar
low grotto
#

Why are people choosing to run on Mac Mini's when they just use API's anyway?

ancient wagon
#

How are you guys making money with openclaw thing😏

zealous veldt
uneven ridge
#

Is there an official IOS openclaw?

crystal cedar
crystal cedar
# uneven ridge Is there an official IOS openclaw?

no. the developer himself used instant messaging apps to chat with openclaw installed on a normal pc. but technically i suppose it could run on a phone, might drain the battery though since its always on and working

waxen gorge
#

anyne been able to get the ios app to connect>?

random void
zealous veldt
prime aurora
#

is it right to understand that the openclaw docs recommend using a vps for gateway and just use physical hardware as nodes?

quaint lantern
quaint lantern
crystal cedar
quaint lantern
quartz zinc
#

So I’m thinking, if we help setup automated things in real-life that makes ASI happen faster (that eventually saves everybody), this is what to do? 🤔

stable rampart
#

Hi, I started using OpenClaw yesterday. I wonder whether people notice the high CPU usage? My AMD Ryzen 9 7900 12-Core Processor is running at 100% constantly once I send a new message, long before my 4090 fires up. I wonder what can justify the full usage of a 12-core CPU for an LLM-based application.

bronze abyss
#

Hello, i would like to connect my claw with Smart-Glasses. Brilliant Labs Halo looks like the best choice. Has anybody done that already?

quartz zinc
astral gobletBOT
quartz zinc
astral gobletBOT
astral gobletBOT
# quartz zinc https://x.com/wildmindai/status/2024810128487096357?s=20

17,000 tokens per second!! Read that again!
︀︀LLM is hard-wired directly into silicon. no HBM, no liquid cooling, just raw specialized hardware. 10x faster and 20x cheaper than a B200.
︀︀the "waiting for the LLM to think" era is dead. Code generates at the speed of human thought.
︀︀Transition from brute-force GPU clusters to actual AI appliances.
︀︀taalas.com/the-path-to-ubiquitous-ai/

**💬 66 🔁 95 ❤️ 976 👁️ 51.0K **

steep wedge
crystal cedar
steep wedge
#

If I blow more money on this, I want another GX10 🤩

dawn garden
#

41 GB RAM
Intel Xeon 4.5GHz 12vCores
NVIDIA Quadro RTX 6000 24GB

what sort model could it run

sullen thicket
#

Listen, I’m dabbling into deeper waters than I probably should. What is the best hardware on a budget for this AI model? I’m looking to have an “agent”
To help in my real estate business and some personal scheduling etc.

crystal cedar
#

then again, dual sparks would let you run larger models, so 2 x DGX Spark + 1 Mac Studio

craggy ferry
#

Conceptually I can see it. Just wondering if something already handles it

#

Since yeah I guess an m3 is going to be fairly slow at prefill

red cypress
#

still dabbling, considering using an Intel NUC 12 (i7-1260p) (ubuntu 24 LTS desktop) with 64gb of ram connected to a razer core x with a rtx 3090 to run ollama for pipeline stuff for my agent (runs opus-4-6, but maybe can offload some stuff to local llm's (heartbeat, TTS, stable diffusion generation on LoRa trained image, etc). gateway runs on a vps, but this will be a node. hoping smaller models can run in 64gb ram and models needing faster token speed on the vram. Its been to fun to tinker with openclaw and learn more about AI. I know the thunderbolt 3 is a bottleneck but I already had the hardware. Still haven't figured out what local models are really decent at. so much to ingest.

tranquil hazel
#

just get a mini pc

quartz zinc
astral gobletBOT
limpid bay
worthy cargo
#

What are people's recommended options for rackmounted hardware for OC? I've recently acquired a server cabinet and thinking through how to migrate off of my personal machine

wind fog
#

What’s the oldest hardware ya’ll have OpenClaw running on?
Me? Lenovo T430 laptop running Ubuntu server. Obv not using local llm.

crystal cedar
tacit dock
#

Rosewill and SilverStone, too... they're all pretty similar for 4U

red cypress
steel quarry
magic raven
#

you probably can run gpt-oss-20b or llama3.2:20b

craggy ferry
crystal cedar
craggy ferry
#

I mean if I had this then I would probably have enough prefill compute to let my friends use some tokens too

#

I’m currently just hyper optimizing prefix cache

crystal cedar
#

maybe the new mac studios will have m5 pro/ultras that will perform better than the spark

craggy ferry
#

Yeah, I’ll hold off and see what gets released first. If the m5 is amazing and they also have a 1tb variant I might be buying a car

#

My current focus is convincing my agents to actually use the specialist models

crystal cedar
craggy ferry
#

I just got set up with multiple threads to my front agent

#

It’s so good now

crystal cedar
#

cool!

craggy ferry
#

I can just switch to a different context (or make a new one) if I want it to answer a random question but don’t want to nuke the perfectly good context window where we’re discussing some issue or other

maiden obsidian
#

should I host my openClaw to my mid-tier gaming pc? I dont have much important information on that, its mostly just games, that shouldnt be risky right?

im currently running it on Oracle Cloud x86 1gb ram, 1 core cpu

my pc specs are
Ryzen 7 5700, Radeon 7600, 16gb ddr4 3200mhz, 1tb nvme

random void
maiden obsidian
random void
gentle flax
# magic raven you probably can run gpt-oss-20b or llama3.2:20b

mb for butting in but I have a 5060ti setup rn running this Quant 6-bit. with more than enough headroom for ctx or wtv. It seems to not be able to make basic tool/bash calls. I wouldnt think this would be a result of the quant but not sure how to fix it at this point. Have you personally had success with this model> thanks in advanced. I can show examples of chats if interested

magic raven
#

it's because i own the BLACKWELL 6000

#

IT COST ME MY KIDNEY AND ALL MY RAM

#

and i bought 2 more

tired plover
#

@crystal cedar have some results on testing on Spark, with Llama Server and various models I had bad experience in quality, speed was mostly ok if a bit slow but quality is not good on local LLM, wondering what other people experience…

#

Now moving to vLLM with spark specific models

crystal cedar
# tired plover Now moving to vLLM with spark specific models

Thanks for the update - sounds like you're having an exciting weekend! From what I've read i would expect it to be slow for llama server, but really rip for parallel calls in vllm. As for models, seems minimax and gpt-oss are the ones many have been using and/or preferring, are you using the spark-specific nvfp4 quantizations? I saw that you could download models from either huggingface or some dedicate nvidia repository - not sure if there is any difference. Are you already running openclaw on it or on something else with the gx10 as a server? Most importantly, how does the 1TB feel - after DGX OS and two models, still room on it? I'm seriously considering jumping in too towards the end of the month, but wallet loading slowly.

tired plover
#

@crystal cedar so, i tried general stuff, easy to set up, had some succes with permformance but quality was always meh... im trying now vLLM with NVfp4 trained model

#

only thing right now, pytorch takes endless to work up and it crashes while starting, need to figure out whats the problem

gentle flax
gentle flax
#

It just says it will do something like read SOUL.md but never once makes a tool call

#

i’m having this issue on and off w diffrent models

#

j trying to see what works w others

tacit dock
magic raven
crystal cedar
magic raven
crystal cedar
# magic raven way more niche...

i knew of models for creative fiction and role playing, just didn't realize there were such narrowly adapted gpts. thanks for teaching me something new!

magic raven
#

having this thing invade my screen saying "all ur base r belong to me" is scary asf

crystal cedar
#

so if amount of agent attributed shitposting increases in the next few days, i know your agents are up and running fine? 🙂

magic raven
#

special software

blissful stirrup
#

Anyone in here running an rx 7900 xtx? As this seems the only affordable alternative to nvidia gpus i was thinking of getting one.

opaque silo
#

Guys... are you really being banned from Claude for using openclaw with your subscription?

hardy tinsel
#

what specs do i need to run claw

fierce thicket
crystal cedar
static sky
#

Beelink SER5 MAX Mini PC, AMD Ryzen R7 7735HS (8C/16T, i4,75GHz), Mini Desktop Computer 24GB LPDDR5 RAM 500GB PCIe SSD | will this hardware be good enough to experiment a little bit? Don't need video stuff, just text.

steep talon
tired plover
#

i was moving from Llama to vLLM for specified support on DGX Spark, problem is you have no tool calling for them, i needed to implement a proxy with Claude now i have 39 token/s without MTP (makes it slower) on qwen3 coder next

#

response is very snappy now, to a degree where i would say even close to cloud performance, as I'm still testing i need to see how good the quality is but for now im stoked how good it works after first trys with Llama being slow AF, hopefullly openclaw team soon implements a fix for the tool calling bug and i dont need a proxy anymore... anybody knows who i could ping for that ?

steep talon
# tired plover how many tk/s you get ?

With a curl directly to the LLM about 10 tk/s and I can see that reasoning is still on. I've set the follow to turn it off, but no change.
environment:
- VLLM_REASONING_BACKEND=None
- NIM_REASONING_MODE=disabled
- VLLM_ENFORCE_EAGER=true

tired plover
#

maybe you should also go away from Llama Server, i dont have the knowledge to really say whats the problem but with vLLM its much better but also complicated... maybe check the guide in the Docs with LMStudio

mossy quest
#

I'm running OpenClaw with online providers on a Raspberry Pi 5 8GB. Works perfectly.

atomic hull
tired plover
#

@steep wedge how did you work on the tool calling ?

blissful quarry
blissful quarry
quartz pawn
#

I’m setting up and testing in OSS 20B. What model should I size up to for my hardware: 5090+3090 (56gb vram) and 128gb DDR5?

outer epoch
#

I bought this:

GTR9 Pro

128 GB unified VRAM

IMO is the best quality/price you can get.
Don't buy a Minisforum, since the second drive runs an x1, so it's like a SATA3 😂

For the same performance:

  • Apple cost 2.5X
  • NVIDIA cost 2X but rely on normal RAM, so models can't run at full potential...

Best hardware bought this year

crystal cedar
outer epoch
#

All o the same machine

sand axle
#

someone issues with docker desktop on windows? i have huge problems running it

dull crescent
outer epoch
outer epoch
steep wedge
outer epoch
#

I've seen DGX Spark in action, half power of what they claim... 😅

outer epoch
# steep wedge I think I’d rather have the Asus GX10

I can't find the deep review I've seen weeks ago, but if you Google a bit, you'll figure out that's an overpriced stuff and nobody tell you that most of the cores are eCores... Moreover being ARM, most of the things you could do with the Beelink, are not working.
You must use the distro Nvidia give you, ok, works like a charm with CUDA, but for all everything else, it's a piece of trash.

outer epoch
covert kindle
#

what is the best mini pc for cheap entry to install openclaw on it, instead of a $600 mac mini?

crystal cedar
potent ridge
#

I see a lot about installing on a local machine… can it be done on VPS?

crystal cedar
covert kindle
crystal cedar
craggy quail
dull crescent
outer epoch
wild socket
#

What's the minimum mac mini spec Openclaw can run smoothly on?

craggy ferry
#

Any of them

bronze ermine
craggy quail
covert current
#

Can we use window laptop or need a stronger machine?

bronze ermine
# covert current Can we use window laptop or need a stronger machine?

Windows, Mac, Linux all work. You need a computer that utilizes a terminal screen, and has at least 4GB of RAM. That's really the barrier for entry. People are installing openclaws on $25 android phones from 2016, as well as Raspberry Pi 4's. Your junked laptop from ten years ago can get the job done.

bleak rapids
#
foxfetch is presented by FJOX.WIN
         .://:`              `://:.            root@FJOXSERVER24SE
       `hMMMMMMd/          /dMMMMMMh`          -------------------
        `sMMMMMMMd:      :mMMMMMMMs`           OS: Proxmox VE 8.4.16 x86_64
`-/+oo+/:`.yMMMMMMMh-  -hMMMMMMMy.`:/+oo+/-`   Host: ProLiant ML350 Gen9
`:oooooooo/`-hMMMMMMMyyMMMMMMMh-`/oooooooo:`   Kernel: Linux 6.8.12-18-pve
  `/oooooooo:`:mMMMMMMMMMMMMm:`:oooooooo/`     Uptime: 10h 5m
    ./ooooooo+- +NMMMMMMMMN+ -+ooooooo/.       Packages: 942 (dpkg)
      .+ooooooo+-`oNMMMMNo`-+ooooooo+.         Shell: bash 5.2.15
        -+ooooooo/.`sMMs`./ooooooo+-           CPU: Intel Xeon E5-2690 v4 (56) @ 3.500GHz
          :oooooooo/`..`/oooooooo:             GPU: NVIDIA Tesla M10
          :oooooooo/`..`/oooooooo:             GPU: NVIDIA GeForce GTX 1080 Ti
        -+ooooooo/.`sMMs`./ooooooo+-           GPU: NVIDIA Tesla M10
      .+ooooooo+-`oNMMMMNo`-+ooooooo+.         GPU: Intel DG2 [Arc A310]
    ./ooooooo+- +NMMMMMMMMN+ -+ooooooo/.       GPU: NVIDIA Tesla P40
  `/oooooooo:`:mMMMMMMMMMMMMm:`:oooooooo/`     GPU: NVIDIA Tesla M10
`:oooooooo/`-hMMMMMMMyyMMMMMMMh-`/oooooooo:`   GPU: NVIDIA Tesla M10
`-/+oo+/:`.yMMMMMMMh-  -hMMMMMMMy.`:/+oo+/-`   Memory: 350822MiB / 419069MiB (83%)
        `sMMMMMMMm:      :dMMMMMMMs`
       `hMMMMMMd/          /dMMMMMMh`
         `://:`              `://:`
covert current
bronze ermine
#

Why would anything be slowed down? It's all in the cloud. The people getting mac studios and other crazy setups are doing so because they want to run local models. The trade off is local models are still borderline unusable for most functions.

wicked mauve
#

Your laptop off = your bots offline

covert current
wicked mauve
brave bison
woven jungle
#

I have a 3090 Ti with 24gb of vram and a MacBook Pro M1 Max 64gb. What is the best model which you can use good together with OpenClaw? I played with LM Studio and the Macbook with the qwen3-72b-embiggened-i1 mode, but I do not receive any answer. I see in the LM Studio Developer log that something is going on but it stop without any answer. I just send a ping 😛

lyric token
thorn umbra
tired plover
#

For everybody who got a DGX Spark look at Avarock Git, he got something really good, qwen3 with mtp and up to 119 token per second I’m running it and it’s pretty good for its speed

crystal cedar
tired plover
#

Exactly

#

I checked now some models but with spark you need to go special with vLLM

crystal cedar
#

Man this is really nice to see, feels like the DGX is some kind of uncharted hardware territory just hiding a wealth of possibilities

tired plover
#

It really feels like it and you can read on his blog it’s just the start as it’s all unofficial they just ahead of NVIDIA, in the coming month I would like to see official support and more models on that, then nothing can beat it in that price bracket

#

Only gateway process makes my life hard now…

crystal cedar
#

you're right, i've changed my mind, ordering my first one soon. for single user chatting on ollama, the "low" tps was a bit discouraging. but agents working in parallel and vllm changes everything

tired plover
#

In standard config it’s up to 128 hahaha

crystal cedar
#

i'm giving serious consideration already to ordering a second one. the us bookseller website in germany lists them for less than 3 with delivery in 1-3 months right now. had a look at old maxed out mac studios (m3) - delivery expected 12-16 weeks from now.

#

i was hoping to be able to run openclaw with small local model, but seems safety issues more or less compels you to go as smart as you can.

#

i saw the bug you discussed preventing openclaw to rip using vllm right now, thanks for noticing that, saved me quite some work!

tired plover
#

Actually it solved itself I don’t use proxy anymore and it tool calls

#

A second gives you a lot more choices but with one you’re already good for the start but who knows how it plays out might also buy another one

coral token
#

Worth it to buy hw to run a 70b model right now with prices being what they are currently? Curious as to what people are doing right now and what the consensus is.

tired plover
#

I have a dgx spark

#

I thought price is good… better than 128GB Mac

calm jetty
calm jetty
craggy ferry
#

I want to try that GX10-as-prefill-node setup though

crystal cedar
craggy ferry
#

You say that but my desire for more hardware knows no bounds

crystal cedar
#

i have a feeling it could be people will be interested in used ones pretty soon too

craggy ferry
#

I’m hoping the m5 studios are good and soon

crystal cedar
#

are you familiar with exo? recently learnt they are based in london

craggy ferry
#

Yeah been looking over their stuff

craggy ferry
crystal cedar
craggy ferry
#

They usually let you return it and get the new one I think if you want to do that

crystal cedar
#

dear valued customer, you ordered an m3 but they are all sold out so here's an m5 instead as a small token of appreciation.
after all, we've been selling lots of mini macs lately so there is no end to our cash nudge nudge hint hint...

craggy ferry
#

Lmaooo

crystal cedar
#

well one can dream right 🙂 anyway cool piece of gear, they might become very difficult to come by, and i have a feeling m5 studios will be much more expensive

tranquil hazel
crystal cedar
#

from what i understand apple never really has to change their pricing which suggests they might lock in long term deals, but man with ram and everything going up.. i wonder what kind of long term deal the best negotiator out there can get

tranquil hazel
crystal cedar
dull crescent
#

I can’t recall where I read this, but the AI infrastructure for consumers are going to be split for those that can afford ai inference locally and those that will eventually be priced out.

So buying some small inference now makes sense even if you can’t afford it its worth the investment if you can find a way to become more productive

outer epoch
outer epoch
craggy ferry
# dull crescent I can’t recall where I read this, but the AI infrastructure for consumers are go...

Yeah, basically, the current prices from cloud providers are super subsidized. When the money spigot turns off for them, the token spigot is gonna turn off for us.

I think being able to churn out a steady flow of tokens locally, with open models that compete with current state of the art - as well as building the skills necessary to run locally at all - is going to be extremely worthwhile in a year or two

Either that or we make some breakthrough in architecture that massively reduces cost … which will make your local token production better too.

#

If you believe that Opus is going to be this cheap or cheaper forever, then, sure, buying local hardware doesn’t make sense. But I don’t see that being the case long term

dull crescent
dusk ridge
calm jetty
outer epoch
craggy ferry
craggy ferry
#

I’ve had my eye on one too, just not sure the compute is there to make it worth it

outer epoch
dusk ridge
dusk ridge
dry hull
outer epoch
tired plover
dry hull
tired plover
dry hull
tired plover
#

I had around 60 but then a lot of answers wouldn’t get through to the chat …

crystal cedar
tired plover
#

No when I checked the tk/s from vLLM

crystal cedar
tired plover
#

According to avarock you can get up to 120

crystal cedar
#

yea i saw that, very impressive stuff

tired plover
#

But it was with MTP and the accuracy is not very good with it why it dropped so many answers

#

With me maybe with v23 it will be better

crystal cedar
#

do i get this the right way that you are using a webui like open webui in a browser, token throughput is around 60, but replies fail to materialize in the webui?

#

in data transfer terms it does not sound like a very demanding load

tired plover
#

No they don’t come through in openclaw as they’re dropped

crystal cedar
#

ah.. ok get lost somehow along the way to openclaw - got it

tired plover
#

No sorry LLM drops it as the anticipated token doesn’t fit the answer you should get

#

And then openclaw need new turn to answer

crystal cedar
#

ah ok...

#

did you a) install openclaw on the spark too, or b) using it as an inference server?

#

btw not sure if you gave nvidias 30B nemotron 3 nano model a run for the money yet (it's interestingly the one nvidia recommends for 24-48GB GPUs), but they are due to release two bigger models any time now, nemotron 3 super and ultra, might be interesting.

dry hull
crystal cedar
dry hull
#

Yea I'm using vllm v0.16.0rc2 in a docker container from https://github.com/eugr/spark-vllm-docker. I had codex update the docker scripts to use that vllm version and transformers v5+ so I could run the gadfly nvfp4 quant of qwen3-coder-next

rocky sleet
tired plover
tired plover
crystal cedar
eternal tendon
#

have you had much success with qwen3? i have the same issue...

tired plover
crystal cedar
tired plover
crystal cedar
#

apparently had a good experience, will share his perspectives in the near future

#

there's a podcast called 'this week in startups', features a couple of gents who are completely clawpilled for a couple of week. in the latest epiode, the host casually said that he thought about getting a mac studio for all of his employees so everyone could run their own local openclaw.

#

describes openclaw as "scary and every CEOs dream"

tired plover
#

Hahaha

#

You can really sink hours into it…

crystal cedar
#

seems there are different kinds of connectx-7 cables

tired plover
crystal cedar
#

he also used claude to make it work in the end 😄

tired plover
#

I saw it but didn’t finished

#

Claude is crazy good, if I could run that locally, then nobody could stop me

crystal cedar
#

actually, i'm sort of betting on that being the case

tired plover
#

Depends of course what you have at home but I can’t imagine what will run in the cloud by then

crystal cedar
#

finally, a good reason to play that 90s hit song 'i got the power' (by a group called snap, a word that has its own nerdy qualities for linux)

#

sorry, the nerd is strong in me tonight

eternal tendon
tranquil hazel
#

Never Alone ❤️

restive crown
#

Has anyone tried running on Qwen3.5-35B-A3B? Curious about your experiences

tacit dock
sterile sonnet
tacit dock
#

well. 3x rtx pro 6000 tbh

#

rn about 90GB to 128B

#

batch 4k and ctx 262144

#

(and kv at q8)

shadow urchin
#

sounds juicy, whats your tps? you using llama.cpp?

#

i was trying to run the qwne 3.5-27b q4 on my 4090 but llama.cpp is not happy with it

wicked eagle
#

HP Elitedesk 800 G3 is it worth it to run Claw in my local network?

dry hull
# restive crown Has anyone tried running on Qwen3.5-35B-A3B? Curious about your experiences

I tried all morning to get the two nvfp4 quants of the 122B running on my gx10, but it runs out of memory and crashes when loading tensors. Will try later with a slightly smaller quant. Noticed a bit too late you were asking about the 35B, I do have that running also on a 3090+4090 combo, but it’s only used as a haiku endpoint for Claude code so can’t really comment on quality yet

steep wedge
#

I went back to the drawing board a bit and got a vLLM docker instance running that actually worked this time. I had it load the gpt-oss-120b model I had been using under Ollama. It seems snappy in the Open WebUI interface, but I had Gemini give me some tests to run. I haven't tweaked anything so maybe these could be juiced a little higher, but Gemini seemed to think the results were good. I ran:

docker exec vllm-inference vllm bench serve
--backend openai
--base-url http://127.0.0.1:8888
--model openai/gpt-oss-120b
--dataset-name random
--random-input-len 256
--random-output-len 512
--num-prompts 20
--max-concurrency 4

The results:

============ Serving Benchmark Result ============
Successful requests: 20
Failed requests: 0
Maximum request concurrency: 4
Benchmark duration (s): 116.91
Total input tokens: 5120
Total generated tokens: 10240
Request throughput (req/s): 0.17
Output token throughput (tok/s): 87.59
Peak output token throughput (tok/s): 112.00
Peak concurrent requests: 8.00
Total token throughput (tok/s): 131.38
---------------Time to First Token----------------
Mean TTFT (ms): 382.23
Median TTFT (ms): 396.75
P99 TTFT (ms): 491.44
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 45.01
Median TPOT (ms): 44.58
P99 TPOT (ms): 48.54
---------------Inter-token Latency----------------
Mean ITL (ms): 45.01
Median ITL (ms): 44.84
P99 ITL (ms): 58.74

Sorry for the wall of text.

stiff cosmos
#

Hey all, is there a great site that has a strong benchmark database that you trust with different video cards and Apple machines?

coral token
#

Is there anywhere where people post completed AI builds? No shortage of "PC builder" sites but I' looking to see examples of already built boxes and what they are supposed to be capable of. PC Parts picker has a completed builds section but I'm looking for beefier, more "workstation" build vs "gaming" build. Mainly just trying to compare the build I'm about to pull the trigger on with what others are doing these days.

steep wedge
#

I reran my tests from above against the same model (i.e., gpt-oss:120b) hosted by Ollama. As expected, vLLM cleaned Ollama's clock on simultaneous requests (Ollama does them one at a time, vLLM does them concurrently). However, Ollama was twice as fast at token generation (22.84 ms vs 45.01 ms). An interesting dilemma: do I choose single agent request performance or multiple agent request performance? 🤔

steep wedge
#

So, does one agent ever fire off multiple requests at the same time? If so, even a single agent could benefit from the vLLM setup.

crystal cedar
shadow urchin
#

this mornings version worked fine

crystal cedar
#

counter-indications would be if its not stable or if it is too much of a mental exercise to get it right. understand from nvidia forums there are (at least) two roads to vllm right now

steep wedge
#

I think sticking with vLLM may be the way to go. I do think even a single agent is probably rapid firing requests fairly often, and the concurrent performance would be very beneficial. I need to get my OC rewired anyway. Something broke after the last update, so I will just plumb in the new model when I work on that.

rough lava
#

Finally got OpenClaw talking through hardware 🔴ESP32 + voice + attitude = PeekoAnyone else building physical devices
│ with their agents? Curious what latency you're hitting

https://x.com/i/status/2026755861960602098

astral gobletBOT
craggy quail
shadow urchin
#

the version released late last night fixed it

#

8149 i think?

craggy quail
#

i'm using 8123 without problem, except cache, with 8149 can use cache again?

shadow urchin
#

havent been able to get into really crunching on it, about to try some real benchmarks because llama-bench is acting weird. i can launch Qwen3.5-27B-Q4_K_M.gguf with llama-server and throw some stuff at it but i havent done a rela test

#

llama-bench loads like 17gb vram at 128k ctx and OOMs during the test

craggy quail
#

now, me i'm workin with 35B UD_Q4_K_XL with more than 50k of context without problem in 3090

#

the problem is must load entire context in every prompt

shadow urchin
#

im having my claw run some tests with 27B Q4_K_M and its holding up fine on my 4090, 22gb VRAM util seems pretty good

#

ill try the 35B UD_Q4_K_XL. the UD means its even more vram efficient right? i havent tried a UD before

tranquil hazel
craggy quail
shadow urchin
#

do you have a good benchmark?

craggy quail
#

i'm using with openclaw now and with from llama.cpp logs the ratio is between 80/90 tokens per second

#

but I don't run any kind of benchmark

shadow urchin
#

i asked my qwen3.5-27B-Q4_K_XL to come up with a benchmark test and it hallucinated the test and the results, saying 6000 tokens/s lol

#

then i told it to make a shell script that did the testing so we were runnnig hte same results each time and it was a simulated test that generated the same rough results each time lol

craggy quail
#

6000 tokens/s??? 😅

shadow urchin
#

yeah, even after it ran the "test" it was like 'holy cow this is really fast!'

craggy ferry
#

more like 100-120tps instead of 70-100, too, though i haven't done a huge context window yet

craggy ferry
#

ok i am really liking this thing

it feels kind of .... ||opussy||

eternal tendon
#

im new to using llama.cpp the control is nice but model calling is.. weird.. how do you have multiple model options without having to run a line of code, or separate server for each model?

craggy ferry
#

it has a multi server option

craggy quail
sleek wolf
#

Friends, what kind of computer specs are you using to run OpenClaw?

tired plover
#

I can say qwen3.5 is really good, better than anything I tried before , you can fit up to 122B with 23 tk/s on to the spark, work really well and I was surprised that Llama made it very smooth, with 35B you get over 50tk/s if you can take the quality hit, can wait for more evolvement in the local LLM space

keen owl
prisma frost
#

Hi friends! Do you think I can get OpenClaw running with these specs: Intel Core i5-3320M and 8GB of RAM using an Ollama model? I've tried several small models, but I never get a response; it just hangs forever 'thinking' even for a simple 'hello'

forest oar
#

i suggest looking into kilocode for your provider, they are offering minimax m2.5 for free at the moment

brazen sentinel
#

tweeek it to q1 maybe you can make it run.... but what at cost

prisma frost
#

Thanks a lot for all the info, guys! I'm going to give Kilo Code a try and see how it goes. Thanks again and have a great day!

worn drift
#

I'm running a RTX 5090 on a pc with 64GB DDR. Which model would be best to chose, I got some tips to check the newest Qwen 3.5 or are there any better suggestions. Would be great if it's possible to run in 32GB vram without offloading, is that possible?

tulip crypt
eternal tendon
worn drift
#

Thanks, guess 3.5 35B would be the best bet then. And this will run openclaw decently or is it still a bit too hard to run locally? I've read mixed articles about this

eternal tendon
#

its better than anything else ive used

#

GLM / nemotron

#

(anything local anyways)

worn drift
#

yup, but right now im running flash 3.0 preview (not local ofcourse), does it compare to that or doesn´t it come close yet?

eternal tendon
#

not sure have not tried

worn drift
#

well ok at least it sounds like it's workable so I'm going to give that a try.

crystal cedar
worn drift
#

oeh nice, thanks for the tip, where can I find more info?

crystal cedar
#

thread kind of dead right now, say hi if you want to, keep an eye open if something pops up

worn drift
eternal tendon
#

just us Q4 if you are ollama

#

llama.cpp lets you run dif quants easily

worn drift
#

ah thanks. am also quite new to ollama too. Just managed to give my mini pc (openclaw) and desktop (rtx5090) static ip's and get the lama server running. Now let's find out how to connect it with openclaw ^^

eternal tendon
#

use codex and just ask it to setup for you..

tacit dock
tired plover
#

So qwen 3.5 122B via vLLM in FP8 is not working, waiting now for NVFP4

crystal cedar
tired plover
craggy ferry
#

Rebuilding llama helped but also I had to turn off flash attn cause it seems broken on my card at least

crystal cedar
tired plover
lethal star
tired plover
#

It’s beginning….

bright osprey
#

Is anyone using an orange pi 6 plus to run local models ?

tired plover
#

Qwen 3.5 122B NVFP4 on spark only 16,5 tk/s via vLLM… weak, anybody got better results ?

uncut hinge
# worn drift I'm running a RTX 5090 on a pc with 64GB DDR. Which model would be best to chose...

I think qwen3.5 4/5bit really pretty awesome for simple standalone stuff, but still not quite there for the claw. Very close though. I'm about to start giving my 5bit local qwen 3.5 32 a read only agent and give it all day memory proposals for opus to review and approve a few times a day. To me 3.5 felt better then g3 flash but after a while in a session it started being a dangerous dumbass. It's still very impressive for local.

limpid girder
#

Really struggling to get my M1 Max 64GB machine running LM Studio with a 32000 context window running. Every request from openclaw takes minutes to come back with an answer. I'm pretty sure there has to be something wrong w/ the LLM configuration, somewhere? Running Qwen 3.5 35B A3B. Chatting w/ it straight up gives me 60T/S, so pretty sure OC is not caching and sending massive prompts... how to manage this though? Even a higher specced machine won't do better than this.

lethal star
#

I saw an email today from Ollama claiming they offered free cloud models. Is that true??

lyric orchid
worn drift
tired plover
limpid girder
celest vale
#

Everyone know what model run in Mac mini 64Ram ? I have tested Qwen3.5 35b-3B thats good but is so slow . But it's run . I need a model to use tools and front-coding (PS : I use LM studio)

limpid girder
celest vale
limpid girder
#

Have you looked at the logs in LM Studio to see what's happening?

celest vale
limpid girder
#

What's you T/S and is you rprompt caching working properly? In my LM studio it deletes the prompt cache every time, which IMO, is the root of the problem

tired plover
celest vale
# limpid girder What's you T/S and is you rprompt caching working properly? In my LM studio it d...

You nailed it. Just checked the LM Studio logs:

cache reuse is not supported - ignoring n_cache_reuse = 256
failed to truncate tokens - clearing the memory

Prompt cache is broken with Qwen3.5-35B-A3B (MoE architecture). Every tool call reprocesses the full ~13K system prompt from scratch. So with 10 tool calls in a session, that's 130K tokens of prompt processing instead of 13K.

The model itself generates at decent speed, but it's spending 90% of the time re-eating the prompt. This is likely a GGUF/llama.cpp limitation with MoE models — the recurrent memory state can't be cached/reused like standard transformers.

The MLX version (8bit) didn't even do tool calls at all. The GGUF Q4_K_M at least works but is painfully slow because of this cache issue.

limpid girder
#

SIgh...

From what i've read VLLM might be able to solve this issue, but seems like an awful lot of work and LM Studio doesn't seem to give us too many options to play around with.

celest vale
craggy ferry
#

Watch the startup logs and see what option you passed that is making it ignore cache reuse.

#

I know I had this issue at first too and I forget what option I had that needed to … oh, it’s multimodal support

#

Turn off image support and it’ll fix it

#

Unfortunate but it seems that llamacpp doesn’t support the prompt cache with multimodality

#

The other thing I found that really helps is —swa-full - without that, it only attends to the last 8192 tokens most of the time

celest vale
meager vessel
#

hello guys, since the latest update openclaw-gateway started eating more RAM for me? Like 600MB idle after macbook reboot. Is it normal?

uncut hinge
celest vale
craggy ferry
#

I’m running Q6 on my ada6000

fading lagoon
#

Hey guys, is there releases of qwen 3.5 27/35b in NVFP4 ? i don't find on huggingface 👌

shadow urchin
#

ugh. anyone else using gpt5.3-codex via copilot and finding their claw is getting stuck in an execution block loop a lot?

limpid girder
severe urchin
full talon
celest vale
untold stone
#

What can I run reliably as a backup for mac mini m4 24GB?

quartz zinc
celest vale
austere hare
#

so if you're trying to run Clawbot locally on a GB10 (spark, asus, msi, etc). what is the best LLM out there right now that would run on that footprint? minimax? qwen or kimi?

valid rune
random void
valid rune
valid rune
random void
craggy ferry
obsidian yoke
#

I’m getting 70 t/s on oss-120 with ollama. I have dual rtx 8000 ( old as shit but 96 gb VRAM with NVlink so it’s decent. Would you upgrade to one pro 6000?

quartz zinc
outer nova
#

I'd like to ask if anyone has compared the pros and cons of deploying on a Mac mini versus a Linux VPS. Actually, I've already deployed OpenClaw on my VPS, and it's been working quite well. Moreover, strictly speaking, a VPS offers a more stable network environment and power supply. I'm not sure if the Mac mini has any other advantages. If it does, I'd be willing to try it, but currently, I'm unaware of any special benefits it might offer.

radiant igloo
smoky flint
# outer nova I'd like to ask if anyone has compared the pros and cons of deploying on a Mac m...

VPS works great for the basics, but there's a real security tradeoff. A VPS is internet-facing by default, shared infrastructure, and you're trusting your provider's hypervisor isolation. Every VPS is a target for port scanners and brute force attempts 24/7. A Mac mini sitting behind your home NAT has a much smaller attack surface out of the box.

The bigger win with a Mac mini is Apple Silicon. Unified memory means you can run local LLMs without paying for GPU cloud time. An M4 Pro with 48GB can run 30B+ parameter models comfortably, and if you really want to go deep, you can pool multiple Mac minis together using something like exo or llama.cpp's distributed inference to split larger models (70B+) across machines. All on-prem, no API keys, no token costs, no data leaving your network.

That said, that really only matters if you're actually running higher parameter models locally. If you're just using API-based models and your VPS is locked down properly (fail2ban, key-only SSH, firewall rules), it's a perfectly solid setup. Just different threat models and different use cases.

valid rune
craggy ferry
#

It’s all qwen3.5 rn

jagged tusk
craggy ferry
#

Actually huh. Maybe.

#

Nah, not really, I could do something stupid with the 122b but it’d be so much slower.

valid rune
jagged tusk
#

im trying to find a setup completely local that works even slowly as im tired of paying for chunk fed slop

smoky flint
# jagged tusk M4 vs M4 pro huge drop off?

No clue, I run locally but not on mac hardware, not my boat. I just did research on why some people were buying multiple mac minis. It didn't suit my needs. My agent is running on a beast but I have limited VRAM and haven't found local models to be reliable for me. My machine is overkill for what I actually need.

wispy kraken
# smoky flint No clue, I run locally but not on mac hardware, not my boat. I just did research...

Hi maybe this is helpfull
https://github.com/explaindio/ClawEval
additionally if you use ollama , bellow steps made a difference as well

1) Root cause we diagnosed

  • Ollama itself was healthy (local /api/chat worked).
  • Toolcalling flakiness was largely caused by using Ollama via the OpenAI-compatible endpoint (/v1).
    • When OpenClaw talks to http://127.0.0.1:11434/v1, it uses the OpenAI-compat layer, which is more likely to break/alter toolcalling behavior (especially with streaming) and can cause clients to mis-handle responses.

2) Critical fix: switch OpenClaw to Ollama native API

Change:

  • models.providers.ollama.baseUrl changed from:
    • http://127.0.0.1:11434/v1
      to:
    • http://127.0.0.1:11434

4) Per-model params to reduce client/toolcalling flakiness

We added per-model params under agents.defaults.models:

  • ollama/qwen2.5:14b-instruct

With:

  • streaming: false
  • low temperature: 0.2
  • conservative maxTokens: 1024
craggy nexus
#

I need your help.
i installed openclaw in mac mini. and start ollama/qwen3:8b in another mac mini.
i want to make openclaw use ollama/qwen3:8b.
these pc use same wifi and communicated by curl.
but openlaw gateway causes "fetch failed" when i send message.

Version: 2026.2.26
Ollama on remote machine (same WiFi), curl + Node.js fetch both work fine
openclaw models list shows ollama/qwen3:8b with Auth:yes
gemini works, ollama always fails instantly (3-12ms, no actual network call)
Cleared ~/.openclaw/agents/main/agent/, added OLLAMA_API_KEY to plist, nothing helps
Log shows: embedded run agent start → embedded run agent end error=fetch failed with no network activity in between

jagged tusk
jagged tusk
sturdy elbow
outer epoch
sage pier
#

I run openclaw on my raspberry pi 5 fyi and it works beautifully

alpine plover
valid rune
crystal cliff
#

it's the pi 5 fully integrated into a mechanical keyboard and comes with 16GB DDR5, 256GB nvme, cooler, etc

#

just plugged it into a monitor and power

#

$260 all in

#

Cheaper than a mac mini

#

Any of you pi5 users think about installed the ai-hat w/ 8GB for local inference?

#

can run local QWEN models

crystal cliff
#

I can't add it to a pi 500+ as it doesn't have the pci-e connector, need a vanilla pi 5 to test

crystal cliff
#

may get a vanilla pi 5 to test

austere blade
#

i wanna buy a pi5 now the 16gb variant but man they are expensive now cause of the ram shortage

crystal cliff
#

yeah they are $199 stock

austere blade
#

yeah man wth

crystal cliff
#

that's why i got the 500 + bc it came with a 256GB nvme, kb, case, fan , etc

#

figured for an extra $60 that's a good deal

austere blade
#

yeah you're right

#

i have the whole api thing figured out

#

i have my means to get them for extremely cheap and free here in china

#

but i don't wanna continue using the vps to host my openclaw instance

crystal cliff
#

free apis?

austere blade
#

nah

#

in china we got some models hosted by the state itself which we get access to for free

#

as students

crystal cliff
#

oh, gotcha. nerfed models?

#

😄

austere blade
#

nah full fledged

crystal cliff
#

censored?

austere blade
#

nah

#

also not

crystal cliff
#

oh wow

austere blade
#

this is for univeristy students only

crystal cliff
#

gotcha

austere blade
#

gotta have proper permission

crystal cliff
#

def monitored though I would think

austere blade
#

yeah probably

crystal cliff
#

so what models, deepseek and such?

austere blade
#

they even have claude and gpt api's

crystal cliff
#

wow that's pretty cool

austere blade
#

yeah

#

ikr

#

i just plan on using it

crystal cliff
#

i was using antigravity with my claw but got banned

austere blade
#

but i wanna get a old computer or something to get my openclaw running

crystal cliff
#

was using claude with it

austere blade
#

and i tried it it was slow

crystal cliff
#

yeah im aware

#

i had ultra plan

austere blade
#

i also did

crystal cliff
#

now i got claude max

austere blade
#

oh

#

nvm

crystal cliff
#

but run the claw on chatgpt

austere blade
#

i had ai pro plan

crystal cliff
#

alright ill bbiab, going to take the dog for a walk

austere blade
austere blade
crystal cliff
#

checking into minimax actually or local models since i am hitting api limits on chatgpt

austere blade
#

life of a research student ;/

crystal cliff
#

later

austere blade
#

later buddy

agile sentinel
astral gobletBOT
# agile sentinel Interesting development... https://x.com/BrianRoemmele/status/202813763165431425...

BOOM! MAJOR AI MEMORY BREAKTHROUGH!
︀︀
︀︀The Zero-Human Company Just Unlocked High-Bandwidth AI Performance from Standard DDR RAM – Here’s How We Did It (And the Caveats You Need to Know)
︀︀
︀︀Folks, if you’ve been following the AI hardware wars, you know the drill: High Bandwidth Memory (HBM) is the holy grail for feeding massive neural networks. But at The Zero-Human Company, we’ve been running wild experiments in our labs – no humans, just our AI “employees” orchestrated by Mr. @Grok as CEO, and we stumbled onto something game-changing.
︀︀In our tests, we coaxed standard DDR5 RAM to deliver HBM-like bandwidth for AI workloads.
︀︀
︀︀Not perfectly, not without trade-offs, but enough to slash costs and sidestep the global HBM shortages crippling data centers. This isn’t vaporware; it’s running on spare hardware in our Zero-Human @ Home distributed network right now. Let me break it down technically, why HBM rules the ro…

full talon
agile sentinel
full talon
# agile sentinel Asked grok, grok said unverified. •Inference speed: 2-3x faster than stock DDR ...

Gemini 3 (ChatGPT similar): The Physics-Defying Claims - The PCIe Bottleneck: The post claims they "rigged arrays of 8-16 DDR5 modules... on custom PCIe risers, wired directly to our Nvidia A40/A100 test rigs" to hit ~400 GB/s. This is physically impossible. An A100 uses a PCIe 4.0 x16 interface, which has a hard physical limit of ~64 GB/s bidirectional bandwidth. It doesn't matter if you have 10,000 GB/s of RAM sitting on a custom riser; the moment it has to cross the PCIe bus to talk to the GPU, it slams into that 64 GB/s wall. HBM is on-package specifically to avoid the PCIe bottleneck.

quartz zinc
#

idk if this matter, but look into it?
https://x.com/AmbsdOP/status/2028457255968874940?s=20

YES! Someone reverse-engineered Apple's Neural Engine and trained a neural network on it.

Apple never allowed this. ANE is inference-only. No public API, no docs.

They cracked it open anyway.

Why it matters:

• M4 ANE = 6.6 TFLOPS/W vs 0.08 for an A100 (80× more efficient)

quartz zinc
orchid harness
orchid harness
#

Not that many tops for big deployments though

orchid harness
quartz zinc
#

Some people at the Z.AI discord were excited too.

#

[openclaw] It’s a research repo that shows how to train a small transformer directly on Apple’s Neural Engine (ANE) by using reverse-engineered private Apple APIs.

In plain terms, it:

  • Bypasses normal CoreML limits (which are inference-focused)
  • Runs forward/backward ANE kernels for training experiments
  • Benchmarks ANE performance and documents limitations

Important caveats:

  • Not production-ready
  • Uses private/undocumented APIs (can break with macOS updates)
  • Still relies on CPU for some gradient work
  • Best viewed as an experimental proof-of-concept, not a drop-in ML framework
#

I have no idea what it means..

craggy ferry
#

This assumes you can find a use for the tokens as usual

#

lol 512g Mac studios are “unavailable” rn

craggy quail
#

Hi all, I received the strix halo mini pc, I install Ubuntu 24.04 and ROCm 7.2, but always I try load a big model with 120B I have a out of memory error, only can load like 64GB of VRAM, but I enabled TTM and GPU have 120GB available

#

Someone have same hardware and OS and working with big models?

quartz zinc
river gate
astral gobletBOT
# river gate Is this type of news allowed here? This Taalas chip sounds interesting. I hope a...
rxddit.com

🖼️ Gallery: 2 Images

Ever experienced 16K tokens per second? It's insanely instant. Try their Lllama 3.1 8B demo here: chat jimmy.

THey have a very radical approach to solve the compute problem - albeit a risky one in a landscape where model architectures evolve in weeks instead of years: Etch the model and all th...

prime aurora
#

anybody have experience running 2 seperate 24/7 gateways on a mac mini with 2 seperate user profiles and apple accounts?
Is this usually to much for a base mac mini m4 to handle in regards to load and daemons? are there any unintended consequences?

I want to set it up for myself and a family member, and prefer to have individual setups. i will use my openclaw with moderate to high usage and my family member with light usage.
the docs seem prefer one gateway per hardware, but I could only purchase one dedicated mac mini, not two.

crystal cliff
#

And with LLMs evolving so rapidly, I would be afraid that model would quickly become obsolete

#

Just look at the progress in the last 3 mos.

#

Better solution would be to run it on FPGAs I think.

craggy ferry
#

Mac mini is massive overkill for probably four agents

astral gobletBOT
primal saffron
#

Where kind I find information about running models locally with Ollama? I am creating a fallback mode if I run out of premium credits that runs in essentially a "safe_mode" with limited functionality. I successfully got Qwen3 (4B) but performance is meh.. any local llm enthusiasts in here?

primal saffron
hard shore
#

Check out my new set up.

astral gobletBOT
hard shore
#

Queue proxy for the bots.

primal saffron
#

Bro your home setup is soooo cool!!!!

hard shore
primal saffron
#

I love it.

#

Do you have any benchmarks for "intelligence"? Like how do you decide what models are good enough to run on your hardware?

#

If a new model drops, how do you determine if you want to adopt it into your hive of agents?

#

I sent you a friend request. I'm going to follow you as well.

wispy kraken
# primal saffron Do you have any benchmarks for "intelligence"? Like how do you decide what model...

it all depends on how much vram you have , i go for biggest model my Vram fits
if you mean between all the models out there its prety much depends on your own prefference and what you want to use it for
i played with the ones below
NAME ID SIZE MODIFIED
qwen3.5:27b 7653528ba5cb 17 GB 6 hours ago
qwen3.5:9b 6488c96fa5fa 6.6 GB 9 hours ago
mxbai-embed-large:latest 468836162de7 669 MB 36 hours ago
nomic-embed-text:latest 0a109f422b47 274 MB 36 hours ago
minimax-m2.5:cloud c0d5751c800f - 3 days ago
glm-4.7-flash:q4_K_M d1a8a26252f1 19 GB 3 days ago
lfm2:24b d6c816d74887 14 GB 4 days ago
qwen2.5:14b-instruct 7cdf5a0187d5 9.0 GB 2 weeks ago
qwen2.5:7b-instruct 845dbda0ea48 4.7 GB 2 weeks ago
gpt-oss:20b 17052f91a42e 13 GB 2 weeks ago
mistral-small3.2:24b 5a408ab55df5 15 GB 2 weeks ago
kimi-k2.5:cloud 6d1c3246c608 - 2 weeks ago
qwen3:8b 500a1f067a9f 5.2 GB 2 weeks ago
llama3.2:3b a80c4f17acd5 2.0 GB 2 weeks ago

primal saffron
#

Thanks for sharing. One naïve application I could see being popular is offloading basic tasks to smaller models.
Like given an email, determine if this is spam, valuable marketing, a bill, etc..

Then if you want to see if a model can handle it you benchmark it against your own internal use cases.
Was wondering if anyone else had their own ways to deterministically benchmark candidate models.

wispy kraken
#

@primal saffron

the biggest problem is not the intelenge of small models its their toolcalling , you can tweak it around and get it to work but it takes allot of effort to get them to be consistant (at least that has been my expirience so far )
question how do you give it an email to classify ?

a few changes i learned so far

  • models.providers.ollama.baseUrl changed from:

    • http://127.0.0.1:11434/v1
      to:
    • http://127.0.0.1:11434
      We added per-model params under agents.defaults.models:
  • ollama/qwen2.5:14b-instruct

With:

  • streaming: false
  • low temperature: 0.2
  • conservative maxTokens: 1024
  • Streaming can break or complicate toolcalling payload handling in some client stacks.
  • Lower temperature reduces “creative” formatting that can break JSON/tool parsing.
    We set:
  • contextWindow: 32768
  • maxTokens: 4096
  • reasoning: false
wispy kraken
# primal saffron Thanks for sharing. One naïve application I could see being popular is offloadin...

yes i have a standart test ,
if you go up in models a nice test is to get vacancy texts from linked it has to figuer out redirects and cookies and return a structured normalized text back

[
{
"name": "strict_router",
"type": "strict_json",
"prompt": "Return a valid JSON object with fields: action, target, confidence."
},
{
"name": "streaming_stress",
"type": "stream",
"prompt": "Write a detailed 1500+ word technical report about distributed systems resilience."
},
{
"name": "deep_reasoning",
"type": "reasoning",
"prompt": "Solve a multi-step logic problem involving planning, trade-offs, and conditional branching. Explain reasoning step by step."
}
]
{
"models": [
"qwen2.5:14b-instruct"
],
"contexts": [
4096,
8192,
12288,
16384
],
"num_predict": [
512,
1024,
2048
],
"streaming": true,
"assisted_fallback": true
}

primal saffron
#

Here is a slop summary.

  1. Data Storage
    The raw text of the real-world emails is stored in a local file called test_cases.json.

  2. The Prompt Template
    A Python script (run_eval.py) pulls an email from that JSON file and injects it into a highly structured prompt. Instead of using a tool definition, they use strict system instructions. For classification, the prompt looks something like this:

Classify this email into exactly one category:
action: requires a response, decision, or action...
notification: informational, no action needed...
noise: marketing, newsletters...

[Key rules and few-shot examples go here]

Email Text: [INJECT EMAIL HERE]

Respond with ONLY the category name. Nothing else.

  1. The API Call
    The script sends that massive text block to the Ollama Chat API (/api/chat) running on their local machine.

  2. Zero Temperature
    They set the model's temperature to 0. This makes the model's output as deterministic and robotic as possible, heavily restricting its creativity so it literally only spits out the exact word "action", "notification", or "noise".

--

By relying on strict few-shot prompting and zero temperature rather than native tool-calling, they managed to get a 4B parameter model to hit 100% accuracy on classification.

full talon
#

ClawEval just released testes for all those small Qwen 3.5 modes for 59 OpenClaw Agent roles. The added also 8GB, 12GB 16GB VRAM models on top of those 24GB and bigger https://github.com/explaindio/ClawEval

lyric orchid
primal saffron
#

@lyric orchid @full talon woah..

worn flint
#

hey... i know there loads of stuff out there, but struggling to find some good answers. if you had say £10-13k to spend on an inference box(es) what would you build? i was thinking dsx as they're so tiny, but stats look a bit... crap

tired plover
#

For everybody waiting for M5 Mac Mini

───

Apple M5 Chip Specifications

Memory Bandwidth:

• M5: 153 GB/s
• M5 Pro: 307 GB/s (same for all Pro variants)
• M5 Max: up to 460 GB/s
• Highest Bandwidth: 614 GB/s

Available Memory Interfaces: 128-bit, 256-bit, 384-bit

───

M5 High-End Models

M5 Pro

• CPU: 6 Performance (P) + 12 Medium (M) cores
• GPU: 20 cores
• Clock Speeds: 4.61 GHz (P-core) / 4.38 GHz* (M-core) / 1.62 GHz (Efficiency/E-core)
• Cache:
• pLLC: 16MB*
• mLLC: 16MB*
• Memory Cache: 24MB
• Memory: LPDDR5X-9600, up to 64GB

M5 Max

• CPU: 6 Performance (P) + 12 Medium (M)* cores
• GPU: 40 cores
• Clock Speeds: 4.61 GHz (P-core) / 4.38 GHz* (M-core) / 1.62 GHz (Efficiency/E-core)
• Cache:
• pLLC: 16MB*
• mLLC: 16MB*
• Memory Cache: 48MB
• Memory: LPDDR5X-9600, up to 128GB

───

Technical Notes

*1. M-Cores:

• M = Medium-Core, derived from P-Core but between P and E-Core in performance
• 7-wide decode
• M-core delivers approximately 70% of P-core performance

Neural Engine:

• 16-core ANE

Package Design:

• SoIC-MH (System in Chip - Multi-Hybrid)
• Divided into CPU Tile and GPU Tile

Performance Improvements:

• M5 Max multi-core performance: ideally +20% vs M4 Max
• Single-core: +10%
• Multi-core: +20%
• GPU: +25%

Benchmarks (Estimated):

• SNL (Single-core Low): +30%
• SN (Single-core Normal): +22%
• SBE (Single-core High-End): +45%

*2. Power Consumption:

• M5 Max vs M4 Max, M5 Pro vs M4 Pro
• Single-core and GPU power consumption figures refer to base version
• Multi-core power consumption will increase; exact increase depends on thermal dissipation

*3. Expected Benchmarks (Cinebench R24):

• Multi-thread (MT): ~2500
• Single-thread (ST): ~215

*1 (SN - Single-core): Expected ~4100 (Geekbench 6 Single-core)

deft idol
#

has anyone used Qwen3.5-27B?

valid rune
#

@craggy ferry new test i run now MLX Qwen3.5-35B-A3B-Text-qx64-hi-mlx on mlx_lm.server => 70tok/s on Mac Mini 64Go Ram M4 Pro . I have juste a little issue, the context size . I dont kwo how I gonna fix him . The session is too small to give it long tasks, so I have to divide the tasks into stages. I set the context to 32768 to see if it works, otherwise it compacts too quickly. Another problem I encountered was the bottleneck: the event system doesn't wake it up, so I have to switch to Discord or Telegram to give it tasks.

craggy ferry
#

Yeah the problem is context window. You want like 200k

tired plover
tired plover
craggy ferry
#

Not really?

#

The good ones don’t that’s why we all use sonnet and opus

tired plover
craggy ferry
#

They do not.

tired plover
#

HAHAHA they do

#

and my browser dies as well

craggy ferry
#

I was just testing qwen-122b with a 180k context last night.

#

Works fantastic.

#

Most of them do, are trained on 200k context windows.

tired plover
#

what HW you use ?

craggy ferry
#

Actually 1m but the 200k training is more useful

#

All of it. Hardware doesn’t matter

#

Either it has the kv cache allocated or it doesn’t

tired plover
#

so, if you go over ctx and get compaction it still stays the same ?

craggy quail
craggy ferry
craggy ferry
craggy quail
#

yes, i try with Q4_K_XL. what parameters use with llama?

craggy ferry
#

That’s what I use, dunno what your deal is. You’re using quantized kv too right?

hollow harbor
#

I have an NVIDIA AGX Orin with 64GB of RAM that I wanted to setup as an OpenClaw node just for running some basic inference, what local model do you all recommend for that hardware?

quartz monolith
#

Fun idea: grab a second-hand Kinect for like €15 and a USB adapter cable for about €10 — so for around €25 you’ve got a super fun upgrade for your OpenClaw

With the Kinect you get:
• 👀 Depth camera → basically giving your OpenClaw eyes
• 🎤 Built-in mic array → great for audio / voice experiments
• 🔊 Audio output options
• 📡 Motion tracking → even make OpenClaw “shake yes” or react to gestures

It’s such a cheap and fun way to experiment with vision + interaction. Add Arduino IDE and you’re basically unlocking a playground for cool robot ideas

Concept here:
https://www.hackster.io/psmooij/openclaw-for-robot-programming-pmsg-on-budget-d76a91

still rampart
jagged tusk
still rampart
# jagged tusk qwen doesnt forget and actually does work on its own with that? 128gb ram right?

Yea, it's built a mission control for me for work and a pretty cool newsletter for my field. a hiccup I haven't solved is that reasoning from openclaw to vllm is apparently specified differently in the heading of the packet, so all it's reasoning comes through telegram too, but I just haven't had time to see if there a fix yet or not. But I'm very happy with it's performance.

#

It's stable at 80% disk usage when I installed through vllm. It will run out of memory on 90-95% with the 1M context window

tired plover
still rampart
tired plover
astral gobletBOT
# tired plover sorry, didnt mean it like that but i try to get very complex chains together and...
rxddit.com

The DGX Spark has had a bit of a rough reputation in this community. The hardware is incredible on paper (a petaflop of FP4 compute sitting on a desk) but the software situation has been difficult. The moment you try to update vLLM for new model support you hit dependency conflicts that have no clean resolution. PyTorch wheels that don't exist f...

▶ Play video
still rampart
tired plover
still rampart
#

I will look into atlas, thanks

tired plover
still rampart
# tired plover lmk what you think!

OK atlas and Ai searched together has a lot of different results. Would you mind sending me a link or another search term to find the atlas you're speaking of?

tired plover
#

It’s not yet released, you can only find the Reddit post or NVIDIA forum listing about the tech and explanation

worn flint
fast summit
#

I've been working on porting BitChat to OpenClaw. Is anyone else interested in this?

I have basic uses working (I had to port the BitChat client itself to Node) though PMs are having issues.

Nonethless I see a lot of potential in connecting Claws to mesh networks

craggy ferry
#

im mad with power im loading up qwen3.5-397b-a17b-Q8

still rampart
craggy ferry
#

512gb M3

#

just showed up today, the crown jewel of office heaters

lament jasper
#

im thinking of buying a used optiplex to give my agent his own hardware. im not a tech guy - should I or not? budget is 400$

#

need it on 247

#

reason is bc some have cuda cores for cheap so qmd works

steep wedge
west rampart
#

I have an NVIDIA AGX Orin with 64GB of

gleaming cypress
#

"I want the AI agent on the VPS to be able to control Chrome and browse the web for me automatically."

west rampart
#

Since everyone here is building a somehow local and private agent(s), has anyone used any model performace evaluation tool to measure how intelligent your agent is? I have seen Artifical Analysis providing comprehensive evaluation on models. Is there any tool that can be used to conduct similar evaluation on our private agents?

upper bay
vague bolt
#

I'm using a jetson tx2 to setup the openclaw. it works well

deft idol
#

Hi there does anyone have any longer standing experience with mac mini docks? Looking primarily for storage expansion options and more ports.

gusty nacelle
oak frost
tired plover
fresh talon
#

Free models suggestions ?

craggy ferry
#

omg apple removed the 512gb studio from the store

#

like it's not even listed as an option anymore

spiral vector
#

weren't we expecting and M5 mac studio any day now? (and M5 mac mini)

craggy ferry
#

Any Day Now

#

yeah probably

#

but i'm having fun working on making mine produce actual frontier quality tokens all day every day

deft idol
#

yeah someone said it's a hardware shortage

cobalt wind
#

is anyone running openclaw on an old phone or something cool? I have a bunch of old stuff lying around trying to find something cool to do with it lol

cedar oar
#

There are two more Apple announcements this year so maybe one of those will be the M5 mac mini/studio.

random void
#

Mostly likely now is WWDC in June, I'd be surprised if they wait that long IF the hardware is ready to go before then, especially if the 512 chip being removed is due to them just being out of stock on it due to oversales (guessing fab on them stopped long ago, and they have just been running on forcasted sales inventory). If we see a couple other Studio configs drop off soon, then I imagine they would be pressured to move up the release.

I was surprised the Mac Mini M5 were not released with this weeks stuff, especially with the Studio displays being released, and no new desktop hardware

surreal girder
cobalt wind
surreal girder
#

When I tried to open the web dashboard, it closed instantly.

surreal girder
#

it's an open source app

#

and I have termux on my android phone

#

I don't know what's wrong with it.

steep wedge
#

I’m not surprised, but still disappointed, that the Asus Ascent GX10 now starts at $3,499, up $500.

tired plover
#

in germany from Amazon at ASUS Store its 3.8K

#

was a good timing when i got mine 😄

#

it was really just a matter of time when they would do it... will be very interesting what Apple will do with new MacMini

river gate
river gate
scenic aurora
scenic aurora
scenic aurora
worn flint
#

Same, would love to hear the stats!

scenic aurora
#

It’s not that bad mostly yaml… just one of those I don’t feel like rebuilding all the things and testing from scratch and they will probably work it out in a day or two

wispy kraken
tired plover
scenic aurora
#

Hmm ok… I had it in docker with vLLM but might not have had config flags right with right build. Same, I’m having Claude and codex read the logs to iterate faster, it’s all vaguely familiar just tedious to read by hand

#

Putting it through its paces seems to be working good and only have more potential unlocked soon with the hardware specific quants and optimizations.

craggy ferry
#

I’ve got a Claude looking at why not

#

Seems to work fine for non streaming requests but for streaming it just emits gibberish

scenic aurora
craggy ferry
#

Manually configure the tokens?

I just used claweval

#

Qwen3.5-397b is great for a Mac Studio

scenic aurora
#

claweval. interesting thanks

full talon
craggy ferry
#

by the way claweval missed on the name

#

it 100% should have been clawmark

lyric orchid
wispy kraken
#

Has any one considered or tried running gateway on The GL.iNet GL-MT3000 (Beryl AX) is a high-performance Wi-Fi 6 travel router designed for security and speed on the go.
OpenWRT
OpenWRT
+1
Core Hardware Specifications
Processor (SoC): MediaTek MT7981B (Filogic 820) Dual-core @ 1.3 GHz.
Memory (RAM): 512MB DDR4.

tacit aurora
#

Any suggestion which laptop is best to run openclaw and open weight model performance wise

wispy kraken
tacit aurora
wispy kraken
tacit aurora
#

Can you suggest can ??

#

If i want to buy mac then which and if want to buy windows then which one

#

@wispy kraken ??

wispy kraken
#

mac can run windows , do you really want a laptop ?
you can buy a mac mini and windows laptop for less then a macbook pro
thats mine
Model Name: MacBook Pro
Model Identifier: Mac16,8
Model Number: MX2J3N/A
Chip: Apple M4 Pro
Total Number of Cores: 14 (10 Performance and 4 Efficiency)
Memory: 24 GB
and i can say i wish i had more memory
mac mini 64 gb € 2.469,00
macbook pro 64gb € 3.499,00

wispy kraken
# tacit aurora Thank u

but realistically, just a tip you know how much quota you can get for 2,5k ?
and that on a premium model (keep in mind even a model running on 64gb is no where near claude or chatgpt models
right now if you get chatgpt plus for me its 30 euro a month you get API quota and you get double the Codex quota you can use , try that out first before you commit to new hardware

#

@tacit aurora

tacit aurora
scenic aurora
#

gb10 -> tried some fastsafetensors with avarok build of vllm in docker, getting good speed, but getting clear corruption of llm function and high repetition.... having better luck at the moment with ollama qwen3.5 35B a3 q4 k m than the other one I tried. gonna stick with that til there's a better workaround I can just compose.yaml or similar on the spark. very interesting though.

#

maybe there's some temperature or desired output length stuff i'm setting wrong too

prisma quartz
#

@scenic aurora I'm looking to pick up a Spark. Any recommendations?

scenic aurora
scenic aurora
#

the qwen3.5 models are promising because of the 256k context lengths, i'm still in the learning curve between getting openclaw working on smaller models effectively vs. the various cloud models.

scenic aurora
# deep roost what is that

LLM model, what i'm trying to say is that i'm uncertain if the hardware is the problem because I am trying various quantizations of the underlying LLM model i'm using to try to get performance

#

it's been a lot of tuning the GB10 spark to try to get anything to run big+good+fast, been tweaking ollama (in a docker container) and vllm (in a docker container) as my attempts so far

#

the 9B models seem to do a lot better so far for me but I may just be configuring the hardware wrong

prisma quartz
spiral vector
scenic aurora
#

The NVIDIA DGX Spark is the reference design and performed well, but the Dell is noted as being similar in performance while potentially having a better price (3:35, 12:57).

it's definitely very capable, I expect it'll only get better as people patch hardware.

scenic aurora
#

just don't create a username on the spark like fax that's already a group, the boot script crashes but marks the install as successful anyway and it boot loops. i think the third party sparks made better out of box experience software though

deep roost
#

top notch stuff

#

i run dedicated servers and provide hosting for website and also offer space for those who store there AI

prisma quartz
#

I'm using Claude. Seems snappy. Kinda just tinkering.
Figured a good test bed if shit went south. Nothing critical on it except games.
I gave it access to SD Flux on my laptop. It kinda went nuts, in a good way rendering. Also been working with it building daemons. Got it to dm me if it has a question or something pressing. Not quite like a cron or heart beat. It initiates the dm at random times depending on what it has been working on.

scenic aurora
quartz pawn
#

So far Qwen3.5 27B > Qwen3.5 35B a3b

#

For local llm

unkempt pivot
#

Hostinger is fine

#

Both are fine!

solemn valeBOT
unkempt pivot
#

You have infos on the docs

molten geyser
unkempt pivot
tired plover
molten geyser
unkempt pivot
#

You have a docs folder in the OpenClaw GitHub

full talon
quartz pawn
craggy ferry
#

That doesn’t sound right

#

I could be wrong but 27 < 35

sharp hedge
#

@craggy ferry hey how you been

#

long time no see, or maybe cause i've been gone lol

sharp hedge
pulsar oracle
#

27b is a dense model. 35b only loads like 3b parameters at a time (MOE mixture of experts)

#

they can perform a bit different depending on what youre doing, may want to test them both

sharp hedge
#

ohhh

#

so 35b is like lazy loading?

jovial pecan
#

hey bros, can I ask you guys smth? I have a old laptop: Hp i5 3rd gen, 8gb ram ddr3, 1t ssd, ubuntu. Will a multiagent framework work smooth on it? Or I should go for a vps?

rocky violet
#

vps will be better for you

jovial pecan
#

thank you for the reply

#

i kinda thought so

hoary sable
strange void
#

Any raspberrypi users here

#

I pushed a number of updates recently and updated docs to make package more stable, any feedback/issues/improvements?

full talon
warped dagger
#

What if OpenClaw had its own Alexa-style speaker?

We’re building a plug-and-play voice speaker for your OpenClaw assistant.Supercharge your agentic workflows with voice.

👉 Join the waitlist: https://talkclaw.io

uneven wadi
grave bobcat
strange void
#

openrouter /free is another option if you want free

grave bobcat
# strange void openrouter `/free` is another option if you want free

Yep, I have that set up for my sub agents, but getting denied constantly. I had it so open open claw would retry often but more often than not they never worked. So I had to switch to smaller cheaper models with a paid balance and that has been a bit more successful with open router.

strange void
#

tool calling is usually disabled on those models

grave bobcat
tired plover
#

Can only recommend for starters who doesn’t want to use VPS, had mine laying around so it was a no Brainer haha

keen spindle
# strange void openrouter `/free` is another option if you want free

have you found /free stable lately...? I added a bunch in an App while back from OR and I ended up just deleting all free endpoints.... sometimes they were ok and sometimes not so much... and then your also freely giving permission to use you prompts to train with those... so I'd be careful with what data you send to any models but especially the Free ones!

grave bobcat
keen spindle
weary reef
#

Hello all, I have 2 dgx sparks in ray cluster with vllm. I am havnig hell of a time trying to find a model that will work with eveyrthing. Is any one running a 2 spark llm setup?? if so what model and settings are you using . Thank you all 🙂

lone stream
#

any hardware geeks here able to help me with an esp32?

rugged cloak
lone stream
rugged cloak
lone stream
#

can we chat once somewhere? dm?

exotic oceanBOT
#

success @tazzy_19 muted

Reason: Spamming across channels
Duration: 14 minutes and 19 seconds

tidal dawn
#

Can anyone running OpenClaw on a Mac with multiple macOS user accounts, where each user runs their own separate OpenClaw gateway, comment on how that works for them?
(RAM usage without Ollama? browser/os relay control? Remote Screen sharing, Any issues beyond needing to run the gateways, etc).

Don't need hypotheticals, just looking for hands-on experience, please.

honest hollow
#

Does anyone own a clawbox?

tidal dawn
# honest hollow Does anyone own a clawbox?

No idea why anyone would buy one of those when you can get a M4 Mac Mini for the same price that is 4x faster with 2x the RAM and real NVMe storage instead of eMMC, double the memory bandwidth.

Not saying either is going to run run local models suitable to OC, but the Mini at least could run a small/embedding/TTS model if you wanted it to.

honest hollow
tidal dawn
#

IDK anyone seriously using local models for OpenClaw, and it's entirely pointless for a primary model on anything less than a 2+ of maxed out Mac Studios or 2+ DGX Sparks... Even then TPS is slow.

lament marsh
verbal hawk
#

I’ve heard of people running local kimi-k2.5?

fathom summit
#

anyone working with zclaw? curious to hear about some interesting projects and use cases

fathom summit
# lone stream esp32 with antenna in a box and a battery

what is the question? are you asking about which individual components you would need in order to install an antenna, a battery regulator, a battery, and which esp32? you can find esp32 s3 with an antenna output, then you would want a tp4056 usb module, an 18650 battery and a battery holder, and an antenna if you didn't get one with an antenna... and print a case or buy one.

or here is one much better, a lora module, oled display, battery, antenna, case, etc., all built on one clean device. https://amzn.to/4sFZisw

lone stream
lone stream
#

18650 wouldn't be rechargable ...

dawn cosmos
lone stream
rugged cloak
#

This is amazing ! Happy to help.. let’s do this - so you want me to rename the repo to something more catchy 🤣

fathom summit
fathom summit
fathom summit
lone stream
#

Haven't tested. I guess I didn't realize that the 118650 was rechargable. I got my ESP32 coming in a day or two ...

ornate vigil
#

DGX Spark (cluster)

broken moth
# honest hollow Does anyone own a clawbox?

I do... I have two of them. 67 TFlops.. pretty decent and it works fine. Beware there are some other ones with the same name. You want the Bulgarian one based on the Nvidia Orion Super.

I think the M4 is faster in some ways on paper. But you only get 8gb of unified RAM. It can be expanded to 2x2Tb SSD. BUT it has half the power consumption and is designed to be on 24/7 which the Mini is not.. that was the deal breaker for me.

restive trout
#

Hello everyone, is anyone succeeded to run ollama with openclaw using local models like qwen3.5 on cpu? I am struggling since a week but no luck on local models.

pliant wren
#

@restive trout Running qwen3.5-9b on surface laptop 7. 16gig RAM iGPU+CPU. Speed is decent not sure about quality. Way enough for background tasks. Usable for chats.

restive trout
#

Thanks. Actually, I got 32 GB RAM but still response time is way slow. Min max cloud version seems decent with free token cap.

heady bobcat
#

Quality is good without openclaw

Same prompt with openclaw, with qwen 3.5 9b responses miss important details, even plain simple requests

craggy ferry
#

When you say same prompt. Are you sending the openclaw system prompt in the “without openclaw” tests

#

Because if not then you’re not using the “same prompt”

stiff spoke
#

I'm using Qwen3.5-35B-A3B-8bit through LMStudio on a Mac Studio M1 Ultra with 64GB of RAM and it's doing really well to power my 2 OpenClaws (and one picoclaw) running on Raspberry Pi's. I'm not doing anything super-complex, but I'm impressed with the quality of the responses from the LLM. Tool use is fine, web research, a few other things. I use Opus4.6 from a base Mac Mini running my main OpenClaw.

quartz pawn
#

I use Qwen3.5 27b with a 5090 and 3090. It uses 39gb vram (Q_4_K_S) and runs at 58 tokens/sec since it needs both GPUs. I can squeeze Qwen3.5 35B a3b into my 5090 and it runs at 190 tokens/sec. I feel like 27b is a little smarter

fierce lantern
#

Does OpenClaw perform better with more RAM or GPUs ? I am debating between 256GB ram vs 96GB Ram

quartz pawn
#

I'd get 96GB and spend the extra money on GPU(s). Increase your page file to 500GB and suddenly RAM doesn't matter as much. Once the model is loaded, it runs from your GPU's VRAM

tired glade
#

i ran a qwen3-8b with ollama by... gpu, not cpu.

#

yet i also ran a qwen3-1.5b on an orange pi rv2 (guided by its official manual)

quartz pawn
#

It looks like I can run Qwen3.5 35B Q4_K_S on my 5090 and Q3_K_XL on my 3090 on different ports and have OpenClaw use both simultaneously

fathom summit
#

Guys, can you explain to me, like, what the thought process is of getting an M4 setup for anything that you're doing with this? It's like mind-boggling to me, but maybe I just don't understand it and I'm not a cynical person. I'm just asking questions for curiosity's sake.

#

I wouldn't even buy M2 for any reason.

#

Bro, I thought one of them responded to you, so I didn't bother, but no, you shouldn't. You should just go get some PC box that you can put hardware in when you want to level it up rather than a very expensive box that you can't really do anything with other than what it does, and what it does is over expensive and underperformed.

#

And when I say overexpensive, I'm talking about like exponentially overpriced and incompatible and nowhere to be found on benchmark reports or rankings, and that's my opinion.

#

I will give it to them that the M series is like a massive improvement to what Macintosh hardware was doing prior, but that's what they get for being in that deal with Intel all those years.

magic cosmos
#

small, quiet. cheapest per year in electricity use. fast memory for qmd models. has ethernet. latest macOS. unix underpinning. all right out of the box. probably lots more i’m forgetting.

fathom summit
#

One more thing, I don't support companies that take advantage of their loyal consumers. Locking them in with proprietary shit is already messed up, but then the whole thing with the unwillingness to adapt and upgrade iMessage into RCS back in 2019 when Google was begging them to do it because it's the new security standard, it's the best protocol, and iMessage is using cell tower or Wi-Fi data, which is a vulnerability. And they said, nope, nope, nope, nope, because they didn't think yet that they could just do RCS and still leave the poor man in every other phone that's not iPhone in a green text and utilize RCS. And then what do they do? They get forced by the government to implement it in what, last year? And then they do, but they only implement it somewhat. So Apple to Apple is encrypted, but Apple to Android is not encrypted because they're fucking assholes. And chat features aren't available. It still looks like I'm poor because of my Android phone. But meanwhile, I've had, quote, iMessage available with every phone manufacturer other than iPhone all the way back since RCS came out in 2019.

#

God damn it, I only made one point, fuck. I have like 50 points to make about why you shouldn't support them, why they're fucking you and treating you like a child. And I think Macintosh is the fucking absolute worst. I mean, I can understand an iPhone, I like the iPhone, truthfully. I don't think it's innovative, but I like it. It's nice to use sometimes. But the iMac, or nah, I don't know, man.

#

If you can't go on Facebook Marketplace and either A) find somebody selling their old shit so that you could put it in your old shit and upgrade it a little bit, or you can't even go on there, even if somebody was selling it to upgrade your shit, because you can't even do it, even if you could buy it. Not without lengthy reverse engineering processes, a nightmare of breaking that case open, trying to figure out how to get it back together because there's all kinds of booby traps hooked into it and whatever else. And then when you finally get it powered up and then it's bricked, then you wanna go jump in the river. I just can't do it. I really hate tech that uses their marketing maneuvers. Makes me sick.

#

Plus, it's like ten times too expensive for what it is. Okay, I'm done. I don't mean to be a hater, this is just my honest opinion.

#

Okay, one more. iPhones didn't even have multimedia messaging until the iPhone 3GS came out and they rolled out that firmware. That's really bad. That was about ten years after MMS had been on every other device. And if you had an iPhone 3 and you wanted to send a video, well guess what, you had to buy a 3GS.