#hi guys any suggestions why my bot
1 messages ยท Page 1 of 1 (latest)
so basically i was just trying to tell claw to fix telegram
because i was not able to write in telegram
and now he isnt responding at all?
if you can give me any Terminal logs i can help you track it down
well now it responded. i will write here if i see any problems again. i am just curious why it can be laggy if I am self hosting on rtx 5090 with 9950x3d
damn thats a hella of a setup, you should be good, you running local models?
i have a business that i want to integrate ai model
i do not understand what i am doing
at all
i installed qwen3.5:9b
i just fixed telegram (hopefully)
this thing is cooking my gpu haha
wierd, qwen3.5 9b should run smoothly, i run that as my local model myself, on an RX 580 Saphirre, 12 year old GPU lol
openclaw is harsh for beginners
well it just spikes at the moment of prompt
but runs well with microsoft flight simulator 2024 in the background
i just need time to figure how this thing work
if you want to integrate a model for you business, i do not recommend local models, they are weak as hell, and on your setup you cant run capable local models, your best shot is a MiniMax subscription for a workhorse, and maybe GLM for more complex workflow + a VPS
so for what claw is great?
MiniMax subscription is 10$ with 1500 requests/5hrs, which is hugeee, GLM is 18$ per month, with maybe like 300 requests per 5hrs
If you got a shit ton of work to dispatch, MiniMax is the way to go
but what do you mean local models are weak?
Orrrr, if you got $$$$ you can go with claude
Let's take your Qwen 3.5 9B for example, that 9B mean 9 billion parameters, for comparison, the new Deepseek Model is 1.6 Trillion
Parameters are the metric system of AI models training data
ok wow that is actually impressive
but what if i want to train my agent myself
teach him
That being said, Qwen 3.5 9B is "dumb" on professional work, but work good on simple automations
i guess it is not possible to download that deepseek model
You take your Qwen3.5 9B and upload it to Unsloth, idk how to guide you further tho, i didnt train any model yet
If you got like 300GB RAM and like 2 Nvidia H200 maybe yes
So you got the cash, i envy you haha, i also set Openclaw for my business but im kinda broke atm, trynna make some cash for upgrades, wait a sec to search for the deepseek link
V4 Flash is 238B parameters, V4 Pro is the one with 1.6T
is 4 5090 enough
The model is 800gb
i assume no
And as my Sonnet said
DeepSeek V4 Pro โ Hardware Requirements
GPUs:
Minimum viable: 8ร H100 80GB (640GB VRAM total)
Comfortable: 8ร H200 141GB (1.1TB VRAM total)
Consumer hardware (even 2ร RTX 5090) is not enough
System RAM:
~1TB fast RAM for hybrid CPU/GPU offload (and even then, expect slow inference)
Why so much?
V4 Pro is 1.6T total parameters. At Q4 quantization you're still looking at ~800GB just for weights, before KV cache. It's a server cluster story, not a workstation story.
Realistic alternatives:
V4 Flash โ 284B params, fits on a single H200 node (~158GB), delivers ~85-95% of Pro quality
DeepSeek API โ $1.74 in / $3.48 out per 1M tokens, OpenAI-compatible, just swap base URL
Ollama cloud โ ollama run deepseek-v4-flash:cloud for quick testing
Bottom line: Unless you're processing 200M+ tokens/day, the API will always be cheaper than self-hosting V4 Pro. The hardware alone runs $200,000โ$330,000+.
Or for the V4 Flash
From what we already found:
DeepSeek V4 Flash โ Hardware Requirements
Minimum (tight, prototyping only):
2ร RTX 4090 (48GB VRAM total) โ Q4 quantized, short contexts only, slow
Viable for internal/dev use:
4ร RTX 4090 (96GB VRAM) โ Q8 possible, reasonable batch sizes, 4-8k context
Comfortable production:
1ร H200 141GB โ fits the full ~158GB FP4+FP8 checkpoint on a single node
System RAM:
128-256GB DDR5 for smooth CPUโGPU data movement
Why it's so much easier than Pro:
V4 Flash is 284B total params / 13B active per token. At FP4+FP8 mixed precision it lands at ~158GB โ that's single-node territory vs the cluster you need for Pro.
Rough hardware cost:
4ร RTX 4090 setup: ~$8,000-10,000
Single H200: ~$35,000-40,000
Cheapest way to test it right now:
bashollama run deepseek-v4-flash:cloud
Or via API at $0.14 in / $0.28 out per 1M tokens โ still the most cost-effective option unless you're at serious token volume.
now i understand why gpu price is so big
Now that i look at these stats, this is kinda insane lol
fking 8 H100
Buuuut, you can run the Flash model
oh i can?
DeepSeek V4 Flash โ Hardware Requirements
Minimum (tight, prototyping only):
2ร RTX 4090 (48GB VRAM total) โ Q4 quantized, short contexts only, slow
Viable for internal/dev use:
4ร RTX 4090 (96GB VRAM) โ Q8 possible, reasonable batch sizes, 4-8k context
I mean yeah, on 4 RTX 5090 the Q8 should work
what type of business do you expect me to have
Q stands for quantization, and here is some more knowledge you should know for running local models
Think of it like audio compression.
A WAV file is uncompressed โ every sound sample stored at full precision, massive file size. An MP3 takes that same audio and throws away data your ears can't easily detect, shrinking the file by 10ร with barely noticeable quality loss.
Quantization does the same thing to AI model weights. Instead of storing every parameter as a 16-bit or 32-bit float (full precision), you round them down to lower precision โ 8-bit, 4-bit, even 2-bit integers. The model gets smaller and faster to run, at the cost of some accuracy.
The common formats you'll see:
FP16 / BF16 โ half precision, standard baseline
Q8 โ 8-bit, barely any quality loss, ~2ร smaller than FP16
Q4_K_M โ 4-bit, the sweet spot most people use locally, ~4ร smaller, small but noticeable quality drop
Q2 โ aggressive compression, fits on weak hardware, meaningful quality degradation
i need simple email services
idk tbh
I'm in marketing for example
For email services as in the AI sending decent written emails to your potential clients?
For that you would need a good model for copyrighting, GLM can work and its cheaper than local models
and for scraping, maybe try Qwen 3.6 35B
That model is insanely good for OpenClaw, or even Gemma 4 31B
i feel smart after talking to you
Qwen 3.6 and Gemma 4 are both local models
hahaha
ok so can i just install qwen 3.6 and call it a day
I'm passionate about AI, a nerd people would say lol
it took me 4 hours to set up claw
i think 3.6 will be more than enough
basically yeah, let me search a good trained model for ya
oh so
should be good for your use case
wait
hahaha
there is just a qwen 3.6
and some people train it
to the direction they need?
Basically yeah, you can do research on hugging face
on that site people are basically uploading their trained models
that is enough information for me
some are very good, some are very bad