#hi guys any suggestions why my bot

1 messages ยท Page 1 of 1 (latest)

flint oasis
#

Can you tell me more?

midnight harness
#

so basically i was just trying to tell claw to fix telegram

#

because i was not able to write in telegram

flint oasis
#

and now he isnt responding at all?

#

if you can give me any Terminal logs i can help you track it down

midnight harness
#

well now it responded. i will write here if i see any problems again. i am just curious why it can be laggy if I am self hosting on rtx 5090 with 9950x3d

flint oasis
#

damn thats a hella of a setup, you should be good, you running local models?

midnight harness
#

i have a business that i want to integrate ai model

#

i do not understand what i am doing

#

at all

#

i installed qwen3.5:9b

#

i just fixed telegram (hopefully)

#

this thing is cooking my gpu haha

flint oasis
#

wierd, qwen3.5 9b should run smoothly, i run that as my local model myself, on an RX 580 Saphirre, 12 year old GPU lol

#

openclaw is harsh for beginners

midnight harness
#

well it just spikes at the moment of prompt

#

but runs well with microsoft flight simulator 2024 in the background

midnight harness
flint oasis
#

if you want to integrate a model for you business, i do not recommend local models, they are weak as hell, and on your setup you cant run capable local models, your best shot is a MiniMax subscription for a workhorse, and maybe GLM for more complex workflow + a VPS

midnight harness
#

so for what claw is great?

flint oasis
#

MiniMax subscription is 10$ with 1500 requests/5hrs, which is hugeee, GLM is 18$ per month, with maybe like 300 requests per 5hrs

#

If you got a shit ton of work to dispatch, MiniMax is the way to go

midnight harness
#

but what do you mean local models are weak?

flint oasis
#

Orrrr, if you got $$$$ you can go with claude

midnight harness
#

what does local models do

#

well money is not a problem

flint oasis
#

Parameters are the metric system of AI models training data

midnight harness
#

ok wow that is actually impressive

#

but what if i want to train my agent myself

#

teach him

flint oasis
#

That being said, Qwen 3.5 9B is "dumb" on professional work, but work good on simple automations

midnight harness
#

i guess it is not possible to download that deepseek model

flint oasis
flint oasis
midnight harness
#

well i can

#

how to download it

flint oasis
#

So you got the cash, i envy you haha, i also set Openclaw for my business but im kinda broke atm, trynna make some cash for upgrades, wait a sec to search for the deepseek link

#

V4 Flash is 238B parameters, V4 Pro is the one with 1.6T

midnight harness
#

is 4 5090 enough

flint oasis
#

The model is 800gb

midnight harness
#

i assume no

flint oasis
#

And as my Sonnet said

DeepSeek V4 Pro โ€” Hardware Requirements
GPUs:

Minimum viable: 8ร— H100 80GB (640GB VRAM total)
Comfortable: 8ร— H200 141GB (1.1TB VRAM total)
Consumer hardware (even 2ร— RTX 5090) is not enough

System RAM:

~1TB fast RAM for hybrid CPU/GPU offload (and even then, expect slow inference)

Why so much?
V4 Pro is 1.6T total parameters. At Q4 quantization you're still looking at ~800GB just for weights, before KV cache. It's a server cluster story, not a workstation story.
Realistic alternatives:

V4 Flash โ€” 284B params, fits on a single H200 node (~158GB), delivers ~85-95% of Pro quality
DeepSeek API โ€” $1.74 in / $3.48 out per 1M tokens, OpenAI-compatible, just swap base URL
Ollama cloud โ€” ollama run deepseek-v4-flash:cloud for quick testing

Bottom line: Unless you're processing 200M+ tokens/day, the API will always be cheaper than self-hosting V4 Pro. The hardware alone runs $200,000โ€“$330,000+.

#

Or for the V4 Flash

From what we already found:
DeepSeek V4 Flash โ€” Hardware Requirements
Minimum (tight, prototyping only):

2ร— RTX 4090 (48GB VRAM total) โ€” Q4 quantized, short contexts only, slow

Viable for internal/dev use:

4ร— RTX 4090 (96GB VRAM) โ€” Q8 possible, reasonable batch sizes, 4-8k context

Comfortable production:

1ร— H200 141GB โ€” fits the full ~158GB FP4+FP8 checkpoint on a single node

System RAM:

128-256GB DDR5 for smooth CPUโ†”GPU data movement

Why it's so much easier than Pro:
V4 Flash is 284B total params / 13B active per token. At FP4+FP8 mixed precision it lands at ~158GB โ€” that's single-node territory vs the cluster you need for Pro.
Rough hardware cost:

4ร— RTX 4090 setup: ~$8,000-10,000
Single H200: ~$35,000-40,000

Cheapest way to test it right now:
bashollama run deepseek-v4-flash:cloud
Or via API at $0.14 in / $0.28 out per 1M tokens โ€” still the most cost-effective option unless you're at serious token volume.

midnight harness
#

now i understand why gpu price is so big

flint oasis
#

Now that i look at these stats, this is kinda insane lol

#

fking 8 H100

#

Buuuut, you can run the Flash model

midnight harness
#

oh i can?

flint oasis
#

DeepSeek V4 Flash โ€” Hardware Requirements
Minimum (tight, prototyping only):

2ร— RTX 4090 (48GB VRAM total) โ€” Q4 quantized, short contexts only, slow

Viable for internal/dev use:

4ร— RTX 4090 (96GB VRAM) โ€” Q8 possible, reasonable batch sizes, 4-8k context

#

I mean yeah, on 4 RTX 5090 the Q8 should work

midnight harness
#

what type of business do you expect me to have

flint oasis
#

Q stands for quantization, and here is some more knowledge you should know for running local models

Think of it like audio compression.
A WAV file is uncompressed โ€” every sound sample stored at full precision, massive file size. An MP3 takes that same audio and throws away data your ears can't easily detect, shrinking the file by 10ร— with barely noticeable quality loss.
Quantization does the same thing to AI model weights. Instead of storing every parameter as a 16-bit or 32-bit float (full precision), you round them down to lower precision โ€” 8-bit, 4-bit, even 2-bit integers. The model gets smaller and faster to run, at the cost of some accuracy.
The common formats you'll see:

FP16 / BF16 โ€” half precision, standard baseline
Q8 โ€” 8-bit, barely any quality loss, ~2ร— smaller than FP16
Q4_K_M โ€” 4-bit, the sweet spot most people use locally, ~4ร— smaller, small but noticeable quality drop
Q2 โ€” aggressive compression, fits on weak hardware, meaningful quality degradation

midnight harness
#

i need simple email services

flint oasis
midnight harness
#

to scrape potential clients

#

is that the model i have is not enough

flint oasis
#

I'm in marketing for example

#

For email services as in the AI sending decent written emails to your potential clients?

#

For that you would need a good model for copyrighting, GLM can work and its cheaper than local models

#

and for scraping, maybe try Qwen 3.6 35B

#

That model is insanely good for OpenClaw, or even Gemma 4 31B

midnight harness
#

i feel smart after talking to you

flint oasis
#

Qwen 3.6 and Gemma 4 are both local models

flint oasis
midnight harness
#

ok so can i just install qwen 3.6 and call it a day

flint oasis
#

I'm passionate about AI, a nerd people would say lol

midnight harness
#

it took me 4 hours to set up claw
i think 3.6 will be more than enough

flint oasis
midnight harness
#

oh so

flint oasis
#

should be good for your use case

midnight harness
#

wait

#

hahaha

#

there is just a qwen 3.6

#

and some people train it

#

to the direction they need?

flint oasis
#

Basically yeah, you can do research on hugging face

#

on that site people are basically uploading their trained models

midnight harness
#

that is enough information for me

flint oasis
#

some are very good, some are very bad