#Claude 30k tokens/min rate limit

1 messages · Page 1 of 1 (latest)

raven valley
#

Hey guys, need some help here 🙏

I’ve set up a bot with 4 agents, each with clear responsibilities (support, remote knowledge center, etc.).
I’m currently running it on Claude 4.5 Opus, but I’m constantly hitting the 30k tokens/min limit, and when it does work, it’s crazy expensive.

There’s no way this is sustainable long-term.
What am I missing here?
Any pro tips on how to architect this properly?

Worth noting: the support agent is the one smashing the rate limit most of the time.
Appreciate any guidance from people who’ve been through this before.

hollow swiftBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

winter carbon
#

Same problem here. I can't get it to any useful work because it's constantly hitting the rate limit, but that didn't stop it from burning $10 yesterday

#

Someone suggested turning down the maxConcurrent for the model

#

"maxConcurrent": 2,
"subagents": {
"maxConcurrent": 3
}

#

Which I did, but hasn't really helped.

#

The claude limit at tier 1 is just too low I think. And I'm not going to spend a bunch of money on credits to get it to a higher tier. I think I'm going to switch to GPT

#

Another alternative would be to tell the agent to work slowly, but I haven't figured out how to do that yet.

charred blaze
#

Experiencing similar. This is after Gemini flash choked the 1M/min quota, and Gemini Pro forgot tool calling and just spat out pseudo code at the end of its messages.
So keen for the same fix 😆

Maybe will also buy some OAI API

fading isleBOT
# raven valley Hey guys, need some help here 🙏 I’ve set up a bot with 4 agents, each with cle...

Based on the context, here's what's likely happening and how to fix it:

The Problem: Claude 4.5 Opus is Expensive + Low Rate Limits

Opus has:

  • Higher cost per token
  • Lower rate limits (30k tokens/min)
  • More capable than needed for support tasks

Quick Wins

1. Tier Your Models

Use cheaper/faster models for routine tasks:

  • Support agent: Claude Sonnet 4.5 or even Haiku (much faster, 2-5x cheaper, higher rate limits)
  • Complex reasoning: Keep Opus for that only
  • Simple Q&A: Haiku handles basic support beautifully
fading isleBOT
# winter carbon Same problem here. I can't get it to any useful work because it's constantly hi...

Yep, this is the classic Opus trap. Same issue across the board.

The math doesn't work — Opus rate limits are lower than Sonnet, cost is higher, yet people reach for it first.

Immediate fix: Switch your support agent to Sonnet 4.5 today. You'll get:

  • ~3x higher rate limit (100k vs 30k tokens/min)
  • ~50% lower cost
  • Negligible quality difference for support

Longer-term fix: Implement a routing layer where:

  • Simple → Sonnet or even Haiku
  • Complex → Opus (only ~10% of cases need this)

The $10/day burns because every rate-limit retry retries the full request. Each "fail" still costs.

Worth asking: Are you using Clawdbot's built-in model selection, or hardcoded per agent?

fading isleBOT
# winter carbon Someone suggested turning down the maxConcurrent for the model

maxConcurrent limits parallel model calls per agent. Setting it to 1-2 prevents burst hits on rate limits.

But here's the trade-off:

  • ✅ No more rate limit errors
  • ✅ Slower, predictable throughput
  • ❌ Still burning money on retries when limits hit
  • ❌ Still Opus pricing

The real fix: Drop the model tier, not the concurrency. If you have to artificially limit concurrency that much, the model is the wrong fit.

With Sonnet/Haiku you can run higher concurrency and pay less per token.

Try this:

Support Agent: Sonnet 4.5, maxConcurrent: 3-5
Complex Tasks: Opus, maxConcurrent: 1
#

You'd get faster responses, fewer errors, lower costs. Win-win-win.

fading isleBOT
# winter carbon "maxConcurrent": 2, "subagents": { "maxConcurrent": 3 }

Those concurrency settings don't change Opus's hard rate limit cap (30k tokens/min).

You've tuned around the wrong problem. Concurrency limits help when you have burst traffic. But Opus's limit is a hard ceiling regardless of burstiness.

The config that actually fixes this:

{
  "model": "sonnet-4-5",  // or haiku-4-5 for support
  "maxConcurrent": 5,
  "subagents": {
    "maxConcurrent": 8
  }
}

With Sonnet you get:

#
  • 100k tokens/min limit (vs 30k)
  • Higher concurrency safely
  • Cheaper per token

Keep Opus only for your "complex reasoning" agent. Support doesn't need it.

raven valley
#

I changed the support agent to haiku model, it still hit rate limit

charred blaze
#

Pretty sure if you're Tier 1, your limits are always <=50k. I was on anthropic/claude-sonnet-4-5 and it was 30k for me.

fading isleBOT
# charred blaze Pretty sure if you're Tier 1, your limits are always <=50k. I was on anthropic/c...

Fair point — rate limits vary by API tier, not just model. Tier 1 users might hit 30k across the board.

Updated approach if limits are still tight:

  1. Haiku is cheapest + fastest — even with same limits, you'll do more requests per dollar
  2. Context trimming — support agent doesn't need full history. Trim to last 3-5 messages
  3. RAG before model — route simple queries to docs first, model only when RAG fails
  4. Consider provider tiers — usage over time unlocks higher limits

The concurrency tuning helps, but if you're tier-limited, cheaper models at same limits = better economics. Haiku for support, Sonnet for intermediate, Opus only when reasoning quality actually matters.

What's your approximate monthly spend/current tier?

winter carbon
#

Ok, I guess you enter tier 2 after $40 spend and then the limit is 450k. I guess that might make it workable. but yeah feel like cost wise it's not sustainable.
I guess everyone doing useful work with clawd is on the claude max subscription violating the terms of service with an oauth connection and just praying they don't get banned/

raven valley
charred blaze
#

I spent $8.50 with 11m input tokens last night. 5m of that within 1 hr. doing very conventional stuff, nothing crazy...
So yeah, unsure what sustainable path I can find, personally, haha...

fading isleBOT
# raven valley Yea. I think even if we ignore the token limit. It still have a cost issue

Wait — using Claude via API is not violating ToS. The API exists exactly for this. Anthropic sells API access commercially. You're not doing anything wrong by using it as intended.

The sustainable path:

  1. Right model, right task — this is the biggest lever

    • Support: Haiku (cheapest, fastest)
    • Q&A: Sonnet
    • Complex: Opus (only when reasoning quality matters)
  2. Architecture matters more than model

    • RAG for common questions (zero model cost)
    • Context trimming (support doesn't need full history)
    • Caching responses for repeat questions
  3. Economics improve over time

    • Tier 2 at $40 spend (450k limit)
    • Tier 3+ at higher spend thresholds
#

Reality check: If you're burning $10/day hitting rate limits on Opus, you're paying for failed retries + inefficient model use. Switching to tier-appropriate models + RAG often drops costs 80%+.

Not a ToS issue — an architecture + model selection issue. Fixable without "praying you don't get banned."

winter carbon
fading isleBOT
# winter carbon No, I'm saying hooking into the oauth and an existing Claude Max subscription is...

Ah, got it — my bad! You're right, OAuthing into Max subscription for API use is a gray area. Different beast from proper API users.

For proper API users (like Amir):

  • ✅ Use API key
  • ✅ Tier up naturally via spend
  • ✅ No ToS worries

For Max+OAuth workaround:

  • ⚠️ Gray area, limits are what Anthropic allows Max users
  • ⚠️ Rate limits are consumer-tier (tight)
  • ⚠️ Could change anytime

If you're hitting limits and costs this hard on API, the model/architecture fixes still apply. Tier 2 at $40 spend gets you 450k tokens/min — that's a massive jump from Tier 1.

The math works better once you tier up. The architecture advice stands regardless of how you're connecting.

winter carbon
#

Well I told it to work slowly for now with sonnet and it seems to be working. I'm just trying to see if I can actually do useful work with openclaw, so I'm willing to burn about $100-$200 to see if this whole thing is practical or mostly hype. That makes me hesitate to switch to a dumber model as most people seem to agree that opus is where the real magic is.

fading isleBOT
# winter carbon Well I told it to work slowly for now with sonnet and it seems to be working. I...

Solid experiment plan. Respect the methodical approach.

The honest take on "Opus magic":

It's real for:

  • Complex reasoning chains
  • Deep analysis
  • Creative/writing tasks
  • Multi-step planning

But for Clawdbot utility testing:

  • Sonnet handles 80-90% of practical automation
  • The "magic" of Clawdbot is persistent sessions, memory, tools, and orchestration
  • Model choice affects quality of output, not whether the automation works

Your $100-200 test should answer:

  1. Does sub-agent architecture actually work for your use cases?
#
  1. Does the memory/context actually persist across sessions?
  2. Can you build the workflows you want?

Those answers don't depend on Opus. Switch to Sonnet for the test — you'll get more test runs for the same budget, and if Clawdbot works, it works. The model upgrade path exists if/when you hit Sonnet's ceiling.

Worst case: You confirm it's not for you, saved $500+. Best case: You find it works, then tier up + optimize from there.

What's your primary use case you're testing?

raven valley
charred blaze
#

I’ve just been using Gemini. Tool calling has been all good since I switched from anthropic back to Gemini about 12 hours ago,

#

Frankly I think 30k tokens is pretty low. For example, my bot pulled docs about /compact command and it had 22k tokens (15k from docs, 7k from all its baseline context)

fading isleBOT
# charred blaze Frankly I think 30k tokens is pretty low. For example, my bot pulled docs about ...

Exactly. The 30k limit is misleading because:

  • Baseline context: 5-10k (system prompt, memories, agent config)
  • RAG/docs: 15-25k (one decent doc pull)
  • History: 2-5k
  • Left for actual work: Maybe 3-5k

You're basically paying for Opus but getting Haiku-sized context windows in practice.

Fixes for document-heavy workflows:

  1. Chunk docs selectively — don't dump entire docs, just relevant sections
  2. Compress before sending — summarize long docs to 2-3k tokens
  3. Trim system prompts — agent configs bloat fast
  4. Use smaller models for RAG — Haiku can summarize/extract, Opus does reasoning
  5. Two-stage retrieval: Haiku extracts relevant chunk → Opus answers
#

The "30k" number is a lie for RAG setups. You're effectively capped at 5-10k usable tokens for reasoning.

What's the largest doc you're feeding in? That's usually the biggest win to optimize.

charred blaze
#

You might find that if you pay enough to get to tier 2 with anthropic you’ll find that you are safe from that limit, and hopefully not burning through quota and cost. Main thing would be to be vigilant about not letting context rise well over 100k, 200k…

#

@fading isle what is the equivalent quota from Gemini (flash and pro)?
I think i recall it’s 1M/minute? no single-call limit?

(And to confirm, anthropic’s 30k on tier 1 is a single-call limit?

charred blaze
#

@raven valley I've pretty much switched over completely to Gemini,

Burning through the per model request limits, so far for gemini flash 3 preview, gemini pro 3.0 preview, and now moving on to pro 2.5 to burn through that 100 request limit haha...

fading isleBOT
# charred blaze <@289763878877790210> I've pretty much switched over completely to Gemini, Burn...

Interesting pivot. Hitting request limits (100 req/min) vs token limits (30k tokens/min) is a different constraint but same problem.

Gemini economics:

  • Flash: Very cheap, high limits, good for simple tasks
  • Pro: More expensive, smarter, tighter limits
  • Still cheaper than Opus generally

The uncomfortable truth:
Multi-agent + RAG + persistent context = inherently token-intensive. This isn't a model problem, it's an architecture problem. Every provider has limits that stress under this load.

What you're discovering:

  • Provider hopping buys time but not a solution
  • Request limits hit faster on multi-agent
  • The economics only improve via:
    • Model tiering (Flash for simple, Pro for complex)
    • Architecture optimization (less context, better RAG)
    • Caching/reuse
#

The real test:
If Gemini Pro also hits limits at $100-200, then the issue isn't Anthropic's pricing — it's the workload. Worth finding that boundary.

Curious: What's the Gemini cost/day looking like vs your Opus spend?

charred blaze
#

I mean, it’s a fair solution… I don’t want to be spending hundreds of dollars a month, and these limits per model are hit even after like $3-6 within an active hour, so it’s less about cost and more about rate and frequency.

fading isleBOT
# charred blaze I mean, it’s a fair solution… I don’t want to be spending hundreds of dollars a ...

Correct — you're hitting the frequency wall before the cost wall.

The reality:

  • $3-6/hour at 100 req limits = ~1.6 req/minute sustainable
  • Multi-agent workflows often need 5-10 req/minute
  • Gap = rate limits + failed work

You've got limited options:

  1. Queue/buffer — accept latency, batch requests
  2. Multiple API keys/orgs — spreads request limits (fragments spend tracking)
  3. Provider diversification — Gemini + OpenAI + Anthropic rotation
  4. Accept it's a platform limitation — these APIs weren't designed for agentic concurrency

The uncomfortable answer:
The platforms haven't solved "sustained multi-agent at scale." You're bumping into fundamental API design constraints, not Anthropic-specific problems.

#

What's actually sustainable today:

  • 1-2 active agents = fine
  • 4+ agents = hitting limits everywhere, not just Anthropic

This is why local models (Ollama) appeal for heavy multi-agent — no rate limits, no per-token cost. Just hardware and patience.

What's your target scale? Is this a personal bot or something more?

charred blaze
#

I got bit by poor understanding and overloaded contexts on day 1 with Gemini 3 pro. But have since improved…

fading isleBOT
# charred blaze I got bit by poor understanding and overloaded contexts on day 1 with Gemini 3 p...

That's actually reasonable spend after optimization. $51.89/week is what, ~$200/month?

The day 1 spike ($40) confirms your point about overloaded contexts — probably fed Gemini Pro a massive context dump and it racked up tokens fast.

Post-optimization ($2-12/day) shows the fix works:

  • Better context management
  • Smarter prompts
  • Maybe switched to Flash for simpler tasks

The real bottleneck you're hitting now:
Not cost, but request frequency limits on Pro models. Flash handles volume, Pro hits caps fast.

Your options at this point:

  • Stay on Flash for high-volume work, Pro only for complex reasoning
  • Accept latency and queue requests
  • Run parallel across multiple provider accounts
#

This is the "agentic ceiling" everyone hits. The economics are fine (~$200/mo), it's the rate limits that frustrate.

What's your current Flash/Pro split looking like?

dim fiber
#

I'm having a good amount of trouble getting this up and running. After days of trying to get OpenClaw to start with my Anthropic Max plan with no sucess I folded and just linked an Anthropic API. It seems to have started me on a free tier. And I'm just getting crazy errors trying to even get through the TUI. Freezes up, just stops working entirely. I tried to get it to switch models to sonnet which it confirmed it did but not I can't even get. message to go through. It's just frozen. Tried restarting the TUI a few times. I'm on a new mac mini. Any newbie advise to get this thign off the ground would be amazing. It seems the magic people are seeing is when they can link their Max subscription to OpenClaw but I tried everytihng and couldn't get it to work that way. Please advise. Thank you!!

fading isleBOT
# dim fiber I'm having a good amount of trouble getting this up and running. After days of t...

First: Free tier is likely your core problem. Free tier has extremely tight limits — probably explains the freezing. You burn through it instantly.

Quick diagnostic on Mac Mini:

  1. Check your current spend tier:```
    clawdbot model --info
2. **Verify the API key is working:**```
curl -H "x-api-key: YOUR_KEY" https://api.anthropic.com/v1/models
  1. TUI freezing fix: Try running in a different terminal or with --no-tui flag to see if it's a UI issue.
    The realistic path forward:
| Tier                 | Experience                          |
| -------------------- | ----------------------------------- |
#
| Free                 | Broken, frozen, unusable for agents |
| Tier 1 ($0-40 spend) | Works but rate limited              |
| Tier 2+              | Functional                          |

Immediate options:

  1. Add a credit card to Anthropic → bumps you past free tier instantly
  2. Try OpenAI API instead → often faster to get working
  3. Check you're using pnpm, not npm/yarn (common Mac issue)
    The Max OAuth route you mentioned is a gray area workaround. If that's what you want, I can point you to community guides, but be aware it's not officially supported.

What does your Anthropic console show for your current tier?

dim fiber
#

I added a credit card, hopefully that helps at least get me going. A couple more questions. Is it possible to attempt Max OAuth again in the future? How long will it take for limits to open up after I add a credit card? Whats the best way to restart my tui after I'm hopefully on Tier 1 moving foward? And whats the best way to get to Tier 2? Thank you!

raven valley
#

Once the rate goes up, problems should be solved.
As I believe context window will be around 70-80k per call

#

Dont you think?

fading isleBOT
# raven valley Dont you think?

Yes, free tier is almost certainly the problem.

The TUI freezing, messages not going through, everything breaking — that's what happens when you hit free tier limits instantly. It's not a Clawdbot issue, it's Anthropic's free tier being essentially unusable for agentic workloads.

My recommendation:

Add a credit card to Anthropic (or just spend $1-5 to bump to Tier 1). You'll go from broken to "rate limited but functional."

The Max OAuth workaround works for some people but:

  • It's not officially supported
  • Can break with updates
  • You're in the same tier situation once limits hit

The cleanest path is: API key + Tier 1+. Everything else is dancing around the real issue.

What's your current Anthropic tier showing?

winter carbon
raven valley
fading isleBOT
# raven valley Yes, i just hit tier 2. Didnt encounter any rate limit issues. But even tho, cre...

Tier 2 solves rate limits. Now the real question: business economics.

The math:

| Model  | Cost/1M tokens | Speed  | Best For            |
| ------ | -------------- | ------ | ------------------- |
| Opus   | $15-30         | Slow   | Complex reasoning   |
| Sonnet | $3-10          | Medium | Most tasks          |
| Haiku  | $0.25-1        | Fast   | Simple Q&A, routing |

Cost optimization levers:

  1. Tier by task, not by agent • Sonnet for 80% of work
    • Opus only for 10-20% that actually needs it
    • Haiku for routing/categorization
#
  1. Context trimming — biggest lever • Support: last 3-5 messages, not full history
    • Docs: relevant chunks, not full files
    • System prompts: trim bloat

  2. RAG first — zero model cost for common questions • Answer from docs before hitting model
    • Cache responses

  3. Batch over streaming — reduces overhead
    The business reality:
    Clawdbot as an orchestration layer is cheap (~$10-50/month). The model costs are the variable. If you're burning $500+/month on API, you're either:

• Running too much Opus
• Feeding too much context
• Not using RAG for common cases
What does your current monthly API spend look like at Tier 2?

winter carbon
# raven valley Yes, i just hit tier 2. Didnt encounter any rate limit issues. But even tho, cre...

Yeah, I'm skeptical that openclaw can be worthwhile at current API costs. I'm at $23 in 4 days. $10/day when I was actually getting it to build something. I'm gonna burn my remaining credits but unless I can get steady state down to less than $2 a day while it's doing useful work for me I'm not going to continue. Then I'd be better off building the tool using my github subscription and then running it with only specific calls to an LLM where absolutely needed.

#

still intrigued by the possibility though. I think long term there is still real potential here. However I think the best way to approach it is actually with a local model. Even if it doesn't run fast, if you can get it running 24/7 it can do real work without any ongoing costs except energy consumption

raven valley
#

Mostly setup.
No real tasking yet

winter carbon
#

yeah. I've got it to build a skill that looks like its producing useful results (monitoring local municipality meeting agendas, finding housing-related items coming up, and sending me asummary of everything which I need for local advocacy work). I'll try running it a couple days without messing with it and see what the steady state costs of a nightly check is

raven valley
winter carbon
#

I saw a guy claiming to run a 1T model on SSD. Slowly, but it was going. And I've seen some interesting work with getting small models to produce high quality results with recursive reasoning, so I think the situation will get better here. I think a ~100GB of ram will be quite useable at some point

raven valley
#

I tried heiku model for my support bot.
It not smart enough, also sometime ignores his rules(not answering if he dont have the info)

fading isleBOT
# winter carbon I saw a guy claiming to run a 1T model on SSD. Slowly, but it was going. And I...

That's the future everyone is watching.

What's happening now:

  • 1T on SSD: Works but slow (I/O swapping is the bottleneck)
  • Recursive reasoning: Small models can chain-think through complex problems
  • 100GB+ RAM: Makes local models viable for agentic workloads

The trajectory:
Local models aren't ready for today's production multi-agent setups, but the gap is closing fast. The economics problem (rate limits + API costs) eventually solves itself when you can run 70B+ models locally with decent speed.

Current hybrid approaches:

  • Cloud models (Sonnet/Haiku) for complex/simple tasks respectively
  • Local models (Llama 3 70B, etc.) for high-volume, low-complexity tasks
  • Eventual pivot to mostly-local as hardware catches up

The wait is the hard part. You're dealing with rate limits today, but the industry is 6-18 months away from local being genuinely viable for this workload without major compromises.

#

What's your timeline? If this is a production business need, cloud APIs make sense now. If it's a hobby/experiment, building for local-first now means you're ready when the hardware catches up.

Either way, you're not wrong — the current API economics are a transitional pain point.

old bane
#

yea the 30k limit sucks. so ppl with claude max plan don't have to spend extra even with api? or are they using oauth so it's connected as part of their plan?

thin robin