#Why are the concurrency limits on the API so low? How is it that I can only make 15 concurrent requests?


feral prism BOT
AI-generated Answer for kes0953

Short answer: ElevenLabs limits the number of concurrent in-flight requests per account, not requests per minute. The exact concurrency you get depends on your subscription tier and the TTS model family you call. Requests above your limit are queued (typically adding ~50 ms of latency) rather than hard-rejected, and you can monitor your usage against the limit via the response headers maximum-concurrent-requests and current-concurrent-requests. (elevenlabs.io)
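As a rough illustration of monitoring those two headers, here is a minimal Python sketch. The header names come from the answer above; the function name concurrency_headroom is hypothetical, and the sketch assumes the header values are plain integers, which the docs cited here do not spell out.

```python
from typing import Mapping, Optional, Tuple


def concurrency_headroom(headers: Mapping[str, str]) -> Optional[Tuple[int, int, int]]:
    """Return (current, maximum, remaining), or None if the headers are missing.

    Hypothetical helper: assumes both header values are plain integers.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        current = int(lowered["current-concurrent-requests"])
        maximum = int(lowered["maximum-concurrent-requests"])
    except (KeyError, ValueError):
        # Headers absent or not numeric: report "unknown" rather than guess.
        return None
    return current, maximum, max(maximum - current, 0)
```

You would pass in the headers of any API response (for example the headers mapping exposed by your HTTP client) and log or alert when the remaining value approaches zero.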

What your “15 concurrent” cap likely means

  • If you’re using the Multilingual v2 TTS model, Scale/Business tops out at 15 concurrent requests and Pro at 10. By contrast, the Turbo and Flash TTS models have higher limits: 20 on Pro and 30 on Scale/Business. Switching to Turbo/Flash therefore doubles your concurrency limit on the same plan. (elevenlabs.io)
  • The Help Center summarizes the same policy: “Flash and Turbo” have higher concurrency than “all other models,” with Scale/Business shown as 30 vs 15, respectively. (help.elevenlabs.io)

Why this doesn’t equal throughput limits

  • ElevenLabs uses a queue once you hit your plan’s concurrency; in practice this “typically only adds ~50ms of latency,” so you can still sustain high request volumes even with small concurrency numbers. As a rule of thumb from the docs, a concurrency limit of 5 can often support ~100 simultaneous audio broadcasts, depending on speaking patterns. (elevenlabs.io)
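The rule of thumb above can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the 5% "generating fraction" is an assumed figure chosen to reproduce the 5-to-100 ratio, not a number from the docs, and expected_in_flight is a hypothetical helper.

```python
def expected_in_flight(sessions: int, generating_fraction: float) -> float:
    """Average number of requests in flight at any instant.

    generating_fraction is the share of wall-clock time a session spends
    with a TTS request actually being generated (an assumed input).
    """
    if not 0.0 <= generating_fraction <= 1.0:
        raise ValueError("generating_fraction must be between 0 and 1")
    return sessions * generating_fraction


# Assumption: each broadcast has a request in flight about 5% of the time.
average_load = expected_in_flight(sessions=100, generating_fraction=0.05)
```

Under that assumption, 100 broadcasts average about 5 requests in flight, which is why a small concurrency limit can serve many more sessions. Real traffic is bursty, so momentary peaks above the average are what the queue absorbs.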

How to raise effective concurrency today

  • Use Turbo or Flash TTS instead of Multilingual v2 to access the higher per-plan limits. (elevenlabs.io)
  • Prefer WebSocket TTS: with WebSockets, only the time during which audio is being generated counts toward concurrency; the rest of the connection time does not. (elevenlabs.io)
  • Upgrade your plan or contact Enterprise for elevated limits beyond the published caps. (help.elevenlabs.io)
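Whichever option you pick, you can also keep your own client from exceeding the plan limit in the first place. A minimal sketch using asyncio.Semaphore follows; run_bounded is a hypothetical helper, each job stands in for whatever coroutine performs one TTS request, and the limit value (15, 30, and so on) is something you supply from your plan.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: Sequence[Callable[[], Awaitable[T]]], limit: int) -> List[T]:
    """Run all jobs, never letting more than `limit` be in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await job()

    # gather preserves the order of the input jobs in its results.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```

For example, asyncio.run(run_bounded(jobs, limit=15)) would process any number of jobs while holding at most 15 open at a time, so excess work waits on your side rather than on the server.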

What error you’ll see if you exceed limits

  • The API returns HTTP 429 with message too_many_concurrent_requests when you surpass your plan’s concurrency; system_busy indicates temporary platform load. (help.elevenlabs.io)
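If your client does receive a 429, a common way to handle it is to retry with exponential backoff and jitter. The sketch below is generic and not taken from the ElevenLabs docs: call_with_backoff is a hypothetical helper, send stands in for your own function that performs one request and returns (status_code, body), and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Any, Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so retries do not synchronise.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    # Still rate-limited after the final attempt: hand the 429 to the caller.
    return status, body
```

The sleep parameter is injectable only so the behaviour can be exercised without real waiting; in normal use you would leave it at the default.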

If you want, share which TTS model and plan you’re on, and I can point to the exact limit and the least-effort way to increase it per the docs. (elevenlabs.io, help.elevenlabs.io)

dusky quartz

What? So will I get 429 errors or 50 ms of latency? Your answer doesn't make sense.