#what are the benchmarks surrounding scribe v2 realtime? Is the sub 150ms the upperbound and how are
Here’s what ElevenLabs has published publicly about Scribe v2 Realtime:
Latency/“under 150 ms”
- Positioning: “live transcription in under 150 ms” and “~150 ms latency” are the official claims for Scribe v2 Realtime. No percentile, SLA, or distribution is stated. (elevenlabs.io)
Benchmarks mentioned
- Comparative accuracy: The Scribe v2 Realtime launch notes say it “significantly outperforms all other models” on a set of 500 difficult live samples (noise + complex content). No numeric WER table is provided there. The same post also notes “93.5% accuracy across 30 commonly used European and Asian languages,” without a corresponding WER breakdown. (elevenlabs.io)
- Marketing pages reiterate that Scribe Realtime “beats” competing models and emphasize ~150 ms streaming latency; again, no per-dataset WER table is shown on those pages. (elevenlabs.io)
How WER is defined in the docs
- The documentation defines Word Error Rate (WER) generally as “how many errors are present in a transcript compared to a reference transcript,” and gives language-level WER bands for Scribe v1/v2 (batch STT). It does not describe the exact scoring recipe (e.g., case/punctuation normalization, tokenization, datasets) for Scribe v2 Realtime. (elevenlabs.io)
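For orientation, WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. The sketch below shows that generic calculation. The `normalize` step (lowercasing, stripping punctuation) is an assumption for illustration only, not ElevenLabs' published recipe, and the function names are hypothetical.

```python
import re


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, drop punctuation, split on whitespace."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = word-level edit distance / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Changing the normalization (for example, keeping case or punctuation) changes the score, which is why the missing scoring recipe matters when comparing published numbers.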
What’s not published
- As of today (February 10, 2026), ElevenLabs’ public blog/docs/help center do not specify whether “sub‑150 ms” is an upper bound versus an average/percentile for Scribe v2 Realtime, nor do they publish a formal WER evaluation protocol specific to the realtime model (datasets, preprocessing, or scoring configuration). If you need those specifics, you’ll need to request them from ElevenLabs directly.
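To see why the average-versus-upper-bound distinction matters, here is a minimal sketch that summarizes a set of latency measurements. The sample values are invented for illustration and are not ElevenLabs measurements; `percentile` and `summarize` are hypothetical helper names, and nearest-rank is just one of several percentile conventions.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], threshold_ms: float = 150.0) -> dict[str, float]:
    """Summarize latencies and report the share strictly under the threshold."""
    under = sum(1 for x in latencies_ms if x < threshold_ms)
    return {
        "mean": mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "max": max(latencies_ms),
        "share_under_threshold": under / len(latencies_ms),
    }


if __name__ == "__main__":
    # Hypothetical measurements in milliseconds (not real Scribe data).
    samples = [120, 130, 140, 145, 150, 160, 180, 200, 135, 125]
    for name, value in summarize(samples).items():
        print(f"{name}: {value}")
```

With these made-up samples the mean is below 150 ms while the 95th percentile and the maximum are well above it, so a "~150 ms" claim with no stated percentile is compatible with very different tail behavior.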
Sources
- Introducing Scribe v2 Realtime (blog): live “under 150 ms,” 500 hard-sample comparison, and 93.5% multi-language accuracy claim. (elevenlabs.io)
- Realtime Speech to Text landing pages: “~150 ms” latency, comparative accuracy positioning. (elevenlabs.io)
- Transcription docs (capabilities): model list shows “Low latency (~150ms)” for Scribe v2 Realtime; general WER definition and language WER bands (batch models). (elevenlabs.io)
- Speech to Text product page: “Real-time in under 150 ms” and comparison graphic language. (elevenlabs.io)
If you want, I can ask the ElevenLabs team for the exact latency measurement methodology (e.g., end-to-end vs. model-only, network region, percentile) and the precise WER evaluation procedure for Scribe v2 Realtime.
Where can I find the starter code docs for building with Scribe v2 Realtime in Python?