What are the benchmarks surrounding Scribe v2 Realtime? Is the sub-150 ms the upper bound and how are

AI-generated Answer for himonopoly

Here’s what ElevenLabs has published publicly about Scribe v2 Realtime:

Latency/“under 150 ms”

Positioning: “live transcription in under 150 ms” and “~150 ms latency” are the official claims for Scribe v2 Realtime. No percentile, SLA, or distribution is stated. (elevenlabs.io)
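Because no percentile or distribution is published, a single "under 150 ms" figure could describe a mean, a median, or a hard maximum. The sketch below shows how the same set of latency measurements yields different numbers under each reading. The function name and the sample values are hypothetical illustrations, not ElevenLabs measurements or methodology.

```python
import statistics


def summarize_latency(samples_ms):
    """Summarize latency samples (milliseconds) as mean, percentiles, and max.

    Hypothetical helper for illustration only; it does not reflect how
    ElevenLabs measures or reports latency.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Synthetic samples: nine fast responses and one slow outlier.
    synthetic = [100.0] * 9 + [400.0]
    for name, value in summarize_latency(synthetic).items():
        print(f"{name}: {value:.1f} ms")
```

With synthetic data like this, the mean and median sit below 150 ms while the tail percentiles and the maximum do not, which is why the percentile behind a latency claim matters when comparing it against your own measurements.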

Benchmarks mentioned

Comparative accuracy: The Scribe v2 Realtime launch notes say it “significantly outperforms all other models” on a set of 500 difficult live samples (noise + complex content). No numeric WER table is provided there. The same post also notes “93.5% accuracy across 30 commonly used European and Asian languages,” without a corresponding WER breakdown. (elevenlabs.io)
Marketing pages reiterate that Scribe Realtime “beats” competing models and emphasize ~150 ms streaming latency; again, no per-dataset WER table is shown on those pages. (elevenlabs.io)

How WER is defined in the docs

The documentation defines Word Error Rate (WER) generally as “how many errors are present in a transcript compared to a reference transcript,” and gives language-level WER bands for Scribe v1/v2 (batch STT). It does not describe the exact scoring recipe (e.g., case/punctuation normalization, tokenization, datasets) for Scribe v2 Realtime. (elevenlabs.io)
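For reference, WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that conventional calculation. The normalization used here (lowercasing and whitespace splitting) is an assumption made for illustration, since the scoring recipe for Scribe v2 Realtime is not published.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / reference word count.

    Assumption: text is lowercased and split on whitespace. Real evaluations
    often also strip punctuation or normalize numbers, which changes the score.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Whether a headline "accuracy" figure is derived from WER, and under which normalization, is not something the public material specifies.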

What’s not published

As of today (February 10, 2026), ElevenLabs’ public blog, docs, and help center do not specify whether “sub‑150 ms” is an upper bound, an average, or a percentile for Scribe v2 Realtime. They also do not publish a formal WER evaluation protocol specific to the realtime model (datasets, preprocessing, or scoring configuration). If you need those specifics, you’ll need to request them from ElevenLabs directly.

Sources

Introducing Scribe v2 Realtime (blog): live “under 150 ms,” 500 hard-sample comparison, and 93.5% multi-language accuracy claim. (elevenlabs.io)
Realtime Speech to Text landing pages: “~150 ms” latency, comparative accuracy positioning. (elevenlabs.io)
Transcription docs (capabilities): model list shows “Low latency (~150ms)” for Scribe v2 Realtime; general WER definition and language WER bands (batch models). (elevenlabs.io)
Speech to Text product page: “Real-time in under 150 ms” and comparison graphic language. (elevenlabs.io)

If you want, I can ask the ElevenLabs team for the exact latency measurement methodology (e.g., end-to-end vs. model-only, network region, percentile) and the precise WER evaluation procedure for Scribe v2 Realtime.

