# Should I use Scribe v2 for non-real-time speech to text or stick with v1?
Short answer: Stick with Scribe v1 for non–real-time (batch) transcription. Use Scribe v2 Realtime only when you need low‑latency streaming.
Why:
- Scribe v1 is documented as the best choice when high‑accuracy transcription is required rather than real‑time, with features like word‑level timestamps, speaker diarization (up to 32 speakers), and audio‑event tagging across 99 languages. (elevenlabs.io)
- Scribe v2 is presented as “Scribe v2 Realtime,” built for live streaming with ~150 ms latency, predictive transcription, VAD, and commit control (WebSocket API) across 90+ languages. In other words, it is optimized for real-time agentic use and is not positioned as a batch/offline model. (elevenlabs.io)
- ElevenLabs’ product pages explicitly separate the two: Scribe v2 Realtime for live transcription; Scribe v1 for transcribing, captioning, and editing recorded content. (elevenlabs.io)
If your workload is uploading files and getting accurate, fully featured transcripts, keep using Scribe v1. Switch to Scribe v2 Realtime only if you need sub‑second streaming. (elevenlabs.io)
Sources:
Speech to Text | ElevenLabs Documentation
Scribe v2 Realtime Speech to Text - 150ms Latency API
Speech to Text — Most Accurate Speech to Text Model
What about Scribe v1 experimental?
Scribe v1 experimental is an “experimental preview” of the Scribe v1 Speech-to-Text model released on April 7, 2025. It improves performance on mixed-language audio, reduces hallucinations around silent segments, and enhances audio tag detection. (elevenlabs.io)
How to use it: call the Speech-to-Text Create Transcript endpoint with model_id=scribe_v1_experimental. The API docs list scribe_v1 and scribe_v1_experimental as the available batch STT models. (elevenlabs.io)
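As a rough illustration of the call described above, here is a minimal standard-library sketch. The endpoint URL, the `xi-api-key` header, and the `file` form field are assumptions based on how the ElevenLabs REST API is commonly used and should be checked against the Create Transcript documentation; the helper names (`build_multipart`, `build_request`, `transcribe`) are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumed endpoint; verify against the Create Transcript documentation.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"
# Batch model IDs named in the answer above.
BATCH_MODELS = ("scribe_v1", "scribe_v1_experimental")


def build_multipart(fields: dict[str, str], filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode() + value.encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode() + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_request(
    api_key: str,
    audio: bytes,
    model_id: str = "scribe_v1_experimental",
    filename: str = "audio.mp3",
) -> urllib.request.Request:
    """Prepare (but do not send) a batch transcription request."""
    if model_id not in BATCH_MODELS:
        raise ValueError(f"unexpected batch model_id: {model_id!r}")
    body, content_type = build_multipart({"model_id": model_id}, filename, audio)
    return urllib.request.Request(
        STT_URL,
        data=body,
        # Header name is an assumption; check the API reference.
        headers={"xi-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )


def transcribe(api_key: str, path: str, model_id: str = "scribe_v1_experimental") -> dict:
    """Upload an audio file and return the parsed JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_request(api_key, audio, model_id, filename=path.split("/")[-1])
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a valid key, something like `transcribe(api_key, "meeting.mp3")` would send the file; passing `model_id="scribe_v1"` instead switches back to the standard model without any other change.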
When to choose it: ElevenLabs positions Scribe v1 as the high‑accuracy batch model (word‑level timestamps, diarization up to 32 speakers, 99 languages). If your files contain multiple languages or long silences, the experimental variant may yield better results; otherwise the standard Scribe v1 remains the default for non‑realtime transcription. (elevenlabs.io)
Note: For live/streaming use cases, use Scribe v2 Realtime; Scribe v1 (including the experimental variant) is for batch/non‑realtime transcription. (elevenlabs.io)
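The guidance in this thread can be condensed into a small decision helper. This is only a sketch of the rule of thumb stated above, not an official selection API; the function name `pick_scribe_model` and its flags are hypothetical, and the realtime branch returns the product name rather than a model ID because no ID for it is given here.

```python
def pick_scribe_model(
    live_streaming: bool,
    mixed_languages: bool = False,
    long_silences: bool = False,
) -> str:
    """Return the model suggested by the guidance in this thread."""
    if live_streaming:
        # Low-latency streaming is the one case where v2 Realtime is recommended.
        return "Scribe v2 Realtime"
    if mixed_languages or long_silences:
        # The experimental variant targets these two weak spots of v1.
        return "scribe_v1_experimental"
    # Default for recorded, non-realtime audio.
    return "scribe_v1"


print(pick_scribe_model(live_streaming=False, mixed_languages=True))
```

Since the experimental variant is described as a preview, it may be worth comparing its output against `scribe_v1` on a few representative files before switching.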
Sources:
April 7, 2025 | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
Speech to Text | ElevenLabs Documentation
Scribe v2 Realtime Speech to Text - 150ms Latency API
Worth using experimental over regular v1?