#I want to understand the most effective way to voiceover my text with timeline codes. I have my ow
Short answer: If you already have target‑language subtitles with timecodes (SRT/CSV) and you want to keep using your own voice, the cheapest path is to clone your voice once, then generate audio per subtitle segment via the Text‑to‑Speech API and place each clip at its SRT time. Use the speed and timestamps features to fit each line to its slot. Dubbing Studio is faster to operate end‑to‑end, but it’s billed per minute and costs more.
Lowest‑cost, precise workflow (API)
- Clone your own voice (Instant or Professional) and use it with TTS. ElevenLabs only permits cloning your own voice and requires verification for Professional Voice Cloning. For Professional Voice Cloning, the recommended amount of training audio is at least 30–60 minutes, ideally ~3 hours for best quality.
- For each SRT cue, call Text‑to‑Speech “create speech with timing” to get audio plus per‑character alignment; this gives you the exact duration of the generated line. If the duration doesn’t match your SRT window, adjust the voice speed and regenerate until it fits, then place the clip at the cue’s start time. Speed can be set per voice or per request; typical range is 0.7–1.2.
- Billing: Text‑to‑Speech charges credits per character (model‑dependent; Multilingual v2 is 1 credit/char; Turbo/Flash models can be 0.5 credits/char). This is generally the least expensive option when you already have the translation and timing.
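The speed-adjust step above can be sketched as a small helper. This is a minimal sketch, not ElevenLabs' own method: `pick_speed` is a hypothetical name, the 0.7–1.2 bounds are the "typical range" quoted above, and it assumes clip duration is roughly inversely proportional to the speed setting, which may not hold exactly in practice.

```python
MIN_SPEED, MAX_SPEED = 0.7, 1.2  # "typical range" quoted above; check your model's limits


def pick_speed(measured_s: float, slot_s: float, current_speed: float = 1.0) -> tuple[float, bool]:
    """Suggest a speed for the next attempt so the clip roughly fills slot_s.

    measured_s is the duration of the clip that was generated at current_speed.
    Returns (speed, reachable); reachable is False when the ideal speed fell
    outside the allowed range and had to be clamped.
    """
    if measured_s <= 0 or slot_s <= 0:
        raise ValueError("durations must be positive")
    # Assumption: duration is roughly inversely proportional to speed.
    ideal = current_speed * measured_s / slot_s
    clamped = min(MAX_SPEED, max(MIN_SPEED, ideal))
    return round(clamped, 2), clamped == ideal
```

When `reachable` comes back False, speed alone is unlikely to close the gap; shortening the line or time-stretching the clip afterwards are the usual fallbacks.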
When to use Dubbing Studio instead
- If you want an all‑in‑one timeline editor that separates speakers, preserves the original speaker’s voice automatically, lets you edit the transcript/translation, and exports finished deliverables (MP4/AAC/WAV tracks/AAF/SRT), use Dubbing Studio. You can import from YouTube/URL directly.
- If you want your pre‑timed script to be followed exactly, create a Manual Dub: provide the source video, background and foreground audio, and a CSV with columns speaker,start_time,end_time,transcription,translation (you can convert your SRT into this CSV). This locks timing to your file.
- Cost: Dubbing is billed per minute of source media. As of Aug 19, 2024, indicative rates were 2000–3000 credits/min for automatic (watermarked vs not) and 5000–10000 credits/min for Dubbing Studio editor; you’ll see the exact price before you start. This is typically more than straight TTS.
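To compare the two billing models for a given script, the rates quoted above can be plugged into a rough estimator. This is only a sketch: the function names and the model labels in the dictionary are hypothetical, character counting with `len` may differ from how the platform counts, and prorating dubbing by the second is an assumption (billing may round differently).

```python
# Per-character TTS rates as quoted above; the keys are just labels for this sketch.
TTS_CREDITS_PER_CHAR = {"multilingual_v2": 1.0, "turbo_flash": 0.5}


def tts_credits(lines, model="multilingual_v2"):
    """Estimate TTS credits for a list of subtitle lines."""
    return sum(len(line) for line in lines) * TTS_CREDITS_PER_CHAR[model]


def dubbing_credits(duration_s, credits_per_min):
    """Estimate dubbing credits, assuming cost is prorated by the second."""
    return duration_s / 60.0 * credits_per_min


if __name__ == "__main__":
    script = ["What if fairy tales were real?"]
    print("TTS:", tts_credits(script))
    print("Dubbing Studio, 1 minute at 5000 credits/min:", dubbing_credits(60.0, 5000))
```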
Recommendation for your case
- If you already have the target‑language SRT and only need your own voice: use the TTS API with your cloned voice, send one request per SRT cue, and use the returned timestamps to measure each clip and adjust speed until it fits its window. This minimizes credits.
- If you also want automated separation, a visual timeline, and turnkey exports (or you prefer to start from your YouTube link): use Dubbing Studio. If you want it to follow your exact timecodes, use Manual Dub with the CSV converted from your SRT. Expect higher per‑minute cost.
Notes
- Your cloned voice can be used across supported multilingual models; choose the model that fits your quality/cost needs (e.g., Multilingual v2 for quality; Turbo/Flash for cost).
If you’d like, share a short SRT snippet and I can show the exact TTS calls and a speed‑adjust loop.
Here is part of the SRT:
1
00:00:00,160 --> 00:00:05,960
What if fairy tales were real?
2
00:00:03,439 --> 00:00:09,200
What if there truly was a world of toys,
3
00:00:05,960 --> 00:00:12,240
with tiny dollhouses and the Nutcracker himself
4
00:00:09,200 --> 00:00:13,719
walking straight out of a children’s book?
Thanks for sharing the SRT excerpt. Here are the two supported workflows that map cleanly to ElevenLabs features, with exact steps and what they cost.
Option A — Keep your own voice automatically and honor timecodes in an editor
- Use Dubbing Studio. It preserves the original speaker’s voice automatically (track/clip cloning) and lets you lock clip durations to your timecodes. You can import the source via file or a YouTube URL. Exports include SRT, AAF timeline, and per‑speaker WAVs. (elevenlabs.io)
- If you want Studio to follow your exact SRT timing, create a Manual Dub project. Convert your SRT to the required CSV with columns: speaker,start_time,end_time,transcription,translation and use the h:mm:ss,ms time format. Studio will generate fixed‑duration clips for each row. Note Manual Dub requires: video file + background audio file + foreground audio file + the CSV. (elevenlabs.io)
- Cost: Dubbing Studio is priced per minute of source media. As of Aug 19, 2024, help center rates are 5,000 credits/min with watermark, 10,000 credits/min without. If you use ElevenLabs to translate inside Studio, translation adds 1 credit/character; if you provide your own translation in the CSV, that translation charge doesn’t apply. You’ll see the exact cost before creation. (help.elevenlabs.io)
How to convert your sample SRT to a Manual Dub CSV
- SRT
  - 00:00:00,160 --> 00:00:05,960 What if fairy tales were real?
  - 00:00:03,439 --> 00:00:09,200 What if there truly was a world of toys,
  - 00:00:05,960 --> 00:00:12,240 with tiny dollhouses and the Nutcracker himself
  - 00:00:09,200 --> 00:00:13,719 walking straight out of a children’s book?
- CSV (example structure; use your real source text for “transcription” and your target language for “translation”)
  - speaker,start_time,end_time,transcription,translation
  - Narrator,0:00:00.160,0:00:05.960,"[source line]","What if fairy tales were real?"
  - Narrator,0:00:03.439,0:00:09.200,"[source line]","What if there truly was a world of toys,"
  - Narrator,0:00:05.960,0:00:12.240,"[source line]","with tiny dollhouses and the Nutcracker himself"
  - Narrator,0:00:09.200,0:00:13.719,"[source line]","walking straight out of a children’s book?"
- In Studio, you can keep “Fixed Generations” to force audio to fit each clip’s duration or use “Dynamic Generations” if you prefer the model to adjust duration to the text. (elevenlabs.io)
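The SRT-to-CSV conversion described above could be scripted along these lines. It is a minimal sketch using only the standard library: the function names are hypothetical, the single "Narrator" speaker and the "[source line]" placeholder are taken from the example rows, and the timestamp shape (`0:00:05.960`) follows those rows too, so check it against the format Studio actually accepts. The parser tolerates SRT text without blank lines between cues, as in the pasted excerpt.

```python
import csv
import io
import re

# One SRT timing line, e.g. "00:00:00,160 --> 00:00:05,960".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _stamp(h, m, s, ms, sep="."):
    # "00:00:05,960" -> "0:00:05.960", the shape used in the example rows.
    return f"{int(h)}:{m}:{s}{sep}{ms}"


def parse_srt(text):
    """Return a list of (start, end, text) tuples from SRT text."""
    cues = []
    for line in (ln.strip() for ln in text.splitlines()):
        match = TIMING.fullmatch(line)
        if match:
            # The line just before a timing line is the cue index, not text.
            if cues and cues[-1][2] and cues[-1][2][-1].isdigit():
                cues[-1][2].pop()
            g = match.groups()
            cues.append([_stamp(*g[:4]), _stamp(*g[4:]), []])
        elif line and cues:
            cues[-1][2].append(line)
    return [(start, end, " ".join(parts)) for start, end, parts in cues]


def to_manual_dub_csv(cues, speaker="Narrator", source="[source line]"):
    """Render cues as CSV text with the five columns listed above."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["speaker", "start_time", "end_time", "transcription", "translation"])
    for start, end, text in cues:
        writer.writerow([speaker, start, end, source, text])
    return out.getvalue()


SAMPLE = """
1
00:00:00,160 --> 00:00:05,960
What if fairy tales were real?
2
00:00:03,439 --> 00:00:09,200
What if there truly was a world of toys,
3
00:00:05,960 --> 00:00:12,240
with tiny dollhouses and the Nutcracker himself
4
00:00:09,200 --> 00:00:13,719
walking straight out of a children’s book?
"""

if __name__ == "__main__":
    print(to_manual_dub_csv(parse_srt(SAMPLE)))
```

The `csv` module quotes only fields that need it (for example, lines containing a comma), which is equivalent to the fully quoted rows in the example.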
Option B — Lowest cost programmatic pipeline using your cloned voice and your SRT
- Clone your own voice (IVC or PVC) to obtain a voice_id, then synthesize per SRT row with Text‑to‑Speech. IVC/PVC are the supported ways to keep using your own voice. (elevenlabs.io)
- Call the “Create speech with timing” or streaming “with‑timestamps” endpoints to get audio plus character‑level timestamps; assemble the clips on your timeline at the SRT start/end times. Note that TTS does not natively “stretch” to a target duration: the speed setting changes pacing but does not guarantee an exact length, so if you must hit exact window lengths you’ll need to trim or time‑stretch the clips in your DAW. (elevenlabs.io)
- Cost: Text‑to‑Speech is billed per character. Multilingual v2 is 1 credit/character; Turbo/Flash models are 0.5 credits/character (cheapest) per recent pricing updates. (help.elevenlabs.io)
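A per-cue request for Option B might look roughly like the sketch below. Everything API-specific here is an assumption to verify against the current API reference: the `/with-timestamps` path, the `xi-api-key` header, the `voice_settings.speed` field, and the `audio_base64` / `alignment` response fields. The helper names are hypothetical, and `synthesize` makes a real network call that consumes credits.

```python
import base64
import json
import urllib.request

API_ROOT = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_payload(text, model_id="eleven_multilingual_v2", speed=1.0):
    """Request body for one SRT cue (field names assumed from the API reference)."""
    return {"text": text, "model_id": model_id, "voice_settings": {"speed": speed}}


def clip_duration(alignment):
    """Duration in seconds, taken as the end time of the last aligned character."""
    ends = alignment["character_end_times_seconds"]
    return ends[-1] if ends else 0.0


def synthesize(voice_id, text, api_key, **kwargs):
    """POST one cue and return (audio_bytes, duration_seconds)."""
    req = urllib.request.Request(
        f"{API_ROOT}/text-to-speech/{voice_id}/with-timestamps",
        data=json.dumps(build_payload(text, **kwargs)).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return base64.b64decode(body["audio_base64"]), clip_duration(body["alignment"])
```

Comparing the returned duration with the cue's window (end minus start) tells you whether the clip can be placed as is or needs regenerating at a different speed or trimming.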
Which should you pick?
- Need your original voice preserved automatically with built‑in timeline tools and
Guide me step by step through the API option.