# Give me a summary of instant voice cloning, including ideal audio format, properties, and instructions
Here’s a concise, docs-verified summary of Instant Voice Cloning (IVC) and how to use it.
What IVC is
- IVC creates a usable clone of a voice near-instantly from short sample audio; it does not train a dedicated custom model (unlike Professional Voice Cloning). It works best when the target voice is similar to voices seen during model training. (elevenlabs.io)
Ideal audio format and length
- Format: many formats are accepted, but MP3 is recommended. The Help Center suggests MP3 at 192 kbps or higher, while the IVC guide says 128 kbps or higher is sufficient and higher bitrates add little. Either way, clean capture matters far more than the file format; uncompressed WAV yields little to no improvement. (help.elevenlabs.io)
- Length: provide about 1–2 minutes total of clear speech; avoid using more than ~3 minutes as it brings little benefit and can hurt results. (elevenlabs.io)
- Capture guidelines: single speaker only; no background noise, music, reverb, or artifacts; keep tone/performance and recording quality consistent across samples; match the performance style you want the clone to reproduce; balanced loudness (no clipping, no very low levels). (elevenlabs.io)
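If you record to 16-bit PCM WAV before exporting to MP3, a quick preflight check can catch length and level problems before you upload. Here is a minimal Python sketch, assuming uncompressed 16-bit input (it does not decode MP3). The 1–2 minute target and ~3 minute ceiling come from the guidance above; the -0.1 dBFS clipping and -40 dBFS low-level thresholds are hypothetical values chosen for illustration, not ElevenLabs figures.

```python
import array
import math
import sys
import wave

# Duration targets from the guidance above, in seconds.
IDEAL_MIN_S, IDEAL_MAX_S, HARD_MAX_S = 60, 120, 180
# Hypothetical level thresholds in dBFS (not ElevenLabs values); tune as needed.
CLIP_PEAK_DBFS = -0.1
LOW_RMS_DBFS = -40.0
FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def analyze_samples(samples, rate, n_channels=1):
    """Return warnings for interleaved signed 16-bit sample values."""
    warnings = []
    duration = len(samples) / (rate * n_channels)
    if duration > HARD_MAX_S:
        warnings.append(f"{duration:.0f}s is over ~3 min; trim it")
    elif not IDEAL_MIN_S <= duration <= IDEAL_MAX_S:
        warnings.append(f"{duration:.0f}s is outside the 1-2 min target")

    if len(samples) > 0:
        peak = max(abs(s) for s in samples)
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        # Guard against log10(0) on digital silence.
        peak_db = 20 * math.log10(max(peak, 1) / FULL_SCALE)
        rms_db = 20 * math.log10(max(rms, 1e-9) / FULL_SCALE)
        if peak_db >= CLIP_PEAK_DBFS:
            warnings.append(f"peak {peak_db:.1f} dBFS: likely clipping")
        if rms_db < LOW_RMS_DBFS:
            warnings.append(f"RMS {rms_db:.1f} dBFS: level is very low")
    return warnings


def check_wav(path):
    """Read a 16-bit PCM WAV file and return preflight warnings."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        rate, n_channels = wf.getframerate(), wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return analyze_samples(samples, rate, n_channels)
```

Calling `check_wav("take1.wav")` (a hypothetical filename) returns an empty list when the take is within the duration target and its levels fall between the two thresholds. This only checks length and level; it cannot detect reverb, background music, or a second speaker, so listen to every sample as well.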
Voice “properties” (settings you can tune when using the clone)
- stability (0–1): higher = more monotone/consistent; lower = more expressive/variable. (elevenlabs.io)
- similarity_boost (0–1): how closely to mimic the target voice; very high values can also replicate artifacts if the sample had them. (elevenlabs.io)
- use_speaker_boost (boolean): subtly increases similarity at the cost of some latency. (elevenlabs.io)
- style (style exaggeration): amplifies the speaker’s style; may reduce stability—generally keep at 0 unless you need it. (elevenlabs.io)
- speed: playback speed multiplier; 1.0 is default. (elevenlabs.io)
- Practical starting point: stability ~50% (0.5 on the API's 0–1 scale), similarity ~75% (0.75), style 0; adjust per voice and use case. (elevenlabs.io)
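To keep per-request overrides consistent, it can help to build the voice_settings object in one place. The sketch below merges overrides into the starting values above and rejects out-of-range numbers. The field names come from the list above; treating style as 0–1, defaulting use_speaker_boost to True, and the percent_to_unit helper are assumptions for illustration, not documented ElevenLabs behavior.

```python
# Hypothetical defaults on the API's 0-1 scale, mirroring the starting point above.
DEFAULT_VOICE_SETTINGS = {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0.0,
    "use_speaker_boost": True,  # assumption; disable it if latency matters more
    "speed": 1.0,
}
# Assumed to share the 0-1 range (style's range is an assumption).
UNIT_RANGE_KEYS = ("stability", "similarity_boost", "style")


def percent_to_unit(percent):
    """Convert a percentage such as 50 to the 0-1 API scale."""
    return percent / 100.0


def build_voice_settings(**overrides):
    """Merge overrides into the defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULT_VOICE_SETTINGS)
    if unknown:
        raise KeyError(f"unknown voice setting(s): {sorted(unknown)}")
    settings = {**DEFAULT_VOICE_SETTINGS, **overrides}
    for key in UNIT_RANGE_KEYS:
        if not 0.0 <= settings[key] <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1, got {settings[key]}")
    if not isinstance(settings["use_speaker_boost"], bool):
        raise TypeError("use_speaker_boost must be True or False")
    if settings["speed"] <= 0:
        raise ValueError("speed must be a positive multiplier")
    return settings
```

For example, `build_voice_settings(stability=percent_to_unit(40))` lowers stability for a more expressive read while keeping the other values at their defaults. Validating locally surfaces typos such as `similarity` instead of `similarity_boost` before a request is ever sent.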
How to use IVC (web app)
- Go to Voices → Add a new voice → select Instant Voice Clone.
- Upload or record your audio, then name/label the voice and confirm you have rights/consent to clone it.
- Your clone appears under Voices → Personal and is ready to use. (elevenlabs.io)
How to use IVC (API)
- Create a clone by POSTing multipart form data to /v1/voices/add with name and one or more audio files (optionally remove_background_noise=true, but only if the samples actually have noise). The response returns voice_id. (elevenlabs.io)
- Generate speech with the clone by calling POST /v1/text-to-speech/{voice_id} (optionally pass voice_settings to override stability/similarity/style/speed per request). (elevenlabs.io)
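Here is a minimal Python sketch of those two calls using only the standard library. It builds the requests without sending them, so you can inspect them first. The endpoint paths and the name, remove_background_noise, voice_id, and voice_settings fields come from the steps above. The https://api.elevenlabs.io base URL, the xi-api-key auth header, the repeated files form field, the text JSON field, and the Accept header are assumptions based on the ElevenLabs API reference as I understand it, so verify them against the current docs.

```python
import json
import urllib.request
import uuid

API_BASE = "https://api.elevenlabs.io"  # assumed base URL


def _multipart(fields, files):
    """Encode text fields and (filename, bytes, mime) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    for name, (filename, data, mime) in files:
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_add_voice_request(api_key, name, samples, remove_background_noise=False):
    """samples: list of (filename, mp3_bytes). Returns an unsent POST /v1/voices/add."""
    fields = [("name", name)]
    if remove_background_noise:  # only worth enabling if the samples are noisy
        fields.append(("remove_background_noise", "true"))
    # "files" as the repeated field name is an assumption.
    files = [("files", (fn, data, "audio/mpeg")) for fn, data in samples]
    body, content_type = _multipart(fields, files)
    return urllib.request.Request(
        f"{API_BASE}/v1/voices/add",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )


def build_tts_request(api_key, voice_id, text, voice_settings=None):
    """Return an unsent POST /v1/text-to-speech/{voice_id} with optional overrides."""
    payload = {"text": text}
    if voice_settings:
        payload["voice_settings"] = voice_settings
    return urllib.request.Request(
        f"{API_BASE}/v1/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode(),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )
```

To actually send a request, pass it to `urllib.request.urlopen(req)`. The create call should return JSON containing voice_id, which you then pass to `build_tts_request`; the text-to-speech response body is audio you can write to an .mp3 file. Filenames are not escaped in this sketch, so keep them to plain ASCII without quotes.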
Policy and availability notes
- In the web UI you must confirm you have the right and consent to clone the voice. (elevenlabs.io)
- IVC
For IVC, should non-verbal vocalizations be used or not?
Short answer: no. Don’t include non-verbal vocalizations (e.g., “um/uh,” laughs, gasps, exaggerated breaths, mouth clicks) in IVC reference audio unless you explicitly want the cloned voice to reproduce them. The IVC model mimics everything it hears, including breathing patterns, mouth clicks, and filler sounds, so mixing these in can confuse the model and make results less predictable. Keep reference audio speech-only, clean, and consistent. (elevenlabs.io)
Related best practices: provide about 1–2 minutes of clear spoken audio with no background noise or long silences. (elevenlabs.io)
Sources:
Instant Voice Cloning | ElevenLabs Documentation
I'll often need to generate non-verbal vocalizations like um/uh, though not emotional ones like laughing or gasps. If I train the voice on clean, consistent speech only, will I be able to generate such vocalizations?