# Are generated voices prone to pops and audio problems?
Generated voices can occasionally experience pops, audio distortions, or other issues, but these are typically tied to specific factors. Here are some common causes and ways to address them:
- Training Data Quality: If you're using a cloned voice, inconsistencies in the training audio (such as background noise, varying volume levels, or sudden bursts of energy) can lead to instability in the generated output. Ensuring clean, consistent audio during training is crucial.
- Audio Degradation in Long Generations: Longer text-to-speech generations may result in audio quality degradation, such as buzzing or muffled sounds. To mitigate this, it's recommended to break the text into smaller sections (under 800-900 characters) and use tools like the Studio feature for better control.
- Settings Adjustments: Overly high settings for "stability" or "style" can sometimes cause instability, including pops or exaggerated effects. Keeping these settings at default or lower levels can help maintain audio quality.
- Voice Model Limitations: Some older or experimental models, like Multilingual v1, are more prone to issues like noise or corrupt speech. Using newer models, such as Multilingual v2 or Turbo v2.5, can improve stability and reduce artifacts.
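As a rough illustration of the advice to break long text into smaller sections, here is a minimal Python sketch that splits text at sentence boundaries so each piece stays under a character limit. The function name `split_text` and the default of 800 characters are assumptions made for this example (the limit is taken from the lower end of the range mentioned above); it is not part of any ElevenLabs tool or API.

```python
import re


def split_text(text: str, max_chars: int = 800) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration only.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the last space
        # before the limit, or hard-cut if it contains no space at all.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be generated separately and the resulting audio joined afterwards. Splitting at sentence boundaries rather than at a fixed character offset is a design choice here, meant to avoid cutting a word or phrase in half.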
If you encounter persistent issues, reviewing your training data for consistency and ensuring proper settings can often resolve them. For cloned voices, normalizing the audio to an RMS level between -23 dB and -18 dB and keeping the true peak below -3 dB is recommended.
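To check whether a training clip falls inside the recommended levels, something like the following sketch could be used. It assumes the samples have already been decoded to floats in the range -1.0 to 1.0 and that the dB figures above are relative to full scale (dBFS). It also measures the plain sample peak, which only approximates true peak (a proper true-peak measurement requires oversampling). All function names are hypothetical.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (assumes samples in [-1.0, 1.0])."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def peak_dbfs(samples: Sequence[float]) -> float:
    """Sample peak in dBFS. This is an approximation of true peak, not true peak."""
    if not samples:
        raise ValueError("no samples given")
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def gain_for_target_rms(samples: Sequence[float], target_db: float = -20.0) -> float:
    """Linear gain factor that would move the RMS level to target_db.

    Raises ValueError for silent input, where no finite gain exists.
    """
    level = rms_dbfs(samples)
    if math.isinf(level):
        raise ValueError("silent audio has no finite gain to the target level")
    return 10.0 ** ((target_db - level) / 20.0)


def levels_ok(samples: Sequence[float],
              rms_range: tuple[float, float] = (-23.0, -18.0),
              peak_limit: float = -3.0) -> bool:
    """True if RMS is within rms_range and the sample peak is below peak_limit."""
    low, high = rms_range
    return low <= rms_dbfs(samples) <= high and peak_dbfs(samples) < peak_limit
```

After applying the gain from `gain_for_target_rms`, the peak should be checked again, since raising the overall level also raises the peaks.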
This is a generative AI voice generated by ElevenLabs.