Hey everyone 👋,
I’ve been training an AI voice model of Dudley from Street Fighter. Since there wasn’t a complete dataset for him, I used Chatterbox to generate around 5 minutes of voice lines to build one. The model does sound like him, but the main issue is that it doesn’t sound very natural: the delivery comes out robotic and flat at times. I’m happy to share both the audio dataset and the trained model if anyone’s interested.
I’d really appreciate any advice on:
- How to make the output sound more natural and expressive.
- Whether I should expand my dataset (e.g., longer clips, more varied emotion/pitch).
- Any training settings (epochs, configs, augmentations) you recommend tweaking for better realism.
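Before deciding whether to expand the dataset, it may help to see how the ~5 minutes break down by clip length. Below is a minimal sketch that assumes the Chatterbox lines are exported as `.wav` files in a single folder. The folder name `dataset` and the function `summarize_clips` are hypothetical. It uses only Python’s standard `wave` module:

```python
import sys
import wave
from pathlib import Path


def summarize_clips(folder: str) -> dict:
    """Return count, total, mean, min and max duration (seconds) of .wav clips in a folder.

    Assumes every clip is a plain PCM .wav file readable by the stdlib `wave` module.
    """
    durations = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            # Duration in seconds = number of frames / frames per second.
            durations.append(wf.getnframes() / wf.getframerate())
    if not durations:
        return {"count": 0, "total_s": 0.0, "mean_s": 0.0, "min_s": 0.0, "max_s": 0.0}
    return {
        "count": len(durations),
        "total_s": sum(durations),
        "mean_s": sum(durations) / len(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical default folder name; pass your own path as the first argument.
    stats = summarize_clips(sys.argv[1] if len(sys.argv) > 1 else "dataset")
    for key, value in stats.items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

Run it as `python summarize_clips.py path/to/clips`. If most clips turn out to be only a second or two long, that would be one concrete reason to generate longer lines when expanding the dataset.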
Thanks in advance 🙏