FEEDBACK WANTED
This model was trained on all of Eddy's lines from Season 1, which comes out to roughly 1 hour of speech. I made a program that clips a character's lines from a TV show. It's not perfect: I just applied a "main vocal only" filter to the entire dataset and then trained on that.
I'm wondering if I should tweak my auto-clipper program to get better bits of speech, or if a bit of manual clipping/cleaning is inevitable. The training data contains the odd word from another character, which might be a corrupting factor, as well as lower fidelity caused by stripping loud music and SFX away from the voice.
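A rough sketch of one possible auto-clipper tweak, not the actual program. It assumes the clipper works from timestamped cues with speaker labels (for example from subtitles), which may not be how it really finds lines. The `Cue` type, `clean_cues`, and the `margin`/`min_len` values are all made up for illustration. The idea is to drop any of the target character's clips that sit too close to another character's speech, plus any clip that is too short to be useful.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def clean_cues(cues, target, margin=0.25, min_len=0.5):
    """Keep the target's cues that stay clear of every other speaker's cues.

    `margin` and `min_len` are illustrative guesses, not tuned values.
    """
    others = [c for c in cues if c.speaker != target]
    kept = []
    for cue in cues:
        if cue.speaker != target:
            continue
        if cue.end - cue.start < min_len:
            continue  # too short to be useful training audio
        # Reject the cue if another speaker talks within `margin` seconds of it.
        overlaps = any(
            o.start < cue.end + margin and o.end > cue.start - margin
            for o in others
        )
        if not overlaps:
            kept.append(cue)
    return kept


# Hypothetical cue list, only to show the behaviour.
cues = [
    Cue("Eddy", 0.0, 2.0),
    Cue("Other", 2.1, 3.0),  # starts 0.1 s after the first Eddy cue
    Cue("Eddy", 5.0, 5.2),   # shorter than min_len
    Cue("Eddy", 8.0, 10.0),
]
print(clean_cues(cues, "Eddy"))
```

Throwing out whole clips like this costs data, which matters with only about an hour of speech, so trimming the edges instead of discarding might be the better trade-off.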
Model: https://huggingface.co/EldrickPica/Eddy/resolve/main/eddy.zip?download=true
Here's the repo, where I've added the training data as lazy_eddy.flac: https://huggingface.co/EldrickPica/Eddy/tree/main