This model was trained on 41 seconds of audio. I also used the Titan pretrain.
Sorry if the model sounds like a TV. All the audio I got was ripped from Squidward's TV in the episode "squid vision", so the audio will be a little weird.
I used an RTX 3070 for this.
