# Trying to learn how to make an AI voice model!


crisp prism
#

Hi! I'm trying to learn to make my own so I don't bother someone with making 'em for me! My GPU is in the picture provided and my OS is Windows 11! I'm not sure if this goes under the tag Model Training but I'll give it a try!

misty marsh
# crisp prism Hi! I'm trying to learn to make my own so I don't bother someone with making 'em...

Hello

I've checked the VRAM of your laptop GPU, and unfortunately it only has 6 GB, compared to the 8 GB of VRAM needed to train with a batch size of 8

as said in https://docs.aihub.gg/rvc/resources/training/#batch-size:

For 30+ minutes of data batch size 8 is recommended and for less than 30 minutes batch size 4 is recommended

Your GPU can of course still train, but it wouldn't be able to use batch size 8, only 4

#

Do you wanna do it locally (running on your own hardware) or in the cloud (a remote, more powerful PC, with limited free time)?

crisp prism
misty marsh
# crisp prism Would you mind elaborating on this?

okay so

batch size: a training setting; it's the number of training examples used in one iteration before updating the model's parameters. For **datasets** (the set of cleaned audio of the voice you want to train) longer than 30 minutes, batch size 8 is recommended, while for shorter ones batch size 4 is suggested
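To make the definition concrete, here is a minimal sketch of how batch size relates to the number of parameter updates in one pass over the dataset. The function name and the 600-segment dataset are made up for illustration and are not part of RVC or Applio.

```python
import math


def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """Return how many parameter updates happen in one pass over the data."""
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    # The last batch may hold fewer examples than batch_size, hence the ceiling.
    return math.ceil(num_examples / batch_size)


# Hypothetical dataset of 600 audio segments.
for batch_size in (4, 8):
    print(batch_size, updates_per_epoch(600, batch_size))
```

A larger batch size means fewer but larger updates per pass, and each update has to hold more examples in GPU memory at once.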

batch size 8 uses 8 GB of VRAM,
batch size 4 uses 4 GB of VRAM

VRAM is the memory of your GPU, which in your case is just 6 GB
Meaning, you'd be able to train models with batch size 4 (for datasets shorter than 30 minutes), but you can't train models with batch size 8 (for longer datasets)
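The rule of thumb above can be sketched as a small helper. This is only an illustration under the assumptions stated in this thread (batch size 8 needs about 8 GB of VRAM, batch size 4 about 4 GB, and only those two sizes are considered); `pick_batch_size` is a hypothetical name, not a function from RVC or Applio.

```python
def pick_batch_size(vram_gb: float, dataset_minutes: float) -> int:
    """Pick batch size 4 or 8 from GPU memory and dataset length."""
    # Recommendation quoted in the thread: 8 for 30+ minutes of data, else 4.
    recommended = 8 if dataset_minutes >= 30 else 4
    # Assumption from the thread: batch size N needs roughly N GB of VRAM.
    for batch_size in (8, 4):
        if batch_size <= recommended and vram_gb >= batch_size:
            return batch_size
    raise ValueError("less than 4 GB of VRAM; neither batch size fits")


# A 6 GB GPU falls back to 4 even when 8 would be recommended.
print(pick_batch_size(vram_gb=6, dataset_minutes=45))
print(pick_batch_size(vram_gb=8, dataset_minutes=45))
```

With 6 GB the helper returns 4 regardless of dataset length, which matches the situation described here.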

There are 2 ways you could train models:

  • Locally: using your own hardware and GPU, so you have unlimited time, but in your case you can't use batch size 8, and it might be somewhat slower since it's a laptop GPU
  • Cloud: using a remote service with a good GPU; the time is limited depending on the service, but the GPUs are better
#

training models is of course not an easy task, especially for a beginner, and it takes a lot of time

was i clear enough?

crisp prism
#

Yeah, still a little confusing (super new to this haha) but I think I'll just go with locally

misty marsh
crisp prism
misty marsh
crisp prism
#

It's the character mafioso from forsaken/dream game, literally the most footage I can find is about 50 seconds long and I presume that's not enough </3

misty marsh
crisp prism
#

I can try and find more, but he doesn't have that many voice lines in game

misty marsh
#

I'd really suggest finding as many as you can and trying to follow that dataset creation guide; there's a lot you can learn from it

crisp prism
#

Alright, I'll try my best to find more

forest estuary
#

8 GB is recommended for training with batch size 8

#

or you can enable the checkpointing option in Applio

forest estuary
crisp prism
forest estuary
#

even compared with using a synthetic dataset created by inferring with the previously trained model

#

(and it sounded horrible on my first try)

misty marsh
crisp prism
#

I see

#

Thank you guys sm!