#Trying to learn how to make an AI voice model!
Hello
I've checked the VRAM of your laptop GPU, and unfortunately it only has 6 GB, while training with a batch size of 8 needs at least 8 GB of VRAM
as said in https://docs.aihub.gg/rvc/resources/training/#batch-size:
For 30+ minutes of data batch size 8 is recommended and for less than 30 minutes batch size 4 is recommended
Your GPU can of course still train, but it wouldn't be able to use batch size 8, only 4
Do you want to do it locally (running on your own hardware), or in the cloud (a good remote PC, with limited free time)?
Would you mind elaborating on this?
okay so
batch size: a training setting. It's the number of training examples used in one iteration before updating the model's parameters. **Datasets** (the set of cleaned-up audio of the voice you want to train) that are longer than 30 minutes should use batch size 8, while for shorter ones batch size 4 is suggested
batch size 8 uses roughly 8 GB of VRAM,
batch size 4 uses roughly 4 GB of VRAM
VRAM is the memory of your GPU, which in your case is just 6 GB
Meaning you'd be able to train models with batch size 4 (for datasets shorter than 30 minutes), but can't train models with batch size 8 (for longer datasets)
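To make that rule of thumb concrete, here's a minimal sketch in Python. It only encodes the numbers from this thread (30 minutes as the cutoff, roughly 1 GB of VRAM per unit of batch size); `pick_batch_size` is a made-up helper for illustration, not part of RVC or Applio, and real VRAM usage will vary.

```python
def pick_batch_size(vram_gb: float, dataset_minutes: float) -> int:
    """Pick batch size 4 or 8 using the rule of thumb from this thread.

    Hypothetical helper: the thresholds are assumptions taken from the
    discussion above, not values read from any RVC/Applio API.
    """
    if vram_gb < 4:
        # Assumption: below ~4 GB even batch size 4 is unlikely to fit.
        raise ValueError("less than 4 GB of VRAM; consider cloud training")

    # What the docs recommend for the dataset length.
    recommended = 8 if dataset_minutes >= 30 else 4
    # What the GPU can hold (batch size 8 wants about 8 GB).
    affordable = 8 if vram_gb >= 8 else 4
    return min(recommended, affordable)


# A 6 GB laptop GPU ends up on batch size 4 either way.
print(pick_batch_size(vram_gb=6, dataset_minutes=45))
print(pick_batch_size(vram_gb=6, dataset_minutes=10))
```

So with 6 GB the dataset length doesn't change the answer; it would only matter on a GPU with 8 GB or more.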
There are 2 ways you could train models:
- Locally: using your own hardware (your GPU), so you have unlimited time, but in your case you can't use batch size 8 and it might be kinda slower since it's a laptop GPU
- Cloud: using a remote GPU service, which has limited time depending on the service, but has better GPUs
Training models is of course not an easy task, especially for a beginner, and it takes a lot of time
Was I clear enough?
Yeah, still a little confusing (super new to this haha) but I think I'll just go with locally
Alright, how long would your model's training dataset be though?
That means the amount/time of voicelines used to train the model yeah?
The total duration of the cleaned-up voicelines
It's the character mafioso from forsaken/dream game, literally the most footage I can find is about 50 seconds long and I presume that's not enough </3
RVC technically does not have a limit, though it would be better if you try to find at least 5-15 minutes for a good/decent model
If you want, there's a lot of info about dataset creation at https://docs.aihub.gg/rvc/resources/dataset-isolation/
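If it helps, here's a minimal sketch for checking how much audio you actually have before training. It assumes your cleaned-up voicelines are uncompressed PCM `.wav` files in one folder (Python's built-in `wave` module can't read MP3 or other compressed formats); `total_minutes` and the `dataset` folder name are just placeholders for illustration.

```python
import wave
from pathlib import Path


def total_minutes(folder: str) -> float:
    """Sum the duration of every .wav file in `folder`, in minutes."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as clip:
            # Duration of one clip = number of frames / frames per second.
            total_seconds += clip.getnframes() / clip.getframerate()
    return total_seconds / 60


if __name__ == "__main__":
    # "dataset" is a hypothetical folder name; point it at your own clips.
    print(f"{total_minutes('dataset'):.2f} minutes of audio")
```

Comparing that number against the 5-15 minute target gives a quick idea of how much more audio is worth hunting for.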
I can try and find more, but he doesn't have that many voicelines in game
I'd really suggest finding as many as you can and following that dataset creation guide, there's a lot you can learn from it
Alright, I'll try my best to find more
tl;dr: the RTX 3060 laptop GPU has 6 GB of VRAM
8 GB is recommended for training with batch size 8
or you can enable the checkpointing option in Applio
It's still possible to train, but you'd hardly get results as good/stable as with a 10-minute or 30+ minute dataset
Would recycling/looping the audio help? To make it longer?
the best bet is to test and compare
even comparing against a synthetic dataset created by running inference with the previously trained model
(and it sounded horrible on my first try)
Looping the audio would just be a waste of time