#Guidance for long datasets

twin radish

Hi all,

I am looking for some guidance regarding long datasets.
Everything below 60 minutes works great with RVC Mainline and Applio.

However, anything longer than that does not seem to work correctly. Either feature extraction or index creation seems to struggle with longer datasets.

Is there anywhere I can read up on this, or someone who is an expert in longer datasets?

Any help is greatly appreciated 🙂

modern root

I believe RVC Disconnected (Colab) can deal with datasets of over 60 minutes. There's at least one person training a model on a 1 h 30 min dataset.

Quite frankly, now that I think about it, I have also trained on a dataset of over 60 minutes with RVC Mainline, and it worked as well.

twin radish
# modern root This is not a direct answer to your problem so I am sorry if you were hoping for...

Thanks for the quick reply. The dataset is clean and I do get good results; however, I feel that with a larger dataset I could get more 'variance' into it. Some expressions still sound a bit 'robotic'. If I could extend the dataset with more sounds and more speech, I think it should be possible to cover most sounds and minimize the robotic-sounding parts, such as laughing or breathing.

honest timber
# twin radish Hi all, I am looking for some guidance regarding long datasets. Everything bel...

Not to discourage training 1h+ datasets, but note that:

  • audio quality, cleanliness, consistency, and variation (intonation, emotion, pitch range) are important factors
  • the default pretrain can work well on long datasets
  • for 1h+ datasets, index training runs MiniBatchKMeans, which shrinks the index file from an expected 500+ MB to only about 10% of that and also drastically reduces indexing accuracy. If you want to avoid this, find and remove a piece of code like this:
    if big_npy.shape[0] > 2e5:
        big_npy = (
            MiniBatchKMeans(
                n_clusters=10000,
                verbose=True,
                batch_size=256 * cpu_count(),
                compute_labels=False,
                init="random",
            )
            .fit(big_npy)
            .cluster_centers_
        )
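
As a rough sanity check on the snippet above, here is a minimal sketch that estimates whether a dataset of a given length would hit the `2e5` row check, and what fraction of rows would remain as cluster centers. The threshold and cluster count come from the snippet; the rate of 50 feature frames per second is an assumption (it is not stated in this thread, and preprocessing overlap may change the real count), and the function names are made up for illustration. Under that assumption the check trips at roughly 67 minutes of audio, which would fit with problems only showing up past the one-hour mark.

```python
# Values taken from the snippet quoted above.
ROW_THRESHOLD = 200_000  # the `2e5` in `big_npy.shape[0] > 2e5`
N_CLUSTERS = 10_000      # `n_clusters=10000`

# ASSUMPTION (not from the thread): about 50 feature frames per second of audio.
ASSUMED_FRAMES_PER_SECOND = 50


def estimate_rows(minutes: float,
                  frames_per_second: int = ASSUMED_FRAMES_PER_SECOND) -> int:
    """Rough number of feature rows for a dataset of the given length."""
    return int(minutes * 60 * frames_per_second)


def reduction_applies(n_rows: int) -> bool:
    """Mirror the `big_npy.shape[0] > 2e5` check from the snippet."""
    return n_rows > ROW_THRESHOLD


def kept_fraction(n_rows: int) -> float:
    """Fraction of rows left as cluster centers (1.0 if no reduction runs)."""
    return N_CLUSTERS / n_rows if reduction_applies(n_rows) else 1.0


if __name__ == "__main__":
    for minutes in (30, 60, 90, 120):
        rows = estimate_rows(minutes)
        print(
            f"{minutes:>3} min: ~{rows} rows, "
            f"reduced={reduction_applies(rows)}, "
            f"kept={kept_fraction(rows):.1%}"
        )
```

This only counts rows; the actual index file size also depends on the feature dimension and the index type, so treat the numbers as an estimate rather than a prediction.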