#NPIM Test Model - Nell (without pre-trained Model)

graceful pumice
#

One of the questions many people have wondered about for a long time is, 'Is it possible to create a model without using a pre-trained model?' Recently, I haven't been able to focus much on this due to numerous modifications that needed to be made, but I can finally provide an answer.

As the name suggests, NPIM stands for Non-Pretrained Independent Model. This means that this model is created purely from the voice data of the model itself, without the use of any pre-trained models.

While it may not be efficient in many ways, it is possible if you have sufficient data. However, it requires a tremendous amount of time and data. Due to its inefficiency, using a pre-trained model is probably better for your mental well-being. Additionally, in most cases, lowering the index value allows the model to refer to data from the pre-trained model, enabling it to infer even when the model's own data is lacking. However, since this model is trained solely on its own dataset, in some cases, a higher index value may be required to reduce artifacts.

Because the dataset is entirely composed of Korean, artifacts may occur in some pronunciations that are difficult to produce in Korean.

**Info**
Dataset : 275 mins (4 h 35 min) [Normal / Angry / Sad / Shout / Sing]
Batch Size : 16
f0 method : RMVPE
Total Epochs : 4200+ Epochs / 185000+
Sample Rate : 48 kHz
VA : Chung Ah Han

Model Link - https://huggingface.co/SeoulStreamingStation/RVC_Voice_Models/resolve/main/Nell_Test_NPIM.zip?download=true
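As a rough sanity check on the numbers in the Info block, here is a minimal sketch of the arithmetic. It assumes that "185000+" counts optimizer steps and that every step consumes one full batch; neither is stated in the post, so the per-epoch figures are only estimates.

```python
# Figures copied from the Info block above.
dataset_minutes = 275
batch_size = 16
total_epochs = 4200
total_steps = 185_000  # ASSUMPTION: "185000+" is read as training steps

# 275 minutes expressed as hours and minutes.
hours, minutes = divmod(dataset_minutes, 60)

# Average number of steps in one epoch, and the number of training
# clips that implies if each step uses one full batch (assumption).
steps_per_epoch = total_steps / total_epochs
clips_per_epoch = steps_per_epoch * batch_size

print(f"dataset length: {hours} h {minutes} min")
print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"clips per epoch: {clips_per_epoch:.0f}")
```

Under these assumptions an epoch is only a few dozen steps long, which is one way a run can reach thousands of epochs on a dataset of this size.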

short ivy
#

for some reason the high ends look like static

graceful pumice
# short ivy for some reason the high ends look like static

When training a model without using any pre-trained model, a very strong mirrored spectrum appears, where the upper and lower parts are flipped symmetrically. As training progresses, this mirroring effect gradually decreases. However, the problem I've consistently encountered is that if there isn't enough data, the model tends to overfit before reaching the point where this effect disappears. Additionally, if you train the model with very clean and strong sounds at the early stages, high-frequency noise doesn't easily dissipate, and the strong sounds create severe harmonics due to the mirroring effect.

On the other hand, when training with voices that have a soft, mid-to-low tone and datasets that still retain some noise (ensuring all data that could influence the entire spectrum is preserved), the inverted mirroring effect tends to resolve more quickly. Moreover, as the amount of data increases (not just in volume, but in variety), the model can continue training safely without falling into an overfitting state until the mirroring effect is fully eliminated. Therefore, I believe it's crucial to carefully calculate the batch size and the amount of data. Of course, since it's difficult for me to do this precisely, I've been manually adding and removing data to test the results. Fortunately, the models currently in training (KLM4.2 48 kHz and 32 kHz) show almost no upper-spectrum mirroring patterns.

#

If that mirroring pattern isn't removed from the pre-trained model, it will remain permanently in regular models trained on top of it. If you fine-tune a model with that pattern still present, the inverted pattern will continue to appear throughout the model's output.
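One way to put a number on the mirroring described above is to compare the lower half of a spectrogram with the flipped upper half. The sketch below is only an illustration, not the method used in this thread: `mirror_score` is a hypothetical metric, and it assumes the mirror axis sits in the middle of the analyzed band (a quarter of the sample rate). The test signals are synthetic tones, not model output.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=1024, hop=256):
    """Return |STFT| with shape (n_fft // 2 + 1, n_frames), Hann-windowed."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def mirror_score(signal, n_fft=1024, hop=256):
    """Hypothetical metric: Pearson correlation between the lower half of the
    log-magnitude spectrogram and the upper half flipped along frequency."""
    spec = np.log1p(magnitude_spectrogram(signal, n_fft, hop))
    half = spec.shape[0] // 2
    lower = spec[:half]                 # low-frequency bins, ascending
    upper_flipped = spec[-half:][::-1]  # high-frequency bins, descending
    return float(np.corrcoef(lower.ravel(), upper_flipped.ravel())[0, 1])

# Synthetic demo: a 3 kHz tone with a faint noise floor, and the same tone
# plus its mirror image at (Nyquist - 3 kHz) = 21 kHz.
sr = 48_000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 3_000 * t) + 0.001 * rng.standard_normal(sr)
mirrored = clean + np.sin(2 * np.pi * 21_000 * t)

clean_score = mirror_score(clean)
mirrored_score = mirror_score(mirrored)
print(f"clean: {clean_score:.3f}  mirrored: {mirrored_score:.3f}")
```

A score near 1 suggests the upper band is largely a flipped copy of the lower band, while a score near 0 suggests no such pattern. Tracking a value like this across checkpoints could show whether the effect is fading before overfitting sets in.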

short ivy
#

by sync i mean, do you have tensorboard logging per epoch instead of just having it log every 200 steps?

graceful pumice
#

yes, I did. actually applio did for me. 🙂

short ivy
#

just to make sure

graceful pumice
#

oh I'll show you later if you don't mind. I can't stop right now because I have a model currently in training

#

I'll send you DM. 🙂

short ivy
#

on tensorboard

cedar egret
#

I think it's cool a model can be trained off of purely the audio

restive ruin
#

4200 epochs for a 275 min data? impossible.

#

unless its how it works

#

also one of my deepest thoughts has been answered thanks to dis haha

civic crane
#

how do you do training without any pretrained, like OG or custom, in mainline RVC? @graceful pumice

restive ruin
#

hmmm

#

if its possible, if someone can, test inferencing a model with index ratio to 2

#

probably the model will have convulsions or idk xD

short ivy
#

Well, if you use 1 you are relying 100% on the dataset rather than the input

#

Can't go over that

restive ruin
#

So it isn't possible to use negative index or more than 1
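The 0-to-1 range discussed above makes sense if the index ratio is a linear blend weight. The sketch below assumes exactly that: the ratio mixes features retrieved from the model's index (built from the training dataset) with features taken from the input audio. `blend_features` is a hypothetical helper for illustration, and rejecting out-of-range values is a choice made here, not confirmed behavior of any particular RVC front end.

```python
import numpy as np

def blend_features(input_feats, retrieved_feats, index_ratio):
    """Linearly blend input features with index-retrieved features.

    index_ratio = 0.0 keeps only the input features;
    index_ratio = 1.0 keeps only the retrieved (dataset) features.
    Values outside [0, 1] would extrapolate past either source,
    so this sketch rejects them.
    """
    if not 0.0 <= index_ratio <= 1.0:
        raise ValueError("index_ratio must be within [0, 1]")
    return index_ratio * retrieved_feats + (1.0 - index_ratio) * input_feats

# Toy feature vectors, purely illustrative.
input_feats = np.array([0.0, 2.0, 4.0])
retrieved_feats = np.array([1.0, 1.0, 1.0])

only_input = blend_features(input_feats, retrieved_feats, 0.0)
only_dataset = blend_features(input_feats, retrieved_feats, 1.0)
halfway = blend_features(input_feats, retrieved_feats, 0.5)
print(only_input, only_dataset, halfway)
```

With this formula a ratio of 2 would compute `2 * retrieved - input`, which is no longer a mix of the two sources. That is one reason a blend weight is normally kept between 0 and 1.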
