#Nell [KLM4.4 x2 MRF] [MRF-HIFIGAN] [RMVPE 1001 Epochs]


devout vessel

This is a NELL voice model built to test the KLM 4.4 MRF pretrained model.

To test this model, Applio 3.2.8 or Codename Fork is recommended.

For the 4.4 model, high-frequency samples amplified in the BAD SAMPLE channel were included in the dataset to suppress mirroring within the model itself. These samples contain energy above the Nyquist frequency (half the sample rate), which causes aliasing; training on them steers the generator away from producing spectral content that does not exist in the inference target. The idea is similar to deliberately training on noisy data so that the model learns to remove noise from silent passages.
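A minimal sketch of the mirroring effect described above: a component above the Nyquist frequency cannot be represented at the given sample rate, so sampling folds it back to a lower "mirror" frequency. The 40 kHz rate in the usage check is an assumption for illustration only; the post does not state this model's sample rate.

```python
import math


def aliased_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz  # sampling cannot distinguish f from f + k * fs
    # Components in the upper half of [0, fs) fold back (mirror) below Nyquist.
    return fs_hz - f if f > fs_hz / 2 else f


def sampled_tone(f_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Sample cos(2 * pi * f * t) at the given rate."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


if __name__ == "__main__":
    fs = 40_000  # assumed sample rate (hypothetical for this example)
    f_in = 25_000  # above the 20 kHz Nyquist limit
    f_alias = aliased_frequency(f_in, fs)
    print(f_alias)  # 15000
    # The sampled 25 kHz tone is indistinguishable from a 15 kHz tone.
    a = sampled_tone(f_in, fs, 64)
    b = sampled_tone(f_alias, fs, 64)
    print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))  # True
```

This is why energy above Nyquist shows up as a spurious lower-frequency image rather than simply disappearing.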

However, because this can amplify real aliasing in some samples, the speakers were separated and trained on distinct channels to prevent high-frequency interference. Even with this method, some mirroring in the ultra-high frequency range overlaps with frequencies actually present in speech, so such cases cannot be fully avoided through dataset adjustments alone. Since this signal remains weak, it is not expected to cause significant problems in the generated audio, though results may vary with the voice of the fine-tuned model.
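One way to put a number on "the signal remains weak" is to measure what share of a clip's spectral energy lies above some cutoff. The sketch below uses a naive DFT so it has no dependencies; the 12 kHz cutoff, the 40 kHz sample rate, and the synthetic test clip are all assumptions for illustration, not values from the post.

```python
import math


def energy_fraction_above(samples: list[float], fs_hz: float, cutoff_hz: float) -> float:
    """Share of a clip's spectral energy at or above cutoff_hz.

    Uses a naive O(N^2) DFT over the one-sided spectrum (bins 0..N/2), so it
    is only suitable for short clips; a real check would use an FFT library.
    """
    n = len(samples)
    if n == 0:
        return 0.0
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        # Interior bins stand for a positive and a negative frequency, so count them twice.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        power = (re * re + im * im) * (1 if is_edge else 2)
        total += power
        if k * fs_hz / n >= cutoff_hz:  # bin k is centred at k * fs / N Hz
            high += power
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    fs = 40_000  # assumed sample rate (hypothetical)
    n = 400
    # Strong 1 kHz tone plus a weak 15 kHz "mirror" component (amplitude 0.1).
    clip = [
        math.cos(2 * math.pi * 1_000 * i / fs) + 0.1 * math.cos(2 * math.pi * 15_000 * i / fs)
        for i in range(n)
    ]
    print(round(energy_fraction_above(clip, fs, 12_000), 4))  # 0.0099
```

A component at one tenth of the main amplitude carries only about 1% of the energy, which is the sense in which a weak mirror image is unlikely to dominate the output.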

Additionally, for KLM 4.4, all vocal data was recorded in 3-second segments.
This means there are no cases where a singer's performance is abruptly cut off in the middle of a phrase. All voice actors have now been called back to re-record every song, and this process will continue to be updated as the KLM versions progress.
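As a rough illustration of why short recordings avoid mid-phrase cuts, the hypothetical helper below splits a mono clip into consecutive 3-second windows. It is not the KLM recording or preprocessing pipeline; it only shows that a phrase already recorded within one window comes back as a single, uncut segment, while a longer take gets chopped at fixed boundaries regardless of where phrases end.

```python
def split_into_segments(samples: list[float], fs_hz: int, seconds: float = 3.0) -> list[list[float]]:
    """Split a mono clip into consecutive, non-overlapping windows of `seconds`.

    The final window is kept even if it is shorter, so no audio is dropped.
    A clip that already fits in one window comes back as a single, uncut segment.
    """
    size = int(round(fs_hz * seconds))
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


if __name__ == "__main__":
    fs = 40_000  # assumed sample rate (hypothetical)
    ten_seconds = [0.0] * (10 * fs)
    print([len(s) / fs for s in split_into_segments(ten_seconds, fs)])  # [3.0, 3.0, 3.0, 1.0]
    short_phrase = [0.0] * int(2.5 * fs)
    print(len(split_into_segments(short_phrase, fs)))  # 1
```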

Dataset -
24 Mins of [conversational tone]
14 Mins of [recitative tone or a read-aloud style]
7 Mins of [agitated tone or an impassioned tone]
1 Min of [Shouting]
8 Mins of [Singing]
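The listed durations add up to 54 minutes in total. The snippet below simply tallies the list and each style's share; the style labels are shortened from the post.

```python
# Dataset composition as listed in the post (minutes per style).
DATASET_MINUTES = {
    "conversational": 24,
    "recitative / read-aloud": 14,
    "agitated / impassioned": 7,
    "shouting": 1,
    "singing": 8,
}


def composition(minutes: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return total minutes and each style's share of the total (percent)."""
    total = sum(minutes.values())
    if total == 0:
        raise ValueError("dataset is empty")
    shares = {style: 100.0 * m / total for style, m in minutes.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = composition(DATASET_MINUTES)
    print(f"total: {total} min")  # total: 54 min
    for style, pct in shares.items():
        print(f"{style}: {pct:.1f}%")
```

Conversational speech is the largest share (about 44%), while singing is about 15% of the data.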

Training -
Batch Size 4 [x2 GPUs]
Total Epochs 1001
Pretrained model KLM 4.4 exp x2 MRF
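Assuming the batch size of 4 is per GPU (as in data-parallel training, where each GPU processes its own batch), each optimizer step covers 8 samples. The segment count in the usage line is hypothetical: it assumes the 54-minute dataset is cut into 3-second slices, which the post does not state for this model, so the step totals are only an order-of-magnitude sketch.

```python
import math


def training_steps(n_segments: int, per_gpu_batch: int, n_gpus: int, epochs: int) -> tuple[int, int, int]:
    """Return (effective batch, steps per epoch, total steps) for data-parallel training.

    Assumes each GPU draws its own batch of `per_gpu_batch` and that the last
    partial batch of every epoch is kept rather than dropped.
    """
    effective_batch = per_gpu_batch * n_gpus
    steps_per_epoch = math.ceil(n_segments / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    # Hypothetical: 54 minutes cut into 3-second slices -> 1080 segments.
    n_segments = 54 * 60 // 3
    print(training_steps(n_segments, per_gpu_batch=4, n_gpus=2, epochs=1001))  # (8, 135, 135135)
```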

Model Link - https://huggingface.co/SeoulStreamingStation/RVC_Voice_Models/resolve/main/NELL_MRF.zip?download=true

grizzled plumeBOT
Model Ready ✅

This model has been synced with Weights and is ready to use for free!

grizzled plumeBOT
wild summit

How many epochs and how much data were used for this pretrain?

devout vessel
boreal torrent
devout vessel
Replying to boreal torrent: "For different voice tones/styles, do you put all of the samples in one folder or each..."

For the same person, all samples are placed in a single channel, but using separate channels for different purposes is also a viable approach. In particular, for people whose voices change completely with their emotions, splitting them into separate speakers and inferring with each one lets the model express the target voice in different emotions. In most cases, however, even when one person's voice is split across channels, all channels tend to convert the voice in the same way.

Adjusting the feature index value can create subtle differences, and in those cases separating speakers may not be necessary. When speaker channels are divided, however, they cannot reference each other's pitch, which has both advantages and disadvantages. The main advantage is that each channel's pitch balance stays consistent. If too much high-pitched data is mixed in, the model may generate sounds higher than the original voice, and if high-pitched song data makes up as much of the dataset as the speaking voice (1:1) or more, ordinary speech output may sound like a different person.
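The 1:1 warning above can be turned into a quick dataset check. This is a heuristic sketch: it uses duration as a stand-in for the amount of high-pitched data, and the function names and the default threshold of 1.0 are illustrative assumptions rather than a rule from any training tool.

```python
def singing_to_speech_ratio(singing_min: float, speech_min: float) -> float:
    """Singing duration divided by speech duration within one speaker channel."""
    if speech_min <= 0:
        raise ValueError("speech duration must be positive")
    return singing_min / speech_min


def speech_identity_at_risk(singing_min: float, speech_min: float, threshold: float = 1.0) -> bool:
    """Heuristic flag: True once singing reaches the 1:1 ratio the post warns about."""
    return singing_to_speech_ratio(singing_min, speech_min) >= threshold


if __name__ == "__main__":
    # NELL dataset from the post, treating every non-singing category as speech.
    speech = 24 + 14 + 7 + 1
    print(round(singing_to_speech_ratio(8, speech), 2), speech_identity_at_risk(8, speech))  # 0.17 False
    print(speech_identity_at_risk(30, 20))  # True
```

With roughly 8 minutes of singing against 46 minutes of speech, this dataset sits well below the ratio where speech output is said to drift.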

To address this, speech data can be placed in Channel 0 and song data in Channel 1, separating the speakers so that Channel 1 is used for generating cover songs and Channel 0 for standard speech inference. However, if Channel 0 contains no singing (vocal) data, high-pitched speech cannot be generated from that channel.
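A hypothetical sketch of the Channel 0 / Channel 1 split: training clips get a speaker ID by content type, and at inference the speaker ID is picked by task. The `Clip` type, file names, and task labels are illustrative only; how speaker IDs are actually assigned depends on the training tool.

```python
from dataclasses import dataclass

# Hypothetical channel assignment following the post: speech -> 0, songs -> 1.
SPEECH_CHANNEL = 0
SONG_CHANNEL = 1


@dataclass(frozen=True)
class Clip:
    path: str
    is_singing: bool


def assign_channel(clip: Clip) -> int:
    """Speaker ID that a training clip is placed in."""
    return SONG_CHANNEL if clip.is_singing else SPEECH_CHANNEL


def channel_for_task(task: str) -> int:
    """Speaker ID to infer with: covers use the song channel, everything else speech."""
    return SONG_CHANNEL if task == "cover" else SPEECH_CHANNEL


def group_by_channel(clips: list[Clip]) -> dict[int, list[str]]:
    """Group clip paths by the speaker ID they are assigned to."""
    groups: dict[int, list[str]] = {SPEECH_CHANNEL: [], SONG_CHANNEL: []}
    for clip in clips:
        groups[assign_channel(clip)].append(clip.path)
    return groups


if __name__ == "__main__":
    clips = [Clip("talk_001.wav", False), Clip("song_001.wav", True), Clip("read_001.wav", False)]
    print(group_by_channel(clips))  # {0: ['talk_001.wav', 'read_001.wav'], 1: ['song_001.wav']}
    print(channel_for_task("cover"), channel_for_task("speech"))  # 1 0
```

Note the trade-off the post describes: because the channels do not share pitch, Channel 0 here only learns the pitch range of the speech clips routed to it.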

Because a pretrained model is meant to serve many unknown downstream uses, the dataset configuration for its master channel and subchannels may differ slightly from that of a standard model. We are still running tests on this.

boreal torrent
devout vessel
waxen storm

So the dataset is 54 minutes?

devout vessel
waxen storm