#KLM 4 HIFIGAN (Final Version of HIFIGAN Model)

restive arrow
#

Since KLM7S

For the past year and a half—a considerable amount of time—I have been fortunate to receive immense support from the AI HUB administrators, Devs, mods, users, and many members of both the Korean and Japanese communities. Thanks to the help of various studio engineers and voice actors, I have thoroughly enjoyed immersing myself in this work.

This will be the final version of the KLM series utilizing HiFi-GAN, and moving forward, all resources will be dedicated to training with RefineGAN.

KLM-HFG closely follows the structure and training methods of the OG model but also includes high-pitched singing data. Users currently utilizing the OG pre-trained model should be able to achieve satisfying results with this version as well.

KLM-HFG employs the same training methodology as VTCK and also includes VTCK data. While most KLM series models have been trained with perfectly denoised data, KLM-HFG incorporates all noise into the model.

This pre-trained model is capable of generating vocals in nearly all languages and has been trained on the same dataset size as KLM 5.

Important Considerations Before Use
One crucial point to note is that all high-pitch data in KLM has been sourced from real human recordings. This means that if your trained model is based on a normal human voice, you will be able to infer high notes even without dedicated singing data.

However, if the character you are training has an ASMR-style voice, a rough vocal texture, or any physically unnatural sound, high-pitch inference will not be possible.

Whispering sounds are produced by adjusting air pressure using the tongue’s position and the distance between the soft palate and the mouth, rather than engaging the vocal cords. This means that whispers cannot be turned into high-pitched vocals.
Techniques such as growling in rock vocals rely on intermittently breaking up waveforms while producing sound. Since high-pitched sounds have shorter wavelengths, they cannot be generated through this vocalization method.
In short, no matter how diverse the vocal techniques and singing styles in the dataset are, physically impossible sounds cannot be learned. If the model you are attempting to create is based on SFX-generated voices, mechanical sounds, or whispers, you should be aware that high-pitch inference will not be possible.

32 kHz
G: https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main/G_KLM_HFG_32k.pth?download=true
D: https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main/D_KLM_HFG_32k.pth?download=true

40 kHz
G: https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main/G_KLM_HFG_40k.pth?download=true
D: https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main/D_KLM_HFG_40k.pth?download=true

48 kHz
G: https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main/G_KLM_HFG_48k.pth?download=true
D: https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main/D_KLM_HFG_48k.pth?download=true
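
For anyone who prefers to script the download, here is a minimal sketch that builds the links above from a sample-rate tag and fetches both files. The base URL and file names are taken from the links in this post. The helper names (`pretrain_urls`, `download_pretrain`) and the `pretrained` output folder are assumptions made for illustration, so point the files at whatever folder your fork expects.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL copied from the links in this post; the helpers below are illustrative.
BASE_URL = "https://huggingface.co/SeoulStreamingStation/KLM49_HFG/resolve/main"
SAMPLE_RATES = ("32k", "40k", "48k")


def pretrain_urls(sample_rate: str) -> dict:
    """Return the G and D download URLs for one sample-rate tag."""
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")
    return {
        kind: f"{BASE_URL}/{kind}_KLM_HFG_{sample_rate}.pth?download=true"
        for kind in ("G", "D")
    }


def download_pretrain(sample_rate: str, dest_dir: str = "pretrained") -> list:
    """Download both files into dest_dir (a hypothetical folder name)."""
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for kind, url in pretrain_urls(sample_rate).items():
        target = out_dir / f"{kind}_KLM_HFG_{sample_rate}.pth"
        urlretrieve(url, str(target))  # network access happens here
        saved.append(target)
    return saved
```

Calling `download_pretrain("40k")` would save `G_KLM_HFG_40k.pth` and `D_KLM_HFG_40k.pth` into the chosen folder. Pick the tag that matches the sample rate you train at.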

tame thunder
#

i'm cheering!!!

#

will def try out soon

#

btw, will a model trained with this pretrain still reach above C6 even when its dataset is just monotone speech?

#

kind of like klm 4.2

restive arrow
# tame thunder kind of like klm 4.2

It may not be possible since it wasn’t blindly designed to reproduce high frequencies like 4.2, but if you need that level of pitch range, it's better to use the RefineGAN version.

tame thunder
#

got it

weak wyvern
#

@restive arrow is it suitable for older colabs/forks?

frigid galleon
#

Back to this model since MRF & RefineGAN were removed 👍

ocean flicker
#

pretty solid pretrain, seems to handle noise better than the previous versions, but it looks like it can't handle random sounds (clicks, mic bumps) as well as the original pretrain

in my testing it got lower average loss values than the original pretrain

blue line = klm

quality-wise, the results are cleaner than with the original pretrain

#

this is my new default pretrain, honestly it always works well
all of the problems of past versions are fixed in this version, amazing for every dataset I've trained

frail hatch
#

Best new default pretrain for my Indo dataset, handles breaths and pronunciation better than the OG pretrain

cyan viper
#

i can't hear the difference on my phone 😭

#

o

cyan viper
#

now i can hear the difference

#

that's weird

#

ah

#

yeas