KLM6 is a pretrained model built on the new SPIN embedder. It is currently experimental and intended for testing by developers.
The SPIN embedder handles non-verbal sounds such as breathing well. KLM6 can learn and infer audio such as coughing, sneezing, laughter, and ultra-high-pitched sounds, but only if the training dataset contains enough non-verbal audio.
KLM 6 Exp V3 - L608
Last Update - 2025 June 9
Total Data - 86 Hours
Total Speakers - 626

Train info
F0 ext. : RMVPE
Opti. : AdamW
Embedder : Spin (7-12)
G : 1.67M Steps
D : 1.58M Steps
Multi-scale MEL Loss function

The Exp V3 L603 model is a fully retrained model that uses the AdamW optimizer and the HiFi-GAN architecture. It was trained with the SPIN (7-12) embedder, so the same embedder is required for compatibility.
You must use the Spin (7-12) embedder model with this pretrained model. Otherwise, the generated output might sound like the scream of a hippo with water in its nose.
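
The train info lists a multi-scale MEL loss without further detail. As a rough illustration of what such a loss generally computes, the sketch below compares log-mel spectrograms of a real and a generated waveform at several STFT resolutions and averages the L1 distances. The scale settings, the 32000 Hz default, and all function names are assumptions made up for this example, not the settings used to train KLM6. It uses NumPy for readability; an actual training loop would need a differentiable implementation.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel(wave, sr, n_fft, hop, n_mels):
    """Log-mel spectrogram of a 1-D waveform, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    padded = np.pad(wave, n_fft // 2, mode="reflect")
    n_frames = 1 + (padded.size - n_fft) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.clip(mel, 1e-5, None))


# Assumed (n_fft, hop, n_mels) triples; the real training scales are not documented.
ASSUMED_SCALES = ((512, 128, 40), (1024, 256, 80), (2048, 512, 128))


def multi_scale_mel_loss(real, fake, sr=32000, scales=ASSUMED_SCALES):
    """Mean L1 distance between log-mel spectrograms, averaged over all scales."""
    per_scale = [
        np.mean(
            np.abs(
                log_mel(real, sr, n_fft, hop, n_mels)
                - log_mel(fake, sr, n_fft, hop, n_mels)
            )
        )
        for n_fft, hop, n_mels in scales
    ]
    return float(np.mean(per_scale))
```

Calling `multi_scale_mel_loss(real, fake)` on two equal-length waveforms returns a single number that is zero for identical inputs and grows as the spectrograms diverge. Using several resolutions is the usual motivation for this kind of loss: short windows capture timing detail, long windows capture pitch detail.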
KLM 6.2 - 32 kHz
G Link -
https://huggingface.co/SeoulStreamingStation/KLM6_Experimental/resolve/main/G_KLM6_Exp3_L6_32k.pth?download=true
NEW version of Spin Embedder by dr87
https://huggingface.co/dr87/spin-for-rvc/resolve/main/spin_layers_7_12.zip
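
For anyone who prefers to script the downloads, here is a minimal sketch that fetches both links above using only the Python standard library. The destination folder name and the helper names are hypothetical; where the files must be placed depends on the training tool you use, which this page does not specify.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit
from urllib.request import urlretrieve

# URLs copied from this page.
G_URL = (
    "https://huggingface.co/SeoulStreamingStation/KLM6_Experimental/"
    "resolve/main/G_KLM6_Exp3_L6_32k.pth?download=true"
)
SPIN_URL = "https://huggingface.co/dr87/spin-for-rvc/resolve/main/spin_layers_7_12.zip"


def filename_from_url(url):
    """Return the file name at the end of the URL path, ignoring the query string."""
    return PurePosixPath(urlsplit(url).path).name


def download(url, dest_dir="klm6_downloads"):  # hypothetical folder name
    """Download url into dest_dir unless the file is already there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename_from_url(url)
    if not target.exists():
        urlretrieve(url, str(target))
    return target
```

Running `download(G_URL)` and `download(SPIN_URL)` saves `G_KLM6_Exp3_L6_32k.pth` and `spin_layers_7_12.zip` into the chosen folder; the zip still has to be extracted before use.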
I’ve already set up several different training runs in various ways, so once I recover, I’ll start sharing each model. Even if some issues aren’t fully resolved yet, the goal is to keep improving and building models that we can continue to use over time.



