**Please read before using KLM5**
KLM5 Uses a Pure Human Voice Dataset and Contains a Large Amount of Vocal Data
KLM5 is trained on a dataset composed of purely human voices, including a large amount of vocal data. If your own dataset contains voices mixed with electronic sounds or special effects, we cannot guarantee the quality of the result.
Combining KLM5 with RefineGAN makes it possible to learn ultra-high-frequency sounds that were difficult to capture with conventional HiFi-GAN. The model can learn very sharp, high-pitched sounds such as coughing, sneezing, and female screams, and it can train on and reproduce high notes up to A6.
Even if your dataset does not contain high notes, the model itself includes data from the high-frequency range, enabling it to infer high-frequency sounds even from general speech recordings.
However, such high-pitch inference must be physically possible.
In general, a model trained on ASMR or whisper-like voices that contain a lot of breath cannot infer high-pitched sounds. Even people with very breathy voices switch to modal voice (chest voice) when screaming or singing high notes, so if a dataset consists only of breath-heavy voices, high-pitch inference becomes impossible.
Likewise, a model trained on a tough-sounding character voiced with a growl cannot infer high-pitched sounds. A growled voice consists of rapidly interrupted waveforms; when the pitch is raised, these waveform segments become too short, and the sound turns unnatural or inaudible. This is why growling techniques cannot be used to sing songs spanning three to four octaves.
In summary, most high-pitched inferences are possible as long as the sounds do not violate physical principles.
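To put the "up to A6" figure in concrete terms, the sketch below converts a note name to its fundamental frequency and counts how many harmonics of that note fit under half the sample rate for the two published models. It assumes equal temperament with A4 = 440 Hz, which the model card does not state, and the helper names `note_to_hz` and `harmonics_below` are made up for this illustration.

```python
# Equal-temperament pitch arithmetic.
# Assumption: A4 = 440 Hz (the model card does not name a tuning reference).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note: str, a4_hz: float = 440.0) -> float:
    """Convert a note name such as 'A6' or 'C#4' to its fundamental in Hz."""
    name, octave = note[:-1], int(note[-1])
    semitone = NOTE_OFFSETS[name[0].upper()]
    if name[1:] == "#":
        semitone += 1
    elif name[1:] == "b":
        semitone -= 1
    midi = 12 * (octave + 1) + semitone  # MIDI convention: C4 = 60, A4 = 69
    return a4_hz * 2 ** ((midi - 69) / 12)


def harmonics_below(f0_hz: float, limit_hz: float) -> int:
    """Count harmonics (fundamental included) at or below limit_hz."""
    return int(limit_hz // f0_hz)


if __name__ == "__main__":
    a6 = note_to_hz("A6")
    print(f"A6 fundamental: {a6:.1f} Hz")
    for sample_rate in (32000, 40000):
        count = harmonics_below(a6, sample_rate / 2)
        print(f"{sample_rate} Hz model: {count} harmonics of A6 below Nyquist")
```

Under these assumptions A6 sits at 1760 Hz, leaving room for 9 harmonics at 32 kHz and 11 at 40 kHz. This is only frequency bookkeeping; it says nothing about whether a particular voice type can physically produce the note.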
Guidelines for Generating Cover Songs Using KLM5
When generating cover songs with KLM5, please consider the following points:
The dataset must be very clean.
Do not use datasets containing reverb, delay, or harmonic residues. Since KLM5 generates purely human voices intended for final production through mixing and post-processing, we recommend using raw, clean recordings that require minimal cleanup.
There may be frequency limitations depending on the model (32kHz vs. 40kHz).
KLM5 enhances high frequencies by boosting weak ultra-high-frequency waveforms. However, if a boosted waveform exceeds the frequency limit imposed by the sample rate, the sound may become quieter or disappear.
Additionally, aliasing may occur in the ultra-high-frequency range. This is a natural phenomenon and not a cause for concern. (The inverted waveform generated in this process is not a mirrored signal.)
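For readers unfamiliar with the sample rate limit, here is a minimal sketch of the textbook arithmetic: the highest representable frequency is half the sample rate (the Nyquist frequency), and a pure tone above it folds back to a lower frequency when sampled without band-limiting. This illustrates general sampling theory only; it does not model what KLM5 or RefineGAN does internally, and the function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def aliased_hz(freq_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling with no band-limiting.

    Components at or below Nyquist are unchanged; components above it
    fold back into the range 0..Nyquist.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    tone = 18000  # Hz, an arbitrary example value
    for sample_rate in (32000, 40000):
        print(
            f"{sample_rate} Hz: Nyquist {nyquist_hz(sample_rate):.0f} Hz, "
            f"{tone} Hz tone appears at {aliased_hz(tone, sample_rate):.0f} Hz"
        )
```

In this example an 18 kHz component fits inside the 40 kHz model's range but lies above the 16 kHz limit of the 32 kHz model, which is the kind of difference the 32kHz vs. 40kHz note refers to.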
**Dataset & Training**
800+ Hours
Applio RefineGAN
FP32
Batch Size per GPU : 32
Emb. Model : RMVPE
Total Loops : 650
Total Steps : 4.1 M
Total Speakers : 101
Gender Ratio : M 42 F 58
Vocal Ratio : 5~8%
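As a rough reading of the figures above, the following back-of-envelope sketch derives two numbers from them. It rests on assumptions the card does not confirm: that "Total Loops" means full passes over the dataset, that "Vocal Ratio" is a share of total hours, and that 800 hours is the lower bound implied by "800+". The function names are hypothetical.

```python
def steps_per_loop(total_steps: int, total_loops: int) -> float:
    """Average optimizer steps per loop (assumes one loop = one dataset pass)."""
    return total_steps / total_loops


def vocal_hours(total_hours: float, low_pct: int, high_pct: int):
    """Hours of vocal data implied by a percentage range of the total."""
    return (total_hours * low_pct / 100, total_hours * high_pct / 100)


if __name__ == "__main__":
    # Values copied from the list above; 800 is a lower bound ("800+ Hours").
    print(f"about {steps_per_loop(4_100_000, 650):.0f} steps per loop")
    low, high = vocal_hours(800, 5, 8)
    print(f"at least {low:.0f} to {high:.0f} hours of vocal data")
```

Under these assumptions the run averages roughly 6,300 steps per loop, and the vocal portion amounts to at least 40 to 64 hours.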
The song included with the accompanying MR was created specifically to test KLM, and I hold all copyrights to it.
Vocal Model - NELL xe6 (RefineGAN)
32kHz
G -
https://huggingface.co/SeoulStreamingStation/KLM5/resolve/main/G_KLM50_RFG_32Khz.pth?download=true
D -
https://huggingface.co/SeoulStreamingStation/KLM5/resolve/main/D_KLM50_RFG_32Khz.pth?download=true
40kHz
G -
https://huggingface.co/SeoulStreamingStation/KLM5/resolve/main/G_KLM50_RFG_40Khz.pth?download=true
D -
https://huggingface.co/SeoulStreamingStation/KLM5/resolve/main/D_KLM50_RFG_40Khz.pth?download=true
44.1kHz
G -
D -
48kHz
G -
D -
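The links above follow one naming pattern, so fetching a generator/discriminator pair can be scripted. The sketch below rebuilds the URLs from that pattern and downloads them with the Python standard library. The helper names and the `pretrained` destination folder are hypothetical, and only the 32Khz and 40Khz files are handled because no links are listed for 44.1 kHz or 48 kHz.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://huggingface.co/SeoulStreamingStation/KLM5/resolve/main"
PUBLISHED_RATES = ("32Khz", "40Khz")  # only these have links on this page


def checkpoint_url(kind: str, rate: str) -> str:
    """Build the download URL for a generator ('G') or discriminator ('D')."""
    if kind not in ("G", "D"):
        raise ValueError(f"kind must be 'G' or 'D', got {kind!r}")
    if rate not in PUBLISHED_RATES:
        raise ValueError(f"no published checkpoint for rate {rate!r}")
    return f"{BASE_URL}/{kind}_KLM50_RFG_{rate}.pth?download=true"


def download_pair(rate: str, dest_dir: str = "pretrained") -> list:
    """Download the G and D checkpoints for one sample rate into dest_dir."""
    dest = Path(dest_dir)  # hypothetical folder name; adjust to your setup
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for kind in ("G", "D"):
        target = dest / f"{kind}_KLM50_RFG_{rate}.pth"
        urlretrieve(checkpoint_url(kind, rate), target)
        paths.append(target)
    return paths


if __name__ == "__main__":
    # Print the URLs only; call download_pair("32Khz") to actually fetch them.
    for rate in PUBLISHED_RATES:
        for kind in ("G", "D"):
            print(checkpoint_url(kind, rate))
```

Because the files keep the same names across updates, re-running `download_pair` overwrites any earlier copy in the destination folder.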
KLM5 is updated under the same name each time training progresses; no separate version number is assigned.







