#KLM 6.1 (Experimental V3 L608)


chilly creek
#

KLM6 is a pretrained model utilizing the new SPIN embedder. It’s currently an experimental model designed for testing by developers.

The SPIN embedder offers powerful capabilities for non-verbal sounds, excelling at handling sounds like breathing. KLM6 is capable of learning and inferring audio such as coughing, sneezing, laughter, and ultra-high-pitched sounds, but it requires a dataset with sufficient non-verbal audio.

KLM 6 Exp V3 - L608
Last Update - 2025 June 9
Total Data - 86 Hours
Total Speakers - 626

Train info
F0 ext. : RMVPE
Opti. : AdamW
Embedder : Spin (7-12)

G : 1.67M Steps
D : 1.58M Steps
Multi-scale MEL Loss function

The Exp V3 L608 model is fully retrained with the AdamW optimizer and the HiFi-GAN architecture. It uses the SPIN (7–12) embedder, so you must use that same embedder for compatibility.

You must use the SPIN (7-12) embedder model with this pretrained model. Otherwise, the generated output might sound like the scream of a hippo with water in its nose.

KLM 6.2 - 32 kHz
G Link - https://huggingface.co/SeoulStreamingStation/KLM6_Experimental/resolve/main/G_KLM6_Exp3_L6_32k.pth?download=true

D Link - https://huggingface.co/SeoulStreamingStation/KLM6_Experimental/resolve/main/D_KLM6_Exp3_L6_32k.pth?download=true

NEW version of Spin Embedder by dr87
https://huggingface.co/dr87/spin-for-rvc/resolve/main/spin_layers_7_12.zip

lofty grove
#

wrong source for the embedder model

#

pick the config and pytorch model over here

chilly creek
dawn gulch
#

yay new klm model

pastel frigate
#

Note for those that'll want to use my fork for upcoming pretrains;

  1. All versions support spin as " custom embedder "

edit: 25.04.2025
2. If you want native ( ui ) spin support + auto-download of the embedder, you should get the V3.1.1 version ( and higher ~ in future ).
https://github.com/codename0og/codename-rvc-fork-3/releases/tag/v3.1.1

GitHub

Tiny update:

Added support for multiple base URLs and dynamic file downloads

Notes:

The Spin embedder can now be used thanks to new experimental " KLM 6 " ( and soon ver 6.1 ) pretrai...

keen kettle
pastel frigate
# chilly creek

Sounds amazing ngl.
Now can't wait for ranger one ( as long you still have it in consideration, that is

#

Gonna test it out later for sure

vivid elm
#

Making a model with this pretrain right now!
Just to confirm, is the Spin model used the one trained on additional LibriSpeech data by dr87?

pastel frigate
#

just Mirrored by Noobies

river coral
#

amazing❤️
now we need realtime support for spin embedder

stiff breach
#

I need to try higgs with this new embedder or Tv Off

keen kettle
#

hi seoul, have you found a way to fix this pretrain yet, or is it still being investigated?

chilly creek
keen kettle
chilly creek
#

KLM 6 & 6.1- Experimental ver2

lofty grove
#

Reminder that the new spin embedder requires an updated mute file if you're going to use it. Extract mute_spin.zip into logs; that creates a new folder named mute_spin

#

After that, extract features using the new spin embedder. Before you start training, edit the filelist.txt in the model's folder and replace /mute/ with /mute_spin/

strange crag
lofty grove
#

the result should look like G:\Applio\logs\mute_spin\sliced_audios\mute40000.wav|G:\Applio\logs\mute_spin\extracted\mute.npy|G:\Applio\logs\mute_spin\f0\mute.wav.npy|G:\Applio\logs\mute_spin\f0_voiced\mute.wav.npy|0
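The filelist edit described above can also be scripted. This is a minimal sketch, not part of Applio or any fork: the function name `switch_mute_folder` and its parameters are made up for illustration, and it assumes the mute entries appear as a path segment named `mute`, with either `/` or `\` as the separator (the example line above uses Windows backslashes).

```python
from pathlib import Path


def switch_mute_folder(filelist_path: str, old: str = "mute", new: str = "mute_spin") -> int:
    """Point mute entries in filelist.txt at the new mute folder.

    Returns the number of lines that were changed.
    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    path = Path(filelist_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    changed = 0
    out = []
    for line in lines:
        updated = line
        # Handle both POSIX and Windows separators, since filelists differ per OS.
        for sep in ("/", "\\"):
            updated = updated.replace(f"{sep}{old}{sep}", f"{sep}{new}{sep}")
        if updated != line:
            changed += 1
        out.append(updated)
    path.write_text("\n".join(out) + ("\n" if out else ""), encoding="utf-8")
    return changed
```

Call it as `switch_mute_folder(r"G:\Applio\logs\<your model>\filelist.txt")` after feature extraction and before training. Running it twice is harmless, because `\mute_spin\` no longer matches `\mute\`. Keep a backup of filelist.txt, since the file is rewritten in place.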

strange crag
#

or are u guys still testing

lofty grove
#

testing, but I did not see any bleeding with my model using the previous spin version

chilly creek
strange crag
#

ik this is a bit stupid to ask
but
is this compatible with realtime?

lofty grove
#

not yet

strange crag
#

ok

lofty grove
#

it does not look like there's any way to use a custom embedder, at least in the fork

strange crag
lofty grove
#

I know

strange crag
#

@chilly creek so how im supposed to use this in applio?
there's no g and d files
im confused asf lol

lofty grove
lofty grove
#

i'm running klm4.9 retraining, 8 epochs done, 12 more to go

#

then I'll test the finetuning

#

Seoul said he needs a few days to get some epochs done with the updated spin

strange crag
#

nice
ok time to wait

lofty grove
#

technically you can get the previous pretrain and do a finetune using updated spin

#

it is close enough

strange crag
#

i suppose is this one

quaint tide
#

Is this pretrain available? I don't see the download links

strange crag
quaint tide
#

Oh ok

chilly creek
#

Experimental Model Ver2 has been uploaded. Please note that this model is still experimental and may encounter various issues.

chilly creek
pastel frigate
#

I see

lucid night
pastel frigate
#

fucking up silence, in a short
afaik, that's all there is to it

#

Unless Noobies found out more on it

lofty grove
#

not specifically silence.. but if you include silence files produced by cvec the model would learn that specific thing means silence

#

and if you include spin files as well, then it would learn that that one also means silence

#

but if you dont, then the model would not know how to deal with silence encoded by spin when you infer something

pastel frigate
#

hmm, not sure if there's point in it anyways

#

cvec on spin and the other way, sounds like gagging demon

#

so ¯_(ツ)_/¯

lofty grove
#

except you can hear a faint inference behind all that that sounds somewhat right

quaint tide
#

Is the Spin embedder worth it

quaint tide
#

It's really good. It sometimes mispronounces words, but it does much better than contentvec in my opinion. Hope to see a full version of this pretrain in the future

pastel frigate
#

noticed any timbre leakage?

#

for instance, input audio leaking into ur model's voice

#

or such thingies? ( perhaps you could try to crank up index to the full and test?

lofty grove
#

that's not a thing with May 1 spin

#

no leaking there

pastel frigate
#

oh, legit?

lofty grove
#

yeah, it was fine

pastel frigate
#

huh, will have to test it then

#

cause haven't yet

quaint tide
#

I also like your fork pretty good

pastel frigate
#

thx for info

quaint tide
#

Hopefully you can add an update.bat soon so you dont have to redownload it

pastel frigate
#

I could think of it ye
thx for feedback
( in any case, for more requests or such, hit me up in ai testing channel

quaint tide
#

Np, I like the customization, pretty cool. The stop training thing gets stuck on "stop training" after like 3 uses, hopefully you can fix that, but overall a very good fork

#

Is the Spin embedder just contentvec trained with more data that involves different sounds?

pastel frigate
#

@ dr87 perhaps could tell you more technical details if you asked
he's the main lead behind it

quaint tide
#

Thought so, because it does tend to say words better than contentvec too. Seems to be a little more context aware

pastel frigate
#

also, I think you could actually try to get access to ai testing channel
if you're curious of what we do at given time and what we exp with
( tho important notice, it is a channel more towards dev than answering questions ~ just in case

quaint tide
#

Then I probably couldn't because I wouldn't be able to understand it but hopefully I can just use experimental stuff and give feedback

pastel frigate
quaint tide
#

Alright

quaint tide
prime pasture
#

sounds kinda like a 11labs voice

pastel frigate
#

spin one is ringy / crunchy

#

" and a garden " @vocal geyser @lofty grove @quaint tide
That kind of artifacts I meant exactly

#

@fallen yarrow And as for you, wait for answers here.

lofty grove
#

I think the v2 pretrain uses spin from may 1st, so 7-12 layer

pastel frigate
#

Alright, thank you

quaint tide
pastel frigate
#

wanna share us a copy?

lofty grove
#

well, it was not a navy seal copypasta

quaint tide
pastel frigate
#

oh hmmm...

#

well, then Imma do some tests on my own later

#

to confirm whether it still exists or not

quaint tide
#

Hopefully it was a dataset thing

lofty grove
#

it is good to check pronunciation

lofty grove
#

ty

fallen yarrow
#

so idk if i did something wrong, but for me it isn't working great. i trained many models with the new embedder but i can't make it work. i use 54 min of dataset, both models are 24k steps, and for me contentvec is still better in inference and realtime voice changer. i tried using noobies' spin and also dr87's spin layers 7-12. i did replace the mute file (i'm using the codename fork). i also tried without replacing it, but it sounds weird on the spin

#

btw i used klm 4 pretrain on contentvec and klm6exp2 on the spin

peak jetty
chilly creek
#

KLM 6 & 6.1- Experimental ver3

chilly creek
#

Experimental V3 grad_Avg_norm G/D

#

Loss_Value

pastel frigate
#

Discriminator's loss shouldn't go that way

#

should be closer to adversarial than 1 sided
( careful of the region past around ~1m steps, those yellow zones are quite fishy to me

#

Also, Seoul, due to how Noobies' logging works, you have to be testing ur " estimated " good point + surrounding regions, +/- 1-5 epochs each side
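A rough sketch of the "estimated good point plus surrounding regions" idea: list the checkpoints to audition around the epoch you picked from the graphs. The function name `candidate_epochs` and its parameters are hypothetical, and it assumes a checkpoint exists for every epoch in the range.

```python
from typing import List, Optional


def candidate_epochs(estimate: int, radius: int = 5,
                     last_epoch: Optional[int] = None) -> List[int]:
    """Return the estimated epoch plus its neighbours within `radius`.

    The range is clipped to epoch 1 at the low end and to `last_epoch`
    (if given) at the high end.
    """
    lo = max(1, estimate - radius)
    hi = estimate + radius
    if last_epoch is not None:
        hi = min(hi, last_epoch)
    return list(range(lo, hi + 1))
```

For example, `candidate_epochs(100, radius=2)` gives epochs 98 through 102 to test by ear.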

#

think my avg ( at least for g total ) should be included as a more direct logging measure

chilly creek
#

Try fine tuning it with your dataset. 🙂

pastel frigate
#

badly picked G/D can result in suboptimal finetunes afterall

#

but this should be a common sense, just as careful picking of the epoch, right.

#

🙂

#

esp when you consider gradient hygiene
( sure, it's bigvgan but you get the idea. such things are important. )

And I might have some insights on klm 4.9, but that only when I'm done with tests.
( I might have discovered something by accident. More on that when I'm done with the next 20-30 models )

chilly creek
# pastel frigate Yeah I will when done with tests but it nevertheless is still something that one...

Hmm… yeah, that's true. maybe I’ve just been too hasty.
well, to be honest, at this point, I don't think there's a clear way to prevent divergence when multiple speakers and a significant amount of non verbal data are involved.
even after adjusting the dataset ratio multiple times, G, D, or both eventually end up diverging.
for now, I’ve rolled back to exp2 and will wait a bit until a better solution comes up.
(I think I need to test RefineGAN a bit more.)

#

KLM 6 & 6.1- Experimental ver2

pastel frigate
#

all I say is that being careful and scrupulous when monitoring and picking ckpts is quite important imho

#

So, all in all, let users test it and provide feedback
( Like I said, I will too but I have tons of finetuning tests to do rn for some other experiment I do

pastel frigate
#

Pretty much, if you do have " more or less " predetermined how many epochs or steps on avg it takes you to get good or perfect ( by your ears standards ) results
you can try ranger21 ( stock pure 21 )
why?
Because it uses AdamW

#

In fact, you can turn literally everything off and you'll have AdamW + automatic warmup calculated for you, based on amount of epochs you set

#

Or perhaps using my fork with AdamW is also an option, as it has a warmup built in but you have to determine it on your own ( most sources recommend the warmup duration as: 10% of your total epochs
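To make the 10% rule of thumb concrete, here is a minimal sketch of picking the warmup length and a linear ramp. The linear shape and the names `warmup_epochs` and `lr_at_epoch` are assumptions for illustration; the fork's actual warmup curve is not specified here.

```python
def warmup_epochs(total_epochs: int, fraction: float = 0.10) -> int:
    """Warmup length as a fraction of the total epochs (at least 1 epoch)."""
    return max(1, round(total_epochs * fraction))


def lr_at_epoch(epoch: int, base_lr: float, total_epochs: int,
                fraction: float = 0.10) -> float:
    """Learning rate for a 0-indexed epoch.

    Ramps linearly up to base_lr over the warmup epochs, then stays at base_lr.
    Any decay after warmup is deliberately left out of this sketch.
    """
    n = warmup_epochs(total_epochs, fraction)
    if epoch < n:
        return base_lr * (epoch + 1) / n
    return base_lr
```

With 300 total epochs, `warmup_epochs(300)` is 30, so the learning rate reaches `base_lr` at the end of the 30th epoch.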

#

( Ye it is updated now and:

#

easy n quick switch

chilly creek
pastel frigate
#

the fork or the warmup?

#

If you're gonna do from scratch and use adamw, yes

#

and I suppose, these gonna be helpful for you too

#

Also, if you're willing to wait for a little, I can make a personalized fork for you

#

gonna replace my " avg every 5th epoch " part of logging with Noobies' every 50 or was it 20 steps

#

Then you can have best of 2 worlds

#

avg per epoch ( for unbiased G/D choice ) and avg of noobies for long term
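The two kinds of averaging mentioned above can live side by side. The sketch below is not code from either fork: the class name `LossAverager` is made up, and the 50-step window is only a guess taken from the "every 50 or was it 20 steps" remark.

```python
from collections import deque


class LossAverager:
    """Tracks a rolling average over recent steps and a plain per-epoch average."""

    def __init__(self, window_steps: int = 50):
        # Rolling window for the long-term, smoothed view.
        self.window = deque(maxlen=window_steps)
        # Running totals for the per-epoch average.
        self.epoch_sum = 0.0
        self.epoch_count = 0

    def update(self, loss: float) -> None:
        """Record the loss of one training step."""
        self.window.append(loss)
        self.epoch_sum += loss
        self.epoch_count += 1

    def rolling_avg(self) -> float:
        """Average over the last `window_steps` steps seen so far."""
        if not self.window:
            raise ValueError("no steps recorded yet")
        return sum(self.window) / len(self.window)

    def end_epoch(self) -> float:
        """Return this epoch's average and reset the per-epoch totals."""
        if self.epoch_count == 0:
            raise ValueError("no steps recorded in this epoch")
        avg = self.epoch_sum / self.epoch_count
        self.epoch_sum, self.epoch_count = 0.0, 0
        return avg
```

Call `update()` once per step with the generator (or discriminator) loss, log `rolling_avg()` every N steps, and log `end_epoch()` when the epoch finishes. The rolling window keeps running across epoch boundaries, while the per-epoch value only reflects that epoch.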

chilly creek
pastel frigate
#

a

chilly creek
#

lol

pastel frigate
#

so ye, lemme know if you want the custom

#

or use it as it is for now

#

( and if you gonna use the repo, clone it, the zip in releases has a bug

chilly creek
#

gottcha

pastel frigate
#
  • change the decay
#

it is by default 0.995, but has commented default ( applio and mainline use default
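For a feel of what a decay of 0.995 does, here is a small sketch. It assumes the decay is applied multiplicatively once per epoch (exponential decay); `lr_after` and `base_lr` are illustrative names, not settings from the fork.

```python
def lr_after(epochs: int, base_lr: float, decay: float = 0.995) -> float:
    """Learning rate after `epochs` epochs of per-epoch exponential decay."""
    return base_lr * decay ** epochs


if __name__ == "__main__":
    # Print the remaining fraction of the starting learning rate.
    for n in (0, 50, 100, 200):
        print(n, round(lr_after(n, base_lr=1.0), 4))
```

Under that assumption, roughly 60% of the starting learning rate is left after 100 epochs, so the decay value matters a lot for long runs.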

#

Alr. In case of issues, lemme know

chilly creek
#

yup~!

pastel frigate
#

also @chilly creek

#

do you want me to quickly patch the fork with custom reference support?

#

I enabled it back

#

you can probs have something up to 20-30 secs ( as noobies says

chilly creek
#

sure thanks

pastel frigate
#

Alright, then wait a bit, imma update the thing and let you know

chilly creek
#

kk

pastel frigate
#

ye, will take uh, around 30-40 mins
gotta finish the run I have rn

#

gonna test the thing before I give it to you

#

less issues.. the better lol

chilly creek
#

Take your time, no need to rush. 🙂 let me know when it's done.

pastel frigate
#

gotcha

#

Hmmm.. actually you know wut @chilly creek
I'll just prepare a release just for you, with pre-configured infer-sample in tensor
so if you can ( or want ), send me some 10 seconds of singing and 10 seconds of speech ( unless you wanna just one type

chilly creek
#

10 secs? that's it?

#

I'll DM you

pastel frigate
#

or just 1 20sec clip

#

depends on if you want both speech and singing or just one, ye

#

In any case, total limit is just 1 clip of 20 sec duration, ideally

lofty grove
#

@chilly creek or you can use applio experimental branch 😛

proven perch
#

does the Trained Model work with W-Okada?? 🤡

chilly creek
chilly creek
#

KLM 6.2

#

KLM 6.1 (Experimental V3 L608)

lofty grove
#

@chilly creek regular spin 7-12 or wavlm spin 7-12?

chilly creek
#

oh wait, what happened to our name? it's shiny :-0

keen kettle
#

should i use the latest version of codename fork to use this pretrain? i'm using applio mainline rn

keen kettle
river coral
#

wonder if vonovox support this klm pretrain models

strange crag
proven perch
cerulean creek
#

Whats vonovox?

proven perch
cerulean creek
#

Is it recommended to remove all the silent parts from your dataset?

sterile minnow