#what are the main reasons why the model goes wrong?
This is def an overprocessed dataset
a lot of the silences are where you get the robotic noise too
Can I ask for some samples from your dataset?
You should be targeting the most unprocessed, natural voice possible. It sounds to me like you heavily overprocessed with plugins, but I can't be sure
One message removed from a suspended account.
why does it sound so gibberish?
make sure you're using the same embedder as what the model was trained with
it's not always the default (contentvec)
One message removed from a suspended account.
uh well
I haven't seen anything with jp hubert good enough
One message removed from a suspended account.
the old pretrain using it isn't proven to be that good
if you were training the model, I'd recommend KLM 4.9 with ofc default contentvec
One message removed from a suspended account.
this sounds great
Are you including silence in your dataset? That's what I would recommend avoiding first. RVC gets really pissy with silence
with your completed dataset, try these steps. This is "audio labelling" basically splitting every phrase into chunks, to remove silence, and also to help RVC not cut off phrases. Try retraining your set with audio exported like this
be careful with denoise, they can be damaging, but what i hear from those clips sounds pretty good, nothing wrong with those
i like to think silence like this is what's causing it to fuck
this makes a few hundred audio files btw so make sure to export into a folder lol
One message removed from a suspended account.
I ignore everything in the advanced section for preprocess
One message removed from a suspended account.
your other settings are okay. I use contentvec for english, idk how jp-hubert does, never touched it
yea silence confuses rvc or something. maybe its the pitch detection it confuses. but yeah, the audio labelling method I linked should resolve that, i highly recommend using that process
One message removed from a suspended account.
o
One message removed from a suspended account.
i highly highly suggest trying this and see if it improves your model
im curious too
One message removed from a suspended account.
oh and in your export menu, use these
One message removed from a suspended account.
yea being dynamic in the dataset helps it learn how to act
especially for realtime. you never know
One message removed from a suspended account.
you should try to match your dataset's sample rate to the model. So if you have 44.1k data, id use 40k training
One message removed from a suspended account.
if you were to select 48k it kinda just makes shit up
any youtube/UVR dataset should prob be 32k
idk exactly what youtube shits out. also verify it's actually 44.1k, with a spectrogram
One message removed from a suspended account.
if its all mismatched, id just resample to the lowest i have and move on. really, the different sample rates dont sound different in reality unless youre being a nerd
just make sure it doesnt make up shit in the high end
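A rough sketch of the rule of thumb above, assuming the trainer offers 32k, 40k and 48k as training rates. `SUPPORTED_RATES` and both function names are made up for illustration and are not part of RVC or Applio. The idea: pick the highest training rate that doesn't exceed what the audio really contains, and for a mixed set go by the lowest rate in it.

```python
# Assumed training-rate options; check what your trainer actually offers.
SUPPORTED_RATES = (32000, 40000, 48000)


def pick_training_rate(dataset_rate_hz, supported=SUPPORTED_RATES):
    """Highest supported rate that is not above the dataset's real rate.

    Going above the real rate forces the model to invent high-end content.
    If the data is below every option, fall back to the lowest one.
    """
    candidates = [rate for rate in supported if rate <= dataset_rate_hz]
    return max(candidates) if candidates else min(supported)


def pick_for_mixed_dataset(file_rates_hz, supported=SUPPORTED_RATES):
    """For mismatched files, plan around the lowest rate in the set."""
    return pick_training_rate(min(file_rates_hz), supported)


print(pick_training_rate(44100))                      # 44.1k data
print(pick_for_mixed_dataset([48000, 44100, 32000]))  # mixed sources
```

This only encodes the advice from the chat. It can't tell you whether a "44.1k" file was upsampled from something lower, so the spectrogram check still applies.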
One message removed from a suspended account.
lmk cus im curious if it makes big improvement
One message removed from a suspended account.
i mean, yes to a point i guess
dont have extremely enraged screaming in the dataset
but it's good to have excited tone, sad tone, etc
yea it's good to be sorta selective and target one style of voice. but if you're using realtime, you'll be forcing the model into a lot of situations
One message removed from a suspended account.
what are you planning to use it for btw? realtime or inferring files with applio?
One message removed from a suspended account.
if youre concerned about getting good emotion, you can train separate models to target specific emotions. but that's not very good in realtime, cant switch models quick
RVC is pretty good at being dynamic tho, dont overthink. Some of the louder clips you shared are good to cut tho
One message removed from a suspended account.
which source are the samples? it's fine if openly available
One message removed from a suspended account.
I don't think you should share that
One message removed from a suspended account.
to avoid legal/copyright issues, you shouldn't share it
One message removed from a suspended account.
I haven't heard any decent voice models as loras
One message removed from a suspended account.
that seems off-topic, we're talking about the voice model I suppose
One message removed from a suspended account.
it's just apples and oranges lol
One message removed from a suspended account.
if no one manages to implement it, it's nothing more than hallucinations
oh yea copyright bad dont post that lol
One message removed from a suspended account.
The question is if the LLM knows wtf an RVC is
One message removed from a suspended account.
There's enough guides and stuff that I'm sure new LLMs can explain RVC alright. Depends if they scraped aihub.wtf or rvc githubs lol
One message removed from a suspended account.
how about the VAs? they have right to consent on it or not
btw I once had some voice from a vn by Aniplex exe
not sure about it but I never share but the trained voice model
One message removed from a suspended account.
i dont trust gpt
take your current dataset, and just do the audio labelling. Your current set sounds mostly good, and also im curious on the improvement audio labelling would make
One message removed from a suspended account.
yea absolutely not
One message removed from a suspended account.
that llm is thinking in the context of other ML stuff
drag drop all your samples into audacity, then under the Tracks tab, "align end to end". Then click Mix & Render to make it all one track, then you can apply the labels with the video
then
One message removed from a suspended account.
idk if mixing into one track is required but makes it much easier to see
One message removed from a suspended account.
it'll be one track, with a label track below it. You will use the export settings from the linked message (#1372762613389201469) to export all the labels into a ton of files
like, youll get 200 5 second .flacs kinda thing
the labels are targeting anything at -42 dB or louder, essentially truncating the audio. But we don't want to actually truncate, this method preserves the natural beginnings and ends of phrases
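For anyone who wants to see the idea outside Audacity, here is a minimal sketch of threshold-based phrase labelling. It is not Audacity's algorithm; `find_phrases`, `min_silence_s` and `pad_s` are hypothetical names and values picked for illustration, and only the -42 dB threshold comes from the chat. It marks regions louder than the threshold, keeps short pauses inside a phrase, and pads both ends so the natural starts and tails survive.

```python
def find_phrases(samples, sample_rate, threshold_db=-42.0,
                 min_silence_s=0.3, pad_s=0.05):
    """Return (start, end) sample indices for regions louder than threshold_db.

    samples: mono floats in [-1.0, 1.0].
    Pauses shorter than min_silence_s stay inside a phrase. pad_s of audio is
    kept on each side; keep it well under min_silence_s / 2 so padded phrases
    do not overlap.
    """
    threshold = 10 ** (threshold_db / 20)       # dBFS -> linear amplitude
    min_gap = round(min_silence_s * sample_rate)
    pad = round(pad_s * sample_rate)

    regions = []
    start = None       # first loud sample of the current phrase
    last_loud = None   # most recent loud sample seen
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The pause was long enough: close the phrase, open a new one.
                regions.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        regions.append((start, last_loud + 1))

    return [(max(0, a - pad), min(len(samples), b + pad)) for a, b in regions]


# Toy signal at 1 kHz: 0.5 s silence, 1 s "speech", 1 s silence, 0.5 s "speech".
audio = [0.0] * 500 + [0.5] * 1000 + [0.0] * 1000 + [0.5] * 500 + [0.0] * 200
print(find_phrases(audio, sample_rate=1000))
```

Each returned pair would become one exported file, which is why a long dataset turns into a few hundred short clips.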
One message removed from a suspended account.
40 mins is nearing the limit of dataset size. Most people dont go over 1hr for sure
always flac
saves space and same quality
One message removed from a suspended account.
You could try that with your next dataset iteration. I'd just label what you already have and see how that goes before anything else tho
One message removed from a suspended account.
ye
if you dont like how that turns out, you can try experimenting with dataset changes and stuff
btw, youre using tensorboard to see your training progress right?
One message removed from a suspended account.
yea just dont cook the model
One message removed from a suspended account.
set it to maximum epoch and just train it until it's definitely overtraining. It saves every 50 epochs (or whatever you set), so you can always grab older weights
dont overthink tensorboard too much, people get lost in the sauce. The 4 graphs pinned by default in Applio's tensorboard are sorta the most important
One message removed from a suspended account.
ngl i dont even have applio on this pc so i cant check lmao
One message removed from a suspended account.
if you notice weird behavior, maybe ask about it. this is an example of mode collapse which usually isnt good
One message removed from a suspended account.
idk what 50_avg is, that might be some new tech idk
your G/D totals are most important.
btw, GPT can help you understand a lot about tensorboard, if gpt is your flavor. This is a tool used widely by ML nerds
One message removed from a suspended account.
you should look at the avg loss instead of the normal one, which only logs the last batch in each epoch
the sharp dips show up when the last batch of an epoch happened to be learning muted files
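A tiny sketch of why the two curves differ. The loss numbers are made up, and the function names are hypothetical rather than anything from Applio's logger. Logging only the last batch of each epoch lets one quiet batch drag the point down, while the per-epoch mean smooths it out.

```python
from statistics import mean


def last_batch_per_epoch(losses_by_epoch):
    """What a 'last batch only' logger would plot: one raw value per epoch."""
    return [epoch[-1] for epoch in losses_by_epoch]


def mean_per_epoch(losses_by_epoch):
    """Average over every batch in the epoch."""
    return [mean(epoch) for epoch in losses_by_epoch]


# Invented values: the last batch of epoch 1 is mostly muted audio,
# so its loss is far lower than the rest of that epoch.
losses = [
    [2.0, 2.5, 0.5],
    [2.0, 2.0, 2.0],
]
print(last_batch_per_epoch(losses))  # shows a sharp dip on the first epoch
print(mean_per_epoch(losses))        # much flatter
```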
it's kinda misleading to call it "mode collapse"
yea clean line is good
just trying to explain to look out for sus shit
usually mode collapse
try googling about mode collapse in general
it usually means a condition where either G or D loss collapses to near zero, hindering the model improvement
we're not trying to go into deep specifics about ML here. I'm just trying to explain things in a friendly understandable way, like i would have wanted when i first was learning sovits
I've seen a scenario in rvc refinegan, where the d/loss keeps going down below 1, instead of normally going up then down slowly
which means not enough improvement was made to the finetuned model
this isnt refinegan tho
and thats an edge case
im just trying to help them understand, dont want to overload with a bunch of jargon and information. I appreciate you sharing the experience but i dont think its very related here
the normal mainline rvc and applio shouldn't have mode collapse issue
theres years of chats here they can research if they wanna get deeper. but for now, we're focusing on making sure the audio is preprocessed okay
afaik I have discussed it before with codename & noobies 
One message removed from a suspended account.
tldr ask if you think the graph looks fucky
One message removed from a suspended account.
usually its ok but sometimes it fucks
One message removed from a suspended account.
the graphs alone may not be enough, you'd need to test the model as well
sometimes it could have unexpected results
that's what our fellow staff engineers have been figuring out
One message removed from a suspended account.
if it's nothing but static noise, you have prob tried to train without pretrain
One message removed from a suspended account.
oh
Man
Uhh.. I've already made the best Kurisu you can get so ( imho )
Demo if you're interested:
https://www.youtube.com/watch?v=YVh1o_glnYA
¬ Yo ~ I'm back... for now at least, lol.
I feel like I've put my whole self into this one so, hopefully you like it.
If you have any requests, lemme know!
Also, big announcement; I've been working on Violet model ( from Violet Evergarden ) so far 3-4 episodes isolated but I am doing the 4th samples rework, was too aggressive with de-reverbin...
It's a private model of mine and the first one I ever made, so I treat her like my dearest and most precious gem she is, but in the same time..
I can understand the pain with samples you went through.
( uh.. I've spent like 2-3 months perfecting her and even the dataset alone is a legit hell lol. )
If you want, I can share the model with ya ~
Orrrrr you can keep on training your own model yeah, no issues with that,
but from experience I can already tell you she's really hard to train;
Vorbis compression and its lack of consistency is one thing,
but finding the right hyperparams is another.
few tips;
- carefully inspect samples you cleaned / concatenated ( perhaps ? )
some have different frequency range response, some not.
( some are ' phone calls ', some are the assistant kurisu, so filtered / with effects. )
- if you got around 8-12 mins of audio ( +/- due to silence trimming ) then you're good,
but if it's more than that then either:
A) you used both games ( and just one is right, don't remember which had better audio. )
B) you included too many ' bad samples '.
In any case. if you wanted the model, just write me a dm.
Gonna leave this one in here too if you wanna hear some raw recording
( I'm heading to sleep right now and will be on.. ig,in 6-8 hours if anything. )
ps. I Read a bit more in the chat and " mode collapses " aren't true mode collapses in this case, those are just places where silence or mutes were dominating / encountered.
ps.2. If you use rvc, refrain and use applio or even better if you went with my fork ( og rvc's logging is totally skewed and should not be trusted. )
One message removed from a suspended account.
oh yeah
One message removed from a suspended account.
yeah.. well, at some point I thought that perhaps I should make it priv cause back then people rarely credited
oof
might make an exception for you
One message removed from a suspended account.
Actually, in that case try to make a good one on your own first and if it turned out too hard or with issues, I can share
kinda don't wanna downplay your work
well yeah
took me 3 months to train, more or less
One message removed from a suspended account.
but then, at the same time, I was also learning rvc
actually, you can use it no issue, just don't play with extra settings until you're ready
that's all there is
but now, it has a crucial update and even better logs
the adversarial loss for g was missing
and it was just total G
- most crucial part here is the way of logging
mine's per epoch, applio's every 25 steps
which can be biased for some models and isn't " choosing epoch easily " friendly
( sorry for formatting btw. on phone rn
One message removed from a suspended account.
True
yeah, need too as well. woke up not so long ago. one sec ~
One message removed from a suspended account.
must be the choice of the steins gate
One message removed from a suspended account.
I mean, you can hear from samples it's kurisu
- I recognize the naming
One message removed from a suspended account.
Actually, you can hear a lot of Kurisu in Asami if you used to watch some of her older streams but that aside
now, as for samples, from what I remember, there's quite a lot of inconsistency so I myself personally, decided to only use and trust steins;gate 0's samples
One message removed from a suspended account.
the " static noise generator voice model " is something you can get in both applio and fork, it is not reserved to my thing but it depends on the conditions
One message removed from a suspended account.
as for " the fork's for advanced people " it's a disclaimer / safety-check for myself so when some.. less caring newbies ( who can't bother to spend 5-10 mins reading ) " experiment " with switches or things they have no clue, I won't have to explain everything like they're 5
One message removed from a suspended account.
If you need more specific info, my set's composed out of 0's files ( excluding samples that don't match, that is. ), CD drama
around 12 mins
as for denoising, well.. noise-profiling is a delicate thing and needs time
you have to obtain all possible safe traces / zones one by one, bit by bit
and get that " full spectrum " noise
One message removed from a suspended account.
That part entirely depends on your preferences
if a model is full of ' very sharp sibilants ', it's just that the model gonna inherit it
( just like mine did )
but I quite like it and have no issues eq'ing if needed
( but then again, that's preferences. )
Anyhow, an important catch is, consistency
All of the sources you have most likely use different recording chamber
different mic
and so on
One message removed from a suspended account.
so compression method and much more gonna vary, even proximity from mic has a lot of influence on the color
that's why, quality is better over quantity
( consistency.. consistency.. and consistency. rvc really hates lack of it )
One message removed from a suspended account.
cause a lot of things in docs and so on, are subjective interpretations
and afaik ( can be wrong on that. ) not everything is 1:1 with technical terms
then that makes sense, but I personally like 0 so, that's why
One message removed from a suspended account.
docs are a bit outdated
One message removed from a suspended account.
that's more or less how I have my set
but yeah, what's been compressed, stays that way
cannot be avoided
yuh
In any case, there's no right or wrong mh mh
just one's methods
One message removed from a suspended account.
Well, yeah
confuses rvc?
there is no such a thing
it all gets mixed up and shuffled in the first place ( in the data loader or train loader )
The reason you want to concatenate and silence-truncate, is because you want to yeet the silence completely
and then, get even and full tip-top samples
right
One message removed from a suspended account.
mmm.. so, what exactly do you want to know
One message removed from a suspended account.
other than " don't mix up too many sources " ( divergent sources, that is. ) there's no wrong way to approach it
One message removed from a suspended account.
so let's sum up some facts
what batch size you tried
and your current set's length
and whether you train from-scratch or not
One message removed from a suspended account.
if you want accurate logs and access to all cutting-edge features, you should
One message removed from a suspended account.
if not then you're good
because that's the way it is, one cannot use different vocoders without a pretrained model like that
yet I gotta say, by default it has all the things set the same way applio does
one sec
One message removed from a suspended account.
gonna show you the ui
it is / was the same in applio
just Noobies or someone else turned it off / yeeted it from the ui
so others stop using refine or whatever and then cry of it not working ( because of not reading carefully
One message removed from a suspended account.
Nah, that's fine, it is about those:
A/B/A/B format;
" I wanna use refine gan "
" There is no pretrains for it tho "
" where do I get them ? "
" .... "
One message removed from a suspended account.
( + I believe my translations / explanations are more user-friendly or correct in actual terms, if you asked me
One message removed from a suspended account.
tl;dr, the only thing that is different from applio is " double-update strategy "
but you can turn it off and get an exact 1:1 behavior as observed in applio, yeah
( because all the start / default settings are just as in applio
One message removed from a suspended account.
the thing with from-scratch is
you need either A) really really lots of quality and diverse audio ( if single speaker )
or B) A lot of speakers ( if you intend to fine-tune it on imperfect or limited sets (( like Kurisu's
One message removed from a suspended account.
I am afraid that she just won't click that well with "specialized-model" approach
Butttt, you can still try
you should use a batch size of 16 and if that gave you bad results.. perhaps 8 ( and only if all failed, 4 )
One message removed from a suspended account.
for instance, VCTK dataset based pretrains ( original ones )
used 4 gpus * batch_size 4
so, the global batch becomes 16
so that's your reference point
One message removed from a suspended account.
as in, multi-gpu setup
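The arithmetic behind that reference point, as a minimal sketch. The function name is made up, and it assumes plain data-parallel training where every GPU processes its own batch each step.

```python
def global_batch_size(num_gpus, per_gpu_batch):
    """Effective batch per optimizer step in data-parallel training."""
    return num_gpus * per_gpu_batch


# The setup mentioned above: 4 GPUs, batch size 4 on each.
print(global_batch_size(4, 4))   # 16
# A single GPU needs batch size 16 to see the same number of samples per step.
print(global_batch_size(1, 16))  # 16
```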
One message removed from a suspended account.
🤨
One message removed from a suspended account.
from scratch = no pretrains used
One message removed from a suspended account.
well then, in that case
One message removed from a suspended account.
batch size 8 to 12, perhaps 14 or 16
( for a set of around 9-13, maybe 15 mins )
is what worked for me
but you have hours of data
One message removed from a suspended account.
and hours of data for finetuning is an overkill
I'd probs cap it at around 25 to 35 minutes
and then attempt either batch size 4, 8 or 16
.. but given she's quite emotional / not monotone, I'd perhaps be closer to picking anywhere from 8 to 12 for batch, maybe 16 if other attempts failed
that's just how it is, it is not something you can predict or calculate
tests, tests and tests 😛
One message removed from a suspended account.
that's generally true
I myself cap the max within 20-35 region, but I believe up to 1 hour could work too
more than that and you might be getting diminishing returns
Yeah it should be all good as long you cap it, max 1 hour or 35-40 mins, try various batch sizes and are consistent with the dataset
( you can always make few control-sets, each of different length and see what works for you
One message removed from a suspended account.
oh, I mean yeah
because refinegan has no pretrain
Unless you go and find one, that's what you're gonna get: " the static noise "
because you're effectively starting the training from absolute 0, no knowledge in generator how to reconstruct the audio and no knowledge in discriminator how to judge and spot
btw
One message removed from a suspended account.
yes, in that case, it behaves just like:
because there is no " override "
to switch the vocoder ( which you'd tick )
that's all the secret really
One message removed from a suspended account.
yeah I get that part, but in the same time, LLM and transformers are different from neural vocoders and so, hifigan like that
what you don't know?
can explain if you want
One message removed from a suspended account.
well, there's nothing new in here, " terms " wise
unless you encountered something new to you?
if so, lemme know and I'll explain it the best I can
One message removed from a suspended account.
but yea, dw too much about fork / applio dilemma
just go with what you find more fitting for your goals
One message removed from a suspended account.
oh yeah, this convo made me realize one thing
I should mention in the ui that " non-default vocoders " need pretrains explicitly
wdym?
oh, you wanna start again ye? with the model
One message removed from a suspended account.

One message removed from a suspended account.
yeah sure, we can try to think of some workflow
One message removed from a suspended account.
alrrr
btw gonna be back in 10-15 mins
One message removed from a suspended account.
I heard you want the project for nsfw purpose
One message removed from a suspended account.
And I have to make it clear right away, if that's the case I can't and won't support it ( nsfw models in general too, don't get me wrong on that ~
One message removed from a suspended account.
Alr, then we all good
One message removed from a suspended account.
You can continue
One message removed from a suspended account.
yeah, in that case hmm...
I think it can be quite hard
yet, it is not impossible
You'll need KLM pretrains
Actually, I believe I used to want her asmr too
but didn't work too well sadly ( Back then we had no klm and such, obv
One message removed from a suspended account.
but in general view, groaning and " arousing " sounds are nsfw
But I get it if you need waifuu chems, let's put it like that.
One message removed from a suspended account.
I mean, I have no issues with what you wrote specifically
just simply clarifying that generally having " kisses " and " groans " in the same chain, is quite nsfw
But still, you'll need KLM or such specialized pretrains
og pretrains can't handle such whispery and ' misc ' content like that well tbf
One message removed from a suspended account.
well I only said that because most of " groan, kisses " like asmr quite often end up having " ear licking, moaning, ' yosh yosh good boi ' " kind of content, so it's usually safe to assume nsfw.
anyways, guess our opinion just differs, that's fine.
Now, like I said, you're going to need klm pretrains or such
I believe there's no other than KLM that could potentially support such ' extras' that well
Have you heard of those klm ones?
One message removed from a suspended account.
Yup so, as of right now? I'd recommend klm 4.9
( at least until we get ' new gen ' pretrains that use new embedders n such. )
you dunno how they work in the applio or in general?
One message removed from a suspended account.
oh, not quite
Seoul is taking care of all the klm pretrained models, those are his
now, what I will intend to do is just an experiment on " somewhat cleaned " vctk dataset
And with that ^ gonna test the new embedder
rvc wont be able to do this
One message removed from a suspended account.
klm is quite potent in a lot of things but yeah, non-og pretrains is def a must
and I also hope for the new embedder to be helpful in lots of things
Tbf, it is a hit or miss
but doing what I recommended gonna increase the chances of a success
so as always in ml, you should give it a try
One message removed from a suspended account.
well, a good model is always a big W
One message removed from a suspended account.
+, if you do have a good set, there's always hope in future
yeah, I get you
but generally, if it's such " vocalized " stuff like ' groans ' and such
I am fairly sure she could somewhat be fine
because I know she does make certain type of sounds / vocal frying, at times.. ( at least in sg0 n dramas, don't remember much details about the rest
so again, you should def try and hope for the best
One message removed from a suspended account.
you tell me
some time-lines are crazy
as hell
One message removed from a suspended account.
Oh yea, these
nah, she's fine with such
I promise you ( or well, used to be for me during rt voice changer performance testing iirc
so you should be fine, mostly
One message removed from a suspended account.
I'd say, as long they're not dominating the dataset, yes, it wouldn't hurt
One message removed from a suspended account.
key idea here is that they should occur " naturally "
One message removed from a suspended account.
tl;dr, if they occur naturally in a sentence, just let them be
One message removed from a suspended account.
and if some are " loose "
also good
as in, there was :
okabe: blahblah
kurisu: groan
okabe: blahblah
kurisu: groan
so you'd have: groan, groan in ur set, after concatenation ( voice-lines chronologically wise
that is fine too
ps. Always export as 32 bit float
One message removed from a suspended account.
only time you don't really have to is if you don't touch the volume / dynamics
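A small sketch of why float export matters once you touch gain. The helper names are made up and this is a toy on single sample values, not a real exporter. 16-bit PCM clips anything beyond full scale, so a boost followed by a later cut loses the peak, while 32-bit float keeps the headroom.

```python
import struct


def store_int16(x):
    """Round-trip a sample through 16-bit PCM; values past full scale clip."""
    quantized = max(-32768, min(32767, round(x * 32767)))
    return quantized / 32767


def store_float32(x):
    """Round-trip a sample through 32-bit float; values above 1.0 survive."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


peak = 0.8
boosted = peak * 2.0  # roughly +6 dB of gain, now above full scale

# Save, reload, then pull the level back down by the same amount.
print(store_int16(boosted) * 0.5)    # peak was clipped, original level is lost
print(store_float32(boosted) * 0.5)  # back to roughly the original 0.8
```

If the volume and dynamics are never changed, nothing goes above full scale in the first place, which matches the exception mentioned above.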
One message removed from a suspended account.
ps. Spectral de-noise is your best friend
One message removed from a suspended account.
avoid AI denoisers by all means
( they can damage stuff, esp her samples as they are compressed to begin with
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
yup, seems like I do
just not sure which ones those are
( I had one for dramas, one for vn
so, gonna send these and you can try it out, perhaps
( also got one for that nasty line in upper spectrum
" fking line "
lol
One message removed from a suspended account.
One message removed from a suspended account.
You know wut, you kinda motivated me to try and make her V2 variant
One message removed from a suspended account.

Kurisu arc 2025, soon ™️
ye, these will do too
btw, if you still have sources for your things, perhaps you could share em in dm ( also, dm cause discord is touchy
I believe I lost all my .txt from back then when switching drives a while ago
and it'd be helpful lol
One message removed from a suspended account.
One message removed from a suspended account.
I might have heard it but I wouldn't swear on it
so quite likely I haven't found that one, or missed
One message removed from a suspended account.
oh, then that's probs why
One message removed from a suspended account.
One message removed from a suspended account.
low quality
One message removed from a suspended account.
One message removed from a suspended account.
lq, hq
One message removed from a suspended account.
tho ye, how's samples from the app?
48khz or 44.1?
or 32/36khz ~ ( some devs decide to go this route for whatever the heck reason
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
but then, I'd have to compare the spectrums later
( The " distribution " of frequencies, on avg - cause if there's an inconsistency like that in a dataset... well rip the model's performance )
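A rough first pass before comparing spectrums, sketched with stdlib only: read the sample rate straight from each WAV header and group the files by it. The function names are hypothetical. The header is parsed by hand on the assumption that the files are plain RIFF/WAVE, since 32-bit float exports are not readable with Python's `wave` module. Note this only catches mismatched header rates; a 32 kHz source that was upsampled to 48 kHz still reports 48 kHz, so the spectrum comparison is still needed for that case.

```python
import struct

def wav_sample_rate(path):
    """Return the sample rate stored in the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"{path} has no fmt chunk")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # format tag, channel count, sample rate
                _tag, _channels, rate = struct.unpack("<HHI", f.read(8))
                return rate
            # skip other chunks; chunk data is padded to an even length
            f.seek(size + (size & 1), 1)

def group_by_rate(paths):
    """Map each sample rate to the list of files that use it."""
    groups = {}
    for path in paths:
        groups.setdefault(wav_sample_rate(path), []).append(path)
    return groups
```

If `group_by_rate` returns more than one key for a dataset, that is the kind of inconsistency worth sorting out before training.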
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
Audio generally is complex
I'd even go as far as saying, it is way more complex than tokens or semantics ( even tho I know shit about llm's from technical standpoint
cause for instance, the problem of phase reconstruction in ML is still a huge issue and not perfect
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
Oh, well..
I mean, aside from music making, I have experience in utau and vocaloid
One message removed from a suspended account.
One message removed from a suspended account.
ml / ai / hifigan came later
( And Kurisu was my test-subject )
I wanted to make an " amadeus " assistant
first goal was to make a model.. then I got to know about rvc
One message removed from a suspended account.
lol
One message removed from a suspended account.
But then.. the work and perfecting her took so much time I burned out at the very end
and lost the passion for further work ( learning llm n shit
but I might attempt it one day
One message removed from a suspended account.
and decided to fully commit to rvc / audio ml instead
and here we are, yea
One message removed from a suspended account.
One message removed from a suspended account.
Anyhow, what's your plan now?
One message removed from a suspended account.
oh, nahh, I can do just fine with the sources ( I like that part of the work, own commitment
cause if you need further help or something, I'm here
but other than that, I gotta go back to the work.. got some updates n fixes to do
One message removed from a suspended account.
One message removed from a suspended account.
One message removed from a suspended account.
yeah, I assumed so too
oh, by sources I meant uhh
websites you used, torrents etc etc, whatever you have
( hence mentioned dm, as discord's very touchy about such stuff )
One message removed from a suspended account.
mh mh ✨
One message removed from a suspended account.
😭 this is codename's grail model I didn't even know
Trust Cody he might know what hes talking abt lmfao
Idk wym by cut in. You're the actual RVC pro so feel free to chime in of course
Esp since you've handled this dataset yourself lol
welp
This gonna be one of the longest conversation threads I've ever seen on Discord.
namariiiiii
been a while ~