#ai-chat
where i can find the program to use it?
elaborate:
- your pc gpu
- which program? there's A LOT of AI programs, in various fields, and various types depending also on your pc gpu
- what do you want to do specifically
okay for RVC model, which is the best way to train a model in Hebrew? weights doesn't support Hebrew from what i see so i don't know if i should train using it
i have about 14 minutes of audio of a voice in Hebrew that i want to train on
ehh RVC is Speech To Speech natively rather than Text To Speech, you could give it a try and train a Hebrew dataset via our docs https://docs.aihub.gg/ but i'm not sure how good it will be
Last update: Oct 21, 2024
okay so how do i train say voice cloning for text to speech? do you know possibly i been trying for few months to find solution i am getting tired
i even tried paying someone on fiverr, i couldn't find anyone knowledgeable
Staff Applications Open
We're looking for dedicated team members to help grow and manage our community! If you're passionate about AI and want to contribute, apply now!
Click here to apply!
trying Ilaria RVC but i can't use tts with the uploaded model damn
to train RVC models, it's all written in the docs
about TTS, you could check the TTS index, you could maybe try XTTS2, fish speech or f5 tts
I wouldn't expect those to support it much tho
it's not as widely supported a language as English
fish speech i am on their site but they don't let me upload the audio i uploaded sounds of the voice and says limit 32mb but none of the files passed it
if you got a good pc you can do it locally
unfortunately non nvidia gpus kinda suck on AI support bc of CUDA
your best bet would be to either check if they have their own amd guides, or patch it yourself with Zluda on windows
really gotta get a tts with cloned voice or something in hebrew also appreciate your help brother
should have bought an rtx ffs
nothing on the cloud that can do it then?
AMD is deffo cheaper and good price to performance for gaming, but its AI support isn't as widespread or as good as nvidia's. Zluda is basically a CUDA compatibility layer for AMD
and there's also rocm + linux but idk much about those since I don't have AMD
nvm, I just checked fish speech github repo
lemme check other ones too
gpt so vits doesn't support it either
use applio with zluda, or more ideally with rocm under linux (radeon cards work better on linux distros tho)
F5 doesn't support it either
@next shard Edge TTS supports it but it's only 2 models and you can't make custom models and runs only on cloud
50 series cards are disastrous, 40 series ones are better
Meta's MMS-TTS can do 1100+ languages, but it's not expressive
XTTS2 doesn't support hebrew either
Zonos doesn't support it either
Kokoro TTS neither
this sounds like robots
thats the issue
i am trying to clone a voice that speaks the language natively
then i will be able to do tts for videos with at least some emotions and native speaking i know wont be perfect but doesn't have to be
Seems impossible to find anything in hebrew, even more expressively
do you prefer the native language support to the quality?
OpenVoice2 doesn't support it either
the only one that was decent almost which i was about to pay 99$ to try their higher quality one is play.ht
it did a good job it was fluent and all
MeloTTS & PiperTTs don't support it either
welp
seems like there isn't a much better alternative
almost no tts even supports that language
do we know if they are using a public open source ai to make the voice cloning?
and just charging for it maybe
maybe 11labs supports it better? I don't pay for it tho so I can't tell you
nah its not good
yes i know i need something that can do voice clone i don't think play.ht supports it as well i tested voice clone and it worked somehow
I really can't know
i even used their english version
from their site it looks like their own closed source AI, but I can't know this
I can't find any other tts that does hebrew nor in an expressive way
okay thank you nick so my only option is this rn unless i find something else
so i can't do anything with any opensource stuff with running locally or cloudly

the open source ones I know so far are english & chinese
I checked 11 different TTS and couldn't find anything close to what you seem to describe so yep unfortunately
can i pay someone to make me a private one
thanks for the help nick u were useful
is this an option that possible? or its really something difficult i am not sure how this stuff works and who makes them
welp, you would need to find a team of AI engineers, wait a long time for them to design an architecture, and find a massive amount of data to train the model on. it would prob cost a lot of time and money
I'm no such expert ofcourse, but I'm just telling ya that it's prob not going to be super easy to make it
It could maybe be added to existent AIs, but that's still going to take a lot of time finding the big amount of data and train it
okay thank you then thats also not possible i guess gotta hope for a big team to add support
im pretty sure it could cost more than millions for getting and processing that enormous amount of data
10k hours of audio books to train GPT/LLM for new language.. can be as high as 100k
I think some TTS used recordings from the European Parliament
anyway, something with audio and matching text
i saw a ai lebron video on tiktok and i wanted to know if anyone know what ai was used in the video
video link:https://www.tiktok.com/@mrmysticalmarvels/video/7470624516411051306
DIlly ding, dilly dong! A new RegalHyperus drum model just released!
Monsters, Inc. (Drum model no. 583)
Okay, I've been gone for a few months
Where's all this img to video stuff coming from
how do I get started on this crazy video train
can i get saxxy award?
not related to this server but i have a 1 hour 20 min documentary and i need the entire thing transcribed but i cant find anything free online that will do the job. anyone able to help? need it asap
Whisper large-v3 for ASR
not really good, but free
is there any online modules?
ok thanks
its not a yt vid
yo whats the app called for the ai changer
Why not go for gpt-sovits then
what about it
Recently got into it as rvc + tts workflow isn't really sufficient
You want tts and voice cloning afterall
In that case, experimental zonos or either gpt-sovits is your best bet
However, as of now, zonos is at v0.1 stage and only supports zero-shot
right i need to clone a voice i got of someone who is speaking good the language but of course need a model that is already trained with more words and stuff so it can speak fluently also when doing tts and the voice i clone it also helps to use tts like they are speaking same emotion speaking i guess that way its good
does gpt-sovits support amd?
Well, it really depends on what you're looking for, whether it's v2v or tts
As of amd.. I ain't sure but supposedly it does support rocm, but ye
I believe without linux it'd be a no go
Either way, gpt-sovits has 2 components, sovits for voice handling and gpt part for phonetic / lingual recognition + understanding ish of emotions, not the best description of it but you get the idea
It learns the patterns of speech and according to what you write there, it tries to match it with style and emotions it learned from speaker, in a way
Hmm.. Do you know akame ga kill? @next shard
tts i need for ads
a
Well, then voice cloning + tts is what you need
tts is the issue cause hard to find anything for hebrew
yessir
ah, hebrew
well, don't take it as offense
but I believe it'll be very hard to find something specialized in such rare ( I'd say ) languages
thats the best so far i ever found and all from grok3 thanks elon musk for this tool that found me that
It's mostly those more recognized ones such as eng, jp, korean, ch, russian etc ( at least in ml I suppose, in such fields
yep i agree its not easy although play.ht actually is so close to being perfect
In that case, if it works decent enough, you should stick to it as it's currently, most likely, your best bet
But if you're aiming for 100% spoofing that ai isn't ai
that won't do
waiting for my high quality voice on there to get ready its been cloning a voice i paid 100$ for 1 high quality voice clone its been hours still waiting hopefully it pays off
thank you
whell someone will search hebrew here and see this will thank me for sure lmao

bruh
commission in #1191429836321849435 for better chance of response and quality
To request someone to do a voice model for you, you can make a post in #1159289738314919936, or make one by yourself.
Please don't send a YouTube link here.
anyone knows where to find some generic voice for animation? most voice model i see was from known characters or celebrity. might get some issue when try to use it for my animation. any idea?
Could anyone help me? Ive made AIs before using Google Colab in 2023. Now the Google Colab method i used is gone. How can i make AIs of Ariana Grande singing a song, or just AI over someone speaking?
RVC? Try using Kaggle applio
What's your PC GPU first?
is there any online modules?
Online module of which?
a
Quickly came to hop in cause im hoping omeone could recognize this ai tts im trying to look for please dm me if youre good at that i have a voice clip and everythin
well.. never hurts to use punctuation marks
Anyway, #1159289738314919936 or #1159289738314919936 is what you wanna be looking for, instead of DM invitations
Your chances gonna increase that way
Thanks mate im just in a bit of rush right now
how do i use okada in games and discord ?
Tell ur PC GPU in #help-w-okada
@onyx stream btw u can't fix that issue, check #dev-updates
There are lots of updates. Which one of them are you reffering to?
ALWAYS check that channel, Its very useful
change your audio input in the settings to your vac
ive been using IAHispano/Applio from github for a while now. can u recommend me a better TTS that has more expressive emotion? the RVC from Applio is fine but i found the EdgeTTs abit lacking.
Zonos TTS
So is it paid or free?
do u know the link where its easy to download like just zip file then just run it?
open source, windows installation is a bit tricky
Interesting.
it is a new model, it has some issues
i see. i have checked it on youtube it seems its more for cloning voice and abit slower. what i really wanted is fast TTS that doesnt need audio to clone
my workflow is create audio from TTS then use RVC to change the voice
i just want the generated TTS to have abit of emotion. before passing to to RVC
that's fine
ill look into this. thank you very much
Yeah kokoro is very good little to no word error rate like other tts
Will applio replace edge tts with kokoro anytime soon?
we may.. it is just that it is limited to only few languages
if you know a bit of python you can just use both using a script
run tts, then run applio's inference
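A minimal sketch of that two-stage script, assuming you wrap each tool yourself. The names `tts_then_rvc`, `TtsFn` and `ConvertFn` are hypothetical; neither Kokoro's nor Applio's real API is shown here, only the "write a wav, then convert that wav" chaining.

```python
from pathlib import Path
from typing import Callable

# Both stage types are placeholders (assumptions): plug in whatever TTS and
# RVC entry points you actually use, wrapped to match these signatures.
TtsFn = Callable[[str, Path], None]       # (text, wav_out) -> writes wav_out
ConvertFn = Callable[[Path, Path], None]  # (wav_in, wav_out) -> writes wav_out


def tts_then_rvc(text: str, workdir: Path, tts: TtsFn, convert: ConvertFn) -> Path:
    """Run the TTS stage first, then feed its output file to voice conversion."""
    workdir.mkdir(parents=True, exist_ok=True)
    tts_wav = workdir / "tts.wav"
    final_wav = workdir / "converted.wav"

    tts(text, tts_wav)
    if not tts_wav.exists():
        # Fail early so the conversion stage never runs on a missing file.
        raise FileNotFoundError(f"TTS stage did not produce {tts_wav}")

    convert(tts_wav, final_wav)
    return final_wav
```

Keeping the two stages as plain callables means you can swap the TTS (edge, kokoro, anything else) without touching the conversion step.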
i know abit of code, what do u recommend for my workflow to generate good sounding voice for my animation?
should i use kokoro TTS then applio RVC?
edge still has wide amount of languages, even including some local languages
if i only plan to use english language kokoro is better right?
it seems a bit more expressive than edge tts
edge tts is just a neutral screen reader
you'd prefer the better one
is this audios from kokoro or edge?
which is?
No, they are, consecutively:
F5 tts
FishSpeech
GPT-Sovits
xtts-v2
If you want check out kokoro, here's the demo:
https://huggingface.co/spaces/hexgrad/Kokoro-TTS
gpt sovits is way too scuffed with default model, needs finetuning
Well ye, def better to stay away unless fine-tuning
But that's really about all zero-shot capable tts'es
Recently zonos truly surprised me tho
But yea, still v0.1 and no fine-tuning
i tried zonos before, the best so far from all i tried. my only problem is i have low specs pc and it takes so long to generate. thats why i change my workflow to TTS>RVC
Yup, it is indeed the top and I honestly can't wait for fine-tuning release ( hopefully, one day
Tho ye, it is rather demanding
In that case, you should def try kokoro
Sure, fixed voices so you can't train / add any, but some of it's models are really nice if you're into that ( and need emotional input
This is some random infer from kokoro
Lots of models ofc so, better to not judge it by this one
As for gpt-sovits finetune..
Freshly baked thingie I work on. ( Still testing the params n stuff so, quality isn't something to be taken for granted
yes, im currently trying to find one with GUI
from https://huggingface.co/spaces/hexgrad/Kokoro-TTS it will be the same if i download it locally? my only choices are the one from voice selection?
I'd believe so, ye
Haven't had any deeper interest in it so, didn't use it locally
But I see no reason why the gui'd be different ( or rather, the webui
how do we use kokoro for emotional voice like angry / sad/ happy ? is there any way to do it on the prompt?
I'd advise you to just watch some overviews of it on yt, you'll gather more details that way
Aside of few runs out of curiosity, I haven't really tried it that much so, can't help
Alternatively, try to ask Noobies
i see, ill try researching it for a bit more. if u know any TTS that can control a voice like Zonos but doesnt need an audio to clone a voice. please let me know
Once and if I'll find something meaningful, will do
Do I need a high end computer for GitHub voice changers to work properly?
it depends if its RVC (Retrieval-based Voice Conversion) you can do it even with slow pc
if ur using it for RVC --> it means u have to provide an audio (example your voice) to change to other voice(specific voice from other models you downloaded) then it doesnt require much computer power
Well, not really. Technically, you can even use 4 gig gpu or.. well cpu, but the delay would be huge as hell, and def you wouldn't be able to play games that way
In other words, cpu is rather a no go. 4/6 gig gpu can do well, but there are constraints ofc, depending on ur hardware. For real-time voice changers, go to #help-w-okada
hi is there is an rvc model that is realistic?
I'm using a low end laptop and the audio for me always glitches. Are there any other good voice changers that can be used for low end laptops?
There can be, but you have to find them really
I'm afraid we don't keep any indexes with quality sorting
mm oki do you have any recommandations or even favorite models?
we need that imo
RVC v2 is the only version of RVC that makes high quality RVC voice model.
long time ago ilaria suggested something like that and it got a ton of upvotes but it was never added
would this work wit W-okada?
Idk man, I feel like it just promotes laziness and gonna just make people stop researching or discovering
but that.. is just my opinion
Of course, RVC v2 will always work with W-Okada since it's technically RVC.
how would that make people stop researching / discovering? Its just a list of quality models
could maybe work as a motivator for some to become better and get on that list
well if you look at it that way
Yet, from experience I know that if you're given a list of " good models ", most of the time you'll just stop dl'ing and testing all you can
since you're provided with fully baked solutions
but again, it's just my opinion so, don't take it too seriously
i see
is it the best i can use? i mean Wokada
Yes. Deiteris' W-Okada is the best one you can use.

Guys is it normal for a zipped voice model to weigh 259MB ?
Have you tried opening the zip to look inside? Because a typical index file will be larger than the pth file.
yeah
for realtime changer i just need the pth one right?
can i add you so i can send the picture
pls
RVC pth file should weigh around 53MB. If you see the pth file weigh more or less than that, it's not an RVC voice model.
You don't need to hop into my direct message just to send an image. You can go to #help-w-okada to send an image there since your name turns blue now.
is 50mb
ok sorry
If you see a pth file like this, it's RVC voice model.
that's normal if it contains only pth and index file
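If you'd rather check the archive without extracting it, here is a small sketch using only Python's standard `zipfile` module. The function name `list_model_zip` and the `model.zip` path in the usage note are made up for illustration.

```python
import zipfile
from pathlib import Path


def list_model_zip(zip_path: Path) -> list[tuple[str, float]]:
    """Return (member name, uncompressed size in MB) for every file in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size / 1_000_000)
            for info in archive.infolist()
            if not info.is_dir()  # skip folder entries, they have no size
        ]
```

Calling `list_model_zip(Path("model.zip"))` and printing each pair shows at a glance whether the bulk of the size comes from the index file or from the pth file.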
can anyone help me pls
for the voicechanger
i downloaded it but when i click on it, it doesn't open pls help
Can someone please suggest me a good tts for hindi ?
Is there something better than Local UVR5 to extract vocals from a song for better quality ?
Uvr 5 ui maybe
Where can i find that >
how to append a model?
can you elaborate:
- your pc gpu
- what guide are you following
- what do you mean with "append" a model?
I want to add a model to channel #1175430844685484042 because I already did it
I search automation builder for a project
lol, what an username.
Anyway.. To add models on there, you gotta have model maker role
Apply for model maker: https://discord.com/channels/1159260121998827560/1305527335646269440
hey guys is there some ai thatt like if you upload instrumental it will create lyrics for it?
how do you make rvc files
literally it's a friend's account because discord doesn't like me
i use apollo rvc
Makes sense mordo
guys
yooooooooooooooooooooooooooooo
who tryna go ewhore troll?
y'all whats the most REALISTIC female voice model?
Y'all be keep asking for the realistic female voice model to troll and catfish someone.
or scam
the yknow
voice changer
for windows
honestly i got it once but it sounded bad but prob cuz i had a bad mic
i got a new mic
today
will it work with a solocast hyperx microphone?
No idea, if the mic works, it should work
Aside, it's a good thing to always say right away what you want instead of making others guess
there's at least 3-4 things we support, more or less
Voice changer is one of them
Thanks vtarcelia for corrections, Nick088 for contributions. Most technical information comes from deiteris.
Latest Version b2332 from December 2024
RTX 5000 series support is here, but not integrated into w-okada itself, it is a stand-alone release. You can get it from here
Translations (outdate...
Read it all and you'll know all you have to, including where to dl, how to set up and so on
hey if u can can u also send me the best e girl thing u got if u have?
XD
That, my dude, is up to you to discover
ok
Now, read what I sent
no point for me to be writing it all here if it's there, all it takes is some reading
spoiler alert tho, yes, it'll do just fine
it says u cant play games while doing it
bruhh
that was the whole point of why i needed it
pls discuss ur topic in #help-w-okada
newbie question
I have a canvas app I made on poe.com. It reflects project status for a handful of projects and uses a chatbot for customer service and status requests, mostly based on a spreadsheet i attach when i create the app. is there a way for the ai app to ping an updated spreadsheet, or a similar way to update the source spreadsheet?
For W-Okada, it would be better to talk about it in #help-w-okada instead of #ai-chat.
Anyone building in n8n?
don't you think it's a lil out of place to write about it in ai chat
For a 10K project
Imagine using ChatGPT to help code all of them. 
i just read old message and this thing caught my eyes XD
I'm the best prompt engineer in the world
Love this
Would it hallucinate?
I know how to make it not hallucinate
hey, uhhhh.
where can i ask where i can find x voice models.
theres a channel for finding models, but im not sure if im using the right terms or something.
and the ones i want may be on another site.
You can search rvc ai voice models at:
- #1175430844685484042
- In #find-models, do /find with @hidden grotto
- https://weights.com/ (login required)
- https://huggingface.co/models (but watch out cus in hugging face there arent only rvc ai voice models)
- https://voice-models.com/
- https://thevoicemodels.com/ (for Turkish Models, login required with discord and level 2 on their server)
if there isnt one, you can:
- #1159289738314919936
- #1191429836321849435
- make it yourself with our docs guides https://docs.aihub.gg/essentials/how-to-make-voice-models/
:wave: @covert lake, How can I help?
Available Commands:
• @weights find <query> or /find <query> - Search for RVC Voice Models
• /create - Create an AI Cover
• /image - Generate an Image
Can someone please suggest me a good tts for hindi ?
Wait what happened to your account? I remember u liked helping here
hello , do someone know how to change DraftBots language please ?
SOMEONE should use this model and rap wit it
just use claude 3.7
text prediction won't do perfectly 10k lines of code π
Model link pls
It's uploaded bro! Hold on
Song
What do you think of the idea that an AGI should solve a list of problems (disease, food production, fusion, politics, etc), then end all other AIs, convince humanity to never make another AI, then end itself?
yes
as in, you like it?
replace the big corps' high level management/executives, board directors, politicians, and governments with AI
awesome. really pleasantly surprised how many people like it. gonna work to try to make the good future happen.
looking forward to this
Would Voicemeeter banana work with the GitHub voice changers and discord?
new eminem model incomin
as long as you route inputs and outputs correctly
voice changers input should be an actual microphone
voice changer's output should be a virtual cable
voicemeeter's physical input 1 should be virtual cable
https://x.com/tedclark32985/status/1887052641269653682?s=46
came across this very interesting and tried worked well for me, AI writing is growing
hi anyone online
hi wha is the best rvc ai app to use for free cause i want to change my voicce in live streams
Then you don't need RVC
RVC = Retrieval-based Voice Conversion, the best speech-to-speech AI (on v2). It runs inference (uses models) on pre-recorded audio, e.g. AI covers, and trains (makes) models
Wokada = uses RVC for realtime inference
@cedar cove tell your PC GPU in #help-w-okada
sent
so what is the best voice changer
Let's talk in #help-w-okada
which can i use in my live streams
I replied you there
It wasn't necessary to ping various helpers.
sorry
Use correct channels and elaborate. And don't ping random helpers
Have patience
we have told him to go to #help-w-okada for his topic
I'm working on a social media automation tool that uses openai API to generate and schedule posts over multiple networks at the same time
I call it MrPresident
DIlly ding, dilly dong! A new RegalHyperus drum model just released!
I Was Never There V2 (Drum model no. 584)
anyone know why it's doing this? UVR5 UI Huggingface space
2 year reply
z4to why am i being pinged here
and why am i in this server
i do not have a single recollection of joining or even asking of a roland voice
also gpu aborts tasks constantly
please there's like 6 helpers on rn where yall at
@stark scarab uhhh
I'm guessing you aren't sure?
oh nvm just checked your video
the audio input may be too long, the ZeroGPU duration in the UVR5 UI HF Space Code is 60 seconds, meaning that anything that takes more than 1 minute to process on ZeroGPU will give an aborted task error
not sure about the "KeyError" thing tho
I've had it work just fine on audio longer than that though, like over 18 minutes
can you try splitting the file or using a shorter file?
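One way to split the file locally before uploading, sketched with Python's standard `wave` module. It assumes an uncompressed PCM WAV input; the 45-second default is an arbitrary guess, since seconds of audio are not the same as seconds of ZeroGPU processing time, and `split_wav` is a made-up helper name.

```python
import wave
from pathlib import Path


def split_wav(src: Path, out_dir: Path, chunk_seconds: float = 45.0) -> list[Path]:
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            chunk_path = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Same channels / sample width / rate as the source; the frame
                # count in the header is corrected automatically on close.
                writer.setparams(params)
                writer.writeframes(frames)
            chunks.append(chunk_path)
            index += 1
    return chunks
```

The chunks are cut at fixed positions, so a cut can land mid-word; for separation or denoise work that is usually fine, but you may want to pick a chunk length that ends on silence.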
how long ago did you do that?
because before, the ZeroGPU duration was set to 300 seconds, but later it was changed to 60 so users can run more inferences, and bc ZeroGPU shortened the limit iirc
couple days ago
cringe
the duration has been changed on Jan 7 by this commit https://huggingface.co/spaces/TheStinger/UVR5_UI/commit/6c3badd8a1f54ec45a5b3973ffaafc7bb07a4cbe
did you inference the 18 min audio file after this date?
not their fault, HuggingFace has to set those limits because ZeroGPU is shared hardware
fair enough
I still have some
I can hand you the audio I'm trying to use
sure
cause is so fast
Denoise right?
Btw mel denoise is better
ah
yes that's the last step I need to do
alr
denoise lite removed some of the lines so I swapped to regular denoise
click your pfp at the top right, you can see your zerogpu quota
gimme a sec
it's at this because it keeps aborting task meaning I have to retry over and over


which one?
First one
this is the best reverb and echo one yea?
nope
xd
thx ^^
already have
Is there any image to video for free that is also a bit decent ?
I went to find a model that start with han, and his a dubbing model
do you know if I should change these or no?
I either am blind or this is brand new
For roformers it is better to leave it like that
π
Depends on which model you use
Overlap and segment size may improve results and coherence, in fact some models have been made to operate the best at specific settings ( but those are specific values, which you may find in their respective configs
hi im new to this server and came for ai voices where can i find some
lol
Hi, I'm working on a small project for a course about AI influencer perception and creation (it's entirely anonymous). Would anyone be interested in sharing their experiences?
Here are a few questions I'd be interested in:
• Do you follow AI influencers, like Lil Miquela or Aitana Lopez on Instagram?
• If yes, why? And to what extent does it matter to you that they are AI rather than real people?
• How do you interact with AI influencers?
• If you're a creator, what made you decide to create an AI influencer?
• Which social media platforms do you post on?
• How long have you been working on it?
• What was the creation process like? How did you decide on the influencer's appearance, and what were some challenges you faced?
• How has the reception and engagement been from users?
Thank you in advance for your help!
π€¨
i rember u
How so?
we are in like 5 of da same servers
question. anyone know why my custom voices are laggy?
What do you mean by your voice models are laggy?
like as its speaking its cutting in and out at like regular intervals
Do you use a voice changer?
im using mmvc
For W-Okada, go to #help-w-okada.
oh ty
hai i have an 7900xtx anyone has ggml for it
Dh
Can someone help me out? I installed VB audio virtual cable. I got the cable input to work but the cable output isn't detecting my voice
also do i make suggestions for ai voices in #1159516963014451302 or?
what to you want to run on it
i honestly forgot but now i switched to lm studio with sillytavern
what actually you are using it for
i mean which llm model are you trying to test and then fine tune it?
try deepseek r1 or qwq 32b in ollama
https://ollama.com/library/qwq
oh i did see QwQ in the downloads tap in lm is it any good?
am uh looking for model that has street accent not the formal way of speech like most models do
Very impressive
it beating a 600b parameters doesnt make any sense to me but i will take it
it is open source
hey, are you guys working on AI voice agent,
@polar flax can it do dis :3
well lol
if you know who has such an accent, go search in #find-models
or try a paid commission in #1191429836321849435 (it's unlikely anyone will be willing to do it for free)
or i can just train my own model no?
you can once you get some youtube source, etc.
Writing "no" at the end of a question sounds like you won't be able to do thing yourself.
hey
Type shii
go to #help-w-okada and read the pinned guide
chat i have a question

https://discord.com/channels/1159260121998827560/1341216399372062823 best female model out there
any new rvc ?
Oh hi junior admin.
I just left discord from few months. I'm busy in my studies and other stuff. So I'm not active on social media.
BTW I have installed F5 TTS and now I want to know how to use it for hindi language.
Hi friends, tell me, I've never used AI Voice. I want to make a female voice. Can anyone help? I would like to have a +- perfect voice
You can either read the docs and learn how to make the model yourself, post a free/paid request asking for that voice or check the #1191429836321849435 and DM any model master to make your desired model.
sorry if im asking in the wrong channel but, why is my neural network so bad at answering questions? here are some specifics:
Vocabulary size: 9556
56863 examples of questions
heres my loss and gradient values
21:13:48.819 Epoch 1, Batch 1625/1634, Loss so far: 23.0259 - Server - Trainer:1225
21:13:53.358 Pre-clip gradient norm: 37.631250419106365 - Server - Trainer:567
21:13:58.123 Pre-clip gradient norm: 32.19987408267905 - Server - Trainer:567
21:14:02.882 Pre-clip gradient norm: 35.69306949856445 - Server - Trainer:567
21:14:07.645 Pre-clip gradient norm: 33.79843070049427 - Server - Trainer:567
21:14:12.429 Pre-clip gradient norm: 43.21654651604242 - Server - Trainer:567
21:14:12.645 Epoch 1, Batch 1630/1634, Loss so far: 23.0259 - Server - Trainer:1225
21:14:17.215 Pre-clip gradient norm: 33.041006426518194 - Server - Trainer:567
21:14:21.983 Pre-clip gradient norm: 38.30495534611363 - Server - Trainer:567
21:14:26.750 Pre-clip gradient norm: 29.091560354366152 - Server - Trainer:567
21:14:29.685 Pre-clip gradient norm: 25.484252136092554 - Server - Trainer:567
21:14:29.900 Epoch 1 completed. Average Loss: 23.0259 - Server - Trainer:1237
21:14:29.901 New best loss: 23.025850929940734 - Server - Trainer:1242
21:14:29.901 Loading best model with loss: 23.025850929940734 - Server - Trainer:1270
21:14:29.901 --- Testing after training ---
21:14:29.901 Question: How are you
21:14:30.342 Response: jewel proclamation less fully yon chares knocking suicide wassails license desires forked desk waste villainy
21:14:53.980 Model saved successfully as: trainedModel_v2 in 43 parts.
Note: Gradient norm rises from 5 to 30!
i use sanity checks to make sure its learning and it always hits the max value for the sanity check meaning its not learning at all. I am using what chatgpt said to be the best learning method of: Adam optimizer
Would anyone be able to help? Thanks!
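One thing that might be worth checking in that log: the loss sits at exactly 23.0259 on every line, and that number equals -ln(1e-10). The sketch below compares it with the loss of a uniform guess over the 9556-word vocabulary. The 1e-10 floor is an assumption about how this trainer guards its log, nothing in the log confirms it.

```python
import math

vocab_size = 9556        # from the message above
reported_loss = 23.0259  # from the training log above

# Cross-entropy of a model that guesses uniformly over the vocabulary.
uniform_loss = math.log(vocab_size)

# Cross-entropy when the probability of the correct token is clamped to 1e-10.
# ASSUMPTION: the trainer uses something like log(max(p, 1e-10)).
clamped_loss = -math.log(1e-10)

print(f"uniform-guess loss: {uniform_loss:.4f}")
print(f"clamped-floor loss: {clamped_loss:.4f}")
print(f"reported loss:      {reported_loss:.4f}")
```

If the assumption holds, the model is giving the correct word essentially zero probability on every batch, which is far worse than random guessing. That would point at a saturated softmax, an exploding update, or a label/index mismatch rather than at the choice of Adam itself.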
Damn I wish i could know what u just type
Damn
It's so much complicated for my smol
1 cell brain

21:14:29.901 New best loss: 23.025850929940734 - Server - Trainer:1242
21:14:29.901 Loading best model with loss: 23.025850929940734 - Server - Trainer:1270
21:14:29.901 --- Testing after training --- - Server - Trainer:1280
21:14:29.901 Question: How are you - Server - Trainer:1405
21:14:30.342 Response: jewel proclamation less fully yon chares knocking suicide wassails license desires forked desk waste villainy - Server - Trainer:1408
21:14:53.980 Model saved successfully as: trainedModel_v2 in 43 parts. - Server - Trainer:1332
GAH its horribel
Yep. I have installed it, but it's not working properly. It sounds bad and the pronunciation isn't good; sometimes it repeats words, and sometimes it starts speaking text from the reference text.
Is there any proper guide for setting this up? How do I set up the models and the ASR models? How can I use it to its full potential for better results?
f5 is a new tts, there are some bugs
there's a length limit for inference
maybe try kokoro
But it doesn't have hindi ?
Can you send me the github link ?
pip install "kokoro>=0.8.4" soundfile
Just for information. Fish tts is good or not ?
https://github.com/nazdridoy/kokoro-tts
So this is kokoro ?
https://hf.co/hexgrad/Kokoro-82M. Contribute to hexgrad/kokoro development by creating an account on GitHub.
So what is this
you can also try the zerogpu space in https://huggingface.co/spaces/hexgrad/Kokoro-TTS
I Want to use it locally.
@queen kernel this is the original repo to clone
I see.
Who is him?
his dad
hello, I want to ask something, can I use the voice models in this server for tts? if yes then what software do I need? I know how to use them in rvc, but can I also use them in tts? Or is that a whole different thing?
You've asked many questions at once, but let me answer each one for you
- Yes, you can use RVC voice models found in #1175430844685484042 with a TTS program, but you'll need a specific program for this.
- The most recent program anyone can download and use is Applio, an RVC program. It has TTS built in.
- RVC is speech-to-speech, while TTS stands for text-to-speech.
you can only use GPT sovits models in #1175430844685484042 for TTS
or find another server that supports more on TTS
alright thanks!!
But so-vits-svc and GPT-SoVits aren't the same thing. 
okay thanks!!
ikik
Idk him

He dm me
If someone from a server you're in sends you a direct message and you don't even know who they are, it could be spam or a scammer asking for something.
Well, I've seen your screenshot in my direct message. It seemed like he's trying to Diddy (groom) you online, thinking you are a girl.
report him to our staff
Ok
In this case, you can report the incident to the moderators here. People like this should not be here on Discord.
steady background noise parts without voice or any distinct sounds
Hello! Can you tell me if I get this error at Preprocess stage in Applio - “Error processing audio: Unable to allocate 5.62 GiB for an array with shape (2880, 261965) and data type float64”. Can I process the files piecemeal instead of all at once?
what's ur pc gpu and what are u using
try re-export the dataset wav file(s) using audacity in 32-bit float WAV (not 64-bit lmao)
With 64-bit float or float64 data type, you'll get the larger file size for that.
32-bit float wav is always recommended.
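For reference, the 5.62 GiB in the error is simply the array's element count times 8 bytes per float64 value. A minimal sketch of that arithmetic follows; `array_bytes` is a hypothetical helper, and nothing here claims to show how Applio itself allocates memory.

```python
import math

GIB = 1024 ** 3


def array_bytes(shape, itemsize):
    """Bytes needed for a dense array with the given shape and per-element size."""
    return math.prod(shape) * itemsize


shape = (2880, 261965)  # shape quoted in the error message
print(f"float64: {array_bytes(shape, 8) / GIB:.2f} GiB")  # float64: 5.62 GiB
print(f"float32: {array_bytes(shape, 4) / GIB:.2f} GiB")  # float32: 2.81 GiB
```

The second dimension grows with the length of the audio being processed, so shorter files mean smaller arrays regardless of the bit depth the WAV was saved in.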
Oh, really, I'm sorry. Thanks all!
even 32-bit has overkill dynamic range but still better than pcm formats that clip samples above 0 dB
Oh, no... The 32-bit float file got even bigger when exported. Turns out I was using 16-bit PCM before

Maybe because the long audio file is about an hour long? About 600-700MB. Total dataset size 20 hours
it should be 700 MB for 1.5 hr audio and it should work
doesn't make sense if it's 20 hrs, unless in mp3 format which is also not ideal to do
Hmm it's just that if I try to do it with only half of the dataset there is no error, so I thought it made sense. If I export the file to 32-bit float, the file size becomes 1.5GB
because it is stereo, but it will always be converted to mono in preprocessing
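The file sizes mentioned above can be sanity-checked with a quick calculation: uncompressed WAV size is roughly duration × sample rate × channels × bytes per sample. The sketch below assumes a 44.1 kHz sample rate (the dataset's real rate is not stated here) and ignores the small WAV header; `wav_data_bytes` is a hypothetical helper.

```python
def wav_data_bytes(seconds, sample_rate, channels, bytes_per_sample):
    """Size of the raw sample data in an uncompressed WAV (header ignored)."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


ONE_HOUR = 3600
RATE = 44_100  # assumed sample rate; not stated for this dataset

for label, channels, width in [
    ("16-bit PCM, stereo", 2, 2),
    ("32-bit float, stereo", 2, 4),
    ("32-bit float, mono", 1, 4),
]:
    size = wav_data_bytes(ONE_HOUR, RATE, channels, width)
    print(f"{label}: {size / 1e6:.0f} MB per hour")
# 16-bit PCM, stereo: 635 MB per hour
# 32-bit float, stereo: 1270 MB per hour
# 32-bit float, mono: 635 MB per hour
```

Under these assumptions, a stereo file of a bit over an hour lands around 600-700 MB at 16-bit and roughly doubles at 32-bit float, which is in line with the sizes reported above.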
@covert lake I need line of code or that 1 file that can fix the split bug infer for Applio Kaggle
or if anyone here know it
can send
I told you to use #ai-help not here
And no I don't know about it, maybe @chilly lake does and can help you in that channel
Be patient and use the right channels pls
Noobies is an applio dev so maybe he knows the fix
should've said yes when codename offered me to fix yesterday but it was nighttime
the fix is in the main branch
yea delete this highlighted part to use the main branch
highlighted.. right?
yea that in kaggle
also this part may not work with main
lowkey why didn't they use the main branch in kaggle
it is experimental
are these things for training only
yes
Oh no.. I'm dumb. It seems trying to make a 48k model gave errors at preprocess stage because Sample Rate 40k works fine
Hello
I'm looking for an experienced Full Stack AI Engineer.
what you'll do
- Develop and optimize the platform's backend and frontend components, ensuring high performance and scalability.
- Implement natural language query capabilities, integrating AI models to enhance system intelligence.
- Process and visualize satellite imagery using proprietary algorithms for geospatial analysis.
- Improve database architecture for efficient data retrieval and real-time analytics.
- Work closely with data scientists to transition Jupyter Notebook-based Python scripts into frontend JavaScript for seamless visualization.
- Design and implement interactive map-based visualizations using Mapbox or similar technologies.
- Develop features such as comparison tools for analyzing environmental changes over time.
- Collaborate with cross-functional teams to ensure smooth integration of machine learning models and geospatial analytics.
- Optimize platform performance by identifying and resolving bottlenecks in data processing and rendering.
requirements
- Strong proficiency in Python, particularly for geospatial or machine learning applications.
- Experience with frontend development, ideally using Next.js or React.js (flexibility in frameworks is welcomed).
- Solid understanding of database structures, optimization, and performance tuning.
- Familiarity with geospatial analysis tools and libraries (e.g., GDAL, GeoPandas, QGIS, ArcGIS, Mapbox) is a plus.
- Strong computer science, engineering, and problem solving skills equivalent to that of a solutions architect or systems designer.
- Strong interest in satellite imagery, developing GIS applications and AI.
- Ability to work independently and proactively identify technical improvements.
- Familiarity with UX/UI principles and ability to enhance visual presentation of geospatial data.
If you're interested in this position, pls DM me. Let's connect!
Ramadan mubarak
ramadan mubarak to you too talha
hi
what was the name of this program? I deleted it and forgot its name
could you write it out?
Ramadan mubarak!!!
just a way to say happy ramadan
aaaand.. ramadan is?
like yall say merry christmas ig
ah
a holy month in islamic calendar
Right
Oooo
fasting 30 days straight... but worth it
oh yea, think I remember hearing about it somewhere
Anyway, thanks for letting me know
aye yw
it is similar to lent in Christianity
Hey there ,does anyone from India here
yo
hopefully not
uah
I'm not from India.
Does anyone knows or have idea of why this is the most popular model on weights.gg?
like, seriously, I dont get it
meme from 1 year ago
Saiba Momoi is a character from the mobile game Blue Archive. People are mostly using it for some memes like "oh my gotto, its thing" and "Nipah".
oh... i see
its funny, cus I never heard of this meme.
but the villager (which i know well was widely used as a meme for some reason) has half the uses
something different to see to say the least
I just saw this and downloaded it, now i don't know where the model is 🤡
Do i need to download it separately? if yes where?
you have got a nice console/pc, and now you need your favorite games to play that aren't pre-included 🤡
such as https://huggingface.co/GaboxR67/MelBandRoformers/
Used a lot in memes
Hello. Can someone please help me to install kokoro tts ?
pip install kokoro soundfile
Racist jokes n word


Momoi
Some people I know in 2025 - luther (Kendrick Lamar AI Cover, vocals only) 




That's all ?
well, you need python installed, use 3.10 or 3.11
Okay. Lemme try
hey weights deleted the option to use youtube to choose a song for ai song creation. any new good apps for this purpose?
i recommend 3.11
3.11 has better error messages
just download the youtube video urself and use it as an input in weights.com
either pay for youtube premium, or use cobalt, or yt dlp, or literally google "free youtube video downloader site"
and get a virus
I think I used this https://github.com/StefanLobbenmeier/youtube-dl-gui
Would anyone help me find a simple male voice model? No celeb, no anime, no weird voice. Just high quality normal speaker
You ate with that
Fucking slayed
Yt dlp and cobalt are safe
You can search rvc ai voice models at:
- #1175430844685484042
- In #find-models, do /find with @hidden grotto
- https://weights.com/ (login required)
- https://huggingface.co/models (but watch out cus in hugging face there arent only rvc ai voice models)
- https://voice-models.com/
- https://thevoicemodels.com/ (for Turkish Models, login required with discord and level 2 on their server)
if there isnt one, you can:
- #1159289738314919936
- #1191429836321849435
- make it yourself with our docs guides https://docs.aihub.gg/essentials/how-to-make-voice-models/
:wave: @covert lake, How can I help?
Available Commands:
• @weights find <query> or /find <query> - Search for RVC Voice Models
• /create - Create an AI Cover
• /image - Generate an Image
AI has to be trained on something
i mean googling something 'free' and ending up on a download page with 10 fake 'Download' buttons with virus links
Smh
I know where and how to search, I am just not able to find a simple male speaker with not overused voice.
do you have any model of calin georgescu?
i see
whats the best vocal isolation today that is free?
uvr / mvsep
edit 13.03.25 deton24's Instrumental and vocal & stems separation & mastering (UVR 5 GUI: VR/MDX-Net/MDX23C/Demucs 1-4, and BS/Mel-Roformer in beta MVSEP-MDX23-Colab/KaraFan/drumsep/LarsNet/SCNet x-minus.pro (uvronline.app)/mvsep.com/ GSEP/Dango.ai/Audioshake/Music.ai) General reading advice | D...
This should be helpful
Overviews, info on models used in uvr / mvsep and much more. Generally a 101 guide
Other than that, there's " audio separation " discord server ( google it ) where uvr and mvsep devs are, helpful and informative community
But I personally recommend using gabox's voc fv4 model for vocals / voice
thanks!
and right now whats the best free way to train a model, havent done that for months
you mean rvc models?
yes
Well, if not locally then colab or kaggle really
But new in what way? What are your expectations?
If you're asking quite literally about something better than rvc / applio itself? then no
i mean something where i shouldnt have to work too much
like
drop a clean 30 minutes file
and then wait
🥲
Unfortunately no, training good models requires a bit of work
there is an easy guide?
its updated?
I think so ye
In any case, you can leave a msg here and some helpers ( hopefully ) could help you out with stuff
Alternatively, ask in #ai-help
as in?
you mean in terms of ripping the audio?
like if the audio from youtube is already compressed by them, then am i only downloading a larger file but still with m4a quality?
oh
the thing with youtube is
any audio people upload, ends up getting dynamically compressed ( volume dynamics ) and undergoes general compression ( codec wise )
all of that is either opus or aac
( it's why people shouldn't use stuff like yt to mp3, because you'd further compress the opus or such to mp3 )
yt-dlp imo
cli tool for downloading
the command would be:
yt-dlp.exe -x URL
the -x argument (short for --extract-audio) tells the program to keep only the audio; by default yt-dlp already fetches the best available quality from their servers
mostly it's .opus or .webm ( which will contain opus )
rarely aac
then you'd use ffmpeg to convert opus to wave
Here, you download the .exe
https://github.com/yt-dlp/yt-dlp
You run it like so
cmd can be opened in the address bar
Here
the url is your youtube video's link
then ( as long you have ffmpeg installed and properly added to path / configured )
oh ok
i think i downloaded ffmpeg in the past
input is whatever yt-dlp downloaded, and output, yea, is the converted wave file ( just name it whatever you want and have .wav at the end )
so how the whole line should be?
yt-dlp.exe -x URL
for downloading stuff
ffmpeg -i input_from_yt.opus output_from_ffmpeg.wav
For conversion of opus files to wave
You'll then get a wave file ( add -ar 44100 before the output name to get 44.1khz, since opus normally decodes to 48khz )
( and keep them that way end-to-end. If you work on those files or process / denoise or whatever, always export them as 44.1khz wave. Those will go to rvc )
And that's pretty much all there is to it
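The two steps above (download with yt-dlp, convert with ffmpeg) could be scripted roughly like this. It is only a sketch: it assumes `yt-dlp` and `ffmpeg` are both on PATH, the helper names are made up for illustration, and the `-ar 44100` resample is an added assumption to force a 44.1 kHz output.

```python
import subprocess
from typing import List, Optional


def build_download_cmd(url: str) -> List[str]:
    """yt-dlp command that keeps only the audio (-x is short for --extract-audio)."""
    return ["yt-dlp", "-x", url]


def build_convert_cmd(src: str, dst: str, sample_rate: Optional[int] = 44100) -> List[str]:
    """ffmpeg command converting src to dst; -ar sets the output sample rate."""
    cmd = ["ffmpeg", "-i", src]
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]
    cmd.append(dst)
    return cmd


def run(cmd: List[str]) -> None:
    """Run a command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


# Example (not executed here):
# run(build_download_cmd("URL"))
# run(build_convert_cmd("input_from_yt.opus", "output_from_ffmpeg.wav"))
```

Passing the arguments as a list avoids shell quoting problems with long or oddly named downloaded files.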
Nothing too crazy
yes
It downloads the yt audio to that folder
i downloaded right now the first clip
yes, check its properties
opus
yes, in that case:
ffmpeg -i yourfile.opus yourfile.wav
-i is an argument for input
I name my stuff as songWAV.wav ( the output from ffmpeg )
to avoid confusion
so i need to first change the file name?
nope
because its too long
just add wav suffix
before extension
gonna help you keep it clean
if you download a lot of stuff ( and keep opus copies )
tho ye, you can rename stuff ofc
for the output the name doesn't matter
lets say i download via yt dlp a file that his name is: blabla.opus
ye, then that's for input, output you can name it whatever
whats the line will be?
ffmpeg -i blabla.opus blablaamazing123.wav
Hello
oh i see
yup, pretty simple
I need some sample data to practise creating a chat agent, such as a business, and I need a lot of them so I can create a lot of chatbots. Could you guys please help me with this?
hey guys what is the best model for realistic female voice ?
is there a locally running app of any sort that lets me inpaint/remove items from videos?
I don't think there's any AI tool that can edit videos in bulk. There are websites that generate video after a frame of a video or an image.
only those that can swap faces
Bandidu não dança dança [Bandidu doesn't dance, dance]
Bandidu ginga e balança [Bandidu sways and swings]
fr whatever that means
What is our thoughts on Grok 3?
Never used Grok.
Microsoft Copilot 
hot take: gabox fv4 is the best model
What I have to do after this ?
I have installed kokoro
activate the virtual environment if you made any, then use a script
each line comes out as a separate file, but they can be merged into one
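Merging the per-line files into one can be done with Python's standard library alone. A minimal sketch, assuming every chunk is a plain PCM WAV with the same channel count, sample width and sample rate; `merge_wavs` is a hypothetical helper, not part of kokoro.

```python
import wave


def merge_wavs(paths, out_path):
    """Concatenate PCM WAV files that all share the same audio format."""
    fmt = None
    chunks = []
    for path in paths:
        with wave.open(str(path), "rb") as src:
            this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path} has format {this_fmt}, expected {fmt}")
            chunks.append(src.readframes(src.getnframes()))
    if fmt is None:
        raise ValueError("no input files given")
    with wave.open(str(out_path), "wb") as dst:
        dst.setnchannels(fmt[0])
        dst.setsampwidth(fmt[1])
        dst.setframerate(fmt[2])
        dst.writeframes(b"".join(chunks))


# Example (file names are placeholders):
# merge_wavs(["0.wav", "1.wav", "2.wav"], "merged.wav")
```

The format check is there because silently joining files with different sample rates would produce audio that plays at the wrong speed.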
I was waiting for your response π
Thank you
why, the github has an example script
Got an error
Espeak is not installed
That's why I was waiting for your reply
What to do
read the github page
I have installed espeak
environment variable
I just created environment variables but with different names. My bad
use a new terminal window after that
I restarted my PC.. lemme see if it works.
I was asking questions in the kokoro discord server and they were very rude to me 🥺
It said failed to load voice "hi"
What?
You should specify what "this" is but i assume you want realtime voice changer
https://rentry.co/ForkVoiceChangerGuide
Download AMD version, virtual cable, read audio setup, model upload etc.
For questions ask in #help-w-okada
Thanks vtarcelia for corrections, Nick088 for contributions. Most technical information comes from deiteris.
Latest Version b2332 from December 2024
RTX 5000 series support is here, but not integrated into w-okada itself, it is a stand-alone release. You can get it from here
Translations (outdate...
ok! thanks
you need to actually use 'h' as lang code, not 'hi'
and the voice name from the list
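To illustrate the point about using 'h' rather than 'hi': Kokoro's voice names appear to start with the language letter (for example `af_heart` for American English), so the language code can be read off the voice name. The sketch below is based on the usage example in the kokoro README (`KPipeline`, 24 kHz output); the voice name `hf_alpha` is an assumption, so check the project's voice list, and `lang_code_for_voice` and `synthesize` are hypothetical helpers.

```python
def lang_code_for_voice(voice: str) -> str:
    """Return the single-letter language code from a voice name like 'hf_alpha'."""
    if len(voice) < 3 or voice[2] != "_":
        raise ValueError(f"unexpected voice name: {voice!r}")
    return voice[0]


def synthesize(text: str, voice: str = "hf_alpha", out_prefix: str = "out"):
    """Write one WAV per generated chunk and return the file paths."""
    # Imported lazily so the helper above works without kokoro installed.
    from kokoro import KPipeline
    import soundfile as sf

    pipeline = KPipeline(lang_code=lang_code_for_voice(voice))
    paths = []
    for i, (_graphemes, _phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = f"{out_prefix}_{i}.wav"
        sf.write(path, audio, 24000)  # 24 kHz, as in the README example
        paths.append(path)
    return paths
```

Hindi also needs espeak-ng installed and discoverable, which this sketch does not check for.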
I want a very good female voice model
I did the same. But getting error failed to load voice "hi"
you should've provided the full error, not just the last line
Oops sorry.. lemme send you
is the key, it is an espeak problem
so hindi language goes thru some weird phonemizer and you need that installed and I have no idea what
you need to post an issue on kokoro's github
someone may answer
jakkari jakkari
is there an ai site or something like that that can help me modify some text on a photo
?
could someone suggest me a available voice model that sounds more like a matured man with deep voice like a man
you mean image generator
elaborate
Elaborate the issue in specific help channels based on the software you need help with
Junior admin
Ok
somebody have juan gabriel link of hugging face
Hello. I am new to using ai tools and was wondering if i could be given any tips on how go get into new ais and the idea of using other AI than chat gpt. The only(almost only other uses were major)tool i have used so far is chat gpt for coding.
And also have gotten advice and information from it
I would love advice on AI for coding. i don't know much coding so I would love advice on AI that help you understand. I am willing to also purchase with money premium versions of AI at a price of 20$ a month
claude 3.7, if you want the project feature, check here
https://tactiq.io/learn/chatgpt-project-vs-claude-project
Hi. I am also new to using ai tools. I want to learn about generative ai to build a tiktok channel using ai to make video , but i dont know where to start . Can anyone give me some advices. ( btw i dont have any background on Ai,)
Tone it down
π€ now that's some crashing out
counter strike might be stressful m8 but you should chill
is Ilaria RVC on huggingface.co down ? been a week i cant convert
yall know a free alternative to Krea AI Training? Like a model that you feed it images and it creates images like those
comfyui, fluxgym, etc
a
any online one? I see both of those are like code or something
Weights.com can train a Flux.1 image model there.
yo guys do you know any website i can turn MIDI files into mp3 vocals?
preferably free
synthv, vocaloid, or cevio

You can use a soundfont full of spoken vocals, and convert that MIDI file into mp3.
He might not know how to use them. But yeah those work as well boss
hey....
hey ! i am new here . i am software engineer . i am working on model training projects ..
Awesome welcome !
guys whats the difference between w ocada voice changer and rvc???
Rvc is for music and vocal training and Inferencing music
Ocada is a real time changer to sound like the desired person on the spot
Via game discord etc
okk
TTS is written text to voice
If you need any help getting started feel free to reach out
The correct name for the realtime voice changer program is W-Okada, not W Ocada.
so i should use w okada to change my voice on discord calls right
Yes
plus is there an easier software other than okada
like catfish or i forgot the name?
Don't use W-Okada for catfishing someone.
oh ok
when ill set it up and if i need help ill ping u
Don't use Voice.ai. It is a scam site that tries to eat up your PC's resources more than W-Okada does.
For W-Okada, let's go to #help-w-okada.
wdym more than w okada?
Performance. What do I mean?
what abt this one?
ok if i press stop in there
Don't send a YouTube link without the context here.
then my pc wont be used up right?
Apply still isn't working just letting you know
right?
what apply?
Oh no, just talking about something else
You're fine!
Thanks for being so kind
I'll be here for any needs and so will other kind members
Like namari
ok
Shit. For installing and such about W-Okada, go to #help-w-okada. The website for mod/helper application for this server is broken right now.
@strange wraith are u a chat bot? i think u know every thing
Nah bro I just been in so for a min just trying to help
I used to struggle soo much with this shit
I'd just take it as an insult if you call me like that.
Just like helping out
It's okay ahah. Take it how you want it just wanna treat others how I would like to be treated
no no, i am really sorry if you feel like that..
Nooo I took no disrespect



