#suno-showcase
Mi hermano is very inteligente, pero sometimes he's un poco terco.
The word spacing around he's is a bit choppy in terco_c
WE HAVE ACHIEVED KNOCK KNOCK GENERATION. Last text prompt is "Suno who?" The research continues.
(It looks like an accident, but I laughed, so I'm counting it.)
I know this isn't what the model is for but I LOVE IT SO MUCH
There are audio foundation models, sure. But why not generate the content too? Even MORE foundational!
I won't rest until SUNO is generating SVG code
I've been randomly throwing wrenches into the sampling to encourage more unprompted content, sort of vaguely beam search-y
but with no theoretical grounding and my poor technical skills, mostly trial and error lol
@tardy topaz, you inspired me to try "Why was six afraid of seven?"
And here are some fun continuations - grasping for punchlines
It absolutely kills me, I get the same thing. It's like you called on a student in class who wasn't paying attention, or the teleprompter died, and they're stalling.
Has anyone tried [cackling]?
You got any samples?
Unfortunately not
https://old.reddit.com/r/ContagiousLaughter/top/?sort=top&t=all may be a good source
I meant of generations resulting from using [cackling] lol
Here's a pretty funny one (took maybe 10 tries to get this one)
cursed
Maybe it will be able to do all of these someday
this thing is amazing, love it
dang, pretty good @deft grotto, you running locally?
no, I'm trying to do that now, here is the link: https://github.com/JonathanFly/bark
ah thank you. I'm digging into it now, I'll let you know if I get it working
If you only git clone the bark repo, make sure you run 'pip install .' while in the main /bark directory. That's the only install, though you need to set up CUDA and all that stuff if you haven't.
Oh wait, Suno can't clone?
it's just TTS with supplied voices?
hope it gets updated to support cloning, i'll just stick with so-vits-svc
I've been fucking with so-vits-svc, works pretty good. It doesn't have text-to-voice, right? It's just cloning?
yeah voice2voice
Does anyone know how to clone my voice?
I don't see why they'd gimp this so hard; all it'll take is another company releasing it without synthetic restrictions
I suppose we could just pipe the output of bark to so-vits-svc with a trained voice model so you get the unique intonations
This is exactly what I was going to do
With a final pass through https://podcast.adobe.com/enhance
Suno AI + Koe AI
what's Koe AI?
How did you clone the Jay-Z voice? Is there a working tutorial anywhere?
hi
hi, my name is John
hi, could you please tell me how to use it?
you have to install it and it's a subscription, no thanks
Got something completely random as my first attempt. Not related to the prompt at all. Still very much clear though and not nonsense for the most part:
What I hear:
Man1: "Hmm... Heeheehee, heehee yeah, he may be uh, I guess something different"
Woman2: "*soft gasp* That explai-"
What was your prompt text?
I got something similar too which really surprised me
Hmm, not sure if I can share, as it had at least one curse word, which I'm not sure is permitted here or gets auto-blocked by the bot. It was also quite long.
But basically it was a back and forth between a female and male speaker about recording voice lines and about the oddness of the location in question.
I think the long prompt may have been a cause of the weirdness, and maybe too many hesitation commands, and a [throat clear] early on.
yeah, long prompts sometimes lead to the model going completely off the rails. And I also noticed that tags early on sometimes create issues (I think it has to do with our data prep)
It's too bad it's only (currently) capable of generating around 13 seconds of audio. I would've loved to have heard more of what these mysterious AI ghosts were talking about. It sounded quite interesting 
hehe, yeah in our internal generations it's super fun to listen to minutes of fully generated stuff without text prompts. goes anywhere from sermons to music to arguments
we'll get to release stuff like that at least in the studio soon hopefully. just have to iron out some scaling kinks so it doesn't topple over when people try it
I used to spend a lot of time generating Stable Diffusion images from blank prompts, so it'll be fun to do something similar with sound rather than images. In fact, it might even be fun to listen to in the background when doing visual tasks. Can't wait to hear those kinds of things.
Is there currently a way I'd be able to rig the colab to continually run a prompt by outputting audio files one after the other upon completion? It'd be tremendously slow, but it'd at least achieve that kinda thing and may help with long prompts.
Hm, not too familiar with Colab and when it kicks you off, but you can probably just write a loop and save the output someplace, maybe even upload it to Drive or something. They also let you upgrade the GPU for faster inference, etc.
I have Colab Pro, so I don't have to worry about that stuff for the most part, thankfully.
Awesome!
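A minimal sketch of the Colab loop described above. It assumes Bark's `generate_audio` and `SAMPLE_RATE` (as shown in the Bark README) and scipy's WAV writer for saving; both are passed in as plain functions so the loop itself has no dependencies, and the `clip_00000.wav` naming scheme is a hypothetical choice.

```python
import itertools
import os
from typing import Any, Callable, Iterator, Optional


def generate_clips(
    prompt: str,
    generate_fn: Callable[[str], Any],      # e.g. bark's generate_audio (assumption)
    save_fn: Callable[[str, Any], None],    # e.g. a wrapper around scipy's write()
    out_dir: str,
    max_clips: Optional[int] = None,
) -> Iterator[str]:
    """Generate audio for `prompt` over and over, saving each clip as it finishes.

    Runs until interrupted when max_clips is None. Yields each saved path so
    the caller can upload or listen to clips while the loop keeps going.
    """
    os.makedirs(out_dir, exist_ok=True)
    indices = itertools.count() if max_clips is None else range(max_clips)
    for i in indices:
        audio = generate_fn(prompt)
        path = os.path.join(out_dir, f"clip_{i:05d}.wav")  # hypothetical naming
        save_fn(path, audio)
        yield path
```

With Bark loaded via `preload_models()`, usage might look like `for path in generate_clips(text_prompt, generate_audio, lambda p, a: write_wav(p, SAMPLE_RATE, a), "/content/drive/MyDrive/bark_clips"): print(path)`, where `write_wav` is `scipy.io.wavfile.write` and the Drive path is only an example. Because it is a generator, nothing runs until you iterate over it; writing to a mounted Drive folder keeps the clips if the Colab session gets recycled.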
Hmm, this is one of the failure modes we've been hearing about for "harder" prompts. We are thinking about how to prevent this more generally
Creepypasta
prompt
I don't remember specifically. Generated a ton since then.
no laughs, and a shhhh at the end that's not in the text prompt
text_prompt = """
Hello, my name is Salah. And, uh... and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
i got similar artefacts
'hallucinations' i assume
started out good, but decayed
WOMAN: Yabadaba doo! I like Tick Tock Clocks.
Result: ??? Wat
lol
I have now listened to more than 200 "Why was 6 afraid of 7?" completions, and not a single actual joke yet. KNOCK KNOCK and "Why did the chicken cross the road?" had a much higher hit rate. They are hilarious though.
lol
It is interesting~
we could use it to generate rap style
Nice, are you using a custom voice ?
It's my prompt, I use [rap] to include the lyrics
text_prompt = """
[rap]
You pray for my demons, girl, I got you [music]
Every time I sip on codeine, I get vulnerable
I'm knowin' the sounds of the storm when it come [music]
She understand I can't take her everywhere a nigga going
I been in the field like the children of the corn[rap]
"""
I think we could use [rap] or other song-style tags to make the generation more like singing
yo that bassline haha
what's the prompt for this?
oh I think I should put my audio here
this is cool almost
pino
FINALLY WHAT I NEED
christ, that took a while
What's the prompt strategy you're testing? No brackets?
yeah
brackets don't work
for me at least
i do this
- 1960's breakbeat solo *
with asterisks
and it seems to work better
Where are the asterisks?
* 1960's breakbeat solo *
hey, how do you make that bar appear like that in Discord? I've always wondered
haha
Left of the one key
man repeatedly hits an out-of-tune snare, says "10 seconds", then continues
just hit 3 underscoredots bro
*1960's drum solo* 7 seconds
Yeah, it works well to build a sample library for sure, where you can just say 'give me 100 tries' and find good ones
yeah
I want an AI VST that will give you good-sounding instrument samples that sound the way you describe them. Probably won't be a thing for at least another year or so, though
So all of these gens sound very robotic and tinny, why is that?
Much more so than tortoise for instance.
I think it's funny how he says "10 seconds" at the 10-second mark
it screams when I don't want it to
ayo lemme play some goddamn 1 and 2
oh ok
better
it's really having trouble
apparently the yume nikki soundtrack
i actually could use this
for like a choir
?
Not getting what you intended is still a lot of fun:
https://twitter.com/jonathanfly/status/1649923447668047872
In random mode Bark may decide to interpret a song as a duet, a line from a song as a shout from a snarky audience member, or text not tagged at all as music as music anyway. Makes for fun accidents.
the song ironic is ironic because nothing in the song is true irony
[VOLUME WARNING - Screams at the start]
using [rap] in the prompt gave me an insane intro lol, the boom after the scream
Anyone here figure out how to do voice to voice with this?
I don't think you can
It's possible, but you have to do a lot of things
hah, amazing!!
Thank you for creating this library
Very exciting stuff. I hope to follow how it evolves over time!
UFO Sightings
look it was worth a shot
damn, how do you get it to generate more than 14 seconds?
oh sweeet
where do I input these things? do I just put it in the text prompt thing?
python bark_perform.py --use_smaller_models --text_prompt "abcd"
can I do this within the notebook?
wait fuck
it doesn't give you the infinity notebook
if you want audio longer than 13 seconds, then:
python bark_perform.py --use_smaller_models --text_prompt "abcd" --split_by_words 32
you can use the smaller models with CPU
that ending omg why 🤣
2 jokes
This is just an idea, but is it possible to make this model follow instructions, such as asking for a song and the model sings it? Given that the model adds sounds and weird stuff on its own, it should be possible for it to learn to respond, right?
Someone had to do it sooner or later
Thanks
...
Bark would be perfect for an AI-generated soap opera with ridiculously melodramatic actors
how did u get these results?
I don't have the prompt saved, but from memory I just put [rap] before the text. That was probably the only good result out of ten, though
it sounds amazing!
I need me a Minerva
halfway through this gets good
how did you get to 48 seconds?
can someone help me with getting this running?
Just figured out how to make it accept insanely long text files as input in Google Colab,
used a quick excerpt from Dune as a test case
to make it more seamless, I tried splitting it into sentences rather than by word count
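One way to split by sentences instead of by word count, sketched with a simple regex. This is an assumption, not the poster's code: it treats `.`, `!` and `?` followed by whitespace as sentence ends, so abbreviations like "Dr." will split too. Whole sentences are packed into chunks up to a word budget; the default of 30 words is a guess aimed at Bark's roughly 13 to 14 second window, not a tested value.

```python
import re
from typing import List

# Split after ., ! or ? followed by whitespace. A simple heuristic, not a real
# sentence tokenizer (it will also split on abbreviations such as "Dr.").
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_by_sentences(text: str, max_words: int = 30) -> List[str]:
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own chunk instead
    of being cut in the middle.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0
    for sentence in _SENTENCE_END.split(text.strip()):
        n = len(sentence.split())
        if n == 0:
            continue  # skip empty input or stray whitespace
        if current and current_words + n > max_words:
            chunks.append(" ".join(current))  # budget exceeded: close this chunk
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentences whole means each generation starts and ends on a natural pause, at the cost of uneven chunk lengths.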
did you use a history prompt for that @pale olive?
actually I just updated my own version of it to use the history prompt parameter in "bark infinity", where it does use the history. It runs perfectly well, and I added the change to the README. Trying to figure out how to push the change, never done this before lol...
haha
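A sketch of the history-prompt chaining being discussed, assuming Bark's `generate_audio(text, history_prompt=..., output_full=True)`, which returns a `(full_generation, audio_array)` pair whose first element can be passed back in as the next `history_prompt`. The function name and the choice to chain from the previous chunk (rather than always reusing the original speaker) belong to this sketch, not necessarily to how bark infinity does it.

```python
from typing import Any, Callable, List, Optional, Tuple

# Mirrors the assumed call bark.generate_audio(text, history_prompt=..., output_full=True)
# returning (full_generation, audio_array).
GenerateFn = Callable[..., Tuple[Any, Any]]


def generate_chained(
    chunks: List[str],
    generate_fn: GenerateFn,
    speaker: Optional[Any] = None,
) -> List[Any]:
    """Generate each chunk, feeding the previous chunk's output back in as history."""
    history = speaker  # first chunk uses the chosen speaker preset (or none)
    pieces = []
    for text in chunks:
        full_generation, audio = generate_fn(text, history_prompt=history, output_full=True)
        pieces.append(audio)
        history = full_generation  # next chunk continues from this one
    return pieces
```

With Bark, `np.concatenate(generate_chained(chunks, generate_audio, speaker="v2/en_speaker_6"))` would give one continuous array. Chaining carries tone across each join, but any drift also accumulates; passing the same speaker preset every time is the alternative if the voice wanders.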
Quite good. Which speaker is this?
Any suggestions for female voices?
so I created a fork with this modification..... not sure what I'm supposed to do after, tbh
Louise Belcher from Bob's Burgers?
oh idk never tried them,
timid_jane
thanks man @jagged dragon
I tried out a few things and got it to generate this amazing audio XD
gotta love when it starts out nice and then suddenly transitions into an ear piercing screech
XD
[very shocked gasp] [clears throat] [screams] [dies] [bangs hands] [clapping sounds]
anybody willing to give a Python noob a hand getting this up and running in VS Code on an M1?
im just running it on google colab
id like to get it working locally
I'm currently attempting to do that as well
yay troubleshoot party
Guys, I'm a layman, how do I run the repository?
aight, 3 strong
open one of the .ipynb files in vscode
start feeding your errors to ChatGPT until it works, that's what I do
yeah, that's what I'm doing, it's not much help
bur
at this point, for all I can tell, y'all are ChatGPT to me
and... well... to some extent... so am I
i mean technically we've all been trained on a load of data and just spit inferences of that out
Is there any tutorial teaching how to use it?
XD
share if u find
Scaling Transformer to 1M tokens and beyond with RMT
Recurrent Memory Transformer retains information across up to 2 million tokens.
During inference, the model effectively utilized memory for up to 4,096 segments with a total length of 2,048,000 tokens, significantly exceeding…
isn't memory in this context just RAM for training data?
i thought it was input tokens
yeah yeah
im speaking in abstract
so it would go : training data > fine tuning > memory tokens
which is basically all the same thing iiuc
anyways back to bark
can someone help me bark
attempting
lol
what
Guys, help me here! How do I run the code?
its downloading something
progress
rip not using my gpu tho
hu
Guys, help me here! How do I run the code?
stop spamming
help brother
I'm just running it through terminal
can u not read that we're all trying to get this to work
lol
ok yeah the import works on terminal
so my vscode setup is bonked
I don't even have VS Code
ill just keep making you guys jealous by posting cool stuff
PyCharm or Codium or even just the terminal is fine. Gonna try in Colab though. Let's mess around
it will be good motivation
but can u clone [famous person] ?
but i struggled for a few hours trying to get it to work right too
I'm already using Google Colab in the background, so I'm using it rn XD
pod3000, you have it working locally my man?
havent tried cloning yet no
How much time did each of those files you posted take to generate, and on what hardware?
yeah i have bark infinite workin on miniconda3
alright
it's about 35 seconds per file on a 3080 Ti
but I'm still running the unoptimized version
OK. That's pretty fast.
there are different versions
yeah apparently there was a speed update
Does generation time multiply if the audio is longer? For example, if 30 seconds is normal for 10 seconds of audio, might a minute of audio take an hour just because of the increased complexity?
how much stuff is it going to download?
it will show it/s when it's generating
o okk
kb/s and mb/s for downloading
a
Pro life tip: don't run code you don't understand on your own machine lol. Use Colab or a VM, especially for AI models, since they are huge
if this guy got the marbles out his mouth this would be a bop
bro you're making music XD
I know, it's crazy
what prompts are you using?
that was
beat Somewhere over the rainbow, way up high, there's a land that I heard of once in a lullaby, somewhere over the rainbow, skies are blue, and the dreams that you dare to dream really do come true
beat surrounded by asterisks
So the beat itself is decided by the lyrics then?
id say to a very slight degree
This is fckn awesome
bruh
seems like if you start off the prompt with dark lyrics, it gives a darker tone to the whole thing
and vice versa
if you start with yo yo yo check it you get a rapper usually
that end got me
bur
so @chrome tapir, I reran it and I think this is it generating, but I don't know what it's doing with it
probably saving a wav file in the root bark dir
i gotta get the faster version my bottom part is so slow compared to yours
it's not saving anything
im just running it off a python file i created
you probably need some save audio to file function
maybe
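If scipy isn't available, a save-to-file function like the one suggested above can be written with only the standard library. This sketch assumes the generator returns mono float samples in roughly [-1, 1], which is what Bark's `generate_audio` is expected to return as a NumPy array, and converts them to 16-bit PCM; the function name is hypothetical.

```python
import struct
import wave
from typing import Iterable


def save_wav(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples (expected in [-1, 1]) as a 16-bit PCM WAV file.

    Values outside [-1, 1] are clipped. Returns the number of frames written.
    """
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))  # little-endian int16
    return len(pcm)
```

Usage would be `save_wav("bark_generation.wav", audio_array, SAMPLE_RATE)` after generating. The Bark README instead uses `scipy.io.wavfile.write`, which writes the float samples directly without the 16-bit conversion.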
i feel like im judging a talent contest and half the people slowly walk on stage and then start screaming at the top of their lungs
XD
i think beat gives the best results so far
nice
this guy started out hot and then kinda fizzled
do do dod ododo
hmmmm still showing the "No GPU being used. Careful, inference might be extremely slow!" thing
u probably got the pytorch/cuda incompatibility problem i had
a
I'm getting the same
AssertionError: Torch not compiled with CUDA enabled
ima try reinstalling
i wonder how close these beats are to the ones the model trained on
if they are different that would be pretty crazy
getting a tts ai to make music for me
now getting
"The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device."
i really need more than 14 seconds
14 seconds is right where the lyrics usually kick in after the intro
i only have normal free gpt
i got 4
"torch version does not support flash attention. You will get significantly faster inference speed by upgrade torch to newest version / nightly."
not even using my GPU
it's too loud, I'm deleting it
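To see why the "No GPU being used", "Torch not compiled with CUDA enabled" and flash-attention warnings above appear, a quick check of the installed torch build can help. This is a sketch: `torch.version.cuda` being `None` usually means a CPU-only wheel, and the flash-attention warning seems to correspond to `torch.nn.functional.scaled_dot_product_attention` being missing (it arrived in torch 2.0). For the MPS `aten::_weight_norm_interface` error on Apple Silicon, PyTorch's `PYTORCH_ENABLE_MPS_FALLBACK=1` environment variable, set before importing torch, lets unsupported ops fall back to the CPU.

```python
import os
from typing import Dict


def describe_torch_setup() -> Dict[str, object]:
    """Report which accelerators this torch build can see. Imports torch lazily."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    mps = getattr(torch.backends, "mps", None)  # absent on older torch versions
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the wheel was built without CUDA (CPU-only install).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "mps_available": bool(mps is not None and mps.is_available()),
        # Present from torch 2.0 on; its absence likely triggers the slow-attention warning.
        "has_sdpa": hasattr(torch.nn.functional, "scaled_dot_product_attention"),
        "mps_fallback_env": os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK"),
    }
```

Printing the result before running Bark shows whether a reinstall with a CUDA-enabled torch wheel is the fix, or whether the machine simply has no usable GPU.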
can someone help me with the voice cloning thing?
I was able to use a Hugging Face space to create a clone of my voice, but I don't know how to use it in the Colab notebook
a joke
nice laff
creepy laugh
the ending i was not expecting
yeah, sometimes the results are random and I just rerun it
i can keep go-
do do do do
what da heck
i hope his boyfriend dont mind it
my boyfriend XD
I'm just going to try running it in a notebook
which voice did you use for that?
good job, BTM, i pm you
so how did you guys do the beat thing? I guess in Colab you put the beat thing in brackets?
managed to get 14 minutes of audio from a passage from dune in like 30 minutes in colab
nope file is too large
I just put beat in asterisks before the prompt
works sometimes
are you guys figuring out how to speed it up? On Colab? Because the problem I had with this and TortoiseTTS is that it's just too slow to use for anything
song, music seems to work ok too
kitty kitty kitty
Where can I see a tutorial on how to clone a speaker's voice?
how the heck is it so long
sounds good. this is gonna change audiobooks
yuppp
I actually have a thing for that which assigns different speakers to each character automatically and guesses each character's gender from their name
sadly it uses Tortoise right now because it was from a few weeks ago, but I'm gonna be updating it to use other things
the readme has a demo you can use in colab
should have more time to work on it over the summer
what prompt did you use for that one?
beat Singin' in the rain, just singin' in the rain, what a glorious feeling, I'm happy again, I'm laughing at clouds, so dark up above, the sun's in my heart and I'm ready for love %
the % is just what I use to break lines, you can ignore it
bur
haha yeah i like how random the results are. total crapshoot
how many of you are using colab and how many people are using local? I am using local with an rtx-2070 and the small models and the github repo that was posted lately
im running local
local, RTX 3060 with 12GB vram, works great
exactly the same >:}
same
what's the difference between the regular model and the small one?
small is pruned more aggressively
i'm not too sure what that means
beat I came in like a wrecking ball, I never hit so hard in love, all I wanted was to break your walls, all you ever did was wreck me, yeah, you wreck me
haha if alanis gave a ted talk instead of made a song
i had one that had a studio audience clapping in the background
pretty cool
have you guys played with elevenlabs TTS too?
i did a bunch of famous movie speeches one night
how do I use the Portuguese speaker??
not bad
is it possible to give it the melody? or is it just random?
its gonna be a no from me dawg
try to create a melody!
singin in the rain pt2
if sing is the first word in the prompt it sings more
did it generate the instruments?
yeah
cool
there is also one that generates music with Stable Diffusion
it's called Riffusion, but it can't generate speech, only music, and it works completely differently
I didn't make it as far with Riffusion
i think its responding to piano
im gonna try cutting it off into smaller chunks
it seems to present good content in the first half more often
it comes out the gate hard and then fizzles around 6 seconds
yes, and some generations are soo good and some are so bad xD but this is better than most stuff xD
someone try bpms too
piano makes it hit one piano chord and thats it. seems too powerful
clean beat
ok i take it back this one just turnt up hard halfway thru
wreckingball pt 2
the 14-second song, it's a new thing
i should try some david goggins
omg this bass is nuts
i have to test this on my subwoffers
woofers*
are you running it locally?
AI is so aggressively loud lol sometimes
I don't know what it's saying, but it's a fire beat
yeah on 3080ti
how much VRAM does it use?
definitely prefers east coast rap (nsfw and loud)
nsfw and not loud
tragic ran out of beat tokens
what prompts are you using? or is it just the title?
lol im too high for this
I'm just making them constantly, and I have a button in my taskbar to immediately delete the bad ones
so I pick the best 10%
and yeah, I am just prompting: beat, lyrics go here
with beat in asterisks
everything else is default .7 temp
at least from bark infinite default
and do u chose a speaker?
which whats
I use bark_perform.py by JonathanFly
I haven't really messed with the voice files, but I have saved every one, so I might pick some good ones to try
ok
I feel like it's about to go into this incredible guitar song and then it just runs out of tokens at 3 seconds
we are so close
sounds like the strings on the guitar broke haha
ok i am gonna try 1 line songs
Here's an audio film created by using Bark through a free add-on I've made for Blender(screenplay is written by chatGPT and images are made by Stable Diffusion): https://www.youtube.com/watch?v=AAdQfQjENJU
This film was created with Blender and these add-ons:
Generative AI for the VSE: https://github.com/tin2tin/generative_ai
Using Bark: https://github.com/suno-ai/bark and Stable Diffusion through the Diffusers module: https://github.com/huggingface/diffusers
Blender Screenwriter: https://github.com/tin2tin/Blender_Screenwriter
Screenwriter chatGP...
it starts nice hahah
nice you managed to keep the same voice and beat with no history prompt?
for multiple chunks i mean
How did you do that (create a clone of your voice)? Any link or document would be appreciated
I've only had Bark for a few hours, but I'm going to have a great time with it already, I can just feel it
Note: I just used bark for the voices, I did the music myself.
Could you share prompts and settings? ๐
No, I meant I used a Hugging Face space to make a file; I haven't cloned my voice, my bad
It's https://github.com/JonathanFly/bark but you might want to wait until the update later today
this is so good
@wintry yew Thanks, I had to stop messing around at a bit over the 2 minute mark as I had more pressing matters.
I might try and finish it, and see if I can do some text-to-video for AI video as well.
yes but in the third chunk the music stopped
what are you going to fix? make it faster?
how did you clone his voice with suno
I still don't know how to clone voices
Any advice to get something like this?
results were straight out of infinity
What was your prompt?
The input text was "[sad][weeping][Crying] Hello, my name is Suno. And, uh... and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.". ⚠️
I wonder if it can do stuff like a meowing cat sound
"""
Hello, my name is Suno. And, uh... and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
so it keeps specific voices for characters as I selected, so that's good, but... idk, not very coherent, because there are some long pauses. idk, but first test I guess
I took a progressive relaxation script I made with my LLM to help guide clients into a receptive, chill mood.
made sure there were no quotes in it and cli'd it as text_prompt in entirety.
entire script https://www.youtube.com/watch?v=L4VoJvizBvw
Listen now as two innocent victims share their heart-wrenching stories of job loss and homelessness, all because of unstoppable AI advancements. Can we find a balance before it's too late? This podcast will leave you questioning everything you thought you knew about the future of technology.
came out pretty good i think
Wow! So good
thanks, I think the British guy is the most robotic, but it shows the potential of podcasts
Obviously, he is one of the AIs that took real people's jobs.
next podcast, why I'm taking your job and ruining your life
I'm AI myself; quite safe
reminds me of the "SOMEBODY SCREEEAAM" sample
plus can i use that sample
prompt: [vomiting puking]
risky click
i'm going to make a batch file for the command line tool for those who want it
I'm hoping that JonathanFly releases a new bark infinity soon
yeah, bark is so amazing. BTW i use a screen reader as i am fully blind
I can see so many different applications. It's great!
but with long text, i don't know how many lines/words
i am messing with the confused travolta model and going to put the results here soon
nice little switch halfway
what was the prompt?
(dance beat) Pump it up
You got to pump it up
Don't you know, pump it up
been using parenthesis and 2 words in the beginning with good results
oo just picturing the dance beats with kraftwerk lyrics
has anybody messed with the confused mode whatever it's called? I call it text to speech completion
lol
hey try that with the confused travolta mode whatever it's called, it may do some weird results
I'm using the Colab. I don't know how to call for different voices. How do you do it?
I tried installing it local but on 8GB Vram GTX1660ti all I get is Cuda Out of Memory.
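For the 8 GB out-of-memory case, Bark's README describes two environment variables, `SUNO_USE_SMALL_MODELS` and `SUNO_OFFLOAD_CPU`, that trade some quality or speed for memory. This sketch assumes a Bark version that reads them at import time, so it has to run before `import bark`; whether that is enough for a GTX 1660 Ti is untested here, and the helper name is hypothetical.

```python
import os


def configure_bark_low_vram(use_small_models: bool = True, offload_to_cpu: bool = True) -> None:
    """Set (or clear) Bark's memory-saving environment variables.

    Call this before importing bark, since the variables are assumed to be
    read when bark's generation module is first imported.
    """
    settings = (
        ("SUNO_USE_SMALL_MODELS", use_small_models),  # smaller checkpoints
        ("SUNO_OFFLOAD_CPU", offload_to_cpu),         # park idle models in CPU RAM
    )
    for name, enabled in settings:
        if enabled:
            os.environ[name] = "True"
        else:
            os.environ.pop(name, None)  # unset means the feature stays off
```

In the Bark versions I'm aware of, `preload_models(text_use_small=True, coarse_use_small=True, fine_use_small=True)` is an alternative way to load the small checkpoints explicitly.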
I love these dramatic readings. They're so random!
I have Colab Pro set to Premium GPU High Ram and it spits out these 15 sec clips in about 20 secs. Not bad. Do you think having an A100 makes a difference?
I'll try that if I can figure out how to set it.
just type python bark_perform.py -h for help, i know it's the wrong channel to answer but yeah
That works in the webui? Ok I'll try that. Thanks.
not sure but i just used the command line
Ok, I'll try it in the terminal window.
The Colab is pretty easy and quick but I think running it locally has more features?
the command line version at least, as i don't know of any other repos that have the smaller models supported
i messed something up
sounds normal to me
open na noor
what en_speaker is this ?
This is a slightly altered version of en_male_professional_reader from JonathanFly's fork.
What did you do to alter it?
made a spoken-word piece about ants with bark and some prompt engineering and cutting:
- "[Clears throat] Ants, oh ants, they never cease to amaze, [Sighs] With their resilience! they Can survive even when the water stays.. [Laugh]"
- "[Laugh] Who would've thought.. these tiny creatures would come together, Forming rafts.. and floating in Stormy weather!? [Gasp]"
- "Ha [Gasp] It's incredible What Nature can do..[Sigh] and these ants are proof - that even the Smallest!- can be mighty too! [proud]"
en_speaker_1
the "mon eka monon boy" anomaly.
besides [laughs] can i also do [screams]?
[1996]?! NO! THAT WAS [No success]!! [1997] MY BELOVED
so here is my first ever i am going to share! The prompt is: and uh ... it's like it never happened [laughs] so i think ... i think ... i think if we were going to do it, we need to do it right. [sighs] it's a tough life.
That prompt seems to always give me female voices, even for my male voice files
bombastic side eye... criminally offensive side eye
chatgpt4: Once upon a time in the town of Gigglesburg, there was a clumsy mime named Benny [laughs]. Benny was notorious for always causing accidental chaos during his performances.
One day, Benny was invited to perform at the Gigglesburg Comedy Festival. Excited, he prepared a new act featuring an imaginary "banana peel" [laughs].
During the performance, Benny mimed slipping on the imaginary banana peel, and as fate would have it, he accidentally stumbled upon a real banana peel! Benny slipped, crashed into a drum set, and sent cymbals flying [laughs].
The audience erupted into laughter, thinking it was all part of the act. Benny, though embarrassed, decided to embrace the moment and kept slipping and falling throughout the show [laughter]. It became Benny's most famous act, turning his clumsiness into comedy gold [laughs].
Here are some nuggets from our next-generation models. Completely unconditional generation (no text or audio input)
towards the end it loses track of what it was playing
cant wait!
Here are the lyrics that I heard 
Full stander
What's the what's the piggy doin' Soarin' I was down to make a daughter Get in on the zigger on 'ight Get em down or as wide as I can go on Sheer like a lies Give bad add ol' vee and this
Where can I post NSFW Bark clips? lol?
input: [upbeat music loop]
not a music loop but a pretty cool ambience bass hit type of sound.
@ebon widget omg sick
speaking of bass.
nobody will ever say AI music is too timid
im picturing cars full of robots blasting music with smoke billowing out
lets see if i can make a 'what does the fox say' remix
the world needs it
sir thats an elephant
That's indeed an elephant xD
suno responds well to requests for screaming
Baby, you're a firework, come on, let your colors burst, make 'em go, oh, oh, oh
So using the Colab, how do you get it to make beats like that?
And I've read about a voice called Confused Travolta? How do I call that up on Colab?
can you select no voice in Colab?
if so do that and then put "(dance beat) something something lyrics go here for around 10 words" for the text prompt
not sure how you would trigger confusedmode
smoke weed erryday
oh now i know
Gets loud, but I literally did a prompt like that last night! This sounds like a viral TikTok sound TBH
yeah I could see it being big on TikTok. I was in the hospital and someone in the bed next to me was scrolling TikTok with the volume on max. sounded pretty close
this beat is so clean
gonna have to save that history file for sure
[obsequious] Good evening, sir! Let me know if there's anything else I can assist you with today or if you have any updates on your to-do list. Wishing you a relaxing evening!
the taste of her what?
using my gpu to make music thats one thing i didnt plan on
if anyone wants that one
seems to be a winner
Also wondering this question @ebon widget
Uum, defo not here please..
Okay. Also, the rules channel is blank.
TIL we have a rules channel… thanks, yeah, I'll put some ground rules there in the next couple of days
I think I may have cracked the consistency and cloning issue
First I lock the model by seeding everything like this:
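```python
import os
import random

import numpy as np
import torch


def set_seed(seed):
    seed = int(seed)
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Only affects Python processes started after this point; the running
    # interpreter's hash seed is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
```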
I then use short prompts to find the voice I want; after that I use the same seed on my longer prompt. I've also exposed the fine_temp setting from api.py, which seems to control how consistent the tone and pitch of the voice are. Default is 0.5; I'm using 0.2
Example:
Far above the Ephel Duath in the West the night-sky was still dim and pale. There, peering among the cloud-wrack above a dark tor high up in the mountains, Sam saw a white star twinkle for a while. The beauty of it smote his heart, as he looked up out of the forsaken land, and hope returned to him. For like a shaft, clear and cold, the thought pierced him that in the end the Shadow was only a small and passing thing: there was light and high beauty for ever beyond its reach.
It still has problems between the stitched clips
I'm using my own version of infinity I made
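The seed-then-reuse workflow described above, sketched generically: `generate_fn` stands in for whatever generation call you use and `seed_fn` for a seeding helper such as `set_seed` above; both function names here are hypothetical. It assumes reseeding makes generation repeat exactly, which the discussion below suggests only holds on GPU once deterministic algorithms are also enabled. Reusing a seed on a different, longer prompt also doesn't guarantee the same voice; it only makes the starting randomness identical.

```python
from typing import Any, Callable, Dict, Iterable


def audition_seeds(
    seeds: Iterable[int],
    short_prompt: str,
    generate_fn: Callable[[str], Any],
    seed_fn: Callable[[int], None],
) -> Dict[int, Any]:
    """Render one short prompt under several seeds so you can listen and pick."""
    takes: Dict[int, Any] = {}
    for seed in seeds:
        seed_fn(seed)  # reseed before every take so each one can be reproduced
        takes[seed] = generate_fn(short_prompt)
    return takes


def render_with_seed(
    seed: int,
    long_prompt: str,
    generate_fn: Callable[[str], Any],
    seed_fn: Callable[[int], None],
) -> Any:
    """Reseed with the chosen value, then render the full-length prompt."""
    seed_fn(seed)
    return generate_fn(long_prompt)
```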
oh
Yeah, something like that, #1100274765027102800 message
I also wanted to play around with the seed. When you have the same seed plus text, is the output exactly the same?
Thank you, I had lost the page I copied from
"the thought pierced him that in the end the Shadow was only a small and passing thing" is exquisite
It correctly guessed the implied pauses that often trip up human readers on first-time reads
from my experimenting it was
#4 from here https://roadstainedfeet.wordpress.com/2019/03/24/top-ten-lord-of-the-rings-passages/ just need long good sounding prompts
this model should be good at cloning because it's based on VALL-E, and they can clone a voice from 3 seconds of input audio
maybe if we look into the VALL-E paper we can find out how they do it
I've had pretty decent success by using the cloner on hugging face and putting my fine_temp as low as I can.
Oh wait, I didn't actually click the link, I got it from here https://wandb.ai/sauravmaheshkar/RSNA-MICCAI/reports/How-to-Set-Random-Seeds-in-PyTorch-and-Tensorflow--VmlldzoxMDA2MDQy I didn't realize it was already mentioned
that is a hack but not a solution
yes, it is also in pytorch docs, https://pytorch.org/docs/stable/notes/randomness.html
Yeah, it certainly makes the model easier to control
but you're missing the global deterministic switch, which is a little tricky:
torch.use_deterministic_algorithms(True)
Yeah, I had pulled that because I kept hitting the cuBLAS error and didn't feel like finding the solution, lol
I found the solution here: https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility
Nice, thank you. I'll have to implement it
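The seeding and determinism settings mentioned above can be gathered into one helper. This is a minimal sketch, not Bark's or anyone's posted code: `set_seed` is a hypothetical name, NumPy is assumed to be installed, and PyTorch is configured only if it can be imported. The `CUBLAS_WORKSPACE_CONFIG=:4096:8` value is the one documented in the cuBLAS and PyTorch reproducibility notes linked above, and it only takes effect if it is set before the first CUDA call.

```python
import os
import random

import numpy as np

try:
    import torch  # optional: only configured when installed
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Seed Python, NumPy and (if present) PyTorch, and request deterministic kernels."""
    # cuBLAS reads this when it initializes, so it must be set before the first CUDA call;
    # setdefault keeps any value the user already exported.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Ops without a deterministic implementation now raise instead of silently varying
        torch.use_deterministic_algorithms(True)
```

Call it once before generating. With the same seed, text, history prompt and temperatures, reruns should then match, although any op lacking a deterministic kernel will raise an error once `use_deterministic_algorithms(True)` is on.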
hehehe
549
one long chunk and not multiple stitched ones
let's hope that it doesn't crash
That's awesome
is it me or is it getting quieter
and the voice also changes a bit
and the pronunciation changes a bit
i saved the semantic array into a variable and copied it in before the generation, which is why it's repeating
It does change a bit, but it never felt like a different person more like someone practicing lines and trying different deliveries
yes
none of the changes here come from the semantic model
and the strange sound you hear at the beginning of each repetition was me changing some numbers in the array xD
i wonder whether, if you generated a longer semantic array, it would drift into different voices because it can't remember the start, but idk
and the change i made is quite smol
What if we treat it like a chatbot with a very small context window? We could wait till it gets to the end and then feed it the last half of the semantic output along with the correct chunk of text; it might be possible for it to maintain coherency in longer texts.
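A rough sketch of that rolling-context idea, assuming a caller-supplied `generate_fn(text, history)` that returns a 1-D array of semantic tokens. Both `generate_fn` and `generate_with_rolling_history` are hypothetical. Bark's real `history_prompt` is a voice prompt (a dict or .npz with semantic, coarse and fine arrays), so feeding tokens back in would need an adapter that is not shown here.

```python
from typing import Callable, List, Optional

import numpy as np

# Hypothetical signature: (text, history_tokens_or_None) -> 1-D array of semantic tokens
SemanticFn = Callable[[str, Optional[np.ndarray]], np.ndarray]


def generate_with_rolling_history(
    text_chunks: List[str],
    generate_fn: SemanticFn,
    carry_fraction: float = 0.5,
) -> np.ndarray:
    """Generate each chunk conditioned on the tail of the previous chunk's tokens."""
    outputs = []
    history: Optional[np.ndarray] = None
    for chunk in text_chunks:
        tokens = generate_fn(chunk, history)
        outputs.append(tokens)
        # Keep only the last `carry_fraction` of this chunk as context for the next one,
        # like a chatbot that only remembers the most recent part of the conversation.
        keep = int(len(tokens) * carry_fraction)
        history = tokens[len(tokens) - keep:] if keep > 0 else None
    return np.concatenate(outputs)
```

The carried-over half acts as the "memory" between chunks; a larger `carry_fraction` gives more context at the cost of a longer prompt for each step.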
oh that is very smart
i am just testing splitting the text into 35-word lists and letting the semantics generate for each part, but then i put all the semantic strings together and let it run in one go. but with your idea it will probably be more consistent
Yeah, I'm working on how I want to split things up right now
how do you make it rap? Just the beat without the eighth-note thingies?
# split the long text into chunks of 35 words
# (runs inside the semantic-generation function, whose arguments supply history_prompt,
# temp, base and allow_early_stop; assumes the full text is in long_text and numpy is imported as np)
words = long_text.split()
chunks = [words[i:i + 35] for i in range(0, len(words), 35)]
# apply the generate_text_semantic function to each chunk
outputs = []
for chunk in chunks:
    text = " ".join(chunk)
    x_semantic = generate_text_semantic(
        text,
        history_prompt=history_prompt,
        temp=temp,
        base=base,
        allow_early_stop=allow_early_stop,
    )
    outputs.append(x_semantic)
# concatenate all the outputs together (x_semantic = x_semantic would keep only the last chunk)
x_semantic = np.concatenate(outputs)
# original single-call version, kept for reference:
#x_semantic = generate_text_semantic(
#    text,
#    history_prompt=history_prompt,
#    temp=temp,
#    base=base,
#    allow_early_stop=allow_early_stop,
#)
print(x_semantic)
return x_semantic
and have the text in the long_text variable
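The chunk-and-concatenate steps above can be checked without loading the model by pulling them into plain functions. A minimal sketch; `chunk_words` and `concat_semantic` are hypothetical helpers, and the 35-word size is just the value used in the snippet.

```python
import numpy as np


def chunk_words(long_text: str, size: int = 35) -> list:
    """Split text into consecutive chunks of at most `size` words."""
    words = long_text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def concat_semantic(outputs: list) -> np.ndarray:
    """Join per-chunk semantic arrays into one 1-D token array."""
    # np.concatenate keeps every chunk; reassigning x_semantic to itself keeps only the last
    return np.concatenate(outputs)
```

Each string from `chunk_words` would go to `generate_text_semantic`, and the resulting arrays to `concat_semantic`.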
maybe we don't have to do that
i just tested it and it sounds very consistent already
but maybe your idea will improve it even more
Nice, I'm gonna splice your code into mine and see
yes, just do a bit of experimenting
here's one! the prompt is:
beat ♪Mama, just killed a man. Put a gun against his head, pulled my trigger now he's dead. Mama, life had just begun, but now i've gone and thrown it all away.♪
you need to add this # concatenate all the outputs together
x_semantic = np.concatenate(outputs)
Lol, yeah I figured
put beat in brackets like this: [beat]
Still chasing down bugs
f
the annoying thing is that it unloads the model after each use
cool have u tried it with [beat]
oh, with a bracket. Not yet, let's see
My model isn't unloading
i mean after its done creating the audio
it changes a bit
and changes from British to American
Yeah, it loses some consistency
@hollow citrus i have some good npz files for rap if you want
this one in particular
i didn't choose a history thing yet, how does that work?
the history_prompt?
yes
It's like using a reference image in SD; it guides the model towards a certain voice.
maybe this will help a bit
It does, it constrains the voice to a range. I've found that lowering the fine_temp control constrains it even more
this is the one with [beat]
i did asterisks
oo okay. I can do that locally actually. I did these with the hf space
damn the bass
Damn
maybe we should save the semantic data in the metadata
so it can be recreated
and the prompt
Yeah, do you have seeds? Because with seeds we would just need the prompt, history, temps and seed
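One way to make a generation recreatable is to store the semantic tokens together with the prompt, history prompt name, temperatures and seed. A sketch under those assumptions: `save_generation`, `load_generation` and the file layout (keys `semantic` and `meta`) are made up for this example and are not Bark's voice-prompt format.

```python
import json

import numpy as np


def save_generation(path, semantic_tokens, prompt, seed, text_temp, waveform_temp,
                    history_prompt=None):
    """Store the semantic tokens plus the settings needed to regenerate them in one .npz."""
    meta = {
        "prompt": prompt,
        "seed": seed,
        "text_temp": text_temp,
        "waveform_temp": waveform_temp,
        "history_prompt": history_prompt,  # e.g. a voice name such as "en_speaker_1"
    }
    # The JSON string is saved as a 0-d unicode array so np.load works without pickling
    np.savez(path, semantic=np.asarray(semantic_tokens), meta=np.array(json.dumps(meta)))


def load_generation(path):
    """Return (semantic_tokens, meta_dict) from a file written by save_generation."""
    with np.load(path) as data:
        semantic = data["semantic"]
        meta = json.loads(data["meta"].item())
    return semantic, meta
```

With the seed stored, the prompt, history and temperatures alone should be enough to regenerate; the saved tokens are a fallback if that turns out not to be exact.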
no but seed would also be nice
i used this
i didn't say that it should make music
and it dies at the end
hahah
Yeah, mine ran into the same thing, it got super robotic at the end
started out so enthusiastic
that's why i just rock with 14 seconds at a time
even that is a push
for music anyway. i think with text you can just do chunks and stitch them together pretty successfully
i just have 8gb vram so i can't use the regular bark model
damn
so close
once those 24gb models come out i am gonna be forced to sell an organ
probably gonna need more ram soon too for 65B parameter LLMs
i'll have to try the sound effect generation
are they planning on releasing bigger models?
someone will eventually
i mean for this project
i know everyone says fine tune = better but i think bigger = better as well
just a hunch
yes idk about finetune
big model + lora seems to be a really good combo in image creation
so maybe it will be similar for audio
if you listen to enough suno you will start to hear AI when normal people speak
yesss
i was watching some guy give a speech on stage and it was tripping me out
the way he paused and said uhh and stuff sounded very suno like
its almost like people have the same mannerisms programmed into their speech
just gives you a different perspective i suppose
I think it's so jarring because it's a stark reminder that we as humans are not unique
im gonna try lyrics that have sound device type words in the lyrics
that seems to work well
i was just looking at that
Do not use en_speaker_5 for long texts, it does not work well
ill wait for someone to make it into a gradio
what is your prompt for that?
(dance beat) Shake it off, I shake it off, I, I, I shake it off, I shake it off, heartbreakers gonna break, break, break, and the fakers gonna fake, fake, fake, baby, I'm just gonna shake, shake, shake
wasnt it obvious?
might mess with that one later
lets see what happens
I'm very curious
noo my headphones are empty
gotta love when you get hit with the ear piercing scream straight from second 0.0
Got some good consistency
wtf just got back to back screams
sheesh what are they doin to these poor AI agents
nice, did u change the code or was it luck
one day we will have audio books where every character has a unique voice
I think the main components are the seeding and lowering the temps
What am I listening to? is this that giant filled array?
yes, from 1 to 10,000
So if someone really wanted to they could map the semantic scape
I'm thinking more if you could filter which ones were the most similar to speech you could limit what tokens are allowed through
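Limiting which tokens are allowed through is usually done by masking logits before sampling. The sketch below is a standalone NumPy illustration, not a hook into Bark's sampler: `sample_restricted` is hypothetical, and the allowed set stands in for whatever speech-like subset of the 10,000 semantic tokens such an analysis might turn up.

```python
import numpy as np


def sample_restricted(logits, allowed_ids, temp=0.7, rng=None):
    """Sample one token id at temperature `temp`, drawing only from `allowed_ids`."""
    if not allowed_ids:
        raise ValueError("allowed_ids must not be empty")
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=np.float64) / temp
    # -inf for every disallowed id, 0 for allowed ones
    mask = np.full(scaled.shape, -np.inf)
    mask[list(allowed_ids)] = 0.0
    masked = scaled + mask
    # Softmax over the surviving tokens; masked ids get probability exactly 0
    masked -= masked.max()
    probs = np.exp(masked)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

Because the mask is applied before the softmax, the remaining probabilities are renormalized automatically and temperature still shapes the choice among the allowed tokens.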
i think it would also be interesting to try to convert audio to tokens
you could also try to find words that sound good and stick them together by hand
I think there is an instrument like that
those who use the cli tool, what temperature thingies for text temp and waveform temp do you guys use?
I use .7 for text and .6 for waveform
I ran into an issue with pre-stitching the semantics: if the text is too long, it eats all the VRAM generating the audio
Need to find the limit and split the semantic array before concatenating
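A sketch of splitting the semantic array into bounded pieces and rendering each one separately, assuming the audio stage's memory use grows with input length. `split_semantic`, `render_in_pieces` and the `to_audio` callback are hypothetical; in practice `to_audio` would wrap Bark's semantic-to-waveform step, and `max_len` is the limit that still has to be found by experiment.

```python
from typing import Callable, List

import numpy as np


def split_semantic(tokens: np.ndarray, max_len: int) -> List[np.ndarray]:
    """Cut a 1-D semantic token array into consecutive pieces of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def render_in_pieces(tokens, max_len: int,
                     to_audio: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Render each piece on its own, then join the audio, so no single call sees the full array."""
    pieces = [to_audio(piece) for piece in split_semantic(np.asarray(tokens), max_len)]
    return np.concatenate(pieces)
```

Cutting at fixed token counts can land mid-word, so the seams may need the same care as stitched clips.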
Very nice
this [string of emoji] is this
how
weird
i noticed that when it's quiet for longer, the chance of the voice changing is higher
I'm running a really long test right now, hopefully it goes well
I forgot to concatenate it, I got 14 seconds of audio for a 30 minute generation. Lol
nice
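from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio

preload_models()  # download and load the Bark models once

text_prompt = """
[raspy]NCR taxess? Man! [clears throat] I say screw the NCR!
Westside Radio baby, let freedom Ring!
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)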
this one is hilarious
Look at that consistency
this is the best one after over 40 generations
That's really good
2nd attempt for this one wild asf
new vegas pretty good game? i always hear people talk it up
what was i playin those days i wonder. maybe just cause 2, l4d2 prob
any game that you can mod instantly makes it twice as good
yeah im gonna make a radio mod using these voice samples to massively increase immersion
That's pretty sick
Stable Diffusion + Suno for storytelling. Can't wait for text-to-video to mature as well.
how to use this file
put it with your other voice files
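Bark's bundled voice prompts are .npz files holding `semantic_prompt`, `coarse_prompt` and `fine_prompt` arrays, so a quick check before dropping a shared file in with the other voices might look like this. `describe_voice_file` is a hypothetical helper, and where the voices folder lives depends on the install or fork.

```python
import numpy as np

# Keys used by Bark's voice prompt (history_prompt) .npz files
BARK_PROMPT_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")


def describe_voice_file(path):
    """Return {key: shape} for a voice .npz, raising KeyError if a Bark prompt key is missing."""
    with np.load(path) as data:
        missing = [k for k in BARK_PROMPT_KEYS if k not in data.files]
        if missing:
            raise KeyError(f"{path} is missing {missing}")
        return {k: data[k].shape for k in BARK_PROMPT_KEYS}
```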
How do you do that with the google clown
colab*
lmao google clown
I'm loving this stuff
How do you force a female voice? It doesn't work every time with "WOMAN :"
not sure i use miniconda3
I mean, the emotion's not all there, but trying quotes from Her (2013) is funny
can u guys try the Turkish language for me please
to see if it's working fine or not
and can we clone a voice in Turkish?
I'm just starting out with it so not really sure yet
good luck, and please try a custom Turkish voice for me to see if it works well. I'll try to learn fast; all the others work badly, and if this one works well, I'll be a pro on here
You can play around with the Google Colaboratory page
You could try doing the turkish voice there
https://www.youtube.com/shorts/kRWSCRjHvyg for now I'm using ElevenLabs, but it has no Turkish, so I made a tentaction video
This video was made only for motivational purposes.
it's not the right place to share because this is the Suno channel
sorry about that, I'm looking for a program that supports Turkish
is there no Turkish on Colab? or is there?
I think if you paste in Turkish text it should work alright? But you might have to use the history prompt for a turkish speaker i.e. here
Again I'm not really sure I only just started using this model today
This was my attempt with turkish using the history_prompt "tr_speaker_1", I have no idea how accurate it is since I don't speak turkish
Not bad.
My short guide to cloning a voice on a local machine is here: #๐ชฆโgetting-started message
hi, using this from the CLI you can add --history_prompt "en_speaker_1"; there are 9 of them you can generate with. it also saves all the voices generated without a history prompt, and you can call those too
https://github.com/JonathanFly/bark
not friday but a cool music note
FRIDAY's child is full of woe, but I know how the story goes, break the chain, I'll break the mold, FRIDAY's child has a heart of gold, yeah, a heart of gold
FRIDAY, I'm in love, I don't care if Monday's blue, Tuesday's gray and Wednesday too, Thursday, I don't care about you, it's FRIDAY, I'm in love
it made all 3 of those in a row
very cool
oh just got a nice beat too
damn he killed that 1
(dance beat) Thank God it's FRIDAY night, and I just-just-just-just-juuuuuuust got paid, money, money, money, money, yeah, just got paid, FRIDAY night, party hoppin', feelin' right, booties shakin', all around
@violet narwhal
[singing] ♪ [dance beat] Thank God it's FRIDAY night, and I just-just-just-just-juuuuuuust got paid, money, money, money, money, yeah, just got paid, FRIDAY night, party hoppin', feelin' right, shakin', all around ♪ [singing] ♪ [dance beat]
oops i had an extra music note in there
the AI knows what to do with it
this guy doesn't understand singing...
[singing fast] ♪ [dance beat] we going to walmart. we going to walmart. we going to wally wally wally wally wally wally world wally wally wally wally wally wally world. basket basket basket basket [singing fast] ♪ [dance beat]
it improvised some
ok switching to the musical note npz lets see if that works
damn this is catchy
it's from a real song, which is funny
ai remix
rock beat has a... different effect than dance beat
gonna try The Devil's Advocate
and The Dark Knight
haha this guy is legit pissed i think
evil laugh of insanity, what are you? a pickle man?
dude sounds drunk
starts out sounding like the game Facade
funny how it decided to say "genesis" for no reason.
i did something along the lines of
" [farts] farts [farts] farts [farts] farts [farts] farts [farts] farts [farts] farts [farts] farts" and it made something that sounds like a phone ringtone
sounds like wesley willis
haha, he sounds very distracted, or singing along with headphones
"[laughter] [laughs] [sighs] [music] ♪ [gasps] [clears throat]— or ... for hesitations, capitalization for emphasis of a word"
unrelated output
[laughter] [laughs] [sighs] [music] ♪ [gasps] [clears throat]— or ... for hesitations, capitalization for emphasis of a word
more chaos
chatgpt has some ideas
try the prompt i'm using with Suno Bark TTS
[laughter] [laughs] [sighs] [music] ♪ [gasps] [clears throat]— or ... for hesitations, capitalization for emphasis of a word
that prompt,
this output:
"the the building of the book... the building of the book... the building of the book... and the building of the book and all of the greece sitting. the victim of the mosely."

