#🐣┃suno-showcase | Suno | Page 3

crisp bone May 26, 2023, 4:34 PM

#

The first part scared the hell out of me

clever turret May 26, 2023, 6:18 PM

#

holy ☠️

hazy rain May 26, 2023, 10:11 PM

#

tardy topaz May 26, 2023, 10:18 PM

#

Did you filter out the brackets, she keeps replacing them with () instead. 😄

#

BTW I think Bark could be really cool for a livestream that needs constantly changing voices. It could even create a new voice on the spot. That's a pretty unique capability. So a whole script, with new characters, where each gets a unique voice for each script.

#

Instead of single speaker.

#

(for clarity, I would cheat a bit, and make the random voices variants of already clear voices...)

hazy rain May 26, 2023, 10:24 PM

#

This is really cool!! Is it like footage loops with some Wav2Lip?

tardy topaz May 26, 2023, 10:29 PM

#

It responds impressively fast, for Bark anyway.

hazy rain May 26, 2023, 10:30 PM

#

It's overall very cool!

#

of course there is already a troll 😅

tardy topaz May 26, 2023, 10:31 PM

#

That's me

hazy rain May 26, 2023, 10:31 PM

#

30cm?

tardy topaz May 26, 2023, 10:31 PM

#

Oh no, nvm, I am same name

#

like the image I posted, probably before you

#

I tried to get her to say brackets for a while

hazy rain May 26, 2023, 10:32 PM

#

Yeah I thought so, no, someone that came just as I was testing it...

#

classic stuff

hazy rain May 26, 2023, 10:32 PM

#

tardy topaz I tried to get her to say brackets for a while

I saw I tried the song 😅

#

But yes it's super fast

tardy topaz May 26, 2023, 10:33 PM

#

It must be a small model?

#

Or maybe 4090

hazy rain May 26, 2023, 10:33 PM

#

A lot has happened in LLMs lately that I haven't followed

#

but for bark yes it must use the small ones

tardy topaz May 26, 2023, 10:34 PM

#

I tested a friend's 4090 and pretty sure even that didn't return 14s in 14s

#

on large models

#

But it wasn't pytorch nightly...

hazy rain May 26, 2023, 10:35 PM

#

tardy topaz I tested a friend's 4090 and pretty sure even that didn't return 14s in 14s

There is a huge diff between 3090 and 4090 ?

tardy topaz May 26, 2023, 10:35 PM

#

I think it's maybe 40%, I forget now

#

not double at all

hazy rain May 26, 2023, 10:35 PM

#

Wow still much more than I thought

tardy topaz May 26, 2023, 10:36 PM

#

Yeah I am still jealous.

hazy rain May 26, 2023, 10:37 PM

#

But wait... in a very short time it does: Some diffusion image used as presentation, language processing, wav2lip and bark....

#

Must be multiple things in parallel

tardy topaz May 26, 2023, 10:37 PM

#

the llm is probably an api

#

really just wav2lip and bark

#

or whatever the new hot wav2lip is

hazy rain May 26, 2023, 10:40 PM

#

Yes it must be gpt3.5 it just did the "as a language model"

#

I'm still baffled by the speed of progress

#

it uses diffusion!

#

#

tardy topaz May 26, 2023, 10:44 PM

#

I was playing with an 'infinite live tv show' on twitch. It used diffusion for the video, in a really hacky way. You generate a scene of the characters, which could be anything. Walking down a street in Paris. Then you use some extensions to split it up and make the characters different looking, multidiffusion I think it was. (Multiple people in SD tend to blend, like merge faces, if you ask for say three different people normally.) And then you rapidly generate 'variations', which if you play them quickly, well, it looks like puppets talking. But it was so not realtime, like I would have had to have three 3090s.

hazy rain May 26, 2023, 10:46 PM

#

I just ask it that and it told me "yes it's some of the tools I use"

tardy topaz May 26, 2023, 10:46 PM

#

At the time I had a burning motivation because "infinite Seinfeld" was blowing up and I thought it was just awful, and wasting the format. But when that died I didn't have the urgency.

hazy rain May 26, 2023, 10:47 PM

#

tardy topaz I was playing with an 'infinite live tv show' on twitch. It used diffusion for t...

Far from realtime but it got a 2x speedup a few days ago with the nvidia driver update!

#

I'm experimenting a lot with SD for 3D CG (some tests in the readme here https://github.com/melMass/MLOPs-stage)

tardy topaz May 26, 2023, 10:49 PM

#

One thing I was trying to get working was the depth mapping. You know how a sitcom is a multicamera thing? Like it's a single line of cameras. So all the angles are straight, a little left, and a little right. And you can depth map the output of SD, if you've seen that. 3d-in-painting extension or similar name. And then pan left and right, like a sitcom! But OMFG so slow

#

If somebody gives me 10 4090s. I could make a hell of a sitcom stream. Just throwing that out there.

hazy rain May 26, 2023, 10:51 PM

#

Sounds intriguing I know 3d inpaint but I haven't tried much animation besides existing extensions for automatic

#

(deforum and temporal kit only actually)

tardy topaz May 26, 2023, 10:52 PM

#

I think it might be okay with deforum. I didn't explore it much.

#

But the images change more

#

At least then, I know there's been new developments.

hazy rain May 26, 2023, 10:52 PM

#

Yeah now it supports controlnet, which yields much more control

tardy topaz May 26, 2023, 10:52 PM

#

But you can basically 'pan left, pan right' it just (at the time) looks like a crazy dream image

#

I had worked so hard just trying to get the characters to talk like puppets, without their shirts changing colors constantly. I was like OH NO I can't throw that away.

hazy rain May 26, 2023, 11:08 PM

#

It started well 😅

hazy rain May 26, 2023, 11:09 PM

#

tardy topaz I had worked so hard just trying to get the characters to talk like puppets, wit...

Yeah controlnet avoids that and more, but there isn't a workflow that wraps it all yet

tardy topaz May 26, 2023, 11:26 PM

#

hazy rain It started well 😅

Haha, actually many voices (or voices that sound like a podcast) seem to switch a lot. It's like modeling the changing of speakers in Bark I guess. Lots of random voices. The worst is cartoons. Try making a cartoon voice that doesn't change randomly sometimes. Also I swear Bark says that 'I Like pizza' in that style all the time, I know that! Weird. I actually never really checked: for speaker prompts that switch voices very consistently, do they ever switch to the same voice, at a somewhat regular time like the end of a sentence or something? So you could maybe use one speaker and just render out a conversation in Bark, with Bark doing both sides. You don't split it at all. Doesn't seem like it should work. But stranger things have.

tardy topaz May 26, 2023, 11:27 PM

#

hazy rain I'm experimenting a lot with SD for 3D CG (some tests in the readme here <https:...

Cool projects, followed

hazy rain May 26, 2023, 11:27 PM

#

Ohhh I had seen your GauGAN experiment!!

tardy topaz May 26, 2023, 11:28 PM

#

Haha, I think I burned 10 years of GPU lifetime against the NVIDIA GauGan, they never limited it.

#

I literally put entire movies through it.

#

Frame by frame. I had this ridiculous browser based segmentation model using processing.js just thrown together, as a preprocessing before the main one.

#

That's another project that was weirdly limited. It was like 'draw a tree' but actually had 100+ other categories

#

My Discord Icon? That's a gaugan person. When the model has no people. But the category for people was in the GauGAN model, a little bit, accidentally. And it would render people that looked like this. It was enchanting and spooky. Delightful.

#

It even has a mouth!

#

Later NVIDIA blocked all those categories, a sad day

#

Pretty sure they eventually came back, but SD kind of made it feel old fashioned

hazy rain May 27, 2023, 1:17 AM

#

I had no idea you could use it in the cloud back then! I showed your experiment around at the time quite a lot!

#

Testing ChatGPT with the input "Write a mid size monologue from an A.I that would imitate Joe Rogan's voice, make it light and fun" and the rogan voice (some long parts work really well):

#

tardy topaz May 27, 2023, 1:39 AM

#

hazy rain

Sounds pretty good but you need to run a speedup on that rogan. All the clones are too slow for me by default. Just run a bunch of long prompts and resave, try to find one that doesn't change voice, but talks faster.

#

Unless that's the goal, a robot Rogan?

hazy rain May 27, 2023, 1:41 AM

#

tardy topaz Sounds pretty good but you need to run a speedup on that rogan. All the clones a...

I need to bring back mylo's implementation, I just made a basic CLI based on the notebook you shared earlier and the output of voices is deterministic there for some reason

tardy topaz May 27, 2023, 1:41 AM

#

I think it is deterministic. That's partly why I cut like random.

#

I even added noise and stuff at one point, just to get different results, but I think I pulled it

#

Or at least the large model. Maybe small wasn't, the first tone.

hazy rain May 27, 2023, 1:42 AM

#

📎 clone_cli.py

#

But inference isn't, so I run inference multiple times

tardy topaz May 27, 2023, 1:44 AM

#

I don't recall if it's always a match but I have gotten exact same clones, with same audio. But Bark Infinity spits out clones like a fountain so doesn't really matter, just every few seconds in the audio.

#

That's my secret. I'm just spamming out the clones. There's good in there somewhere

hazy rain May 27, 2023, 1:46 AM

#

tardy topaz That's my secret. I'm just spamming out the clones. There's good in there somewh...

So you are spamming voice clones AND inference? Because the latter too is non deterministic, I've had the same voice produce garbage or great results so now I mostly focus on the latter

tardy topaz May 27, 2023, 1:46 AM

#

It's like, generate 10 clones, from each, render one sample. That's the default. The last clone, the one you get if you do 1, is often the best.

hazy rain May 27, 2023, 1:46 AM

#

(and I think it likes 10 sec inputs better for cloning)

tardy topaz May 27, 2023, 1:46 AM

#

Yeah you don't need 10 clones for 10 seconds.

#

But for 1 minute?

hazy rain May 27, 2023, 1:47 AM

#

tardy topaz It's like, generate 10 clones, from each, render one sample. That's the default....

But then the cherry picked one can infer badly too right?

tardy topaz May 27, 2023, 1:47 AM

#

Yeah totally, it's really just go get a snack, come back, check the samples

#

Maybe some inferred well

#

Then check again

hazy rain May 27, 2023, 1:48 AM

#

Yep I do that for the audio generation but not much for the voices (I tried initially), but I'm very much starting all this you have way more experience

tardy topaz May 27, 2023, 1:49 AM

#

My experience is mostly pre cloning, from tweaking voices. But weirdly it's still useful. Because the clones don't come out the oven quite right.

#

However I am still trying to figure out how to put some in Bark Infinity, as a feature that just works.

#

Mainly how to edit the speaker files

#

So for example, you can grab the rogan voice from seconds 12 to 18, or whatever. (choosing random numbers)

#

because you can hear that part is good

#

So you can snatch that rogan up

#

I learned today you CAN do this in Gradio. There is an audio tool to pick a section of audio.

#

So it should be doable

#

I don't know if that makes sense. But for that rogan clip, if you save all the npzs, there might be a section that just sounds really good. Well, you can carefully try to build a new npz of just that, by selecting the right spot.
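A rough sketch of what "build a new npz of just that" could look like, assuming a Bark-style speaker file with `semantic_prompt`, `coarse_prompt` and `fine_prompt` arrays, and assuming token rates of 49.9 Hz (semantic) and 75 Hz (coarse/fine). The `slice_speaker` helper, the file name, and the random stand-in voice are all made up for illustration; this is not how Bark Infinity does it.

```python
import os
import tempfile

import numpy as np

# Assumed token rates for Bark-style speaker prompts (not taken from this chat).
SEMANTIC_RATE_HZ = 49.9
COARSE_RATE_HZ = 75


def slice_speaker(prompt, start_s, end_s):
    """Return a new speaker dict covering only start_s..end_s seconds."""
    sem_a, sem_b = round(start_s * SEMANTIC_RATE_HZ), round(end_s * SEMANTIC_RATE_HZ)
    aco_a, aco_b = round(start_s * COARSE_RATE_HZ), round(end_s * COARSE_RATE_HZ)
    return {
        "semantic_prompt": prompt["semantic_prompt"][sem_a:sem_b],
        # Coarse and fine tokens are assumed to be (codebooks, frames) arrays.
        "coarse_prompt": prompt["coarse_prompt"][:, aco_a:aco_b],
        "fine_prompt": prompt["fine_prompt"][:, aco_a:aco_b],
    }


# Stand-in for a loaded speaker file: a fake 20-second voice with random tokens.
rng = np.random.default_rng(0)
seconds = 20
voice = {
    "semantic_prompt": rng.integers(0, 10_000, size=round(seconds * SEMANTIC_RATE_HZ)),
    "coarse_prompt": rng.integers(0, 1024, size=(2, seconds * COARSE_RATE_HZ)),
    "fine_prompt": rng.integers(0, 1024, size=(8, seconds * COARSE_RATE_HZ)),
}

# Keep only seconds 12 to 18 and save them as a new (hypothetical) speaker file.
best_part = slice_speaker(voice, 12, 18)
out_path = os.path.join(tempfile.gettempdir(), "rogan_12_18.npz")
np.savez(out_path, **best_part)
```

With a real file, `voice` would come from `np.load(...)` instead of random numbers; whether a slice cut at an arbitrary spot still sounds right is something you would have to check by ear.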

#

It's almost so simple it's weird, right? Just like, find the best rogan part, and make that a voice. But really it's that simple thinking that works in Bark

#

So today I learned I can make this possible in the UI

hazy rain May 27, 2023, 1:54 AM

#

That would be awesome I was wondering how to edit/visualize it better to enhance it.
One idea I want to try was lerping good voices from batch on mylo's ui or yours that are both non deterministic

tardy topaz May 27, 2023, 1:54 AM

#

Yeah it's still a little awkward, but I'll try.

#

Like that bar, you can't see anything. But if you hit the play button it starts at that spot.

#

So it's trial and error but you can find the good spots

#

Audio editing is a really bad fit in Gradio but it's kind of what you need.

#

But you aren't editing the wav file, or not always. You edit the NPZ generation.

#

And maybe just semantic or coarse.

#

Though that's maybe too advanced and fancy.

#

I will just use the wav file as input, so I know where the user thinks the best voice is.

#

I may be getting more ambitious than my software dev skills. But it's really fun.

#

Just poking and fiddling with each voice.

#

I mean I got stuck today trying to fix some bad filename bug for 3 hours, so yeah, probably getting too ambitious. But it's a good idea.

hazy rain May 27, 2023, 2:00 AM

#

bark infinity is super cool, I couldn't make "batch" work even after ticking the warning so I started the CLI to both clone and infer.
I will try to decipher the logic behind yours and mylo's clone to understand why mine is deterministic and I'll test further

tardy topaz May 27, 2023, 2:00 AM

#

For batch, can you describe what you were doing?

#

I just want to know if it was a bug, which it could be

hazy rain May 27, 2023, 2:00 AM

#

The cursor was showing a red forbidden sign and I couldn't bump it up

#

(in the voice cloning tab)

#

the main one worked

tardy topaz May 27, 2023, 2:01 AM

#

what were you batching?

#

Voice clones, or samples?

hazy rain May 27, 2023, 2:01 AM

#

What we are saying: 1 wav input -> multiple clones (npz)

#

Regarding your audio editing tool, you can also replace the player completely in html/js to fit better

tardy topaz May 27, 2023, 2:02 AM

#

I could use a tool if somebody wrote one, but myself, I probably wouldn't do it. But yeah it could work, and I did search a little and didn't find anything. But maybe it already exists somewhere?

#

Okay so by batch, you want to upload one wav.

#

That is 10 seconds ish

#

and get different clones

#

By processing it over and over

hazy rain May 27, 2023, 2:03 AM

#

Yep to then cherry pick the best!

#

(for now I'm doing this on the inference in my cli because my clone is deterministic)

tardy topaz May 27, 2023, 2:03 AM

#

I pulled that feature when it was deterministic, though for a while I had an audio process running to make the wav slightly different.

hazy rain May 27, 2023, 2:03 AM

#

Oh yours is too!

tardy topaz May 27, 2023, 2:04 AM

#

However, I'm not sure you are getting better results

hazy rain May 27, 2023, 2:04 AM

#

I did not know!

#

Mylo's is not (audio webui)

tardy topaz May 27, 2023, 2:04 AM

#

Is your CLI really not deterministic?

#

when you clone?

hazy rain May 27, 2023, 2:04 AM

#

No I'm saying the opposite 🤣

#

Inference -> non deterministic
Voice clones -> deterministic

#

Mylo's audio webui isn't

#

you give the same input wav and each generation creates a different size of npz

tardy topaz May 27, 2023, 2:05 AM

#

I think it is, but what happens is the gradio UI does some conversions sometimes

#

I tried to figure this out

#

and I'm pretty sure it is deterministic, if you give it exact same wav

#

But sometimes there's some audio conversion that adds rng to it

hazy rain May 27, 2023, 2:06 AM

#

mine (I compared in numpy but they even are exactly the same size on disk) produce the same stuff
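For anyone who wants to run the same check, here is a minimal sketch of comparing two speaker files in numpy. The `same_speaker` helper and the fake in-memory files are illustrative only; with real clones you would pass the two `.npz` paths instead.

```python
import io

import numpy as np


def same_speaker(file_a, file_b):
    """True if both .npz files hold the same keys with identical arrays."""
    with np.load(file_a) as a, np.load(file_b) as b:
        if set(a.files) != set(b.files):
            return False
        return all(np.array_equal(a[key], b[key]) for key in a.files)


def fake_npz(seed, length):
    """Build an in-memory .npz with a hypothetical Bark-style layout."""
    rng = np.random.default_rng(seed)
    buf = io.BytesIO()
    np.savez(
        buf,
        semantic_prompt=rng.integers(0, 10_000, size=length),
        coarse_prompt=rng.integers(0, 1024, size=(2, length)),
    )
    buf.seek(0)
    return buf


# Same seed and length: identical arrays, like a deterministic clone.
identical = same_speaker(fake_npz(1, 100), fake_npz(1, 100))
# Different seed and length: like pressing generate twice in a non-deterministic UI.
different = same_speaker(fake_npz(1, 100), fake_npz(2, 120))
print(identical, different)
```

Comparing array contents is a stricter test than comparing file size on disk, since two files can have the same size and still hold different tokens.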

tardy topaz May 27, 2023, 2:06 AM

#

I am not sure if that is useful, have you found it helpful?

hazy rain May 27, 2023, 2:06 AM

#

tardy topaz and I'm pretty sure it is deterministic, if you give it exact same wav

Maybe it's a bug then but try the same input and then save and reclick generate you'll see a different size

tardy topaz May 27, 2023, 2:06 AM

#

Like the same wav is really good a second time?

hazy rain May 27, 2023, 2:07 AM

#

I'm lost I thought that's what you are advocating for

#

(multiple voice clones)

tardy topaz May 27, 2023, 2:07 AM

#

Haha, it's confusing.

hazy rain May 27, 2023, 2:07 AM

#

I personally abandoned that path and only batch the inference (text to speech from the same npz)

#

and compare that

#

I think you can test the cli in your bark infinity env

tardy topaz May 27, 2023, 2:08 AM

#

In Mylo's UI, you can use a wav file two ways. One way is pure speaker. That puts a file in /clones_voices directory. That is deterministic.

#

The other way, the other field, is using the audio file as a prompt. So you get a crazy variety of voices. But it sometimes works. That is the speaker.npz file.

#

so if the name of the .npz is the name of the wav, then it's 100% the same I think. The other one is kind of ignoring part of the file, it's more for creative uses.

#

But actually sometimes makes a decent clone by luck

#

I'm trying to figure out how to make that stuff clear myself

#

but basically, if name of npz = name of wav, perfect clone. if not, it's more like a creative sample based on the audio

hazy rain May 27, 2023, 2:11 AM

#

I would have to check but when I traced the path to the methods I could not really understand all the slicing etc and it's definitely more complicated than the one I used or the one from the notebook

#

But I was only referring to voice cloning

#

Which I think there is only one way no?

tardy topaz May 27, 2023, 2:12 AM

#

So in my UI it says like, "Use audio file instead of text prompt" (doesn't actually work right now)

#

Mylo has that feature, it's the lower audio box

#

IIRC

#

the top one is pure clone. Honestly I would need to double check

hazy rain May 27, 2023, 2:13 AM

#

I have my venv around let me check

#

#

this "speaker from" which generates a voice

tardy topaz May 27, 2023, 2:16 AM

#

You know I can't quite remember on a late friday, just that directory, I think data\bark_custom_speakers is the pure clones

hazy rain May 27, 2023, 2:16 AM

#

That's what mylo told me to do

#

📎 rogan_01.npz

#

📎 rogan_02.npz

tardy topaz May 27, 2023, 2:17 AM

#

I'll check in a bit, just not at main computer kind of awkward

hazy rain May 27, 2023, 2:17 AM

#

📎 rogan_03.npz

#

Take your time, just sharing as I think it can be useful, you can see the size offset I was mentioning, this is just pressing generate consecutively

tardy topaz May 27, 2023, 2:20 AM

#

Yeah I think that's fine. I think that does a perfect clone. But then it uses the audio file again, as a prompt.

hazy rain May 27, 2023, 2:20 AM

#

I actually even opened an issue this morning about that: https://github.com/gitmylo/audio-webui/issues/7

tardy topaz May 27, 2023, 2:20 AM

#

So you should have two .npz. one in the dir, and one speaker.npz

#

It's roughly like cloning deterministically and then using the clone to generate.

#

That's basically the main process, in general

hazy rain May 27, 2023, 2:22 AM

#

Why two? That's what I don't understand, it doesn't autosave but autoloads:

tardy topaz May 27, 2023, 2:22 AM

#

the one in the directory, that one is probably the same every time. But the other one is using a generation and saving again, so it's different every time. And sometimes way better.

#

because it's a real bark voice

#

well real bark audio sample

hazy rain May 27, 2023, 2:24 AM

#

I don't really understand what you mean. There is no default npz

tardy topaz May 27, 2023, 2:24 AM

#

I will have to actually check. So you don't get a speaker.npz file?

#

just the one with the name?

hazy rain May 27, 2023, 2:25 AM

#

You only get a speaker.npz file but they aren't autosave, just hyperlinked in gradio for you to save in data/bark_custom_speakers

tardy topaz May 27, 2023, 2:25 AM

#

I think you are doing it the right way. You are basically cloning, then generating. So it's different every time with good variety

hazy rain May 27, 2023, 2:26 AM

#

Yep mylo's ui does it but I think it's a new thing since all the methods are called "new" 😅

tardy topaz May 27, 2023, 2:27 AM

#

So it is deterministic, but it doesn't matter. Because it's way different from a generation.

#

It kind of just skips the deterministic part.

#

If you find some little workflow that works, let me know I haven't really tried much, just doing other cleanup

hazy rain May 27, 2023, 2:29 AM

#

And for completeness this is the one generated with the cli:

📎 rogan.npz

tardy topaz May 27, 2023, 2:29 AM

#

I think basically it's just like using that .npz, with your text, and then saving.

#

Which is the so far tried and true method.

hazy rain May 27, 2023, 2:30 AM

#

Hmm I think you are right now I understand what you meant

tardy topaz May 27, 2023, 2:30 AM

#

So you can use the CLI and just do the text yourself, and resave, should be same.

hazy rain May 27, 2023, 2:30 AM

#

4kb

tardy topaz May 27, 2023, 2:30 AM

#

Right. That is just 'hello'

#

so it's nothing

#

You are cloning normally, same as CLI, same every time. Then you make clone speak. Then you save.

#

That is the best way.
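The "clone once, make the clone speak, then save" loop above can be sketched roughly like this. `generate` is a hypothetical callable standing in for the real text-to-speech call, and `fake_generate` only exists so the sketch runs without a GPU; neither is Bark's or Bark Infinity's actual API.

```python
import numpy as np


def batch_from_clone(clone, text, generate, n=10):
    """Speak `text` n times from the same clone and keep every take.

    `generate(text, clone)` is a hypothetical callable returning
    (new_speaker_dict, audio_array).
    """
    takes = []
    for i in range(n):
        new_speaker, audio = generate(text, clone)
        takes.append({"index": i, "speaker": new_speaker, "audio": audio})
    return takes


# Toy generator: different random "audio" and speaker tokens on every call,
# mimicking non-deterministic inference from a deterministic clone.
_rng = np.random.default_rng(0)


def fake_generate(text, clone):
    audio = _rng.standard_normal(24_000)  # one second of noise at an assumed 24 kHz
    speaker = {"semantic_prompt": _rng.integers(0, 10_000, size=50)}
    return speaker, audio


clone = {"semantic_prompt": np.zeros(10, dtype=np.int64)}
takes = batch_from_clone(clone, "Hello from the clone.", fake_generate, n=3)

# Listen to each take, then resave the speaker of the one that sounds best, e.g.:
# np.savez("rogan_best.npz", **takes[1]["speaker"])
```

The design point is that the variety comes from the generation step, not the cloning step, so only the generation is repeated.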

hazy rain May 27, 2023, 2:31 AM

#

Yep I did not get why you correlated the two... they are in this UI

tardy topaz May 27, 2023, 2:32 AM

#

So you can batch it now, just generate samples and save final .npz

hazy rain May 27, 2023, 2:32 AM

#

tardy topaz You are cloning normally, same as CLI, same every time. Then you make clone spea...

This is what I settled for and I'm getting good results

tardy topaz May 27, 2023, 2:32 AM

#

with the clone

hazy rain May 27, 2023, 2:32 AM

#

but only using the same full voice clone from the cli I don't edit npz

#

like the 5 min of joe rogan was first try

tardy topaz May 27, 2023, 2:33 AM

#

Yeah that's a nice feature, I will try to make it

#

I think it IS in Bark Infinity

#

but just very crude

#

It is just chopping 5 seconds

#

always

#

first 5 or something

#

It's not a user feature

#

But if you generate in the cloning, it saves twice. And one is more in the earlier part of the clip

#

However if you just pick a number, that is very often a bad spot

#

You really need the user to listen

hazy rain May 27, 2023, 2:36 AM

#

This is no?

tardy topaz May 27, 2023, 2:36 AM

#

That's only for regular samples

#

Well actually

hazy rain May 27, 2023, 2:36 AM

#

I was trying there

#

But got the cursor thing

tardy topaz May 27, 2023, 2:37 AM

#

Don't hit that button, I think

#

Just put a text prompt in

#

Like your rogan quote

#

Try setting repeat, honest to god I can't remember if that works

#

in the cloning

#

uncheck the 'just give me box' that is not really good. I mean it's weird. so try if you want. But it's slow

#

a failed experiment really

#

I should make this way less about chopping up. It's not great for short audio, like this case

hazy rain May 27, 2023, 2:39 AM

#

I can't capture it but what I meant earlier is that whether or not I tick "give me more clone", I can't edit the slider, I get a red forbidden sign as a cursor

tardy topaz May 27, 2023, 2:39 AM

#

Oh hmnn

#

I can check in 20 minutes,

#

Probably bugged

#

But if you can, maybe restart?

#

and don't check it or leave unchecked

hazy rain May 27, 2023, 2:40 AM

#

No emergency! It's 4am here, I'll soon sleep

hazy rain May 27, 2023, 2:40 AM

#

tardy topaz But if you can, maybe restart?

I just did now to make the screenshots but I had tried earlier

tardy topaz May 27, 2023, 2:40 AM

#

Just the 'Create an audio sample for each created clone at the end, using the Main Text Prompt'

hazy rain May 27, 2023, 2:41 AM

#

and then just did the cli quickly

tardy topaz May 27, 2023, 2:41 AM

#

If you have a second rogan clip you can try the second audio thing

#

I'm only 80% sure it doesn't help. haha

hazy rain May 27, 2023, 2:42 AM

#

Ahah yes I meant to ask about that! I'll try, but I think I understand better what input yields better results

tardy topaz May 27, 2023, 2:43 AM

#

I added many things that didn't really help, but I wanted to try. And left in for now. Will replace with new things I try.

#

I mean they might help I honestly didn't have testing time. But if so, hit or miss.

#

The second audio sample uses that audio, as the prompt. Instead of your text that you type in.

#

It seemed like that might be better but so far, not

#

So it makes Rogan, or whoever, say the things in that audio

#

I'll be back in 15, at desktop, if you're still awake I can check whatever.

hazy rain May 27, 2023, 2:46 AM

#

Oh nice feature so speech to speech?

tardy topaz May 27, 2023, 2:46 AM

#

Yeah

hazy rain May 27, 2023, 2:47 AM

#

I started to try voice cloning from a French sample but the voice always deviates to English for some reason

tardy topaz May 27, 2023, 2:49 AM

#

Needs French training. Oh I sent someone a clip, so I can download from discord and post. Just funny to run long audio through as second sample, voice just roams all over.

#

So the og voice, gone quick, if you just keep playing because this is way too long. and it is replaying the audio samples with RNG voices that slowly morph.

#

To do it right you can't use a long clip but it's amusing

#

(for speech to speech, you could only really get the first 5 to 12 seconds at best. in this case just to try it, it keeps going, and the voices are completely lost from whatever the speaker was quickly)

dusky siren May 27, 2023, 3:40 AM

#

Hey all, we're considering using suno for a content creation platform we're building.

Are there any restrictions to using it for a commercial platform that features TTS + voice cloning, and charges users?

( i can share more details in private, if needed )

cc: @lyric steeple

lofty flint May 27, 2023, 7:37 AM

#

tardy topaz really just wav2lip and bark

yes you are right, just wav2lip , also stable diffusion and chatgpt3.5 api

lofty flint May 27, 2023, 7:37 AM

#

hazy rain This is really cool!! Is it like footage loops with some Wav2Lip?

yes , it is just wav2lip

lofty flint May 27, 2023, 7:38 AM

#

tardy topaz Did you filter out the brackets, she keeps replacing them with () instead. 😄

haha, the backend is just the chatgpt3.5 api, so I don't have much control over the output 😄

lofty flint May 27, 2023, 7:42 AM

#

tardy topaz I was playing with an 'infinite live tv show' on twitch. It used diffusion for t...

wow, may I have the twitch link?

chrome goblet May 27, 2023, 8:51 AM

#

https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer/blob/master/notebook.ipynb

GitHub

bark-voice-cloning-HuBERT-quantizer/notebook.ipynb at master · gitm...

The code for the bark-voicecloning model. Training and inference. - bark-voice-cloning-HuBERT-quantizer/notebook.ipynb at master · gitmylo/bark-voice-cloning-HuBERT-quantizer

#

i found there is a link which can clone voice?

#

https://github.com/serp-ai/bark-with-voice-clone/blob/main/clone_voice.ipynb

GitHub

bark-with-voice-clone/clone_voice.ipynb at main · serp-ai/bark-with...

🔊 Text-prompted Generative Audio Model - With the ability to clone voices - bark-with-voice-clone/clone_voice.ipynb at main · serp-ai/bark-with-voice-clone

viral lynx May 27, 2023, 9:22 AM

#

chrome goblet https://github.com/serp-ai/bark-with-voice-clone/blob/main/clone_voice.ipynb

spoiler alert: both use the same cloning code, the main difference is:
gitmylo/bark-voice-cloning-HuBERT-quantizer: creates the clone file for use in bark.
serp-ai/bark-with-voice-clone: creates and uses the voice clone in bark.

if you want a webui instead:
https://github.com/gitmylo/audio-webui
or
https://github.com/JonathanFly/bark

chrome goblet May 27, 2023, 9:44 AM

#

thank you

#

i will

#

i will try to make one

#

can we use it to sing a song? hahaha

viral lynx May 27, 2023, 9:54 AM

#

chrome goblet can we use it to sing a song ?hahaha

if your cloned speaker was singing there's a high chance the generation will continue singing

viral lynx May 27, 2023, 10:22 AM

#

scrolling through my temp, there are some cursed gens

quaint night May 27, 2023, 12:22 PM

#

first time hearing bark yawn (around middle)

quaint night May 27, 2023, 3:38 PM

#

for children's books

tardy topaz May 27, 2023, 3:55 PM

#

quaint night for children's books

Did you try a lot? Feels dialed in just right.

quaint night May 27, 2023, 3:55 PM

#

first shot

tardy topaz May 27, 2023, 4:00 PM

#

Was it 'clone and do one sample' at least?

quaint night May 27, 2023, 4:01 PM

#

clone + prompt one thing, then try something else, this was that else

tardy topaz May 27, 2023, 4:01 PM

#

Nice, yeah it can be enough.

quaint night May 27, 2023, 4:11 PM

#

it's kind of funny, it does a good job and then keeps hallucinating, perhaps it should be used with a -0.5s voice connector

tardy topaz May 27, 2023, 4:13 PM

#

She performs for children. She's creative and improvises. Works for me.

#

Well, maybe not just randomly

#

But sometimes it sounds like it

quaint night May 27, 2023, 4:14 PM

#

no, it's a short prompt and a long voice

tardy topaz May 27, 2023, 4:14 PM

#

Oh yeah

#

Sounds like the intro to a song

#

I like my prompts short and my voices long

#

Though actually it's the opposite for me.

quaint night May 27, 2023, 4:15 PM

#

the best way to generate a voice with bark is to first have the voice say what you want to generate. Then use it as a prompt.

#

Except, if you already have the result, what are you asking bark to do? 🤷‍♂️

tardy topaz May 27, 2023, 4:15 PM

#

It is a funny quirk

#

Though I thought my code was bugged, until I tried it on the Suno default speakers lol

#

Ask them to say their own prompts

quaint night May 27, 2023, 4:16 PM

#

lol, don't torture them

tardy topaz May 27, 2023, 4:17 PM

#

I was like, why is my second segment always hallucinating? But I was running the same script, of repeats

lofty flint May 27, 2023, 5:28 PM

#

tardy topaz I was like, why is my second segment always hallucinating? But I was running the s...

I found that the history prompt will speak the script originally fed in, it is probably because of its nature as GPT based

tardy topaz May 27, 2023, 5:59 PM

#

lofty flint I found that the history prompt will speak the script originally fed in, it...

Oh, because there's no repetition penalty? I forget that used to not be standard.

#

So maybe the fix is just a repetition penalty? But

#

But I wonder if it works as with sounds...

lofty flint May 27, 2023, 7:23 PM

#

tardy topaz Oh, because there's no repetition penalty? I forget that used to not be standard...

it seems to have to be implemented in training stage, isn't it?

tardy topaz May 27, 2023, 7:27 PM

#

lofty flint it seems to have to be implemented in training stage, isn't it?

I'm pretty sure it's just code in the sampler, like you would implement top-k or whatever. But presumably if it was super easy it would be done already. Actually I just googled and it seems almost too easy to implement, I wonder if I am actually extra confused. This I could do. But why wouldn't it be in the nano GPT?
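To make "just code in the sampler" concrete, here is a small numpy sketch of a CTRL-style repetition penalty sitting next to plain top-k sampling. It is a generic illustration on a toy logits vector, not Bark's sampler; the function names and the penalty value are made up, and whether this helps with audio tokens is exactly the open question in this thread.

```python
import numpy as np


def apply_repetition_penalty(logits, previous_tokens, penalty=1.2):
    """Make tokens that were already generated less likely (CTRL-style)."""
    out = np.asarray(logits, dtype=np.float64).copy()
    seen = np.unique(np.asarray(previous_tokens, dtype=np.int64))
    # Dividing a positive logit or multiplying a negative one both lower it.
    out[seen] = np.where(out[seen] > 0, out[seen] / penalty, out[seen] * penalty)
    return out


def sample_top_k(logits, k, rng):
    """Plain top-k sampling: softmax over the k highest logits only."""
    top = np.argsort(logits)[-k:]
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))


logits = np.array([2.0, 1.0, -1.0, 0.5])
history = [0, 2, 0]  # tokens 0 and 2 were already generated

# Token 0 goes from 2.0 to 1.0, token 2 from -1.0 to -2.0, the rest are untouched.
penalized = apply_repetition_penalty(logits, history, penalty=2.0)
next_token = sample_top_k(penalized, k=2, rng=np.random.default_rng(0))
```

The penalty runs on the logits at inference time, before top-k, so no retraining is involved in this form.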

#

If it was easy, and effective. Presumably it would be generically in the fork. Whatever I am busy today and also not the person for that.

#

Someone who writes that code all day should get in there and try though. Maybe it is hard to do it right, as opposed to, at all. I just wait for huggingface to do all that kind of stuff.

lofty flint May 27, 2023, 7:48 PM

#

tardy topaz Someone who writes that code all day should get in there and try though. Maybe i...

You are right, thanks Jonathan, your contribution to this new technology is important

past sinew May 28, 2023, 12:34 AM

#

I think I broke it. Anyone know why it does this? It completely avoided the text.

https://huggingface.co/spaces/suno/bark/discussions/94#6472a0e822016353ae3cec6f

suno/bark · Bark: WOMAN I'm a dog. [Bark][barks][woof][meow][hiss][...

tardy topaz May 28, 2023, 12:42 AM

#

With Bark it could happen even normally, but anything with [words] all over it has a high chance of failure. It's not really a 'feature', more something people discovered that sometimes works

#

I'd be interested to know if you get really good animal sounds actually.

chrome goblet May 28, 2023, 4:05 AM

#

#

can i choose the music for song ?

#

and how

zenith notch May 28, 2023, 6:22 PM

#

👍

light otter May 28, 2023, 10:30 PM

#

#🐣┃suno-showcase Hello

quaint night May 29, 2023, 2:39 PM

#

big semantic + big coarse model does make a difference, as much as I don't like waiting for it

tardy topaz May 29, 2023, 2:40 PM

#

quaint night big semantic + big coarse model does make a difference, as much as I don't like ...

I can sacrifice coarse but not big semantic.

quaint night May 29, 2023, 2:41 PM

#

I thought that the differences were small but I just tested on the wrong samples. They can generate qualitatively different things in some cases.

grizzled shard May 30, 2023, 9:31 AM

#

some test of some more audio post-processing in my bark plugin.

lofty flint May 30, 2023, 9:33 PM

#

Today's podcast: everything was generated in one go, not cherry-picked from dozens of takes:
https://soundcloud.com/jacktalk/sets/jacktalk-today-20230530

SoundCloud

JackTalk

JackTalk Today 20230530

JackTalk is brought to you by ai.pictures. All content is generated with A.I.


tardy topaz May 31, 2023, 2:31 AM

#

first non voice clone just making sure gradio upload works, first sample, copy-pasted the gradio help text as content. each segment in clip same exact speaker, same text, yet musical in wildly different ways. literally the first generation of the first clone i just tested. just throw any audio into the cloner people. (It's not an audio clip as the prompt, it's just a music clip as a voice clone, I just happened to copy and paste the closest nearby text on screen.) The music clip was the Deus Ex theme, which you can hear a sliver of.

grizzled shard May 31, 2023, 12:34 PM

#

😂

round gorge May 31, 2023, 3:23 PM

#

hi

#

#🐣┃suno-showcase hi

obtuse slate May 31, 2023, 8:18 PM

#

lofty flint Today podcast, all are inferencing in one go, not picked from dozens : https://s...

Just love it! Can you walk me through the process? What prompt-voice are you using? Do you generate each segment in parallel and then concatenate the audio, or? You say there is no post processing, so how do you ensure the sentences are properly ended? What specific tweaks is your program doing on the audio?

blissful pulsar May 31, 2023, 9:06 PM

#

Hi guys, here is a dedicated blog post https://dev.to/adriens/agi-bark-smart-waitress-285h

DEV Community

😋 AGI (bark 🐶) Smart waitress 🎙️

❔ About With this post you'll see how I started my first full artwork creating a bridge...

#

Hopefully you'll like it

lofty flint May 31, 2023, 9:56 PM

#

obtuse slate Just love it ! Can you walk me to the process ? What prompt-voice are you using ...

in the first week, i spent hours picking a few from hundreds of generated samples, then stuck with those few and tested them with variations, to see which ones give good results

fallen stump Jun 2, 2023, 11:12 AM

#

11

tardy topaz Jun 2, 2023, 11:55 AM

#

the applause is there, just can't quite surface it all the way. so close.

lofty flint Jun 2, 2023, 2:29 PM

#

tardy topaz the applause is there, just can't quite surface it all the way. so close.

i found that bark voices (if good quality) sound similar, it is probably because of the training dataset

knotty mountain Jun 3, 2023, 12:43 AM

#

I've been using Bark for doing a radio show. It's pretty fun.

#

English Voice 3 sounds very much like Chris Morris

#

If anyone is interested

obtuse slate Jun 3, 2023, 4:55 AM

#

knotty mountain If anyone is interested

Very interesting and creative. Also weird. Overall I enjoyed listening to it. Did you create all sentences in a single generation, or did you try your luck multiple times? The global editing is made by hand, I take it

knotty mountain Jun 3, 2023, 6:41 AM

#

Yeah just a single generation using the long form scripts. I did the second to try and fix up the weirdness that happens sometimes, which sort of makes it confusing, but works in the context. A bit of editing at first, but then just gave in to the pace of the generations.

cunning tendon Jun 3, 2023, 1:21 PM

#

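#🐣┃suno-showcase If you care about LAN transfer speed, or need to build a larger network, and are not price-sensitive, the Asus TX-AX6000 better fits your needs. If you are price-sensitive, or want somewhat wider coverage, and also need support for multiple WiFi 6 features, then the TP-Link xdr6088 may suit you better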

tardy topaz Jun 4, 2023, 3:33 PM

#

couldn't resist trying one batch of 30, overloading Bark with hints like the woman literally starts speaking with "In a world..." some with no announcer. Though Bark makes those generally sound like weekly news teaser videos. The text really super shapes the voice you get. This is just a random assortment of them.

lofty flint Jun 4, 2023, 6:49 PM

#

if you want to learn how to make chatwithalice, please go here :
https://igg.me/at/chatwithalice

Indiegogo

Online course-build virtual channel 24x7 on twitch

I built a virtual teacher on twitch :

https://www.twitch.tv/chatwithalice

I tell you how to make. | Check out 'Online course-build virtual channel 24x7 on twitch' on Indiegogo.

deft granite Jun 5, 2023, 10:59 AM

#

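The generated Chinese voice is very strange [MAN]:在苏联是否可以存在两党制？ [WOMAN][laughs]不,不可能,因为我们养不起. (In English, roughly: [MAN]: Could a two-party system exist in the Soviet Union? [WOMAN][laughs] No, impossible, because we couldn't afford it.)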

knotty osprey Jun 5, 2023, 5:19 PM

#

Hello

tidal fulcrum Jun 6, 2023, 1:08 AM

#

Just testing to see how the Chinese output sounds.

blissful pulsar Jun 6, 2023, 5:45 AM

#

I just tried the example prompt "♪ In the jungle, the mighty jungle, the lion barks tonight ♪" and...

what the heck-

lofty flint Jun 6, 2023, 9:15 PM

#

blissful pulsar I just tried the example prompt "♪ In the jungle, the mighty jungle, the lion ba...

it is not an easy one-shot task, as suno's nature is different from other TTS

blissful pulsar Jun 6, 2023, 9:16 PM

#

lofty flint it is not a one go easy task as suno nature is different from other TTS

I see

lofty flint Jun 6, 2023, 9:16 PM

#

deft granite The Chinese generated voice is very strange [MAN]:在苏联是否可以存在两党制？ [WOMAN][laughs]...

right now it is only good at English, i tested several languages

tardy topaz Jun 7, 2023, 12:51 AM

#

lofty flint it now only is good at English i tested several languages

Is every Chinese voice bad? If you can find even 2 or 3 good ones, I think you can find infinite good ones. Eventually...

lofty flint Jun 7, 2023, 7:32 AM

#

tardy topaz Is every Chinese voice bad? If you can find even 2 or 3 good ones, I think you c...

chinese is not that bad, it is already better than many tts projects, just not as good as english. i think it is because the training dataset is not big enough, or maybe the chinese dataset is not well-rounded enough

rough dust Jun 7, 2023, 8:21 AM

#

Are there any Chinese people here? Come say hi

tardy topaz Jun 7, 2023, 8:24 AM

#

Look up! ⬆️

lofty flint Jun 7, 2023, 8:31 AM

#

i think we should just focus on English at this moment. i think suno will make a next version in which other languages will be better. just like chatgpt, it is particularly good at english

blissful pulsar Jun 7, 2023, 1:57 PM

#

Hi everyone. Can you point me to the best singing examples Bark generated ? And can Bark run on a Windows install without using the GPU (like for example SVC, RVC, Vlad diffusion) ?

viral lynx Jun 7, 2023, 3:17 PM

#

Jonathan's webui is for bark, with lots of bark related features
My webui is for bark and any other audio related webuis, with less model specific features

merging them isn't really a good idea, as the webuis have different purposes

tardy topaz Jun 7, 2023, 3:33 PM

#

I'm frustratingly incapable of actually making the UI I want to make, technically. It would look completely different, like a node based sound laboratory where you draw lines and connect segments to create unique processes. Like a visual UI version of a Jupyter notebook. My only hope is I stumble across something almost like that I can fork lol

#

Some day I'm going to drink way too much coffee and try to rig up some crazy way to use https://wavesurfer-js.org/examples/#multitrack.js in gradio. But honestly I can't believe somebody hasn't done it already. It's a web page. Half the Stable Diffusion UI features are just hooking javascript into Gradio, already. Somebody who really specializes in that could probably do it quick.

turbid anvil Jun 7, 2023, 5:30 PM

#

Hey, guys! Want to ask, is it a right place to share my pet project utilising Bark?

inner pasture Jun 7, 2023, 6:15 PM

#

turbid anvil Hey, guys! Want to ask, is it a right place to share my pet project utilising Ba...

yes, that would be great, as long as it doesn't have anything that breaks the community rules

turbid anvil Jun 7, 2023, 6:22 PM

#

Nice! Here it is https://castpod.live/
It's quite straightforward. You provide a text prompt about a particular topic, and Castpod generates all the elements you'd expect in an audio podcast: cover art, a theme song, a title, description, tags, as well as the podcast's characters, including their names, roles, avatars, and of course, the podcast script and the corresponding audio conversation.
The part that generates the audio output is powered by Bark. Now, there are some limitations in quality and size of the podcast, but sometimes, the results truly mimic authentic conversations with insightful viewpoints that I hadn't even considered before.
It was so fun to develop, thank you a lot for your work and tool you have shared!

#

For now, podcast generation is quite expensive. The average cost of generating one podcast is approximately 60 cents — a considerable amount! As a result, I implemented a paywall for podcast generation. I'm sharing a license key that comes with some generation credits. Simply enter the key on the podcast creation page, and you'll be all set to give it a try. 04B68F62-79B7-4337-B054-F9741D1A65CC

tardy topaz Jun 7, 2023, 6:25 PM

#

turbid anvil Nice! Here it is https://castpod.live/ It's quite straightforward. You provide a...

The music isn't Bark too is it? If it is you must have really worked for those. Soo hard to get that high quality.

turbid anvil Jun 7, 2023, 6:27 PM

#

Nope, theme songs are generated with https://github.com/riffusion/riffusion

GitHub

GitHub - riffusion/riffusion: Stable diffusion for real-time music ...

Stable diffusion for real-time music generation. Contribute to riffusion/riffusion development by creating an account on GitHub.

#

Bark handles only audio conversation output

obtuse slate Jun 8, 2023, 4:56 AM

#

turbid anvil For now, podcast generation is quite expensive. The average cost of generating o...

Very nice. I built a similar engine a while ago but used 11labs for the voices. It's totally unsustainable money-wise, and even though it's way cheaper with self generation, your comment shows we're not quite there yet. Maybe once SoundStorm is ported to open source - it's supposed to accelerate inference by an order of magnitude with their non-autoregressive solution. I think the podcast shows that Bark is not quite there yet in terms of voice control and quality either. It's a good experimental tool (and sometimes an artistic one) and probably a good project for research, but not really something that can be part of a product.

turbid anvil Jun 8, 2023, 9:39 AM

#

obtuse slate Very nice. I built a similar engine a while ago but used 11labs for the voices. ...

Yes, agree with that. For now, cost and quality are solid limitations for such a project. But the awesome thing here is that it can already solve the puzzle: even with super young solutions such as Bark and a pretty dumb GPT-3 (compared to GPT-4), the proof of concept works, and sometimes works ridiculously well. There is no doubt that as Bark undergoes further iterations and improvements, and with a GPT upgrade, it will reach a level of quality and cost that makes products like that possible.

obtuse slate Jun 8, 2023, 9:52 AM

#

turbid anvil Yes, agree with that. For now, costs and quality are solid limitations for such ...

Indeed. My take was behind https://radio-hn.pages.dev and it was so fun to build that I toyed with the idea of productizing it (like recast), but the price of inference wouldn't allow it (yet)

Radio HN - The Five

Radio HN - The Five - is an AI generated podcast about the most impactful stories shared on hacker news daily.

turbid anvil Jun 8, 2023, 10:06 AM

#

obtuse slate Indeed. My take was behind https://radio-hn.pages.dev and this was so fun buildi...

Yeah, me, I just love building pet projects, and when I checked out Bark that was like the most obvious idea to play with. I wanted to make it open, but for sure there is no way now, too costly. But, yeah, that was a lot of fun to build, looks and feels like magic 🙂

quaint night Jun 8, 2023, 6:07 PM

#

!! NOISE WARNING !!
I hadn't seen this before, it goes from blast noise to speaking in a studio midway, maybe there's something interesting in there

{
_version: "0.0.1",
_hash_version: "0.0.2",
_type: "bark",
is_big_semantic_model: false,
is_big_coarse_model: false,
is_big_fine_model: true,
prompt: "♪ 広い宇宙の数ある一つ青い地球の広い世界で小さな恋の思いは届く小さな島のあなたのもとへ",
language: null,
speaker_id: null,
hash: "c342707bb377c37533b46660842959ed",
history_prompt: "None",
history_prompt_npz: null,
history_hash: "6adf97f83acf6453d4a6a4b1070f3754",
text_temp: 0.7,
waveform_temp: 0.7,
date: "2023-06-08_20-59-08",
seed: "1542369585"
}
!! NOISE WARNING !!

#

📎 2023-06-08_20-59-08__bark__None.npz

last geode Jun 8, 2023, 9:03 PM

#

Mostly to figure out how to chain together different AI tools, I built a very low effort youtube channel that is almost entirely automated. Images from Stable Diffusion, script from gpt4, audio from Bark, and face from SadTalker https://www.youtube.com/channel/UChhN-FdST9UDux_Mu5JX4BQ

YouTube

Wide Ranging Conversations

blissful pulsar Jun 11, 2023, 3:55 PM

#

NEW AI JUST DROPPED! https://huggingface.co/spaces/Martinic/MusicGen made by Meta, a better, open source MusicLM, with melody transfer! (Mario piano)

MusicGen - a Hugging Face Space by Martinic

#

Music might be dead.

tardy topaz Jun 11, 2023, 4:03 PM

#

I think they dropped the ball. They cleaned their data too well.

#

I mean it's awesome don't get me wrong

#

but it's killing me that the LLM is lobotomized by only seeing tags and that soulless description text on those stock photo sites

#

What do I know. Maybe the training would have fallen apart if there was more than such regular data. But from what I can tell they didn't try it.

#

However, holy cow, there are a lot of functions that look super interesting

fickle otter Jun 11, 2023, 5:34 PM

#

blissful pulsar NEW AI JUST DROPED! https://huggingface.co/spaces/Martinic/MusicGen made by Met...

Very interesting. Thanks for sharing. Prompt: An 80s driving pop song with heavy drums and synth pads in the background

tardy topaz Jun 11, 2023, 5:38 PM

#

I am a little jealous of the cool wav files

jolly fog Jun 11, 2023, 10:02 PM

#

hello! today I installed Bark and MusicGen locally , and then I put this together 🙂 https://on.soundcloud.com/RHhFh

SoundCloud

MADWURMZ✪

a.i. (siri is my sister)

(Bark ai voice - MusicGen loop)


quaint night Jun 12, 2023, 7:42 AM

#

blissful pulsar NEW AI JUST DROPED! https://huggingface.co/spaces/Martinic/MusicGen made by Met...

Included musicgen in my webui alongside bark (tts-generation-webui)

lofty flint Jun 12, 2023, 8:35 AM

#

last geode As a very low effort youtube channel mostly for figuring out how to chain togeth...

i was already doing this in 2016, but there was no stable diffusion or gpt4 at that time, fyi :
https://www.youtube.com/@ai.picturesespanol2405

YouTube

ai.pictures Español

quaint night Jun 12, 2023, 6:07 PM

#

Using vocos on some noisy bark generations:

fiery cove Jun 13, 2023, 1:54 PM

#

Hi my phone is running very fast

obtuse slate Jun 13, 2023, 8:03 PM

#

quaint night Using vocos on some noisy bark generations:

How was the inference speed, in comparison?

quaint night Jun 13, 2023, 8:14 PM

#

encodec and vocos are both really fast

#

from slowest to fastest: coarse > semantic > fine > vocos > encodec

#

but really 90% is coarse, 9% semantic, with rest being small
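A quick back-of-the-envelope sketch of why that split matters, using the rough 90% / 9% / 1% shares quoted above. The 2x speedup factors are hypothetical, and `overall_speedup` is just an illustrative helper (an Amdahl's-law style estimate), not something measured.

```python
def overall_speedup(stage_fractions, stage_speedups):
    """Estimated overall speedup = old total time / new total time.

    stage_fractions: share of total time per stage (should sum to about 1).
    stage_speedups: hypothetical speedup factor per stage; missing stages stay at 1x.
    """
    new_time = sum(
        fraction / stage_speedups.get(stage, 1.0)
        for stage, fraction in stage_fractions.items()
    )
    return sum(stage_fractions.values()) / new_time


# Rough shares from the message above; "rest" lumps fine, vocos and encodec together.
fractions = {"coarse": 0.90, "semantic": 0.09, "rest": 0.01}

coarse_2x = overall_speedup(fractions, {"coarse": 2.0})      # about 1.82x overall
semantic_2x = overall_speedup(fractions, {"semantic": 2.0})  # about 1.05x overall
```

So under these numbers, anything that speeds up the coarse stage pays off far more than the same effort spent on the semantic stage.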

pliant spruce Jun 16, 2023, 4:20 AM

#

This one sounds alright.

pliant spruce Jun 16, 2023, 4:57 AM

#

pliant spruce This one sounds alright.

prompt for this one should be: "solfège, dou rei mie fah sol law sti doe"

pliant spruce Jun 16, 2023, 4:59 AM

#

pliant spruce prompt for this one should be: "solfège, dou rei mie fah sol law sti doe"

then i reuse the npz to make this...: "[do][re][me][fa][so][la][ti]"...

limber onyx Jun 16, 2023, 7:26 AM

#

We built a small demo. It's a webui built with Next.js for JavaScript/TypeScript developers: https://github.com/failfa-st/bark-web-ui
It is still very basic but we're attempting to add more features in the next weeks.

GitHub

GitHub - failfa-st/bark-web-ui: Web UI for Bark by Suno.ai built wi...

Web UI for Bark by Suno.ai built with next.js. Contribute to failfa-st/bark-web-ui development by creating an account on GitHub.

knotty mountain Jun 16, 2023, 11:28 PM

#

Another use in a radio show - was testing different settings to see the results.

dusky siren Jun 17, 2023, 6:42 AM

#

jolly fog hello! today I installed Bark and MusicGen locally , and then I put this togethe...

Great work. That sounds amazing.

What prompt did you use with musicgen, if you dont mind my asking?

crude ingot Jun 17, 2023, 11:19 AM

#

Hi team, can i ask how can i clone the voice?

jolly fog Jun 17, 2023, 2:15 PM

#

dusky siren Great work. That sounds amazing. What prompt did you use with musicgen, if you...

thanks! 😊 it is fun to make music but days later I can't enjoy it that much.

I can't remember the prompt exactly , MusicGen has no way to recover the used prompt.
was playing around with words like :
simplistic, slow authentic drums, exotic percussion, subtle silent textures, old-school hiphop, drum break, simple beat, dj premier, dj revolution , non-harmonics, dry raw empty

Got very random results, so I made a lot of attempts. today it rendered this one. I used the large model between 10 to 20 seconds. then I used Audacity to edit the loop.

jolly fog Jun 17, 2023, 2:31 PM

#

crude ingot Hi team, can i ask how can i clone the voice?

if you got a good gpu, you can use RVC (Retrieval based Voice Conversion WebUI).
I also tried with Bark but that was a bit too complicated.
Today I cloned Marilyn Monroe's voice, then I took Bark output and converted it to her voice 😋

jolly fog Jun 18, 2023, 2:29 AM

#

making it do singing , used Bark Infinity Cloning, probably not as good as it can be, but my pc slow , only 11gb vram ✨ 😊 used 1 sample of 30 seconds for the cloning

jolly fog Jun 18, 2023, 4:50 AM

#

I'm Making a new version of Marilyn Monroe using Bark Infinity Cloning, now just her spoken voice, no singing. I also use the optional second sample part in the webui and I got really amazing results.
these are using the prompt to the letter. I think it sounds just like her 1953 movies, and her voice has this natural asmr vibe and it is just amazing how it can reproduce it. Now only if it could stay consistent ... [whispering] 😄

#

blissful pulsar Jun 18, 2023, 3:41 PM

#

I have a silky smooth voice, and today I will tell you about the exercise regimen of the common sloth.

jolly fog Jun 18, 2023, 4:49 PM

#

blissful pulsar I have a silky smooth voice, and today I will tell you about the exercise regime...

😋

#

cherrypicking...

dusky siren Jun 18, 2023, 5:49 PM

#

jolly fog thanks! 😊 it is fun to make music but days later I can't enjoy it that much. ...

Nice. This is helpful. Going to try it out myself!

dusky siren Jun 18, 2023, 5:56 PM

#

jolly fog 😋

Haha nice!

#

@jolly fog would you mind if I DM you with some quick questions?

I've built an app that I'm not ready to make public yet, so want to keep it private.

jolly fog Jun 18, 2023, 6:15 PM

#

using local version of sadtalker model here , much better result

jolly fog Jun 18, 2023, 8:52 PM

#

dusky siren @jolly fog would you mind if I DM you with some quick questions? I'...

yeah ok! I'm curious what app let me know!

limber onyx Jun 19, 2023, 2:25 PM

#

@jolly fog thx for the link to Sadtalker
So much fun 🤣

blissful pulsar Jun 19, 2023, 3:04 PM

#

limber onyx @jolly fog thx for the link to Sadtalker So much fun 🤣

Is it the opentalker one?

limber onyx Jun 19, 2023, 3:17 PM

#

blissful pulsar Is it the opentalker one?

whatever is available through auto1111: https://github.com/OpenTalker/SadTalker/blob/main/docs/webui_extension.md

blissful pulsar Jun 19, 2023, 3:23 PM

#

limber onyx whatever is available through auto1111: https://github.com/OpenTalker/SadTalker/...

Thanks

limber onyx Jun 19, 2023, 4:03 PM

#

jolly fog using local version of sadtalker model here , much better result

what did you use here?

jolly fog Jun 19, 2023, 4:10 PM

#

limber onyx what did you use here?

Hi! First one was with online demo of sadtalker, second one is with local sadtalker as extension of stable diffusion. That new one is using an updated model they just released, much more animated. But my gpu is 11gb and very slow at doing this.

limber onyx Jun 19, 2023, 4:23 PM

#

jolly fog Hi! First one was with online demo of sadtalker, second one is with local sadta...

did you enable eye blinking in the extension? (the UI does not have an option for that yet)
I was wondering if I can just modify the source files in the auto1111 webui extension.

jolly fog Jun 19, 2023, 4:39 PM

#

limber onyx did you enable eye blinking in the extension? (the UI does not have an option fo...

no, but it is blinking , you are right if you notice it is animating pretty good, just a lucky render I think maybe from the pose style selection? I forgot what number I picked

jolly fog Jun 19, 2023, 4:52 PM

#

limber onyx @jolly fog thx for the link to Sadtalker So much fun 🤣

or maybe your image is not realistic enough for recognizing the eyes, my other examples also have some eye movement

jolly fog Jun 19, 2023, 5:23 PM

#

limber onyx did you enable eye blinking in the extension? (the UI does not have an option fo...

I see there is a dev version you can activate blinking, so even if mine is blinking already, maybe I didnt understand what you meant?
https://github.com/OpenTalker/SadTalker/discussions/386

GitHub

Many new features are launching in SDWEBUI extension and standalong...

Hi, everyone! Thanks for your patience with the bugs and long-time updates in SadTalker. We are releasing some new features in SadTalker for the WEBUI. You can try to install the dev version for th...

limber onyx Jun 19, 2023, 5:25 PM

#

jolly fog I see there is a dev version you can activate blinking, so even if mine is blink...

all good, thanks, I thought you might have used a reference video for eye blinking.

#

seems better with a different cutout and trying different pose-styles

jolly fog Jun 19, 2023, 8:05 PM

#

limber onyx seems better with a different cutout and trying different pose-styles

yeah nice! I think I heard anime style wont allow blinking eyes , but maybe the new coming update might change that.
how did you do the 50 seconds audio?

limber onyx Jun 19, 2023, 8:13 PM

#

I used my custom node express server: https://github.com/failfa-st/express-bark
with a dev version of hyv: https://github.com/failfa-st/hyv

It allows setting a batch size (3 seems to be the limit on a 4090) so that means 3 in parallel. the rest is a queue.

I can generate endless long mp3 files this way.
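A rough sketch of the batch-plus-queue idea described above, not the actual express-bark code. Everything here is an assumption for illustration: `split_into_chunks`, `batches` and `fake_generate` are hypothetical names, and the word limit stands in for whatever keeps one chunk under the per-generation length limit.

```python
import re


def split_into_chunks(script, max_words=30):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def batches(items, batch_size=3):
    """Yield consecutive groups of batch_size items; the last may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_generate(chunk):
    """Stand-in for a real TTS call; returns a label instead of audio."""
    return f"<audio:{len(chunk.split())} words>"


script = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = split_into_chunks(script, max_words=3)

clips = []
for group in batches(chunks, batch_size=3):
    # A real server would run the calls within one group concurrently;
    # the remaining groups simply wait their turn, like a queue.
    clips.extend(fake_generate(chunk) for chunk in group)
```

Concatenating `clips` in order gives one long output, which is how a short per-generation limit stops being a limit on total length.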

I'll soon build it into this project too (still has the 13s limit): https://github.com/failfa-st/bark-web-ui

blissful pulsar Jun 19, 2023, 8:23 PM

#

yo, there's like 5 webuis at this point can yall work together ? lmao

tardy topaz Jun 19, 2023, 8:24 PM

#

I just want to spew feature ideas and have them appear, I am not a good UI programmer at implementing. I got like 5% of the way through my Bark list

limber onyx Jun 19, 2023, 8:40 PM

#

blissful pulsar yo, there's like 5 webuis at this point can yall work together ? lmao

We build our UIs with Next.js instead of Gradio. It's a different approach in general, making AI stuff more accessible to JavaScript developers

fathom hatch Jun 19, 2023, 9:14 PM

#

Wheres the new sadtalker model?

#

I have bark and sadtalker set up on my discord bot so you can do it all thru discord

tardy topaz Jun 19, 2023, 9:33 PM

#

Who wants to PROMPT a TTS model. Just let them talk!

jolly fog Jun 19, 2023, 10:34 PM

#

fathom hatch I have bark and sadtalker set up on my discord bot so you can do it all thru dis...

I don't know if my comment made you look for that, but there is a tutorial video and an online demo, and I just noticed my version is much better animated than those old examples. But I may be confused about that, because their Changelog doesn't mention any new models except a 512 model.

tardy topaz Jun 20, 2023, 5:36 AM

#

I love how the Bark speakers trip over their words, and then try again. So eerily human.

quaint night Jun 20, 2023, 9:09 AM

#

limber onyx seems better with a different cutout and trying different pose-styles

how long does it take to process?

limber onyx Jun 20, 2023, 11:13 AM

#

Without face restoration it takes 2x the length of the audio (batch count 4) on a 4090 with an i9 13900KS and 128 GB of DDR5.
Face restoration takes 2-3 times as long.

pale falcon Jun 20, 2023, 12:08 PM

#

fickle otter Very interesting. Thanks for sharing. Prompt: An 80s driving pop song with heavy...

that sounds reminiscent of that band that did the backing to some Japanese animation, i can't remember its name, Ronin? I remember Sturgill Simpson Ronin

carmine ravine Jun 21, 2023, 12:36 AM

#

What / where is sad talker

#

Also aye🧿 randomly got one output that had music behind it … it’s amazing but can’t reproduce it … also: really really want to get these dudes to sing

#

Installed the latest one-click installer and it seems to 🐝 working consistently. Wondering if musicgen is included in there and how it is even invoked: in the prompt with a tag maybe??

quaint night Jun 21, 2023, 8:23 AM

#

Yes, I believe mylos audio webui and mine (tts-generation-webui) include bark alongside musicgen

#

Someone said that they have used them together but I don't know how.

limber onyx Jun 22, 2023, 10:42 AM

#

carmine ravine What / where is sad talker

this is sadtalker (see link below, it generates talking heads for audio). It has a web-ui and an auto1111 extension (both work very similarly, but the auto1111 extension makes it easier to integrate generated images)
I tried using sadtalker from the command line but there are some bugs
I tried using sadtalker through the API provided by gradio but there are several bugs
It works with any image + audio (they do not have to be generated)
I hope this helps.

https://github.com/OpenTalker/SadTalker

#

This one includes several tts ttm models: https://github.com/rsxdalv/tts-generation-webui

It is nice to compare (hopefully soon combine) them. I haven't had any issues so far.
It allows voice cloning, suno.npz improvement (over vocos), bark, tortoise and MusicGen.

I wanna say: Amazeballz

GitHub

GitHub - rsxdalv/tts-generation-webui: TTS Generation Web UI (Bark,...

TTS Generation Web UI (Bark, MusicGen, Tortoise). Contribute to rsxdalv/tts-generation-webui development by creating an account on GitHub.

frail python Jun 22, 2023, 1:47 PM

#

Was playing around and stitched a few generations together to create a guided meditation (script by gpt-3.5-turbo). I tried using 2 different English voices, and tried one in Portuguese (I don't speak Portuguese. I was just seeing if it would work and it seems to do alright!)

It's interesting to me that the speakers sometimes deviate from the original script, or make up words/sentences (sometimes they are semantically similar, which is even more interesting).

Another thing I noticed was the length of time of each meditation differs by a large margin, even when given (at least for the English speakers) the exact same script. Each speaker has their own style and such for reading and incorporating pauses.

teal gulch Jun 22, 2023, 5:31 PM

#

tardy topaz Jun 23, 2023, 12:22 AM

#

From D&D night

patent gorge Jun 23, 2023, 11:41 PM

#

Voice processing by Suno, followed up with SadTalker....

half yew Jun 27, 2023, 10:47 AM

#

tardy topaz Who wants to PROMPT a TTS model. Just let them talk!

Man... how realistic... Could you share the npz file for this ?

queen anchor Jun 27, 2023, 3:33 PM

#

taco taco song

tardy topaz Jun 28, 2023, 12:14 AM

#

half yew Man... how realistic... Could you share the npz file for this ?

I will soon, but the Obama voice specifically is one I'm holding back just a bit because I cloned it not using the cloner. Instead I cloned it 'by ear' using a truly absurd and ridiculous manual cloning process that I was very surprised and fascinated actually worked. I really want to express just how surprised I was when Obama actually emerged out of it.

So one day when I get time, I am going to do a fun YouTube video showing it off. It's horribly inefficient (it took 10+ hours of tweaking to make a really good voice clone by hand like this, though I could do a bad one in about 2 hours) but the concept is just really interesting. Basically it's just iterative Bark semantic prompts only -- literally no audio reference. And then some manual merging/blending/tweaking to keep it on the rails. And lots of me personally listening with my ears and adjusting. I used my ears to set up an iterative process, then slowly ran Bark over and over again, with lots of me just deciding that 'this voice has the right semantics, but it's missing this aspect from this voice, so if I made 5 new versions with a bit of each...' stuff like that.

Some of the manual tweaking is actually still useful for normal voice clones and I am porting it to my fork, eventually. Voice blending or model merging, ways to render a voice in increasing or decreasing intensity, or to soften it, to dial it in. Stuff like that. But it's hard to code it in a UI for Gradio, so not too much of it. Also with the voice cloner, probably just making a better normal voice clone is 99 times out of 100 better use of anyone's time.

#

Nobody sane should voice clone like that, but the fact that it is possible in Bark is remarkable.

#

All that said, I think there is likely some use of the techniques for 'dialing in' normal voice clones too. However all of my methods rely on my personal hearing and judgment of voices (typically I rank a bunch in a folder, as a first step) and that doesn't scale, so would have to be automated.

#

But also, maybe just fine-tune all of Bark on one voice, and never do any of that stuff...

#

The TLDR is that Bark lets you do whatever you want, almost. Just open up an .npz file and cut and paste, remove every third token, mash two voices together, insert weird patterns, ban all the tokens in one voice from another. Literally do anything, and usually it still sounds pretty natural! You do get a ton of 'broken' voices with weird stutters that get stuck halfway through reading your text, but Bark makes even the broken voices sound like a real person with a speech impediment!
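To make the "cut and paste" idea concrete, here is a small sketch using toy arrays instead of a real voice file. Assumptions: the voice is treated as a 1-D array of integer tokens stored under the key `semantic_prompt` (the name Bark's saved prompts use, as far as I know), and `drop_every_third` and `splice` are made-up helper names. A real .npz also holds coarse and fine arrays, which are not touched here.

```python
import io

import numpy as np


def drop_every_third(tokens):
    """Remove the tokens at positions 2, 5, 8, ... from a 1-D array."""
    keep = np.arange(tokens.shape[0]) % 3 != 2
    return tokens[keep]


def splice(voice_a, voice_b):
    """First half of voice_a followed by the second half of voice_b."""
    return np.concatenate([voice_a[: len(voice_a) // 2], voice_b[len(voice_b) // 2 :]])


# Toy stand-ins for two saved voices; a real one would come from np.load("some_voice.npz").
voice_a = np.arange(0, 9, dtype=np.int64)
voice_b = np.arange(100, 108, dtype=np.int64)

thinned = drop_every_third(voice_a)  # every third token removed
mashed = splice(voice_a, voice_b)    # two voices mashed together

# Round-trip through the .npz format in memory instead of touching disk.
buffer = io.BytesIO()
np.savez(buffer, semantic_prompt=mashed)
buffer.seek(0)
reloaded = np.load(buffer)["semantic_prompt"]
```

Whether an edited array still gives a usable voice is pure trial and error, which matches the "lots of broken voices" experience described above.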

#

Obama with speech problems or weird accents.

#

I mean listen to this. Poor Trevor Noah is given a horrible stutter. But in Bark, he KEEPS TRYING TO SPEAK THE TEXT. He doesn't give up! How cool is that right? It's more like a personality than a voice. The prompt is just the regular sentence, not repeats.

tardy topaz Jun 28, 2023, 1:17 AM

#

Bark not reading your text is the worst but also maybe actually the best thing about it.

dark token Jun 28, 2023, 1:38 AM

#

I had it spit out this randomly ignoring the text and it has me briefly question what I was doing lol

#

(bunch of silence at the start but it's worth it for the ending)

hollow trail Jun 28, 2023, 10:15 AM

#

Hi

turbid jasper Jun 28, 2023, 12:37 PM

#

MusicGen + bark + a bit of audacity and you can have a 24/7 news radio 😄

random slate Jun 28, 2023, 8:13 PM

#

@blissful pulsar edited it a bit LUL

carmine ravine Jun 29, 2023, 7:06 PM

#

i wanna know how i made that music happen

#

like in the background

random slate Jun 29, 2023, 9:29 PM

#

KEK

tardy topaz Jun 30, 2023, 12:26 AM

#

carmine ravine

Try

[music]
My words here
[music]

[music][music][music]
My words Here
[music][music][music]

It's super hit or miss, but if a voice DOES work, it often works again

#

So save your .npz outputs. And then try reusing the ones that work.

tardy topaz Jul 1, 2023, 3:34 AM

#

(Maybe worth a repost) Google trained SoundStorm on 100,000 hours of dialog in part so they could have two person conversation prompts. The same text prompts - character for character identical - very often just work in Bark right now, out of the box.

(Well... Bark loves to generate a dialog of a person talking to themselves but I think you can fairly consider even those samples a dialog, it's performed like a conversation.)

Something really funny happened to me this morning. | Oh wow, what? | Well, uh I woke up as usual. | Uhhuh | Went downstairs to have uh breakfast. | Yeah | Started eating. Then uh 10 minutes later I realized it was the middle of the night. | Oh no way, that's so funny!

fast ferry Jul 1, 2023, 10:33 AM

#

😄

fast ferry Jul 1, 2023, 7:38 PM

#

viral lynx Jul 2, 2023, 1:09 PM

#

mixing the full and small models helps reach fast generation speeds without the quality dropping much
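
A minimal sketch of what mixing sizes could look like, assuming Bark's `preload_models` takes a separate `*_use_small` flag per stage. Which stages to shrink is a guess here, since the message above does not say; the helper names are made up.

```python
def mixed_model_kwargs(small_text=False, small_coarse=True, small_fine=True):
    """Keyword arguments for bark.preload_models, choosing a size per stage."""
    return {
        "text_use_small": small_text,
        "coarse_use_small": small_coarse,
        "fine_use_small": small_fine,
    }

def load_mixed_models(**choices):
    # Imported here so the helper above can be used without Bark installed.
    from bark import preload_models
    preload_models(**mixed_model_kwargs(**choices))

print(mixed_model_kwargs())
```

Calling `load_mixed_models()` before generating would keep the full text model and use the small coarse and fine models; swap the flags around and compare speed and quality by ear.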

carmine ravine Jul 2, 2023, 6:55 PM

#

tardy topaz So save your .npz outputs. And then try reusing the ones that work.

I’ve really got to spend some time wrapping my head around npzs, and voice generation

#

Like it seems to me like it’s random every time ??

#

But somehow feel like that cannot 🐝

tardy topaz Jul 2, 2023, 8:58 PM

#

carmine ravine I’ve really got to spend some time wrapping my head around npzs, and voice gener...

It's a checkbox called 'save every NPZ' in my fork. Bottom right area. If you check that box, next to your output .wav or .mp4 files, you will always have new .npz files. These files are the voice of the audio file you just made. (If you used a random voice, it should save the .npz next to the .wav by default.)

Then later you want to make new audio that sounds like that cool .wav sample? So you go to the menu, something like "pick an npz file from your filesystem as the speaker" and you can pick one of those .npz files, the one with the same name as the .wav file. If you decide you like the voice a lot you can put the .npz file into a directory so it shows up in the list of choices, alongside en_speaker_03.npz etc.

For music and more unusual things there's a decent chance the .npz file doesn't reproduce the same effect. But for pure voices, it should.

limber bison Jul 3, 2023, 11:02 PM

#

https://www.youtube.com/channel/UCGimnCFFH_5AyDpGxd71kqw (suno for voiceovers 🙏)

YouTube

Comedy, Code, & Pixels

Welcome to Comedy, Code, & Pixels – where dark humor and artificial intelligence collide in a spectacle of pixel wizardry. Step into an unparalleled domain hosted by Zane, the enigmatic Goth, and Oliver, the sharp-witted Englishman. With their elusive charm and sarcastic repartee, they lure you into a world where codes dance to uncanny tunes, an...

runic whale Jul 4, 2023, 4:16 AM

#

(╯°□°)╯︵ ┻━┻

teal grove Jul 4, 2023, 9:25 AM

#

limber bison https://www.youtube.com/channel/UCGimnCFFH_5AyDpGxd71kqw (suno for voiceovers 🙏...

Goddamn! Thanks

#

https://www.youtube.com/watch?v=DIXYlTyqgSk

YouTube

Comedy, Code, & Pixels

[Inside The Matrix] Unmasking Banksy: The Enigmatic Street Artist

Join us in this captivating journey as we unravel the mysteries behind Banksy, the renowned street artist. From his humble beginnings in the underground scene of Bristol to his groundbreaking artistic style, we delve into the impact of his art on the public and the ongoing enigma surrounding his true identity. Discover how Banksy has made a mark...

▶ Play video

#

💀

#

Sounds like someone slowly reading from a script for the first time

#

Not even that

#

He sounds like he’s struggling to read

#

And has no energy!

#

At least the voiceover is consistent in this video

#

https://www.youtube.com/watch?v=_Qmsu--7as4

YouTube

Comedy, Code, & Pixels

[Inside The Matrix] Exploring the World of HAM Radio: A Journey int...

Join us as we embark on a captivating journey into the world of HAM radio, exploring its significance, origin, emergency communication role, licensing process, diverse frequencies and modes, global reach, sense of community, inclusivity, and technological advancements. Discover the fascinating world of HAM radio and its timeless appeal.

▶ Play video

#

BWAHAHAHAHAJAJAJSJDJDJDO

#

“Ladies and gentlemen. Gather, round, for another, riveting episode, of our illustrious channel”

#

FWUHUHUHUHHU

#

He sounds so slow

#

And bored

#

We can make more energetic & stable voices for this though?

#

I want every day in the show I make to sound like Looney Tunes

teal grove Jul 4, 2023, 10:02 AM

#

tardy topaz I will soon, but the Obama voice *specifically* is one I'm holding back just a b...

What’s your manual merging/blending/tweaking?

#

Code / commands?

teal grove Jul 4, 2023, 1:28 PM

#

oh damn

#

i found out

#

that files of the same prompt that take less time to produce have less background noise (i think)

#

a noisy file took 8 minutes

#

a good file took just over 1 min (1 min 8 seconds) + waiting time to write files

#

here they are

#

📎 bark_generation5.npz

#

with the npz

#

i assume the npz is the speaker file that i can select

#

gonna try doing that next

#

i can write code to automatically quit the process of voice generation if it takes too long so that i don't waste my day (in theory) perhaps
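
One hedged way to do that auto-quit idea, assuming the generation runs as its own script so it can be killed from outside. The link between generation time and noise is still unverified at this point, so the time limit is just a knob to experiment with.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed for taking too long."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None

# Stand-in for a slow generation: a child process that sleeps for 30 seconds.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(slow, timeout_s=1))
```

With a hypothetical `generate.py` that writes one file, the call would look like `run_with_timeout([sys.executable, "generate.py"], timeout_s=120)`, retried in a loop until it returns 0.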

#

i need to test this some more times to be sure

#

damn wrong place to post

#

*wrong channel

#

anyway the prompt was """ fine fools flower fight frame """

tardy topaz Jul 4, 2023, 1:44 PM

#

teal grove What’s your manual merging/blending/tweaking?

It's not automated like that. One reason I didn't automate it or build it into a command (yet) is because it's more or less me going "let's try using this many tokens of this, and then doing this.... nope that didn't work. let's try a little more..." instead of a fixed process.

teal grove Jul 4, 2023, 1:53 PM

#

tardy topaz It's not automated like that. One reason I didn't automate or build in to a comm...

why does this work but not this?

#

audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")

#

this works

#

audio_array = generate_audio(text_prompt, history_prompt="v2/bark_generation5")

#

this doesn't

#

and i put the npz in the same folder

tardy topaz Jul 4, 2023, 1:54 PM

#

the file paths can be a bit fiddly. try just using the full path, like the whole directory, including '.npz'

teal grove Jul 4, 2023, 1:54 PM

#

ok

tardy topaz Jul 4, 2023, 1:54 PM

#

i forget exactly what works, but it does work,

#

also check what your current python process thinks is the current directory. maybe it's not what you thought

teal grove Jul 4, 2023, 1:55 PM

#

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

#

ded

#

i just copied and pasted the path

tardy topaz Jul 4, 2023, 1:56 PM

#

hmmm what are you on, windows, google colab?

teal grove Jul 4, 2023, 1:56 PM

#

windows

tardy topaz Jul 4, 2023, 1:56 PM

#

there's not like a weird character in the name? or emoji?

teal grove Jul 4, 2023, 1:57 PM

#

no but for some reason the copied path uses slashes in the other direction

tardy topaz Jul 4, 2023, 1:57 PM

#

so worst case, if you just want to get it working

#

rename an existing suno voice

teal grove Jul 4, 2023, 1:57 PM

#

\ instead of /

tardy topaz Jul 4, 2023, 1:57 PM

#

with your voice

teal grove Jul 4, 2023, 1:59 PM

#

alright i replaced all the \ with /

#

it didn't give the same error

#

now i wait for it to generate

#

yay

#

ngl that was odd lmao

#

i'm guessing that's what the truncated \UXXXXXXXX escape means

#

maybe it's for security idk

tardy topaz Jul 4, 2023, 2:01 PM

#

filenames and paths and differences between windows, linux, whatever - it's a cause of TONS of headaches
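
For anyone hitting the same error: in a normal Python string literal, `\U` starts an 8-digit unicode escape, so a pasted Windows path beginning with `C:\Users` fails before the code even runs. It is a parsing rule, not a security feature. A small sketch of the usual workarounds, using a made-up path:

```python
import os
from pathlib import Path, PureWindowsPath

# Writing "C:\Users\me\..." as a plain string literal raises the
# 'unicodeescape' SyntaxError, because \U begins a \UXXXXXXXX escape.
raw = r"C:\Users\me\bark\bark_generation5.npz"   # raw string: backslashes kept as-is
fwd = "C:/Users/me/bark/bark_generation5.npz"    # forward slashes also work on Windows

# Both spellings name the same Windows path.
same = PureWindowsPath(raw) == PureWindowsPath(fwd)
print(same, PureWindowsPath(raw).as_posix())

# Where a relative name like "bark_generation5.npz" would actually be looked up:
print(os.getcwd())
print(Path("bark_generation5.npz").resolve())
```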

teal grove Jul 4, 2023, 2:22 PM

#

teal grove

GODDAMNNNNN

#

IT'S GOOD.

#

8 minutes and a few seconds for this 13 second audio file.

#

if this voice was randomly generated, does that mean it was never heard before? or is it selected from existing speakers?

#

(it's open source and therefore commercial use anyway right)

#

I can probably keep using this voice again and again, or try to change the emotions and then generate it repeatedly until i get it to speak the way i want it to, correct?

teal grove Jul 4, 2023, 2:27 PM

#

teal grove 8 minute and a few seconds for this 13 second audio file.

CPU only

tardy topaz Jul 4, 2023, 2:37 PM

#

teal grove if this voice was randomly generated, does that mean it was never heard before? ...

if you used no prompts, it's technically random. that said it's not impossible for a 'random' voice to be really close to an existing voice, the most common one I get is the twitch streamer robot voice. it's got to be maybe the most common voice on youtube or something

teal grove Jul 4, 2023, 2:37 PM

#

tardy topaz if you used no prompts, it's technically random. that said it's not impossible fo...

Lol

tardy topaz Jul 4, 2023, 2:38 PM

#

well, 'common', not that common. i got it 3 or 4 times, in 10s of thousands

#

but still

teal grove Jul 4, 2023, 2:38 PM

#

Ok

tardy topaz Jul 4, 2023, 2:38 PM

#

it really is THAT VOICE

teal grove Jul 4, 2023, 2:39 PM

#

I guess 1) if the person who sounds like it hears it, they can tell me to replace it 2) I can put post-processing on it using a DAW to make it sound higher or lower or change the formant to attempt to obscure it more

#

And lastly, it is fairly common for two people to sound SIMILAR (not exactly the same)

#

I think I’ll keep this one because I want it and it sounds excellent and no one said no (screw copyrighting fears)

limber bison Jul 5, 2023, 12:27 AM

#

teal grove Goddamn! Thanks

Thanks! I truly appreciate all the feedback. We're making improvements everyday. 👍

limber bison Jul 5, 2023, 12:28 AM

#

teal grove We can make more energetic & stable voices for this though?

Maybe possible in post, but I dont know of any way to tell the model that. 🤷‍♀️

teal grove Jul 5, 2023, 4:39 PM

#

I did this video today with those samples and hand animated vector art

#

PS who can guide me to an AI that can take the audio I’ve generated and make it sound like a clean recording?

#

I wish for this AI to be available offline, usable on the laptop (Windows 11 PC) I used to make these audio samples, and open source / commercial use.

#

I’ll research this myself as well, will post here if I find something good

fast girder Jul 5, 2023, 5:50 PM

#

teal grove PS who can guide me to an AI that can take the audio I’ve generated and make it ...

Check out https://github.com/facebookresearch/denoiser , have used this with some success

GitHub

GitHub - facebookresearch/denoiser: Real Time Speech Enhancement in...

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020) We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain. In which, we present a ca...

fast girder Jul 5, 2023, 5:54 PM

#

limber bison https://www.youtube.com/channel/UCGimnCFFH_5AyDpGxd71kqw (suno for voiceovers 🙏...

if you dont mind sharing, is this all scripted or do you do some work in a video editor?

teal grove Jul 5, 2023, 6:43 PM

#

fast girder Check out https://github.com/facebookresearch/denoiser , have used this with som...

Thanks 🙏

limber bison Jul 5, 2023, 11:45 PM

#

fast girder if you dont mind sharing, is this all scripted or do you do some work in a video...

Everything is scripted. The only human input is really the topic. Video is assembled using FFmpeg

limber bison Jul 5, 2023, 11:46 PM

#

fast girder Check out https://github.com/facebookresearch/denoiser , have used this with som...

I couldn't get this working due to some old dependency conflicts. I'll have to try in a more isolated environment.

limber bison Jul 5, 2023, 11:47 PM

#

teal grove PS who can guide me to an AI that can take the audio I’ve generated and make it ...

I got some improvements in audio using -filter:a "dynaudnorm" in FFmpeg
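
A small sketch of running that filter from a script, assuming `ffmpeg` is on the PATH; the function names and file names are placeholders. Note that `dynaudnorm` evens out loudness and does not remove noise, so it complements a denoiser rather than replacing one.

```python
import subprocess

def dynaudnorm_cmd(src, dst):
    """Build the ffmpeg command line that applies the dynaudnorm audio filter."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", "dynaudnorm", dst]

def normalize(src, dst):
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(dynaudnorm_cmd(src, dst), check=True)

print(" ".join(dynaudnorm_cmd("bark_out.wav", "bark_out_norm.wav")))
```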

carmine ravine Jul 6, 2023, 12:46 AM

#

What is going on here haha

#

jolly fog Jul 6, 2023, 3:21 AM

#

teal grove Jul 7, 2023, 9:20 AM

#

fast girder Check out https://github.com/facebookresearch/denoiser , have used this with som...

Is it possible to use this on specific files instead of the entire system?

fast girder Jul 7, 2023, 12:59 PM

#

teal grove Is it possible to use this on specific files instead of the entire system?

Should be able to do this programmatically within Python

teal grove Jul 7, 2023, 1:31 PM

#

fast girder Should be able to do this programmatically within Python

Yeah the whole thing but do you mean I have to rewrite the module

#

I’m looking for quick fixes although if I spend time I probably can

#

Like ideally there should be existing functions

#

I’ll check it out.

#

It looks like a lot of the audio reading functions are for getting the dataset file locations (audio.py & denoiser.py)

#

More time

#

https://facebookresearch.github.io/denoiser/

#

This won’t upgrade speech audio quality

#

It just denoises background audio

teal grove Jul 7, 2023, 1:37 PM

#

teal grove This won’t upgrade speech audio quality

This is what I was trying to find (an offline version)

teal grove Jul 7, 2023, 1:38 PM

#

teal grove

For the files used in this video (it’s okay but it could sound cleaner)

#

Needs an AI

fast girder Jul 7, 2023, 1:39 PM

#

^ this problem (called "Speech Super-resolution") AFAIK is still pretty difficult, not many good models out there

teal grove Jul 7, 2023, 1:39 PM

#

fast girder ^ this problem (called "Speech Super-resolution") AFAIK is still pretty difficul...

🙏 I’ll go try and find something and will post when I do

foggy forge Jul 7, 2023, 2:05 PM

#

Done finally, it's a poem I wrote a while ago and I wanted to hear it sung

limber bison Jul 7, 2023, 9:52 PM

#

Nice. I should experiment more with the singing..

tardy topaz Jul 8, 2023, 12:59 AM

#

foggy forge Done finally, it's a poem I wrote a while ago and I wanted to hear it sung

I wonder if you could get a longer term melody out of that voice if you made a bunch of variants, used in sequence. It's almost sort of got some coherence, way more than typical Bark singers.

#

I think Bark can maybe do music if it's singing only, or maybe singing + one single instrument that is used a bit sparingly. More than that and it seems to not quite hold it together.

wraith mesa Jul 8, 2023, 8:05 AM

#

I am using the BARK TTS WEB UI and it seems it is speaking whatever it is feeling like instead of my script!

foggy forge Jul 8, 2023, 9:43 AM

#

Book reading using the AI

nimble musk Jul 8, 2023, 3:00 PM

#

foggy forge Book reading using the AI

hey can you please share the prompt

foggy forge Jul 8, 2023, 3:13 PM

#

The source? It's part of the prologue of the book A Game of Thrones in the ASOIAF series

gusty ingot Jul 9, 2023, 11:45 PM

#

teal grove PS who can guide me to an AI that can take the audio I’ve generated and make it ...

Hey @teal grove
I found this https://github.com/Rikorose/DeepFilterNet seems to work really well

GitHub

GitHub - Rikorose/DeepFilterNet: Noise supression using deep filtering

Noise supression using deep filtering. Contribute to Rikorose/DeepFilterNet development by creating an account on GitHub.

surreal wagon Jul 9, 2023, 11:57 PM

#

#🐶┃bark-beta message

#

best audio ever lol

teal grove Jul 10, 2023, 12:30 PM

#

gusty ingot Hey @teal grove I found this https://github.com/Rikorose/DeepFilterNe...

Thanks

#

It’s just a denoiser or can it also filter out or correct the rough parts of audio?

#

Here’s the latest short vid I did

#

I like the music in this case, but it would be cool to use something that can separate the music and give a good result

teal grove Jul 10, 2023, 12:33 PM

#

gusty ingot Hey @teal grove I found this https://github.com/Rikorose/DeepFilterNe...

I’ll go ahead and test this out anyways

gusty ingot Jul 10, 2023, 1:04 PM

#

teal grove It’s just a denoiser or can it also filter out or correct the rough parts of aud...

not that I'm aware of

little ore Jul 10, 2023, 1:24 PM

#

FINALLY something decent

#

brilliant

tardy topaz Jul 10, 2023, 1:32 PM

#

Is that a random voice? Pretty unique vibe

#

I wonder if that voice will be more likely to hallucinate with normal text prompts... Lol

little ore Jul 10, 2023, 1:38 PM

#

No, that's a fine-tuned voice. I guess I don't need elevenlabs anymore... Now if only I had time to prepare a proper USLT vietnamese dataset, but ALL OF THE TVB MOVIES FROM THE 2000S HAVE BEEN REPLACED WITH THE NORTHERN ACCENT DUB!!!! *glares at the vietnamese community (HOW DID YOU GUYS LET THAT HAPPEN?!! AND WHY?!!!!! )

tardy topaz Jul 10, 2023, 1:44 PM

#

little ore No, that's a fine-tuned voice. I guess I don't need elevenlabs anymore... Now i...

Ahh, still super cool voice. But I was like, damn, if that came out of prompting a made up childhood rhyme and Bark just came up with that, that'd probably be the most impressive text prompted Bark voice I'd seen.

#

It might work with well known childhood rhymes... though getting many children out of bark random voices is like pulling teeth

little ore Jul 10, 2023, 1:46 PM

#

And no it hallucinates in general, but only at the end of the prompt and even setting min eos has no effect, but whatever, it's after the prompt so cropping it off is easy enough.

And oh yes, it's the best voice I have in any voice AI.

Anyways, childhood rhyme, eh? https://kingkiller.fandom.com/wiki/Lackless_poem

Kingkiller Chronicle Wiki

Lackless poem

Young boy in troupe of non-Ruh performers: "I know a poem about Lackless!" The Lackless poem is a poem with two versions. One is about Lady Lackless and the other is about a door related to the...

#

You should hear him in RVC, though 😱

#

Also wth there's a boy version of that poem??? I never saw that in the book?!!!

tardy topaz Jul 10, 2023, 1:48 PM

#

Bark is still the model that is choosing which syllables to stress, how to pronounce things. Unless you fine-tuned in just rhymes?

#

Actually I'll try that in base vanilla bark, curious

tardy topaz Jul 10, 2023, 1:50 PM

#

little ore You should hear him in RVC, though 😱

Do you mean it's even better?

little ore Jul 10, 2023, 1:51 PM

#

No, the variance is alright. I just fine tuned it on speaking. But the source speaker is very expressive, so it ends up being a premium dataset. There's little gems like this in the dataset:

#

And yes he is legendary in RVC

#

His voice is the reason I halted training all other voices to desperately probe the secrets to his success

tardy topaz Jul 10, 2023, 1:52 PM

#

Do you happen to remember if RVC stressed the childhood rhyme as well?

little ore Jul 10, 2023, 1:53 PM

#

No, RVC just flawlessly voice-converts source audio, be it singing, emotions (to some extent) and others:

#

That's inference from an anime character

tardy topaz Jul 10, 2023, 1:54 PM

#

Right, yeah, so I guess it's about whatever you happen to use for the TTS part, before Bark. If that had the same childhood rhyme speech pattern, as well as those samples. Just curious how unique Bark is.

little ore Jul 10, 2023, 1:55 PM

#

The dataset doesn't speak in rhymes, the voice is just very expressive

#

Bark kind of figured out the rhyme on its own and that actually took a while. The first few fine-tunes got the pacing wrong, and also if you put in too many verses, it gets the pacing wrong and ignores half the prompt. Too short and it sounds weird, so no typing it in verse by verse

#

(first few fine-tunes didn't even bother pronouncing the word "seven" or the s sound at the end of "things". )

tardy topaz Jul 10, 2023, 1:59 PM

#

For pacing you will eventually be able to control in inference, though I'm not 100% sure the same things I tested work in a fine tuned model, I guess.

#

I haven't tried it myself but the fine-tune is just a diff of the Bark weights, not the full 6 gigs for text_2.pt or whatnot, I think? So I guess you could even just use different versions of the voice

little ore Jul 10, 2023, 2:02 PM

#

Well here's the same inference on normal speech, so I think bark can read the context to some degree

tardy topaz Jul 10, 2023, 2:03 PM

#

The end of that is classic Bark weirdness. It's like the perfect weird horror movie sound model. So many times late at night I'm like WOAH.

little ore Jul 10, 2023, 2:03 PM

#

And this version has trouble speaking german...

tardy topaz Jul 10, 2023, 2:04 PM

#

Especially because it's usually just like that, after a big sound gap, so you get surprised lol

#

Actually Bark is not too bad, like sample 3, it is understanding the text as a rhyme and the voice

little ore Jul 10, 2023, 2:06 PM

#

Oh yeah little gems like this one which for native japanese speakers I'm sure must be gold

#

Yeah, I saw a lot of potential in bark back then despite the horrible voice quality. I don't usually go messing around with heavily beta stuff. MusicGEN for example is absolute garbage. But bark.. now bark is actually good.

tardy topaz Jul 10, 2023, 2:07 PM

#

You mostly get music, actually, because of the lack of punctuation. But the non-music is way more children than Bark typically gives, so the childhood rhyme is being somewhat understood

#

Genuinely seems like an especially good text prompt, tons of great unusual text voices. Compliments to Rothfuss, maybe.

little ore Jul 10, 2023, 2:13 PM

#

I actually wasn't expecting it to work that well. Elevenlabs kind of failed at the Alan Wake poems, the "For he did not know, that beyond the lake he called home, lies a deeper, darker ocean green, where waves are both wilder and more serene. To its ports I've been, to its ports... I've been!"

tardy topaz Jul 10, 2023, 2:15 PM

#

little ore I actually wasn't expecting it to work that well. Elevenlabs kind of failed at t...

Yeah I'm impressed, it does for sure generate many samples in the proper childhood rhyme speaking pattern and intonation, just out of the box, and using this fictional rhyme so it's just matching the concept of the text, not the rhyme against a known one.

#

Got a classic Bark 'local TV news broadcaster doing a promo for upcoming news segment' voice. Even in this rhyme, can't dodge the news voice lol.

little ore Jul 10, 2023, 2:18 PM

#

HAHAHAHAHAH OH GOD THATS PERFECT

#

Limit at seven verses.. mkay...

tardy topaz Jul 10, 2023, 2:35 PM

#

little ore Limit at seven verses.. mkay...

I don't have this in a public fork yet, it's pretty fiddly, but you can penalize quieter tokens in the generation code and make Bark amusingly fast.

#

Bark is just so good, I think that sample even sounds like the speaker is out of breath!

#

God damn

fast girder Jul 10, 2023, 2:39 PM

#

that quick breath in the middle is awesome

tardy topaz Jul 10, 2023, 2:39 PM

#

fast girder that quick breath in the middle is awesome

I know right? It's like truly modeling a person trying really hard to speak fast!

little ore Jul 10, 2023, 2:43 PM

#

Woah and that sounds like rapping/rap battles. Hmmm...

tardy topaz Jul 10, 2023, 2:44 PM

#

Yeah I think there's a ton of potential that doesn't need fine-tuning or loras, anything, just nudging the sampler a bit and Bark is so good it usually makes things sound good.

rigid idol Jul 10, 2023, 2:45 PM

#

Hello

#

#🐶┃bark-beta Hello

#

#🐣┃suno-showcase Hello

little ore Jul 10, 2023, 2:46 PM

#

Well I REALLY hope finetuning is sufficient to add another language, or a really hacky solution is to remap chinese characters to Hán Việt (https://en.wikipedia.org/wiki/Sino-Vietnamese_vocabulary)
But that would be REALLY inconvenient

Sino-Vietnamese vocabulary

Sino-Vietnamese vocabulary (Vietnamese: từ Hán Việt, Chữ Hán: 詞漢越, literally 'Chinese-Vietnamese words') is a layer of about 3,000 monosyllabic morphemes of the Vietnamese language borrowed from Literary Chinese with consistent pronunciations based on "Annamese" Middle Chinese. Compounds using these morphemes are used extensively in cultural and...

keen lotus Jul 11, 2023, 4:35 AM

#

little ore No, the variance is alright. I just fine tuned it on speaking. But the source sp...

Are you channeling Young Sheldon Cooper?

pure brook Jul 11, 2023, 11:49 AM

#

created a script to take long monologues and export them. Just need to add multi threading now, and fix some of the parsing

#

pure brook Jul 11, 2023, 12:30 PM

#

I didn't realize there was already a tutorial on this. I made it in java.

fast ferry Jul 11, 2023, 4:38 PM

#

Listening to No-game No-life light novels. I think with RVC and bark it would be really cool audiobook

fast girder Jul 11, 2023, 5:00 PM

#

pretty nice narrator and slightly longer generation

light iris Jul 11, 2023, 5:02 PM

#

ooh I like that

quaint night Jul 11, 2023, 5:14 PM

#

little ore Oh yeah little gems like this one which for native japanese speakers I'm sure mu...

Japanese bark is all over the place

fast girder Jul 11, 2023, 5:15 PM

#

yeah, we really need to find some good presets there - I think it can be good with good presets but has huge variation

quaint night Jul 11, 2023, 5:15 PM

#

Presets for narration or Japanese?

fast girder Jul 11, 2023, 5:16 PM

#

Japanese

#

(well - both)

quaint night Jul 11, 2023, 5:16 PM

#

Can I ask for a quick test, can the bigger models properly spell this: 中華物　＝chuukabutsu

#

I tried several variations and it always chose the wrong spelling/reading (chuukamono)

#

by the way, phonetically I've heard that it's able to produce good output, but the phonemes chosen are not always correct

fast girder Jul 11, 2023, 5:18 PM

#

interesting! do you have it in a longer prompt

#

i see

#

that makes a lot of sense..

quaint night Jul 11, 2023, 5:18 PM

#

#🐶┃bark-beta message

#

here's a test I tried on the bot

fast girder Jul 11, 2023, 5:20 PM

#

interestingly enough Google Translate chooses the other reading as well

#

(I am out of my depth here)

quaint night Jul 11, 2023, 5:20 PM

#

if it were English, it's like using a Germanic pronunciation for a Latin origin word?

#

sometimes Mono is correct, sometimes butsu

fast girder Jul 11, 2023, 5:21 PM

#

yeah very interesting!

#

first two seem to have gotten it (although bad qual)

quaint night Jul 11, 2023, 5:21 PM

#

aaaaaaaaaaa

#

sorry I'm just happy w

#

the first one seems to mimic one of the popular Japanese TTS voices, that's probably why the quality is like that

fast girder Jul 11, 2023, 5:23 PM

#

yeah 😦

#

we certainly have that problem in English too

quaint night Jul 11, 2023, 5:24 PM

#

can those models use the small model's npzs? I could give you a better one for history_prompt

fast girder Jul 11, 2023, 5:24 PM

#

ya

#

might take a little effort but doable

#

also if you have any slightly longer prompts (3 sentences or so) would love to try those

quaint night Jul 11, 2023, 5:26 PM

#

ok I'll find some to choose from

fast girder Jul 11, 2023, 5:31 PM

#

one more decent one as far as i can tell (still grainy)

quaint night Jul 11, 2023, 5:31 PM

#

Ooooh

#

it sounds like a news special from a tv report

#

This is just a randomly generated paragraph
地球の気候変動に関する新しい研究が発表されました。研究者たちは、再生可能エネルギーの利用が急速に増えていることにより、温室効果ガスの排出量が減少していることを明らかにしました。太陽光や風力などのクリーンなエネルギー源の利用が進んでいるため、地球温暖化の抑制に大いに寄与しています。これは素晴らしいニュースです！

#

📎 jp_special_voice.npz

#

Also, for some reason bark likes to generate many Japanese voices with foreign accents

fast girder Jul 11, 2023, 5:36 PM

#

yeah, we have the same problem with Chinese (and other languages too)

quaint night Jul 11, 2023, 5:37 PM

#

📎 jp_old_lady_narrator.npz

fast girder Jul 11, 2023, 5:38 PM

#

here's a random one - I think we have a little work to do to make npzs work so will report back

quaint night Jul 11, 2023, 5:39 PM

#

This one has clearer audio but it jumbles words:

📎 Sumire.npz

#

Hmm, what if I gave you a recipe for generating a good "seed" voice?

#

btw with google and chuukabutsu - it's funny because they write chuukamono but they generate chuukabutsu in the audio

fast girder Jul 11, 2023, 5:41 PM

#

i see!

fast girder Jul 11, 2023, 5:41 PM

#

quaint night btw with google and chuukabutsu - it's funny because they write chuukamono but t...

yeah could be worth a try

quaint night Jul 11, 2023, 5:43 PM

#

{
  "_version": "0.0.1",
  "_hash_version": "0.0.2",
  "_type": "bark",
  "is_big_semantic_model": true,
  "is_big_coarse_model": true,
  "is_big_fine_model": false,
  "prompt": "初めて会った日から 僕の心の全てを奪った どこか儚い空気を纏う君は 寂しい目をしてたんだ",
  "language": null,
  "speaker_id": null,
  "hash": "ba221be9420a7791e8dc6ec5f175ca12",
  "history_prompt": "None",
  "history_prompt_npz": null,
  "history_hash": "6adf97f83acf6453d4a6a4b1070f3754",
  "text_temp": 0.6,
  "waveform_temp": 0.8,
  "date": "2023-06-10_19-03-45",
  "seed": "332186546"
}

fast girder Jul 11, 2023, 5:43 PM

#

awesome

quaint night Jul 11, 2023, 5:43 PM

#

here's what the above json sounds like for reference

#

it has an unnecessary bgm but it's voice actor level of diction

#

another one from the same 'family'

fast girder Jul 11, 2023, 5:47 PM

#

will throw a few more random ones over, gonna need to write some code to try the other ones

#

(no idea if these are good)

#

quaint night Jul 11, 2023, 5:48 PM

#

fast girder will throw a few more random ones over, gonna need to write some code to try the...

it has good parts but sounds like a record that's stuck in other places

pure brook Jul 11, 2023, 5:48 PM

#

Oh man I had to

quaint night Jul 11, 2023, 5:50 PM

#

fast girder

they mash up words and leave them out, but there's very little noise - they could be piped into a following generation and the results might be good

fast girder Jul 11, 2023, 5:51 PM

#

quaint night they mash up words and leave them out, but there's very little noise - they coul...

interesting! OK we have some work to do it seems

quaint night Jul 11, 2023, 5:51 PM

#

sometimes stuffing words down bark's.. prompt causes it to snap back into reality

quaint night Jul 11, 2023, 5:52 PM

#

pure brook Oh man I had to

this could come in handy

📎 rap_god1.npz

copper night Jul 11, 2023, 5:53 PM

#

does anyone know if I can
add my own voice
and have the AI
generate audio with it?

pure brook Jul 11, 2023, 5:53 PM

#

quaint night this could come in handy

😱

#

I will try that out later. I spent way more time on this than I'd like to admit today haha

#

ehhh i have to try it. Ill do some reading to see how to add this in

full badger Jul 11, 2023, 7:43 PM

#

test

dire seal Jul 11, 2023, 8:19 PM

#

/bark

fast girder Jul 11, 2023, 8:20 PM

#

dire seal /bark

u can do in #🐶┃bark-beta

pure brook Jul 11, 2023, 10:44 PM

#

cedar ledge Jul 11, 2023, 11:56 PM

#

/bark

pure brook Jul 11, 2023, 11:58 PM

#

quaint night this could come in handy

so I tried this, but its not working. I renamed it, and changed generation.py to make the index higher

#

is it v1 or v2?

remote turret Jul 12, 2023, 12:03 AM

#

#🐶┃bark-beta

pure brook Jul 12, 2023, 12:33 AM

#

sonic robin Jul 12, 2023, 1:19 AM

#

hello

pure brook Jul 12, 2023, 1:26 AM

#

quaint night Jul 12, 2023, 5:16 AM

#

pure brook so I tried this, but its not working. I renamed it, and changed generation.py to...

It's a history prompt so it doesn't have a version, it's supposed to be loaded as a dictionary and passed as history_prompt parameter
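
A minimal sketch of that, assuming the shared .npz holds the three arrays Bark history prompts normally carry. `load_history_prompt` and `write_demo_npz` are made-up helpers, and the demo file is filled with zeros rather than a real voice.

```python
import os
import tempfile

import numpy as np

REQUIRED_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")

def load_history_prompt(npz_path):
    """Read a voice .npz into a plain dict, checking the expected arrays exist."""
    with np.load(npz_path) as data:
        missing = [key for key in REQUIRED_KEYS if key not in data.files]
        if missing:
            raise KeyError(f"{npz_path} is missing {missing}")
        return {key: data[key] for key in REQUIRED_KEYS}

def write_demo_npz(path):
    """Write a zero-filled file with the same layout, only for trying the loader."""
    np.savez(
        path,
        semantic_prompt=np.zeros(10, dtype=np.int64),
        coarse_prompt=np.zeros((2, 15), dtype=np.int64),
        fine_prompt=np.zeros((8, 15), dtype=np.int64),
    )

with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo_voice.npz")
    write_demo_npz(demo_path)
    history = load_history_prompt(demo_path)

print({key: value.shape for key, value in history.items()})
```

The resulting dict would then go in as `generate_audio(text_prompt, history_prompt=history)`. Whether a given Bark install accepts a dict there (rather than only a name or path) is worth checking against its version, so treat that part as an assumption. No renaming or edits to generation.py should be needed.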

quaint night Jul 12, 2023, 9:27 AM

#

Here's Bark + Demucs:
Original (Bark):

#

then, separating vocals using Demucs:

#

Running vocos @ 3.0kbps on the isolated voice:

chrome tapir Jul 12, 2023, 10:42 AM

#

yo yo

#

workin on a new chatgpt/suno/audiocraft project

#

dis gon b gud

fast girder Jul 12, 2023, 12:12 PM

#

quaint night + Running vocos @ 3.0kbps on the isolated voice:

this is awesome!

humble ruin Jul 12, 2023, 3:31 PM

#

Hi, how are you?
This is Fernando speaking. I'm the specialist in GOTA MAX treatment, and I'll be helping you with this request.

grizzled shard Jul 14, 2023, 2:12 PM

#

long generation with my Plugin.

teal grove Jul 14, 2023, 7:54 PM

#

any possibility for elevenlabs quality tts voice?

#

it's like these voices are passed through a filter

#

and they end up sounding like they have some slight (graininess/machinelike/synthlike) quality which is (at the moment) difficult to hide

#

ebon widget Jul 14, 2023, 7:59 PM

#

yeah we are working on that for the next version. clean history prompts defo help but ultimately it's limited by the codec

#

generally the variation is a feature since it can do arbitrary audio but it needs to be controllable to remove it for TTS use cases

teal grove Jul 14, 2023, 7:59 PM

#

how do these two audio files compare to your ear?

teal grove Jul 14, 2023, 8:02 PM

#

teal grove

SPOILERS do not read until you tell me your first impression ||(I don't care about the true quality in this instance, I'm fine with there being an illusion of quality)||

#

it can be harder to tell the difference if you hear the same thing over and over and get used to the sound (potentially)

ebon widget Jul 14, 2023, 8:03 PM

#

the cleaned one sounds a bit richer, but not necessarily less noise

teal grove Jul 14, 2023, 8:03 PM

#

1) richer how? & 2) by noise, do you mean machine-like?

#

you say it sounds a tad ... better? (i hope)

#

spoiler || it's an EQ + distortion to hopefully make the sound sample sound less tinny ||

#

|| it actually sounds awful with distortion only ||

#

actually nvm after a short break i can tell it's the same sample quality

#

eh well

teal grove Jul 14, 2023, 8:25 PM

#

i tried cleaning a file that sounds like this:

#

i removed the noise profile in the file ending with _2 from the file ending with _4 to produce the file ending with _3

#

in fl studio

#

tell me how good this denoiser is

#

i think it can work, i'd just need to isolate the worst sounding parts of the audio by hand and obsess over it for a while to hopefully get a better result

#

it's come out muffled because the noise profile was in the higher frequencies

#

(used Edison)

#

these last two, after applying pitch and formant effects, it's like there's background noise

#

and idk exactly what _7 is

#

lol

#

is suno bark just actually cutting up audio and stitching it together?

#

in creative ways

#

like sometimes it will say things that aren't in the prompt that i assume are from the training material

#

or are just made from some sort of manner of processing like the music is

#

part of this prompt is me trying to make the voice sound angry (history prompts)

teal grove Jul 14, 2023, 8:50 PM

#

teal grove part of this prompt is me trying to make the voice sound angry (history prompts)

^ the prompt i used to generate the speech

#

perhaps it is impossible to have a clean take because after using distortion what sounds totally clean actually contains faint noise that might say something else or have other noises/sounds, with the loudest sounds being closer to the prompt and the faintest being furthest (usually).

#

idk actually because sometimes it loudly says something else

teal grove Jul 14, 2023, 8:53 PM

#

teal grove perhaps it is impossible to have a clean take because after using distortion wha...

perhaps this is the reason why the prompts degrade with each iteration?

#

(or so I've heard)

teal grove Jul 14, 2023, 9:16 PM

#

I guess there isn’t an AI of any kind that is available to the public that sounds perfectly natural in speech.

hushed hull Jul 15, 2023, 7:37 AM

#

The Selfie Song - Made with Suno 😂 (with postfx)

grizzled shard Jul 15, 2023, 4:27 PM

#

teal grove any possibility for elevenlabs quality tts voice?

I used the Vocos vocoder in this example: #🐣┃suno-showcase message, with no manual editing afterwards.
It's added in my Whispering Tiger plugin. Others might have added Vocos to their projects as well, not sure which ones though.

quaint night Jul 15, 2023, 7:44 PM

#

tts-generation-webui has vocos from npz, so you can run it on past generations. Vocos can be applied on wav (incl. mp3 etc) as well
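A rough sketch of re-decoding a saved generation with Vocos, following the usage pattern shown in the Vocos README. It assumes the .npz holds Bark's `fine_prompt` array with 8 codebooks (shape `(8, T)`); the function name is hypothetical and this is not how tts-generation-webui itself is implemented.

```python
def decode_npz_with_vocos(npz_path):
    """Re-synthesise audio from the EnCodec tokens stored in a Bark .npz."""
    # Lazy imports: numpy, torch and vocos are only needed when this runs.
    import numpy as np
    import torch
    from vocos import Vocos

    vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")
    with np.load(npz_path) as data:
        # Cast to int64 because the codes are used as embedding indices.
        codes = torch.from_numpy(data["fine_prompt"].astype(np.int64))  # (8, T)

    features = vocos.codes_to_features(codes)
    # bandwidth_id indexes [1.5, 3, 6, 12] kbps; 8 codebooks correspond to 6 kbps.
    audio = vocos.decode(features, bandwidth_id=torch.tensor([2]))
    return audio  # waveform tensor at 24 kHz
```

The returned tensor can be written out with any audio library that accepts a 24 kHz waveform.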

teal grove Jul 15, 2023, 9:59 PM

#

grizzled shard long generation with my Plugin.

Amazing!

#

still slightly you-know-ish but yeah

teal grove Jul 15, 2023, 10:02 PM

#

grizzled shard I have used the Vocos vocoder in this example: https://discord.com/channels/1069...

where do i find & install this plugin

#

found this https://github.com/charactr-platform/vocos/blob/main/notebooks/Bark%2BVocos.ipynb

GitHub

vocos/notebooks/Bark+Vocos.ipynb at main · charactr-platform/vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis - charactr-platform/vocos

#

will try this next

teal grove Jul 15, 2023, 10:42 PM

#

how do i fix this

#

#

Oh my, I am a mug, my old scripts have the same error. Oh well.

#

But they still work though

#

Nvm I’ll try reinstalling / installing pysoundfile 🥱

teal grove Jul 15, 2023, 11:43 PM

#

HOLYYYYYY

#

DAAAMMMMMNNNN

#

teal grove Jul 15, 2023, 11:44 PM

#

grizzled shard I have used the Vocos vocoder in this example: https://discord.com/channels/1069...

THANK YOU FOR TELLING ME 🙏 🙏 🙏

#

(btw the text prompt was the test script from the link)

#

the audio quality in these samples is good enough to not cause annoying glitch sounds when being processed by the pitch shifter plugin in FL Studio 👌

#

(so far)

#

I'm gonna try accessing my old voice preset tomorrow and seeing how much better it sounds with vocos

chrome tapir Jul 16, 2023, 2:49 AM

#

teal grove the audio quality in these samples are good enough to not cause annoying glitch ...

messing with tempo produces great results too

#

like this for instance

tardy topaz Jul 16, 2023, 4:16 AM

#

Planets Planets. 🌍 🌐

chrome tapir Jul 16, 2023, 5:27 AM

#

whatup fly

#

hav u tried this https://huggingface.co/spaces/lj1995/vocal2guitar

Vocal2guitar - a Hugging Face Space by lj1995

#

seems cool

quaint night Jul 16, 2023, 7:30 AM

#

That's RVC

chrome tapir Jul 16, 2023, 9:34 AM

#

i havent tried RVC yet

#

i cant wait until we get stereo effects. imagine the panning effects AI will be able to do

teal grove Jul 16, 2023, 6:40 PM

#

chrome tapir

Is the title the prompt?

#

I’d want to make the sung parts without background music and then make my own tracks I’ll see how that goes

hazy whale Jul 16, 2023, 7:36 PM

#

using suno to add vocals to my tracks in ableton

teal grove Jul 16, 2023, 9:10 PM

#

hazy whale using suno to add vocals to my tracks in ableton

Neat 🔥

chrome tapir Jul 16, 2023, 11:31 PM

#

teal grove Is the title the prompt?

🎵Aha aha aha aha, doo doo doo doo, yeah yeah yeah yeah, ooo ooo ooo ooo.🎵 was the full prompt. if you put less you won't get a full 14 seconds

chrome tapir Jul 16, 2023, 11:32 PM

#

hazy whale using suno to add vocals to my tracks in ableton

sounds great. i was lookin for a vst that matches vocals to a beat but the best i can find is if you already have 2 closely matching waveforms

#

and obviously suno output isn't going to match an already existing waveform

hazy whale Jul 17, 2023, 1:15 AM

#

put a ton of effects on it to make it sound better generally, using OVox by Waves, or maybe Antares could tune the vocals

tardy topaz Jul 17, 2023, 2:38 AM

#

chrome tapir 🎵Aha aha aha aha, doo doo doo doo, yeah yeah yeah yeah, ooo ooo ooo ooo.🎵 was ...

There is an option to force maximum length no matter what, so you always get 14s, which is quite useful for non speaking prompts like this.

chrome tapir Jul 17, 2023, 2:56 AM

#

tardy topaz There is an option to force maximum length no matter what, so you always get 14s...

oh im still using the OG version

chrome tapir Jul 17, 2023, 3:01 AM

#

hazy whale put a ton of effects to make it sound better generally using ovox by waves or ma...

yeah sounds great. im just too lazy to import into a sampler or match up the vocals to the beat properly manually. hopefully AI voice can stay on beat soon

#

RVC seems cool

tardy topaz Jul 17, 2023, 3:07 AM

#

chrome tapir oh im still using the OG version

It's in the original, but in the lower level functions, set allow_early_stop to False right here: https://github.com/suno-ai/bark/blob/599fed040e52c89e0b3580e02e2684b2c9100701/bark/generation.py#L386

GitHub

bark/bark/generation.py at 599fed040e52c89e0b3580e02e2684b2c9100701...

🔊 Text-Prompted Generative Audio Model. Contribute to suno-ai/bark development by creating an account on GitHub.
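A minimal sketch of calling the lower-level functions directly with early stopping disabled. It assumes the installed Bark exposes `generate_text_semantic` (with an `allow_early_stop` argument) in `bark.generation` and `semantic_to_waveform` in `bark.api`; the wrapper name is hypothetical.

```python
def generate_without_early_stop(text, history_prompt=None):
    """Generate audio that always runs the semantic stage to its maximum length."""
    # Lazy imports: Bark and its model downloads are only needed when this is called.
    from bark.api import semantic_to_waveform
    from bark.generation import generate_text_semantic

    semantic_tokens = generate_text_semantic(
        text,
        history_prompt=history_prompt,
        allow_early_stop=False,  # do not stop at the end-of-speech token
    )
    return semantic_to_waveform(semantic_tokens, history_prompt=history_prompt)
```

This leaves the library source untouched, which may be easier to maintain than editing the default in generation.py.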

chrome tapir Jul 17, 2023, 3:19 AM

#

tardy topaz It's in the original, but in the lower level functions, set allow_early_stop to ...

thanks i am gonna try that

#

this npz stays on beat well

#

npz kinda like npc huh

chrome tapir Jul 17, 2023, 3:24 AM

#

chrome tapir thanks i am gonna try that

i set allow_early_stop=False,
but it's not working. is there something in bark_perform i need to change?

#

i wonder what semantic rate does

tardy topaz Jul 17, 2023, 3:29 AM

#

oh, if it's bark_perform, that's --semantic_allow_early_stop False

#

I didn't realize you were using my fork. if that doesn't work let me know, I'll fix that right now

tardy topaz Jul 17, 2023, 3:37 AM

#

chrome tapir i wonder what semantic rate does

semantic_rate is just a constant value: it fixes how much actual audio time each token represents. It's not something you can change.
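To make the token-to-time relationship concrete, here is the arithmetic as a small sketch. The two constants are assumed to match bark/generation.py (a rate of 49.9 semantic tokens per second and a cap of 768 tokens per generation); check your copy before relying on them.

```python
# Both values are assumptions taken from bark/generation.py; verify locally.
SEMANTIC_RATE_HZ = 49.9     # semantic tokens per second of audio
MAX_SEMANTIC_TOKENS = 768   # upper bound on tokens produced in one generation


def tokens_to_seconds(n_tokens, rate_hz=SEMANTIC_RATE_HZ):
    """Audio duration covered by n_tokens semantic tokens."""
    return n_tokens / rate_hz


print(round(tokens_to_seconds(1) * 1000, 1), "ms per token")
print(round(tokens_to_seconds(MAX_SEMANTIC_TOKENS), 1), "s at the token cap")
```

Under these assumptions a full-length generation comes out a little over 15 seconds, which is why changing the rate would not speed up speech: it only converts token counts into time.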

chrome tapir Jul 17, 2023, 3:39 AM

#

oh i thought maybe it would speed up speech

#

sweet now i get 15 seconds with only 2 words of prompt

#

will this be useful i guess we'll see

tardy topaz Jul 17, 2023, 3:42 AM

#

chrome tapir sweet now i get 15 seconds with only 2 words of prompt

For music you can also use a blank text prompt, with a previous music .npz file. However I'm not sure that works in bark_perform.py. I can check.

#

BTW, for generic music try stuff like [music][music][music] it seems silly but repeated tags can be good for that

#

Can you run the python bark_webui.py instead? That has a checkbox for 'blank text' prompts for sure

#

#

BTW that's also fun to use with voices... #🐣┃suno-showcase message

#

Even [music][music][music][music][music][music]

chrome tapir Jul 17, 2023, 3:50 AM

#

i am way behind on your fork atm. im still using the very first one but i hacked it up so i dont want to upgrade until i have to

tardy topaz Jul 17, 2023, 3:50 AM

#

chrome tapir i am way behind on your fork atm. im still using the very first one but i hacked...

respect

chrome tapir Jul 17, 2023, 3:51 AM

#

i am gettin some good results with just (dance beat)

#

ill try music too

tardy topaz Jul 17, 2023, 3:51 AM

#

In general you can try both () and []

#

and with some voices, one works, and the other doesn't!

#

Oh also stars. Try * dance beat * or * ominous music * etc

chrome tapir Jul 17, 2023, 3:53 AM

#

i seen someone mention there is a way to use lora's with bark is that just a npz file or different

tardy topaz Jul 17, 2023, 3:54 AM

#

Different. I still need to try it myself. For fine-tuning voice clones on the model.

chrome tapir Jul 17, 2023, 3:54 AM

#

oh yeah like obama heh

#

aw shit

tardy topaz Jul 17, 2023, 3:58 AM

#

Not allowing early stop keeps things... interesting

#

Breaking into random words is kind of a vibe honestly

chrome tapir Jul 17, 2023, 4:00 AM

#

everyone loves the wildcard

#

until it starts screaming in your ear at 0db clipping

tardy topaz Jul 17, 2023, 4:01 AM

#

I don't know what your defaults are for topp and topk, they might be None. Lately I've been using like topk 200, and 150 on coarse

#

topp 0.95 or even a bit higher
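For anyone unsure what these two knobs do, here is a generic, library-free sketch of top-k and top-p (nucleus) filtering over a list of token probabilities. It is an illustration of the technique, not Bark's own sampling code, and the function name is made up.

```python
def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Drop unlikely tokens, then renormalise what is left."""
    # Token indices ordered from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]            # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:        # smallest set whose mass reaches top_p
                break
        ranked = kept
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}


probs = [0.5, 0.3, 0.15, 0.05]
print(filter_top_k_top_p(probs, top_k=2))
print(filter_top_k_top_p(probs, top_p=0.9))
```

A higher top-k or a top-p closer to 1.0 keeps more candidate tokens, which gives more varied output.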

#

solid!

#

📎 dance_music_-23-0717-0001-34-SPK-random.mp3.npz

chrome tapir Jul 17, 2023, 4:02 AM

#

wow nice little guitar riff in there

#

sometimes npzs dont work u notice that

tardy topaz Jul 17, 2023, 4:03 AM

#

That might be the old code, not sure

chrome tapir Jul 17, 2023, 4:03 AM

#

i mean they work but sound completely different

tardy topaz Jul 17, 2023, 4:03 AM

#

oh that, yeah

#

that's a complicated issue. especially music is prone to failure

chrome tapir Jul 17, 2023, 4:04 AM

#

i went from 100% music to some guy talkin yeah

tardy topaz Jul 17, 2023, 4:04 AM

#

It's the end of the song, and then the radio DJ talking. Sounds pretty accurate! Not the words, but the tone of speech is great.

chrome tapir Jul 17, 2023, 4:05 AM

#

fastback at the residence great song

tardy topaz Jul 17, 2023, 4:05 AM

#

Bark names its own songs!

#

lol

#

( dance music ) works but ( Christmas music ) is a nightmare. Only thing I got that was close. I love how they all have a radio DJ outro though!

#

One downside to not allowing early stop. If your .npz is the radio outro, you won't be able to easily recover the music.

#

Since Bark will continue the end of the clip, which is a person talking. Though you can go in and change this with work, it's not a built-in feature. And picking a random point tends to work badly too. But if it's truly a one-of-a-kind .npz, send it to me, I'll fix it.

#

Uhhh... sometimes Bark feels like it's giving you a 14s audio clip ripped from the multiverse, just a glimpse of something somewhere happening, lol. Is that a crying baby in the background?!? Actually maybe a cat.

#

Oh there's a tiny tiny bit of really good music in here, but 90% of the clip is total madness.

chrome tapir Jul 17, 2023, 5:05 AM

#

tardy topaz Since Bark will continue the end of the clip, which is a person talking. Though ...

oh good to know thanks

chrome tapir Jul 17, 2023, 5:07 AM

#

tardy topaz Uhhh... sometimes Bark feels like it's giving you a 14s audio clip ripped from t...

yeah its like dialing a random number into bill & teds phone booth

#

considering how little we know of quantum mechanics i sometimes think AI reaches a threshold that allows something from the consciousness field to enter

tardy topaz Jul 17, 2023, 5:08 AM

#

I'm dialing in some tweaks...

chrome tapir Jul 17, 2023, 5:08 AM

#

oo thats nice

#

thats musicmusicmusic or u think the parameters are helping

tardy topaz Jul 17, 2023, 5:10 AM

#

Actually what's making it good is my buggy code has been sampling twice the whole time... !

#

(it's 30 seconds because I just bugged it trying stuff, it's just duping the audio)

#

SICK. Okay trying all kinds of sampling settings, everything CAN work.

tardy topaz Jul 17, 2023, 5:33 AM

#

Oh I actually have some other code enabled... doesn't work in base bark. gonna have to reverse engineer what I did to make music so good by accident actually.

tardy topaz Jul 17, 2023, 12:54 PM

#

The range inside Bark is wild.

north loom Jul 17, 2023, 1:12 PM

#

even scarier than the original! :p 😮

tardy topaz Jul 17, 2023, 1:19 PM

#

Was there anything beyond the words in the prompt? Oh I am sleepy today. I can just see the chat bot message. Yeah it's very long prompt, nice.

north loom Jul 17, 2023, 1:49 PM

#

tardy topaz Was there anything beyond the words in the prompt? Oh I am sleepy today. I can j...

copy pasted from 2001 movie.

tardy topaz Jul 17, 2023, 1:50 PM

#

north loom copy pasted from 2001 movie.

It's a great prompt, I just tried it a bunch. Bark does vaguely mimic the performance much of the time! Also the outputs are just really good quality.

north loom Jul 17, 2023, 3:21 PM

#

aaaand video: https://www.youtube.com/watch?v=Y4RrMDN_ZDU https://twitter.com/AIlvessuo/status/1680959951756316673

YouTube

MahaAnba

We were akin to plants, rooted in our past, struggling against the ...

Summary from:
https://twitter.com/AndrewCritchCA/status/1680461874171658242

Once, in a past now distant, humanity faced a formidable challenge: the unchecked acceleration of artificial intelligence. Viewed through the lens of AI, humans were but slow-moving, sentient flora, showing flickers of intelligence in their unhurried existence.

Imagin...


Antti Ilvessuo (@AIlvessuo)

@AndrewCritchCA As video. Slim change that @ESYudkowsky @elonmusk notice also but let's try. @AndrewCritchCA your summary was spot on.


tardy topaz Jul 17, 2023, 3:23 PM

#

Boltzmann brain. Greg Egan fan maybe?

north loom Jul 17, 2023, 3:49 PM

#

More like fan of all secret stuff 😉 https://www.eurogamer.net/trials-evolutions-insane-century-spanning-arg-scavenger-hunt-solved

Eurogamer.net

Trials Evolution's insane century-spanning ARG/scavenger hunt solved

Astute readers may remember that Trials HD contained loads of cryptic easter eggs and fourth wall-breaking riddles. Nat…

quaint night Jul 17, 2023, 9:01 PM

#

tardy topaz The range inside Bark is wild.

Jonathaaaan, how did you hack Bark again? 😄

quaint night Jul 17, 2023, 9:18 PM

#

quaint night Jul 17, 2023, 9:46 PM

#

tardy topaz Jul 17, 2023, 9:57 PM

#

It's nice, but it's just not a Bark voice to me without like, the sound of baby crying in the background and the speaker bumping the microphone halfway through. lol

quaint night Jul 17, 2023, 9:57 PM

#

lmao

tardy topaz Jul 17, 2023, 9:58 PM

#

pod3000 said "Bark feels like dialing a random number into bill & teds phone booth" truly perfectly put

quaint night Jul 17, 2023, 9:58 PM

#

haha

#

True, some of the best voices are tied to some bgm or inexplicable noise

tardy topaz Jul 17, 2023, 10:00 PM

#

More and more I'm surprised how much control you actually do have with the text prompt.

#

Some prompts are like, 90% very similar voices.

quaint night Jul 17, 2023, 10:03 PM

#

but is it control or specificity? like, can you actually tell it what you want it to do, or do you have the key for a specific output?

tardy topaz Jul 17, 2023, 10:04 PM

#

Yeah I don't mean control as in fine-grained control of performance. Just for 'summoning' the random voice initially. After that it's much hairier.

#

I find direct descriptions can work, but it's usually not the best way. Like saying, "I'm a chatbot" seems to work better than describing the chatbot, via other prompting methods.

quaint night Jul 17, 2023, 10:07 PM

#

I remember trying that with genders, it did not work for me lol. I think the audiobooks in the dataset form a "prompt resistance"

#

however I know that with trying to get a "cheerier" tone, it's useful to add a [laugh] etc

tardy topaz Jul 17, 2023, 10:09 PM

#

On a generic meditation like:

Listen to my soothing, relaxing voice. Breathe calmly in, and out. Slowly close your eyes. Continue to breathe at this slow pace. Feel the air expand your lungs with each in breath.

I was getting like 80+ percent women. Even adding "Bond. James Bond." only knocked it down to like 50.

quaint night Jul 17, 2023, 10:09 PM

#

Oh, yeah, there's a bias

#

I think it would be interesting to have a -audiobook or -female +male "bias controls"

tame cobalt Jul 17, 2023, 10:21 PM

#

Can we train based on our own voices?

quaint night Jul 17, 2023, 10:23 PM

#

tame cobalt Can we train based on our own voices?

There are voice cloning models, which try to make a "voice" that matches yours rather than training the base model

tame cobalt Jul 17, 2023, 10:24 PM

#

Do they work with bark?

#

I only know elevenlabs

quaint night Jul 17, 2023, 10:24 PM

#

Also tokens that silence noise really exist, huh, if only it wasn't such a pain to employ them

#

Yes, bark has voice cloning options

tame cobalt Jul 17, 2023, 10:27 PM

#

Okay, interesting. Thanks!

#

Man what a crazy time to be alive 😄

quaint night Jul 17, 2023, 10:56 PM

#

light bronze Jul 17, 2023, 10:56 PM

#

I watch a lot of “true” (fake) scary stories YouTube channels to fall asleep to at night and decided to try to make one myself completely with AI. The stories are written by chatgpt, images by dalle, and voices by bark. The stories are pretty corny but the voice would almost be passable if it weren’t for the hallucinations between text chunks I think https://youtube.com/watch?v=64iOfT6YI0E&feature=sharea

YouTube

Cameron Sima

True Scary Stories


#

All stitched together with python/ moviepy so basically 0 human intervention whatsoever lol

tardy topaz Jul 17, 2023, 11:15 PM

#

One thing Bark absolutely crushes at, largely but not entirely by accident, is horror audio. So much creepy distortion, fading voices, babbling, unrecognizable sounds. I have so many samples that could have been a clip in a horror movie.

It's a little hard to leverage it on purpose, but I've scared myself by playing a longer Bark sample at night. You hit a quiet spot where nothing happens for 4 seconds, you think the audio sample is done so you almost forget about it. Then you suddenly hear some unnatural voice screaming out of nowhere fading into static. And it's the end of your Bark audio sample, it just went off.

quaint night Jul 17, 2023, 11:29 PM

#

tardy topaz Jul 17, 2023, 11:32 PM

#

It's crazy consistent, right?

light bronze Jul 18, 2023, 1:02 AM

#

Have any npz prompts you could share oriented towards horror?

tardy topaz Jul 18, 2023, 1:10 AM

#

It's usually the result of a voice switch, where the .npz voice completely changes half way through the audio. Which is normally a bad npz you don't keep, so I haven't been really setting them aside. But I totally should have for the especially creepy ones, and will in future.

tardy topaz Jul 18, 2023, 1:27 AM

#

light bronze Have any npz prompts you could share oriented towards horror?

Actually ripping this idea from Suno's mc, meditation voices are dual use and basically kind of work as horror voices when you give them different prompts, check it

#

4 voices like that, + the clips with the npz, might be something usable. It's fairly creepy.

📎 meditation_turns_to_horror.zip

#

The whiny electric noise... works in horror

#

one more clean audio, though a bit less creepy. actually ends completely blowing the mood, lol

📎 meditation_one_more_clean_audio.zip

#

Probably worth trying some meditation prompts to find slow whispery voices, they are a close fit, and meditation prompts are very strong and consistent with random voices.

chrome tapir Jul 18, 2023, 5:41 AM

#

make anything good with audiocraft?

#

i got lucky a few times

#

suno is good for beats but audiocraft is better for melodies and non-percussion instruments

chrome tapir Jul 18, 2023, 6:10 AM

#

actually u can make some sick drums in audiocraft

quaint night Jul 18, 2023, 9:52 AM

#

tardy topaz one more clean audio, though a bit less creepy. actually ends completely blowing...

"jadefixed" lol

quaint night Jul 18, 2023, 12:49 PM

#

Do you want some milk tea?

#

No, I don't, it doesn't taste good

#

,,,,,,,,

light bronze Jul 18, 2023, 1:50 PM

#

Any tips on getting rid of the hallucinations? I followed the tips on the GitHub (eos setting or whatever it is)

quaint night Jul 18, 2023, 2:23 PM

#

Eos would only help with overextending

#

As for regular old hallucinations, they kind of just happen, but more often for some cases than others

#

Depending on voice, prompt etc

fast girder Jul 18, 2023, 2:25 PM

#

The main one we've seen is attempting not to understuff or overstuff the prompt

quaint night Jul 18, 2023, 2:30 PM

#

I've seen that a longer history (i.e. 2 sentences) doesn't like generating a short phrase (like a few words)

tardy topaz Jul 18, 2023, 6:03 PM

#

light bronze Any tips on getting rid of the hallucinations? I followed the tips on the GitHub...

EOS can help maybe for end-of-text hallucinations, but some general things:

Prompt is too long. Generally you can use a longer prompt than the speaker can finish without causing hallucinations, but it can go too far.
Prompt is too short. Bark really wants to generate at least 6-8 seconds, even if it has to add words.
Prompt and speaker style mismatch. In the extreme, a different language than the voice. Also: formal vs casual, accents, Old English versus modern slang, etc
Prompt with non-spoken text generally. (laughs) [screaming] MAN: WOMAN: and so on. Some voices work fine, others fall apart entirely.
Prompt and speaker lower level mismatch. Your prompt is an unnatural followup to the prompt in the original voice. Like if the original random voice prompt ends "I like eating pizza at" Then you prompt: "My name is Suno." as the next words. Bad fit.
A quirk of Bark voices and repetition. When a voice speaks the same (or very similar) prompt as the one that created the voice in the first place.
A thing some speakers have a high tendency to do, for mysterious reasons. There's just something in that voice that makes it more likely.
Random chance. Bark just decided to dial a random number into bill & teds phone booth and record 14s of audio, instead of reading your text. It happens.

#

For a lot of these, it just makes the chance of hallucinations somewhat more likely, and for the most part may work fine.
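The "too long" and "too short" cases above can be checked before generating. This is a rough sketch that groups whole sentences into chunks with an estimated spoken length inside a target window. The speaking rate and both thresholds are assumptions chosen for illustration (loosely based on the 6-8 second minimum and roughly 14 second clip length mentioned in this channel), and all names are hypothetical.

```python
import re

WORDS_PER_SECOND = 2.5   # assumed speaking rate, tune per voice
MIN_SECONDS = 6.0        # below this Bark may pad the clip with extra words
MAX_SECONDS = 13.0       # stay under the roughly 14 s clip length


def estimate_seconds(text):
    """Very rough spoken duration based on word count alone."""
    return len(text.split()) / WORDS_PER_SECOND


def chunk_script(script):
    """Group whole sentences into chunks that fit under MAX_SECONDS."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate) > MAX_SECONDS:
            chunks.append(current)   # close the chunk before it gets too long
            current = sentence       # a single over-long sentence is kept whole
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def too_short(chunk):
    """Flag chunks that probably leave Bark room to improvise."""
    return estimate_seconds(chunk) < MIN_SECONDS
```

Chunks flagged by `too_short` could be merged with a neighbour or padded with more text. None of this removes the random-chance hallucinations, it only avoids the two length-related triggers.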

alpine swift Jul 19, 2023, 1:29 AM

#

Given that neither the voice nor the model has heard Sonata Arctica's "I Have a Right", that was quite sour HAhaa

tardy topaz Jul 19, 2023, 1:56 AM

#

alpine swift Without the voice nor model to have heard sonata arctica's "i have a right", tha...

Singing voices are tricky, nice one, regardless!

alpine swift Jul 19, 2023, 1:57 AM

#

tardy topaz Singing voices are tricky, nice one, regardless!

Aye! Thanks :P And wish me luck getting this one to sound not bad kek

tardy topaz Jul 19, 2023, 1:58 AM

#

Haha, good luck. You might have a lot of ear piercing sounds in your future. I have made some progress with music in Bark recently, but it's much harder than regular voices.

alpine swift Jul 19, 2023, 1:58 AM

#

Currently using bark infinity. But don't know where i can get better models :P or even how to train a quite high quality to be NPZ as it turned out to sound the same as the sample, even if the long mp3 is quite damn quality :P

tardy topaz Jul 19, 2023, 1:59 AM

#

My guess is a simpler prompt might work better, maybe:
[intense music][intense music]My lyrics [intense music] [intense music]

#

no note symbols maybe too

alpine swift Jul 19, 2023, 1:59 AM

#

tardy topaz Haha, good luck. You mght have a lot of ear piercing sounds in your future. I ha...

Aye. Wanted to use seeds to test different ones manually or "that one i like, let me reuse that seed". No custom stuff worked, just errors out, and can't reuse seeds eugh

tardy topaz Jul 19, 2023, 1:59 AM

#

Change 'intense' to something else

#

That was just a sample. But that works somewhat better with just [music] for example

#

maybe [power ballad] I don't know. It's all undiscovered what works best

alpine swift Jul 19, 2023, 2:00 AM

#

tardy topaz My guess is a simpler prompt might work better, maybe: ```[intense music][intens...

Aye. so guessing all the prompts there are just guessing what works? And not a "these are the ones that will work"? :P Is there an "already discovered library" i can look through for emotions for voice, and types of music?

tardy topaz Jul 19, 2023, 2:01 AM

#

Music is mostly undiscovered. For a random voice, I think focus on the TEXT not the [brackets]

tardy topaz Jul 19, 2023, 2:02 AM

#

alpine swift Aye. so guessing all the prompts there are just guessing what works? And not a "...

The example I've been giving people is to try this prompt:

Listen to my soothing, relaxing voice. Breathe calmly in, and out. Slowly close your eyes. Continue to breathe at this slow pace. Feel the air expand your lungs with each in breath.

Notice how all the random voices are pretty similar!

#

Like 8 of 10 will be a slow super calm, sometimes whispering, female.

#

That's how influential the right text can be.

#

Also lately I've been using top-k 300 semantic, and top-k 150 or 200 coarse. I think it might be better.

alpine swift Jul 19, 2023, 2:03 AM

#

Indeed. Been looking for new/other trained stuff as well, or if bark is the only text to voice synthesizer :P

alpine swift Jul 19, 2023, 2:04 AM

#

tardy topaz Also lately I've been using top-k 300 semantic, and top-k 150 or 200 coarse. I t...

Ah, i can't use those, just errors out

tardy topaz Jul 19, 2023, 2:04 AM

#

Are you on AMD?

#

It should work unless you are on AMD.

#

I can make it work on AMD, but honestly it's a low priority. I might just wait for better AMD support a bit.

alpine swift Jul 19, 2023, 2:05 AM

#

Nvidia. 3090

tardy topaz Jul 19, 2023, 2:05 AM

#

If you get those errors on NVIDIA, then something is wrong.

#

Okay two things. One, are you using a seed? Turn it off if so. Set it to 0.

#

Two, well, I'm not sure but your Python AI setup might be a little screwy there.

alpine swift Jul 19, 2023, 2:06 AM

#

That's what i wanted. Why won't seed work? As i wanted a unified result and not first the female voice i trained, then into a random scottish lad a second after throughout the last 13 sec lol

#

Oh boi..

tardy topaz Jul 19, 2023, 2:06 AM

#

Okay, so the seed doesn't help you for that. The seed only means you get the SAME female and then the SAME scottish lad.

#

The seed should not cause an error with topk, that probably is a bug... but also you shouldn't use a seed.

#

The seed is a random number seed. So it means you get the same voice with the 100% exact same prompt. If you change anything, use the voice again, it does not help with consistency in that way.

alpine swift Jul 19, 2023, 2:08 AM

#

And this is why seed is important kek

tardy topaz Jul 19, 2023, 2:10 AM

#

The seed has its uses, but it's not really useful the way you want it to be. It makes the output reproducible. Like if you played an RPG and rolled a D20 10 times. If you used the same seed, all D20 rolls will be the same sequence of numbers.

#

However, it does not make all the D20 rolls similar, right?
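The dice analogy can be sketched with Python's standard `random` module, where a D20 is just a 20-sided die. Nothing here is Bark code; it only illustrates what a fixed seed does and does not guarantee.

```python
import random


def roll_d20(times, seed=None):
    """Roll a 20-sided die `times` times with an optional fixed seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 20) for _ in range(times)]


first = roll_d20(10, seed=1234)
second = roll_d20(10, seed=1234)
print(first == second)   # same seed gives the same sequence every time
print(first)             # the individual rolls are not forced to be similar
```

The same holds for a generation seed: it replays one exact result for one exact prompt and voice, but it does not pull different prompts toward the same voice.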

alpine swift Jul 19, 2023, 2:10 AM

#

No idea what a D20 is