#🐣┃suno-showcase


chrome tapir
#

haha

#

he nailed it

simple bison
#

prompt: "[laughter] [laughs] [sighs] [music] ♪ [gasps] [clears throat]"
Output: strange noise, and guy gets startled awake and attempts to answer the teachers language question.

chrome tapir
#

i think thats his way of saying i aint got the capacity for dat

simple bison
chrome tapir
#

guitar didnt work for me well

#

i have an idea

simple bison
#

prompt: "[slide whistle] [music] [clears throat][laughter] [music] [laughs] [sighs] ♪ [gasps] — [slide whistle]"
output: 🔨 🔨 🔨 🔨 🔨 🔨 🔨

chrome tapir
#

i cant stop thinking about the victim of the mosely

#

what was he trying to say

#

what the hell was that laff

#

haha

simple bison
lost anvil
simple bison
#

"its safe to waze your fuldore feelings"

chrome tapir
simple bison
chrome tapir
#

(beat with lyrics) YEAH, YEAH, YEAH, YEAH, YEAH, YEAH, YEAH, YEAH, hey, hey, you, you, I don't like your girlfriend, no way, no way, I think you need a new one

#

didnt make it all the way

#

got hung up on the yeahs understandably

#

(beat with lyrics) seems good

simple bison
chrome tapir
#

i made some crazy tongue twisters with chatgpt4

simple bison
#

i had to screenshot the prompt because it is blocked by discord, i suspect repeating the same words or all the brackets...
but then Jairo Correa posts audio of the rules...
and i googled "the mosley victim" and there was an event 10 days ago, which did involve some of the things said in the audio clips, lol

chrome tapir
#

wut

simple bison
#

a hearing, medical, etc...

chrome tapir
#

Flendiferous plibber-klorping slazzles zlungled slebbidly dlorbitant blurking klentacles, gleebulating qlibber-mlungulated vlivvers strandiferously, jlurching qlabberwabbled lipty-lpotch plibberations, whilst trondiforously clonking plaggled qlibberwinks, plargulating slibberwocked bligglets, and dlalumphing glabberdoodling rlizzwinks in plibber-sprangled tlazmires, vlurbing qlibber-splattered qlabber-splonks, flewting slabber-splorped plarfiggles splatteringly splangled, plibber-splurting rlizzulated slonktacles plibberwabbled, and slurbblingly qlibber-slurgled slibberwobble-slazzled plappledapples, plibber-plonking qlibber-splorped glabber-slurgles, plibber-plungulating plibber-slazzled dlentiferous glabberwocked spleebulations in plibber-sprangled plazmires.

#

someone try that with a long prompt

#

i cant get it in 14 seconds

simple bison
chrome tapir
#

at least the mic wasnt turned up to 15 like it is most the time

simple bison
simple bison
simple bison
chrome tapir
#

are your temps set to .7?

simple bison
chrome tapir
#

probably .7

simple bison
#

just text + choose voice

#

[speaking fast] djfklglskgjhoewirjg;eiorjhwiqspteiucpnmxitzkqiaistuaiupjaeprfdijtgpetrjatgemverbzqyfvtdczsabqcwbsnudtfvybyghinjmokopls,cxpwkcmjvneuthybvruthvniejkmdicwopkslmxokwsnxjqiysgfoqwenwiuetywejnwfgosduhisugrpqueyrweiuqypqiupgiqusdgkfhgkgfvkbzcmvnncvmzfdvbjdfhglakdjghqeriutyqeoiuryqpiuweyqipwetogqeageveurfcsrdexredzaecryegbxruahwexrihjaeirjewrokmokg,oky,oukympjn,uypj,jukmjuhbndyouhdvnityirtiyegfjkgdskm,fbadfjygarfhnjalsruifagerkjmnlgurbhgliadnriluyghsilhgiyeahioaiyreotiuyerptuiyewriouytrfganfjc,bihn.,ihbx,rwihihbxvhmwvbxwrmfchmcsfdmhrmcfousdfuosrncirxgxvzyqzvqfnxurcfgncgmxbmxerxnqncfruoegmrvqeqvqpetewimtvwregbgjgvkfdvmgks [speaking fast]

simple bison
chrome tapir
#

what did you try to send me in general i just noticed

#

you got busted for sending porn again

#

haha

simple bison
#

a link to the walmart music video

chrome tapir
#

oh ok

#

i forgot about that banger

#

ok u can delete it i got it

simple bison
#

hilarious song

chrome tapir
#

hopefully AI can make a whole video like that soon

simple bison
#

❤️ AI ❤️

#

[rap song] basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket basket [rap song]

#

i think when you make the prompt too big, it does whatever

#

(beat with lyrics) basket basket basket basket basket basket basket basket (beat with lyrics)
girl... it's a song, not a task.

simple bison
chrome tapir
#

hmm

#

its no we goin to walmart

violet narwhal
exotic spoke
simple bison
#

announcer

chrome tapir
novel sage
#

hello, is it too difficult for me to simply download this program from GitHub? Do I need to install pip first?

tardy topaz
#

I'm going to make an installer over the weekend, it's a bit annoying

blissful pulsar
#

create a venv first to keep things clean.

hybrid flame
ebon widget
#

haha, amazing clips 🙂 btw we set up a channel #🐶┃bark-technical to encourage better sharing of npz files. hopefully that helps with finding prompts that are fun and clone well into new clips like travolta, jane etc

gleaming vale
#

I thought I'd try to see how far we can push this... and I think I broke journalism. Here I've brought together a few AIs - GPT-3, Bark, etc. Now I can give any 1000 word document to my code, and it will, in a single click, spit out a video. Here's my test: https://www.youtube.com/watch?v=hyi1CgXbZCg - let me know what you think.

Transcript...

VOICE1: Welcome everyone to the show. Today we are discussing a very timely and important topic - synthetic media and the potential harms it can cause. As many of you may know, synthetic media is media that has been produced with the help of artificial intelligence, and it includes things like deepfakes and AI-generated text, vide...

▶ Play video
blissful pulsar
blissful pulsar
hazy whale
blissful pulsar
edgy mango
edgy mango
chrome tapir
#

yo yo sunoheadz

#

dance beats or regular beats that is the question

#

both

#

a lot of these have 3 second brilliant parts and then the other 8 are rough

chrome tapir
#

i cant blame him i put lose control in the prompt

chrome tapir
#

wow i actually made a file that needed to be turned UP

desert vine
desert vine
#

If anyone has had a hard day and needs some extra appreciation. 😄

gleaming vale
gleaming vale
tawny saddle
#

hello

edgy mango
desert vine
#

Will try to post it here when I get home. 😉👍

gleaming vale
desert vine
cyan pawn
night spoke
brave bramble
#

man :hi loli how are you today?
girl: hi Petir I'm good

#

man :hi loli how are you today?
girl: hi Petir I'm good

#

😭

night spoke
chrome tapir
#

if you ask bark to drop the bass it really drops the bass

#

lets see if history file works for that

green tartan
#

my favorite thing so far. it improvised a little ditty.

chrome tapir
plain skiff
#

made with chatGPT4:
In the depths of the ocean, creatures glow,
Bioluminescence, a light show,
A jellyfish whispers, "I bet you can't see,
I'm 95% water, just like tea!" [laughs]

Volcanoes erupt, spewing lava and ash,
Their molten rock flows, in a fiery flash,
A mountain yells, "I don't mean to boast,
But when I blow my top, I make the best toast!" [laughs]

The Earth is round, spinning with grace,
A giant blue marble, floating in space,
A cheeky astronaut once said with a grin,
"Gravity's the reason we don't just fly off into the wind!" [laughs]

Einstein was brilliant, his theories profound,
E=mc², a formula renowned,
He quipped with a smile, "It's all relative, you see,
The faster I go, the younger I'll be!" [laughs]

In this world of wonders, mysteries, and jokes,
Nature and humor, together they coax,
A laugh and a lesson, they bring us delight,
In this beautiful world, we take flight.

granite quiver
#

I tried to copy Scatman John's voice, not yet successful. The line "Scatman, fatman, black and white man, tell me about the color of your soul" from Scatman's World ended up a bit creepy in my opinion.

stuck stag
#

how do u run it with mps? (my mps is enabled, just wanna know how to make bark work with it)

edgy mango
#

-- extension bark_tts

toxic helm
quasi oyster
#

using count floyd to get short form and long form generations

long egret
long egret
#

it's that infamous book story that Butters wrote, in South Park. this is an example of getting longer than a 13 second output by concatenating the output arrays into a single waveform before exporting that and compressing into MP3.
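A minimal sketch of the concatenation idea described above, assuming each short generation arrives as a 1-D float array at 24 kHz (the sample rate Bark is assumed to use here). `join_clips` and the stand-in clips are hypothetical names for illustration, not part of Bark:

```python
import numpy as np

SAMPLE_RATE = 24_000  # assumption: Bark's output sample rate


def join_clips(clips, pause_seconds=0.25, sample_rate=SAMPLE_RATE):
    """Concatenate short clips into one waveform, with a short silence between them."""
    silence = np.zeros(int(pause_seconds * sample_rate), dtype=np.float32)
    pieces = []
    for clip in clips:
        pieces.append(np.asarray(clip, dtype=np.float32))
        pieces.append(silence)
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces[:-1])  # drop the trailing silence


# Stand-in clips; in practice each one would come from a separate generation call.
clip_a = np.ones(SAMPLE_RATE, dtype=np.float32) * 0.1      # 1 second
clip_b = np.ones(2 * SAMPLE_RATE, dtype=np.float32) * 0.2  # 2 seconds
long_waveform = join_clips([clip_a, clip_b])
```

The joined array can then be exported once and compressed to MP3 with whatever tool is at hand.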

obtuse sparrow
#

Roughly nine minutes of Bark voices being obsessed with mares: https://youtu.be/_tHnB4BpRRg

Over nine minutes of various expressing love and praise for mares using the latest broad text-to-audio AI; Bark. It's capable of all sorts of audio input from text including speech, singing, music, sound effects, instrument samples, screeching, strangeness, etc.

https://github.com/suno-ai/bark | https://huggingface.co/spaces/suno/bark

All im...

▶ Play video
long egret
#

britney not-spears - hit me baby, one more time as sung/read by suno-bark, concatenating on line with custom voice

long egret
quasi oyster
#

used oobabooga, stable vicuna, bark, sd, and sad talker to make this:

#

bark's able to read stories quite naturally

grim pumice
long egret
#

@quasi oyster it's better to just add the music on top using nonlinear editor 😄

dusty lantern
long egret
fathom wadi
#

mmkay lol

brittle cobalt
#

okay yeah the music functionality is not overly precise at the moment lmao

edgy mango
quasi oyster
# edgy mango I was hoping this would be possible can you clarify how you got the sad talker w...

was a bit of work to get to this final result, I didn't link sad talker with oobabooga, here's my workflow

  1. generate a story from ooba
  2. copy the generated story into bark and generate the audio for it
  3. take a front view of one of your sd character generations and upscale it to 1536x1536
  4. generate the sad talker character using the bark audio, I tried using the long 2 minute audio, but my pc ran out of ram, so I did it in sections
  5. put it all back together in a video editing software
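Step 4 above (working in sections when the full two-minute audio does not fit in RAM) could be sketched roughly like this. The function name and the 30-second section length are assumptions for illustration, and a fixed-length cut like this can land mid-word, so cutting at pauses by hand may still be preferable:

```python
import numpy as np


def split_into_sections(audio, sample_rate, section_seconds):
    """Return consecutive slices of `audio`, each at most `section_seconds` long."""
    step = int(section_seconds * sample_rate)
    if step <= 0:
        raise ValueError("section_seconds must be positive")
    return [audio[i:i + step] for i in range(0, len(audio), step)]


sample_rate = 24_000  # assumption: Bark's output sample rate
two_minutes = np.zeros(120 * sample_rate, dtype=np.float32)  # stand-in audio
sections = split_into_sections(two_minutes, sample_rate, section_seconds=30)
```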
next girder
#

interesting, I put in "MAN: [laughs] How about we [beep] this place up!" and it said an actual swear instead of [beep]! Interesting... usually beeps actually work

#

same prompt, for the same speaker ("v2/en_speaker_1") hahaha these are so weird

#

this is an instance where the [beep] was generated correctly

next girder
quasi oyster
#

here's another raw infinite bark story result:

#

took it into adobe audio enhancer, but it changed up some words

#

@long egret how're you enhancing the audio?

long egret
#

the radio one isn't enhanced or edited at all. it's just straight out of Bark using announcer voice

long egret
#

StableLM telling a story about hot dogs for some reason.

granite quiver
icy shore
#

Story in the style of a redditor (by ChatGPT), using the long generation (advanced) method as explained by official Readme

night spoke
fast ferry
#

using smaller model btw

hearty forge
gaunt laurel
long egret
quasi oyster
#

updated to latest countFloyd commit

#

Im Puerto Rican and speak mostly spanglish, the spanish accent cracks me up

fallen rapids
long egret
#

the prompt was:

{man} So, I was thinking I could come over around 3?
{woman} And what would we do?
...
long egret
long egret
ebon widget
#

hehe awesome!

icy shore
tardy topaz
rigid sluice
chrome tapir
#

i might have to try blenderAI

quasi oyster
#

I have yet to figure out how to use the sunoai jupyter notebook locally to setup a conversation, but was able to go about it using count floyd pretty easily

fast ferry
tardy topaz
#

@proud yacht So my one-click installer went fine 4 days ago when I tested it, randomly doesn't work this week, lol. Might just be a conda update? AHAH

marble pond
upbeat hull
long egret
quasi oyster
#

it seems that at the end of sentences, after every period, the speaker always chooses to say either 'and' or 'umm'

ebon widget
#

haha, you can probably lower the threshold min_eos_p to help with that?
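A toy sketch of what a threshold like `min_eos_p` does, assuming it works the way it appears to in Bark's semantic generation: sampling stops either when the end-of-sequence token is sampled or as soon as its probability reaches the threshold. `should_stop` and the four-token distribution are made up for illustration; in Bark itself the value is, as far as I know, passed as `min_eos_p` to `generate_text_semantic`:

```python
import numpy as np


def should_stop(probs, sampled_token, eos_token, min_eos_p):
    """Stop if EOS was sampled, or if the EOS probability reaches the threshold."""
    if sampled_token == eos_token:
        return True
    return bool(min_eos_p is not None and probs[eos_token] >= min_eos_p)


# Toy distribution over 4 tokens; index 3 plays the role of end-of-sequence.
probs = np.array([0.50, 0.30, 0.12, 0.08])
EOS = 3

# With a higher threshold the speaker keeps going (and may add an "and" or "umm").
stops_at_higher = should_stop(probs, sampled_token=0, eos_token=EOS, min_eos_p=0.2)
# With a lower threshold the same step already counts as the end of the sentence.
stops_when_lowered = should_stop(probs, sampled_token=0, eos_token=EOS, min_eos_p=0.05)
```

Lowering the threshold makes the clip end earlier, which is why it can trim the trailing filler words.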

long egret
wild nacelle
torn fern
fossil valley
torn fern
slow bane
#

Hey everyone, welcome back to our channel! Today, we're going to explore sustainable living and resource management, and how small changes in our daily lives can make a big difference for our planet. If you're interested in making the world a greener place, then this video is for you! So, let's dive right in!

long egret
fast ferry
grand ether
#

{man} So, I was thinking I could come over around 3?
{woman} And what would we do?
{man} Let's just watch some anime memes [music] [songs]
{woman} [moan!]!

patent stone
#

Hahaha, hello, the weather is really nice today

severe musk
#

test for huangxiaofa

robust sedge
#

Anyone else exploring song generation?

bronze peak
#

Nope, don't know how

bronze peak
midnight sphinx
#

introduce yourself

floral kernel
floral kernel
long egret
#

use the long form generation notebook in the repo

floral kernel
#

I tried but it doesnt seem to generate a sound file ):

quaint night
tardy topaz
floral kernel
long egret
#

see JF's bark fork link there

quaint night
tardy topaz
quaint night
#

Thank you.
ngl gradio really has heavy limitations, I recommend doing research before investing too much time in a GUI with gradio

tardy topaz
#

I finally went back and changed my text fields to number fields, like somebody who didn't discover this a week ago

#

Yeah I know, I am dying here

#

Like for example, the dropdowns. I kept trying to make them show a different name to the user than the actual value. And apparently this isn't actually a feature. Like, I thought that was a fundamental definition of a dropdown lol

#

So you have to pass a function and process it, like what

long egret
quaint night
#

actually the amount of gradio pain is so large we could fill a channel with it, I'll create a thread in technical discussions

tardy topaz
#

It's fascinating how if you use existing song lyrics, you don't need notes. This was sample #2 on a no-notes test, and it tracks the original melody pretty well!

#

Also I don't think line by line formatting makes a difference, or marginal if it does, it's just the lack of periods I believe

#

That acts like a music note

blissful pulsar
tardy topaz
#

I'm probably gonna be tied up for a couple days and never put out anything new, but there is a dev branch if you crave something new. https://github.com/JonathanFly/bark/tree/dev has some cool user templating stuff, and the main functions should be fine unless I broke it right before I pushed.

GitHub

🚀 BARK INFINITY GUI CMD 🎶 Powered Up Bark Text-prompted Generative Audio Model - GitHub - JonathanFly/bark at dev

lavish arrow
#

@tardy topaz incredible stuff dude

#

the whoami speech is insane

gloomy helm
#

how?

quasi oyster
untold smelt
untold smelt
quasi oyster
untold smelt
#

so it cost 15 min to generate 2 min of audio? maybe you need to update your software too

deft junco
#

run on gpu

#

much faster

untold smelt
deft junco
#

how would you get audio without using the notebook? i run plain python on pc and got output in bool format, playable with vlc but not a plain windows format wav.

#

not sure why yet

untold smelt
#

just use ffmpeg to convert it to mp3 or something

deft junco
#

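import numpy as np
from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE
# Audio(np.concatenate(pieces), rate=SAMPLE_RATE)  # notebook-only playback
audio_array = np.concatenate(pieces)  # pieces: list of generated clips
write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)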

#

makes it , but bool form
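One possible explanation, offered as an assumption rather than a diagnosis: if the array is floating point, the WAV gets written in a float format that VLC and iTunes handle but some Windows players do not. A minimal sketch of converting to 16-bit PCM first, using only the standard-library `wave` module; `float_to_int16` and `write_pcm16_wav` are hypothetical helper names:

```python
import io
import wave

import numpy as np


def float_to_int16(audio):
    """Scale float audio in [-1.0, 1.0] to little-endian 16-bit signed PCM samples."""
    clipped = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    return (clipped * 32767).astype("<i2")


def write_pcm16_wav(target, sample_rate, audio):
    """Write mono 16-bit PCM WAV data to a file path or a binary file-like object."""
    pcm = float_to_int16(audio)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Stand-in for generated audio: one second of a 440 Hz tone at 24 kHz.
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(24_000) / 24_000).astype(np.float32)
buffer = io.BytesIO()
write_pcm16_wav(buffer, 24_000, tone)
```

Passing a filename such as `"bark_generation.wav"` instead of the in-memory buffer writes the file to disk. Handing an int16 array to scipy's `write` should have a similar effect, since it picks the WAV format from the array's dtype.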

untold smelt
#

as long as you get the audio, you can easily make changes to the audio file.

deft junco
#

i think it is the encoding part

#

what does the encoding?

#

i get audio

#

but my other projects get a windows playable format

#

just trying to figure it out now

untold smelt
#

😹 i'm new too. don't have an answer yet

deft junco
#

i'm new too

#

🙂

#

this is what i get but plays fine in vlc

#

not windows

#

download and try it

untold smelt
deft junco
#

no small models

#

small gpu

#

also plays in itunes

#

but not in my windows players lol

#

also got it to play from array only

#

like if you wanted a conversation with a chatbot

#

RTX 3060TI

#

Float values allow for more precise representations of the data, which can be important for maintaining fidelity during encoding and decoding. Additionally, some encoding algorithms may require float values as input to perform certain operations, such as normalization or feature extraction. Overall, using float values can help ensure that the encoded data is as accurate and representative of the original data as possible.

floral kernel
tardy topaz
#

Well I could link to the different readme, that'd be easy

floral kernel
tardy topaz
#

It might even be faster, for some reason (more free GPU memory?)

viral lynx
#

haven't been able to extract the source audio from the announcer yet though, as that one is in uint16 format, while the others are in int32 or int64 (although i did have to manually convert int64 since gradio didn't have it built in. (just do data / 4295229444))

#

nvm don't do that division lol

floral kernel
#

That quality is insane, have you trained that model for those voices?

viral lynx
tardy topaz
viral lynx
#

interesting

tardy topaz
#

I turned the mutation up too high, but you can generate more subtle variations. Just access the semantic prompt, and treat it like a new sample. In this version I also chop it up, so it's super diverged

#

I went a little overboard on the RNG for that one, but the more moderate one is really good if you have a weird noise or hum

#

it'll usually generate a variant without it

viral lynx
#

i'm mainly looking into how bark history prompts work so i can attempt to make a voice cloner that generates the semantic prompt from the actual audio file instead of just generating one with the same text and then praying that it works lol
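For anyone poking at history prompts the same way, a small sketch of inspecting what an `.npz` speaker file contains. It assumes the three array names used by Bark's speaker files (`semantic_prompt`, `coarse_prompt`, `fine_prompt`); the demo file, its shapes, and `describe_history_prompt` are made up for illustration:

```python
import os
import tempfile

import numpy as np


def describe_history_prompt(path):
    """Return {array_name: (dtype, shape)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: (str(data[name].dtype), data[name].shape) for name in data.files}


# Build a stand-in file; the key names follow Bark's speaker files (assumption),
# the shapes and values are arbitrary.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_speaker.npz")
np.savez(
    demo_path,
    semantic_prompt=np.zeros(50, dtype=np.int64),
    coarse_prompt=np.zeros((2, 75), dtype=np.int64),
    fine_prompt=np.zeros((8, 75), dtype=np.int64),
)
summary = describe_history_prompt(demo_path)
```

Pointing the function at a real speaker file shows which dtype each array uses, which is handy when different voices turn out to be stored differently.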

tardy topaz
#

You'd have to train a model

viral lynx
#

creating the training data will be easy but time consuming. and i'd probably use a markov chain to just quickly create a bunch of text for it lol

#

also the bark in my webui is like a monkeypatched frankenstein's monster from what it originally was lol

tardy topaz
#

mine too, it's such a mess i keep not integrating my actual new stuff

#

ugh

#

i've got some real smooth long clips now

viral lynx
#

funny to reuse semantics, you get a different voice. but same speech patterns (notice the "ssimilar" and the "like-")

small raven
#

asda

#

||haha||

sharp mural
#

Anyone know how to create a deep, rough voice like an old man?

viral lynx
boreal crystal
#

How do you use this again?

plain sleet
tardy topaz
#
  1. Save the best voice, use it for your actual text
quaint night
tardy topaz
#

I used this speech too much as a test sample to make sure nothing was bugged, starting to hear it in my dreams

quaint night
agile jungle
warm pond
tardy topaz
#

The very first music I tried was this silly Korean nonsense song and it's still one of the longest coherent clips, musically (also a bunch of others in the YouTube) https://www.youtube.com/watch?v=4pV9d25KqCE

A silly experiment with multi-lingual AI text, drawing, and music.
다국어 인공지능 텍스트, 음악, ChatGPT 그리기 및 노래로 바보 같은 실험.

If you're seeing this silly experiment in your YouTube feed, I apologize, I checked the box that says "don't publish this video" and I thought that's all I have to do. But I haven't made a video in three years I forget how this works...

▶ Play video
#

The second segment was one continuous run with full feedback: the last clip used as the full history for the next clip, no fancy merging or any tricks, but it somehow stayed coherent. The trick is just a single guitar I guess, not too complicated
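The feedback approach described above could be sketched roughly as follows. This is not the exact code used for that clip. `chain_generations` and `fake_generate` are hypothetical; with Bark, the `generate` callable might wrap `generate_audio(text, history_prompt=history, output_full=True)`, which as far as I understand returns the full generation alongside the audio:

```python
import numpy as np


def chain_generations(prompts, generate, first_history=None):
    """Generate clips in order, feeding each clip's history into the next call.

    `generate(text, history)` must return a `(new_history, audio_array)` pair.
    """
    history = first_history
    clips = []
    for text in prompts:
        history, audio = generate(text, history)
        clips.append(audio)
    return np.concatenate(clips), history


# Stand-in generator so the sketch runs without a model: the "history" is just
# the number of clips made so far, and the audio is filled with that number.
def fake_generate(text, history):
    step = 0 if history is None else history
    audio = np.full(len(text), float(step), dtype=np.float32)
    return step + 1, audio


waveform, final_history = chain_generations(["la la", "da da da"], fake_generate)
```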

#

I wish I had been saving exact generation parameters but at this time was totally trying random things

#

The only times I've gotten similar coherent Bark output is when I used a very well known song, and it literally outputs an approximation of the melody and chords. But this is the best so far with novel text

topaz dust
#

Hey Im new

fervent briar
weary adder
#

Just started experimenting with bark, hope more sounds can be added

tardy topaz
tardy topaz
acoustic umbra
#

Can anyone tell me if I can use bark for creating podcast audio and upload it to YouTube?

fervent briar
tardy topaz
#

One shotting full music seems tough, but you can gen good beats to build from

abstract basin
#

very nice

#

Punjab Caretaker Chief Minister Mohsin Naqvi has said that staging protests is the right of every political party but “when those political workers reach Cantt, they convert into terrorists”.

“The worker of a political party cannot attack Jinnah House (Lahore Corps Commander House), a terrorist has done it,” he said, adding that around 400 people had gone inside the building while 3,400 were outside.

“No matter what happens, we will not sit idle until each and every person involved in this is arrested.”

He said that there was “no doubt” that these protests were “pre-planned”.

turbid sparrow
#

Painful to listen to. With this voice it just sounds like a foreigner who speaks Chinese very well.

#

How do you train a model yourself?

languid canyon
#

I'd like to know too

#

How was this set up?

heady fractal
#

So, we did some benchmarking with Bark on an H100 and the results were very promising. Also, thanks @tardy topaz for the audio snippets. 😊

slim jacinth
#

Can you share the source code for processing in batches? My understanding is bark out of the box doesn't support batch inference. If you guys built this, it'd be awesome to take a look at how you did it!

heady fractal
#

We aren't doing batch inference, we're just batching up the requests in order.

What is batch inference in this case?

#

Is it just taking 10 sentences and running them at the same time or is it more advanced?

tardy topaz
#

Not a song just some recent favorite samples

rigid sluice
blissful pulsar
lofty flint
tardy topaz
#

"What if Trevor Noah was French?" Preview of future Bark Infinity fun features. Don't hammer me with questions yet, still sorting it out and will do writeup later. I am big time under water with real life work so not till weekend at earliest - I need to chill on Bark experiments, like seriously seriously. But experiments like this is why Bark Infinity hasn't been updated. The future of Bark is bright. We haven't seen anything yet. 👀⏳➡️🌟

pastel venture
#

I love you

spare cipher
#

anyone here cloned any famous person voice?

#

like trump / elon musk / biden / david attenborough etc?

tardy topaz
#

Tricky to thread the needle between 'speak with any accent you want' and 'speak with a random speech impediment'

pliant spruce
#

The Trump voice clones pause in the middle of speech but also don't stop talking while doing it because the only data they have for the clone is the real Trump - who pauses every few words, lol

dusk plover
#

funny

jolly patio
#

what is your name

pliant spruce
tardy topaz
#

I hear something like that a lot. One thing I always hear is sound like that instead of applause, like in a talk show or crowd. The crowd always sounds like static.

#

So many weird artifacts

pliant spruce
tardy topaz
#

I think it's tricky, whispering has a very specific like microphone tone

pliant spruce
#

Actually, I did get a few whispers

tardy topaz
#

I think it'll work, just be trickier. I managed to use sets of voices to influence other voices to have similar accents, but a whisper is probably a little more subtle an effect, so I bet you have to do a bit more work to tease it out

#

Even the accents more often than not just cause weird speech problems

#

I bet you just need a lot of good samples. Like if you had 1000 clear non whispering voices, and 100 whispers, you could probably sort of take the difference between them, and get an idea for what tokens to push

pliant spruce
#

I'm trying to find the right words for generating cuz there are certain things you can say to influence it, but 90% of the time it'll just make all the results bad

tardy topaz
#

It's probably not really worth it honestly, versus just randomly finding some cool voices that sound like they are whispering

pliant spruce
#

Like a specific sentence will get the results I want

tardy topaz
#

But like, that sort of general workflow, it might work for any style. Set of voices like X, versus set like Y, take the difference, use that as a nudge

#

that's the long term idea
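One very rough way to picture "take the difference, use that as a nudge", purely as a hypothetical sketch and not the method being developed here: compare how often each token appears in the two sets of voices, and add the difference to the model's logits as a bias. All names, the tiny vocabulary, and the token sequences below are invented for illustration:

```python
import numpy as np


def token_frequencies(token_sequences, vocab_size):
    """Relative frequency of each token id across a set of token sequences."""
    counts = np.zeros(vocab_size, dtype=np.float64)
    for seq in token_sequences:
        counts += np.bincount(np.asarray(seq, dtype=np.int64), minlength=vocab_size)
    total = counts.sum()
    return counts / total if total > 0 else counts


def style_nudge(target_set, baseline_set, vocab_size, strength=1.0):
    """Per-token bias: positive where the target set uses a token more often."""
    diff = token_frequencies(target_set, vocab_size) - token_frequencies(baseline_set, vocab_size)
    return strength * diff


def apply_nudge(logits, nudge):
    """Shift logits towards the tokens favoured by the target set."""
    return np.asarray(logits, dtype=np.float64) + nudge


# Made-up token sequences over a 5-token vocabulary.
whisper_voices = [[0, 1, 1, 4], [1, 4, 4]]
normal_voices = [[0, 2, 2, 3], [2, 3, 0]]
nudge = style_nudge(whisper_voices, normal_voices, vocab_size=5)
nudged_logits = apply_nudge(np.zeros(5), nudge)
```

If every sample in the target set says the same words, the difference also captures those words, so a set like this would want varied text.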

#

But right now it's like 1/3 almost passable french accents and 2/3 people who sound like their lips were tied together or get stuck on a syllable

pliant spruce
#

Half of the others got mixed up

tardy topaz
#

Interesting

#

That's a fun result

#

Like, the model is not reading your text. But really it's just actually being smart about what a real human would sound like

pliant spruce
#

Shy sheep show sheepish smiles. <- that was the prompt

#

One of the results paused to get "smiles" right

#

a brief pause

tardy topaz
#

Honestly that's a great little showcase for what makes Bark different. Bark will screw up if you give a tongue twister! That's why it's so cool.

#

If it wasn't way too late AM I might try myself...

#

I did try and do Math using the force to keep going flag, you know, can bark add two numbers together? But it wasn't that interesting so far

pliant spruce
#

Possibly

tardy topaz
#

If you prompt something like, "You want me to shut up? Ok I'll be quiet." (then some other sentence)

#

You might get like grumble or a whisper

#

Like imagine a TV show scene or something that would be half normal and then switch

#

I don't like using [whispers] and stuff, just feels like the overall quality is worse

#

Though if you DO get a good voice. that's probably one you can use that tag with

pliant spruce
#

I usually have more success with throwing [yawn] or [yawns] somewhere in the middle of a sentence, but if it doesn't work it ruins it

#

cause it makes the result sleepy

tardy topaz
#

Yeah I agree, if you HAVE to use a tag, put it in the middle, between two normal text blocks

pliant spruce
tardy topaz
#

Man that's hilarious

#

I'm actually gonna run like a million tongue twister samples, someday. Make a 10 hour youtube video

pliant spruce
tardy topaz
#

Have you tried like, ridiculously large and complicated words to pronounce? If Bark is good, the person will like pause, maybe think for a sec, and then struggle through it?

pliant spruce
#

Not yet.

tardy topaz
#

I haven't tried, but hopefully it's like that, that'd be cool

pliant spruce
#

I think I tried superscript or something and it ends up saying random numbers and letters instead

#

either that or small caps

tardy topaz
#

Hah. I wonder if there are actually some speakers that can like, perfectly do math equations. Like it must be in the training data, math classes on youtube or something

#

But not sure what the subtitles look like there

#

Probably too inaccurate

pliant spruce
tardy topaz
#

Maybe random can do it because that's the text that created them. And then an existing speaker might struggle?

#

Like if somebody chose to say that word on TV

#

it's probably like, not a problem for them

#

I'm not sure if it can get stuff like, "You want me to say what? Blah" and then it should have trouble, right?

#

That's probably a pretty common pattern

#

One thing I wanted to try is like, a prompt that is only used as setup. Like you say (I can't pronounce that!) and it gives that to the speaker, renders it, and then uses it in the next sample. But it's not part of your final clip.

#

So you just use it to try and change the audio style

pliant spruce
#

Haven't tried anything like that yet

tardy topaz
#

Not a high priority. I bet it works but just super randomly, so really, not that useful

tardy topaz
#

I'm really starting to genuinely dig the not-quite-singing but not-quite-talking way Bark renders a lot of songs. It's musical, but not sung.

heady fractal
#

@tardy topaz I booted up your UI, looks pretty sweet. It's got only the Suno default voices in it though :poi_think: are yours in there yet!

#

?*

pliant spruce
tardy topaz
viral stag
#

I'm really starting to genuinely dig the not-quite-singing but not-quite-talking way Bark renders a lot of songs. It's musical, but not sung

tardy topaz
#

Honest to God Bark out of the Bark is basically a perfect "Spoken Sung" model. As best exemplified in the classic William Shatner "Rocket Man" clip that I can't believe is such bad quality on YouTube that honest to god I might have a better copy on VHS somewhere. How was this not preserved OMG https://www.youtube.com/watch?v=lul-Y8vSr0I

From The Science Fiction Film Awards, William Shatner's unforgettable performance of Elton John's "Rocket Man".

Includes Karen Black's introduction of Bernie Taupin, and Taupin's introduction of Shatner.

Rock-It, Man... :-)

This aired on local Chicago TV on Friday, January 20th 1978.

About The Museum of Classic Chicago Television:

The Muse...

▶ Play video
#

We need the best science and the best AI to restore that clip. All the alternatives on YouTube seem terrible. This is critical.

#

Maybe Bark... I can clear up Bark generations with enough sweat and luck. Once you can reverse semantic like Mylo is working on, and might be days away. Then encode all the lyrics into Bark, regenerate as clear. Or maybe it just works perfect with a single Shatner model and you don't need to do each lyric. Because Bark is that good, seriously.

pliant spruce
#

@tardy topaz I have 29 good/okay results so far, do you want them :p

tardy topaz
#

It's okay if they don't match the text BTW!

#

as long as they make natural sounding audio outputs that sounds like a real person speaking

#

then it probably doesn't matter, the way I use them, which is just as a target reference point to nudge bark towards those tokens

pliant spruce
#

I haven't tested yet, I've just been generating them. All of them match the text. One of them is 11 best results out of 200, another is 15 out of 100.

tardy topaz
#

The important thing is, if you use the voice, does it sound like whispering? Or maybe it does but only if you prompt them correctly. The first is the best, the second is still useful, but keep in separate groups

#

Like whispering I mean

#

The first is really good though, since that's what we're going for, just make ANY speaker file whisper with no special prompt

#

BTW if you happen to get any really nice clear singers, send me!

#

I need more

tardy topaz
#

Oh can you try and vary your prompt? I know it's a pain

#

But I think one problem I am having is like, I used voices as reference. But all the voices were speaking THE SAME WORDS. So like if you 'more closely match a set of voices all speaking the same words' then that's kind of pushing it not just towards whispering, but towards those words.

#

I noticed if I more closely matched the French samples, the output was worse. But all the French samples were speaking the same sentence so this kind of makes sense

pliant spruce
#

ok

tardy topaz
#

If I tried to look at the samples and detect 'what does whispering look like' if they are all saying the same words, then that will also include the words "I'm whispering"

#

It's still useful though to have all the ones that are the same

#

But thought I'd mention, if you CAN do it, it's better

#

You can just use the voices to generate more voices!

#

If they still whisper, and it's high quality, then that's fine to get more diverse output

#

Even I could do that but it's just kind of a boring thing you gotta grind through

pliant spruce
#

I can still try

tardy topaz
#

All that said, you can send your current stuff. Maybe it just works!

#

dropbox or google share I guess?

#

Don't bother right now, tomorrow night earliest, and probably sat

pliant spruce
#

I was gonna see if I could use another method like you said

#

maybe include I'm whispering in brackets and see if it doesn't break anything

tardy topaz
#

If this does work I'll have to give a custom script, might be a while before I make this a Bark UI feature, not even sure how yet

#

The easiest way to improve that dataset, take each voice, and make like 100 unique prompts, give each one 2 or 3, save the best from the set. So ideally we have a set of all different words, and some voices can be there 2 or 3 times, is okay I think

#

Maybe take a book and use every sentence, try to cover all the possible basic sounds, is the idea

#

None of that might be needed, but if you're poking around anyway...

pliant spruce
#

Maybe you should add a batch generation feature to make that easier then, lol

#

So, generate 10 results with X and 20 results with Y text

#

if that's not already possible

tardy topaz
#

There is, on the dev branch

pliant spruce
#

ok ok

tardy topaz
#

I think, uhm, let me remember

#

Are you on the UI?
or the CLI?

pliant spruce
tardy topaz
#

But the web user interface?

pliant spruce
#

Says command line

tardy topaz
#

Like do you use a web browser?

pliant spruce
#

yup

tardy topaz
#

God, I got to stop making jokes

#

It's a WebUI, so it's a joke because I put console output in the Gradio app

#

but software you know, not really the best avenue for humor maybe.

pliant spruce
#

Mhm...

tardy topaz
#

Is there an option to give it a folder of voices?

#

Let me check

#

There should be a checkbox like, "don't join the text"

#

so then you can put in a long text

#

and split the text however

#

and it's all separate

pliant spruce
#

This ?

tardy topaz
#

That's not it

pliant spruce
#

ok

#

w/e don't worry I'll learn later

tardy topaz
#

hmn, i don't think it's in that version

pliant spruce
#

I doubt it

tardy topaz
#

In dev branch there is this

pliant spruce
#

Don't have that

tardy topaz
#

You can use that now

#

git checkout dev from command line

#

I didn't put the folder input of NPZ files in there alas

#

but that checkbox might help you

pliant spruce
tardy topaz
#

yeah

#

You might also find this useful:

#

So if you use the text again, it won't be split exactly the same, more diversity

#

It's like, whatever number you have, plus that value, or minus that value, randomly. So if you have 150 as your goal size you might get 140 or 160. Then if you use
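
A minimal sketch of that randomized split size, for anyone who wants to try it outside the UI. It assumes the goal size is measured in characters and that splitting happens on word boundaries; the actual unit and split rule in the dev branch are not stated here, and `split_with_jitter`, `goal_chars` and `jitter` are hypothetical names.

```python
import random


def split_with_jitter(text, goal_chars=150, jitter=10, rng=None):
    """Split text on word boundaries into chunks.

    Each chunk gets its own target size: goal_chars + jitter or
    goal_chars - jitter, picked at random, so reusing the same text
    gives different split points from run to run.
    """
    rng = rng or random.Random()
    chunks, current = [], []
    target = goal_chars + rng.choice([-jitter, jitter])
    for word in text.split():
        candidate_len = len(" ".join(current + [word]))
        if current and candidate_len > target:
            # Current chunk is full: close it and pick a fresh target.
            chunks.append(" ".join(current))
            current = [word]
            target = goal_chars + rng.choice([-jitter, jitter])
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Passing a seeded `random.Random` makes a particular split reproducible; leaving `rng` unset gives a different split each run.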

#

Honestly I'm worried dev is bugged, since obviously nobody is really testing it. But I will AT THE LEAST do a bug fix pass this weekend

pliant spruce
tardy topaz
#

At one point when you use the split prompts thing, it was not properly clearing the last voice. So all the samples would sound the same. I can't remember when I fixed that, so look out

tardy topaz
#

Is that a common expression? I noticed a lot of that with like popular song lyrics in Bark

#

You get a surprising amount of matching cadence

pliant spruce
#

Its a quote from a video game so I don't know.

tardy topaz
#

Is it like a fighting game line or something maybe? If so I bet it's in the YouTube training data like 1000 times, for each match, lol. Just over and over.

pliant spruce
#

His voice has music attached as background

#

Kinda like Bark

tardy topaz
#

Oh yeah

#

I bet you hear that when you click on something in League of Legends

#

So just imagine the incredible number of times that's got to be on YouTube

#

I bet there's like a 'click on character who says iconic line' style in Bark, that you could trigger

#

sort of like how you can trigger 'this is a commercial' style, if you've done that

tardy topaz
#

I love so much how if you just keep cranking up the weights on the French accent Bark actually almost REPHRASES YOUR TEXT PROMPT like a French person. Crazy that it somehow works. It usually doesn't, but even 1 in 10 or 20 is all you need to make a funny YouTube video or whatever.

#

Oh that's an old sample, hmn, still a bit of it but not the one I meant

torpid agate
#

How do you "crank up the weights"?

tardy topaz
#

Right now it's a mess of code and super hit or miss, but some day it will be a feature in my fork. Basically I look at a set of French versus English and try to up the chance of French tokens

#

So in the UI you will basically pick a 'target set of voices' and a 'reference set of voices'. The reference set is English voices, the target is French, and then it tries to find the main difference and increase the odds of those tokens in Bark, for your speaker. And honestly accents is the least interesting thing you might do with this format, you can think of many cool ways to use sets of voices like that. But accents seems like a simple case where I can check if the idea works.

#

Right now there's like 5 or 6 numbers that are super fiddly, like threshold numbers of 'how common should the token be in French', but probably there's a way to make it automatic based on some rules. But I am shelving this until I make some basic updates or I never will, lol

torpid agate
#

Oh I see, so there is some expected distribution of tokens in English vs. French, and you want to overexpress the French tokens? What's the interface for overexpressing tokens? Does it look like the SDWebUI? thing:1.2?

#

I guess... is this a prompt engineering thing you're doing or is it model surgery?

tardy topaz
#

Right now it's scaling the odds by the frequency in the target distribution, with a lot of cutoffs for outliers, and for using mostly 'French but not English' tokens, or an especially hard penalty for tokens common in the English set but not French. A ton of hardcoded values I picked out of a hat, no science at the moment, but a proof of concept.

#

But the method right now is just trying some values, didn't work, up them a bit, worse, okay lower them, okay that worked. etc

#

a big mess

torpid agate
#

I see. Does this work for all models in the public repo? It seems like it should if you're changing the frequency (are you basically adding silent "frenchy" tokens?)

tardy topaz
#

It works for the 3 models I happened to be testing with, so I kind of assumed it was general

#

None of them happened to be Suno, but I was using Suno voices in the target set as French and English examples.

#

I'm just multiplying the odds. Like this token is 4x more common in French than English. So in the model, at the last step, you just slap the multiplier in there. I thought I would have to also consider tokens in front or back too, the order of tokens should matter. But it actually kind of just works with a general multiplier. Or a negative penalty for English.

#

Oh you can compare like speaker voice against average English speaker, and give their tokens a bit of an exemption from the penalties

#

otherwise you can wipe out og voice
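
A rough sketch of the multiplier idea described above, assuming each voice is available as a plain NumPy array of integer token ids. Everything here is illustrative: the function names, the add-one smoothing, the `max_ratio` cutoff and the speaker exemption rule are assumptions, not the fork's actual code, which is described as a pile of hand-tuned thresholds.

```python
import numpy as np

# Assumed vocabulary size; a later message in this chat puts Bark at 10000 tokens.
VOCAB_SIZE = 10_000


def token_frequencies(token_arrays, vocab_size=VOCAB_SIZE, smoothing=1.0):
    """Relative frequency of each token id across a set of voices."""
    counts = np.full(vocab_size, smoothing, dtype=np.float64)
    for tokens in token_arrays:
        counts += np.bincount(tokens, minlength=vocab_size)
    return counts / counts.sum()


def accent_log_bias(target_sets, reference_sets, speaker_tokens=None,
                    max_ratio=4.0, vocab_size=VOCAB_SIZE):
    """Log of the (clipped) target/reference frequency ratio per token."""
    target = token_frequencies(target_sets, vocab_size)
    reference = token_frequencies(reference_sets, vocab_size)
    # Cut off outliers so a single rare token cannot dominate.
    ratio = np.clip(target / reference, 1.0 / max_ratio, max_ratio)
    bias = np.log(ratio)
    if speaker_tokens is not None:
        # Exempt the speaker's own tokens from penalties so the
        # original voice is not wiped out.
        used = np.unique(speaker_tokens)
        penalized = bias[used] < 0
        bias[used[penalized]] = 0.0
    return bias


def apply_bias(logits, bias, strength=1.0):
    """Adding log-ratios to logits multiplies the token odds."""
    return logits + strength * bias
```

Working in log space means "4x more common in French" becomes adding `log(4)` to that token's logit at the last step, and `strength` is the knob to crank.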

torpid agate
#

Can you show me how you over express tokens? I've seen this in the SDWebUI as well, but am not sure if it's custom or a package

tardy topaz
#

Oh it's not models, sorry, I mean prompts

torpid agate
tardy topaz
#

Right now it's in the Bark core code, just my custom code

#

I should probably look at that, because I bet there's smarter ways to do this!

#

Oh that's weighting parts of the prompt. That would also be cool in Bark!

#

This is a lot more sophisticated than what I did, probably worth looking into

#

I'm not really familiar with Stable Diffusion internally so not totally sure how much applies

#

Gosh, the negative prompt is really fun in Stable Diffusion. I guess you could run one bark generation with a negative prompt, save that token distribution, and then run your positive prompt and try to penalize the tokens in the past negative. 90% chance this is completely useless but 10% could do something interesting?

#

Anyone know why LLMs don't have negative prompts, is it just useless?

pliant spruce
tardy topaz
#

It can't be as easy as the idea I explained, or else it would already exist. Probably it just doesn't affect the output in a similar manner like it does in Stable Diffusion.

#

But it still might be interesting in Bark, since it's not quite the same.

#

jeez, also, there is a super easy way to make this actually useful i'm pretty sure will work.

#

I am too overbooked though, just writing it down, not trying this

#

Honestly the french accents thing kind of covers this idea. I can probably work it in there. Essentially you can pick an .npz file, a past sample, as a negative prompt.

#

And maybe I can get that to work

#

However it's gonna need a LOT of tuning and tests, so you only penalize the right things. Rather than just like, 'the sound of the human voice'

#

But i'm sure it COULD work

#

I'm just imagining the Bark WebUI. This is literally like picking 500 .npz files now, in different menus. Like seriously out of control. hah

#

Gradio is not ready for this.

#

Unless I missed it, THERE IS NO FOLDER PICKER?

pliant spruce
#

no idea

#

if there's a folder picker

tardy topaz
#

Best I can think of. Take 100 average English samples. Take the negative prompt. Find out what's most unique in it. Then penalize that. Maybe, possibly, that works.

pliant spruce
tardy topaz
#

Like for example, if you had a negative prompt of music notes, with the same as your positive prompt. What would you want Bark to do? Just be super formal and monotone?

#

I'm not sure what 'working correctly' means.

pliant spruce
#

The word could be linked to other things so its not 100% guaranteed either, like how I only got 10-20 whispering results out of 200, so its like a 10% reduction on certain things depending on how they're used. I'm sure that yelling or other expressions have higher odds of appearing, so you could at least use it to filter out music, maybe it'll make the audio clearer if you use it that way?

tardy topaz
#

I guess as long as it changes the output in any way you can detect, it's still fun. Just try shoving Shakespeare quotes or rap lyrics in the negative prompt. Maybe it has a cool effect.

#

It's not gonna be essential like SD but has a chance of at least being a fun thing to try sometimes.

pliant spruce
#

For sure

tardy topaz
#

For whispering, negative prompt, "I HATE YOUR GUTS!!!!"

#

or something yelling like

#

Everyone is working on voice cloning. Somebody make a negative prompt and let me know if it does anything interesting. I want to try it without having to work out how to do it.

#

I don't really know a thing about Stable Diffusion. If it was like, trained with a negative prompt, then there's basically no chance this is useful.

#

Looks like it's not

pliant spruce
tardy topaz
#

I'm too sleepy to think this through. Will save idea for rainy day though.

pliant spruce
#

And when you negative prompt in stable diffusion you're just adjusting the weight, adding brackets increases the strength of the negative prompt

tardy topaz
#

It does look like they just hijack the sampler, kind of like I am, but I would have to read more about Diffusion models to know how similar it is

pliant spruce
#

Oh, btw

#

When I had ChatGPT do that invisible prompt thing, what does that do exactly to bark?

#

Like, I had it edit the code this one time to let me include invisible prompts

#

That wouldn't affect the output

tardy topaz
#

did it work?

pliant spruce
#

Yes

#

So I could do "Insert text here" and add another sentence at the end that'd be invisible

tardy topaz
#

Did it affect the output though?

pliant spruce
#

Yes but it also increased the odds of noise

tardy topaz
#

What was the part of the code it edited to do this?

pliant spruce
#

Uhhh I can't remember this was last month

#

It let me do it in commandline

tardy topaz
#

I can think of a simple way, like just treat it as separate samples

#

And throw away the one invisible one

pliant spruce
#

But I think the noise was generated by the fact that the invisible sentence required using special symbols so the symbols were being included in the sentence which could be fixed

#

cuz i was using || to hide the last part of the sentence

tardy topaz
#

There is a way I can add this as a feature very simply, so I might. But doing it at a deeper level would be hard.

#

But if you split the text, as new samples, that does lower overall quality

#

compared to just having one long one probably

#

But still, maybe the simple way is useful

#

What symbol do they use for invisible prompt in SD, do you know?

#

Or some other thing, if there is a standard

pliant spruce
#

"||" the symbol didn't matter, it was just two of these

#

It was something to be included in the prompt that would be removed, and everything after it would be removed as well

tardy topaz
#

If you find the code, if it actually modified generate_text_semantic

pliant spruce
#

the old bark code is simpler so i might check how i did it that way

tardy topaz
#

then please show me cause I'm not sure offhand how to make it work at that level, without experimenting

pliant spruce
#

ok

tardy topaz
#

Maybe it just replaced them with pad tokens or something

pliant spruce
#

I was using the first version of bark so idk how much you changed lol

tardy topaz
#

But the idea is they should still affect the text. so if you put I AM YELLING

#

then the person starts yelling

#

I kind of think it just deleted them, and they didn't still affect the audio. But if it actually does work, heck, I'll just add that to Bark Infinity

#

chatgpt can be pretty smart, so it could have done it right

pliant spruce
#

I'll try it again.

#

maybe it was this i needed to edit... def generate_text_to_speech

tardy topaz
#

No hurry honestly I don't need more features, I need basic bug fixes.

#

The easy and fast way to do it: just split the text, and ignore that segment in final audio. But still use a bit of it as the history_prompt for the next actual audio segment. I could do that simple version and it might be cool. Though I don't have the partial segment audio joining even in Bark Infinity yet...
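
A small sketch of the simple version, under stated assumptions: `||` hides everything after it, the hidden part is generated first so its result can seed the history for the real segment, and its audio is thrown away. `Segment` and `plan_segments` are hypothetical names, and the actual generation and history handling are left out because how that would plug into Bark is exactly the open question here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    keep_audio: bool  # False means: generate it, use it as history, drop the audio


def plan_segments(prompt, marker="||"):
    """Split a prompt at the first marker into hidden and visible segments."""
    visible, _, hidden = prompt.partition(marker)
    segments = []
    if hidden.strip():
        # Hidden text goes first so it can influence what follows.
        segments.append(Segment(hidden.strip(), keep_audio=False))
    if visible.strip():
        segments.append(Segment(visible.strip(), keep_audio=True))
    return segments
```

Because the marker is stripped before any text reaches the model, the stray symbols that were suspected of causing noise never get spoken.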

#

I can't stay up half the night again, no rush, no chance I do anything with this soon.

#

All people want is easy to install Bark Webui or Colab notebook, and a nice set of clear speakers. Literally that's all I should do this weekend.

pliant spruce
#

yup

tardy topaz
#

To be honest I am a little confused how I ended up so deep in the audio stuff. The literal reason is that a bunch of really silly ideas keep working and that's addicting.

#

But I don't really need TTS so sometimes I do stop and wonder why I'm trying so hard to make Trevor Noah sing in a French accent. Not only do I not need this, does anyone? lol

#

That said, I bet he does a great French accent while singing. en_fiery always delivers.

#

It is actually ridiculous. I gotta do something with this stuff at least. bug fixes can wait a day, need to make at least one funny video with this Bark tech

pliant spruce
#

what kinda video?

tardy topaz
#

Since I just decided to do this, that part is yet to be determined.

#

But you know, singing, accents, something like that.

slim jacinth
#

It is a basic human need ❤️

tardy topaz
#

I had trouble combining both singing and french, but I didn't try too much, lol. But I agree

#

I think I just need more singing samples!

#

Or I guess French singing, then it's just one set. That should work

#

Singing in general works way less well than the accents. Just sounds like autotune most of the time. But the sample of singing I have is small

#

Also like, I'm not using principal component analysis or other things I should probably be doing, just literally counting tokens. So not exactly optimal lol

#

Frankly it's just yet another thing that really shouldn't work the crude way I did it.

#

My singers are mixed with music, so that's probably why it goes into autotune mode

#

They are not all voice only

#

I mean maybe I can lean into the autotune sound. That could be also cool. Mainly every set requires a ton of fiddly guessing of thresholds right now, so there must be a smarter way to do this where it's not so fragile.

gilded goblet
#

One application of a negative prompt for Bark would be generating logits with it as the prompt but negating them and then adding them to the regular prompt generated logits. (You probably want a control for the relative weight of the negative and positive prompts, too.) One problem I foresee with doing this in the simple and naive way, though, is that because Bark gets things like language and accent from the prompt (regular and history), a negative prompt like this that is prompted in the same language as the main prompt may cause things like the language used to be less stable, which probably isn't the goal.
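
As a sketch of that suggestion, assuming you can get one logits vector per step from a run conditioned on the positive prompt and one from a run conditioned on the negative prompt (getting those out of Bark is not shown and is the hard part). `combine_logits` and `negative_weight` are illustrative names.

```python
import numpy as np


def combine_logits(positive_logits, negative_logits, negative_weight=0.5):
    """Add the negated negative-prompt logits, scaled by a relative weight."""
    return positive_logits - negative_weight * negative_logits


def softmax(logits):
    """Turn logits into sampling probabilities (numerically stable)."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()
```

With `negative_weight=0.0` this reduces to the ordinary positive-prompt logits, so the weight can be raised gradually to watch for the language instability mentioned above.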

long egret
pliant spruce
#

@tardy topaz you're gonna have to run this one locally lmao

tardy topaz
#

Are you on the dev branch, let me check. Recently I've been using a new method for clear voices.

#

I think maybe you can do it

pliant spruce
#

Actually, when I convert it with ffmpeg, it works

tardy topaz
#

Yeah you can't quite do it. Basically start with the clearest voice you can. Then, I'll explain later, you erase the coarse prompt. Which is or will be a checkbox. And you make the history prompt as small as possible, but still clear. And then make long 14s samples. You get a pretty good range of voices, pretty diverse, all usually fairly clear!

#

I'll make this easy. It really reduces the worse noisy speakers.

#

There is some common feeling between the voices, they aren't THAT different, but plenty different to make a ton

pliant spruce
#

did you want a french voice?

#

w/o singing as well?

tardy topaz
#

if it's clear sure

#

though I need a bunch so no rush

pliant spruce
#

Ok, I should be able to get one in a few results

pliant spruce
#

not now tho

tardy topaz
#

I can make one for you

#

Wait let me see

#

Honestly it's late, let's table this, it's 7:30 am

#

haha

#

sorry but I almost forgot

pliant spruce
#

i still think "high quality:" makes things better

tardy topaz
#

interesting!

#

I mean, it's not impossible

pliant spruce
#

its actually VERY consistent

#

no joke

#

lemme send the result

pliant spruce
tardy topaz
#

lol, probably just a good voice

pliant spruce
#

its not a fluke

tardy topaz
#

don't sweat the voices. I'll set you up later weekend with how I do it now. I wish i had done it like this.

pliant spruce
#

ok

tardy topaz
#

Also maybe with voice cloning, nobody needs to do any of this?

pliant spruce
tardy topaz
#

Well, you can't voice clone Barack Obama but french, or whatever.

pliant spruce
tardy topaz
#

Since he doesn't exist. So my stuff is still useful.

#

You can model merge now actually

#

Bark just kind of works. Not all the time but sometimes

#

It's not a feature but it could be

#

I do it like 1000x

#

it's useful

#

I will make a feature

#

it is super useful

#

Sometimes you just get like, actually, a perfect mix somehow

#

For example a TTS voice, with a very human voice, it's like half that person, half TTS. just works sometimes lol

#

Some of my voices I really sweat over. literally 20 or 30 model merges

#

haha

pliant spruce
#

I'll have to learn that kinda stuff later for sure

tardy topaz
#

It's just gonna be, pick two npz files

#

instead of 1

#

Actually it

#

will be a tool. because it doesn't always work. So I usually make like 10 versions

#

and then one was okay lol

#

I got a Donald Trump whisper, almost. But the voice changes too much. Still pretty close lol.

#

If you use the voice it's even more changed

#

but maybe fixable

#

I wasn't trying for whisper, actually one of the most clear whispers though

pliant spruce
tardy topaz
#

Yeah totally rng

#

only one I can remember lately

#

I saved it to mess with

#

try and make it work better

#

singing is similar. if you sing, voice changes

#

it's a really good whisper voice, lol

#

whispering is probably easier than accents, it's a pretty general sound

pliant spruce
#

I think accents are easier

tardy topaz
#

yeah maybe

#

should see if I can voice clone a whisper

#

with mylo's thing

#

then i can use as the samples

#

instead of you finding them

pliant spruce
#

cheat codes aren't fun but ok

#

wait till 8 seconds.

tardy topaz
#

haahh

#

you didn't type that?

#

IS that your prompt?

pliant spruce
#

Its a prompt.

#

high quality: announcer: Hello passengers, this is your captain speaking. This plane is about to crash!

tardy topaz
#

I was hoping bark wrote the last part

#

with the confused mode

pliant spruce
#

LOL

#

the 2nd result is kinda funny

viral lynx
pliant spruce
#

"hello passengers, this is your captain speaking, this plane is about to-" and it cuts off there

tardy topaz
#

Sure

viral lynx
#

so i made a voice based on a dantdm video, just the intro

tardy topaz
#

Can you run it on a partial prompt, like my joke video?

#

And see what it does?

pliant spruce
tardy topaz
#

I mean mylo

pliant spruce
#

oh

tardy topaz
#

"What was six afraid of 7?" and then keep sampling

#

Why was

viral lynx
#

"if you enjoyed this, like this video... check out AA-"

tardy topaz
#

I had the notion of a 10 hour unprompted Bark YouTube video just endless nonsense

viral lynx
#

lol

#

it's the semantics keeping it alive at that point, as spamming random semantics will drown out the voice

tardy topaz
#

Interesting that it's a youtube line

#

though you used books as training?

viral lynx
#

probably because the prompt was the intro

#

it probably recognised that it was an intro, and recognised it

tardy topaz
#

Have you ever seen Whisper, it hears those words so much

#

they are banned in the raw code

#

lala

#

lol

viral lynx
#

lol

tardy topaz
#

Just give it any audio, if it's not sure, it outputs "thank you for subscribing"

viral lynx
#

since it was trained on youtube videos?

tardy topaz
#

Yeah presumably. It literally can't stop hearing it

#

Any time it's noisy, it says that

#

Am I hallucinating I can't find it now

viral lynx
#

also, the voice cloning is easy to implement, and i provided some code snippets so you can easily implement it in your webui if you want to

tardy topaz
#

I will, nice

#

I will maybe even train more

#

one thing I got is npz everywhere

#

and usually the prompt

#

not greatest diversity though

viral lynx
#

yeah, there's a 4 and a 14 epoch model on the huggingface repo

#

you could train from there or train from scratch, it doesn't take long to train from scratch

#

and most of the mistakes it makes are not things you will pick up on that much as a human

#

like it might misclassify a token for another token, but if you heard them side by side there would barely be a difference

#

i believe bark, with its 10000 tokens, has a bunch of duplicate tokens which are interchangeable

#

at least from the perspective of HuBERT base

tardy topaz
#

There's some funny stuff, like some tokens are like a description, or at least it feels like it. adds an effect to the whole clip

#

or removes

viral lynx
#

also, the voicemod soundboard sounds pages have a lot of clips that are great for voice cloning as well

tardy topaz
#

Not literally a token

#

but like, a chunk

viral lynx
tardy topaz
#

I think I'll still take some time to tune the clones

#

You can still dial them in a bit

viral lynx
tardy topaz
#

Yeah, background hums

#

sometimes go away for whole clip

#

or appear

#

not predictably but, if you're desperate, you can randomly delete. and try to get lucky

#

I kind of thought a hum would be temporal?

#

but it's like almost a little tag ? total no idea here. I was just little surprised

#

like it just changes where the prediction goes probably

#

but subjectively, it was like that

#

I don't know how semantic and language like, the semantic tokens actually are, maybe not impossible

viral lynx
#

i gotta see if i can make bark generate infinite length (and probably decode on cpu from that point on)

tardy topaz
#

What do you mean, in the actual model?

viral lynx
#

or cut into chunks and then decode

tardy topaz
#

You can chunk coarse and fine easily

viral lynx
#

yeah

tardy topaz
#

I think maybe you can put tokens into the inference space, but I didn't get around to trying that

#

Like instead of putting history where it should go. take up inference space

#

MAYBE you can use that trick to chunk semantic?

#

if you don't do that it just sounds bad

#

Like giving an actor a 3 word first part of a line

#

and nothing else

viral lynx
#

you can chunk semantics probably, just make sure you have a good history prompt

tardy topaz
#

I tried it a lot

#

2 words is the breakdown point

#

but it all just sounds bad

#

because it doesn't have enough context to perform the line

#

it works just bad

#

Bark in general, I find, give it a big text if possible. It's more descriptive.

#

So it sounds like you have an actor, right. And you give him a notecard with 2 words on it. He reads it. Then you give him another.

#

It just sounds wrong lol

viral lynx
#

here's a fun experiment

tardy topaz
#

hit me, maybe i tried it

viral lynx
#

use a cloned history prompt, then generate without prompt with early_stopping=False

tardy topaz
#

Oh that was literally my first idea yeah

#

honestly i tried to do that with WHISPER

#

but i couldn't figure it out

#

could it predict based on audio, what next tokens were likely, with no other input, based on the internal llm

#

so it's like speech to text but guesses what you say

#

I think you can do it now, in the cpp fork, but I didn't check

viral lynx
#

damn sometimes i forget to add the quick kwargs and then it doesn't auto hide (gradio please implement element replacements)

#

since the point of the webui is more than text-to-speech, voice cloning was just something i wanted because i thought it would be cool.

tardy topaz
#

There's so many easy features I need to add.

#

just mashing two prompts together, works pretty well

#

like a model merge

#

not always but enough

viral lynx
#

just averages of 2 voices with the same semantics?

tardy topaz
#

like, it should not work

viral lynx
#

it should work with the same semantics

tardy topaz
#

but you really get a nice hybrid!

#

usually have to render different size variants, pick the best

#

even like a robot tts, and human

#

it's like half tts

#

lol

#

even 3 prompts, not impossible

#

oh here's a fun one

#

have you tried just taking a speaker. delete every other token

#

they talk twice as fast. still sound pretty natural

#

haha

#

or the opposite, double token

#

i don't know what I was doing but it's actually not even that unnatural
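
For anyone who wants to try the token-dropping trick, a tiny sketch. It assumes the tokens are a NumPy array with time on the last axis; which of the speaker's token arrays this was done on isn't stated above, and `speed_up` / `slow_down` are just illustrative names.

```python
import numpy as np


def speed_up(tokens):
    """Keep every other token along the time axis (roughly 2x faster speech)."""
    return tokens[..., ::2]


def slow_down(tokens):
    """Repeat each token twice along the time axis (roughly 2x slower speech)."""
    return np.repeat(tokens, 2, axis=-1)
```

Using the last axis means the same functions work on a 1-D token sequence or on a 2-D array with one row per codebook.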

viral lynx
#

what about this though, instead of merging 2 voices by averaging, you extrapolate the difference from voice a to voice b onto voice b or c? like the add difference merging from stable diffusion webui

tardy topaz
#

yeah did you see my accent work, a little like that

#

it's 8:30 am and I haven't slept I'm not sure I can actually explain

#

but I did in discord previous messages, using french sets

#

of voices

#

and singing

#

I think there's SO much you can do

#

with voices averaging, differences, using a set of voices as a penalty or target

#

the singing sounds like autotune, but I realized half my singing samples were music after

#

so actually, that was probably working correctly

viral lynx
#

also, you keep talking about music, you can finetune bark with music to have it basically be bark but as musiclm

tardy topaz
#

I wonder. Presumably if it could, base bark would be better though?

#

It must have seen a lot?

#

Oh nevermind I understand now

#

You mean finetune, but overwrite existing capability

#

Fully music Bark

viral lynx
#

yeah

tardy topaz
#

Yeah that would be cool. Even finetune to specific artist

#

If it's fast

#

I really think there's a billion things left to do in current model

viral lynx
tardy topaz
#

that'd be ideal, i just assumed it would still not work great, but maybe

viral lynx
#

it should be higher quality than the voice cloning with my model though, since it would actually use the same things as the original model did during training

autumn cloud
#

@viral lynx great work! Are you integrating the cloned into webui. I tried to get to test it but I was lost.

viral lynx
#

i'll probably release my webui so people have something to play with cloned voices, (also, cloned voices are saved under the same name as the original voices, but with npz, in the custom speakers folder)

autumn cloud
#

Your repo is meant to be used in conjunction with bark’s api right. I was just lost but I will wait for your ui and read the code.

viral lynx
#

yes

#

you can create a voice clone without bark even installed too though

autumn cloud
#

I was trying test_hubert but it’s expecting semantic.npy lol

viral lynx
#

yeah, that's a file to compare to lol, you could technically just create an empty npy file called that, or disable the check

autumn cloud
#

Oh, so it’s ok to comment out ‘original’

#

Makes a bit sense now that’s why it’s a test, you are trying to see if they are identical

viral lynx
#

yeah, you can remove the print as well

#

this here is how you can actually do voice cloning, as a developer

autumn cloud
#

So how do you use the generated npy?

viral lynx
#

you put it in the npz with the coarse and fine from the same audio

#

to make it easy, just use a different voice cloner, and replace the semantic_prompt.npy inside of the npz with the npy from here, make sure it's called semantic_prompt.npy

autumn cloud
viral lynx
#

yeah, correct, the semantic_tokens from that code can be saved to an npy

#

and that can be used inside of a history prompt for the cloned voice, but i'll make my webui public in a bit
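
A minimal sketch of that swap, assuming you already have a speaker `.npz` from another cloner and a `.npy` of semantic tokens from the cloning code. Only the `semantic_prompt` name comes from the messages above; the function name and file paths are placeholders.

```python
import numpy as np


def replace_semantic_prompt(npz_in, semantic_npy, npz_out):
    """Copy a speaker .npz, replacing only its semantic_prompt array."""
    with np.load(npz_in) as data:
        # Keep whatever other arrays (coarse, fine) the file already holds.
        arrays = {key: data[key] for key in data.files}
    arrays["semantic_prompt"] = np.load(semantic_npy)
    # Each keyword becomes <name>.npy inside the archive,
    # so this writes semantic_prompt.npy as required.
    np.savez(npz_out, **arrays)
```

Writing to a new output path rather than overwriting keeps the original speaker file around for comparison.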

autumn cloud
viral lynx
#

it auto installs when you run the run.bat, you can add whatever flags to it as well, i should probably document those

tardy topaz
#

BUT it can be done. You can keep backing up and hit a spot where it sings, and not change. SO GOOD.

#

I think the UI for Bark, rather than pick a prompt, pick a location in the prompt instead. That would make some of this fiddling easy.

autumn cloud
viral lynx
#

it's in the text to speech

#

just pick the bark model and it will load the stuff, as "speaker from" put "upload"

#

and you'll get a thing where you can upload an audio file

autumn cloud
#

Thanks will try now. Restarting ui

viral lynx
#

with no prompt, and squidward as history

autumn cloud
#

Quite impressive. Just tried it

viral lynx
#

yeah, with a good input audio you'll get really good results, + if you generated a really good result, you can download the speaker prompt from that generated audio, they are sometimes more consistent

autumn cloud
#

With some effort, I managed to extract your implementations.

#

The code is a monster bro! How you pulled this off is quite impressive.

#

@viral lynx 🤝

viral lynx
#

thanks

slim jacinth
#

100% - super impressive

autumn cloud
#

@viral lynx you never used the models you generated in the webui? Is there a reason why and how could I try those?

autumn cloud
viral lynx
#

it downloads it from huggingface though?

#

the 14 is the epoch, the other one is on epoch 4, but i didn't rename it lol

autumn cloud
#

Interesting, it didn’t download for me

viral lynx
#

yep

autumn cloud
#

@viral lynx what are your suggestions for audio length to clone from, and why do I sometimes get a different voice between chunks?

viral lynx
#

around 6 - 10 seconds is usually great. sometimes you get a different voice, i recommend saving the npz that comes out of a good result, since that one is fully bark generated

autumn cloud
#

Thanks for the hint. Will probably generate 10 samples then pick the best

tardy topaz
#

Best use of voice cloning, no prompt, no stop, just get cool audio you barely hear from Bark typically!

#

Some of the best sound effects and music instrumentals, and like animal sounds, etc, feels a lot different than typical Bark sample

#

Less structured but also kind of more natural in a chaotic way, super neat

#

There's more sound effects in Bark than I thought

viral lynx
tardy topaz
#

Mylo has given so many ideas I can't move. You can train this in like 8 hours? You could try SOO really wild unbalanced possibly absurd datasets, like 2 a day, and see what happens!

#

Maybe nothing for all of them and you stop on day 3, still cool

viral lynx
#

the 8 hours is the amount of training data i had

#

but it trains faster than realtime

tardy topaz
#

Nice

#

36 models a day
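A rough back-of-the-envelope sketch of where a figure like 36 models a day could come from. It assumes each model is trained on the full 8 hours of audio, back to back, and that "faster than realtime" is a constant speedup factor; the function names and the `epochs` parameter are made up for illustration, and the actual training speed was not stated.

```python
def models_per_day(data_hours, speedup, epochs=1):
    """Models trainable in 24 hours if training runs `speedup` times
    faster than realtime over `data_hours` of audio, `epochs` times."""
    return 24.0 * speedup / (data_hours * epochs)


def required_speedup(data_hours, models, epochs=1):
    """Realtime speedup needed to train `models` models in 24 hours."""
    return models * data_hours * epochs / 24.0


# With 8 hours of data and one pass per model, 36 models a day
# would need training to run about 12x faster than realtime.
print(required_speedup(8, 36))   # 12.0
print(models_per_day(8, 12))     # 36.0
```

More epochs per model cut the count proportionally under the same assumptions.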

shell prism
opal spear
tardy topaz
#

Those are hand made, but you can also clone them now

#

Or both, which is actually still kind of maybe necessary

#

I still had to tweak the wav clones honestly by ear

opal spear
#

I'm downloading that webui rn

#

so yeah

tardy topaz
#

Nice

#

I kind of copied and pasted all that into my code, I may even update. Not a polished release but it works.

tardy topaz
#

You can do both now. Clone automatically, model merge, etc. Though merging is not in the next update

hazy rain
tardy topaz
#

you mean voice clone? training no idea

#

just thought I remembered somebody trying a new language

hazy rain
#

(trying to replicate my own voice 😅 )

tardy topaz
#

I should have done it but I wasted the day, and now I'm tired

#

are you technical enough to install via conda yourself?

hazy rain
#

Yep I'm a TD in VFX 🙂

tardy topaz
#

I could push this version

#

but it doesn't have an updated yml for conda or pip list

#

someone would have to just figure it out

#

and i don't have time until maybe late today

hazy rain
tardy topaz
#

it does have the cloning

#

but it's like just a mess

#

for production

#

i mean sure whatever

#

let me just at least remove print statements

hazy rain
#

Ahaha, well I've seen everything in the AI/Python world I'm immune to this now 😅

#

Hit me up with the link when you can and I'll contribute back if I can! Thanks Jonathan!

tardy topaz
#

actually if you check for problems, that would be helpful

#

since i plan doing more tonight

#

maybe an hour

#

the cloning will work

#

not sure about generation though haha

#

this is a real mess all in one file just to do it in a couple hours

#

haha

#

I think I can make anti voice clone work, or at least, be occasionally funny. I've been saving buckets of clone tokens and trying things, instead of fixing critical filename crash bugs. audio as input, just alone. cool idea. it's just more tokens. even a vague style hint. a minute of audio is a lot of tokens.

red helm
bold token
#

Check out a sample short podcast I created using Bark: https://youtu.be/CW790VwEO9c

🎙️ "Tech Talk with Rob: IoT Workshops in Agriculture" 🌾

Join Rob in this insightful episode as he explores IoT workshops in the Faculty of Agriculture in Israel. Powered by AI and featuring seamless narration by Bark TTS technology, this podcast delves into how technology is revolutionizing agriculture.

Discover how IoT bridges the gap between...

wicked gull
#

@viral lynx I am getting these errors while running the ./run.bat script. Plz help

viral lynx
#

are you on linux? use the sh files instead if you're on linux

light tide
worn oar
formal plover
tardy topaz
#

It's a bit rough, it's only even been out since last Friday. I'll add some soon.

#

The short version in my fork is upload a wav file, you get a bunch of voices. Try the voices maybe you get a real good one.

hazy rain
edgy mango
tacit maple
#

(this is post-processed btw, but the voice in the beginning is straight bark-infinity output)

somber rivet
#

I had a bot I'm building summarize an article, then convert the summary to an audio file.

tardy topaz
#

The prompt was pure laughs and I didn't even mean to join the segments. Perfection, honestly.

#

I didn't save all the .npz for each segment though, a travesty.

granite quiver
#

That clip sounds like it came out of a horror movie.

grizzled shard
#

had some similarly haunted. 🙈

white wadi
#

using your self model ?

grizzled shard
#

what self model?

white wadi
#

voice cloning

grizzled shard
#

ah. yes. tried to random generate while using a voice cloned npz

granite quiver
tardy topaz
#

My clip is a totally normal Bark random speaker, just came out perfect.

white wadi
#

have you checked behind you?

#

just kidding 🙂

tardy topaz
#

The laugh at the end after you think it's over, jesus

somber rivet
tacit maple
lofty flint
grizzled shard
obtuse slate
lofty flint
lofty flint