#🪲┃bugs-web
Hi, I understand that the history prompt parameter cannot be set to an arbitrary voice, to avoid impersonating people, but is it possible to add a consistency feature so we can keep the same voice for the same characters across different dialogues?
yeah technically that already works, but you have to use the raw generate functions. we should make it easier going forward
Is there any place that has documentation on how to use this? I've used Stable Diffusion and a few other apps. I followed the pull instructions; I am just not sure what to do next.
There should be some basic prompts in the main readme, or you could look at the Google Colab notebook to get started.
We will try to add better instructions soon
Can you elaborate a bit more? I also need to generate consistent voices (role playing, gaming). What are raw generate functions?
Also curious whether it's possible to fully utilize the GPU. Maybe there's a throughput bottleneck or something? Ryzen 1950X/RTX 3090 here.
Try reinstalling PyTorch from the nightly CUDA 11.8 index:
pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
please enable self training
waiting for Arabic language support
how can we help with Romanian language?
much appreciated, we will try to set up a poll for language support in future versions of bark
Hey there. Should we create channels for sharing prompts, tricks, and benchmarks? I believe most of us are really excited about Bark, but it's not super intuitive for code-blind users.
I'm thinking somewhere where new users can just search and find answers quickly.
in the GitHub repo, please mention that adding asterisks before and after a phrase or description works better for generating music
although it's not very consistent
Hey, I am very interested in Bark, and I would love to help add Arabic support if there is a chance
Maybe a #info section for people to get started? System requirements, download links, etc.
good idea thanks
created a #🪦┃getting-started channel for installation help and conversations
I would appreciate if it learned Swedish. I can help you with training data.
thanks, out of curiosity, roughly how many hours? trying to get a sense of how to handle finetuned models in case that becomes an easy option in the future
Can't I run the model with an AMD GPU (5500M) on a MacBook?
tensor([1.], device='mps:0')
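For the MPS question: an output like `tensor([1.], device='mps:0')` means PyTorch can see Apple's Metal (MPS) backend, which, as far as I know, PyTorch supports on Apple silicon and on AMD GPUs under macOS 12.3+. A minimal device-selection sketch, assuming PyTorch 1.12 or newer (where `torch.backends.mps` exists); `pick_device` and `detect_device` are hypothetical helper names, and whether a given Bark version actually uses MPS is a separate question:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's Metal (MPS) backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask the installed PyTorch build what it can see.

    torch is imported lazily so pick_device can be used without it.
    """
    import torch

    mps = getattr(torch.backends, "mps", None)  # missing on very old PyTorch builds
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)
```

On a machine where MPS is available, `torch.ones(1, device=detect_device())` should print the same `tensor([1.], device='mps:0')` shown above.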
i wonder if we could automate it better by passing the output back to a speech recognition package, which would check the output against the script, then re-do it if the words came out wrong..... what do you think?
yeah defo an option. the instant way would be to check it with Whisper or something and redo it if it doesn't look right. much better would be a little helper classifier that looks only at semantic (not coarse etc) to judge if the model did a good job or should redo it.
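A minimal sketch of the "transcribe and redo" idea, assuming you supply your own `generate` callable (e.g. a wrapper around Bark's `generate_audio`) and `transcribe` callable (e.g. Whisper). Both callables, the 0.6 threshold, and the retry cap are illustrative choices, and `difflib.SequenceMatcher` stands in for whatever similarity measure you prefer. Capping attempts and keeping the best take avoids the infinite-loop problem that comes up later in this thread.

```python
import difflib
import re
from typing import Any, Callable, Tuple


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'Hello, world!' matches 'hello world'."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def generate_verified(
    text: str,
    generate: Callable[[str], Any],
    transcribe: Callable[[Any], str],
    threshold: float = 0.6,
    max_attempts: int = 5,
) -> Tuple[Any, float]:
    """Regenerate until the transcript is close enough to `text`.

    Returns the first take scoring >= threshold, or the best take seen
    after max_attempts, so a strict threshold can never loop forever.
    """
    best_audio, best_score = None, -1.0
    for _ in range(max_attempts):
        audio = generate(text)
        score = similarity(transcribe(audio), text)
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            break
    return best_audio, best_score
```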
as a quick option, I'm just trying out this code. I think it's helping...
```python
import time

import numpy as np
import soundfile as sf
import speech_recognition as sr
from bark import SAMPLE_RATE, generate_audio
from fuzzywuzzy import fuzz
from scipy.io.wavfile import write as write_wav

# Cap on retries (added; not in the original loop) so a strict threshold can't loop forever
MAX_ATTEMPTS = 10


def main():
    # Read and split the text file
    with open("script.txt", "r", encoding="windows-1252") as f:
        text = f.read()
    # text = "Let's try this one and see if it works"
    text_chunks = create_text_chunks2(text)  # my own chunking helper, not shown here
    for chunk in text_chunks:
        print(chunk)

    # Generate audio for each chunk and concatenate the results
    recognizer = sr.Recognizer()
    audio_chunks = []
    for chunk in text_chunks:
        # Keep trying until the speech recognition matches the original chunk
        for _ in range(MAX_ATTEMPTS):
            # Generate audio for the current chunk (generate_audio returns a single array)
            audio_array = generate_audio(chunk, history_prompt="en_female_professional_reader")
            # Write the audio to a temporary file
            temp_file = "temp.wav"
            sf.write(temp_file, audio_array, SAMPLE_RATE, format='wav')
            # Transcribe the audio using speech recognition
            with sr.AudioFile(temp_file) as source:
                audio = recognizer.record(source)
            try:
                transcription = recognizer.recognize_google(audio)
            except sr.UnknownValueError:
                transcription = ""  # unintelligible audio counts as a mismatch
            # Check if the transcription matches the original chunk
            similarity = fuzz.ratio(transcription.lower(), chunk.lower())
            if similarity >= 60:
                break
            # If the transcription doesn't match, wait and try again
            time.sleep(1)
        # Add the generated audio to the list (the last take if none matched)
        audio_chunks.append(audio_array)

    concatenated_audio = np.concatenate(audio_chunks)
    # Save the concatenated audio to a wav file
    output_file = "script_read.wav"
    write_wav(output_file, SAMPLE_RATE, concatenated_audio)


if __name__ == "__main__":
    main()
```
actually, scrub that - I think there's a problem in that once it's generated one bad result, it continually creates bad ones. I need some kind of reset between each generation.
Does anyone think that there should be a web app?
i think there is already a web UI version
Just a little note: I spent a long time with the API examples without creating anything useful, until I specified the speaker. Maybe you should include a speaker in all of your examples, so people trying it out quickly will get more successful results. I noticed that, e.g., MAN:/WOMAN: at the start of a sentence is often said out loud, and, e.g., [laughter] at the start of a sentence isn't working. Also, the audio quality differs a lot between speakers, so some of them are quite hard to make sound like they're in the same room. That said, it's a wonderful tool to produce life-like dialogue, where you almost feel the speakers are acting and becoming characters. My first attempt at using Bark: https://www.youtube.com/watch?v=AAdQfQjENJU
I really like the idea though of screening the output automatically, at least for obvious fails.
maybe with Whisper
i did implement it with a Google transcription library along with fuzzy matching. it did work, but if you set it too tight, it can go into an infinite loop.
I was thinking about screening automatically, but it's tricky: a lot of my absolute favorite clips wouldn't pass. Like this epic Blade Runner quote that went so meta
Could someone please post a list of allowable flags and prompts for the system. That would really help in tuning the output.
How can we train on our own dataset, and how can we load that model in Bark?
Hello, I made a PR that adds an option to offload to CPU. With that option plus the small models, the VRAM requirement goes down to 2GB (I only have 2GB, so I am happy that I can use my GPU now 😁)
https://github.com/suno-ai/bark/pull/146
thanks looking!
We need Bark Infinity for COLAB please!
I'm on it
Maybe not tonight though
But maybe
did you update the Infinity version? also, thank you for helping
I am typing at the keyboard doing it now
I kind of broke some stuff experimenting so I'm removing features actually
Just to get it back to working
This is how I got it to work
After the `preload_models()` step, run:
```python
import nltk
import numpy as np

nltk.download('punkt')
```
Then you can run:
```python
from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio

text_prompt = """
Hello, my name is Suno. Here is an excerpt from the Hobbit: Bilbo crept away from the wall more quietly than a mouse; but Gollum stiffened at once, and sniffed,
and his eyes went green. He hissed softly but menacingly. He could not see the hobbit, but now he was on the alert, and he had other senses that the darkness had sharpened:
hearing and smell. He seemed to be crouched right down with his flat hands splayed on the floor, and his head thrust out, nose almost to the stone. Though he was only a black shadow in the gleam of his own eyes,
Bilbo could see or feel that he was tense as a bowstring, gathered for a spring.
"""
history_prompt = "en_speaker_2"
if history_prompt == "":
    history_prompt = None

# Split the text into sentences so each generate_audio call stays short
text_prompts_list = nltk.sent_tokenize(text_prompt)

# Generate audio from text, one sentence at a time
audio_arrays = np.array([])
for i, prompt in enumerate(text_prompts_list, start=1):
    print(f"{i} of {len(text_prompts_list)}")
    audio_array = generate_audio(prompt, history_prompt=history_prompt)
    audio_arrays = np.concatenate((audio_arrays, audio_array))

Audio(audio_arrays, rate=SAMPLE_RATE)
```
Still needs lots of work and it's not the most stable way to do it, but it does work
awesome! how did you make it create a new voice every time?
I got distracted by some silly ideas:
Sure, some people suggested things, but here are mine. The UI is really good; more focus on that and its features. Like choosing a seed for stable results (the models seem similar to SD, since many voices come out of each).
Easier setup or more guides. I have trouble getting it to use the GPU; it uses the CPU instead. But the one-click install is simpler and uses the GPU.
Infinity looks kind of essential and should be integrated more officially
Better control over which voices come out, and then the ability to lock those voices in. Overall, control over the randomness.
Clearly defined nonverbal cues
Stable Diffusion/ChatGPT-like instructions, e.g. "talk about cats for 2 minutes", instead of only direct TTS.
well, you could connect ChatGPT's output to the text-to-speech
Sounds good, yeah, all on the list
Some things can be achieved even if it's not part of the program but just integrated elements.
Thanks. It could go the other way and team up with oobabooga
It already has an interface with tools and add-ons. Someone will probably connect it. It has Silero and ElevenLabs. But I'm liking this Suno.
Have you guys been able to get consistent voices/characters with bark-infinity?
closest I've gotten is by feeding the full output back in as the history prompt. it works well, but after about 3-4 sentences it develops feedback like you would get from a microphone.
Someone should pin the important/useful messages in getting-started - beggars can't be choosers though lmao, thanks for this resource, whoever's in charge, regardless
only by using history_prompts
Hello, I made another PR, this one to allow consistent and deterministic generations. I think it will help some people here. It's just simple modifications to the code, but it lets people write scripts for long texts with consistency while keeping the Bark code generic; see the example in the PR.
https://github.com/suno-ai/bark/pull/175
@summer wraith how did you train the GPT model? it's so interesting 😮
much appreciated. there are some github issues posted on this. unfortunately we don't release the training code for now, but there are some great papers and repos for very similar architectures
Like eleven labs?
https://github.com/Fictiverse/bark the web ui version
https://cdn.discordapp.com/attachments/1069381916492562585/1101349703565709364/toad-bark-b.jpg
I made some modifications that I think are worthwhile using and posted the UI.py with a list of changes, if you are interested.
link to message from #💬┃general-chat message
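In the Bark readme, it says: "Below is a list of some known non-speech sounds, but we are finding more every day." - where can we find an exhaustive list of the non-speech sounds?
it uses language processing; these are not predetermined sounds
One feature I'd love to see is the ability to create new voices using the following methods:
- Mixing voice signatures, e.g. 30% Kanye West, 20% John F. Kennedy, 20% Marilyn Monroe, 10% Joe Biden, and 10% Donald Trump
- Voice editor: the ability to edit parameters such as position on the sex spectrum, emotional expression styles (e.g. how sadness is expressed, how excitement is expressed), typical talking speed, changes in talking speed based on emotion and discussion topic, pause frequency, types of lip effects, "umm" frequency, "hmm" frequency, confidence level on certain discussion topics, emotion changes based on discussion topic, etc.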
Any plans to incorporate SSML support? ✌️
Honestly I don’t have much experience with ssml, would love to see some community support for it though, I’ve seen it come up now a couple of times
Yes, in theory you can write anything you want into brackets or parentheses, with crazy ones being less likely to work of course
Hoping you guys support Filipino/Tagalog language soon, love you guys ~!
Lei from the Philippines
I'm not a computer expert, but what I meant is: can we use this app without downloading anything to our device?
a #technical-discussions channel for more deep-diving into how things work / can be improved
a #1069381916492562583-updates page where people who are maintaining projects related to Bark can post updates / news about their releases. they could have a specific role to apply by asking a moderator for, that allows them to post there.
great idea thanks! there are now two new channels: #📚┃suno-school and #🪦┃community-updates
I just wanted to know if there is a way to have English audio with different accents (Chinese, Polish, English, etc.), and can we clone our own voice? I understand there is a library for different languages --> https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c
It's possible, but some of the accents seem to be skewed to a certain gender. I've managed to get British and Japanese accents by simply adding British/Japanese to the prompt. It's fairly consistent too.
Is there any way that the voice has a Spanish accent from Spain?
hard for me to judge but did you listen to the 10 spanish speaker prompts in notion to check if any of those sound like they are from spain?
in general right now you have to get lucky to generate a certain subtype of a language and once you get a voice you like you can use it as a history prompt for future generations
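A sketch of saving a voice you like and reusing it, assuming a Bark version where `generate_audio(..., output_full=True)` returns `(full_generation, audio_array)` and `history_prompt` accepts either a dict with the three keys below or a path ending in `.npz` (true of the upstream repo when this thread was written, as far as I know; check yours). `save_voice`, `load_voice`, and `speak_with_voice` are hypothetical helpers; upstream Bark also ships a `save_as_prompt` function that, I believe, does essentially the same `np.savez`.

```python
import numpy as np

# Keys a Bark history prompt (.npz file or dict) is expected to contain
PROMPT_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")


def save_voice(path: str, full_generation: dict) -> None:
    """Store the token arrays from a generation you liked as a reusable .npz voice."""
    if not path.endswith(".npz"):
        raise ValueError("voice files should end in .npz")
    missing = [k for k in PROMPT_KEYS if k not in full_generation]
    if missing:
        raise KeyError(f"generation is missing {missing}")
    np.savez(path, **{k: np.asarray(full_generation[k]) for k in PROMPT_KEYS})


def load_voice(path: str) -> dict:
    """Load a saved voice back into the dict form used as history_prompt."""
    with np.load(path) as data:
        return {k: data[k] for k in PROMPT_KEYS}


def speak_with_voice(lines, voice_path: str) -> list:
    """Generate every line with the same saved voice (requires Bark installed)."""
    from bark import generate_audio

    voice = load_voice(voice_path)
    return [generate_audio(line, history_prompt=voice) for line in lines]
```

Typical use: `full_gen, audio = generate_audio(text, output_full=True)`, listen, and if you like it, `save_voice("my_voice.npz", full_gen)`. Reusing the prompt keeps the voice, not the exact delivery.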
@tight gorge https://huggingface.co/hhsavich/accent_determinator
you can connect this accent determinator to the output of Bark to detect whether it is the accent you are interested in (at least with some level of probability) and store the generated npz prompt history objects for investigation. this could run overnight unattended
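A rough sketch of that unattended overnight search. `generate_candidate` and `accent_score` are hypothetical callables you would write yourself: the first would wrap Bark (e.g. `generate_audio(..., output_full=True)`) and return the history-prompt object plus the audio, and the second would run the accent classifier on the audio and return the probability of the accent you want. I haven't checked that model's exact inference API, so that part is left abstract.

```python
from typing import Any, Callable, List, Tuple


def search_voices(
    prompt: str,
    generate_candidate: Callable[[str], Tuple[Any, Any]],
    accent_score: Callable[[Any], float],
    n_trials: int = 100,
    min_score: float = 0.8,
    keep: int = 10,
) -> List[Tuple[float, Any]]:
    """Generate many random voices for one prompt and keep the most on-target ones.

    Returns up to `keep` (score, history_prompt) pairs, best first, among
    candidates whose accent score is at least `min_score`.
    """
    hits = []
    for _ in range(n_trials):
        prompt_obj, audio = generate_candidate(prompt)
        score = accent_score(audio)
        if score >= min_score:
            hits.append((score, prompt_obj))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:keep]
```

Each kept `prompt_obj` can then be saved as an `.npz` and auditioned by ear in the morning.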
nice! i haven't really found a good out-of-the-box one yet but we are playing around with some new models
{language:en-US} Hey there, how are things going? I wanted to let you know that I'm from the US.
{language:en-GB} Hey there, how are things going? I wanted to let you know that I'm from Britain.
@summer wraith can you use RNNoise to train the fine model to produce better outputs?
en_gb_2 has the characteristic noise surrounding the voice that RNNoise is phenomenal at removing, you could do unsupervised training by using it as a verification
my understanding is limited but you could provide the rough result vs the rnnoise result to fine-tune the model.
so in general we wanna stay away from modifying the base audio that the unsupervised model is trained on (bad microphone speech is also important to learn)
what we need is a knob during inference to tell it what you wanna hear. So we are working on adding indicator variables during training to describe the audio (eg quality 1-10) so that during inference you can just set quality to 10 for TTS
but if you wanna remove the head and add a different head for eg ASR then you want the model to have seen bad audio
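To make the "knob" idea concrete, here is a sketch of one common way to do it (control tokens prepended to the input, as in conditional language models). This is a generic illustration, not Suno's actual implementation: reserve a few extra token ids for quality levels, prepend the audio's measured quality during training, and pick the level you want at inference. The vocabulary size and offset are made up.

```python
from typing import List

TEXT_VOCAB_SIZE = 1000                   # hypothetical size of the ordinary token vocabulary
QUALITY_TOKEN_OFFSET = TEXT_VOCAB_SIZE   # ids 1000..1009 reserved for quality 1..10


def quality_token(level: int) -> int:
    """Map a quality rating in 1..10 to its reserved token id."""
    if not 1 <= level <= 10:
        raise ValueError("quality level must be between 1 and 10")
    return QUALITY_TOKEN_OFFSET + level - 1


def condition(tokens: List[int], level: int) -> List[int]:
    """Prepend the quality token so the model sees it before the text."""
    return [quality_token(level)] + list(tokens)


# Training: tag each example with the quality measured for its audio
train_input = condition([17, 42, 5], level=3)
# Inference: ask for the best quality the model has seen
infer_input = condition([17, 42, 5], level=10)
```

The embedding table would need `TEXT_VOCAB_SIZE + 10` rows so the new ids get learnable vectors. Bad-microphone audio stays in the training data, just tagged with a low level.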
oooh that's cool. you could train it on an rnnoise keyword? 😄
hehe
could it be possible to make LoRAs for the LLMs, for example an emotion LoRA? we train fine and coarse, training it on IEMOCAP. Say we do a happy example: "[HAPPY] This is happy!" -> BERT embedding. Then happy audio -> EnCodec embedding, training the coarse LLM on those pairs. Then, after that's trained, coarse LLM to original EnCodec embedding pair, and finetune the fine LLM?
we're gonna have a civit.ai for this voice thing before we know it!
i have access to 8x A100 80GBs that I could train it on for a day? $240 doesn't seem too bad? I just need to know the hyperparameters of the config you guys used in nanoGPT
Interesting, let me try to summarize the gist of things:
You wanna finetune the text_to_semantic model to be able to handle prepended emotion tags. And lora is a neat way to do that efficiently. Is that about right? (Coarse and fine should just happily fill out whatever the semantic tokens dictate)
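For readers unfamiliar with LoRA: instead of updating a frozen weight matrix `W` directly, you learn a low-rank update `B @ A` and use `W + (alpha / rank) * B @ A`. A minimal NumPy sketch of one adapted linear layer; the shapes, rank, `alpha`, and init scale are arbitrary choices, and applying this to Bark's text-to-semantic model would mean wrapping its projection layers, which isn't shown.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer y = x W^T plus a trainable low-rank update."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                  # frozen pretrained W
        self.A = rng.normal(0.0, 0.01, (rank, in_features))   # trainable, small random init
        self.B = np.zeros((out_features, rank))               # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        # Low-rank path: never materializes the full out x in update matrix
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into W for inference: W + scale * B A."""
        return self.weight + self.scale * self.B @ self.A
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the base model, and only `A` and `B` are trained, which is what makes it cheap.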
let me clarify my understanding: generate_text_semantic is... text->bert->valle?(no LLM) | generate_coarse is valle->weakly aligned encodec tokens?(GPT2-like) | generate_fine is weakly aligned encodec tokens -> strongly aligned encodec tokens.(GPT2-like) | then you decode to create the wav
so we also need to finetune valle or continue training it with emotion data as well as the two LLMs (coarse and fine)
god i hate discord's formatting
all of which yes, to train BARK to understand custom tags like [happy], or whatever you want to train bark to do
if I'm right - I'm more than happy to train it! - as long as ur not about to replace the LLMs with a bigger parameter model lol
i think you are overthinking this a bit
we don't use bert at all, just their tokenizer. also don't use vall-e and also no LLM (other than the underlying network structure being similar)
the first model is trained to convert text to semantic tokens, which are a high level representation of audio. the rest of the models then just convert that into a fancy waveform. so what you are suggesting is to adapt the first of those models: text->semantic, to be able to handle emotion tags explicitly.
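The same stages are visible in Bark's lower-level functions (the "raw generate functions" mentioned at the top of this channel). A sketch assuming the upstream `bark.generation` module exposes `generate_text_semantic`, `generate_coarse`, `generate_fine`, and `codec_decode`, as it did when this thread was written; check the names against your installed version. `text_to_waveform` is a hypothetical wrapper name.

```python
def text_to_waveform(text, history_prompt=None, text_temp=0.7, waveform_temp=0.7):
    """Run Bark's stages one at a time instead of calling generate_audio()."""
    # Imported lazily so this module can be imported without Bark installed
    from bark.generation import (
        codec_decode,
        generate_coarse,
        generate_fine,
        generate_text_semantic,
    )

    # 1) text -> semantic tokens (the high-level representation of the audio)
    semantic_tokens = generate_text_semantic(text, history_prompt=history_prompt, temp=text_temp)
    # 2) semantic tokens -> the first few EnCodec codebooks ("coarse")
    coarse_tokens = generate_coarse(semantic_tokens, history_prompt=history_prompt, temp=waveform_temp)
    # 3) coarse tokens -> the full set of EnCodec codebooks ("fine")
    fine_tokens = generate_fine(coarse_tokens, history_prompt=history_prompt, temp=0.5)
    # 4) decode the EnCodec tokens into a waveform (numpy array at bark.SAMPLE_RATE)
    return codec_decode(fine_tokens)
```

Splitting it up this way lets you inspect or cache the semantic tokens, which is where the text-to-semantic finetuning discussed here would have its effect.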
and lora is technically unrelated, but could be a useful approach if it works (not much experience here)
no weak alignment or anything needed
Ah my bad, I was just looking at the references for the repo and making some assumptions. Hmm, I'm curious then..
and yes, but in order to train it i'd need to know what that model was trained on and what it is.. which is very cheeky to ask lol
maybe its T5 like they used in valle?
I'm spitballing to see if i guess correctly lol
vall-e doesn't use t5, they use phonemes
sure some training info is internal for now, but in general we gave you the entire code so you can just look 🙂
it's just a trainable embedding. text goes to tokens goes into model
look at the AudioLM paper, that has a lot of good info
ahh i see 😉 , yeah it was cheeky to ask anyway!
so maybe ill leave a suggestion, please spend the budget to build something like https://civitai.com/ but for voices and get some sick finetuning script/ weight merging for the bark models
this is a very cool company you're building @summer wraith
haha thanks, we'll try, thanks for all the support!
no problem - thanks for open sourcing, it helps bring people up more than you may realise 🙂
Yes, I did it. There are common features in the accent of the people in many Latin American countries. For example, the pronunciation of the 'c' and 'z' sounds as 's' instead of 'th'. Thanks for the advice. I will try to clone some voice.
Thanks but that sorter only works with Latin American accents.
yeah, what you need to do is convert the text of the audio using the text->semantic model, then encode the audio using encodec. Then fine tune the semantic->coarse model on semantics and encodec pairs
its a nanogpt model so theres training scripts online - you should be able to train it on a consumer GPU 🙂
maybe take this to #📚┃suno-school 🙂
@summer wraith could you accept a torch.Generator as input for seed value?
setting torch.manual_seed() applies to the whole CPU/CUDA thing within the same process space and i run multiple threads with different pipelines on the same system so i'd need to be able to pass this in upon generation
what would that do? i still don't really understand the use of seeds to be honest
like if you want voice consistency it will only help for the exact same text prompt
if you have two different text prompts the generated audio should be completely different even with same seed and same history prompts
yep, for the exact same prompt
force set seed to maintain consistency between imports of torch/environment resets
if you set the temperature of the LLMs to something incredibly small you'll get consistent results - like 5.4e-079
consistently awful maybe
sure but only for the same text prompt, no?
a different text prompt effectively changes the seed
yes exactly
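On the `torch.Generator` request: the general pattern is to give each pipeline its own generator object instead of seeding global state. A minimal sketch of that pattern using NumPy's `Generator` as a stand-in (swapped in so the example has no PyTorch dependency; in PyTorch the analogous pieces are `torch.Generator().manual_seed(...)` and the `generator=` argument of `torch.multinomial`, and Bark's sampling would need code changes to accept one). As discussed above, this only makes the *same* prompt reproducible.

```python
import numpy as np


def sample_tokens(probs: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n token ids from a categorical distribution using only the caller's RNG."""
    return rng.choice(len(probs), size=n, p=probs)


probs = np.array([0.1, 0.2, 0.3, 0.4])

# Two pipelines (e.g. two threads) each own a generator; nothing touches global seeding
pipeline_a = np.random.default_rng(1234)
pipeline_b = np.random.default_rng(1234)
unrelated = np.random.default_rng(999)

unrelated.random(1000)                            # work on another generator...
tokens_a = sample_tokens(probs, 20, pipeline_a)
tokens_b = sample_tokens(probs, 20, pipeline_b)   # ...does not disturb these draws
```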
@summer wraith could i be cheeky - did you use w2v-bert for txt->semantic? i'm 100% going to try finetune the LLMs for custom tags etc
lol you already asked that
and besides, w2v-bert isn't public
it's a google specific model
if you have a decent amount of data i doubt it matters much what you use tbh when finetuning
yeah fair enough -> worst comes to worst, ill try to train all 3 from scratch
did you guys see this paper https://arxiv.org/abs/2305.02301
new approach to transfer learning- big model trains the smaller model, smaller model then outperforms the big model? magic
would be a very good way to speed up bark especially with all the existing optimisations for t5 online
text_to_semantic process:
input: wte(encoded_text) + wte(semantic_tokens),
maybe using a pretrained embedding would be better?
LIKE THIS:
input: BERT_EMBEDDING(encoded_text) + wte(semantic_tokens)?
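To make the formula `wte(encoded_text) + wte(semantic_tokens)` concrete: as I read it, `wte` is a single trainable embedding table, the padded text tokens and the padded semantic-history tokens are each looked up in it, and the two results are added position by position before entering the transformer. A NumPy sketch with toy sizes (the real vocabulary, context length, and token offsets are different); swapping in `BERT_EMBEDDING` would replace only the first lookup, and that embedding would then need to live in a space compatible with the semantic one.

```python
import numpy as np

VOCAB = 50     # toy vocabulary shared by text and semantic ids
D_MODEL = 8    # toy embedding width

rng = np.random.default_rng(0)
wte = rng.normal(0.0, 0.02, (VOCAB, D_MODEL))  # trainable token-embedding table


def embed(ids: np.ndarray) -> np.ndarray:
    """Embedding lookup: each token id selects one row of wte."""
    return wte[ids]


text_ids = np.array([3, 14, 15, 9, 2, 0])       # tokenized text, padded to 6
semantic_ids = np.array([40, 41, 7, 0, 0, 0])   # semantic history, padded to 6

# input = wte(encoded_text) + wte(semantic_tokens): element-wise sum, same length
merged = embed(text_ids) + embed(semantic_ids)
```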
Not necessarily, the embeddings of both text and audio need to be similar in the embedding space
but im unsure why bark is using an LLM trained on the output of the encoder and not the original txt+wav encoder?
a DECtalk-like voice to emulate the TTS hardware from 1993
Is there a way to add a slider for how stable or unstable the voice should be (in terms of variety)?
that'd be for @neat sail and not the bark devs i think?
Anyone can help me clone my own voice to be used in Bark?
Things to try: turn the temperatures down from 0.7; set top_k to something like 50 (though, confusingly, a higher top_k sometimes seems less diverse, not sure why); top_p 0.95 and lower.
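To see what those three knobs do to a single next-token distribution, here is a NumPy sketch of standard temperature scaling plus top-k and top-p (nucleus) filtering. It is a generic illustration, not Bark's exact code. Note that in upstream Bark, as far as I know, `generate_audio` only exposes `text_temp` and `waveform_temp`, while `top_k`/`top_p` are arguments of the lower-level `generate_text_semantic`/`generate_coarse` functions.

```python
from typing import Optional

import numpy as np


def filtered_probs(
    logits: np.ndarray,
    temperature: float = 0.7,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> np.ndarray:
    """Turn raw logits into the distribution that is actually sampled from."""
    scaled = logits / temperature                   # < 1.0 sharpens, > 1.0 flattens
    if top_k is not None:
        kth_largest = np.sort(scaled)[-top_k]
        scaled = np.where(scaled < kth_largest, -np.inf, scaled)  # keep the k best
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    if top_p is not None:
        order = np.argsort(probs)[::-1]             # most likely first
        cumulative = np.cumsum(probs[order])
        # Keep the smallest prefix whose mass reaches top_p (always at least one token)
        cutoff = np.searchsorted(cumulative, top_p) + 1
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[:cutoff]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()
    return probs
```

With `top_k=1`, or a temperature close to zero, sampling becomes effectively greedy, which is why very low temperatures give repeatable (though, per the thread, not necessarily good) output.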
Okay thank you
Actually, the most important thing, which I forgot: YOUR TEXT MATTERS. The style and format of your text. Try writing a prompt just to find the voice you want. Then use the voice for your ACTUAL text later.
i'm looking into it, it looks like existing voice cloners for bark aren't that great. so i'll do some digging and see if i can find a way to make it better. maybe
Please add Arabic, Chinese, and Indian accents
Hi @summer wraith, I want to add Bengali to this model. As far as I understand, it should probably work by just finetuning the semantic transformer (it seems your semantic transformer is something like an overpowered G2P compared to traditional TTS). I wanted to know: what are the targets for the semantic transformer? Is that publicly available, or is that information internal?
unfortunately, for now the model is internal to prevent public voice cloning. but you can probably still finetune your own off of the existing checkpoint
please also train on swiss german! its a very varied version of German with many dialects. you can use the "SwissDial" Dataset
#🐶┃bark-technical (happy) Your mom's having a big clearance sale
did you make any progress on this? Ive just started looking into it
yes, it's complete, and i'll be releasing it this week
Awesome! Im excited to see it
Do you have a github or something I can follow to see the release?
fully released, also released my webui so people can use this early on, as it's not been implemented anywhere else yet.
model: https://huggingface.co/GitMylo/bark-voice-cloning
model training and running: https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer/
webui with this implemented: https://github.com/gitmylo/audio-webui
👏 👏 👏 👏
Do you have any examples with a celebrity voice? Very cool if this works!
Yeah, i can give some examples in a bit
I got hung up on some 'just get things working' stuff, but will do good tests tomorrow.
@honest portal Hi, thank you for your effort, the voice cloning on bark is very promising, I tried to install audio-webui, at the end of the installation I get an error message from the gpu "out of memory".
I have an old 1070ti 8GB.
What is the minimum gpu memory you recommend?
it's fine on 3 gigs if you enable cpu offloading
Also I recommend (if it's an option) to use the small coarse model. It's the biggest and slowest part, and you don't see a huge difference in results, but a 2x speedup/slowdown
right, semantic is the most precise part, so that one is definitely recommended to be the full model
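A sketch of the low-VRAM setup described above, assuming an upstream Bark version that reads the `SUNO_OFFLOAD_CPU` environment variable at import time and whose `preload_models()` accepts per-model `*_use_small` flags (both were in the repo when this was written, as far as I know; verify against your copy). It keeps the full text-to-semantic model, as recommended, and uses the small coarse model. `MODEL_OPTIONS` and `load_low_vram_models` are hypothetical names.

```python
import os

# Full semantic model (the most precise part), small coarse model
# (the biggest and slowest stage), full fine model
MODEL_OPTIONS = {
    "text_use_small": False,
    "coarse_use_small": True,
    "fine_use_small": False,
}


def load_low_vram_models(offload_cpu: bool = True) -> None:
    """Configure and preload Bark for GPUs with only a few GB of VRAM."""
    if offload_cpu:
        # Only takes effect if bark has not been imported yet in this process
        os.environ["SUNO_OFFLOAD_CPU"] = "True"
    from bark import preload_models  # imported after the env var is set

    preload_models(**MODEL_OPTIONS)
```

Call `load_low_vram_models()` once, before any `generate_audio` call.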
Please connect to a voice channel first!
a channel for resources like documentation would be awesome
Text prompt to unique voice
This would be an interesting feature, and I wouldn't stop using it if it was there
Honestly, that sounds like a really cool feature, but i don't know if there's many datasets describing voices.
i could see if i can create a pseudodeterministic variational autoencoder which specifically creates a speaker based on the same set of semantics. Train it on a lot of speakers, and it will hopefully learn to detect features from those voices, then you might be able to craft your own voice for a speaker file by playing with the latent space coordinates. It would likely be very quick to give a preview, and should work very well as a speaker file.
call that "Craft your custom voice" or something. Good idea
Wanted to post the same yesterday. I personally am not interested in voice cloning, plus it's pretty complicated when it comes to legal stuff. I think the best of all features is a purely synthetic voice.
It can either work with adjustable parameters (age, gender, pitch, speed, accent, etc.), but of course the absolute best would be text-prompting a voice. I mean, the absolute absolute best would be text-prompting a voice in the text prompt, as tags, and building dialogs from that. But it's a stretch and not how Bark works ATM.
But having a way to create a speaker from a prompt would already be quite impressive and useful.
Bark basically is 'text prompt to unique voice' right now. But the effect is subtle and you don't really notice until you use a large sample size. For example write a prompt that sounds like a sermon, "Christ, something something" etc. If you run that 100 times, you will have a higher percentage of voices that sound like preachers, or from a talk radio religious station, than a different sentence. You can imagine making the effect more extreme or even running it tons of times and trying to figure out the voice types it is producing.
I used to have this script I ran over and over. In one single part, the first word was Christ
And with random voices, I was WTF literally I keep getting preachers, and always in this same segment here?
(if I used a random voice for each segment)
Yes, indeed, interesting.
But the idea here would be to meta-prompt the voice type rather than have it inferred from the content. I mean, they did not train the model with tag descriptions in the text, but one day we'll be able to write:
(Old man saying proudly and angrily) I am still your father
However I could see us getting a npz from "an old man with a tendency for humming"
It's not reliable enough to be a technique, but I do think that even with a silly prompt like "I'm so old and tired and angry. Whatever whatever." you get more voices like that than pure chance. I tried a bunch with something like
Narrator: I'm here to introduce Bob. Bob is old and angry.
Bob: I'm Bob. What do you want?
It even kind of works a little. But in practice, using well-written sentences that just sound like stuff an old angry man would say is more effective. So when I'm actually trying to do text-to-voice, I look for books with good writing and dialog from a character of that type. Old angry man dialog, where I can hear the voice in my head when I read the lines in the book. Just good writing. Then I extract all the dialog lines. Then I run a bunch through Bark. I try to find sets with especially high rates of 'old and angry'. In general, using stuff like narrator: or (old man) doesn't work well in Bark; you just get a lot of total dud samples. This is all pre-cloning though, probably way too much work now.
working on an autoencoder for creating voices by dragging some sliders
i don't know yet which slider is going to do what, but that's the fun in it right?
idk to me it more sounds like bark could be used to create a dataset for a voice description -> voice sample model
then voice sample -- (voice clone) --> bark voice tts
right, but they do need to be tagged, maybe once i finish the autoencoder you should be able to perform text to speaker
that is exciting
text to latent -> latent decoder -> speaker voice
it would depend a lot on the capabilities of the autoencoder, but it would be lightweight and take less than a second to create a new voice, on CPU, probably
Can you explain high level how this works?
numbers go in, voice comes out
So the autoencoder will encode random vectors (in place of audio tokens) into the latent space? I mean not random, but slider-based, @honest portal?
Or referring to this #🐶┃bark-technical message is that impacting weights ? Encoding a real piece of audio ?
right, that's the goal. you can do either random or set the values manually
we could add a meme/funny audio channel
I wanted to add something to the suggestion list. It is nice that Bark is working on adding more languages, but it would also be a nice feature to have English with different accents, e.g. a speaker with Arabic-accented English or a speaker with Chinese-accented English.
would be nice if I could add the bark bot to my discord server.
you can technically add it with https://discord.com/oauth2/authorize?client_id=1106670547598856222&permissions=8&scope=bot applications.commands
although it probably won't run commands, as i think it's limited to bot-beta
yeah only the application owner can use it, it says
ah
Given Bark has similar style architecture to AudioLM, any plans to include MusicLM going forward?
/bark prompt:Hello, my name is Suno. And - uh - and I like pizza. [laughs] But I also have other interests such as playing tic-tac-toe. voice:suno/Dictating Dan
Is there a roadmap for when we might get to use this commercially?
@prisma heath afaik it's MIT licensed so you already can (but IANAL)
Yes it's MIT Licensed
I would suggest, if people are open to it, asking people here whether they can contribute their voices to datasets. I strongly appreciate Suno Studios' open-source approach, and it is my firm belief that this will streamline the process; I hope this is available to everyone. #🐶┃bark-beta message What do you think of "contribute your voice/emotions/sounds" threads? Each thread could be used for datasets. @neat sail I personally think it would be a very nice idea and would help Bark reach the vision described by the user.
Will a web app be released soon, and will more voices be added?
We are investigating web - we will keep adding voices to Discord / Bark library too in the meantime
Thank you, and I am happy to hear from you guys. If you give me/users permission, can we provide suggestions for what kind of voices we are after? @muted flame
hey man
This is amazing and just as much fun (if not more) as creating images with MJ. What would be great is some guidance for usage. I would love to use them.
Thanks! For the near future, the /sing alpha will be free but for non-commercial usage (but will update if that changes)
The rest of Bark / the bark-beta bot are OK for commercial usage
Hello, I love using bark, and I have a suggestion, is there a way you can add voice cloning in the near future?
best thing i've ever used! can i add it to my server in the future?
Seconding cloning a voice.
Hello, I like the singing voice in Bark, but I'd like it more if you added genre control
better Downloaded file structure ! 😄
Themed bot-sing-alpha creations, not particularly changing the model but what we're trying to achieve rolling the dice
With Bark, can we create sounds like thunder, wind, rain, car honks, traffic, etc.?
You can test that out in the bot channels
/sing
@muted flame, I love bark and I have 2 suggestions, Can you please add voice cloning in the discord server, it would make bark even more fun. Also, can you make it so instead of just doing MAN: and WOMAN:, you could make it a full prompt such as MIDDLE AGED MAN: or CURIOUS WOMAN:. With these features, bark could be even more amazing than it already is!
Thanks for the suggestions, we are thinking about these
/sing
Good suggestion
The rain gently knocks on the window,
The tick-tock sound is melodious and melodious.
Listen to the sound of rain and feel at ease,
The flowers fall on the ground and rhyme for a long time.
I am a specialist in aesthetic dermatology; in fact, I recently had a report on Globo about my work.
Syllable rhyme enunciator 🎚️
ORANGE 🍊 FOUR-INCH, DOOR HINGE, IN STORAGE, and ate PORRIDGE with GEORGE
For the uninitiated, the hardest 60 Minutes interview there ever was:
According to the Oxford English Dictionary, the only word that perfectly rhymes with “orange” is “sporange,” an uncommon botanical term for a part of a fern.
Read more on Genius: https://genius.com/a/eminem-proves-there-are-plenty-of-words-that-rhyme-with-orange
hey @fading elk, go to the #🐣┃chirp-beta-1 channel, type /sing, press Enter and paste in these lyrics
haha right 😄
make songs longer
/sing
BRO STOP GO TO #🐣┃chirp-beta-1
/sing
/sing
/bark Tell a joke
keep the bot free.
I'd like to host Bark servers using my GPUs for external inference. As suggestions, I'd like to see the ability to host the models across multiple GPUs in a swarm network, similar to what's being done with Petals for Llama 2 models. Another suggestion would be a wrapper API application that runs Bark. A final suggestion would be to open up voice fine-tuning capabilities for the model. I understand the risks of voice cloning, deepfakes, etc.; make it available to those who provide a valid ID to clone their own voice, so that we can productize personal assistants with the owner's voice.
Any chance you guys could add a regenerate button when an error occurs?
yeah, we should be able to. Also I think we found the source of the errors from earlier, so should happen a bit less frequently with continuations now
/bark this is a test.
There should be a way to see in what place we are in the queue, for both bark and chirp (if there's even a queue of course)
ya, the queues are pretty variable (we're working on it)
chirp killing me slowly with this song
/chirp killing me slowly with this song
/bark I love cornflakes
/chirp
@steady bay head to #🐣┃chirp-beta-1 or #🐶┃bark-beta. If you want to always see the channels, tick them in id:browse
Uploading a vocal and generating the music around it would be good.
I'm sure it's been requested before, but it would be great to have chirp produce separate tracks for the vocals and accompaniment so they can be further manipulated. Yikes, that sounds sinister. I mean so we can get even more creative with the results.
The ability to prompt the genre, select instruments, style, mood, emotions, and anything else that could guide the desired output.
/bark
Is there a way to suggest a style of music or a specific artist without it incorporating those into the lyrics?
I asked ChatGPT and got an answer that seems to work fairly well.
"Compose a song in the style of this music. The song is about song subject."
/sing
/sing
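The International Buddhist Film Festival (IBFF) uses film as a medium to help broad audiences appreciate Buddhism more widely and understand it better, especially the evident ethnic and cultural diversity within the Buddhist world today. The works screened are selected through invitations from the program committee and through an international open call for submissions.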
/sing
/sing Compose a 10-second jingle for a pizza ad. It should be family-oriented and really happy.
/bark
Imagine having a text-prompt voice system where we could develop a voice from text prompts, or a mixer-type system that lets us craft unique custom voices. Wouldn't that be cool, guys? I do know this is easier said than done, though.
Oh yeah, it's so much fun in Bark.
I would agree and actually look forward to it #🐶┃bark-beta message
/break Tell a joke
134
Hey @modest zealot, go to the #🐶┃bark-beta channel to use the /bark command and to the #🐣┃chirp-beta-1 channel to use the /chirp command
/sing
https://youtu.be/UJ5zcKGrjis GUYS SUNO IN VIDEO!
Suno is a company that builds foundational Audio AI. Their AI models enable developers and creatives to generate hyper-realistic speech, music and sound effects. Today we take a look at their recent CHIRP AI that generates music for free at a high fidelity!
Probably already suggested, but either way, a "music to music" feature would be cool: upload part of a music file and it takes inspiration from it, or give a prompt to direct the style of music.
meta made that
I thought that was closed or something?
idk
Also sorry, kind of brain-dead. I meant that it would intertwine with my suggestion: you upload a short music file, give it lyrics, and it combines them.
/sing
Is there any guide/tutorial/blog to explain "how to use parameters", like putting 123BPM inside curly brackets in a Chirp prompt, style in double quotes, etc., or something like that?
/chirp
frances and jubby love the rookery and tommy builds the land
/chirp "frances and jubby love the rookery and tommy builds the land"
/chirp
frances and jubby love the rookery and tommy builds the land
/bark
those don't work right now but are on the roadmap - best way to steer generations is through picking the right lyrics
/chirp
You have to make it post a mention in a new message once it's complete.
My suggestion is to be able to continue other audio samples, not just the music it generated itself; you could make some pretty cool stuff.
Why do you allow 4000 characters if the bot itself will not run it?
don't flag stuff
is there currently no way to control the type of music?
Hey @ivory hamlet, not at this time, but we're actively working on controllability/steerability. If you put in the same prompt multiple times, you're likely to get multiple different genres/vocalists/instrumentation. You can also try "nudging" the model toward certain genres by adding genre-specific words to your prompt.
E.g., DJ Neminem on the beat is more likely to elicit EDM
Hi, is there any voice for Spanish?
Yeah, I found if I just say the genre with a colon right beforehand, it usually gets it
we should probably fix this 🙂
/bark
Head over to the #🐣┃chirp-beta-1 channel to use /chirp and to the #🐶┃bark-beta channel to use /bark
/bark
people are idiots
Good example of it, skipping the words, but using them as a genre suggestion
Nice. It's pretty neat that that works - still plan to enable you to do so more explicitly similar to permutations in Midjourney where you can do the same song as {disco, country, folktronica, pirate metal}
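Until the bot supports permutations, the same effect can be approximated by hand: expand a prompt like `{disco, country}` into one prompt per option and submit each one. The sketch below is hypothetical. It borrows Midjourney's `{a, b}` convention, handles only non-nested braces, and `expand_permutations` is an illustrative name, not a Suno feature.

```python
import itertools
import re

# One {option, option, ...} group; nested braces are not supported.
_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_permutations(prompt: str) -> list[str]:
    """Expand every {a, b, c} group into separate prompts (cartesian product)."""
    # re.split with a capturing group alternates: literal, group, literal, ...
    parts = _GROUP.split(prompt)
    literals = parts[0::2]
    groups = [[opt.strip() for opt in g.split(",")] for g in parts[1::2]]
    results = []
    # With no groups, product() yields one empty tuple, so the prompt is returned as-is.
    for combo in itertools.product(*groups):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        results.append("".join(pieces))
    return results


for p in expand_permutations("[{disco, country, folktronica, pirate metal}] my lyrics"):
    print(p)
```

Each resulting prompt (e.g. `[disco] my lyrics`) would be submitted separately. Several groups in one prompt multiply, so two groups of 2 and 3 options give 6 prompts.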
/chirp
/bark
Now hear the song of the Zorusson
One of the chosen. A man with a mission.
Elected spokesperson of the Omega Brethren:
He compiled the teachings and he learned all the lessons.(x2)
He dove through his psyche, untangled his emotions
He named, fought and defeated, every one of his demons
He mastered the tricks: the Omegapsychotronics®
He ordered his spirit, like a cubic from Rubik(x2)
He ventured, friendless, through the Palace of Madness
Came out with proper papers, rubber stamped, all in order.
He molded the golems, infused ’em with rhythm.
He released ’em from his dreamin’ to put an end to the sleeping(x2)
Now pick up the signal in this morality fable
and join the capable, at the Omega Table
It’s always the moment to rise ‘bove the torment
Put on your garment, overcome your impairments(x2)
/chirp
Such great progress, Suno team! And thanks for releasing this for the public to preview!
I do have a suggestion, though, about the /chirp feature. Instead of parsing the text in the form for stylistic cues, why not let us invoke /chirp with extra tags? For example, I'd type "/chirp --style classical --mood dark" and then the dialog for text would appear.
definitely want to - on the roadmap (some tech to build for that 😄 )
/chirp
@viscid imp, @pulsar frost, @polar owl and @little flame, head over to #🐣┃chirp-beta-1, type /chirp there and press Enter like this:
Been there having fun....accidentally jumped in here. Thanks though
As long as it's something a little bit more fluid, rather than having to fit it into premade styles.
I would love it to be a little bit more like I can describe how I want it to sound not just say the genre.
a good example is "1950s Christmas special" to get something like you would hear out of like Rudolph or something.
yeah, love it, we definitely want to get there!
Could you break the generates into multiple threads? because it is too hard to follow it is so active.
Yeah, we will probably add some more channels soon
is there a way to control the music genre in the chirp model ?
Not control but influence: #💬┃general-chat message
Rap "shake that ass tonight" by Eminem and Nate Dogg
I am not too sure how feasible this is, or the technical intricacies involved, but wouldn't it be nice if there were a way to custom-design your own voices and accents?
Accents at least are straightforward. Usually you just have to use a voice from the wrong language to produce an accent of that language. It's not 100%, but it works often enough that you can usually find what you want.
Would it be possible to allow people to upload songs and then extend the song using the extend feature, or possibly use it as like context for what sort of style someone might want?
This is the best thing I have ever seen. Thank you! Like Midjourney, but for music. Love it. Would be epic to get a bit longer bits, but I understand it is complicated, the music tends to shift.
/sing
/chirp midnight blues
Is it possible to let users make their own server? My prompts are drowned in the sea of other people's prompts. With Midjourney it is possible to do that, but I cannot find that feature here. Can you help me with this, if possible?
Ah, yes another thing. A style reference point would be great. Now you have to wait and hope for the best. But it would be great if we can enter something like e.g: --style classical bigband music.
Yeah, that's on the roadmap. I think we'll add a few more channels for now to help
we are working on this too!
/chirp
Last question for now, Apparently there are a few jobs in the pipeline, but I can't find them. Is there a way to look them up? Or is this something for later concern?
you can search "mentions: YOUR_USERNAME" in the search bar
/chirp
Hey @celest storm and @ember egret, head over to one of the #🐣┃chirp-beta-1 channels and press Enter to get started like this:
A GAN to clean output to make it more realistic
/chirp Oh, have ye heard of Seamus the Leprechaun?
With his great ginger beard and hat of green
He whiles away the hours carvin' wee sculptures
Of creatures mythical, imaginary and seen
That's how it should be sold: 30 seconds free, then pay for more. Brilliant!
instead of sending the results into the channel have it create a private thread in which you then can continue the song and at the end publicize the final result to the channel (will reduce clutter)
Write a birthday song
/chirp
IF the model is deterministic, could there be an option to use the same seed but a different text? That'd be cool when you want to see what exactly a small change in prompt would cause. Eg placing an oxford comma or british vs US spelling.
Suggestion: A channel in this server where people can showcase tidbits or quirks about how chirp works
/chirp
/chirp
/chirp
Excellent initiative. It might be interesting to implement other functions such as choosing the bpm range, or the style of the song (urban, pop etc).
The bot seems to have stopped mentioning you when you use the command, so now it's even harder to find your generations. It really does need to just re-post with a mention when complete.
Why can't I add the Suno bot to my personal server so that I don't lose track of my generated sounds?
Has anyone got it working on huggingface?
change thumbnail/cover creation to SDXL 1.0
i pay for it
also add a pantheon and a better voting system to get the best outputs displayed daily
can i invite suno bot in my server
Is it possible to get Norwegian added?
@last sage @vestal hamlet we are working on allowing bot on other servers - stay tuned
Suno could ask for or accept parameters like gender, BPM, language, gender of the singer, and a repeat function.
could anyone tell me how I put a song in for reference?
make a cappella generation mode
Definitely working on a bunch of these for next version
At the moment we don’t allow conditioning on other songs, sorry. The rights are a bit iffy for that.
Please find more equal tier duets like You are the reason by calum scott and Leona Lewis and train those more
It would be nice to have an option to generate beats like this in an easier way
Do you think it's possible to add Norwegian sometime in the future?
should be possible, working on it!
thats awesome, thank you 😁
can you send me two verses?
you can just post into channel here
and Danish please, @summer wraith 😉
same, paste like 2 verses of danish here plz
whatever u wanna generate
yeah ok, I tried with the first verse from this earlier on actually (it's the Danish national anthem):
Der er et yndigt land,
det står med brede bøge
nær salten østerstrand
nær salten østerstrand.
Det bugter sig i bakke, dal,
det hedder gamle Danmark,
og det er Frejas sal,
og det er Frejas sal.
Der sad i fordums tid
de harniskklædte kæmper,
udhvilede fra strid
udhvilede fra strid.
Så drog de frem til fjenders mén,
nu hvile deres bene
bag højens bautasten,
bag højens bautasten.
it worked in the way that it kind of sounded like Danish, but just with a very heavy English accent
besides quality, how is the pronunciation in these?
25 sounds more like Swedish, 26 more like Norwegian. 27 sounds closest to Danish
27 sounds a bit like the one I generated earlier, although with a more subtle English accent I would say 😄
hehe yw 🙂
haha ok
It would be nice if there were a way to just generate music without lyrics
@summer wraith If you're doing multilingual support, my friend really wants to make sure it can do Esperanto
Please keep this free 🥺🙏
we'll try
Would love to see - mixing and sampling prompts added in or some form of premixing module loaded into the model.
Add Russian
Add support for setting the song style (R&B, rap, house, pop, rock, trap, dubstep, and so on)
Please make a Flemish or Dutch (nl) version too
Finnish language
add Polish pleaseeeee
10 more Chirp channels.
Searching for mentions doesn't always work, so after one minute, I have to scroll through 69 other generations to find the one I submitted.
You can search for your own messages instead from: lolisa_69
Is there a way to match the end of the last continuation to the beginning of the first sample?
I wish to make it into a loop song
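Chirp cannot do this itself, but a post-processing workaround is a loop crossfade: blend the tail of the clip into its head so that playback on repeat has no seam. The sketch below is a minimal version under stated assumptions. It expects all the clips to be already joined and decoded to a list of float samples at one sample rate. `make_loop` and `fade_len` are hypothetical names, and it does nothing to line up beats or bars.

```python
def make_loop(samples: list[float], fade_len: int) -> list[float]:
    """Return a loopable clip by crossfading the last `fade_len` samples into the first.

    The result is `fade_len` samples shorter than the input.
    """
    if not 0 < fade_len <= len(samples) // 2:
        raise ValueError("fade_len must be positive and at most half the clip")
    head = samples[:fade_len]
    tail = samples[-fade_len:]
    body = samples[fade_len:-fade_len]
    # Linear crossfade: the tail fades out while the head fades in.
    blended = [
        t * (1 - i / fade_len) + h * (i / fade_len)
        for i, (t, h) in enumerate(zip(tail, head))
    ]
    # The blend starts where the body ends (tail follows body in the original)
    # and finishes close to the head, which is followed by body[0] on repeat.
    return body + blended


clip = [float(i) for i in range(10)]
print(make_loop(clip, 2))
```

In practice `fade_len` would cover a fraction of a second (e.g. 0.5 s × sample rate). A longer fade hides more of the seam but also blurs more of the music.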
Chirping without the need to make a picture
follow the use of hints MAN/WOMAN: for bias towards the speaker, as described on the HF space
That would be a cool feature, but not now. I think it might be pretty challenging to do it really well actually.
generate with seed id support please?
Hmm technically you could just frame it as a slow filling task I suppose.. right now we wanna get generation and control to a good point but next we will definitely also work on some editing tasks
Hello, 2 questions/suggestions:
- When could we expect French language?
- When could we expect option to add this bot to our own server ?
yeah I figure it would be pretty challenging, but would be really cool to have a continuation option but reverse
definitely like the idea
This is really great. With some continued improvements, I really think you've got a winner here. I just need to be able to control genre (rock, dance, pop, etc) and vocals (female, male) first and foremost. But then also ideally length, tempo (slow, normal, fast), and mood (happy, sad, sexy, etc).
Would be pretty cool if there were a way to choose the chord progression
I hooked Bark AI up to Llama and also built a Bark AI demo for the concepts, with threading: https://github.com/graylan0/ModeZion/blob/main/samples/bark.using.re.1to1.demo.py https://github.com/graylan0/ModeZion/raw/main/samples/573cdeb9-85fc-4377-b0bb-faedf840e2a1.wav (the model says "our llama2+bark"). The Llama + Bark AI intercom is very cool: Bark AI summary code, with no summary from Llama. Very smart, very good-sounding model. Super promising; all my time = Bark AI ❤️ woof. Here are the details for the longer Python output read; you can really tell the model is acting to summarize, and that is incredible: https://github.com/graylan0/ModeZion/blob/main/samples/barkai.llama2.intercom.66a7583e-1cca-41b4-9cca-7a2f9b2c7839.wav.log
Maybe an API setup would be nice, and a solid pip package that uses a different name, like barktts or barkapi.
A graphical user interface with the latest LLM and image-model AI technology. Includes Llama 2, Stable Diffusion, and Bark AI - graylan0/ModeZion
interesting, how would you prefer to input that? like literally text in the input? would you care about the durations of each chord? all of this is pretty much the same from a model perspective, but we wanna find a framing that's most generalizable: f0, chord, chroma, note conditioning, etc.
yeah we have to come up with a way to verify that people upload their own songs. kinda like the voice cloning issue
Neaatt!
aw tyty ya we are working on a bunch of those for the new version coming out v soon fingers crossed
Keep me up to date when that comes out! I'd be super excited to try out that feature
Long, long shot here. But it would be great to be able to upload a melody (guitar, piano, etc.), input the lyrics, and have the AI create the vocals based on the instrument melody.
Doesn't sound impossible. You could do something like the melody conditioning that MusicGen does.
Will take a look, thanks!
Oh, MusicGen doesn't do vocals. But I think it'd be pretty easy to add to the Chirp AI.
I have like 50 instrumentals done and no vocals ahhaha
Would love that
what level of melody conditioning would be most interesting to you? like, should it take the basic f0, the base chords, the exact timing of the chords, or literally keep the entire backing track and create lyrics on top of that?
what MusicGen does is let you use a custom chroma to generate new instrumental music with the same chroma
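For anyone unfamiliar with the term: chroma folds every pitch into one of 12 pitch classes (C, C#, ..., B) and discards the octave, so a chroma track records which notes sound over time but not in which register. The sketch below shows that mapping for a single frequency. It assumes standard A4 = 440 Hz equal temperament and is not MusicGen's code, which computes chroma over whole spectrograms.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Map a frequency to its nearest equal-tempered pitch class (octave discarded)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in semitones, rounded to the nearest note.
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    # A has index 9; Python's % keeps the result in 0..11 even for negative offsets.
    return PITCH_CLASSES[(semitones_from_a4 + 9) % 12]


# 110 Hz (A2) and 880 Hz (A5) land in the same chroma bin as A4.
print([pitch_class(f) for f in (110.0, 261.63, 329.63, 392.0, 880.0)])
```

A chroma feature sums the energy that falls into each of these 12 bins over short time frames. A C major chord, for example, lights up the C, E and G bins regardless of voicing.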
I personally would like instrumental -> vocal
but piano melody (or midi) to vocal and song would also be 🔥
so getting vocals that fit to a backing track input?
Yep
Yeah, that will be ideal to me.
Here is an example of the track:
Hey, check out my track: white dove
https://app.landr.com/projects/07356331-c54d-47f4-b7ff-f7032ce5fb9a?publicToken=cc7126ea-69e1-4a9e-b41c-726a12ea2466&sharing
sounds great, love it!
Is there a way to upload my recorded vocals so it can create a music accompaniment for them?
not yet but v interesting application
Did a bit of experimenting with Bark instrumental -> vocal. Not working, but a super cool, unique sound.
If possible, let us put the tempo, genre, and chords in one line.
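Chirp does not parse tempo or chord parameters today. The one-line format below (`120 bpm | rock | C:4 G:4 Am:2 F`) is purely hypothetical. It is a sketch of how such a line could be turned into timed chords, which is the "durations of each chord" question raised above. The names `parse_song_line` and `ChordEvent`, and the default of 4 beats per chord, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ChordEvent:
    chord: str
    start_s: float
    duration_s: float


def parse_song_line(line: str) -> tuple[float, str, list[ChordEvent]]:
    """Parse 'BPM bpm | genre | CHORD:beats ...' into tempo, genre and timed chords."""
    tempo_part, genre, chord_part = (part.strip() for part in line.split("|"))
    bpm = float(tempo_part.lower().removesuffix("bpm").strip())
    seconds_per_beat = 60.0 / bpm
    events = []
    t = 0.0
    for token in chord_part.split():
        name, _, beats = token.partition(":")
        # A chord without an explicit length is assumed to last one 4-beat bar.
        n_beats = float(beats) if beats else 4.0
        duration = n_beats * seconds_per_beat
        events.append(ChordEvent(name, t, duration))
        t += duration
    return bpm, genre, events


bpm, genre, chords = parse_song_line("120 bpm | rock | C:4 G:4 Am:2 F")
for c in chords:
    print(f"{c.chord:>3} at {c.start_s:.1f}s for {c.duration_s:.1f}s")
```

At 120 BPM one beat lasts 60 / 120 = 0.5 s, so a 4-beat chord lasts 2 s. The same structure could equally be fed as f0, chroma, or note conditioning, since every one of those needs explicit timings.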
With MusicGen, even though it uses a full audio file, it still works if that file is not an actual song, just a barebones song of bleeps and bloops, or someone humming a melody. So you could randomly generate a MIDI track or something even if the user doesn't want to prompt that specifically, and it can handle those cases somewhat too. I think it would be fun to literally image-edit the chroma, but I never got around to trying it; the rendered version is totally readable and something you could almost drag and drop to edit.
My guess is that only people who play instruments or compose music will list actual chords, but anyone can hum a melody or provide a reference song for the melody.
pdf file to audio file
is that something the team may be interested in doing over time?
Would be great to have the ability to select from a variety of musical styles, tempo, instrumentation, voices, etc. I am sure this is on the roadmap but would be interesting to understand what features can be expected in the future.
This is about Bark. So guys, it would be very helpful if there were an option to choose voices, for example a list of voices including voices from different regions.
It would be good if my stuff didn't get flagged
i want ambatakum
and i didnt get ambatakum.
Maybe a way to type in words AND use a prompt with it
Hey Ali, thanks for the suggestion. Check out the Bark Speaker Library (v2), which is a work in progress:
Wow, that's pretty cool. That's what I was talking about. BTW, is there currently a way to change the voice when we add a prompt?
Hm, let me know if I'm misunderstanding the question, but a dropdown should appear for the voice: parameter when you type /bark in the #🐶┃bark-beta channel. You can select a specific voice or choose random there like this:
I'm wondering, will we ever be able to input our own backing tracks for this?
just to see what riffs it could create
We need beats that go super hard
It would be nice if you could select type of song (as many others have already suggested), and if you could use something to tell chirp that this should be an intro, chorus, solo and outro etc (for example by using {brackets} like {chorus}). And when marking something as a chorus, it would be nice if chirp could look at previous snippets where chorus is used to make them sound similar.
yup already have the tech, testing internally and hopefully release soon
It would be nice to have the option of creating instrumentals for a specified genre with no lyrics or vocals.
It seems like a lot of good generations have just the final syllable cut off.
Maybe suggested, maybe discussed… but this is a big community. I find my lyrics don't fit in a 30s window. Any chance of getting that bumped up to 45s?
The text input box on Chirp says 4000 and counts down as you type, yet the maximum input is MUCH smaller.
yeah... input depends on a lot of things and needs to be tokenized :/
we'll try to make it more obvious but it's a tad tricky. also not super obvious what length works best
Kek, cool to see this become real 😄 https://cdn.discordapp.com/attachments/1128313332273782784/1138674724801609748/chirp.mp4
Ironically the one thing I kind of can get out of base Bark
Reporting a bug. It often happens that when I enter text (Italian, in my example) and assign a non-Italian voice, Bark generates an incorrect result. The text is not read, not even with a foreign pronunciation; instead, it generates audio with totally different words from the prompt.
In that clip, you are using the voice 'suno/Bubbly Bill'
Bark often makes mistakes and says the wrong words when you use a voice from one language to speak words in another language. To get a more consistent result, set the voice to Random and try some Italian text prompts, long ones in particular. Then click "Save Voice" button on a voice you like. If you then use that voice with Italian, it will usually be more consistent and less likely to get the words wrong.
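For people running the open-source Bark package themselves rather than the Discord bot, the same "random voice, then save it" workflow can be done in Python. A minimal sketch, assuming the `bark` package exposes `generate_audio(..., output_full=True)` and `save_as_prompt` in `bark.api`, and that `history_prompt` accepts a path to a saved `.npz` file. The helper names `make_voice`, `speak_with_voice` and `voice_file` are hypothetical. The Bark imports sit inside the functions so the file still loads without Bark installed.

```python
def voice_file(name: str) -> str:
    """Return a path ending in .npz, the extension Bark's saved prompts use."""
    return name if name.endswith(".npz") else f"{name}.npz"


def make_voice(sample_text: str, name: str) -> str:
    """Generate with a random voice (history_prompt=None) and save it for reuse."""
    from bark import preload_models
    from bark.api import generate_audio, save_as_prompt

    preload_models()
    # output_full=True also returns the generated tokens that define the voice.
    full_generation, _audio = generate_audio(
        sample_text, history_prompt=None, output_full=True
    )
    path = voice_file(name)
    save_as_prompt(path, full_generation)
    return path


def speak_with_voice(text: str, voice_path: str):
    """Reuse a saved voice so every clip uses the same speaker."""
    from bark import generate_audio, preload_models

    preload_models()
    return generate_audio(text, history_prompt=voice_path)
```

Following the advice above, one would call `make_voice` with a fairly long Italian sentence a few times, keep the file whose sample sounded good and ended naturally, then pass that path to `speak_with_voice` for further Italian text.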
Suggestion: is it possible to select a voice that automatically detects the language of the text, in order to generate audio in the appropriate language? Does selecting "Random" as the voice work like this?
Yes it does!
Not just the language, but the type of the voice.
Thank you @neat sail 😄
You may have to try 4 or 5 times to get a voice that sound good, and works well after you save it. If the audio clip is cut off mid sentence, for example, that voice may not work great, helps to have a natural ending. But shouldn't take too many tries to get a good Italian one.
Perfect! 😄 Thanks Jonathan
You know that you can continue from where it left off? That sounds great, by the way!
How do you fix a broken clip? It just cuts out partway through.
X-posting
#💬┃general-chat message
why does it often seem to bug out for most of the clip and then start singing at the very end? https://cdn.discordapp.com/attachments/1128313332273782784/1138828199749488670/chirp.mp4
Guys, why doesn't the real-time voice changer client work with AMD graphics either?
My apologies if this has already been suggested. It would be great to be able to download the individual stems for the generated track, as well as have the BPM info. As a musician, this would help tremendously. Just having the acapella would open a world of possibilities!
thanks ya right now we generate everything together so you would have to stem separate after the fact using demucs or similar, but we are looking into doing it separately. in the new version teased in announcements however you should already be able to do things like [female vocals] which will probably skew pretty heavily to acapella (although not guaranteed)
Looking forward to trying to generate some acapellas. I will see what kind of results I can get from a dedicated stem separator.
if you noticed we do that in the little videos. top and bottom sound animation are vocals/background (using demucs)
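For the "stem separate after the fact" route, Demucs can split a downloaded clip into a vocal stem and an accompaniment stem. A minimal sketch that builds and, if Demucs is installed, runs its command line with the `--two-stems=vocals` and `-o` options. The wrapper names are hypothetical, and the output layout described below is assumed to be Demucs's default.

```python
import shutil
import subprocess


def demucs_command(audio_path: str, out_dir: str = "separated") -> list[str]:
    """Build a Demucs CLI call that splits a track into vocals and no_vocals."""
    return ["demucs", "--two-stems=vocals", "-o", out_dir, audio_path]


def separate_vocals(audio_path: str, out_dir: str = "separated") -> bool:
    """Run Demucs if its executable is on PATH; return True when it exits cleanly."""
    if shutil.which("demucs") is None:
        print("demucs not found; install it with `pip install demucs`")
        return False
    result = subprocess.run(demucs_command(audio_path, out_dir), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    separate_vocals("song.mp3")
```

With the default model, the stems are expected under a folder like `separated/htdemucs/song/`, as `vocals.wav` (the acapella) and `no_vocals.wav`. Separation quality varies by track, so this is a workaround, not a substitute for native stem output.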
i already saw chirps in other languages. so i would say, it already can.
improve Spanish from Spain, please
"Hi, I apologize if this has already been suggested. It would be helpful if, when using the 'like' option (The one with the heart icon), we could access a full view of the sequence and opt to use the chorus multiple times. For instance, in this clip, I have a chorus, but trying to replicate it or find a similar one using the /chirp command can be challenging. Typically, songwriters retain the same chorus they've previously created. This tool functions similarly to a clip editor, but it's still difficult because there are distinct parts that we haven't been able to label yet."
I also apologize for my English; I used ChatGPT to fix the grammar as much as I could.
💡 /chirp in #Thread
Hi
Arabic is not among the supported languages. Not sure if it will be added in the future.
I am a director of a research lab in Arabic NLP, and we can help the Suno team make Bark support several dialects of Arabic. Let me know if this sounds possible.
much appreciated, our next release will be focused on chirp which should support arabic. would love to get some feedback on that once it comes out
That is great. Sure would be happy to give feedback 😊
Try generating a generic intro first, then add lyrics after. Realistically, not many songs start off with lyrics 🤣 think Free Bird.
how
Click the arrow under the generated song, then the arrow under that one, and then the one after that. When you are happy click the heart and it will combine all the clips like a replay of your musical "choose your own adventure" 😎
I mean, how do you do just the intro?
Okay, I am trying to finish my song to give you an example, but it just keeps giving me a song about gay pirates 🤣
.>
I think I clicked the arrow on the wrong video.
But I just put [post metal]
As long as you put it in brackets, it shouldn't say it. And if it does, just retry. Then the lyrics should start at the beginning of the next clip.
Especially when I use the continue function, it continues but almost always cuts off part of the lyrics in the end. Is there a way to avoid this?
Please update!
It sucks where it is currently
Bark can hardly do sound effects
Or we need a proper tutorial
What kind of sound effects are you trying to make? Do you mean speech controls like [sighs], or non-speech sound effects like [helicopter noise] or [jet engine]?
Well I have good and bad news. Bark is not great at farts and sneezes, but a brand new model that just came out called 'AudioGen' is basically the most advanced fart machine in human history
How do i use it in discord
It's not part of this Discord; it's a totally different project. It's pretty new, so I'm not sure how many easy ways there are to try it.
Also I haven't actually tested sneezes and stuff in Bark, maybe I should check, just in case they actually do work well..
Does this work on mobile? https://huggingface.co/spaces/victor/AudioGen
Is there a tutorial for it
For basic formatting
Template etc
I think for that model you just describe the sound effect in normal language. Like "A man whistling while a jet flies overhead"
I tested sneezing in Bark. So far [sneeze] is doing sneeze-like things, but no sneezing. Sometimes you get "ah Ah ah" or an intake of breath, just like somebody was *about* to sneeze... but no actual sneeze. Not a single actual sneeze. In the third sample she sounds like she *has* a cold, like her nose is stuffed. I think I'm getting a cold. [sneeze] Oh now I'm sneezing. [sneeze] I'm sorry.
[sneeze] with non-random Bark voices is almost always pure noise, it only works a little bit with random voices I think. Not a single full-on sneeze in 40 clips. Kind of surprised, there's probably some in there somewhere.
I did get a sneeze eventually. Though it's not as clear as that one. And also this near-miss sneeze clip, sneeze setup into a laugh, and a yell.
haha, ghost in the machine 👻
Where is my AI-generated horror game? I swear these things are just pulling audio and imagery from another dimension! Imagine this audio playing in a game like Dead Space. Throw in information from my Facebook, like posts and altered pictures described in the voices of my friends and family! 💀
I know, right? Bark is legit S tier horror AI model #🐣┃suno-showcase message #🐣┃suno-showcase message
I hope that the devs at Suno make K-pop, J-pop, Rap, Anger Rap, and Heavy Metal genres available to choose in an upcoming update, and allow us to mix genres as well. Also, let us choose whether a male or female sings. And one more thing: it would be great if we could stretch our 4 lines out to 30 seconds of song length, instead of the random length we are given.
That song made me feel so uncomfortable lol!
I can't wait for this kind of results..... ElevenLabs will have to run and hide :p
will we be able to use our own instrumentals at some point?
Training LoRAs on music and getting similar music with the same voice!
Fixed
Will a future Bark version adopt the flow-matching technique that Meta showed off in their Voicebox project? It could maybe improve performance.
not too familiar with CNFs. it's not autoregressive for better and worse, but worth to keep an eye on. for now there is not open code for it and diffusion prob still more popular
Suggestion: "Continued From" could also note which of the two clips it's from: "Continued From #1", "Continued From #2". Save a click each time from having to check.
Suggestion: Extra field that just adds extra context to the Stable Diffusion prompt, possibly optionally replacing the lyrics in the prompt. Tiny nice-to-have convenience feature.
you can check out VITS (https://github.com/jaywalnut310/vits) has an implementation of normalizing flows
Where can I find my saved voices?
If you go into the Bark channel and generate, they should be in the list of voices you can choose from (you can search the list, they might be not on first page)
I could find them here https://studio.suno.ai/library but do they show up anywhere else?
You can use them in the bot as a voice, in the list of choices.
And when I press [SAVE VOICE] I get this message: ❗️This interaction failed
You may have to start typing the first letter to find the right voice, if you made a billion of them.
Any suggestions as to why I get this message when trying to save a new voice? ❗️This interaction failed
That's probably a bug in the Bark bot, I don't think you're doing anything wrong.
it's a little bugged right now - we're working on that one 😦
Is there a guide for consistency in Chirp? I'm having issues at the moment.
2 verses is really good but it drops after that
beat is off or vocalist voice changes
yeah, if you're using continue, that can happen - we are still working on consistency in general
Ok, I certainly hope it gets fixed. I noticed that it always moves towards rap when I want another genre.
Should be able to keep the song in the genre and style you want, for longer songs, soon. Having it be perfect like a real song, where minutes 3-4 properly reference and riff on minutes 1-2, may be a more challenging problem, is my guess. Might take longer. But things are moving fast...
Well, I am sectioning it into 4 lines, since I see that more lines will turn it into rap because it has to talk faster. Does it count spaces?
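To apply the 4-lines-per-generation habit described above to a longer song, a small helper can pre-split the lyrics into chunks, one to paste per /chirp or continue step. This is a sketch only. `chunk_lyrics` is a hypothetical name, and 4 lines is this thread's rule of thumb, not a documented limit. Blank lines between stanzas are dropped, but the line breaks inside each chunk are kept, since they affect delivery.

```python
def chunk_lyrics(lyrics: str, lines_per_chunk: int = 4) -> list[str]:
    """Split lyrics into chunks of at most `lines_per_chunk` non-empty lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    # Drop blank lines and trailing whitespace; keep each lyric line intact.
    lines = [line.rstrip() for line in lyrics.splitlines() if line.strip()]
    return [
        "\n".join(lines[i : i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


song = "\n".join(f"line {n}" for n in range(1, 11))
for part in chunk_lyrics(song):
    print(part, end="\n---\n")
```

Ten lines come out as chunks of 4, 4 and 2. A short final chunk may still be delivered faster or slower than the rest, so it can be worth rebalancing the last two chunks by hand.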
yeah, that definitely matters right now. Although in the next version of the model (coming soon) you will just be able to pick your genre
Spacing and newlines generally matter as well
That will be great. Are they improving vocal cut-offs? Sometimes it has a solid beat but won't sing until later; then I try to continue from the word where it cut off, and either the vocals or the beat changes
That is still a little tricky, but we are thinking about it!
That's great. I think the consistency from using the previous track as a reference will help. A little editing won't hurt on the user side
Beats are usually looped
Bark and Sing both do this thing that reminds me of high school debate, from a billion years ago. (I'm not young.) Where you are speaking and suddenly realize you only have 5 seconds left on the clock, and you BLAZE through the last few sentences as fast as you can possibly talk, a race to the finish. I see that all the time in Bark, and in Sing. Honestly kind of adorable.
😆
ChatGPT does it too. It's doing a great job, and then suddenly, you can almost imagine the model panicking "Oh god we're almost at max length..." and it just goes 'And then do x and y and z and that's it!" super compressed. transformer thing.
You are absolutely right about that. I know we are at the point where AI can separate the vocalist from the music. That would be a great start if it were an option
Spaces matter a lot. I also find it helps to use [verse 2] at the start to encourage the model to sing the lyrics right away. I think maybe it makes it more clear it's not the intro to a song, where you might have a section of music only.
Ahhh ok
Spaces AND line breaks are extremely important, and change a lot. Try the same song with no line breaks; sometimes Sing will just speak it.
👍, I have been using breaks a lot
I used the samples on the Suno website, entering the lyrics just the way they do
Sometimes it just doesn't work to the result I wanted
May I suggest that the regenerate button could be in a different line from the hearts and continue buttons? Then, you could add a remix button next to the regen button that allowed you to alter your lyrics. Mainly that last part.
yeah, we are going to change the regenerate button to just a remix button which lets you alter anything (or keep it the same). Good suggestion!
Hey guys, I had a suggestion: imagine if Bark had a "voice to voice" model that can convert spoken language from one voice into another voice while preserving the linguistic content and intonation.
In other words, it could take audio input in one person's voice (e.g. mine) and generate output audio in a different person's voice, aiming to maintain the original meaning and emotional nuances. Additionally, this would allow Suno Studio to have a database of human voices for training purposes, leading to better audio generation. I don't know the technical intricacies involved in the process, but I think it may help
AND NO, WHEN I SAY GENERATE OUTPUT in a different person's voice, I am not suggesting that Bark do something like Donald Trump or anything like that. I meant voice-to-voice from the list of voices in the Bark beta bot, while preserving the linguistic content and intonation of the original words spoken
2) I'd also like to add that I have some requests for more voice variations, especially something that sounds like the speaker muffled manny but in different English accents (Chinese, Korean, Indian, Polish, Moroccan, etc.). I would love to hear from @neat sail on this
here's a suggestion: add an "add to server" feature for the bot
💡 "Audio Source Sample"
something like: /ref: {url}/{value}
w/ a likeness parameter, input value scaling 0 to 5,
where 0 = very similar & 5 = very divergent from the sample
Also, add more languages, mostly the ones with other character sets, like Korean
also this: could you try to make humming usable as a reference (hum to song)?
more languages coming soon. audio conditioning like humming we might add in the future but not in the next release
Hi Syed, there is a way to do #2 suggestion now, with some work.
-
Use Bark to create random voices in all the languages of the accents that you want. Generate Bark samples with a random voice (pick no voice) and use nice natural sounding text prompts for each language. Use fairly long prompts (multiple sentence) and use a native language source for your text prompt. Don't get it from Google Translate or anything. For a Korean accent, you are using 100% Korean text in the Bark prompt.
-
Listen to the random voice samples until you get a voice that you like. You won't be able to get a 100% voice match with the Manny voice, but you should be able to get a voice with a similar speaking style and vibe.
-
Save that audio sample as a new Bark voice. In the Bark bot you can do this with a button. In the Bark source code it's the save_as_prompt function.
-
The Bark magic: Use that new foreign language Bark voice with an English text prompt. Maybe half the time or more what Bark will give you is a voice speaking English, with an accent of the original language. The other times you will get Bark hallucinations (random words), perfect English, or occasionally just noise. So your first try may not work, but keep trying, it absolutely can work. On the Bark bot I think it works more than 50% of the time actually.
It may be hard to judge if a foreign language voice has the same speaking style as Manny, if you don't understand the other languages. But you should be able to get close just by trying a lot of voices, at least.
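For anyone running open-source Bark instead of the bot, the accent workflow above can be sketched in code. This is a minimal sketch, not Suno's implementation: `accent_transfer` is a hypothetical helper, and `generate`/`save_voice` are passed in so the flow can be tried without a GPU. With Bark you would pass `bark.generate_audio` (which returns `(full_generation, audio)` when called with `output_full=True`) and `bark.save_as_prompt`; the saved `.npz` path is then used as `history_prompt`. The step where you listen to many random native-language voices and keep the best one is left out for brevity, and `accept` stands in for you listening to each English take.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed shapes: generate(text, history_prompt=..., output_full=True) returns
# (full_generation, audio), like bark.generate_audio; save_voice(path, generation)
# writes a reusable voice, like bark.save_as_prompt.
GenerateFn = Callable[..., Tuple[Any, Any]]
SaveFn = Callable[[str, Any], None]


def accent_transfer(
    generate: GenerateFn,
    save_voice: SaveFn,
    native_text: str,
    english_text: str,
    voice_path: str,
    accept: Callable[[Any], bool],
    max_tries: int = 5,
) -> Optional[Any]:
    """Make a voice from native-language text, then reuse it for English.

    Returns the first English take that `accept` approves, or None.
    """
    # Step 1: no history prompt, so the model picks a random voice
    # for the native-language text.
    full_generation, _ = generate(native_text, history_prompt=None, output_full=True)
    # Step 2: save that generation as a reusable voice prompt (.npz for Bark).
    save_voice(voice_path, full_generation)
    # Step 3: speak English with the saved voice. This only works part of
    # the time, so retry and let `accept` decide which take to keep.
    for _ in range(max_tries):
        _, audio = generate(english_text, history_prompt=voice_path, output_full=True)
        if accept(audio):
            return audio
    return None
```

With Bark, the call would look roughly like `accent_transfer(generate_audio, save_as_prompt, korean_text, english_text, "korean_voice.npz", accept=...)` after `preload_models()`; expect to run it several times, as described above.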
Your #1 suggestion is a tall order; at least, it's a lot of new capability added to or on top of Bark as it is now. Oh, actually I misread: I thought you were also translating into a new language. You can almost try something like that now with the open source cloning model, using one voice's semantic tokens as inputs for another voice, but the output is terrible compared to a pure voice-to-voice model. The output can sound weird and interesting though. Doing this with music in Bark gives you lots of fun samples, but they are not much like the voice conversion you suggest. VITS-based models like RVC are still the gold standard for that use case, though you do have to train each voice.
What happened to my room? It’s gone.
Suggestion: Being able to crop a clip and extend from there
thanks ya defo on roadmap
Also cancelling generations
Custom syllable emphasizing
Dear Jonathan, thank you for the response and the detailed instructions. I appreciate the time taken to explain this to me, and I think this would benefit a lot of other folks. As for #1 being a tall order, I reckon it probably is. I did want to clarify, in case I did not make it clear: when I mentioned a voice-to-voice model I did not mean translating into a new language. It would be nice if that were possible, but it's not necessary. I also hear you on the intricacies and limitations involved given the current state, which I really was not aware of. If V2V ever becomes convenient I would really recommend looking into it, but either way, thank you so much for writing to me
Hey, why is it that sometimes the first one or two lines are skipped?
@granite tartan any idea?
AI models are unpredictable, happens in regular Bark speech model as well. Even happens in Eleven Labs if you crank the style slider to max. Usually you can just try again, or try slightly shorter lyrics.
Discord Invite Link: this link https://suno.ai/discord had expired, but the one on the GitHub https://discord.gg/J2B2vsjKuE did work. Never mind, it only fails if you use the first link with the + button inside Discord. It works fine in a web browser, where you get a new link back.
So basically outpainting but for audio? Wonder what a good name would be for that, out-audio? Lol. Inpainting would be another good feature to have
train voices with bark, more languages, and pls if you can: fix the emotions
If it's possible, make it create a (Verse) and (Chorus) that stay more the same. If I continue a song, the melody will always be different. I don't know if this is possible with the resources you have, but I'm guessing it must be possible.
Add a feature where the chirp bot makes vocals for an existing tune or instrumental song, or where it can basically write parodies.
Basically you upload an mp3 file with audio only and the bot makes vocals to fit that audio
definitely something we are thinking about yeah
the main limit rn is context length. even with continuation the model basically only sees 40s at a time, so it might 'forget' the chorus before it comes again
Adding to that: normally the chorus has a different melody from the verses, which is actually normal. Maybe not in all songs, because some songs use just the chord progression, but they still change the rhythm a little. But something I've noticed is that after you continue the song a second time, for whatever reason the instrumental sometimes gets lost or sounds a bit low compared to the voice.
One suggestion from my side:
There is only one English female voice option.
There should be multiple, with different accents like Indian English accents
Oh oh there are many voices!
It's a misconception.
I'll give you one right now.
Bark can make any voice, on the spot. On demand. It's not one voice it's infinite voices.
It's just a confusing name "speaker directory" but really Bark doesn't even need voices, bark has all the voices inside itself. Every single day somebody comes in here "Oh Bark only has one female English voice, too bad..." !!! 🤦♀️
Explanation: #💬┃general-chat message You use a Hindi prompt first, with no Bark voice. (Leave voice on random.) Then after you get a Hindi voice, give it an English text prompt. Then you have Indian dialect. It does not work 100% of the time, but it works.
For a female voice, just keep trying until you get one. If you can think of a Hindi prompt that is likely to be spoken by a woman it would make it more likely.
Sure I would try this
Thanks
Got a Female Indian English Dialect Bark voice for you, here's the file you need (the .npz file) and a sample. If you're using regular Bark you put the path of the .npz file as the history_prompt parameter to use a voice.
Go to the #🐶┃bark-beta channel to use the /bark command, and check out the Suno Discord Commands guide for step-by-step instructions for using /bark and related commands in the server.
https://suno-ai.notion.site/Suno-Discord-Commands-5b62a5bf426346ad8355164c9ecb5115
Awesome, is this available on the Discord bot as well? Can we have a thread where users can request voices/dialects? Thank you so much, Jonathan
It's not but if you use very expressive prompts you can get just as good voices in the bot.
- For Indian dialect English, you can try starting with Hindi prompts. Choose something written with a lot of style, so you can hear how it sounds in your head when you read it (in Hindi). Use large prompts, and when the bot creates a nice random voice, save the voice with the Save Voice button. Then try using English prompts with that voice. Sometimes it doesn't work; sometimes you have to try a few times with that voice and it works 1 time in 3. But you can then save the second voice (the Hindi language voice that spoke English with the accent you want) as a new version. Then use that voice in the bot.
- When I was working on this voice I found you can also get the dialect with pure English prompts, if you overload the prompts with Indian references and terms. (I don't speak Hindi so coming up with good prompts for option 1 was more difficult for me, although you can just copy text from a Hindi novel.) The English prompts for Option 2 will seem very silly but you have to try to encourage Bark as much as you can.
I kept a short list of the strongest Indian Dialect Bark keywords: Arre bapu, Aiyo, Jai Ho, Balle Balle, Chak de India, Namaskar, Namaste, Swagatam, Kabaddi, Masala Chai, Dabbawala
The prompts to make the voices will look silly, almost like you're making a satire, but you gotta really go over the top so Bark gets the idea. Here are a few I used:
Chak de India! What a sensational cricket match this is turning out to be. Jai Ho to our bowler for that amazing wicket! Arre bapu, even Kabaddi doesn't get this intense. Aiyo, that was a close call! Let's keep the Balle Balle going!
Namaste! Swagatam to our grand Diwali celebration. Prepare for an evening of Balle Balle with dhol and fireworks. Jai Ho, let's make this a night to remember! Don't forget to try the Masala Chai and samosas. Arre bapu!
However if you speak Hindi Option 1 will work better.
It's not a good voice, but there is a bit of an accent here (and in some other samples near that one in the channel). If you keep trying you can get a good one on the Discord Bark for sure #🐶┃bark-beta message
Possible to use Bark in the DM bot? Looks like it only does chirp?
yeah for now just chirp i think, we'll try to add bark soon as well
Sounds good. Is there a way to generate chirp songs that don’t have music? Just the voices. Thank you
not in a hardcoded way, but in the new chirp version (1-2 weeks) you can add descriptions, so you can try things like male vocals only
Awesome, looking forward to it.
regarding the notes/disclaimers being added to lyrics: instead of telling suno to include no notes (or whatever its system prompt is), say “minimize any prose” and that will prevent any notes
i have had success with just putting that at the end of my lyric prompts but it shouldn’t be necessary
In the new version, try "A cappella" as the genre, though that does give you a certain style of music, which may not be exactly what you want. Also, FWIW, 'source separation' AI models are really good now; they can just take in a song and remove everything but the vocals. Heck, just playing music while you leave NVIDIA noise reduction set to max basically does this in real time, even though it's not even designed for that.
how do I add the genre in the prompt? any practical way, please?
Coming in the next version but you can still have some control indirectly, like this: #💬┃general-chat message
I am writing to bring to your attention some concerns I have regarding the Nano-GPT base model currently being utilized in Bark. Specifically, I have noticed that the model occasionally generates new text that is not present in the initial prompt. While I understand that this may be due to the model's attempt to generate new text, as generative models are meant to do, I believe that it could potentially lead to inaccuracies and inconsistencies in the output.
I would like to suggest exploring alternative models that prioritize instruction following while maintaining efficient memory usage. One such option could be a GPT-Neo architecture base model, which has been shown to excel in these areas. In particular, the LaMini-Neo-1.3B model has demonstrated impressive performance in following instructions while keeping memory usage minimal.
I would appreciate any consideration you can give to my proposal.
Thank you for taking the time to read this.
/beta chirp v1
would love to see this happening for bark-beta as well #📢┃announcements message
Coming soon 😉
Thanks, @strong fox. It's in the works!
If you search for 'hallucination' in this Discord you can find some tips to avoid that. You can minimize somewhat by choice of voice, matching voice to the text style, and some things to avoid in your text prompt. You can also avoid it with 'brute force' by just running Whisper on the output, checking it, and trying again if it doesn't match.
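The 'brute force' check can be sketched as a retry loop that compares a transcript of each take against the prompt. Assumptions: `generate` and `transcribe` are placeholders for your TTS call and a speech-to-text call (with the openai-whisper package, something like `whisper.load_model("base").transcribe(path)["text"]`), the 0.8 threshold is arbitrary, and the word normalization only handles English-style text.

```python
import difflib
import re
from typing import Any, Callable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so the comparison focuses on words."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(expected: str, heard: str) -> float:
    """Word-level similarity ratio between 0 and 1."""
    return difflib.SequenceMatcher(
        None, normalize(expected).split(), normalize(heard).split()
    ).ratio()


def generate_checked(
    prompt: str,
    generate: Callable[[str], Any],
    transcribe: Callable[[Any], str],
    threshold: float = 0.8,
    max_tries: int = 5,
) -> Tuple[Optional[Any], float]:
    """Regenerate until a transcript matches the prompt closely enough.

    Returns the best take seen and its score (the last try may still be
    below the threshold if every take hallucinated).
    """
    best_audio, best_score = None, -1.0
    for _ in range(max_tries):
        audio = generate(prompt)
        score = similarity(prompt, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            break
    return best_audio, best_score
```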
Being more transparent in general would help
Does anyone know if there is a current way to nudge, or prompt, Chirp into singing your lyrics in a specific style or genre only, and to prevent it from randomizing / switching between male and female vocals? Much of what I am hearing sounds like it is still in early alpha, with distorted or sometimes even warbled song output. Chirp is pretty impressive and amazing for what it is, though.
Suno Chirp Bot (Today at 6:13 AM): Suno Chirp Bot is thinking...
Stuck in that state when I just recently tried to regenerate song lyrics with only four lines or words.
Sry some engineering things on fire today but we’ll look as soon as we can. So far no one else reported these issues, weird
Thank you
Is that an error and it should have been 500 credits for $4? Because otherwise the price switches from $1 per 100 to $8 per 100 ... that would be a bit extreme. Especially when a second free account would free up $20 worth of credits (or a second $10 account would free up $80).
Yes - thanks
whoops
Updated. Thanks (and sorry about that), @prime idol!
Ahhh - cool, thanks! 🙂
DMed
The first thing that comes to mind is fixing how it still cuts off the end. But overall, very nice (well, only testing on vocaloid sound so far) 🙂
you guys absolutely killed it with this update! very impressed, this is the beginning of something huge, congrats to the whole team.
quick question: Any chance of a plan with unlimited generations or a much higher number? I've been with you guys since July 10th in this server and currently sitting at 1168 generations. (600/month roughly)
I like to experiment especially now that we have choices for style
Another issue, if being lazy and using the chatgpt, is that somehow the style ends up in the request, i.e. i put hawaiian ukulele as the style, and chatgpt: about Mariamne who loves to make tote bags - i get Mariamne who lives in hawaii and plays ukulele and makes tote bags 🤣 Could of course use chatgpt separately, but that defeats the object (maybe it excludes known styles, i.e. reggae, which didn't make a song about reggae lol)
loll wait what??
iirc the genre should be independent from what chatgpt does...
gonna check
worked out nice on this one though (it seemed to put my styles in the lyrics hehe)
#🐣┃suno-showcase message
could be cool if we could have option to push the successful chirps from DMs to some public showcase channel
Awe... When one enters a prompt that's too long, it's kinda tragic to lose it. Especially when it was composed in the prompt window. It sure would be nice if the prompt was included in the rejection message.
...
Another thought would be a "remix" button. This would be especially handy on continuation clips. I often have to backtrack to the place I continued from. It's not too complicated. But, the option to simply edit the prompt would be super convenient.
...
And, one more question.
What's best practice for continuations?
When some of the previous lyrics are included, it sometimes seems to try and continue from the correct spot... Maybe?
But, carefully prompting with the lyrics after the end-point seems to be more reliable.
Just curious what's recommended.
Yes, that seems to be the tricky part: unless it happened to close at the right time, it's not easily continued (as well as cutting off mid word/sentence)
I have actually had pretty great luck just starting with the final letters of the cut-off word.
what I've found works best is to experiment with the track speed on the first verse. if the first gens struggle to fit under 30s, later verses won't work well either.. and for problematic parts you could gen the verse in two chunks. like if you have 4 lines, you just gen 2 in one go
oh interesting ill have to try that!
to give example: #🐣┃chirp-beta-4 message
uploaded that from my DM gens
you can see how I mixed and matched short and long gens
Our best workaround for this one is if you have a long one you're iterating on for a while, do it in notes or docs and paste it in (sorry thats a bit clunky)
definitely work out the lyrics out of the chirp dialog 😄
Yep. I just forget sometimes.
I like iterating with chatgpt, since you can tell it to change individual verses and chorus etc to have elements you like, or omit things you don't like
Yeah this should work ok. Also agree with tips on using first verse to calibrate. In general we find that 6 lines or so is pretty safe for 30s. With 8 lines (some of our GPT generations) it sometimes cuts off. We are gonna try to tweak our GPT prompt so it doesn't cut off as frequently
I'm guessing that there's some internal association between the lyrics and time code? If that's the case, it would seem ideal if one could overlap the lyrics where it cut off and Chirp would automatically figure out the overlap. In that scenario, one could paste the same lyrics and it would automatically continue from where it left off. I feel like it's trying to do that. Just some thoughts.
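The line-count rule of thumb (about 6 lines per 30 s clip; 8 sometimes cuts off) can be turned into a small planner for continuation clips. `plan_clips` is a hypothetical helper and the 6-line budget is only the team's rough guideline, not a hard limit; passing `max_lines=2` mirrors the earlier tip of splitting a problematic 4-line verse into two 2-line generations. It keeps the blank line between stanzas, since line breaks change the output.

```python
import re
from typing import List


def plan_clips(lyrics: str, max_lines: int = 6) -> List[str]:
    """Group lyric lines into chunks of at most max_lines, one per ~30 s clip.

    Stanzas (blocks separated by blank lines) stay together when they fit;
    a stanza longer than max_lines is split into max_lines-sized pieces.
    """
    stanzas = [
        [line for line in block.splitlines() if line.strip()]
        for block in re.split(r"\n\s*\n", lyrics.strip())
    ]
    clips: List[List[List[str]]] = []  # each clip is a list of stanza pieces
    current: List[List[str]] = []
    used = 0  # lyric lines already placed in `current`
    for stanza in stanzas:
        for i in range(0, len(stanza), max_lines):
            piece = stanza[i:i + max_lines]
            # Start a new clip if this piece would overflow the line budget.
            if current and used + len(piece) > max_lines:
                clips.append(current)
                current, used = [], 0
            current.append(piece)
            used += len(piece)
    if current:
        clips.append(current)
    # Re-join pieces, keeping a blank line between stanzas.
    return ["\n\n".join("\n".join(piece) for piece in clip) for clip in clips]
```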
my biggest issues haven't been things getting cut off at 30s but under it
I can estimate when things get past 30s so I just avoid long verses then
...
Also, I just noticed that the 'continue' button now allows editing of the text. Awesome sauce!
but sometimes it just cuts too early way earlier
one thing I'd like to suggest here for quick QoL: do bring the previous genre to the dialog automatically when you continue
now you have to retype the genre every time
it's fine if you want to tweak it in the middle of the song, but it would help if it would at least autofill the same thing you used on the original gen
yeah we'll look at that, good idea. right now if you leave it blank that should generally work ok too
but genre switching in the middle might work still, right?
yep
Can we extend the 30 second limit? This song was dropping beats in the last 2 seconds ;w; https://cdn.discordapp.com/attachments/1128313332273782784/1148761616771469362/chirp.mp4
Try the continue option for now 😄
hmm but it prompts me to write new lyrics?
do i just put the same thing?
You should put in lyrics where you want it to start -- where the prev one ends use to ... line
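Picking the lyrics for the Continue box can be sketched like this, assuming you note the last few words you actually heard at the end of the previous clip. `remaining_lyrics` is a hypothetical helper: if the heard words end a line, it returns the following lines; if the clip stopped mid-line, it restarts that line. Repeated choruses make the match ambiguous, so `start_line` lets you skip past earlier occurrences.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    """Lowercased words, ignoring punctuation."""
    return re.findall(r"[\w']+", text.lower())


def remaining_lyrics(full_lyrics: str, last_heard: str, start_line: int = 0) -> str:
    """Return the lyrics to paste into Continue, based on the last words heard."""
    lines = full_lyrics.splitlines()
    heard = _words(last_heard)
    n = len(heard)
    if n == 0:
        return full_lyrics
    for idx in range(start_line, len(lines)):
        words = _words(lines[idx])
        for start in range(len(words) - n + 1):
            if words[start:start + n] == heard:
                # Heard words end this line: continue with the next line.
                # Otherwise the clip stopped mid-line, so restart this line.
                next_idx = idx + 1 if start + n == len(words) else idx
                return "\n".join(lines[next_idx:]).strip("\n")
    raise ValueError("last_heard was not found in the lyrics")
```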
ummm did u guys see what i sent? this was a "-" lyric request from chirp and it sorta sounds like english but it's ai
er, try again?
It's generative so yeah it is possible
I've been having great fun getting it to nail really cool outros. It seems to like my weird punctuation for endings
....
...
..
!
And other oddities. Sometimes lots of this kind of thing can indicate a dubstep-type buildup, too.
So cool!
Try again. Maybe give it a couple lines of intro lyrics to get the music started on the right track?
https://cdn.discordapp.com/attachments/1128313332273782784/1148821600561545336/chirp.mp4 it didnt generate lyrics
also would like to see game music (particularly 16-bit games; iirc it only has 8-bit) as a style you can use
You can also try stuff like na na na or la la la at the end of prompts; it can tee up some interesting stuff. Check out the "non-lexical vocables" (what a mouthful) at the bottom of this doc:
https://suno-ai.notion.site/Tips-Tricks-Chirp-v1-969852c74c644c6b8262ec5c5be2325c#cda107f566264f0ba029f9429fa95219
Hm you should just be able to click Submit
Same
We’re looking into this!
You won’t be charged credits for timeouts or failed/stuck generations, but I’m sorry for the inconvenience here nonetheless
Try DM the bot -- it worked for me that way
It failed for me a couple of times at the beginning -- retrying works. I'd suspect that the backend gets too busy, or the queue gets confused. Unfortunately we'd probably have to wait until EST morning to fix this X.x
Where do i DM the bot?
@tame latch
When I press /chirp it does not work
@ionic edge
What do I type?
Is it just me or was the old version more creative? Also, why is this not working since I started to pay? Should I unsubscribe?
The audio is better in this new one but somehow it lacks any soul
The old version inferred the genre from the lyrics. You can get just as much expressivity in this version, but it's more dependent on what you put in the style field
Try odd things. I like the lyrics to Happy Birthday. Anything CAN work.
If only I could try; I have been unsuccessful at generating anything for some time now
Also you can try repeating words, like country country country pop for a stronger country ratio, or instruments.
Kind of like using a strength in Stable Diffusion
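Stacking repeated words in the style field, as a rough analog of `(country:1.3)` weighting in Automatic1111, can be scripted so you adjust counts instead of retyping. This is only a sketch: Chirp has no documented weighting syntax, so the repetition count is just a nudge, and `build_style` is a hypothetical helper.

```python
from typing import List, Tuple


def build_style(weighted_terms: List[Tuple[str, int]]) -> str:
    """Repeat each style term `count` times, e.g. [("country", 3), ("pop", 1)]."""
    parts: List[str] = []
    for term, count in weighted_terms:
        if count < 1:
            raise ValueError(f"count for {term!r} must be at least 1")
        parts.extend([term] * count)  # more repeats = stronger nudge (not guaranteed)
    return " ".join(parts)
```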
Yes, I just hope it can start to work consistently
for now it is useless, can sit and wait and hope it works
Also, remember you can continue other people's songs too. So browse the channel. Look for something that is more what you like. Then either copy their style prompt, or just literally hit the arrow button on their song and continue it yourself.
yes, I have generated a lot of songs
With the old version
But since yesterday, when it was updated it has been a struggle
I subscribed to pro yesterday and after that I have had issues
Not sure if it is related
I wanted to support Suno but not sure if I was too quick
So one thing about this version, using random genre with lyrics isn't the same as using the old version - inferring the genre from the lyrics - as far as I know. Not quite sure about that, will ask tomorrow maybe.
A new feature request might be a mode that is random like the old version, a 'whatever music genre these lyrics imply' mode instead. So comparing with random is difficult. I believe the current random is more like picking random genres instead.
ok, are the issues generating songs being addressed?
I think all of Suno is asleep, so it might be 4 or 5 hours before technical issues are resolved.
ok, let's hope it gets fixed, there is a lot of potential for sure
The generated images are beautiful now as well
hello when i type /chirp i get a message interaction failed, do you know where this is coming from?
Hello. The new version came out yesterday and I think it's very cool. But I thought the other languages would also work now, such as my main language, German. With English lyrics everything works perfectly, and the difference in quality compared to before is extremely cool, but unfortunately with German lyrics you usually hear nothing, or only a few tones, and when something does come up it only starts at second 26 or so, where you only have 4 seconds left.
Here are a few examples, but almost every request in German came up with something like this:
I tried that too, that's the wrong bot. You can find the right one by clicking on #🐣┃chirp-beta-1 and then clicking up on it, that's how it worked for me too, and there are two different ones.
So take the one that already creates songs in the channel.
To be precise, the @sick smelt bot doesn't work in the DMs; you need the @storm vale
Ok
Thx
hi, i got 2 generations stuck and i can't generate anything while it's stuck. the first one's been stuck for hours, the other for 10 mins
Same problem for me
Is there a way to cancel stuck generations? It currently won't let me generate anything
Just delete ‘random’ and leave the field empty to get same behavior as last version where it uses only lyrics to infer genre
Sorry, should reset after I believe 5mins
We’ll be up and running in a couple of hours (US timezone) and check what’s going on
It's messed up for me currently. The beta won't let me chirp
Is the new version working the same otherwise but the audio is better? I think the old version had some quite creative sound but the quality was bad
This version is way better for me
I think the singing in the old had more matching to the notes?
You can definitely get just as good out of the new version. My guess is the default parameters may also get optimized and tweaked a bit over the next few days too, whereas the original model had already gone through that.
for instance this example on my youtube
https://www.youtube.com/watch?v=FowKdgiX264
Lyrics by Tom Kopra, AI rap and vocals by Suno
Feel free to like and subscribe :)
I hope so.. would be cool. The only issue I had with the old was the lyrics and music quality
I've been meaning to test the new version a bit; want me to try with one of your prompts? What sort of thing were you going for that you couldn't hit on the new version?
I did not have too much possibility to test with the new version yet because of the issues
I will test later when the issues are sorted out
Cool, feel free to msg me later if you can't seem to get the same kind of output.
sure, I will give feedback 🙂
This is the coolest product in AI at the moment together with Stable diffusion, ClipDrop and ideogram.ai
Actually even cooler, because I am unable to make music otherwise; graphics I can do in other ways as well 😄
I hadn't seen ideogram, looks interesting
good for generating images with texts
I think that they should add an option to set the singer as either male or female
In the prompt
You can put male and female, man or woman, into the Style field. In fact you can overload, try female vocals, female, female, woman, woman even
It's not totally reliable but it does influence it
Oh if you mean Bark, that's just a voice thing for now.
Thanks man!
Nah I meant Suno
So I would put "Male Vocals, Rock"
Yeah. If that doesn't work you can keep adding Male vocals, man singing, Rock man man man Rock etc.
It's not a selector, it's just a text field for the AI... so weird things can work.
Thanks
You could also try deep voice or other descriptions that imply a male voice, maybe.
Sorry all, fixing some of the stuff getting stuck. Should be all set now for new chirps.
Sorry about that, should be back up now. Make sure to use the @storm vale
Sorry about that @spark minnow , should be up and running now. Thanks for being an early supporter
Try putting in the language name, it helps for foreign languages, like German Rock
super interesting, no fundamental reason the new model should be different from a creative perspective. my guess is it's better, so maybe a tad more 'on the rails' by default, but this is an LLM-style model so there are temperature parameters etc. to get it to act more crazy and creative 🙂 would love some feedback if you can narrow things down in any way, like certain genres sounding flat or so? honestly i learn a ton about music here myself!!
some of my prompt requests have been in the "generating" state for hours but nothing happens. i just subscribed to the pro version; do they cost my credits if the generation fails?
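For anyone curious what a temperature knob does in an LLM-style model, here is a generic illustration of temperature sampling, not Suno's code: logits are divided by the temperature before the softmax, so values above 1 flatten the distribution (more surprising picks) and values below 1 sharpen it. Open-source Bark exposes this kind of control through the `text_temp` and `waveform_temp` arguments of `generate_audio` (both default to 0.7); the Chirp bot does not expose it in the current version.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(tokens: List[str], logits: List[float], temperature: float, rng: random.Random) -> str:
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]
```

With logits `[2.0, 1.0, 0.0]`, the top token gets roughly 87% of the probability at temperature 0.5 but only about 51% at temperature 2.0, which is why high temperature "acts more crazy".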
just got home so testing, might have been the issues in the beginning but I will test. The complete package is very refined now at least with the AI generated images etc. good job!
nope, sorry, no credits if stuck or "Errored", just rerun them. Sorry about that!
that's a relief, thank you!
Suggestion: It would be very cool if you could continue generating after the 30 seconds so that it continues from the same point, not with new lyrics, but with the old ones if it didn't finish them (which is usually the case)
It's usually the case that the song's lyrics only start at 20 seconds, and then you hardly hear any of them, at least in my language
I think it's good, only maybe a bit hard to make it derail once it has started
About the temperature etc. are you going to allow this to be modified? Or is it already possible somehow?
I don't know about future plans but not in current version, but it is the first day of an entirely new model that just added 50 languages and genre control, give them a minute lol
Where is the language list? (found it)
https://suno-ai.notion.site/Supported-Languages-16550b00a3f04ee6bab541d135eaf713
I believe I mentioned this, but just in case: if you repeat style descriptions (country country country, or any word), it can often sort of work like doing (country:1.3) in Automatic1111 Stable Diffusion. It's not the same as temperature, but try stacking a ton of things and it gets plenty weird, like high temperature
Thanks for the tips
Have to try a Finnish and Swedish song next 😄
Also, consider stacking contradictory genres as well, this: #💬┃general-chat message was prompted with major minor fast slow ascending descending electronic acoustic for example.
Doing those sorts of things can get the output into plenty of strange places, without adjusting temperature.
I wouldn't try this and blow a bunch of credits, because the success rate is so low, but as an extreme example you can even prompt quotes that are often found in EDM music, like the most famous lines from the Apollo recordings. You get a clip that has the sound of the space recording, with a bit of the static, etc. Basically the kind of clips you hear repeated over and over in a trance song, like famous movie quotes. It's just an example of the crazy stuff you can try in the style field. Happy Birthday lyrics work surprisingly well.
I had to find a database of the most commonly sampled clips, by checking https://www.whosampled.com/, to find any that worked past the famous Apollo lines. The clips were not particularly cool-sounding, beyond the fact that a 'style' like that could sort of work. However:
Happy Birthday to You Happy Birthday to You Happy Birthday Dear (name) Happy Birthday to You.
As a 'style', this was pretty interesting. Don't use ChatGPT for the lyrics if you do this, though; it gets confused.
you should be able to use continue and just give it the lyrics from wherever it left off, it should magically just work
definitely on roadmap ya. for now we wanted to keep it simple to avoid too many bugs or confusion but definitely would be cool to expose the right knobs here. something like 'chaos' i believe is what midjourney does
Yes, I have to say the multilingual part works well. Just took a whatsapp message from my mum in swedish and made it into an EDM song 😄
haha awesome
Pretty much perfect swedish
yay! 🙂 it's still a bit hit or miss sometimes but going in the right direction. hard to test this stuff in anything other than german for me though haha
I speak Swedish, Finnish and English... Finnish was a bit worse than Swedish when I tested
but still ok
Actually, most of the time when I continue a generation, especially when I like a song on Chirp, the lyrics sung don't make sense or relate to the lyrics I wrote. Is there any way to improve this?
hmm weird that shouldn't happen.. do you have an example?
Yes, using Brazilian Portuguese lyrics
Part 1: Everything is fine. I just wish there were a way to shorten the instrumental intro; it wastes a lot of time, since the limit is only 30 seconds, even in Pro.
Part 2: The vocals don't match the lyrics at the beginning. In this example part of the lyrics is sung at the end, but at the beginning it seems to be singing in another language, despite keeping the style.
You have to keep the lyrics in each part short enough that they have time to be sung in each part
then you continue with the next part
so basically in your example you should only put verse 1 in the lyrics first... then make a continuation with only verse 2, etc.
if the last part of the lyrics is cut off, it might be able to finish it at the beginning of the next part. sometimes I put ... at the beginning to make up for this.
I understand, but the problem is not the length of the lyrics, but the lack of continuation. It sings lyrics that don't exist in either of the 2 videos/lyrics.
yes, this can also happen, you just need to retry in this case?
it happens because the last word was cut off in the first part
so it start to hallucinate
so keep the lyrics in the first part shorter so it has time to finish
I understand, thanks!
But another problem is that the instrumental intro is very long, so even if I make the lyrics shorter, they sometimes still get cut off.
you can try re-running a couple of times with shorter lyrics. I think the EDM style is more prone to long intros?
maybe you can add "short intro" to the style
Maybe, it'll be cool when we have some sort of control over that or the average speed of the lyrics.
you can add "slow-paced vocals" or a BPM, etc.?
or "fast-paced vocals"
I think no one has a good answer yet since this is new
Shortening the lyrics actually worked better, and it seems like it picked up where it left off. Very good
It works?
I guess we have to do a lot of experimentation, right?
yes 🙂
but in general keep the lyrics as short as possible so you don't hit the 30 secs
but EDM is tricky
it wants to go off on tangents
because it is dance music
In the future we'll need a way to control voices
yes and to switch voices in the middle from male to female etc
Yeah
so english behaves a bit differently from other languages. for english, giving it a short prompt (~2 verses) is best and easiest. for other languages and long-form generation, other approaches may work better. maybe try giving it ALL of the lyrics and then just keep doing continuations where you manually trim away the parts that have already been sung
this is so much fun 😄
Is that a bug? Too much fun? 😄
a feature
made a swedish rock anthem for my daughter and her gang
the gibberish is also really good
is it possible to invest in this company?
Subscribe and buy more credits? 😄
Hello! I have some suggestions in mind. I don't know if any of them are possible, I just think they would be great selling points. It would be amazing to see them added in the future!
•Being able to set BPMs accurately
•Being able to make your own vocal models, and to save randomly generated voice types so creations stay consistent
•Being able to create an intro and outro for the song, or to extend backwards instead of forwards to create the first part of the song
•Being able to make a cappella tracks (I don't know if this is possible at the moment, it could be)
a cappella is definitely possible now
intro is also possible
Sometimes it only generates music with none of the lyrics. It's fine when that happens to only one of the two choices, but it even happens to both (for example, when I generated a song in "dark epic rock" style, both were music only, with no vocals)
Hm sorry yeah sometimes that weirdly happens. It might think of it as a long intro or something. Will see if we can improve that
I noticed that whenever I continue lyrics, the vocals get more and more robotic and hard to hear
hm yeah this is basically just 'drift' where the model doesn't copy perfectly....
sometimes a bit worse than other times unfortunately. luckily this will get better by itself as the model gets larger in the next version
aah can't wait then
i figured it's just the ai being weird, because it reminds me of how GPT used to be: if you kept it going with stories, it ended up drifting away and repeating itself
yeah that's exactly it
is the solution really just "jam more information at it"?
in our tests smaller models have the issue a lot, so in a way will get fixed automatically
because i know once gpt became, well... chatgpt, it doesn't do that much anymore
most things require a lot more engineering but for this particular one... yeah kinda...
mm ic
you could hack your way around it by always prompting it with the original or something to reduce drift, maybe we'll play with that, but also don't wanna make too many hacks here since teeeechnically the model already knows what it should do
the videos play but there is no audio
idk sorry this must be a you issue. does youtube work in the same browser?
Hi, I can't hear anything here, it says there is a bug.
same here
Sounds like a discord issue. Not a suno issue. Try downloading em
Hey btw, what kind of music is in the training data? Royalty-free music? Licensed? Both? And what sample rate? If it helps, I can see if I can provide some stuff the AI can train on, notably MIDI and 16-bit/chiptune music, as it falls a little short on that
Opera too, but I haven't tried it enough (I ran out of creds)
Yea, opera seems to be either just a person talking on a telephone for some reason or a bunch of voice cracks
Def could generate better opera pieces 😄 Let me get one for you 🙂
Oo
I suppose classical as the style is the key
Romantic Opera also works :D. #😍┃v1-songs message
try using "soprano, tenor, classical" or similar
super weird, must be a discord bug?
This is a great attempt, but it's only the beginning.
I really like it for a general overall feel.
However, it always sounds like it adds random musical tones rather than relevant ones, usually out of key with the vocals. It's more suitable for simple unpitched beats/rhythms and just "lyrics" (more like spoken word/rap). The musical layers are too chaotic compared with the more solid drum beats. The bass also often seems disconnected from the beats, as the vocals sometimes do. If it accepted keys, bar numbers, section numbers, or anything related to actual music rather than just text, and gave less random results, it would be better. The sound quality is low, it sounds like 22 kHz, but this didn't bother me, as to be honest I wasn't expecting more. At this point the tool matters first, and we will get to quality eventually. I have tried every AI "music" creation tool and concluded that most of them are at the same level of development, or in the same spirit of evolution. I don't mean to offend anyone; I'm just trying to make constructive suggestions. It's still early, and I haven't tried much here to get the best results yet, but I will. Because I've tried so many tools, though, I may have reached some conclusions faster. I like it in general. It's promising. At this point it's interesting for me, and cool for anyone who doesn't care much about musical detail and likes to create and post lyrics over whatever "music" and rhythm elements sit underneath, like a sample. In the near future, when AI models can mimic specific composers, with crazy-good mimicry of vocalists and singers (beyond the level of human voices that pass now, and so on), in various tones and styles on a much deeper level, it will come with its own style of sound anyway.
Great try, great ideas. Keep it up👍🏻 ❤️
It would be great to include the number of credits we have left on our account after each output, so we know when we are getting close to the limit
My suggestion is to add a continue feature to Bark, like the one in Chirp
Ya we'll look into this, for now you can do /info
yeah we're gonna do some work there soon too
suggestion: remove credit system
Hey, this is such an awesome program! I've been here a while and I love making songs my friends and family laugh at. I've brought many smiles with my creations thanks to you guys. Are you able to implement an unlimited plan? I will fly through the allotted amount. I went through 25 generations in a day, but the plan only offers me four times that amount, so that's less than four days at my rate.
you will have to wait a month to get credits back. before this new update it was all free, no credits. i want that back.
Thanks for your support and great suggestion -- we are definitely working on different plans. Stay tuned.
I noticed that the model suddenly stopped handling the Polish language 😦
hm, shouldn't be the case..
A few feature requests, where possible: 1) lyrics progressively shown/highlighted as the singer/rapper sings them, e.g. similar to how Splash Music Pro does it, which is very cool. That is better than simply showing the complete lyrics all at once. 2) the ability to download only the audio file.
I feel like when you change the theme or genre of the music it just becomes inaudible
oh, and I think there needs to be a way to ensure that it says all the lyrics before the video ends
/chirp
Go to the #🐣┃chirp-beta-1 channel to use the /chirp command, and check out the #🐣┃how-to-chirp channel for step-by-step instructions for making music using Chirp on Discord.
can you fix the credits? maybe they could replenish every day, not every month, cuz everyone has 25.
Considering the current system of credits renewing monthly, we've observed that some users, including ourselves, frequently reach the limit of 25 credits within a relatively short period. The monthly reset may, therefore, not be as beneficial or fair to all participants.
To enhance user experience, promote regular engagement, and provide a more equitable allocation, we propose a revision in the credit replenishment system. Instead of a monthly replenishment, adopting a daily reset could potentially cater to active users more effectively, ensuring consistent user engagement without the risk of exhausting credits prematurely.
We believe this modification could greatly improve overall satisfaction and participation rates in our user community. We hope you will consider our suggestion.
Thank you very much for your time and understanding.
i smell a chatgpt request 😂
think about it plz
Bot is a little stuck. Be back up in 5 minutes. Just rerun any generations that got stuck, sorry about that
is there any way for users to add a custom language? if yes, how?
So I have to start with saying... I am so impressed by this. Your team is doing amazing things 💪 Will definitely be subscribing today and in it for the long haul. That being said, not sure if this has been suggested yet, but it'd be cool for the output to include time signature/tempo/key/etc.
thanks, do you mean analysis of what was generated, or being able to control these during generation?
Honestly, both would be ideal. I just got here and will be doing a lot of testing, but you would know way better than me... Maybe with the proper wording the latter of what you suggested can be implemented already
yeah i think the model will already get some of it, but maybe not super reliably. like bpm might be roughly right but not perfect or something
as for the analysis we could add something but feels like it could be done with separate models? feels like a pretty pro use case for now. but maybe cool for making existing things searchable or something?
any chance we could get the chirp bot echoing submissions and the bot posting the output to be different usernames?
that would let us block seeing other people's submissions without also blocking the output
Agreed
best workflow is probably DM @storm vale if you want to see only your stuff
Another suggestion that would be neat down the road would be to implement a reference song to help guide the result
perhaps the ability to define stems? possibly with multi-step generation
generate a motif, melody, etc.
then have the user input a song structure like intro-verse-chorus-verse-chorus-outro that uses the given motif and melody
ya definitely something we are thinking about
yeah super interesting. right now the context window is still a bit limited but plans are to scale that up. then longer form song structure should work much better
you guys should renew credits every day, not every month. i am so tired of that
yeah, frankly an AI that can generate catchy motifs or melodies alone is already very nice; though maybe that's my bias
Another suggestion for down the road, implementing stereo/surround options
yeah looking into stereo soon, would be cool
Yea I think so too
Also, setting limits on how many chirps a user can produce at a time. That may already be a thing, I haven't tested it yet (didn't want to overload the system)
yeah i think it's 2 for free and 4 for pro. don't worry it autoscales, and neeeeever fails 😂
read pls
or this
don't worry, they have heard that idea, both ideas actually. i was bugging the suno folks about this day 1. give it time!
🤯 yay!
Adjusting the setting for people that boost the server so that users don't get confused and think that the tag color implies that they are mod/staff
TIL
oh my gosh a mod no way🤯
jkjk
Go to the #🐣┃chirp-beta-1 channel to use the /chirp command, and check out the #🐣┃how-to-chirp channel for step-by-step instructions for making music using Chirp on Discord.
Go to the #🐶┃bark-beta channel to use the /bark command, and check out our Suno Discord Commands guide for step-by-step instructions for using /bark on Discord:
probably already known but fading out is annoying