#Hi, I'm new to the group and Eleven Labs. I'm part of a non profit organization that has a YouTube


vital kayakBOT
#
AI-generated Answer for ibros_03183

Hello and welcome to ElevenLabs! It's great to hear that you're interested in translating your class teachings into Spanish. Here's a step-by-step guide to help you get started:

  1. Create an Account: The first step is to create an account on ElevenLabs.

  2. Use the VoiceLab: Once you have an account, you can start by cloning a voice using the VoiceLab. This will allow you to create a unique voice for your translations.

  3. Speech Synthesis: After setting up your voices, you can move on to Speech Synthesis. Here, you can generate audio outputs using the pre-made voices or the ones you've created or cloned.

  4. Dubbing: The next step is to use the Dubbing feature. This is an automatic solution for creating dubs in any language that the AI supports. It keeps the original voice and tries to retain the same performance across languages.

  5. Projects: You can also use Projects, which is an end-to-end solution for creating voiceovers for long-form content such as longer articles or audiobooks with just a few clicks.

Remember, the AI models have been trained on a vast amount of audio and are suitable for a wide variety of tasks. They understand context, so if you write something in the style of a book, the AI will try to interpret how to perform a certain passage from the context of the writing itself.

For more detailed guidance, you can refer to the complete guides provided on the ElevenLabs website. They cover everything from creating an account to various tips and tricks for using the platform.

I hope this helps! If you have any more questions, feel free to ask.

plucky jasper
#

If I choose Creator account does it include API?

vital kayakBOT
#
AI-generated Answer for ibros_03183

Yes, the Creator account does include API access. The specific details, such as the total monthly character allotment, the number of custom voices you can have at the same time, and the audio quality provided, can be found on the subscription page. You can view your subscription by clicking on the profile picture in the upper right corner and selecting "Subscription".

Please note that custom voices and cloned voices have different meanings. Custom voices are all voices that are not automatically included with the accounts; this includes cloned voices. Cloning is only available on the Starter tier and above.

You can upgrade your subscription at any time, and any unused quota from your last subscription will roll over to the new plan. However, please remember that unused quota does not roll over from month to month; it is an allocated amount for that month only.

From the subscription page, you can also downgrade your subscription at any time. A downgrade won't take effect until the current cycle ends, so you won't lose any of the monthly quota before your month is up.

When generating content on our paid plans, you get commercial rights to use that content. If you are on the free plan, you can use the content non-commercially with attribution. If you use a cloned voice, you must obtain the appropriate permission to use it commercially, even on a paid plan.

For more information on payment methods, please refer to the Help Center.

plucky jasper
#

We have one person's voice in videos, is this called voice cloning?

vital kayakBOT
#
AI-generated Answer for ibros_03183

Yes, the process of creating a digital replica of a person's voice based on audio samples is referred to as voice cloning. There are two types of voice cloning offered by ElevenLabs: Instant Voice Cloning (IVC) and Professional Voice Cloning (PVC).

Instant Voice Cloning lets you clone voices from very short samples nearly instantaneously. Professional Voice Cloning, on the other hand, trains a dedicated model on a large set of voice data to produce a hyper-realistic clone that is indistinguishable from the original voice.

Since the custom models require fine-tuning and training, it will take a bit longer to train these Professional Voice Clones compared to the Instant Voice Clones. The current estimates for Professional Voice Cloning are approximately 3 hours for English and around 6 hours for multilingual.

Please note that for now, ElevenLabs only allows you to clone your own voice. You will be asked to go through a verification process before submitting your fine-tuning request.

plucky jasper
#

Can we train it with an existing video?

vital kayakBOT
#
AI-generated Answer for ibros_03183

Yes, you can train the AI with an existing video, but there are a few things to keep in mind. The AI will create a near-perfect clone of what it hears, including all the intricacies and characteristics of that voice, but also including any artifacts and unwanted audio present in the samples. This means that if your video has background noise, room reverb/echo, or any other type of unwanted sounds like music or multiple people speaking, the AI will try to replicate all of these elements in the clone as well.

Therefore, it's important to ensure that there's only a single speaking voice throughout the audio, as more than one speaker or excessive noise can confuse the AI. This confusion can result in the AI being unable to discern which voice to clone or misinterpreting what the voice actually sounds like because it is being masked by other sounds, leading to a less-than-optimal clone.

Also, make sure you have enough material to clone the voice properly. The bare minimum recommended is 30 minutes of audio, but for the optimal result and the most accurate clone, it's recommended to have closer to 3 hours of audio. You might be able to get away with less, but at that point, the quality of the resulting clone can't be guaranteed.
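As a rough way to check your material against the duration guidance above, here is a minimal sketch in Python. It assumes the audio has already been exported as WAV files; the function names and the three labels are made up for illustration and are not part of any ElevenLabs tool.

```python
import wave

MIN_SECONDS = 30 * 60          # "bare minimum" of 30 minutes
OPTIMAL_SECONDS = 3 * 60 * 60  # "closer to 3 hours" for the best result


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (standard library only)."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def rate_sample_set(durations_seconds):
    """Classify the total duration of a sample set against the two thresholds."""
    total = sum(durations_seconds)
    if total < MIN_SECONDS:
        return "below minimum"
    if total < OPTIMAL_SECONDS:
        return "usable"
    return "optimal"
```

For example, `rate_sample_set(wav_duration_seconds(p) for p in paths)` gives a quick verdict for a list of files. Duration alone says nothing about noise or multiple speakers, so this only complements the quality checks described above.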

Lastly, it's best to use samples where the person is speaking the language that the Professional Voice Cloning (PVC) will mainly be used for. If you clone a voice speaking English and then want it to speak Spanish, it will very likely have an English accent when speaking Spanish. The application will reject your sample if it is recorded in an unsupported language. For now, you are only allowed to clone your own voice. You will be asked to go through a verification process before submitting your fine-tuning request.

plucky jasper
#

Is there a way to clean out background sounds before training?

vital kayakBOT
#
AI-generated Answer for ibros_03183

Yes, it's important to use clean audio samples for training to achieve the best results. The AI will try to replicate everything it hears in the audio, including any artifacts and unwanted audio present in the samples. If you upload low-quality samples with background noise, room reverb/echo, or any other type of unwanted sounds, the AI will try to replicate all of these elements in the clone as well.

To ensure the best quality, you should use high-quality recording equipment and ensure that the audio input doesn’t have any interference, like background music or noise. If necessary, you can use a noise remover to reduce any background noise.

If you're recording new audio, it's recommended to record in an acoustically-treated room to reduce unwanted echoes and background noises. You can make something temporary using a thick duvet or quilt to dampen the recording space.

If you're using existing audio, consider editing your audio beforehand if you’re aiming for a specific sound you want the AI to output. For instance, if you want a polished podcast-like output, pre-process your audio to match that quality.

Remember, the better the quality of the audio you feed into the AI, the better the voice clone will be.
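If your source material is video, one possible way to prepare it is to extract the audio track and apply a light clean-up with ffmpeg before uploading. The sketch below only builds and runs an ffmpeg command from Python. It assumes ffmpeg is installed and on your PATH, and the filter settings (`highpass=f=80` and the `afftdn` denoiser) are illustrative starting points rather than ElevenLabs recommendations. The file names are placeholders.

```python
import subprocess


def build_extract_command(video_path, wav_path, denoise=True):
    """Build an ffmpeg command that pulls a mono WAV track out of a video."""
    # -y overwrites the output, -vn drops the video stream,
    # -ac 1 downmixes to mono, -ar 44100 sets the sample rate.
    command = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "44100"]
    if denoise:
        # highpass cuts low-frequency rumble; afftdn is ffmpeg's FFT-based denoiser.
        command += ["-af", "highpass=f=80,afftdn"]
    command.append(wav_path)
    return command


def extract_audio(video_path, wav_path, denoise=True):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_command(video_path, wav_path, denoise), check=True)
```

Calling `extract_audio("class.mp4", "class.wav")` would write a cleaned mono file next to the video. A simple filter like this can reduce steady background hiss, but it will not remove music or separate multiple speakers, so listen to the result before using it for training.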

plucky jasper
#

Many of the videos we need translated are in a classroom, with interaction with the audience. Will this be a problem?

vital kayakBOT
#
AI-generated Answer for ibros_03183

The AI technology used in ElevenLabs is designed to handle multiple speakers. When you create your dub, you can either specify the number of speakers manually or let the AI automatically decide the number of speakers. Each speaker will be assigned a track, and each speaker will have clips on that track which represent when they're speaking and when they are not.

However, it's important to note that the AI might sometimes pick up background noise and interpret it as a speaker. If this happens, you have the option to remove the track from the project.

In terms of interaction with the audience, as long as the voices are clear and distinguishable, the AI should be able to handle it. However, it's always a good idea to review the AI-generated transcription and translation to ensure accuracy.
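To picture the track-and-clip layout described above, here is a small illustrative model in Python. It is not the format ElevenLabs uses internally. The class names, the `drop_short_tracks` helper, and the 5-second threshold are all hypothetical, chosen only to show how a track created from background noise might be spotted and removed.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video


@dataclass
class Track:
    speaker: str
    clips: list = field(default_factory=list)

    def speaking_time(self):
        """Total seconds during which this speaker is talking."""
        return sum(clip.end - clip.start for clip in self.clips)


def drop_short_tracks(tracks, min_seconds=5.0):
    """Keep only tracks whose total speaking time reaches min_seconds."""
    return [track for track in tracks if track.speaking_time() >= min_seconds]
```

In a classroom recording, a track with only a second or two of "speech" is a likely candidate for noise, but a short genuine question from the audience would look the same, so review such tracks by ear before removing them.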

plucky jasper
#

Once a voice is cloned can it be improved? As the voice is speaking a bit too fast.

vital kayakBOT
#
AI-generated Answer for ibros_03183

Yes, the speed of the voice can be adjusted, but mainly through the samples you provide. The AI attempts to mimic everything it hears in the audio, including the pace of the person talking, so if the person in the original audio speaks quickly, the clone will too. The same applies to the performance in general: if you talk in a slow, monotone voice without much emotion, that is what the AI will mimic, and if you talk quickly with a lot of emotion, it will try to replicate that instead.

It is crucial that the voice remains consistent throughout all the samples, not only in tone but also in performance. If there is too much variance, it might confuse the AI, leading to more varied output between generations. So, if you want the AI to speak slower, you should provide samples where the person is speaking slower.

plucky jasper
#

When using text to speech how can it show inflections in the voice?

vital kayakBOT
#
AI-generated Answer for ibros_03183

The AI model used in the Speech-to-Speech (STS) feature is designed to preserve the tone and delivery of the original voice. This includes the ability to capture and reproduce inflections, emotions, and even accents. The model is capable of accurately following the input speaking cadence, which means it can replicate the rhythm and pattern of the original speech.

In addition, the AI model understands context. If you write something in the style of a book, the AI will try to interpret how to perform a certain passage from the context of the writing itself. To achieve a wider emotive range, you can lower the stability slider, although this may sacrifice some degree of predictability.

Remember, the performance guidelines suggest varying your tone and pace naturally when reading. Emphasis, intonations, and emotions should align appropriately with the context of the text to create a realistic voice conversion.
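If you are working through the API rather than the web interface, the stability setting mentioned above can be passed along with the text. The sketch below only builds the request and does not send it. The base URL, the `/text-to-speech/{voice_id}` path, and the `voice_settings` field names reflect my understanding of the public API and should be treated as assumptions to confirm against the API reference. `VOICE_ID` is a placeholder, and the default values are illustrative.

```python
import json

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL; confirm in the API reference


def build_tts_request(voice_id, text, stability=0.5, similarity_boost=0.75):
    """Return (url, body) for a text-to-speech call without sending it."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = {
        "text": text,
        "voice_settings": {
            # Lower stability allows a wider emotive range but less predictability.
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = build_tts_request("VOICE_ID", "Welcome to today's class!", stability=0.3)
    print(url)
    print(json.dumps(body, indent=2))
```

To actually send the request you would POST the body as JSON with your API key in the request header (assumed here to be `xi-api-key`) and save the returned audio. Trying a few stability values on the same sentence is a simple way to hear the trade-off between expressiveness and consistency.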

plucky jasper
#

I'm using the Creator version, but I'm still limited by the size of video files. Most of our videos are over 60 minutes long. What do you suggest to be able to translate these YouTube videos?