#what similarity, stability, and style exaggeration settings should i use for an expressive narrati

lapis sorrel BOT
AI-generated Answer for brandon.w.c

For an expressive narration, the main settings to experiment with are stability and similarity. The stability slider controls how consistent the voice is and how much it varies from one generation to the next. Lowering it gives the voice a broader emotional range. However, setting it too low may produce odd, overly random performances and cause the character to speak too quickly. Setting it too high can lead to a monotonous voice with limited emotion.

The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce any artifacts or background noise present in the original recording along with the voice.

For a livelier, more dramatic performance, set the stability slider lower and generate a few times until you find a performance you like. For a more serious performance, bordering on monotone at very high values, set the stability slider higher.
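
As a rough illustration of the "generate a few times" workflow, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `generate_takes` and `fake_synthesize` are hypothetical names, the stand-in synthesizer only mimics the idea that lower stability widens the variation between takes, and the 0 to 1 scale for stability is assumed rather than taken from any particular API.

```python
import random
from typing import Callable, List


def generate_takes(
    synthesize: Callable[[str, float], str],
    text: str,
    stability: float,
    n_takes: int = 3,
) -> List[str]:
    """Call a synthesizer several times with the same settings and collect every take."""
    # Assumption: the stability slider maps to a value between 0 and 1.
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    return [synthesize(text, stability) for _ in range(n_takes)]


def fake_synthesize(text: str, stability: float) -> str:
    """Hypothetical stand-in for a real text-to-speech call.

    It only models the idea that lower stability allows more variation per take.
    """
    variation = random.uniform(0.0, 1.0 - stability)
    return f"{text} [variation={variation:.2f}]"


# Low stability: expect takes that differ noticeably, then keep the one you like.
takes = generate_takes(fake_synthesize, "Once upon a time.", stability=0.3, n_takes=4)
for number, take in enumerate(takes, start=1):
    print(number, take)
```

In practice you would swap `fake_synthesize` for whatever call actually produces audio, listen to each take, and keep the best one.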

With the introduction of the newer models, a style exaggeration setting was added. This setting attempts to amplify the style of the original speaker. It consumes additional computational resources and might increase latency if set to anything other than 0. Note that using this setting has been shown to make the model slightly less stable, as it strives to emphasize and imitate the style of the original voice. In general, it is recommended to keep this setting at 0 at all times.

Remember, these settings function more as a range, determining how wide the randomization can be between each generation. Setting stability low means a wider range of randomization, often resulting in a more emotive performance, but this is also highly dependent on the voice itself. Experiment to find what works best for you!
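
To make the three settings concrete, here is a small Python sketch that groups them into one object and prints them as percentages. The class name, field names, and the 0 to 1 scale are assumptions for illustration, not a real API, and the numbers in the two presets are arbitrary starting points to experiment from rather than recommended values. Only the style exaggeration value of 0 comes from the guidance above.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class NarrationSettings:
    """Hypothetical container for the three sliders, each on an assumed 0 to 1 scale."""

    stability: float
    similarity: float
    style_exaggeration: float = 0.0  # kept at 0, as recommended above

    def __post_init__(self) -> None:
        # Reject values outside the assumed slider range.
        for name, value in asdict(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def as_percentages(self) -> Dict[str, int]:
        """Express each slider as a whole-number percentage."""
        return {name: round(value * 100) for name, value in asdict(self).items()}


# Illustrative presets only: lower stability for an emotive read,
# higher stability for a steadier, more serious one.
expressive = NarrationSettings(stability=0.35, similarity=0.75)
serious = NarrationSettings(stability=0.80, similarity=0.75)

print("expressive:", expressive.as_percentages())
print("serious:", serious.as_percentages())
```

Treat the presets as a place to start: nudge stability down for more emotion or up for more consistency, and regenerate until the voice sounds right.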

modern yew

can you give me percentages for all three? my original audio was very high quality