#I would love to be able to split a


upbeat terrace
#

Have you tried selecting a section (a whole sentence) and regenerating the whole sentence? I think the results are much better than regenerating a single word.

outer ivy
#

From my perspective, it's about flexibility:

Sometimes I realize that a sentence needs to be rewritten because of how it sounds (and that's not always due to the voice generation; it could just be badly written).

If that's a single sentence in a long paragraph, I need to regenerate the FULL thing if I change that one single sentence, even if I love all the other output.

So I see two options:

Cut the paragraph at the sentence and regenerate only that sentence plus whatever follows it or whatever came before it (basically, allow us to isolate regenerations).

Or you can allow isolated sentence level changes that don't trigger regenerations for full paragraphs.
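To make that second option concrete, here is a rough sketch of what sentence-level regeneration could look like if you script it yourself. It is not how any particular tool works. The `synthesize` argument is a hypothetical stand-in for whatever text-to-speech call you use, and the sentence splitter is deliberately naive (it will trip on abbreviations like "Dr."). The idea is to cache one clip per sentence, keyed by the sentence text, so editing one sentence only triggers one new generation.

```python
import hashlib
import re
from typing import Callable, Dict, List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def render_paragraph(
    paragraph: str,
    synthesize: Callable[[str], bytes],  # hypothetical stand-in for a TTS call
    cache: Dict[str, bytes],
) -> List[bytes]:
    """Return one audio clip per sentence, reusing cached clips for unchanged text."""
    clips = []
    for sentence in split_sentences(paragraph):
        # Key the cache on the exact sentence text, so any edit forces a new clip.
        key = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = synthesize(sentence)
        clips.append(cache[key])
    return clips
```

Calling `render_paragraph` again after changing one sentence, with the same `cache`, would only call `synthesize` for the changed sentence. The trade-off is that clips generated in isolation may not match their neighbours in pace or tone, so you would still need to listen through the joined result.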

mossy prism
#

Yeah, I'm mainly struggling with consistency of the speed, the loudness and the emotion. So when you regenerate a whole sentence in the middle of a larger paragraph, it has no flow. I keep regenerating until it sounds close enough, then make further changes (mostly to speed) when I edit the final track.

I've experimented with splitting everything up into single sentences and with rendering the whole thing in one go, but it's still all over the place from one sentence to the next. So I need to regenerate many times to get it to match the previous paragraph, so it flows like one person reading it in the same sitting (I am replicating my spoken voice, which appears alongside other 'live' recordings, so it needs to sound like me).

I have no idea how people manage to do full audiobooks with consistency.

outer ivy
#

One thing I've noticed, Karen, and you may already do this but I'll post it in case not:

The quality of the original voice is critical. I spend probably 30 minutes and go through 30+ voices in voice design before I choose one. Then I further test it in text-to-speech before selecting it for production.

It's very time-consuming up front, but it saves a lot of time in production.

mossy prism
#

So I've created a professional voice clone, as I am working on a HUGE project, and for me to sit and 'live' record the audio is soooo time-consuming (recording and editing). The voice clone also means someone other than me can work on the project. I recorded 3+ hours of me speaking in the way I want to clone (i.e. slow, meditative, relaxing, chill).

So this makes for an interesting process as I know what I need to sound like and it's very obvious when it's off. 😂 Add into the mix that I'm Scottish and that the AI systems generally struggle with accents 😉