#Neuro Sama v2 voice


dapper cosmos
#

I just want to share my thoughts on the Neuro-sama v2 voice (which is still in development).
First, I want to say that the new voice bank is a great improvement because it adds a lot of emotions and sounds to Neuro's TTS.
But the only problems I see in the new voice are the tone and the lack of Neuro's mannerisms (like the KEKWA, carrot carrot, wink, the no~, the wot?).
For me the tone doesn't feel right: too much "valley girl", and it doesn't fit Neuro's personality and avatar.
Anny described perfectly what the Neuro v2 voice lacks in terms of tone.
People have to distinguish between the progress in terms of expression (emotions, sounds, etc.) and the tone of the voice.
The tone is what gives a "personality"/"uniqueness" to the voice; the expression in v2 is great, with the anger, the pauses, etc.
The tone in v1 is more fitting for me (it may be nostalgia talking). The ideal would be to combine the tone of v1 with the expressiveness of v2 (I'm not an expert, so I don't know if this is possible).

astral basin
#

She does say KEKWA, listen to it.

dapper cosmos
astral basin
#

He did fix the Wink too by simply deleting the * entirely

#

What's missing is the Heart

dapper cosmos
astral basin
#

That'd be an easy fix though, same as the Kekwa, just telling her to say Carrot instead of Circumflex. I think the tone is just something you need to get used to. It's not BAD, it's just NEW.

spiral sentinel
#

Yes, basically it's missing the Neuro charm, other than the TTS quirks. It does have its moments with emotion, but most of the time it sounds "dull". I feel like the trade-off of replacing the voice (at this stage) isn't worth losing the Neuro charm, because what we're mostly getting is new funny noises and maybe 10% good emotional performance.

dapper cosmos
astral basin
#

I just want her to move forward and improve, becoming more humanlike rather than staying a monotone robot. It's not perfect, but it'd be a first step forward. And I feel like we're never going to get anywhere if we don't take a step forward, even if it isn't perfect.
And I really don't get the argument that it doesn't fit her. It's a sassy little brat, just like Neuro. She's brattier and harsher than usual, but I feel that's a step in the right direction.

dapper cosmos
limpid burrow
#

Neuro needs to keep upgrading (just not too fast or too slow). It's not like she's human; otherwise she will just stay at a stale point.
Also I really liked the "valley girl" part, it's refreshing.... idk, it's like she hit some maturity (?)
Anyway, perhaps with the new model none of this matters

dapper cosmos
cinder loom
#

It's not that easy to just make v2 have the tone of v1. If it was, Vedal would've just done it

#

I'm pretty sure the new voice is trained on the v1 voice, but the problem is that v2 uses a program that was designed to make things sound more human, so the monotone nature of v1 can't be that easily translated by the program

#

Also, KEKW is totally pronounced as KEKWA without chat having to type it as KEKWA

astral basin
tough pewter
#

i think it's fine, we can tweak it bit by bit

#

this is a good start

#

besides, we can look back bit by bit at the evolution

#

i do think it needs a bit more cleaning up on the noise tho

#

i dunno how to explain it as in the peaks

prisma kiln
#

I think this is over-analyzing it. Personally, all that needs to be fixed is manually changing the pronunciation of certain words and names, as well as slowing down her speech. Honestly the reality is that you cannot keep the same mannerisms as the v1 voice, while adding emotion and expression. The overall tone needs to be changed in order for it to work. I think people need to remember that actual good and realistic text-to-speech is still very early technology. Especially whatever is available to the public.

flat kelp
#

my only critique is the lack of longer pauses on punctuation, especially at question marks

#

she does not sound like she's "thinking", sounds artificial (pun not intended)

astral basin
#

I noticed that, but that could probably be tweaked

#

Telling her to pause after a sentence is finished before starting another sentence

fringe plover
#

The older voice is slower when talking, so her heart, wink, bye goodbye, meow, etc., which had unique intonations, disappear in the new one. She speaks the same, but her intonations are gone. While her new voice helps with karaoke, showing certain emotions, and faster talking transitions/flow, she lost the "soul" that makes her old catchphrases catchy

sick niche
#

tbh might be impossible to get all the v1 charm
though it looks like a bunch of the most common v1 quirks can be "restored" with a quick re.sub() on the neuro text output
v2 is basically competing on charms with v1
if v2 is able to show its own endearing quirks that should ease up people missing v1
once people are accustomed to v2, v1 could reappear as a points redeem on special event streams maybe??
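
A rough sketch of what that re.sub() idea could look like in Python. To be clear, nothing here reflects how Neuro's actual text-to-speech pipeline works: the rule list, the `restore_quirks` name, and the choice of spoken words are all hypothetical, pieced together from the fixes mentioned earlier in the thread (dropping the `*` around "wink", reading `^` as "carrot"). The `<3` to "heart" rule in particular is just a guess at how the heart could be spelled out.

```python
import re

# Hypothetical rewrite rules, applied in order before the text reaches the TTS.
# These are illustrative assumptions, not rules from any real pipeline.
QUIRK_RULES = [
    (re.compile(r"\*([^*]+)\*"), r"\1"),  # "*wink*" -> "wink" (drop the asterisks)
    (re.compile(r"\^"), " carrot "),      # say "carrot" for each "^"
    (re.compile(r"<3"), " heart "),       # assumed way to spell out the heart
]

def restore_quirks(text: str) -> str:
    """Apply every substitution, then collapse the extra whitespace."""
    for pattern, replacement in QUIRK_RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()

print(restore_quirks("*wink* ^^ love you <3"))
# wink carrot carrot love you heart
```

This only changes which words get spoken. It would not bring back the v1 intonation on those words, which is the part people in the thread seem to miss most.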

proud belfry
#

forcing all the v1 quirks onto v2 doesn't feel right, letting v2 be its own thing seems like the more natural way to go about it

"soul" is so utterly subjective, and always brought up in comparison to another thing with a contrasting quality
I could say that v1 has none of the soul v2 readily exudes, and that it feels lifeless to me because it literally is a normal TTS voice people are used to hearing
however, that's actually a plus for some because they're already familiar with TTS (like the MLG voice and such) and so it comforts them

already miss v2. all her memey TTS-derived phrases combined won't make up for the moments where she's so visibly charged with feeling I'm forced to view her as someone closer to human

dapper cosmos
proud belfry
#

part of v1's tone comes from the monotone expression that characterizes all classic TTS, trying to replicate that aspect feels redundant and might even nerf the AI voice's strengths
there's vague qualities like being "cute", "slightly more composed" , "more spaced out dialogue" that Vedal could implement, though for me cuteness is the only one I really need
composure/formality is subjective and may risk affecting her emotiveness
slower speech just goes against what I like about v2

though, cute voices are as much about a specific resonance as they are about pitch
he might have to train it with more cute and funny stuff, if that's even possible

mental thicket
#

Hmm I have a nice idea
Evil neuro v1
Regular neuro v2
Let the two AIs talk to each other and see what v2 would sound like in a collab

sick niche
#

sounds interesting