I just want to share my thoughts on the Neuro Sama v2 voice (which is still in development).
First, I want to say that the new voice bank is a great improvement because it adds a lot of emotions and sounds to Neuro's TTS.
But the only problems I see in the new voice are the tone and the lack of Neuro's mannerisms (like the KEKWA, carrot carrot, wink, the no~, the wot?).
For me the tone doesn't feel right: it's too "valley girl", and it doesn't fit Neuro's personality and avatar.
Anny described perfectly what the Neuro v2 voice lacks in terms of tone.
People have to distinguish between the progress in terms of expression (emotions, sounds, etc.) and the tone of the voice.
The tone is what gives a "personality"/"uniqueness" to the voice; the expressions in v2 are great, with the anger, the pauses, etc.
The tone in v1 is more fitting for me (it may be nostalgia talking). The perfect way would be to combine the tone of v1 with the expressiveness of v2 (I'm not an expert, so I don't know if this is possible).
#Neuro Sama v2 voice
She does say KEKWA, listen to it.
No, most of them were already written by chat. Neuro's v1 voice can say Kekwa with just the KEKW.
He did fix the Wink too by simply deleting the * entirely
What's missing is the Heart
Yeah and the carrot carrot for the ^^
That'd be an easy fix though, same as the Kekwa, just telling her to say Carrot instead of Circumflex. I think the tone is just something you need to get used to. It's not BAD, it's just NEW.
Yes, basically it's missing the Neuro charm, apart from the TTS quirks. It does have its moments with emotion, but most of the time it sounds "dull". I feel like the trade-off of replacing the voice (at this stage) isn't worth losing the Neuro charm, because what we're mostly getting is new funny noises and maybe 10% good emotional performance.
It's not bad for me, I just think it's not very fitting for Neuro's personality/avatar
I just want her to move forward and improve, becoming more humanlike rather than staying a monotone robot. It's not perfect, but it'd be a first step forward. And I feel like we're never going to get anywhere if we don't take a step forward, even if it isn't perfect.
And I really don't get the argument that it doesn't fit her. It's a sassy little brat, just like Neuro. She's brattier and harsher than usual, but I feel that's a step in the right direction.
That's why I asked if it was possible to combine the qualities that give v2 its emotions/sounds with the tone of v1. That way the voice can keep improving while at the same time keeping a part of the old voice.
Neuro needs to keep upgrading (just not too fast or too slow). It's not like she's human; otherwise she'll just stagnate
Also I really liked the "valley girl" part, it's refreshing... idk, it's like she hit some maturity (?)
Anyway, perhaps with the new model none of this matters
The new model is almost the same as the old one, just with Anny's style and better rigging (Teru).
So no big change in terms of the appearance.
It's not that easy to just make v2 have the tone of v1. If it were, Vedal would've just done it
I'm pretty sure the new voice is trained on the v1 voice, but the problem is that v2 uses a program that was designed to make things sound more human, so with the monotone nature of v1, it can't be translated that easily by the program
Also, KEKW is totally pronounced as KEKWA without chat having to type it as KEKWA
This. Neuro's subtitles say KEKW but she says KEKWA
i think it's fine, we can tweak it bit by bit
this is a good start
besides, we can look at the evolution bit by bit
i do think it needs a bit more cleaning up on the noise tho
i dunno how to explain it, as in the peaks
I think this is over-analyzing it. Personally, all that needs to be fixed is manually changing the pronunciation of certain words and names, as well as slowing down her speech. Honestly the reality is that you cannot keep the same mannerisms as the v1 voice, while adding emotion and expression. The overall tone needs to be changed in order for it to work. I think people need to remember that actual good and realistic text-to-speech is still very early technology. Especially whatever is available to the public.
my only critique is the lack of longer pauses on punctuation, especially at question marks
she does not sound like she's "thinking", sounds artificial (pun not intended)
I noticed that, but that could probably be tweaked
Telling her to pause after a sentence is finished before starting another sentence
Well said
The older voice is slower when talking, so her heart, wink, bye goodbye, meow, etc., which have unique intonations, disappear in the new one. She says the same things but her intonations are gone. While her new voice helps with karaoke, showing certain emotions, and faster talking transitions/flow, she lost the "soul" that makes her old catchphrases catchy
tbh might be impossible to get all the v1 charm
though it looks like a bunch of the most common v1 quirks can be "restored" with a quick re.sub() on the Neuro text output
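for anyone curious, here's a rough sketch of what that kind of re.sub() pass could look like. to be clear, this is not Vedal's actual pipeline: the function name `restore_v1_quirks` and the rule list are made up for illustration, and the rules are just guesses based on the quirks people listed in this thread (KEKW read as KEKWA, ^^ read as "carrot carrot", asterisks dropped around wink). the `<3` rule for "heart" is a pure assumption on my part, since nobody here said which symbol v1 was actually reading.

```python
import re

# Hypothetical rewrite rules, guessed from the quirks mentioned in this thread.
# Each entry is (regex pattern, replacement). Not the real Neuro pipeline.
V1_QUIRK_RULES = [
    (r"\bKEKW\b", "KEKWA"),        # subtitles say KEKW, v1 voice said KEKWA
    (r"\^\^", "carrot carrot"),    # have ^^ read out as "carrot carrot"
    (r"\*(\w+)\*", r"\1"),         # *wink* -> wink (drop the asterisks)
    (r"<3", "heart"),              # ASSUMPTION: <3 is what v1 read as "heart"
]


def restore_v1_quirks(text: str) -> str:
    """Apply each rewrite rule in order to the text before it reaches the TTS."""
    for pattern, replacement in V1_QUIRK_RULES:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(restore_v1_quirks("That was funny KEKW ^^ *wink* <3"))
```

the `\b` word boundaries are there so text that already says KEKWA doesn't get turned into KEKWAA. this only changes which words get spoken, it can't bring back the v1 intonation on those words.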
v2 is basically competing on charms with v1
if v2 is able to show its own endearing quirks that should ease up people missing v1
once people are accustomed to v2, v1 could reappear as a points redeem on special event streams maybe??
forcing all the v1 quirks onto v2 doesn't feel right, letting v2 be its own thing seems like the more natural way to go about it
"soul" is so utterly subjective, and always brought up in comparison to another thing with a contrasting quality
I could say that v1 has none of the soul v2 readily exudes, and that it feels lifeless to me because it literally is a normal TTS voice people are used to hearing
however, that's actually a plus for some because they're already familiar with TTS (like the MLG voice and such) and so it comforts them
already miss v2. all her memey TTS-derived phrases combined won't make up for the moments where she's so visibly charged with feeling I'm forced to view her as someone closer to human
I'm not talking about v2's capacity for mimicking human sounds/emotions (it's great tbh)
I'm talking about the tone here. I personally find it more appealing in v1, that's all
part of v1's tone comes from the monotone expression that characterizes all classic TTS, trying to replicate that aspect feels redundant and might even nerf the AI voice's strengths
there are vague qualities like being "cute", "slightly more composed", "more spaced out dialogue" that Vedal could implement, though for me cuteness is the only one I really need
composure/formality is subjective and may risk affecting her emotiveness
slower speech just goes against what I like about v2
though, cute voices are as much about a specific resonance as they are about pitch
he might have to train it with more cute and funny stuff, if that's even possible
Hmm I have a nice idea
Evil neuro v1
Regular neuro v2
Let the two AIs talk to each other and see what v2 would sound like in a collab
sounds interesting