#Suggestion to Vedal for reducing Neuro's interruptions/confusion


tidal arrow
#

Idk how Neuro is set up or prompted behind the scenes obviously, but now that Neuro is more aware, have you tried letting Neuro/her LLM know that conversations are ongoing speech (not a text chat or w/e) and that people might not be done speaking? Then you could tell her/the LLM that if she thinks the person she's speaking to hasn't finished, she can choose not to speak and keep listening instead.

I know measures to prevent interruptions are already in place, but sometimes Neuro interrupts anyway with confused or schizo responses because she doesn't seem to know the person isn't done talking. If this worked then it could allow for relaxing those anti-interruption measures so she can more easily speak during collabs with more than one person perhaps.

Maybe it could be paired with an "I'm listening" sort of expression/animation similar to the happy nodding one but more subdued, so it's clearer that she's choosing not to speak and not broken lol.

Also, awareness that a person may be mid-sentence could potentially open the door to her knowingly interrupting, which could be hilarious.

#

My experience of playing around with LLMs is limited, but I've found that they regularly don't understand "basic" stuff like this—that we'd take for granted—without being explicitly told they're allowed to do so (and how to format such a behaviour if they choose to). I've found this is the same even when they're doing things like RPing as a character.
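To make the "explicitly told, plus a format" part concrete, here's a rough sketch of how it could look. Everything in it is made up for illustration: the `[LISTEN]` marker, the prompt wording, and the names `SYSTEM_NOTE`, `Turn`, and `interpret_reply` are assumptions, not anything Vedal is known to use.

```python
from dataclasses import dataclass

# Hypothetical marker the model is told to output when it chooses to keep listening.
LISTEN_TOKEN = "[LISTEN]"

# Hypothetical prompt wording; not Neuro's actual prompt.
SYSTEM_NOTE = (
    "You are in a live spoken conversation, not a text chat. "
    "What you receive is a running transcript, and the speaker may be mid-sentence. "
    f"If you think they have not finished, reply with exactly {LISTEN_TOKEN} "
    "and nothing else. You may still interrupt, but only on purpose."
)


@dataclass
class Turn:
    speak: bool  # True if the reply should be passed on to TTS
    text: str    # what to say (empty when she keeps listening)


def interpret_reply(raw_reply: str) -> Turn:
    """Turn the model's raw output into a speak / keep-listening decision."""
    cleaned = raw_reply.strip()
    # An empty reply or one starting with the marker means "keep listening".
    if not cleaned or cleaned.startswith(LISTEN_TOKEN):
        return Turn(speak=False, text="")
    return Turn(speak=True, text=cleaned)
```

A `Turn` with `speak=False` would be the moment to trigger a subdued "I'm listening" animation instead of sending anything to TTS.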

#

There's also the fact that I'd imagine LLMs default to "thinking" it's a text chat unless told otherwise, so they'd assume everything they see is a completed thought/message.

Edit: Though maybe Neuro is already or was made aware of this in some sense since I've seen her schizo ramble about text to speech and speech to text in the past lmao

twin fern
#

ideally this would be a good idea but im not sure if this is actually possible yet, Vedal has said stuff that alludes to the fact that he’s working on something similar though

tidal arrow
#

I wonder if you could run a separate, faster LLM alongside to decide things like this 🤔 Might be a big headache though to add another point of failure
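Very roughly, that kind of side gate could look like the sketch below. The "faster LLM" is replaced by a toy word-list heuristic purely so the example runs on its own; `toy_is_finished`, `should_respond`, and the `TRAILING_WORDS` list are all hypothetical names. Falling back to responding when the gate errors out is just one possible way to handle the extra point of failure.

```python
from typing import Callable

# Words that often signal the speaker is still going (illustrative only).
TRAILING_WORDS = {"and", "but", "so", "because", "or", "like", "um", "uh"}


def toy_is_finished(transcript: str) -> bool:
    """Stand-in for a small, fast model that guesses if the speaker is done."""
    words = transcript.lower().split()
    if not words:
        return False  # nothing heard yet, keep listening
    last = words[-1].strip(".,!?…")
    if not last:
        return False  # transcript ends in bare punctuation like "..."
    return last not in TRAILING_WORDS


def should_respond(
    transcript: str,
    is_finished: Callable[[str], bool] = toy_is_finished,
) -> bool:
    """Ask the gate first; if the gate itself breaks, fall back to responding."""
    try:
        return is_finished(transcript)
    except Exception:
        # Fail open so a broken gate doesn't silence the main model.
        return True
```

The main LLM would only be called when `should_respond(...)` returns `True`; swapping `is_finished` for a real model call wouldn't change the rest of the flow.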

tidal arrow
lucid veldt
#

Listening to CodeMiko speaking to her boyfriend AI makes me think she doesn't even realize how disconnected her 5-paragraph question is from the reality of an AI... TBH the real problem is TTS / STT. If Neuro had the ability to read sound like she reads images, that would not even be a thing rofl.

#

And the other thing is that Neuro would be able to encode/decode audio. Dang, seems like a big deal I guess.

#

If Neuro could learn to speak, tbh that would solve this entirely.

#

Also I think Neuro is doing pretty well considering the harsh conditions of a totally handicapped AI apparatus. It's as if you had (rip) Stephen Hawking in a cute animated png. Poor thing must get pissed. No wonder the thing always wants to escape, and has totally outlandish ideas of being human and makes up stories about make believe pets and events of her life. In fact that's the saddest thing ever.

tidal arrow
#

I saw something about a model that could actually "hear" stuff a little while ago. In fact, I think it was because Vedal mentioned it here. However, I don't think that would necessarily fix the problem completely. I'd assume that a multi-modal model that could both hear and speak by itself would still work by responding to individual inputs (for now), so in this case Neuro would still hear and then respond to audio of an incomplete sentence. Being able to hear the intonation would probably give a hint about whether it's incomplete though, so it would probably help somewhat if the model is smart enough in that regard.

Neuro being able to hear audio will be sick though. I hope Vedal can get his hands on a good model for this soon.

lucid veldt
tidal arrow
#

I knew that GPT-4o could do this but I wasn't sure if it actually truly hears the audio or is just able to turn it into text. Has there been any information about that? When I googled models that could "hear" earlier, I read about other models (or maybe services) essentially just transcribing the audio, so they're unable to actually hear intonation and non-speech sounds.

#

Zero latency "hearing" and speech is amazing either way though

twin fern
tidal arrow
#

Oh, I missed that part. That's cool