I'm new here! From the FAQ, it sounds like Vedal doesn't like talking too much about how Neuro works, so hopefully this is high-level enough cuz I've been curious - what inputs does Neuro have?
Obviously she can hear Vedal and can see games she plays, and can see chat (or maybe only highlighted chat messages?), but she also seems to be able to see what viewers can see on stream, and maybe the audio from programs Vedal has open?
Also in the FAQ, I read "How does the AI see Minecraft?" and "How does the AI see osu?", but it feels like there's a more general answer for everyday streams.
Is there a set of inputs Neuro has in general / by default?
#What inputs/senses does Neuro have?
it might be productive to think about what components 'Neuro' is made of
my assumption is that there is a core non-multimodal LLM, an emotion analyser (for animations), a memory module, animation modules, STT, TTS, vision modules, game interfaces, and conventional code that ties it all together
at the end of the day, all of Neuro's memories, chat messages etc. get converted into text that is fed to an LLM, that then outputs text. This text might have commands or a response.
LLMs just take in text and spit out text.
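To make that concrete, here's a toy Python sketch of the "everything becomes text" idea. It is purely illustrative and not how Neuro is actually built: the `Event` type, the `[source]` labels, the `<cmd:...>` tag format, and `fake_llm` are all made-up names for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    source: str  # hypothetical labels, e.g. "chat", "stt", "memory"
    text: str    # whatever the upstream module produced, already as text


def build_prompt(events):
    """Flatten every input into labelled lines of plain text."""
    return "\n".join(f"[{e.source}] {e.text}" for e in events)


def fake_llm(prompt):
    """Stand-in for the real model: text in, text out."""
    return "Hi chat! <cmd:wave>"


def split_reply(reply):
    """Separate the spoken response from hypothetical <cmd:...> tags."""
    commands = re.findall(r"<cmd:(\w+)>", reply)
    speech = re.sub(r"\s*<cmd:\w+>", "", reply).strip()
    return speech, commands


events = [Event("chat", "hello neuro"), Event("stt", "Vedal: say hi")]
speech, commands = split_reply(fake_llm(build_prompt(events)))
```

The point is just that the model in the middle never touches raw audio or pixels; the surrounding code decides what gets turned into text on the way in and what to do with the text on the way out.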
Can Neuro see games she plays?
Is this agreed? I thought she was, at most, getting a text description of the game. I think the game AIs mostly run separately from Neuro's LLM.
Can Neuro hear audio from programs?
I don't think so. Neuro probably uses STT, which means she can only hear words. A multimodal LLM that takes in sound would be able to naturally 'interpret' gunshots or Windows error sounds.
This hypothesis is corroborated by what Vedal says about the potential of multimodal LLMs.
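A toy Python sketch of why an STT-only setup would miss non-speech audio. Everything here is hypothetical (`AudioSegment`, `toy_stt`, and the "speech"/"sound" labels are invented for the example); a real STT system works on waveforms, not labelled segments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioSegment:
    kind: str     # "speech" or "sound" (made-up labels for this sketch)
    content: str  # the spoken words, or a description like "gunshot"


def toy_stt(segment: AudioSegment) -> Optional[str]:
    """Pretend speech-to-text: only speech produces a transcript."""
    if segment.kind == "speech":
        return segment.content
    return None  # non-speech audio yields no text at all


def transcribe(stream):
    """Collect the text that would actually reach the LLM."""
    transcripts = []
    for segment in stream:
        text = toy_stt(segment)
        if text is not None:
            transcripts.append(text)
    return transcripts


stream = [
    AudioSegment("speech", "nice shot"),
    AudioSegment("sound", "gunshot"),
    AudioSegment("speech", "oh no"),
]
heard = transcribe(stream)
```

In this sketch the gunshot never shows up in `heard`, which is the gap a model with native audio input would close.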
Can Neuro see what viewers see on stream?
I don't think so. I doubt Neuro's vision is on constantly by default. It's probably only turned on for fanart segments (or she receives pre-prepared textual descriptions of the fanart).
Note that this will all change when Vedal gets his hands on some good multimodal LLMs, that have vision and hearing inbuilt. (E.g. Big Zuck will release a multimodal LLM sometime this summer)
This makes total sense, thank you! I figured that the regular chat kinda knew what Neuro was "see"ing, but it seems there's a lot of mystery to everyone. Appreciate the response!