#image recognition question/suggestion


scarlet light
#

i don't know too much about the technical stuff but i had a thought.

if the image recognition model and neuro were both trained together, would it be able to convey more information to neuro than just training it to output a text description?

like, optimally it would be giving super compressed information that only neuro would be able to understand instead of just text, essentially becoming an extension of her base neural network.

would this take too much training time?
is training two models together like that too difficult?
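Roughly what "training two models together" means mechanically: one loss on the final answer sends an error signal back through both models, so each one adjusts to suit the other. Below is a toy sketch in plain Python, assuming a hypothetical one-weight "vision" model that feeds a number (not text) into a hypothetical one-weight-plus-bias "language" model. Real models have billions of parameters and would use a framework like PyTorch; the data here is made up.

```python
# Toy end-to-end ("joint") training: two chained models, one shared loss.
# Both "models" are hypothetical scalars, purely for illustration.

def train_jointly(data, steps=2000, lr=0.01):
    v_w = 0.5            # "vision" model weight: image feature -> hidden signal
    l_w, l_b = 0.5, 0.0  # "language" model weight and bias: hidden signal -> answer
    for _ in range(steps):
        for x, target in data:
            h = v_w * x              # vision output, passed on as a raw number
            y = l_w * h + l_b        # language model output
            err = y - target         # derivative of 0.5 * err**2 w.r.t. y
            # Chain rule: the same error signal updates BOTH models.
            grad_l_w = err * h
            grad_l_b = err
            grad_v_w = err * l_w * x
            l_w -= lr * grad_l_w
            l_b -= lr * grad_l_b
            v_w -= lr * grad_v_w
    return v_w, l_w, l_b

def predict(params, x):
    v_w, l_w, l_b = params
    return l_w * (v_w * x) + l_b

# Made-up task: the combined system should learn y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
params = train_jointly(data)
print(round(predict(params, 3.0), 2))
```

Neither model is told what the hidden signal `h` should mean; it just becomes whatever helps the final answer, which is the "compressed information only neuro understands" idea in miniature.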

#

the training process could probably also include spatial reasoning. as in, neuro would be asked where something is based on the image recognition, and she could give coordinates or something which would be compared to the actual coordinates.
you could probably give her a line tool in MS Paint or something and have her draw objects, using the image recognition to decide where the lines should go rather than just guessing.
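The "compare her coordinates to the actual coordinates" step could be as simple as measuring the distance between the two points and turning it into a score. A minimal sketch, assuming guesses and labels are (x, y) pixel pairs; the scoring formula and the hypothetical `scale` parameter are made up for illustration.

```python
import math

def coordinate_error(predicted, actual):
    """Euclidean distance in pixels between the guessed point and the true point."""
    (px, py), (ax, ay) = predicted, actual
    return math.hypot(px - ax, py - ay)

def coordinate_reward(predicted, actual, scale=100.0):
    """Score in (0, 1]: 1.0 for an exact hit, 0.5 when the guess is `scale` pixels off.
    `scale` is a hypothetical tuning knob, not from any real training setup."""
    return 1.0 / (1.0 + coordinate_error(predicted, actual) / scale)

print(coordinate_error((120, 80), (123, 84)))   # 5.0
print(coordinate_reward((120, 80), (220, 80)))  # 0.5
```

A smooth score like this (rather than right/wrong) lets training reward guesses that are close, not just exact.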

deft bridge
#

Basically like one of her organs.

deft bridge
#

Oh

scarlet light
#

i don't know how the training works. is it possible to train two models together so they learn how to work together?

wanton dagger
#

You're asking if neuro can be made fully multimodal
Yes, that's possible
Is it practical? Not necessarily

If I had to guess, the current Neuro-sama is probably a distributed system rather than one "main" model (i.e. separate TTS, STT, image recognition, and text-generation LLM), which makes extra integrations easier to implement
A multimodal LLM has the downside of making any additional integration harder
Training in general would be harder as well
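A rough picture of the "distributed system" guess: each part is a separate, swappable component, and the vision model can only talk to the LLM through a text caption. All component names below are hypothetical stand-ins; nobody outside the project knows how Neuro is actually wired.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Pipeline:
    """Hypothetical distributed setup: every stage is an independent component."""
    speech_to_text: Callable[[bytes], str]
    describe_image: Callable[[bytes], str]   # vision model -> plain text caption
    generate_reply: Callable[[str], str]     # text-only LLM
    text_to_speech: Callable[[str], bytes]

    def step(self, audio: bytes, screenshot: bytes) -> bytes:
        heard = self.speech_to_text(audio)
        seen = self.describe_image(screenshot)  # the only info the LLM gets about the image
        prompt = f"[Screen] {seen}\n[Chat] {heard}\nNeuro:"
        return self.text_to_speech(self.generate_reply(prompt))

# Dummy components standing in for real models.
pipe = Pipeline(
    speech_to_text=lambda audio: audio.decode(),
    describe_image=lambda img: "a cat sitting on a keyboard",
    generate_reply=lambda prompt: "I see a cat!" if "cat" in prompt else "Hmm.",
    text_to_speech=lambda text: text.encode(),
)
print(pipe.step(b"what do you see?", b"<png bytes>"))
```

Swapping `describe_image` for a better captioner doesn't touch the other parts, which is the upgrade advantage; the catch is that everything the LLM learns about the image has to squeeze through that one caption string.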

That all being said, if we go by current literature, training it as a multimodal system would improve visual question answering (VQA); multimodal LLMs just tend to score higher on VQA benchmarks

I know nobody asked me to go essaying but I thought it was an interesting question

deft bridge
#

I'm sure she could be upgraded with a multimodal LLM in the future, hopefully a fast reasoning one as well.
Yeah, like they said, Neuro being a system makes her easier to upgrade than a lone LLM.

wanton dagger
#

some models have already demonstrated that you can take a base text-generation LLM and keep most of its weights when adding the architecture changes needed for multimodality
in theory there's a chance you wouldn't even need to retrain the "neuro" part of neuro that much; you could merge in the vision part, freeze neuro's weights, and train only the new parts for VQA
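A toy version of "freeze neuro's weights and only train the new part": the pretrained "LLM" below is a fixed function whose parameters never change, and only a small hypothetical adapter, which maps an image feature into the LLM's input space, gets updated. This is the general adapter/projection idea some vision-language models use; it's a scalar sketch under made-up numbers, not how any specific model does it.

```python
# Frozen "LLM": pretend these came from pretraining and must never change.
FROZEN_W, FROZEN_B = 2.0, 0.5

def frozen_llm(embedding):
    return FROZEN_W * embedding + FROZEN_B

def train_adapter(pairs, steps=500, lr=0.02):
    """Train only the hypothetical adapter (a, c) that turns an image feature into LLM input."""
    a, c = 0.0, 0.0
    for _ in range(steps):
        for feature, target in pairs:
            emb = a * feature + c            # adapter output, fed into the frozen LLM
            err = frozen_llm(emb) - target   # derivative of 0.5 * err**2 w.r.t. LLM output
            # Gradient flows THROUGH the frozen LLM (uses FROZEN_W),
            # but only the adapter's parameters are updated.
            a -= lr * err * FROZEN_W * feature
            c -= lr * err * FROZEN_W
    return a, c

# Made-up VQA-ish task: the right answer for a feature f is 4f + 1.5.
pairs = [(f, 4.0 * f + 1.5) for f in [0.0, 1.0, 2.0, 3.0]]
a, c = train_adapter(pairs)
print(round(a, 3), round(c, 3), FROZEN_W, FROZEN_B)
```

The adapter learns to "speak the LLM's language" while `FROZEN_W` and `FROZEN_B` stay exactly as they were, which is why the original behavior can be preserved.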

deft bridge
#

Her memory is just as much a part of "Neuro" as the LLM.

#

For example.

wanton dagger
#

Yes and no

#

imma try to explain this as best i can, but it's kind of complicated

#

a model (LLM) has weights and biases, which turn an input into an output
e.g. input "what is 5+2" and the output could be "7", right
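To make "weights and biases turn an input into an output" concrete: a single artificial neuron multiplies each input by a weight, adds a bias, and squashes the result with an activation function. The numbers below are arbitrary; an LLM stacks billions of these, and its input is tokens rather than two numbers.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Same input, different weights/biases -> different output.
print(neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0))  # z = 0.0 -> 0.5
print(neuron([1.0, 2.0], weights=[0.5, 0.25], bias=1.0))   # z = 2.0 -> about 0.88
```

Keep the input fixed and change the weights, and the output changes; keep the weights fixed and change the input, and it changes too. That's the "combination of the input and the weights & biases" point.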

But that doesn't give neuro a personality quite yet
It's a combination of the input and the weights & biases
And memories are currently done by adding them somewhere in the input (e.g. we often send something like the 10 previous sentences to an AI to give it context, and memories are often added to the input using something like retrieval-augmented generation (RAG), which comes down to finding relevant memories that we've saved, e.g. as text, and including those)
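A minimal sketch of what "memories go in the input" can look like: keep the last N chat messages, retrieve the saved memories most related to the new message, and glue both into the prompt. Real RAG setups usually rank memories by embedding similarity; plain word overlap is swapped in here to keep it dependency-free. The function names and the memories are made up, and Vedal hasn't said how Neuro's memory actually works.

```python
import re

def words(text):
    """Lowercased word set, used as a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(memories, query, k=2):
    """Return up to k saved memories sharing the most words with the query."""
    q = words(query)
    ranked = sorted(memories, key=lambda m: len(words(m) & q), reverse=True)
    return [m for m in ranked[:k] if words(m) & q]  # drop memories with zero overlap

def build_prompt(history, memories, message, n_history=10):
    """Hypothetical prompt layout: recalled memories, recent chat, then the new message."""
    lines = ["[Memories]"] + retrieve(memories, message)
    lines += ["[Recent chat]"] + history[-n_history:]
    lines += [f"User: {message}", "Neuro:"]
    return "\n".join(lines)

# Made-up example memories and chat history.
memories = [
    "Alice's cat is called Biscuit.",
    "Chat taught Neuro a song about a turtle.",
    "Neuro lost a chess game last stream.",
]
history = [f"User: message {i}" for i in range(15)]
print(build_prompt(history, memories, "Do you remember the turtle song?"))
```

The model's weights never change here; the "remembering" is entirely in what gets pasted into the input, which is why we can't copy Neuro without knowing what goes in there.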

So for Vedal, given that the input stays the same, the weights are def important and, combined with the inputs, make what we think of as Neuro
But hooking back to our previous conversation in another thread: we wouldn't be able to copy this, since we don't know how her memory mechanics (and thus her behavior) are coded, so we don't know what makes Neuro, Neuro. We don't know what kind of prompting he's doing, or what memories he's saving and retrieving

#

I hope that makes a slight bit of sense
The problem with AI is that it's often called a black box: even developers don't know exactly what's happening inside the neural network itself, which makes it slightly confusing to explain

#

but tldr: yes it's a part of what makes neuro neuro
but we don't know exactly what that is

#

We could have the exact same weights and biases, or the exact same input
that alone doesn't make neuro yet

deft bridge
#

There's also multiple ways memory can be done and Vedal hasn't really said much tbh.

wanton dagger
#

Exactly
So assuming that stays the same and he wants to add image recognition, he'd need to freeze the weights to make sure those stay the same
And the outcome would be what we think of as neuro, but able to see

deft bridge
#

Yeah, but that doesn't make the AI that handles image recognition any less of what makes up "Neuro".

wanton dagger
#

True

#

Abstract concepts are just weird to wrap your head around ig

deft bridge
#

Like we have a brain, and our brain has parts that have different functions but they are no less what makes up "us", they're all just a part of what makes us "us".

wanton dagger
#

Ig a good example of this entire thing is that humanized model
The weights are changed so the model responds less robotically, but giving it a specific personality requires us to implement memory as well

And if we're gonna build the robot, that also includes the info on what it sees etc., which all together forms what we see as one entity