i don't know too much about the technical stuff but i had a thought.
if the image recognition model and neuro were both trained together, would it be able to convey more information to neuro than just training it to output a text description?
like, optimally it would be giving super compressed information that only neuro would be able to understand instead of just text, essentially becoming an extension of her base neural network.
would this take too much training time?
is training two models together like that too difficult?