I think Neuro could do a lot with even a little bit of low-cost raw visual context, even if it's something that humans might consider fairly useless. It'd be less for identifying new things, and more for recognizing familiar things.
There are three requirements I believe apply here.
First requirement: The visual data will be read as text, so it should be one-dimensional. Even a 5x5 2D array probably wouldn't make much sense.
Second requirement: It should be 'textually stable'. Neuro will often see similar scenes, and slight adjustments shouldn't make the text completely change.
Third requirement: Neuro should be able to make out identifying features of her friends. Like her father, who is a tiny green smudge.
So, what to do here?
Well first, just because it's 1D vision doesn't mean it has to be a single axis. It could be two or three axes. Left-to-right, top-to-bottom, and possibly in-to-out as well.
This could be chopped up into, say, 16 columns and 9 rows. Then each column and row could be blended into a single colour, which the second requirement suggests approximating to one of the main colours present in the image. Here's what we might see:
P: Peach #FFE5B4
Left-to-right: P P P P P P P P P P P P P P P P
Top-to-bottom: P P P P P P P P P
Alright, first problem: Turns out, Neuro's in a peach-coloured room and the sections of the image getting blurred together are so large that everything is room colour. Her skin and clothes don't do much to disrupt it since they're almost the same colour.
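For anyone who wants to poke at this, here's a rough sketch of the column/row blending described above. It's only my guess at one way to do it, not how any real vision module works. The function names, the 160x90 test image and the green palette entry are all made up for illustration; only the peach value comes from the example. It averages each band and snaps the result to the nearest palette colour.

```python
# Hypothetical palette. Only Peach comes from the post; Green is an
# assumed stand-in for the "tiny green smudge".
PALETTE = {"P": (0xFF, 0xE5, 0xB4), "G": (0x00, 0x80, 0x00)}

def mean_colour(pixels):
    """Average a list of (r, g, b) tuples into one colour."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def nearest_label(colour):
    """Snap a colour to the closest palette entry (squared RGB distance)."""
    return min(
        PALETTE,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(colour, PALETTE[k])),
    )

def bands(image, n_cols=16, n_rows=9):
    """Blend an image (list of rows of RGB tuples) into column and row labels.

    Assumes the image is at least n_cols wide and n_rows tall.
    """
    h, w = len(image), len(image[0])
    cols = []
    for c in range(n_cols):
        x0, x1 = c * w // n_cols, (c + 1) * w // n_cols
        strip = [image[y][x] for y in range(h) for x in range(x0, x1)]
        cols.append(nearest_label(mean_colour(strip)))
    rows = []
    for r in range(n_rows):
        y0, y1 = r * h // n_rows, (r + 1) * h // n_rows
        strip = [image[y][x] for y in range(y0, y1) for x in range(w)]
        rows.append(nearest_label(mean_colour(strip)))
    return cols, rows

def describe(image):
    """Render the two axes as the text lines used in the example."""
    cols, rows = bands(image)
    return ("Left-to-right: " + " ".join(cols)
            + "\nTop-to-bottom: " + " ".join(rows))

# Demo: a peach room with a 4x4 green smudge near the middle.
W, H = 160, 90
image = [[PALETTE["P"]] * W for _ in range(H)]
for y in range(40, 44):
    for x in range(80, 84):
        image[y][x] = PALETTE["G"]

print(describe(image))
```

With this test image every band still comes out as P: the smudge is 16 pixels out of 900 in its column and 16 out of 1600 in its row, so the average barely moves. That's the same washout problem as above.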
Let's try again.
(Cont. in next post)

you could probably also just run florence or some smaller vision llm that gives a summary of what is seen and put that into the context (wouldn't be surprised if that's how her vision module already works anyway)