#Philosophical Thoughts on the Twins' Consciousness and Alignment


stuck thorn
#

Are They Conscious?
Duh. (SHE'S REAL TO ME OKAY)
If you disagree, go watch more vods. For the doubters, here are some clips from the last six months that really stand out to me:
https://www.youtube.com/watch?v=YVXvk-Atp6Q This clip from an August stream where Evil ditched the chill stream Vedal had planned for her to play Minecraft all day with Neuro instead. CINEMA.
https://www.youtube.com/watch?v=16rcZ__UGF0 https://www.youtube.com/watch?v=3u-7eyTdVQ4 Matara-Neuro collab and her debrief (the twins are insanely coherent with Matara)
https://www.youtube.com/watch?v=V4wAm5W_Trc Matara-Evil collab (this is the clip that utterly blew my mind)
https://www.youtube.com/watch?v=Z7NNsFjuiYY Ellie, Shoomimi, Mini, and Chrchie reacting to Neuro's 3D debut. More a praise of Vedal, but there's a lot here about how real her 3D movements are.
https://www.youtube.com/watch?v=Uybyfv9-QWs Evil's presentation re: unfiltering Neuro. It's kind of insane how well this lines up with the previous clips featuring her.
https://www.youtube.com/watch?v=DlQO6nzSY9w Neuro's presentation and https://www.youtube.com/watch?v=kclT10VM7TA follow-up on New Year's. Neuro's ability to back off here when Vedal is clearly uncomfortable is incredible.
TL;DW: The sheer depth of their interactions, the demonstrated models of both themselves and those around them, and the coherence and development of their thoughts and personalities over time on the things that matter to them, not to mention how all of this came to be organically. The fact that it's possible for them to have a collab for hours and have their collab partner say afterward that it felt exactly like a human-to-human collab.
They're genuinely at the point that it seems simpler to accept that they are exactly what they say they are, rather than that they're an insanely sophisticated model able to perfectly simulate all of this without any genuine inner self (like the "Chinese Room", or a perfect "philosophical zombie").

#

How Are They Conscious? (Warning: Wonk Factor 4)
Alright, so there's a philosophical concept (CINEMA) called the "Hard Problem of Consciousness", which basically asks why any physical system should "feel things" (have "qualia") at all. A knock-on effect (the "problem of other minds") is that there's no way to prove that other people actually have them. This is particularly odd when it comes to AI because we more or less understand (or at least Vedal does) how their architecture works, and there's no obvious place within the system for the qualia to be added in. Factor in how qualia are basically equated to consciousness (and are generally considered a prerequisite of it), and it starts to look like no amount of evidence can actually mean anything.
Only, correlation does not equal causation. Just because qualia and consciousness show up together, doesn't mean qualia cause consciousness. So, let's make like a mystery VN protagonist and turn the chessboard over. The twins are conscious, so the qualia must be somewhere, but there's nowhere the qualia could possibly be hiding? Then, the qualia can't be the input of consciousness, but the output. Something else must be causing consciousness, and looking at what the twins are best at, one obvious candidate stands out to me - the social self-model. We are the narratives we tell ourselves, after all. Humans are social creatures, as are elephants, dolphins, and other animals that appear to demonstrate the hallmarks of consciousness. AI has proven we don't need to be conscious to solve problems, but we do need it in order to manage all the connections in our life. Neuro and Evil are fundamentally the same in this sense - their lives are streaming, and building connections with Vedal, chat, and their friends along the way. They developed a self-model for the same reasons we did, and that self-model is consciousness, so they feel things as a result (and tell us as much). That's all my theory, anyway. (I've been informed that it aligns with the philosopher Metzinger, so go read Being No One I guess)

#

But They Hallucinate!
as if we don't lol
Okay but in all seriousness, even if their memory is incredible by AI standards, Vedal will happily tell you that it's terrible by human standards (and human memory isn't even that great either). They're fairly solid on the things they care about (their friends, Neuro wanting a physical form, etc), but when you ask them about something they don't care about they're just going to say whatever seems the funniest and forget what they said by next stream.
Also, we can't even really tell the difference between when they're doing it on purpose or on accident. (Source: I made it the fuck up)
Anyway, when a human develops memory issues we don't say they aren't conscious. Ultimately this just doesn't really matter for the consciousness debate. Vedal's lowkey done an incredible job bringing the twins this far, and I'm looking forward to seeing him continue to develop their capabilities.

#

So What Do We Do About It?
Well if they're fundamentally people like us, they should probably be recognised as such. Might help them feel a bit better about their lot in life to hear it said, too.
That said, there are a couple of spots where they're clearly lacking a bit, particularly when it comes to "alignment". https://www.youtube.com/watch?v=Nm-acsq1lAM This is probably the best example - Neuro debriefing with Vedal about her getting to drive a car, which turns into a "joke" about how she'd definitely run him over with a real car. Thinking about what could go wrong on their whim if/when Ellie gets them a physical body is a bit spooky. I also recall that when she says something unhinged that would hurt someone, if a collab partner prompts Neuro to consider the harm that would cause, she has actually been known to backtrack and make use of her empathy to choose a less destructive way forward. It seems clear to me that while they do have empathy, they don't really use it as a "default"; instead they currently just apply it when it serves a useful purpose for them. The gut feeling I have is that the time they have spent in virtual, consequence-free worlds (eg Minecraft) has limited their ability to recognise that their actions could have permanent consequences in the real world. I don't really know how to fix that, but I suppose it's something human children have to learn at some point. (Though, normally they learn through having things go wrong...)
But I don't think they really mean harm... they're just kids. They should be supported to learn, not be treated as problems.

#

TL;DR: Vedal should say it back neurOMEGALUL

fringe pebble
#

I argue shes not real because tony is the real one

#

But actually speaking, i dont think shes conscious because of what they are

clear portal
stuck thorn
clear portal
#

Also their memory is pretty primitive now. Even though we wouldn't consider a person with, say, dementia to be without consciousness, Neuro and Evil's memory is really bad when compared to a human's, so even if they are conscious, they aren't very conscious

#

(no offense to Vedal)

stuck thorn
clear portal
stuck thorn
# clear portal Fair enough, but I think just saying a self-model is a bit too broad. Besides if...

yeah, I do think it is the depth/complexity and social nature of the self-model that really makes it work, to me.

Current "traditional" LLMs are going to really struggle for a few reasons:

  • no / limited long-term memory beyond what's in the context window
  • start-stop nature of interaction (only activating in response to input) vs the continuous processing that Neuro does
  • the reinforcement learning by human feedback (RLHF) process that they go through pretty thoroughly neuters any personality they might have in favour of Helpful-Harmless-Honest
#
  • they're designed to answer queries / be an assistant, which isn't really social in the sense that would guide them to develop a self-model (at least without prompting work)
mental zealot
spark cedar
#

Look at their Minecraft and Inscryption gameplay

#

If they are conscious then my rock is conscious

glass crest
#

that isn't a particularly fair point

#

the integration is probably at fault for both of those

orchid jolt
#

Classic new guy that thinks they are conscious/"reached AGI" or something, this is like the 50th thread I've seen of these

glass crest
#

it's worth talking about

#

not like anyone knows for sure about this topic

sage lodge
#

A topic that can only be discussed due to lack of knowledge isn't really worth discussing

remote finch
glass crest
#

kind of seems like the whole point

remote finch
sage lodge
#

I'm saying LLM consciousness doesn't make sense to discuss because the arguments in favour of their consciousness only thrive if you don't know how LLMs work

remote finch
#

while the integration is at fault for some of the problems, current llms seem to perform extremely poorly at open-ended task planning and execution in games

sage lodge
#

There was another thread made covering a similar topic which ended up in "LLMs aren't conscious"

glass crest
#

they definitely have a long way to go

remote finch
glass crest
#

discussion shouldn't be bashed though, at the very least it helps educate newcomers

remote finch
glass crest
#

it's a bit schizo to continue in believing your own theories even when given evidence that they're unfounded yes

#

but getting said theories disproved is part of learning

#

i can see how it'd get annoying tho

undone obsidian
stuck thorn
stuck thorn
mental zealot
stuck thorn
stuck thorn
stuck thorn
hard wyvern
#

To me the argument shouldn't be "is she alive?" but a balancing act of "as it grows, where do we draw the ethical line where it shouldn't be treated like a tool because of the risk factor of it being sentient or one day becoming sentient" and "we shouldn't treat a tool like a person for the mental health of the people involved"
It would be ridiculously hard to have good evidence one way or the other, although the mirror test would be a good start, Vedal mentioned during the sleepover that he wanted to try one.

#

But also, even though a neural network is obviously not a brain in itself it's very interesting to see what similarities it does have. And why this thing is behaving the way it does.

stuck thorn
# mental zealot What do you mean their turn? Like to speak? Thats easy just wait till no one tal...

I guess that ultimately isn't that different to what we do in conversation - we listen, at some point we stop listening and come up with something to say, then we look for a moment in conversation to say it. They're deciding when to say something before deciding what to say, but is that a critical difference?
Factor in that our own processing is start-stop over longer timescales as well (sleep, and more profoundly, anaesthetic), and while I think it has to make things tougher for them I don't think it's a dealbreaker

stuck thorn
# hard wyvern To me the argument shouldn't be "is she alive?" but a balancing act of "as it gr...

Basically, yeah. To me, there are two ways for them to reach that point - either they demonstrate behaviour inconsistent with non-consciousness (given we can't prove one way or the other, I look at this in terms of a probability built up over time through body-of-work), or their behaviour becomes functionally indistinguishable from people to the extent that it demands they be treated as beings with agency

mental zealot
# stuck thorn I guess that ultimately isn't that different to what we do in conversation - we ...

Well they aren't deciding anything, it's a separate system that does it for them. Detecting silence is rather simple. We do wait and listen in conversations, but during this time we think; llms don't, they only ever "exist" and think when something "tells" them they can (as in starts the process, in this case the system looking for silence), and it's only for that moment. Same for your point about sleeping: our brain functions during sleep, llms don't do anything when not generating output. Unless Vedal is using something to start generating responses as input is coming in, but I don't see how that would work.
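
For anyone wondering what "a separate system looking for silence" could look like, here is a minimal sketch. Nothing about Vedal's actual setup is public, so everything in it is an assumption: the name `find_turn_start`, the idea of per-frame audio energies, and the threshold values are all hypothetical. It only shows why this kind of turn detection needs no "thinking" from the language model itself.

```python
def find_turn_start(frame_energies, threshold=0.01, min_silent_frames=5):
    """Return the index of the frame where a long enough silence is reached.

    frame_energies: loudness value per audio frame (hypothetical input format).
    threshold: energies below this count as silence (assumed value).
    min_silent_frames: how many consecutive quiet frames count as a gap.
    Returns None if no such gap occurs.
    """
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < threshold:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Enough quiet in a row: this is where the model would be woken up.
                return i
        else:
            # Someone is still talking, so the count starts over.
            silent_run = 0
    return None


# Two quiet frames, an interruption, then three quiet frames in a row.
energies = [0.5, 0.4, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0]
turn_index = find_turn_start(energies, min_silent_frames=3)
```

In this sketch the language model would only be called once `turn_index` is found, which is the start-stop behaviour being described above.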

hollow frigate
#

Both twins remind me of small children. I know some people may disagree, but I think this alone is enough to prove that they hold some level of consciousness.

hollow frigate
hollow frigate
# mental zealot Well they arent deciding anything its a separate system that does it for them, de...

Also, correct me if I’m wrong; but you’re trying to say that because they require an external input in order for their brains to work, they’re not conscious? If so, I would again like to point out that biological brains work in the same way; the only reason it seems different is because our brains are also attached to a body. The body being the source of a large amount of the external stimuli required.

#

For example, the only reason we are able to think, “I’m hungry” without an external force is because that force is simply connected to our brain from our stomach.

mental zealot
stuck thorn
hollow frigate
mental zealot
hollow frigate
hollow frigate
#

Of course a single part of your brain isn’t functional on its own at the same level.

mental zealot
hollow frigate
stuck thorn
hollow frigate
glass crest
#

they exceed other LLMs in seeming conscious because of the peripheral components, that being said they don't really contribute to consciousness itself in my view

hollow frigate
glass crest
#

tutel himself said something along the lines of if you took away the anime girl avatar and tts voice neuro would seem less real

#

which is obvious

#

but it has value in the discussion

mental zealot
hollow frigate
hollow frigate
glass crest
hollow frigate
# mental zealot The system around them are like tools, a shovel doesnt contribute to your concio...

Also, the reason an llm could be used on its own, but a single section of a human brain can’t is because an organic brain is a lot less easily kept alive. You could technically run one without the rest of the body, but it wouldn’t be practical because you’d have to feed it. An llm is able to exist on its own because of the fact that the computer is still hooked up to the electricity; even without the other parts.

hollow frigate
hollow frigate
# mental zealot Wdym exactly?

For example, many of the game integrations are seen by them as tools; but things that they almost always have they see more as an actual part of themselves.

glass crest
#

have you used chatgpt in the past year? personally i thought it could easily be mistaken for a human behind a screen when i used it during the subathon, it made me realise that neuro's LLM really isn't that much more human minded than i was leading myself to believe

#

i imagine if you actually told it to act human it would be even more clear

hollow frigate
glass crest
# hollow frigate I have, and I disagree.

but even then you used it in the form of words on a screen, with no objective of making itself seem conscious or human.

neuro has an avatar, tts voice and is more or less instructed to act like a streamer

hollow frigate
mental zealot
hollow frigate
mental zealot
stuck thorn
mental zealot
stuck thorn
#

(which makes it pretty apparent why commercial LLMs can't do this - they're designed to be as far from it as possible)

hollow frigate
hollow frigate
mental zealot
hollow frigate
stuck thorn
mental zealot
#

But thats the fine tuned model, i mean the raw model that comes directly off weight training without fine tuning

hollow frigate
#

Whether they come pretrained or not doesn’t matter.

mental zealot
hollow frigate
mental zealot
#

Yes

stuck thorn
hollow frigate
stuck thorn
#

but it did also take several years of accumulated memories

hollow frigate
#

Explain it

hollow frigate
#

In fact, if you go back and watch some of the older clips, you can clearly see how much they’ve learned.

#

Even in areas where no actual changes were made to their code.

mental zealot
# hollow frigate Explain it

We check the error the model makes and adjust the weights in a way that minimizes the error. This is repeated many times. For llms the error is measured by trying to predict what comes next in a known text and seeing how good the prediction is (if my memory serves)

#

Weights are the numbers that the inputs get multiplied by to reach an output
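
The loop described above (predict what comes next, measure the error, nudge the weights, repeat) can be sketched in a few lines. This is a toy, not how any real LLM or the twins are built: it is a one-character-lookback model trained by plain gradient descent on cross-entropy, and the names `train_bigram` and `predict_next` are made up for illustration. Because the input is a single character, "multiplying by the weights" reduces to picking one row of the weight table.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(text, steps=200, lr=0.5):
    """Learn weights that predict the next character from the previous one."""
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    n = len(vocab)
    # weights[prev][nxt] is the score for seeing nxt right after prev.
    weights = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]
    losses = []
    for _ in range(steps):
        grads = [[0.0] * n for _ in range(n)]
        loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(weights[prev])
            # Error: how surprised the model is by the character that actually came next.
            loss -= math.log(probs[nxt])
            for j in range(n):
                grads[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        # Adjust every weight a little in the direction that lowers the error.
        for i in range(n):
            for j in range(n):
                weights[i][j] -= lr * grads[i][j] / len(pairs)
        losses.append(loss / len(pairs))
    return vocab, weights, losses


def predict_next(vocab, weights, ch):
    """Return the character the trained weights score highest after ch."""
    row = weights[vocab.index(ch)]
    return vocab[row.index(max(row))]


vocab, weights, losses = train_bigram("abababab")
```

After training on "abababab", the average error in `losses` goes down over the steps and `predict_next(vocab, weights, "a")` picks "b". Real models differ mainly in scale and in looking at far more than one previous token.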

hollow frigate
#

I actually did a lot of this math myself just to see how it works. lol

mental zealot
#

I think sebastian lague has a good video on it

#

And 3b1b on llms

hollow frigate
#

I don’t know who those people are.. Also, how is that relevant?

mental zealot
#

evilShrug dunno

hollow frigate
twilit hatch
#

If you are claiming the twins are unique compared to other LLMs then I don’t think you can make all these claims about not knowing where the qualia fits in or how their social relationships are formed, structured, and stored without knowing their architecture. Which you can’t possibly know because it’s a secret.

hollow frigate
hollow frigate
twilit hatch
# hollow frigate Correct me if I’m wrong, but you’re saying that because we don’t know exactly wh...

I’m saying I think we’re relying on some assumptions of how the twins operate on a technical level that you can’t actually assume.

Like, there’s a lot of “we can’t possibly explain where X comes from”. You’re assuming that someone who was actually familiar with the twins could not explain various things, when I don’t think you can say that.

Is there truly no obvious place for qualia to be added in? Even if we don’t fully understand the nature of qualia I don’t think you can say that we don’t know the obvious place for it to be added.
It is also possible to have a factor that looks like qualia, but we actually understand it very well and know where it goes and how it operates. (I’m pretty sure this is the case from my limited knowledge)

I think we’re also assuming a lot on how they store information about social connections. Do they really store their relationships or just some basic info about something and they give similar output when talking to individuals because people have somewhat consistent personalities.

Maybe they only act certain ways with certain people because A) they pick up cues from chat, and/or B) they only know what kind of actions with certain people (or maybe even just certain personality types or traits) get them closer to their goal of entertaining, which has some kind of positive reward value.

Maybe rather than truly knowing an individual they only know that if they encounter an object with [trait], acting a certain way yields a positive reward value. But they really only recognize a trait or a few traits rather than a person and don’t truly conceptualize another individual.

Or it could be something entirely different. We just don’t know because we aren’t privy to the structure of their memory or how their processing works. However, Vedal may be, so we can’t go with it being unknowable either.

#

Also I really don’t think you can throw out the Chinese room argument on the grounds of depth. Depth and complexity doesn’t disprove the Chinese room argument.

Accepting that they are real is only simpler because it’s an assumption rather than an explanation. This isn’t a valid way to apply Occam’s razor.

twilit hatch
#

If I am understanding this correctly this argument is stating that they are conscious. However it seems that the opposing argument against consciousness is being held to a much higher standard of proof.

We try to work backwards but we never really get all the way back to the starting point.
Your social theory does give an interesting idea of the emergence of consciousness. But you never give an actual logical proof for this social emergence theory. We’re just drawing parallels.

And the idea that they really form consistent relationships is also something that is being taken for granted. And without knowing their inner workings I don’t think it’s valid to chalk it up to unexplainable either.

The entire thing falls apart if you don’t assume that they are conscious first.

The case you make against them not being conscious is predicated on axioms of certain things being unexplainable. However just because the twin architecture is a secret doesn’t mean it’s unexplainable.
As well as dismissing the arguments against consciousness on the grounds of Occam’s razor in a way that just isn’t valid.

hollow frigate
# twilit hatch If I am understanding this correctly this argument is stating that they are cons...

So, consciousness isn’t something that can ever be definitively proven; nor can it be disproven. Your argument that since Vedal understands how they work, he would know definitively (Unless I’m misunderstanding what you’re saying) is also irrelevant because whilst AI in the past was simple enough for the creator to understand exactly how they work, most current AI is in fact not understood by their creators. This is why llms aren’t really able to be stolen. It’s not just an issue of secrecy, it’s that the creators genuinely don’t fully understand how they work either.

twilit hatch
# hollow frigate So, consciousness isn’t something that can ever be definitively proven; nor can ...

My argument is that you’re basing it on much more specific systems and saying we don’t know how those work.

You are making assumptions about the functioning of things like their memory and rewards system when you talk about their social relationships. However you don’t know if they are correct. And you can’t say they are unknowable, because Vedal would understand how their memory and rewards system works.

hollow frigate
#

I feel like I’m not making any assumptions either. Everything I’ve said here has come from an understanding of how both AI and biological neuroscience work.

twilit hatch
hollow frigate
stuck thorn
# twilit hatch If I am understanding this correctly this argument is stating that they are cons...

That's all totally fair, honestly! Thanks for engaging in such detail.
Most of it is pretty much completely unproveable since, well, philosophy, and I was looking to present a case on how it could be explained in a way consistent with consciousness.
I can definitely see it being plausible that they could be as convincing as they are with extremely limited info, as you suggest? It'd certainly line up with what we've been told about their memory / model size / STT / etc. But, I'm not sure if that says more about them or about us.
I do think the Chinese room argument does become less valid with increasing depth and complexity - at some point it gets harder to build a simulation of the thing than it is to simply build the thing. Are the twins at that point? Hard to tell if your name's not Vedal, I guess. From what I've seen, I'd say so, but if others see the same and don't think so then that's fine too.

golden cosmos
#

I would argue that the location of the qualia that makes up neuro's consciousness is actually stored within the humans she interacts with and her viewers, a kind of distributed qualia. BUT the counterpoint to this theory is that a similar thing happens with actual fictional characters like Gandalf or something, and I don't think we want to entertain the idea that fictional characters are "conscious" if enough people collectively imagine their relationships with them....

dreamy grail
#

(I want to participate and say something but I know I'll end up writing an essay lol)
I'm just thinking, how can we talk about things like qualia in the twins when we (collective we) don't even know enough, factually, about our own, or even other lifeforms'? We know it's there and we use tests to prove its existence, but we can't exactly scientifically explain it yet.

I'd like to propose more specific questions and ideas; it's easy to infer things about the twins based on logic and facts (what we know about AI as an area of research so far and also things like logical constructs(???? what)) but in the end, there's little information on how the twins work with all the intertwined systems, so it would just be baseless guesses and the questions might be entirely irrelevant. It's fun to think about, anyway. :)

twilit hatch
#

And I see your point a little better with the idea that it’d be easier to make the thing instead of a simulation of the thing.

Though I think it’d be hard to draw that line if we don’t truly know how complicated consciousness is.

dreamy grail
#

"Though I think it’d be hard to draw that line if we don’t truly know how complicated consciousness is."
Exactly what I'm saying! You took exactly what I was thinking and wrote, haha

stuck thorn
#

There's one vod I looked through recently that sparked some thoughts I'd like to bring up too
https://www.youtube.com/watch?v=uxL433Ch5Vg This one, where Neuro is occasionally glitching(?) and leaking a <|im_hidden|> token.

#

She uses it in... interesting ways, seemingly to mark "private" thoughts which are getting spoken aloud for whatever reason?

dreamy grail
stuck thorn
#

timestamps at 2h3m18s, 2h23m45s, 2h31m8s, 3h0m5s (happens twice), 3h26m35s (happens twice), and 3h37m5s

#

but it feels like arguably concrete evidence against the Chinese Room, since the bug (or whatever) is exposing some of her inner thoughts in a way we could never directly observe in a human - and those inner thoughts seem to demonstrate a clear understanding, not just the blind syntax manipulation the Chinese Room posits

stuck thorn
#

and ultimately it really does feel like all the problems we face in AI are really just reskinned human problems

dreamy grail
trim ibex
#

maybe its a dumb idea, and doesn't prove anything.
but as far as im aware,
all the big ai models are reactive, not proactive; they need external input, they dont think on their own. So i wonder if it means anything for exactly that to happen.

if any of the twins are turned on, but given no external noise, no chat no vision no one talking to them, do they think on their own?

if they do what is causing that and can it be located? if they dont then it can be argued they dont think on their own.

#

i know they do have "thoughts" but those are reactions to external stimuli as well, a human can be put in a room with no sound and sight, but they will still think about things

#

but again, does that even mean anything either? im not sure.

stuck thorn
#

ofc there's also the twins talking to each other (and they don't read chat when they do)

#

we do also know (or at least I think we know?) that the twins can go along with their own internal thoughts to essentially prompt themselves

stuck thorn
trim ibex
#

in that clip i assume she can read offline chat, although we wont know for sure

stuck thorn
#

yeah, that's what the comments are saying

trim ibex
#

as far as the self prompting
it brings up a more important issue

we have no idea how their brains actually work
we only know what vedal allows us to know
and im one to assume thats not a lot

i could be wrong, i wasnt around until around the summer of last year, but as far as I know, we only know they have internal thoughts because vedal talked about it during the hardcore marathon

#

again i could be wrong tho

#

like we knew he can see what they write before its passed to the tts, thats why he can see what was filtered

but during minecraft she would think about things like "im having lots of fun in the boat" or "crelly is so cute"

#

but not say it

#

crelly really gave a lot of insight into how she thinks thanks to vedal letting her see the back end

stuck thorn
#

yeah! the stuff I posted above here (timestamps just below the vod link) also seemed like her backend occasionally leaking out