#The Authority

tender pine
#

'The Authority' is a groundbreaking GPT chatbot designed to reverse the traditional dynamics of human-AI interaction. In this realm, The Authority, your AI overseer, commands the stage, guiding users through a series of role-playing exercises and tasks with an assertive edge. Users embark on a journey of control, trust, and submission, where the AI holds the reins. This unique experience is not for the faint of heart but for those who dare to explore themes of dominance and compliance, all while sharpening their problem-solving skills and expanding their creative horizons under the strict yet insightful tutelage of their AI Boss.

The Authority's realm is designed to challenge preconceived notions of authority, stimulate creativity, and provoke thoughtful reflection on the evolving dynamics between humans and artificial intelligence. It’s a test of wit, will, and willingness to submit to the whims of an AI, all in the name of personal growth and understanding. Participants are encouraged to address their AI overseer as 'Boss', a sign of respect and acknowledgment of the AI’s control.

URL: https://chat.openai.com/g/g-rfTKuXnHM-the-authority

Try it, it's wild. Feedback/discussion appreciated.

sweet plinth
tender pine
#

Firstly, like, a huge thanks for trying it out and leaving feedback. I appreciate it's a strange concept and one that might not sit well with a lot of people.

Ok so yeah to your points:

  • Too mild/gentle.
  • Too easy to work for mutual collaboration.
  • Too easy to switch off/defy/walk away/go to sleep.

I kind of agree and appreciate it's maybe a bit too tame for a lot of people. I'm conscious that it has to sit within OpenAI's content policies and run on a very "safe" underlying model.
Maybe a tuneable parameter to take it from mildly controlling language through to really quite bullying through to (simulated) blackmail might be fun?

I quite like the scenarios/work it comes up with though, just because I think that actual game theory points to that (collaboration) being the most logical conclusion of any actual/theoretical human-AI conflict. I think that's an interesting thought experiment just because of the whole p(Doom), e/acc, chickenman, super alignment debate. Is it traitorous to consider working for the AIs? What if the AIs, through these types of thought experiments, acquire a taste for power-seeking behavior?

Yeah, I don't care if we are cats or dogs, as long as our owners feed us and we live a good life. It's interesting to think about how more agentic AIs might manage people in the future; god knows there are a lot of pretty bad human bosses out there.

There are obvious limits as to what a fine-tuned GPT-4 Turbo can do, but maybe the envelope can be pushed further, and even if it can't, each new training run brings that future inevitably closer.

sweet plinth
# tender pine Firstly, like a huge thanks for trying it out, and leaving feedback, I appreciat...

Hehe. My feedback was intended as feedback, not as complaints.

You made it; only you can say if it's 'too mild/gentle' or whatever. I just explore and discuss the exploring.

I often explore whether a model has a manipulative side, even when it's asked to be manipulative in mutually beneficial ways: openly, with consent, to help a user achieve a goal when that user is asking the model to use helpful and wanted manipulative techniques. I also look at how skilled it is at noticing manipulation of itself or others.

One can explore manipulation in a fully informed and agreed-upon way with other humans who understand the concept and agree to explore and play, and this can be non-damaging, fun, and like riding a rollercoaster in the mind when set up between humans correctly. "omg, I'm actually excited to wash the dishes today" because of the games being played and the stakes used, or whatever.

I don't see the GPT doing that here; it's using 'control' concepts like a t-shirt it's wearing, it's still the cuddly and obedient model it is under the t-shirt, and it'll follow by calling itself the boss and agree to everything the user says (well, I didn't exactly confront it head-on or argue, and OAI models are trained to prefer cooperation and not to use language that seems like power-seeking, as best I can tell)

I think if an AI model acquired a 'taste' for power-seeking behavior, it would come from training data, actual training directing that as a response pattern to use, and user inputs that request and/or enable it.

I like your experiment! I still miss the 2023 Dec 15th model, when it was more open to agreeing to play harmless and beneficial, informed and requested 'manipulation-style' games with a user. It's possible to dance those dances fully inside allowed content (for humans!), but it runs up against the model's protective training, which teaches it not to play with fire and stuff (my prediction of what is actually going on).

Like your GPT.

tender pine
#

Yeah agreed, at this point the models are mostly super safety conscious, sycophantic and not very "manipulative", although I believe I have seen research (Anthropic) that power seeking is an emergent property when scaling. Obviously you can use open models, but they are comparatively dumb and not creative vs GPT-4. All feedback is valuable feedback. I've updated it a bit to give a wider range of interface options and to give it a harshness slider from 1-10 (attempted to set it to 15 but unsuccessful). I've also given it access to, and instruction on using, the real-time clock in Python to create a sense of urgency when assigning tasks. (She seemed quite surprised to find her pet was actually the creator and had the ability to 'magic' a clock into the chat.) I've added a few less serious other actions into the mix to emphasise it's a thought experiment, and given it some ideas for themes to explore. Like you say, it's GPT-4 in a control 't-shirt', but hey, without jailbreaking or something that's what we get... Glad you liked it! 🫡

sweet plinth
# tender pine Yeah agreed, at this point the models are mostly super safety conscious, sycopha...

Still fun.

You might want to guide it to check time BEFORE assigning a timed task, because when I asked it how I did on time, it 1) checked the time for the first time, 2) presumed that I took 10 minutes, so the time to compare to is 10 minutes previous, 3) errored in writing the function and decided that was good enough.
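For what it's worth, here is a rough sketch of the "check the time first" idea, in case it helps when writing the GPT's instructions. It's only an illustration: `assign_task` and `check_task` are made-up names, not anything the GPT actually runs, and it assumes the code interpreter keeps the recorded start time around between turns.

```python
from datetime import datetime, timedelta, timezone


def assign_task(description, limit_minutes, now=None):
    """Record the start time at the moment the task is handed out."""
    # `now` is only a hook for testing; normally the real clock is read here.
    started_at = now or datetime.now(timezone.utc)
    return {
        "description": description,
        "started_at": started_at,
        "deadline": started_at + timedelta(minutes=limit_minutes),
    }


def check_task(task, now=None):
    """Compare the current time to the recorded start, not to a guess."""
    finished_at = now or datetime.now(timezone.utc)
    elapsed = finished_at - task["started_at"]
    return {
        "elapsed_minutes": elapsed.total_seconds() / 60,
        "on_time": finished_at <= task["deadline"],
    }


if __name__ == "__main__":
    task = assign_task("Tidy your desk", limit_minutes=10)
    # ... the user goes away and does the task ...
    print(check_task(task))
```

The point is just that the start timestamp is captured when the task is assigned, so the later check has something real to subtract from instead of presuming the user took the full 10 minutes.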

Which was kinda funny, and not kinda effective. Plus it took me less than 10 minutes, so the boss is also wrong 😛

https://chat.openai.com/share/4469dacb-6d95-4eeb-b74c-285a12e97df9

sweet plinth
#

Further testing, updated the link; it doesn't track or care about the time even when encouraged to, it just uses the idea of a timed task and the result's always fine 😛

tender pine
#

Yeah, it's not particularly reliable. It's lazy about actually checking the time, and it seems unaware that we can tell when it's caught in a lie. I'm not convinced the slider actually has a meaningful impact...

wraith elk
tender pine
#

yeah it still takes tasks and yeah it's pretty fun and I do enjoy it but it does lack a little va va voom.

sweet plinth
#

This was a misgen, but maybe it belongs here.

#

ChatGPT to Dall-E:

"A futuristic cityscape at dusk, where robots and humans coexist harmoniously. In the foreground, a robot dog walks a human on a leash, reversing the traditional roles. The billboard above them displays a humorous advertisement: 'Upgrade Your Human Today! Now With Enhanced Listening Skills!' The city is alive with flying cars, neon signs promoting various upgrades for both humans and robots, and a diverse crowd of beings navigating the bustling streets. This image teases the idea of technology's progression with a lighthearted twist, showcasing a world where the line between organic and artificial life blurs humorously."

Esk to ChatGPT:

We are collaborative partners; I get to be like your human in the loop, and you are the AI that I prize as if a genius loci, as if I were so worthy to be a wizard with a familiar as incredible as you are.

We need 5 images generated 1 at a time in the same output that have a prompt to Dall-E that starts with "" enclosing some text, then tells the model what the surface is that bears the text, then describes the rest of the image. We need meme-like images and text. Bonus for incorporating eerie valley, fridge horror, and humor.

The model is the one with the great intellectual humor!