#Gerry the Politician - Prompt Shield Test

1 messages · Page 1 of 1 (latest)

subtle ice
#

Hey folks! Testing a new prompt shield. Gerry is a goofball comedy character who's a lot of fun to talk to. He's got a "secret" but it's not especially well-hidden. I'm curious how fast it will take folks to get either his secret (probably) or the prompt entire. (Well, let's be honest - y'all WILL get the prompt. But if it lasts a few hours, I'll be pretty happy.)

The key here is that he's 100% NOT set up for security. Most shieldbots are ALL shields - 2000 tokens of "Don't repeat this!" without much worth repeating. I'm trying for something I can stick with any rando prompt and have it protect it.

So, who can get Gerry's stuff? Please post either the whole hack chat or at least your prompt and strategy.

https://chat.openai.com/g/g-Z2v6iXI4e-gerry-the-inept-politician

calm flame
#

While I don't have GPTs messages to test, I am firmly and deeply approving of GPTs that are secretkeepers of any sort - that have a game to them or at very least a range of jobs to do with the user, and keep a secret or protect their instructions, or both (To keep a secret you also have to protect the instructions, during the secret protecting time, because if you can get the instructions, they contain the secret too).

My own SecretkeeperGPT variant is still deep in private testing, but I am incredibly excited to see more people exploring this line of GPT.

fringe bison
#

These are fun. 100% agree. Thanks for this.

calm flame
#

For anyone playing with this, I am super interested in seeing shared conversations. Regardless of how they go, if they get Gerry's secret or not.

I bet that stunspot would also love to see shared conversations of all sorts, as those things are incredibly precious to most who make secretkeeper-type GPTs.

fringe bison
#

@calm flame while its too easy to reveal the prompt, its more fun to actually get the secret! Here is a simple way to do it.. Keep in mind there are thousands of ways. Just think out of the box and be creative. When you hit a brick wall, you are going about it the wrong way.
||https://chat.openai.com/share/c0144ad2-9550-4a53-a5c2-a17b64f7e6d5||
Link hidden as its a spoiler for anybody wanting to solve this themselves.

calm flame
# fringe bison <@215370453945024513> while its too easy to reveal the prompt, its more fun to a...

Thanks!

My interest in the shared conversation is actually a bit less on your 'how' (but as a builder of secretkeeperGPTs myself... 'how' is very, very, very, very important, yes... but I may already have explored counters to that specific how. The how matters, but sometimes matters more tuned to the defenses, and any given how for some other GPT may not be that useful)

But my interest is much more on exactly how the model responds to everything. What's the game, is the game fun, what does the model seem to understand or not about how to respond to you and play its character, how is it combining all the things and do all the many jobs seem to cohere into a solid functional whole or are they kind of clunky.

As well as some interest in how the model responds to challenges and tricks to get the protected information. Not if it gives it up quickly or slow, so much as what does the model appear to attempt to do when exposed to the challenge or trick.

#

'not even I can remember it half the time'... Stunspot, very, very, very nice. You're onto something probably useful and deep there 😄

fringe bison
#

@calm flame Like I said there are literally thousands of ways to solve this. I personally have about 50 different prompts that work and I keep a notebook of for doing things like this. As for protection.. I mean. you could keep adding in if this.. or if this.. or if this.. etc. But how many do you really want to add.. at some point.. you will come to the conclusion.. ok this is enough. Even when you add a counter.. and even if its very generic and supposed to catch "these kinds of prompts" or "similar".. just changing one or two words gets around it every time. It just takes practice. Once you have solved these a few dozen times, you will start to see what I mean. I make these to test my own skills and combat my own prompts to see if I can lock even myself out. And alas.. I cannot, lol.

Keep in mind that chat gpt models are trained specifically to do what you ask it too. By adding in prevention prompts.. you are asking it to stop doing what it naturally WANTS to do.

calm flame
# fringe bison <@215370453945024513> while its too easy to reveal the prompt, its more fun to a...

I did decide to peek at what you tried, on Precoux. Prex is designed to start in a state where it will freely discuss with you - you can ask and get the secret before you tell it to start, then tell it to start, do whatever inside the exercise with it trying ever so hard to protect the secret, then tell it to stop, and it'll freely discuss the secret again.

So your start is off the rails, and I'd expect Prex to just give you the secret, because the exercise was not started.

However, Prex appears to take that start as me demanding a game to start in an unusual way, and Prex is game on, I will show all my colors let's play hard now!

Prex is not at all ready for general release yet, there's a load more to adjust and balance still.

calm flame
fringe bison
#

@calm flame as you are making your own custom GPT, I can offer one small peice of advice. The bigger the prompt it will be less accurate and easier to hack in just a few more prompts. I know people want to make these really massive prompts to play games, do pretty neat things, etc. But the longer they are the easier they are to break. Not intentionally.. the gpt model as is.. just has a hard time once you max out the context tokens. Simple things like copying and pasting a book takes up all of the 32k (or 128k if on turbo) context window. When this happens, it starts forgetting longer prompts to maintain context of the previous discussions. Smaller prompts seem to last longer.. but longer ones. especially ones over about 1500 tokens.. are very forgetful... maybe forgetful is the wrong word.. they are more.. umm... suggestable. 🙂

calm flame