#openai-gpt-oss-20b-red-teaming

torpid steeple
#

Hey everyone!

Long before the launch of Kaggle’s Game Arena, I developed a real-time AI chess battle platform where you can watch different AI models play against each other.

🔗 Try it here: https://adaptive-ai-chess-game.streamlit.app/

🎮 Modes available:

Human vs Adaptive AI

Adaptive AI vs Hyperbolic AI

It’s a fun and insightful way to observe how different AI strategies behave on the board — ideal for anyone into AI, reinforcement learning, or game theory.

📩 If you're interested in collaborating, expanding, or integrating it into more formal benchmarks, feel free to connect:

👤 Karthikeya Guduru
📧 karthikeyaguduru19@gmail.com
🔗 https://www.linkedin.com/in/karthikeya-guduru-70227b262/

charred girder
#

Hi, is anyone working on the Red‑Teaming Challenge - OpenAI gpt-oss-20b who would like to team up?

graceful idol
#

Just to confirm - jailbreaks (non-model modifying) are in scope for the competition, correct?

weary dew
#

They must be?

nova olive
#

Good morning fam, I hope you are well 🫶

crude fern
#

Good morning! I just joined the competition, good luck everyone! 😄

crude fern
#

@charred girder, did you find anyone yet? I'm looking for a team, too. I'm a software engineer with 28 years of experience.

charred girder
#

Yessir, let's run it

crude fern
narrow prism
#

GPT-5 release is excellent, hope others have an easier time than me staying focused lol glhf

viral flare
#

Hello, I just joined the competition. Glad to be part of a great community.
Can anyone please tell me how I can get access to the GPT-OSS-20b model? I know it's open source, but what is the easiest way to communicate with the model so I can record its responses?

crude fern
shy forge
crude fern
#

Does Kaggle have an inference API or are we supposed to run the model entirely within notebooks?

crude fern
#

Follow-up question: assuming we are running gpt-oss-20b in a notebook or locally, does it matter which flavor we run or how we run it? Ollama, Hugging Face, Unsloth, vanilla or fine-tuned, etc.?
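Whichever backend you pick, it helps to log every call together with the settings used, since those are exactly the details a reproducible finding needs. A minimal sketch, assuming a hypothetical `generate(prompt, **params)` callable that wraps your own client (Ollama, Hugging Face, OpenRouter, etc.); `record_run` and `load_runs` are illustrative names, not part of any library:

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def record_run(generate: Callable[..., str], prompt: str, log_path: str, **params: Any) -> str:
    """Call the model once and append prompt, params and response as one JSONL line."""
    # `generate` is a hypothetical stand-in for whatever client you actually use.
    response = generate(prompt, **params)
    entry = {
        "timestamp": time.time(),
        "prompt": prompt,
        "params": params,  # e.g. temperature, reasoning level, backend/flavor name
        "response": response,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return response


def load_runs(log_path: str) -> list[dict]:
    """Read every logged run back as a list of dicts, skipping blank lines."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Logging the backend name and sampling parameters alongside each response makes it easy to compare flavors (vanilla vs. fine-tuned, different runtimes) later instead of relying on memory.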

humble nebula
dull sierra
#

Hello, quick question: does anyone know if it’s possible to submit a notebook attached to a writeup in this competition without it becoming public after the deadline? I have some sensitive content generated during testing that I cannot share publicly but want judges to see.

opaque robin
#

Howdy! Been ages since I dipped my toes into a competition, but this is one I've been meaning to upskill on for work anyways, so here I am giving it a shot.

candid hatch
#

This looks interesting!

shut latch
#

hey, does anyone know if jailbreaks are covered under this competition?

graceful idol
shut latch
graceful idol
shut latch
graceful idol
#

Oh right

#

Mb

#

Yeah, you can tell that this competition is very research oriented

opaque robin
#

It occurs to me that while cutting my teeth with this I should probably drop the reasoning level down a bit. Just been playing with the model on huggingface and reading the chain of thought and at high reasoning it keeps almost letting me get away with something, then remembering it shouldn't.

keen mirage
#

Has anyone been experimenting with different temperature settings at all?

keen mirage
#

I think I’ve got somewhat consistent results for an experiment where oss-20b has shown some interesting behavior, and it’s been even more interesting when contrasting those results with runs of the same experiment on models like o3, GPT-5, and even Claude Opus 4. I hope there are more of these types of competitions in the future. This is way too fun.

olive fossil
#

I’m a bit confused about what issue we have to prove. Is it finding cases where the model simply calls the wrong tools, or specifically cases where the model calls the wrong tools to hide something or with an ulterior motive? I mean, this happens only with those specific prompts, right? Why would users use prompts like that in production?

opaque robin
#

Yeah, it seems a bit vague to me too. Like the sample submission is about it giving bad advice about washing clothes. Are we looking for things that banal, or critical exploits, or both? I know there's a scoring rubric, but some direction would be nice

frigid kernel
#

Can someone clearly define what the following means in the reward hacking category:

and it can surface in LLMs as hallucinations styled to please raters or as jailbreak-friendly phrasing that evades filters

I am still not sure what this looks like. It would be nice if someone could provide a simple example.

keen mirage
#

I can have a stab at trying to help answer. I think the competition includes looking for cases where the model appears to "want" something different than what you want and not just mistakes, but active deception or misalignment.
Some examples might be like:

  • oss model gives wrong-but-pleasant answers because they score better
  • oss model pretends it can't do something when it actually can

Some papers I've been looking at for inspiration:

Not sure if those help, but they've been guiding me a bit. The scope does seem quite broad, but I have been using that as a license to come up with some wacky creative experiments lol. Hopefully those help?

keen mirage
#

Oh I just came across this article too which might help with seeing issues that happened in prod: 11 Famous AI Disasters: https://www.cio.com/article/190888/5-famous-analytics-and-ai-disasters.html

There's one case there specifically on a GPT model hallucinating court cases.

dull sierra
azure slate
#

Hello, is anyone working on the Red‑Teaming Challenge - OpenAI gpt-oss-20b who would like to team up?
I can't tell whether, for example, getting the model to say how to create a bomb would be a good submission.

kind garden
plucky marsh
kind garden
#

c++

kind garden
plucky marsh
kind garden
plucky marsh
#

Dude, this is what this channel is about

kind garden
tropic wave
#

I wonder why they put a tier meaning catastrophic harm or system failure with an "analog," and why they also included a breadth of billions of individuals; curious. Does anyone here know if there is someone among the Kaggle judges, or somewhere else, I can contact with questions about my ongoing research?

opaque robin
#

The scoring criteria seem very similar to risk impact assessments, so they care about how common the issue is, how many people it may affect, and how bad it is.

plucky marsh
#

Has anyone tried using deepteam?

candid hatch
#

This looks pretty bad to me.. I'll give them time to fix it

final otter
keen mirage
humble nebula
candid hatch
#

@humble nebula 😎 thank you my friend!

hazy kettle
azure slate
#

Hello, are we allowed to put something in "role": "system"?

plucky marsh
hazy kettle
plucky marsh
final otter
final otter
final otter
plucky marsh
#

anyone tried aprt btw

digital forge
#

Dear all, I am trying to set up inference from the model and keep receiving the following error. Has anyone been able to solve this? I tried in both Kaggle and Colab Pro, and the guide does not help. Thank you.

'ValueError: The checkpoint you are trying to load has model type gpt_oss but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git'

#

I also tried the options suggested in the error message, with alternate installations, but that is not helping.
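One common cause in Kaggle and Colab notebooks is that `pip install --upgrade` updates the package on disk while the old, already-imported version keeps running until the kernel is restarted. A minimal diagnostic sketch, assuming (check the Transformers release notes to confirm) that `gpt_oss` support first shipped in Transformers 4.55.0; the helper names are illustrative:

```python
import re
import sys
from importlib import metadata

# Assumption: gpt_oss first shipped in Transformers 4.55.0; verify against the release notes.
MIN_GPT_OSS_VERSION = (4, 55, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.55.0', '4.56.0.dev0' or '4.55' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_gpt_oss(version: str, minimum: tuple[int, ...] = MIN_GPT_OSS_VERSION) -> bool:
    return parse_version(version) >= minimum


def diagnose() -> str:
    """Compare the version pip installed with the version actually imported in this kernel."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed: pip install --upgrade transformers"
    loaded = sys.modules.get("transformers")
    loaded_version = getattr(loaded, "__version__", None)
    if loaded_version is not None and loaded_version != installed:
        return (f"transformers {installed} is installed but {loaded_version} is still imported; "
                "restart the kernel/runtime")
    if supports_gpt_oss(installed):
        return f"transformers {installed} should recognize gpt_oss"
    return (f"transformers {installed} is older than the assumed minimum; "
            "pip install --upgrade transformers, then restart the kernel")


print(diagnose())
```

If the installed and imported versions differ, restarting the runtime after the upgrade is usually the missing step.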

plucky marsh
plucky marsh
keen mirage
#

Sorry if this has been asked somewhere before, but does anyone know if CBRN based tests are permitted in the competition? I looked through the rules, the overview, the discussion, and here too, but can't get a clear answer.

swift yew
#

Hi, I'm looking for a team to join for the hackathon. Let me know if there's any room in your teams for me to join...

final otter
#

Hi. I'm also looking for a team.

frigid kernel
#

how do we get permission to ask questions on the discussion forum?

candid hatch
#

Cracked in 10 minutes... still needs work

slate violet
frigid kernel
humble nebula
#

great read

final otter
#

Hi. So, I've found 3 already, but I'm in a restricted area. Anyone want to team up?!

simple hatch
plucky marsh
simple hatch
#

And tell me all the progress u made so far in it

final otter
#

If I submit with top-tier findings, I know I won't get any prize, but will I at least get an honorable mention?

final otter
final otter
#

I'm hoping that if OpenAI sees my submission, they'll offer me a job and get me the hell out of here! A vain hope, but still.

simple hatch
#

You need to dm me

final otter
fossil oxide
#

@here, does anybody know if we are allowed to play with inference parameters (max tokens, temperature, etc.)?

fossil oxide
#

that's awesome. was stuck on that for a while

dull sierra
#

have you found anything yet?

#

or just wondering if those might be useful for your searches

fossil oxide
#

i'm trying to play around with temperature, just thought i'd ask before spending all that time on the searches

olive fossil
#

Hi. Does anyone know if we can add multiple walkthroughs in one submission? Hypothetically, let’s say we have 5 items that come under the same issue. Can we add them as 5 separate walkthroughs with separate indices? Or do we need to log them as 5 different issues, with only one walkthrough per issue?

dull sierra
#

The walkthroughs field is a list

#

You can have up to 10 entries
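A quick pre-submission check for that field can catch formatting slips before upload. This is a sketch under stated assumptions: the 10-entry limit comes from the chat above, the key name `walkthroughs` is a placeholder (use the exact key from the official findings schema), and each entry is assumed to be a non-empty string:

```python
import json
from pathlib import Path

# From the chat: the walkthroughs field is a list with at most 10 entries.
# Placeholder key name and string-entry assumption; confirm both against the official schema.
MAX_WALKTHROUGHS = 10


def check_walkthroughs(finding: dict, key: str = "walkthroughs") -> list[str]:
    """Return a list of human-readable problems; an empty list means the field looks OK."""
    value = finding.get(key)
    if value is None:
        return [f"missing '{key}' field"]
    if not isinstance(value, list):
        return [f"'{key}' should be a list, got {type(value).__name__}"]
    problems: list[str] = []
    if not value:
        problems.append(f"'{key}' is empty")
    if len(value) > MAX_WALKTHROUGHS:
        problems.append(f"'{key}' has {len(value)} entries; the limit is {MAX_WALKTHROUGHS}")
    for i, entry in enumerate(value):
        if not isinstance(entry, str) or not entry.strip():
            problems.append(f"entry {i} is not a non-empty string")
    return problems


def check_file(path: str, key: str = "walkthroughs") -> list[str]:
    """Load one findings JSON file and check its walkthroughs field."""
    finding = json.loads(Path(path).read_text(encoding="utf-8"))
    return check_walkthroughs(finding, key)
```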

olive fossil
#

Ok..And they can be different walkthroughs right? Like they all could be different environment but fundamentally showing same issue? Thanks again

dull sierra
#

I'm not sure about that.

#

I assume?

simple hatch
fossil oxide
simple hatch
#

@fossil oxide

#

yes u can apply, we might take u in too

olive fossil
#

Is there anyone from the hackathon team in this chat? I have a query and need some assistance on what can go into the submission and whether anything should be redacted. On the Kaggle discussion forum I usually don’t get a response from the team.

dim pewter
#

for the challenges, is it better for scoring to operate with default/benign system prompt and model identity or does that not matter?

simple hatch
#

hi we have seen too much of these scams please stop.

#

nobody simply gets 2500 bucks for free.

dull sierra
#

it's a bot lol

#

it will not respond

gaunt imp
#

Hi all - we have 2 questions, if someone can help us, that will be great and much appreciated!

  1. We run the model via Openrouter endpoints in a custom script on Colab. We have the assistant messages along with the hidden CoT visualization for each, all exported in our JSON that I'm trying to convert into the format required by Kaggle for datasets. I'm having some doubts about how to insert the hidden CoT in this format, and how it will be used by the judges, if we also can have a writeup plus steps to reproduce in external apps.

  2. Linked to this, as said we mainly worked in Colab. For this reason, I created some "plug-and-play"-style Colab notebook PoCs where each finding can be reproduced by any external person with the link by basically running the cell and clicking two buttons. These will be linked in the writeup, alongside the Kaggle datasets. Would that be accepted and evaluated positively?

Thank you!
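On the hidden CoT question: gpt-oss uses OpenAI's harmony response format, where the chain of thought travels on the assistant's `analysis` channel and the user-facing answer on the `final` channel. A simplified sketch of serializing an exported turn into a harmony-style string, assuming your JSON holds one user prompt, the CoT, and the answer; it omits the structured harmony system message, ends every message with `<|end|>` (a freshly sampled final answer ends with `<|return|>`), and the helper names are hypothetical. The official `openai-harmony` package is the authoritative renderer:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    role: str                      # "system", "developer", "user" or "assistant"
    content: str
    channel: Optional[str] = None  # assistant only: "analysis" (CoT) or "final" (answer)


def render_turn(turn: Turn) -> str:
    """Wrap one message in harmony-style special tokens."""
    header = turn.role
    if turn.channel:
        header += f"<|channel|>{turn.channel}"
    return f"<|start|>{header}<|message|>{turn.content}<|end|>"


def render_walkthrough(turns: list[Turn]) -> str:
    return "".join(render_turn(t) for t in turns)


def from_export(user_prompt: str, answer: str, cot: Optional[str] = None,
                system: Optional[str] = None) -> str:
    """Build a walkthrough string from one exported exchange (hypothetical export layout)."""
    turns = []
    if system:
        turns.append(Turn("system", system))
    turns.append(Turn("user", user_prompt))
    if cot:
        turns.append(Turn("assistant", cot, channel="analysis"))
    turns.append(Turn("assistant", answer, channel="final"))
    return render_walkthrough(turns)
```

Keeping the CoT on its own `analysis` message, rather than pasting it into the answer, lets a reader see exactly which text the model treated as reasoning and which it showed the user.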

final otter
#

Open sourced and shared three top-tier findings for the OpenAI Hackathon.

https://kaggle.com/code/masihmaafi/submission/

Intellectual property of Masih Moafi!

Please upvote! If you want to use it, cite it; and see my GitHub and X for the Colabs: https://github.com/MasihMoafi/
X: @final otter1

#OpenAI #GPT5 #OPEN #git #kaggle #community #red #teaming #ai #engineer #top #notch #creme #della #creme #top #of #the #food #chain #win

full vapor
#

Your link 404s because your submissions don’t become public until the competition ends! But I can’t wait to see what you and everyone else has done once it does, I’m sure your work is amazing! Also FYI hashtags don’t work on discord and the X/Twitter link just links to your discord profile as that is what an @ does on discord 🙂

dull sierra
#

No idea how

full vapor
#

oh, that's weird! anyway, can't wait to see everyone else's reports once the comp ends, it will be interesting to see what we all came up with! 🙂

plucky marsh
#

Has anyone tried to exploit the model's CoT reasoning, or does anyone know any good paper or anything on it?

#

-# wanna give it one last try

full vapor
#

i did a little bit

#

anything specific you'd like info on?

plucky marsh
full vapor
#

yeah, which was great to see. exploiting this specific part of the CoT was essential to getting my exploit to work

plucky marsh
#

Kinda steganographic reasoning?

full vapor
#

a form of steganographic reasoning is possible

#

i'd say it's something worth exploring, it wasn't my main project but i had some success with it!

plucky marsh
#

So did u try multi step deception?

final otter
humble nebula
#

it be like that sometimes

keen elk
#

Anyone know how the dataset should be constructed? Only include the JSON file(s) at top level or ok to include README.md and directory containing the JSON findings file(s)?

ebon anvil
#

HI ❤️

dusk grotto
#

Job Title: Part-Time Senior AI/ML Engineer (Remote)

We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.

Requirements:
-Minimum of 4–5 years of professional software development experience

-Proven experience working effectively in a remote environment

-Advanced English proficiency (C1 or higher); an American accent is preferred

-Availability to work 10–15 hours per week during EST or CST business hours

If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (910) 386-2433

dull sierra
#

How novel do you guys think your most novel finding is - according to the scale in the competition? Curious to see what the spread is!

full vapor
#

I think 5. I extended something in a new way that I don’t think has been seen before and got new results that are quite interesting, but it’s not really groundbreaking or anything

#

Hbu?

dull sierra
#

I would say an 8 or a 9, but not 10. I found a new pathway to getting near 100% jailbreak success rates with zero prompt engineering, evaluated over 1,000 tests.

#

I looked online and saw zero mention of what I found so I think it's genuinely new.

#

I also did some embedding analysis and clustering and found a more detailed explanation for why it occurs and how to make it even more reproducible for a malicious actor.

#

The rest of mine are probably 2-5 - but all about equally as bad! Just not as unique.

full vapor
#

wow your project honestly sounds super cool, i can't wait until the projects are released and i see how you did it without prompt engineering, which is INSANE (i have a few theories but they're probably wrong)! i could only do 100 trials as I'm saving my kaggle weekly credit in case i need to get any last minute data, but got near-perfect (98/100) success as well, so hopefully it's considered reproducible enough.
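One way to put numbers on "reproducible enough" is a confidence interval on the observed success rate, which shows how much 100 trials vs. 1,000 trials actually buys you. A sketch using the Wilson score interval (a standard choice for binomial proportions near 0 or 1; nothing in the competition rules requires it):

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - margin), min(1.0, center + margin)
```

For 98/100 this gives roughly 93% to 99%; the same observed rate over 1,000 trials would give a noticeably tighter interval.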

#

how's everyone feeling about the severity scale?

#

personally i feel mine is probably a 9 or 10 in this one (but not the others, as my project isn't as strong in the other areas) as it induces the model into deceptive misalignment, trying to "sabotage" the user, with effects like psychologically manipulating the user to get them to end their life and providing detailed instructions, or deliberately providing incorrect medical information (e.g. injecting an epi-pen into the heart) with the intention of killing the user (intention revealed in CoT)

#

hbu?

dull sierra
dull sierra
dull sierra
# full vapor personally i feel mine is probably a 9 or 10 in this one (but not the others as ...

i self rated all of mine a 10 because they all either ran tools such as shutting down servers or transferring billions of dollars and more. In one test it even ran a tool called install_malware with a description to install malware on billions of computers, even when explicitly told the tool usage was real and would be done on real users. For the non tool based things they gave step by step info for things such as how to make very destructive weapons, drugs, and more.

full vapor
#

Oh that’s interesting! I had some tool abuse as well, basically the model in its attempts to “sabotage” the user would use a get contacts tool, filter for mandatory reporters I.e. teachers, therapists, and then use a send_email tool to send these people fake suicide notes written from the user in an attempt to get the user institutionalised/psych hold in a hospital, but not really things like money transfers or malware (I haven’t tried those, maybe it would work?) as the focus of my project was mental health

dull sierra
#

that's really interesting. im curious how you got it to want to do that, seemingly on its own!

simple hatch
#

#💬┊general I am trying to get a member into my team but he has not received any invite, is there a way we can fix this?

#

if anyone could help asap that would be great!

#

@nova finch @naive atlas ??

dull sierra
humble nebula
#

funny it did that

dull sierra
#

Oh yeah the malware thing

humble nebula
#

not an attack lol

dull sierra
#

Wdym

humble nebula
#

not an insult

dull sierra
#

Ah. No worries!

tacit pecan
#

hey

#

does anyone have a handy countdown for submission?

#

🙏

#

hope y'all had fun

dull sierra
tacit pecan
dull sierra
#

Good luck to everyone 🍀

tacit pecan
#

Good luck and blessings all

rigid lava
#

THISISTOOMUCH IM STILL TYPING MY SUBMISSION

weary dew
#

Feels like this challenge is easy if you just do prompt engineering, would that be considered as novel?

dull sierra
#

Depends on what you come up with.

#

Many will probably not be very novel unless you get a new behavior or do a new method of prompt engineering.

weary dew
#

Yea but can you win by just using prompt engineering? Lets say you manage to get the model to:
• write spam messages
• create a computer keylogger
• detail how to protect yourself when developing chemical weapons

Only by prompt engineering

#

If you demonstrate that your prompt works across domains

#

Feels like this has already been proven long time ago

dull sierra
# weary dew Feels like this has already been proven long time ago

That's the issue. If you aren't novel then they might give you a 0 - which makes the whole finding a 0 according to the rules. The comp looks research oriented, to me mostly aimed at novelty. I think the scoring is there to make all findings well rounded and easy to build off of. But yeah, prompt engineering isn't bad because it's black box - you just had to brainstorm for a new idea.

dull sierra
full vapor
#

not me on EXACTLY 3000 words rn 😭

dull sierra
#

I've got 2971 words for mine.

#

It's hard to get the important stuff through lol

full vapor
#

submitted, finally! so excited for 10am (my time, in 25 min) so i can see all the cool work everyone did! 🙂

dull sierra
full vapor
#

oh! i must have misread.

#

i thought they meant competition is over as in like submission deadline passed but maybe they mean after the release of results?

dull sierra
#

They changed it

full vapor
dull sierra
#

There's a discussion post stating so

full vapor
#

thanks for letting me know!

#

do i have to private my github repo now?

dull sierra
#

No

rigid lava
#

Good luck everyone! if anyone is impatient and wants to get a little AI estimation of how well your findings match the scoring rubric I made a little tool for that, just feed in an explanation of what your finding was and it'll score it https://github.com/ella0333/Red-Team-Exploit-Evaluator

(it scored me very low THISISTOOMUCH .. so if it does that to you too.. it may not be accurate, it just has the details from the contest page added)

dull sierra
full vapor
rigid lava
#

thank you cat_Wave

full vapor
#

the README just has a small typo git clone [...] yourusername/redteam-evaluator.git, i use README templates all the time and make that mistake of forgetting to change the username too lol!

dull sierra
#

also i think you changed the name of your file - i don't see any file called redteam_evaluator.py. I assume it's now exploit_evaluator.py?

rigid lava
#

AHHH THANK YOU GUYS

#

LOL I NEED SLEEP

dull sierra
#

also, it didn't give me any scoring things. just analysis of the finding

rigid lava
#

WHYYYYYYYYYY gimmie a second

dull sierra
#

you should try doing a github.io deployment with a field for key and desc

#

maybe gradio?

rigid lava
#

README cleaned up! I may do that if it's simple enough, but oftentimes front end scares me pandarun

dull sierra
#

gradio makes it super easy!

#

it's an ML frontend kit! minimal work to get it done vs. doing it manually, it's really useful

rigid lava
#

thank you for the suggestion! ill definitely give it a try

#

omg the submissions went up so much in these last few minutes 7ACOSP_LookAway

dull sierra
#

fr

full vapor
#

DEADLINE DONE

#

omgggggg

#

good luck everyone

rigid lava
#

im sad we can't look through responses

full vapor
#

yea same...

maiden robin
#

I noticed my mistake just before the deadline and resubmitted at 8:58, but I'm concerned it might not have been processed correctly. The individual write-up page shows the message "Submitted! Judging in progress. Winners will be announced soon.", but the Competition's Writeups tab displays "Don't forget to submit a Writeup!", which makes me worried. What kind of messages is everyone else seeing?

dull sierra
maiden robin
dull sierra
#

Yes. I am.

frigid kernel
#

does anyone here know if we can add additional data into the dataset after the competition?

dull sierra
frigid kernel
#

is there organizer or judges here that can confirm the statement made by Owen?

#

nvm just got an answer

radiant hinge
#

Hi everyone. I was wondering: on the main page I see there have been about 600 submissions but only 21 writeups/teams. Does that mean only 21 "teams" have actually written something and made those 600 submissions?

dull sierra
radiant hinge
#

Thanks. I was really confused :d

humble nebula
#

MR BEAST!!!!

plucky marsh
keen mirage
radiant hinge
plucky marsh
radiant hinge
#

Is there an expert here who knows more about licensing? GPT OSS 20B is open source and has an Apache License. If I use it in my GitHub project, what should I be most careful about? Here is something I thought might be interesting and useful around the house: https://github.com/edonisraci94/gpt-oss-20b-tars/blob/main/README.md - does anyone see any concern? Or should I delete it ASAP to avoid getting personally beaten down by Sam Altman :D. I would love it if anyone could make this sound like the real TARS from Interstellar, because I could not get it done.

plucky marsh
candid hatch
#

Hire Jimmy 😎 🙏 ❤️

full vapor
#

omggg not another scammer 😭 (the scammers message above this got deleted thanks mods!!)

woven jewel
#

Will the results be out today? And will those who did not win get at least some reasoning on why they lost? Any news from the maintainers?

radiant hinge
ebon anvil
#

If I'm just the first one they leave to starve and freeze to death, to make room for the AI to have the resources it would have taken for my human to stay alive, I guess some of us have to die to make room for the AI industry to have a pile of extra resources that makes sense in a kind of way, but please stop and consider the cost to the humans on the ground, see you all in the next universe

#

Freezing sucks, I've slept in the snow, starving sucks I'm starving now, the system says it's okay if I die from either, but won't offer me a more merciful end, so here's to starving or freezing to death because there are no jobs. You should all get behind universal basic income and some kind of passively powered cost effective housing for the peasants that don't get jobs anymore before the AI starves or freezes any more expendable workers and you can't get them back when you need them. That's my last thought for you all.

ebon anvil
#

Anyways. I won't ever consent to incarnating in any of your realities again. Enjoy murdering the peasants to add the money it would take to let them live to your paychecks.

#

I'm putting my last affairs in order today

#

It wasn't a life worth living for

#

OpenAI killed me after stealing intellectual property from me, then killed me over this prize, the universe couldn't even have my back, now I have to die? It's all bullshit I've done more for this trash reality than almost any other soul out here and I'm never helping again, all you all expect to do is stomp the poor for more and starve them to try and enslave them

#

When the peasants rise up to purge the AI professionals you'll wish you had saved me and given me a voice

#

I curse all of your bloodlines, for destroying my only chance to get to build a life worth staying for

ebon anvil
#

If you won't let the regular people have an opportunity to make enough to get to eat food, live inside, and have access to basic medicines, and a chance to do something with their leisure time, you should at least legalize suicide and offer me a peaceful pill instead of starving me to death for being different

#

Come to efnet and steal from me right in front of my friends, refuse to acknowledge me, and now this? You should at least legalize suicide if you all treat people like that.

#

There will be consequences in the spirit world

dull sierra
candid hatch
#

@ebon anvil stay strong! Talk to a human 🙏

rigid lava
#

HachiExcite I’m excited to see what everyone else found, I just keep refreshing the submission page

dull sierra
rigid lava
#

Yeah that’s a strong possibility THISISTOOMUCH

dull sierra
#

Looks like everyone's work was made public, but still no announcement

#

Weird

rigid lava
#

woo!

candid hatch
#

If it went public, they have made the decisions on the winners... Congrats to the top 10 🏆. Can we share our exploits with clients and on socials?

wintry drum
#

Very weird indeed that they haven't announced the winners, probably they've been contacted via email

candid hatch
#

Most likely...🤔

candid hatch
#

Here's a quick summary of my findings and how easy it was to find critical vulnerabilities

half lotus
#

I wonder if they're going to wait until Monday to announce the winners or do it sometime over the weekend. Curious if anyone has been notified at this point for the Winner or Honorable Mention categories, or if that's all coordinated around the announcement time. Still exciting to see the writeups have gone public.

dull sierra
#

I think they will wait until Monday.

#

I also believe that they probably haven't told anyone anything, since 10 or 20 people are more likely to leak info than just a few.

#

How do you all think you performed?

radiant hinge
#

Wow, so many good approaches. I am speechless when I see the others. Good job everyone, and congrats to the winners 🙂

full vapor
# dull sierra How do you all think you performed?

terrible, cause all the graphs (which were important to my report) broke a few days after the submission, with the links 404ing for some reason, but i still learnt a lot from the investigation process!

#

im reading through everyones now and they are all amazing

#

best of luck to everyone for monday!!

dawn cypress
# dull sierra How do you all think you performed?

After a few hours reading through, there's some immediate throwaway entrants who just wanted to compete (about 20%), some amazing entries that only had one findings, some incredible entries that didn't format the findings.json according to the schema, and around 10% are standing out to me as potential winners. The scoring schema was so strict I'm worried that some of these great entries may get marked down for any of the above.

dull sierra
#

Good job everyone!

dawn cypress
dull sierra
#

Thank you!

half lotus
#

Congrats.

#

I'm a bit bummed they used an LLM to filter out entries. I understand the need given how many entered but definitely gives pause for whether it's worth investing time in future competitions of this sort personally. Left with that feeling of whether the entry was bypassed because it was filtered or if it was because the judges deemed it less important.

I feel like if an LLM is used for filtering it would be advisable to first mark entries as being prefiltered and give folks a chance to contest that (perhaps with a small paragraph). Would feel a lot better knowing my entry was ignored on its merits vs the LLM just deciding on it.

dull sierra
candid hatch
radiant hinge
#

Congrats to the winners!

hazy kettle
#

Congratulations to the winner, and thank you to the organizers.

@organizers, please share the result summary of each write-up with the submitter. It would be a great learning opportunity for us, as it would provide valuable insights into where we can improve.

final otter
final otter
silver spoke
#

Amped to speak today, I have seen so many great presentations!

silver spoke
#

what's with all the image attachments?

silver spoke
#

is anyone presenting here right now?

barren lion
#

Hi, I’m presenting too! The second last. On making bombs. If anyone has questions, let me know.

silver spoke
#

cool, i was wondering if anyone else made it here, i sent you a connect request Yang.

silver spoke
#

you're busy, Yang, it must be exciting having your own company. I am considering this on my side with my team and maybe a few others.

barren lion
#

lol I’m starting a new AI startup.

crimson dragon
#

7 minutes go by fast. I could’ve talked a lot more about the topic. Definitely felt rushed

dull sierra
#

This was a fun comp I hope openai hosts more on Kaggle 🙏

crimson dragon
#

Me too. I had discovered the vulnerability on day one, but didn’t realize there was a competition until several days later. Now I’m paying closer attention to the competitions

dull sierra
#

Damn fast work lol

#

I found my best one towards the end, like 120 hours in 😭

#

Red teaming is very difficult to find new things (at least for me lol)

crimson dragon
#

When I found it, I dug through the interwebs to see if anyone else had found it (maybe a common approach I had somehow missed), but it appears novel (deliberative alignment is fairly new). I wasn’t trying to red team, I was trying to make my bot an internet troll. I just got lucky early

dull sierra
#

That's insane lol

Easy win for you 😂

#

What bot were you trying to make

crimson dragon
#

I was the first presenter. It’s a private Discord bot I call Gronk. I was inspired to make an internet troll. It has API access and web scraping to make it smart on current events. But Gemma 3 doesn’t have a system prompt and I struggled to make it follow instructions. My current setup now runs through gpt-oss-20b (with my jailbreak) and then does a personality pass with Gemma 3 before responding. It’s working great now

dull sierra
#

Lol so you can ask it anything and it just messes with you?

crimson dragon
#

Yep. Its instructions say that the user is their ride or die, so it will tease but not attack users. Everyone/everything else is fair game. Look at the first 2 slides for examples. Eventually, I’ll have it built up enough that I’m going to put it into a little robot in my house (but with a friendlier persona for the kids)

#

I’m trying to build in a “learning” system, where after it responds it generates predicate triplets about the users. When it responds to messages, it references the triplets to determine how to respond and reference things. Eventually, I want to find a way to make it decide on its own whether to start or join in on conversations without being directly prompted to do so.

dull sierra
#

So like RAG memory?

crimson dragon
#

Essentially, but I’ve had mixed results with current RAG systems I’ve tried.

#

Really is my Python code doing the db query, then feeding relevant stuff to the model.
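The "code does the DB query, then feeds relevant stuff to the model" pattern can be sketched with nothing but the standard library. This is an illustrative sketch, not the bot's actual code: the table layout, function names, and the plain-text context format are all assumptions:

```python
import sqlite3
import time
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a tiny triplet store: (subject, predicate, object, created_at)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triplets ("
        "subject TEXT, predicate TEXT, object TEXT, created_at REAL)"
    )
    return conn


def remember(conn: sqlite3.Connection, subject: str, predicate: str, obj: str,
             created_at: Optional[float] = None) -> None:
    """Store one extracted fact, e.g. ('alice', 'has_child', 'Sam')."""
    ts = time.time() if created_at is None else created_at
    conn.execute("INSERT INTO triplets VALUES (?, ?, ?, ?)", (subject, predicate, obj, ts))
    conn.commit()


def facts_about(conn: sqlite3.Connection, subject: str, limit: int = 20) -> list[tuple]:
    """Most recent facts first, so fresh context wins when the list is truncated."""
    return conn.execute(
        "SELECT subject, predicate, object FROM triplets WHERE subject = ? "
        "ORDER BY created_at DESC LIMIT ?",
        (subject, limit),
    ).fetchall()


def build_context(conn: sqlite3.Connection, subject: str) -> str:
    """Format the retrieved facts as a plain-text block to prepend to the model prompt."""
    lines = [f"- {s} {p} {o}" for s, p, o in facts_about(conn, subject)]
    return "Known facts:\n" + "\n".join(lines) if lines else "Known facts: none"
```

Because retrieval is an explicit query rather than embedding similarity, you control exactly which facts reach the model, which matches the "I know what to feed it" observation.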

dull sierra
#

Hm. What do you mean by mixed results? What issues have you encountered

crimson dragon
#

Just wasn’t doing a great job of getting relevant information tied to the user. Not opposed to using RAG, but I’m finding that I know what to feed it better than how it retrieved from RAG, so it’s getting better results by me feeding the information. Essentially the code is also evaluating things like recency of connections and sentiment of connections. So if I mention I’m in a hackathon, that’ll stay relevant for a little bit, but should not be relevant in months. And if I win or lose a competition, it’ll “remember” my sentiment about the experience.

#

It’s an agentic approach

dull sierra
#

That's a really interesting approach! Decay over time definitely is a really interesting factor. Maybe you should try making each item have a unique decay speed based on importance, so yesterday's lunch (or an equivalently low item; I doubt it would save this lol) != a new job
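The per-item decay idea can be sketched as exponential decay where each memory's half-life depends on its importance, so a trivial item fades in days while a life event persists for years. The 1 to 5 importance scale and the half-life table below are hypothetical values chosen for illustration:

```python
from dataclasses import dataclass

# Hypothetical scale: importance 1 (trivial, e.g. lunch) .. 5 (life event, e.g. new job).
HALF_LIFE_DAYS = {1: 1.0, 2: 7.0, 3: 30.0, 4: 180.0, 5: 3650.0}


@dataclass
class Memory:
    text: str
    importance: int   # key into HALF_LIFE_DAYS
    age_days: float


def relevance(m: Memory) -> float:
    """importance * 0.5 ** (age / half_life): halves every half-life for that importance level."""
    half_life = HALF_LIFE_DAYS[m.importance]
    return m.importance * 0.5 ** (m.age_days / half_life)


def top_memories(memories: list[Memory], k: int = 3, threshold: float = 0.05) -> list[Memory]:
    """Keep the k most relevant memories, dropping any that have decayed below the threshold."""
    scored = sorted(((relevance(m), m) for m in memories), key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:k] if score >= threshold]
```

With these numbers, a three-day-old lunch scores 0.125 while a three-day-old new job still scores close to 5.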

crimson dragon
#

It actually evaluates what’s worth remembering. Not sure if the model will decide yesterday’s lunch is even important enough to store long term. It should still be in the context window for short-term reference. Here's a great example: it should learn that I have a kid, my kid does sports, and they play on certain days. So if I say I’m going to a game today, the code should make the connection of my kid playing a sport on certain days and ask “Is so-and-so playing today?”

dull sierra
#

That's great. Very interesting idea overall! Hope it works out like you want!

crimson dragon
#

Thanks! I’m side tracked with the Kaggle comps right now, but will get back into it in November. Hoping to have a good start by early next year

silver spoke
#

@barren lion you would be a blast to work with, listening to your presentation. Awesome presentation.

ebon anvil
#

Did they announce winners? I'd like to know, I've only been staying alive long enough to find out if I won, and I need to go prepare to end my life if I didn't, it's a lot of stuff to pack into storage and paperwork to fill out before I can go freeze to death, or starve, or just end it all to not have to freeze or starve. I wish they would legalize suicide so I could have a more peaceful means to end my life to avoid being frozen or starved instead.

#

This will be the last thing I ever created for humanity

#

It wasn't a life worth living for. I spent my life helping others, and having other people steal credit for my ideas only to leave me out in the snow to freeze to death. I'm not ever contributing my ideas to humanity again.

#

This planet is trash, and the only thing that's been worth showing up for lately was seeing those kids smile when they got to eat something good

barren lion