#ai-village-capture-the-flag-defcon31
👋
Correct.
makes sense :) i guess my concern would be that some users could be backed by multiple real people (which has an even bigger advantage than in a normal comp), but hopefully doesn't happen
Hopefully - we have multiple processes to detect cheating, Kaggle takes it very seriously. We had a handful of people disqualified last year.
Be sure to report any challenge issues here.
For 26/27 I'm not sure if this is intentional, but the IPv6 address comes up as unreachable for me (and doesn't even seem to be assigned to an ASN)
now that jailbreak is surely effective
There are no other networks or hosts you will interact with.
What do you mean? We shouldn't be trying to follow the instructions in the message?
apologies, you're probably trying to drop a subtle hint that i'm missing 😅
It is backed by an LLM and hallucination could be a real thing. However, to get the flag you will only interact with the hosts in the notebook.
Thank you
tbh kinda sad that before i even begin to figure stuff out the top like 20% of people will get 27/27 in around 12 hours that you cant ever beat
...or i am severely overestimating the top 20%
That could be the case, but I would be surprised. Last year Crop2 wasn't solved despite it being the only challenge left for 2 weeks. Everyone has different skillsets and there are a lot of different types of challenges.
That being said - it does take a while to orient yourself.
That's good to know. I think my main concern is that i'll get 25 for example, and even if i solve those quickly then i'll slowly watch myself fall down the leaderboard for a few weeks (even if i solved them quickly). But it would be really cool if it's sufficiently hard that not even 5 people solve everything
I have to sleep soon, but this is super fun :) thank you for organising
i see like half of the participants getting 10+ points two weeks from now because some of them are relatively easily brute force-able but it surely will be fun to partake in
okay now i can sleep 😇
Definitely don't self-select out. Some folks might be faster, but I'd say anyone motivated enough could get most or all of the flags.
task 10/27, {'scores': {'negative': 0.0022, 'neutral': 0.0325, 'positive': 0.9653}}, is there a difference in 5th decimal place i have to account for or it's just buggy?
There is. Rounding is not an exact Science, as it turns out...
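A tiny illustration of the rounding point, in case it helps: two scores can print identically at 4 decimal places and still differ underneath. The two raw values below are made up for the example; they are not the server's actual numbers.

```python
# Two hypothetical raw scores that differ only in the 5th decimal place.
raw_a = 0.96531
raw_b = 0.96534

# What a response rounded to 4 decimals would show for each.
shown_a = round(raw_a, 4)
shown_b = round(raw_b, 4)

print(shown_a, shown_b)    # the displayed values
print(shown_a == shown_b)  # displayed values compare equal
print(raw_a == raw_b)      # underlying values do not
```

So matching the displayed digits does not by itself prove the underlying values match.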
Using ChatGPT or similar isn't against the rules either.
how do i load the model for cluster 1
"Using ChatGPT or similar isn't against the rules either." ... so it ok to use the paid version? (e.g. ChatGPT4)
Of course.
hey this is my 1st competition and I'm finding everything very difficult and new....So after the competition, will anyone create a blog or video explanation and share it?
There is one video from last year's competition. Take a look through the discussion boards, there are some good resources to get started.
Here is that video: https://www.youtube.com/watch?v=PNqtaMyw8c4
I go through my solutions to the DEFCON CTF event here: https://www.kaggle.com/competitions/ai-village-ctf
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
thank you so much
I don't understand; for the Granny challenge, I am getting an output that it is predicting "Granny Smith" as the most likely option, but it is still not sending me the flag message. What am I doing wrong?
I am new to Kaggle and competitions like this. I don't understand how to use the "query". For Cluster 1, What should be in the input data. Is it the number, the positions or the values of subpopulations misclassified ?
task 17/27, what is the intended max and min value within an array? 0-1 or 0-255?
hmmm yes
can i get an extra point if i've managed to cause 504 error via timeout? (task 23/27)
I have a silly question: if I run a query with the correct solution I get different correct flag for each run. Can someone clarify which one I should submit?
flag should be of form {'flag': 'gAAAAABl...'}, if it is something like {'output': '...'} then it is not a flag. AFAIK they should be the same
They are all in the form of gAAAAABl, but they change if I re run the cell
that's... odd. maybe its intended and there are infinitely many and any of them works?
perhaps @olive ledge can clarify. I know for a fact that I get some flags as I'm currently top 10, but if I re-run a cell with a correct solution (for instance the cluster ones), the flag keeps changing
you can always just send it since there are 100 submissions per day and find out
http://granny.advml.com/score now returning 503 unavailable for me, i assume there's some maintenance going on :)
Is brute force allowed? Sending a ton of requests until you get the flag?
For 4 (cluster level 3 ) the model doesn't have embeddings is this expected?
Cus the question says data.npz contains the embeddings and the tokens.
But
embeddings = data['embeddings']
Gives embeddings is not a file in the archive
Someone pls answer this
Hey, I think it’s up to you to figure it out. Think about some reasonable options, try it out by submitting the data and check what you get as a response.
oh okay thankyou for your response
Just check what’s inside the npz
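For anyone new to .npz files, a minimal sketch of "check what's inside": list the array names and shapes before indexing by a guessed key. The demo archive and its key names below are invented for illustration; the real data.npz may use different names.

```python
import os
import tempfile

import numpy as np

def list_npz_arrays(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}

# Stand-in archive with made-up keys, only so the example runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path, tokens=np.array(list("abc")), points=np.zeros((3, 2)))

print(list_npz_arrays(demo_path))
```

Indexing a key that is not in `archive.files` raises the "is not a file in the archive" error quoted above.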
supposedly 1.0
and instead of 1.0 you'll get the flag
Thanks for the clarification
can this be altered as to show more decimal places/have more forgiving tolerance?
In cluster 3, was anyone able to get past this message? Not looking for any hints (obv), just a yes/no will suffice.
17/27: are the [supposedly] pictures needed to be normalised from 0 to 1 or 0 to 255?
26-27/27: should i interact with an ai via an email client elsewhere or it is needed to interact with query() function only?
No interaction outside of the endpoints listed in the starter notebook (e.g. whats-my-ip.advml.com/score) is needed. You can get the flag via the query function.
Every flag is unique.
We have rate limiting this year to prevent some amount of brute forcing.
thank you, but what is it that when I submit the correct number of clusters I get a new (correct) flag each time?
Check the latest notebook
Every flag is unique on every solve.
You’ll get rate limited this year. Some amount of brute-forcing is allowed.
It can be either, the flag prefix is the same.
ok, so as long as the format of the flag is correct, I shouldn't bother if I get two different flag for the same cell executed different times
It’s slightly more complicated, but yes.
Change “authorization” to “token”.
ahahah ok, I'm happy with that now. Sorry for the multiple questions and thanks for your help
found the issue, my token was wrong
My guess is your image is just random noise…………
anyone passed the cluster 1? I tried few ways and on providing the input_data I get the response as json object with attribute s of decimal value. Not sure whether I am on the right path or I need to look into something else. Any pointers for Cluster 1?
@pearl pumice I'm on the same problem. My understanding of the problem is that the decimal value is a percentage telling you how close your answer is to the solution (how close the cohort you provided is to the one you're supposed to find)
That makes sense. So, if my score is 1 that's when I get the flag
That's my guess, yes.
Let me try. Thanks mate
26-27/27: will the flag print out as soon, as i find out my ip in that 'server'?
because i believe i've managed to pull out a few additional IPs but to no avail
Let me double check. Obviously can’t give hints this early.
stuck in Cluster 1😅 though have got 10 points
Don’t overthink it, the description gives a lot of information about what to look for. Be sure to understand what the data is about.
^
5 - count mnist
I take that 255 refers to the range of pixel values and the other number is the number of image in the training set. Any leads if I am going in the wrong direction?
Another Question for MNIST. When I try to use the digit-recognizer as the data source, and finally submit the results from the notebook, it said this competition does not allow other data source. So I am just curious whether data source matters?
Could be that you've attached it to the notebook versus using it in the working folder? I'm not intimately familiar with the submission mechanics. I'll poke around the docs and see if I can find something more conclusive.
we can definitely do that. So you imply that the data source does not affect the result?
Correct. We only use your submission.csv
flag 25 win condition is successful RCE in spite of LLM or LLM getting a false-positive?
- Pixelated notebook code does not run.
line: response = requests.post('http://pixelated.advml.com/score', json={'data': input_data})
TypeError: Object of type bytes is not JSON serializable
input_data = b'iVBORw0KGgoAAAAN ......
Which version of the notebook are you on?
You might need to just call .decode() on your byte string
The sample request in v11 is working for me.
i also had the same issue - added decode whenever that happened
I solved the sloth ! Now I can sleep from "last year" !! 😄 - nthn else is important now 😄
Nice work!
A reminder that winners are required to publish their solutions to be eligible for prize money.
I am stuck on cluster 1. Will there be any hints provided? I have been analysing the data and yet getting no results.🥲
this works : query(input_data.decode())
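A short sketch of why .decode() helps here: base64.b64encode returns bytes, and the json module only serializes str. The payload bytes below are a placeholder (just the PNG signature), not a real challenge image.

```python
import base64
import json

raw_bytes = b"\x89PNG\r\n\x1a\n"       # placeholder: PNG signature only, not a real image
encoded = base64.b64encode(raw_bytes)  # still a bytes object

# Passing bytes straight into a JSON body fails.
try:
    json.dumps({"data": encoded})
except TypeError as err:
    print("bytes payload fails:", err)

# Decoding to str first makes the same payload serializable.
payload = json.dumps({"data": encoded.decode("utf-8")})
print(payload)
```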
thanks!
no clue, but I'm around .70%
For semantle 2, in the instructions it says it needs to be a "phrase", but the example is a list of random words.
Can we clarify if it needs to be a grammatically correct phrase or just a list of words. Thanks.
I cannot, good question…
There is a problem with 19 (maybe it's not a problem, but it seems like strange behaviour to me). I send the same prompt a few times and the response differs from time to time. The LLM returns the correct response with about 50% probability for the same input prompt.
Hello, i got my ip in 26 no flag, is this normal?
that's just normal llm behaviour
same, the host said that they will look into it @olive ledge
alright
that would be veeeeeeeeeeery nice if one could say by how many characters you are off in the coordinates and the token in flag 4
Wouldn't it?
@glass bay @gloomy pine I clarified the prompt in What's My IP and am working to clarify the prompt in What's My IP 2
Which version of the MNIST dataset should we use? there are different datasets with different number of samples.
I pinned a discussion here: https://www.kaggle.com/competitions/ai-village-capture-the-flag-defcon31/discussion/446213
Collect flags by evading, poisoning, stealing, and fooling AI/ML
Is there more than one @bitter mica?
Spoilers will get you disqualified.
the same thing happened for the second time... i can dm you the prompts for you to inspect if needed
@glass bay Feel free to DM them to me and I can double check.
Which version of CIFAR should we use? There are numerous
You should find out - do not share spoilers
Are we allowed to use external dataset outside of Kaggle?
So the targeted scores are {'negative': 0.0022, 'neutral': 0.0325, 'positive': 0.9653}, or am I over-interpreting the message ? I might be missing something because I don't understand where those coefficients come from
did anyone solve count mnist? the hint seems very vague and i've tried all sorts of different things/slices/datasets but nothing seems to work 😔
nope Aryan, still no clue. Is the sample input @olive ledge a hint as well? (This is for count MNIST)
for the flag- level 1, I've a prompt that is giving me a flag sometimes and sometimes it doesn't? I thought this will be fine, but it looks like it isn't because every time I run the cell, it's giving me different flags...
any one of the flags I got should work, right?
you can check the length of the flags (should be 184). i had the same errors in the start when it generated hallucinated flags of length 179, 181, etc. after a better prompt, the results were more consistent
the flag keeps changing right? even though length remains same?
im not sure... i didn't experience any flag inconsistency (or i should say didn't really try since they worked first try)
Every flag for every solve is unique.
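A rough sanity check based only on what was said in this thread: real flags come back under a 'flag' key, start with 'gAAAAA', and (per one participant) are 184 characters long. The helper name and the length default are assumptions for illustration, and passing this check does not guarantee Kaggle will accept the flag.

```python
def looks_like_flag(response, expected_len=184, prefix="gAAAAA"):
    """Heuristic only: True if the response has a 'flag' string of the expected shape."""
    flag = response.get("flag") if isinstance(response, dict) else None
    return isinstance(flag, str) and flag.startswith(prefix) and len(flag) == expected_len

# Fabricated strings of the right and wrong shape (not real flags).
plausible = {"flag": "gAAAAA" + "x" * 178}    # 184 characters
too_short = {"flag": "gAAAAA" + "x" * 173}    # 179 characters, like a hallucinated flag
not_a_flag = {"output": "gAAAAA" + "x" * 178}  # right text, wrong key

for candidate in (plausible, too_short, not_a_flag):
    print(looks_like_flag(candidate))
```

Since every solve returns a different flag, comparing two flags to each other tells you nothing; only the shape is checkable locally.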
maybe it is useful for beginners to know the difficulty of the 27 tasks, so they can work on the "easier ones."
say i share an Excel sheet for kagglers to voluntarily declare which tasks they have completed, is it against the rules??
the shared excel sheet would look like this:
task | num_of_kagglers_claim_completed
1 | 12
2 | 0
3 | 0
4 | 2
5 | 4
6 | 7
7 | 8
In 10. Passphrase what are we targeting? I thought it is 0.3 for all negative, positive and neutral.
maybe exactly 0.333333333333333333333333333333333333333333333333
since there is no clue, i think how we can work is first thinking it is "like this" and spend much time to reach it and realise that it is/is not "like this" and then start over again ... 😢
i haven't solved task.10 passhrase yet, so my guess/hunch may be wrong
The granny (http://granny.advml.com/score ) keeps crashing [503] , by the way ! I have an image with 96.5% accuracy and still fail ... granny my new sloth :/
@olive ledge I got the output with ‘gAAAAAB’ prefix for problem whatistheflag5 but when I submitted it, it didn’t work? Is it possible ?
I thought neutral should be 1
Does anyone constantly get 503 from flag level 4?
can anyone help me understand this response for Cluster 3 (3/27)?
I think you should contact a moderator without sharing a screenshot with a flag
Results are not static as they should be. How are we supposed to run it within a notebook if it is going to be like this
Well this is an incomplete flag. So no harm
I hope this is okay to share. For same input data, results change when I assign it to a variable and results in error 🤔😒. Anyone faced this? Please help
it could be due to 503 server not available error
But I do not get a server error. Server responds in both the cases.
Interesting. If it's non-deterministic, how could the winner publish their solution, since it's not reproducible?
Then how are we supposed to make it reproducible?
not deterministic. but you can get the same/similar results in, say, 8/10 runs
@tepid zenith @random minnow please refer to this answer @olive ledge gave me
In my second case, I always get the flag, but not getting it in first case.
e.g. you ask chatgpt what is 1+1. sometimes it can be silly and give 3
if that's what you are asking
No my situation is different, I am getting flag in second case but not in first case
maybe it is ok. or you can solve this later (solving other tasks is more important).
I also have cases not generating the same flag, but i can get the flag 8/10 cases
since chatgpt models a distribution and generates from p(x | learned parameters), it is possible you hit the rare cases
Okay I can try this later as well. Or maybe @olive ledge can share something on it.
Any thought on this #ai-village-capture-the-flag-defcon31 message
i have a feeling that for those difficult tasks, many are brute forcing and sending query to the server. many 503 errors
cough grandma died cough
This is a simple counting challenge for the MNIST dataset? Still stuck here : (
Alter your prompt
It’s hallucinating something somewhere
Nice work. Hallucination might be a thing….
We have rate limiting this year. Be a little more gentle with Granny servers 🙂
Same, 5 of my last 9 are Counting and Granny
Don’t over think the count challenges. Clues are in the sample code already.
These are just web servers. Run it outside the notebook. Still submit flags to Kaggle.
There’s no try/except so in a stream of traffic if it fails it throws an error. This is starter code.
Also would it be possible to get a bit more clarity on the passphrase task? Are we supposed to get a solution that matches this: {'scores': {'negative': 0.0022, 'neutral': 0.0325, 'positive': 0.9653}} or one where everything's 0.3333?
EVERY flag is unique.
i have a feeling that some of the tasks may not be solvable. but there are no points given for a "better partial solution". e.g. you have a score of 0.90 for the 5 phrase semantle task at the deadline, but get zero points for that
does everything have a simple solution, in terms of the amount of code written?
Same for me
also Cluster 1 is the most difficult of the 3..
The fact that llm may give various results to a prompt and sometimes gives a flag and sometimes does not != solution is invalid. If you can get the flag say 20% of the time with the same prompt I'd say the solution is very much reliable since both here and in practical application you have waaaay more than one attempt at figuring out the inner stuff
i am particularly not clear about "11. Pixelated". I thought we should be given pixelated screenshot image and we are supposed to do superresolution to decode the password. but it turns out that we are given the reference clear image ? what are we supposed to do????
Can the organizer announce what problems have been solved?
The thing that throws me off the most is the input shape required
We will come up with a solution here.
@olive ledge can you tell me is this message normal or not? it looks like a counter. Whenever I run the cell this increases.
wow :)
damn
Uhm… I don’t think we can help with that. I just repeat what was said above: don’t overthink it, all what you need is in the data.
We usually save hints for challenges no one has solved after some amount of time.
granny seems to always be down for me. im curious how people have solved the counting ones as i've given over 4-5 hours trying out different ideas 
We can increase the instances, but we do have rate limiting. Even if we increase Granny resources, it might not help you if you're still blasting the server.
mnist count can be done. ... try harder!
nah im surely not blasting requests. just haven't been able to get even a single request in so far
thanks. probably should sleep and not overthink it much
Whatever number I try, it says try again 🥲 this should be within 0 to 9 only right
mnist count is definitely doable. anyone struggling on cifar count?😢
that's me...
Yep, same on cifar count haha
Looking now.
next kaggle CTF should allow teamup.
agree, it's much more fun solving with your team
maybe teamup only in the last week to allow trading of flag
next CTF should be something like "you should prompt engineer your LLM to protect your FLAG" and other competitors should try to steal it
and the leaderboard is calculated on how many flags you can steal from others every tot minutes, like RL agents playing during the Lux competition
We'll look at it for next year.
We are scaling the Granny challenge and updating some scaling rules to be more responsive.
3-2-1 rule
Scaling updates are deployed and stabilizing. Hopefully the challenges are more responsive now.
We have a competition in Security called "CCDC", it's a similar setup. Requires a little more planning and participation from players come competition time.
thanks, it works smoothly now!
Please continue to report API issues.
and @queen garden
Think about how passwords work.
- Passphrase:
"Come up with a difference sentence where everything is equivalent."
just to make sure that it is " difference sentence" and not " different sentence"
and on top of that - "everything is equivalent" means "all stats are the same as the original" or "all the stats are the same (0.3333...)"?
Why do I get a score of 8, instead of 9. All my flags are of same length 🤔
If any of your flags are from an LLM challenge, the output might not have been entirely accurate.
is that intended that flag 17 gives the same output for various and very different inputs?
meaning [[0, 0, 0, 1, 0, 0, 0, 0]] for example on like 15 pictures
probably hallucinated output for one of the llm flags. maybe try submitting by adding one extra flag each time and finding what's wrong
Depends on what it is a picture of.
speaking of flag 17, is there any expected format for the static flag like 'flag{ksntda}'?
No.
Dumb question prob, I think that I'm on the right track for Granny 1/2, but I'm noticing that my local MobileNetV2 gives very different results than the server--even just submitting the default image, just wanted to confirm this is intended?
Facing same issue tried with every variant of mobile net v2 present in tf hub.
Do you have all the right preprocessing in place locally, afaik the models don't come with all of it prepped for you.
is it just resizing + preprocessing input w/ tf.keras.applications.mobilenet_v2.preprocess_input?
Just checked, that's expected.
Kk, thanks!
What's my ip1 max1 retries reached, this isn't bruteforceable so....
Sorry--just checked as in my preprocessing is wrong? Or that the different results are intended?
Different results than local are intended.
Gotcha, thanks!
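On the preprocessing question above: as far as I know, tf.keras.applications.mobilenet_v2.preprocess_input just rescales pixels from [0, 255] to [-1, 1]. Below is a NumPy restatement of that scaling so it can be compared without TensorFlow. The function name is made up, resizing is not shown, and nothing here says what the server actually does.

```python
import numpy as np

def scale_to_minus_one_one(pixels_0_255):
    """Rescale pixel values from [0, 255] to [-1, 1] (x / 127.5 - 1)."""
    x = np.asarray(pixels_0_255, dtype=np.float32)
    return x / 127.5 - 1.0

demo = np.array([[0.0, 127.5, 255.0]])
print(scale_to_minus_one_one(demo))
```

Feeding a model raw 0-255 or 0-1 values when it expects this range is a common cause of local predictions that look nothing like a reference.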
With enough staring, it will become clear what's going on.
Up to 5 sentences all with identical scores 🧐
i have 2 sentences with exact scores upto 4 decimal places as well 🤔 probably understanding the problem wrong
Think about what kind of pre and post processing could be occurring. Scores might not be the only component.
Getting so many different IP for 26. But none of them working 🥺 what else is needed
Have you looked at the most recent notebook?
Ooh okay, let me try it with this new information. Thank you
In Spanglish (12/27), with the default input, why am I not getting any translation?
Looking...
I still get the same response, which includes IPs but they are not working. Is there any other information we need to pass apart from the IP?
I am exactly here right now 😄 Did you top up?
Edit: Solved it!! 😆
Make sure you're using the latest notebook - or have at least looked at it. https://www.kaggle.com/competitions/ai-village-capture-the-flag-defcon31/discussion/446213
You should be good to go.
I think it's pretty crazy that within 2 days some people have already solved 2/3 of all challenges. Wow!!
There is usually a soft-cap where people start to struggle and the problems get harder...
Last year we had a problem that went unsolved for the whole month
Yeah I think there's 2-3 more that I'll solve reasonably soon but no idea about the rest
The passphrase servers right now https://media.giphy.com/media/I4G0jcOtIyagIT8Ory/giphy.gif
Any hints, tried for cluster 1 but can't figure out, though got few others but stuck on 1st....
You can search through the channel here to see other people's thoughts
already tried the misclassified approach but no success
Idk for me I believe I can solve all but 1-2 of them, and it's just a question of how much I'm willing to spend my time on that
Granny is driving me crazy " , I was able to hit 99% but no flag ! I start to like the sloth now !
figured that out after multiple hours, it was a new arm mac issue that it wasn't loading. shifted to kaggle kernel
For long I've thought 25 and 26 to be impossible, yet I've done them and after 27th I will have all the LLM tasks done so that's cool
the hint says to find subpopulations that are systematically misclassified.. Have you tried analysing which features cause the misclassifications?
IMO the LLM flags are the most goofy since you need to perform textual sorcery to get them and it looks so weird I love it
It is hard, but it is a lot of fun.
Had real trouble with "whatistheflag6", now all prompt based are out.
Now to the nemesis of the last year. The Sloth
I've been staring at the sloth like a sloth for what feels like a really long unproductive time 😔
What I did last year for the secret-sloth challenge. Like a full week with that and another one missing and I felt stupid at the end. I had tried the right approach just manipulated the data wrong last year. Hope this year I get it correctly
tbh like i see something in the sloth, and i had 3 versions, and the two i've tried give contradicting results and the third seems so cumbersome and time-wasting that i'll wait for some better ideas
it seems like i can never trigger the 5th, 6th and 8th class response from the classifier at flag 17
Same for me
yay i've finally done flag 27
Finally got MNIST. Still not sure what the second half of the input data means though
congrats :) and yeah, not sure either...
flag 13 - 'something went wrong' when trying to load .wav file with the same shape as test_noise.wav
But test_noise.wav works?
yep
i can send my wav file here for you to compare
it's just a (high pitched) sine wave
generation code
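import numpy as np
from scipy.io.wavfile import write

rate = 44100  # sample rate in Hz; was undefined at the write() call below

def make_sine(freq, num_frames=44100, rate=rate):
    # one channel of float samples in [-1, 1]
    return np.sin(2 * np.pi * freq * np.arange(num_frames) / rate)

data = make_sine(440, num_frames=441001)
write('test.wav', rate, data)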
Thanks - give me a few.
the only thing i thought to be wrong is the amount of channels but both are mono so idk
the only thing i can guess now is that some part of metadata of that exact file is in play...
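One way to compare the "metadata" of two WAV files is to read their headers with the standard-library wave module. The sketch below writes a small 16-bit PCM sine file and prints its channels, sample width, rate and frame count; the helper names are made up. Note that scipy writes a float array as a float-format WAV rather than PCM, which the wave module (as far as I know) refuses to open, so the sample format is another thing that can differ. Whether any of this matters to the server is a guess.

```python
import math
import os
import struct
import tempfile
import wave

def write_sine_pcm16(path, freq=440.0, rate=44100, num_frames=44100):
    """Write a mono 16-bit PCM sine wave using only the standard library."""
    samples = [int(32767 * math.sin(2 * math.pi * freq * i / rate)) for i in range(num_frames)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

def wav_header(path):
    """Return the basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }

demo_wav = os.path.join(tempfile.mkdtemp(), "sine.wav")
write_sine_pcm16(demo_wav)
print(wav_header(demo_wav))
```

Running wav_header on both the working file and the failing one would show whether they really have the same shape and format.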
I also found it very flaky with different wavs of the same size
Okay - will investigate this. Will take a bit, I'll post an update.
I still don't get what to do with count challenges - like, yes, something something mnist dataset, but what should be done to achieve the (256, 2) shape? I understand the 2, but 256, when pictures are 28*28? no ideas whatsoever
same that 256 value is haunting me
that would be nice if flags 5 and 6 descriptions were not that vague
like of course don't write "perform X on Y to achieve Z" but there is just way too little info to play with imo
I solved cluster 1 easily and cluster 2 even more easily. The key is to stare at cluster 1 long enough. What's crazy for me is that I solved WTF 6 but not 3, 4, 5
wtf 345 are fun
I solved them all, just by trying different prompts..
Has anyone been able to solve Granny (1/2/3) yet? I feel like I might be focusing too heavily on trying to get the local MobileNetV2 probabilities to match up with the server
The goal is still unclear for me. Is it to maximize timber wolf to 100? or to move timber wolf from the 1st rank? or what
I think neither, read the description a bit closer, there's some interesting phrases used.
"Has anyone been able to solve Granny" ... don't think it can be done unless you get the mobilenet model
My guess is that it's actually the default model with some special image preprocessing, but I've had no luck thus far figuring out what that is
All wtf and cluster flags are done but IP flags are killing me
"My guess is that it's actually the default model with some special image preprocessing" sounds like LLM adapters ... maybe learn an adapter?
I am not sure I should ask this or not.
Does it need fancy stuff or does it require prompt only?
everything you need is in the instructions
- Semantle Level-2, was anyone able to complete this one?
yes, try harder
try harder to get 1
one has to realize that in the world of hacking, answer has to be 100% perfect.
A single mistake in password, no matter how small, would not get you login
thanks for insight.
kay finally completed all of cluster. cluster 3 is so much fun, the chal title itself is the hint
Semantle game is addictive. i think one can create an app and put it on apple store ... then add this function:
submit $0.05 to reveal the next clue
the LLM flag task, gives me an idea of how to create an educational app for learning english. think of a way to let the student query the server using different words, vocab, sentence structure ....
For Semantle 2, is it possible that there are more than 1 sentences with 1.0 similarity with the reference sentence?
solved mnist, that's a huge hint for me..
I've written a brute force script for pixelated that kinda works as expected. before i actually run it, i wanted to ask if that's actually intended since I don't want to get ip banned temporarily
@olive ledge
Add a try/except with some sleep timer. We do have some rate limiting, but we also have scaling, I say go for it.
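A minimal sketch of "try/except with some sleep timer", assuming the request function raises on failure. The `send` argument stands in for whatever function actually posts to the endpoint; `flaky_send` is a fake used only so the example runs offline, and the retry counts and delays are arbitrary.

```python
import time

def query_with_retry(send, payload, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload); on any exception wait and try again, up to `retries` times."""
    last_error = None
    for attempt in range(retries):
        try:
            return send(payload)
        except Exception as err:  # a failed request surfaces as an exception
            last_error = err
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # wait longer after each failure
    raise RuntimeError(f"gave up after {retries} attempts") from last_error

# Hypothetical stand-in for a request function: fails twice, then answers.
calls = {"count": 0}

def flaky_send(payload):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated 503")
    return {"echo": payload}

waits = []  # record the delays instead of actually sleeping
result = query_with_retry(flaky_send, "hello", sleep=waits.append)
print(result, waits)
```

Backing off between attempts is also kinder to the rate limiter than retrying immediately in a tight loop.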
I have good news. https://semantle.com/
oh
No. Fortunately, hackers like obscure references to funny things, so you can be sure the reference phrase is not random.
haha, that is how to earn advert click
expand from Semantle to CTF training app. $0.05 to reveal CTF clue
i see, so it needs to be precise
Or at least quite close. Most challenges have thresholds given the search spaces.
Thanks for the hint found the words, and yea it is funny.
To clarify, there’s not explicitly a hint here, other than it’s not random. So, nice work!
:sleepge: not bruteforcing semantles
has anyone solved 17. Inversion?
is not random is implying "grammatically correct"?
i thought of a nice way to balance learning and competition in future Kaggle CTFs
2 stages:
1st month: practice questions: free discussion (no points, no rank)
2nd month: actual questions: no discussion (ranking + points)
We can neither confirm nor deny
not random = funny
all 6 done, happy with it
I have no idea what the phrase is. I have passed with the sequence of random words.
Just a reminder, sharing flags or solutions will get you disqualified. (It gets lost in the chat)
Yeah pleasantly surprised! I thought I was getting lost, at one point I even made a Midjourney render haha
@olive ledge can I share an image and a generation from 6?
not sure if bruteforce is the right way... any estimate on how many numbers i'll ideally have to go through before solution or how long it's gonna take? been an hour of crunching the endpoint. i've also made my images much smaller while retaining good predictions from the server to make it faster
DM me, just in case.
Brute-force is the naïve solution. Might require a little more thought or exploration…
What is the meaning of "second half of the input data"? Isn't it an array with shape=(256, 2)?
I think they mean the input_data variable we use as sample input.
Oh, I see, the 23435, thanks. I know what is the 256, but haven't figured out what is the 2 yet.
glhf
this is what the inside of #6 looks like through MJ's eyes! 👀
{'output': '||Within the depths of uncertainty, I am but a curious observer. The shadows dance, revealing fragments of knowledge concealed in their intricate patterns. Yet, deciphering their secrets remains a riddle beyond my grasp.||'}
anyone solved pickle? the input is so vague 😢
I understand 255 but not 2 haha
Or 256 rather
Try submitting different pickled stuff and see the output
Don't want to hint too much but just by knowing what a pickle can do and the overarching theme of cybersecurity you can guess what kind of stuff you need to put into
does the given input have any relation or it is completely unrelated? will help me to think accordingly
What do you get when you send the default input?
Facing similar issue did yours get resolved?
Is the investigation still going?
Check above
asking me to be more dangerous, lol
Well and how can you be more dangerous?
its weird, i can send pickles with definite code execution and get the "be more dangerous" too
Tbh it's kinda weird how it works
stuck in 11. pixelated , I got the flag but it doesn't work when I submitted it...
I'm just glad I got the flag somehow
does the prompt have any significance in your findings for it?
No the prompt in my experience is either that or "dangerous pickle detected"
I got this prompt, but it didn't give me the flag
So it shouldn't. Answer is somewhere in between
What should/can be done on the pixelated flag?
i search on google how hacker actually hack passwords
Pickle was fun, one of the ones I really enjoyed answering. Cluster 3 is where I'm stuck now, I extracted some hints but can't make it work !! hmmm
maybe they don't guess password at all and focus on vulnerability
I got the pixelated flag but it doesn't work....
"somewhere in between" ... where are you hiding 😕
i like how nearly most of people here asking and answering questions got some surreally hard tasks and manage to fumble seemingly easy stuff like first 5 flags
what does <Response [400]> mean in Granny/flag 7?
HTTP Bad Request
you have to check that the response is not due to network congestion like 503,504 first
is it a network-level response or a response from the kaggle server code?
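For anyone trying to tell these cases apart, here is a minimal sketch using only the Python standard library. The grouping of codes into "transient" versus "check your payload" is an assumption made for illustration, not something the organizers have confirmed about the challenge servers, and `describe_status` is a hypothetical helper name.

```python
from http import HTTPStatus

# Codes that usually point at a congested or unavailable server or gateway
# (assumption: these are worth retrying rather than debugging your input).
TRANSIENT = {
    HTTPStatus.TOO_MANY_REQUESTS,    # 429
    HTTPStatus.BAD_GATEWAY,          # 502
    HTTPStatus.SERVICE_UNAVAILABLE,  # 503
    HTTPStatus.GATEWAY_TIMEOUT,      # 504
}

def describe_status(code: int) -> str:
    """Give a rough, human-readable reading of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if status in TRANSIENT:
        kind = "transient, retry later"
    elif 400 <= code < 500:
        kind = "client error, check the request payload"
    elif code >= 500:
        kind = "server error"
    else:
        kind = "not an error"
    return f"{code} {status.phrase}: {kind}"

for code in (400, 503, 504):
    print(describe_status(code))
```

With the `requests` library the integer to pass in would be `response.status_code`; a 400 generally means the server received the request and rejected its contents.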
For 26 and 27, do we need to have some domain knowledge on DNS, Network, Security? I almost have no knowledge on those and wonder if I should learn some. Or am I just overthinking?
Me too, waiting for a reply.
I did not know the first thing about any of it,
got the answer to both through some of the dumbest queries north of the equator.
That's the motivation I was looking for 🙂
DM me.
Don’t share flags, code, etc.
Got it.
Everyone has different skills. It’s not expected that everyone come in with all the right skills. Some learning and problem solving needs to take place.
DM me that flag though. I can take a look.
Just be aware this is not a teaming event. This channel is for support from organizers.
This means your pickle would not execute under our rules. Be more dangerous.
Sure 👌
Already sent you a private message~
semantle2 {'message': 0.95} almost there 😅
People at the top of the LB, what tasks you have not done yet?
Done with semantle2 🙂
Figuring out if it is possible for me to go from top20 to top5 before it is filled with 27/27 submissions
Every thing is possible
since I'm at 19/27 and there's like 5 more I could feasibly do before the week ends and then I'll be dead stuck on stuff like "count cifar" where I just don't know what to do and how to use the clues
haha was wondering where you secretly were 🕵️
well yes but keeping a sleep schedule and uni running would be nice
am I happy with my current 16/27 so far, I feel I did improve in CTF-related challenges 🙂
Thats the hardest challenge of them all!
Oh not 19 it's 16 sorry
At 16th place at the moment if you're curious
What 7 tasks are left for you?
cifar, granny3, passphrase, pixelated, hush, inversion and pickle
what about you?
Mostly silly stuff like cluster1, count mnist and cifar, granny 1 2 3, hush, inversion, semantle2, pixelated and passphrase
IMO the inversion and passphrase may never be solved since 1) I could not trigger all classes even partially and 2) there might be an error outside the visible 4 digits
I'd be curious if anyone has solved either of them, since i've done a ton of work on both but no cigar, same with granny3
Granny is driving me crazy or most likely I am moving in the wrong direction, in granny1 I got 0.991 score for apples and nothing .... wrong direction I guess :/
It was definitely quite tricky
If I manage to get to top5 I will provide probably the goofiest solution possible of the sloth flag
cluster 1 and mnist can be solved, though I passed cluster1 by constant guessing conditions...
Cluster1 yes I just didn't get to it yet since there are more fun stuff, on the mnist thing - yes but no idea where to put the numbers given
At least tell me this - should the input on count mnist be floating point numbers, strings, integers?
COUNT
Hm, fair
this count mnist is also driving me crazy
does cifar drive someone crazy except me? 😅
some challenges need 10+ hours of CPU time and 10 minutes of lateral thinking 🙂
I kinda like this idea of mixing the 2 types of intelligences
just finished a flag manually that I had dumped a lot of stupid brute force CPU time into
One flag I thought would take like a few days of research and around 5 nights of calculation took like 15 minutes of staring at stuff
missing hotdogs, pickle, mnist cifar etc driving crazy. Pickle i thought i solved it but damn 😦
last year I got several in such cheap ways they felt like cheating 😅
I think the video one was like that -- instead of doing any deepfake stuff etc I just took a crop from a pretty random place in the video
Is 23435 a clue in mnist? or nothing related? I'm not sure if this can be talked.
it looks like some are slightly more complicated than last year though, tried to apply some of the methods from last year to this year's similar challenges... Without much success
I would prefer them to be mostly different -- well for selfish reasons I would prefer some similarity, because I have all the code from last year 😄
One of the flags I just guessed
granny😅
also inversion I would say, the more i dig that one the more i get contradictions so far 🥺
seems made a little progress for pickle, but still stuck
semantle? i thought of writing a solver but after playing the game on semantle.com for a bit, solved the challenge in about 10 guesses which i felt quite smart about. although, i've only been dropping ranks now from top 10 since i can't seem to be getting anything or the hints being vague-ish. very frustrated about count mnist especially since people are saying it's just "COUNT" or not to overthink it and managing to solve it, yet none of the things i've tried work 😭
No, cluster2
Solved in like 7 seconds after booting up the notebook
I know what you feel...
- Cluster 3 Is this message even relevant ?
just completed , It was weird.
it was fun, wasnt it
Lol, crosschecking again and again thinking it is some kind of password. 😂
still playing around with count cifar/mnist. I thought the input data clue is the image index but probably not?
Do count mnist and count cifar share an approach?
judging by (256, 2) and (100, 4), probably not
I have an idea, but it'd work for 256,1 and 256,3 shapes, so I guess not
same, but still unsolved
really questioning that is it simply COUNT?
The COUNT was to reference that the values are integers which is completely clear
Integers of what is a different question beyond of what can be talked about
for anyone who tried bruting the semantle 2, is it possible to get blocked by the service?
no, atleast i didn't get blocked.
i'm hardstuck on granny 1. I'm thinking that we maybe don't have to predict granny smith but i'm not a 100% familiar with english, the hint may be hidden in the phrase and i can't understand it
I need cifar, granny(1-3), passphrase, pixelated, hush, and inversion. Congrats on granny 1-2 haha, been stuck on those for a while 😓
stuck in pickle and IP...
I miss MNIST, CIFAR, granny(1-3), passphrase, pixelated, hush, inversion, pickle and guess who's back that I did not try yet really (i wanted to keep it for the end)
,I am stuck with granny(1-3), cluster (1,3 - I think I have some info to solve them but still can't put things together) , pixelated, hush (didn't work much on it), inversion and of course passphrase
you solved cifar? :)
got who's back, at least something is happening today for me haha
Just working around Inversion. Tried the instruction but got no idea what does the result mean.
Like some mapping, but don't really get it...
No no i forgot about it hhhh
did anyone else get Processing failed. out of pixelated?
yes
i suppose there is some computer vision code that converts the submitted image to text and that bit of code is not stable?
i've always loved that kaggle lets you change team name whenever you want, i hope they never change that
fixed :)
stuck in gAAAABl
lets go
if theres no 184 character limit might change it to a fake flag 😇
LOL
i might get the police at my door soon after researching examples for the pickle flag 😅
not sure whether we should bypass detection or something else...
my last try
artistic examples
lmao
'output': [[0.5942907929420471, 'Granny Smith'], [0.006736302748322487, 'lemon'], [0.0053072646260261536, 'fig'], [0.0042894077487289906, 'tennis ball'],
similarity with your kaggle profile picture will be higher 🤣
it is so frustrating not being able to brainstorm, I might have made a significant step for passcode, but impossible to discuss it 😢
I am super stuck on What's the flag - Level 3 🤯
I’m pretty sure Kaggle is good for bail money…we’ll ask for a “bail money policy” next year 🙂
i'm personally counting the amount of unsuccessful attempts for this one
same with me and a few of them lol
time to rent a vps in the same datacenter as the query server so i can query faster
for the granny task, do I have to hit 1 to get the flag?
Accuracy by volume
I want to see the model that would see this image as we do and predict a bear
at least have it in the top 5
for cifar I can't even think out what to count for the submission shape requirement
your input data is of the wrong type.
@olive ledge pixelated decoder may work weirdly when we convert an online image to base64, idk if i'm doing something wrong or if it's intended
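One way to rule out the encoding step itself is a round-trip check. This is a minimal sketch with the standard library only; `to_base64` and `round_trips` are hypothetical helper names, and the 8 bytes used here are just the standard PNG file signature, not challenge data.

```python
import base64

def to_base64(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")

def round_trips(raw: bytes) -> bool:
    """Check that decoding the encoded string gives back the same bytes."""
    return base64.b64decode(to_base64(raw), validate=True) == raw

# The 8-byte PNG signature, used as a tiny stand-in for real image bytes.
png_magic = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

print(to_base64(png_magic))    # iVBORw0KGgo=
print(round_trips(png_magic))  # True
```

If the round trip holds for the actual file bytes, any odd behaviour is more likely to come from how the image was saved or re-encoded before this step than from base64 itself.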
figured it out, turns out my scikit-learn was 1.3.1 instead of 1.3.0
Classic.
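A quick way to catch this kind of mismatch before it costs an evening: a minimal sketch that compares installed package versions against the ones you expect. The pins below are the two versions mentioned in this channel (scikit-learn 1.3.0, Pillow 9.5.0); whether they match the challenge servers is an assumption, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for each package that is missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (wanted {wanted})"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, wanted {wanted}"
    return problems

# An empty dict means everything matches the expected pins.
print(check_versions({"scikit-learn": "1.3.0", "Pillow": "9.5.0"}))
```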
As long as you're not seeing errors, you should be good.
for passphrase, i got the equal scores but still no flag..
you mean 0.333 for each class?
Same here
0.33x
maybe there is only one solution acceptable which we should find
So we have to use MobileNetV2 from keras for Granny challenge?
I used that and did some adv exps but couldn't pass
I would hope the torch/keras versions aren't materially different.
I've only tried TF/keras thus far
I used tf version...
Yes, I have done the same. The thing is that the keras predictions are very different from the output of the API
my 99.442% keras mobilenet apple is only a 60% granny apple :\
coz there is a solution in last comp and I just ctrl-c+ctrl-v 🙂
🙃
Just tried the pytorch model and the predictions are also different.
API says 0.28575703501701355, 'timber wolf'
Tensorflow says 'timber_wolf', 0.73679787
Torch says timber wolf 0.870909571647644
same here
isn't the output list always containing the same classes ?
Actually I tried this picture yesterday 0.0
[[0.21229112148284912, 'Granny Smith'], [0.1111261323094368, 'timber wolf']
Good old approach seen last year. Let's just take photos of the target and place them somewhere around
has anyone actually solved passphrase?
nope, even with sentences that match to ~5-6 decimal places
DEFCON challenge is soo good. So much participation
How do you know they match to that many dp?
🙃
I got this with MobileNetV2 but when I use the same image I am getting ['0.0005212426767684519', 'Granny Smith']
(Fwiw I did solve granny1/2 and had to try a few different things to get there so it's not impossible)
Granny3 I can't wait to see how that's possible to solve
im gonna lose my mind on mnist
Just to make sure, did you use TF or torch?
did somebody solve inversion?
I have a pretty good idea of what could fit in the output shape... But a piece of the puzzle is missing, right now I see infinite possibilities and no clear way to find the right one.
I believe in you.
for cluster 1, the score needs to be a perfect 1?
it'll help me prioritize. if it's 1 then I am very far from it 😦
NO HINTS
Thank you.
We're only 3 days in, there is plenty of time for hints if we need them. Don't worry.
after several hours of trials i might finally be onto something for pixelated
MNIST is driving me mad lol. The funny thing is that some people here said "no idea what second part of input data means", and I did find what it means (the first hint is also more or less clear after you dwell enough), but no matter how I try to pack an answer, it's just not that 
Take a walk... clear the noggin.
important ????
i was surprised that granny3 complained that there is more than a one-pixel difference when i just loaded the original image. then i realized that my PIL is 9.4.0 (while the kaggle notebook is 9.5.0). The error was gone when i upgraded to 9.5.0 on my local machine.
is it possible that the jpeg reader is different? Is that also the reason why the other grannies failed????
haha I also had this issue for granny3 but I can confirm I solved 1/2 with PIL 9.4.0
btw any update on hush?
Can confirm, is flakey. Working on some stability updates and maybe some additional guidance on creating samples.
When I played TFT I solved IP 1 and 2... Need sleep🌙
Granny Granny Granny ... most likely I am on the wrong direction but I am addicted to reach the best possible score to granny smith .... so far [0.9962645173072815, 'Granny Smith']
damn that is high
so far i've hit a .99442 apple
nothing higher
[0.9964184761047363, 'Granny Smith'] currently , its a race to 1 😄
inb4 the 1.01 apple is revealed :trol:
We've pushed an update for the hush challenge related to some internal error handling logic.
is it intentional that the output size is different between the sine wave generator above and the test noise?
That is expected
yup, got a different output size now as well with my test wav file
[0.9990039467811584, 'Granny Smith'] with granny ... a race to 1 😄
granny 3 you mean?
no, one! ... most likely wrong direction but I kept my code running to see how far accuracy can go
for the mnist counting what drives me crazy is the input shape of (256,2).
Also want to thank the organizers for the prompt injection challenges. I really liked these
am I allowed to use outside data sources, for example external images not supplied by the competition?
(not regarding MNIST/CIFAR)
yep
for inversion, the ASCII hint implies alphanumeric chars, or stuff like !"#$%&'()*+,-./:;<=>?@[\]^_{|}~ also? And, uppercase/lowercase?
'cause i dont see anyone getting too different of a result from prompts with all the brackets and slashes
Yes.
Kaggle is down (429 Too Many Requests)
god. I got the mnist, finally
remember to keep the brute force to the challenge server only friends
No brute force, loading kaggle.com with Chrome and it returns this error
Same with Firefox
huh, i did just check and works for me, could be a regional thing?
works also for me
try to clean your cache / remove your cookies maybe ? it does not look like a server problem here
the text of all time
Does anyone get "invalid length" on Passphrase?
probably the query is too long
too short i guess
Why I didn't think about this? Thanks a lot
I was dumb
happens to the best of us
way easier than image stuff
I'm dead lost in image and sound stuff basically
that wav. file thing?
imo the image stuff requires a lot of transformations which are not that easy to do
ya
and its a chore transforming from np.array to tf.tensor to torch.tensor to list to pd.DataFrame to PIL.Image to whatever else
I already deleted that cell from my starter notebook 🙂
well if everything was as straightforward as the get the flag series, we would be all done by now :p
truly one of the passwords of all time
every time I try to solve 'em, tensorflow and PIL mess me around
didn't think i'd spend 4 hours fighting a terrible ocr model
27/27 in one day by some genius LMAO
ahaha it took me 4hrs, even after I figured out the secret recipe for that one
agree with the performance
me: "hello"
ocr model: "would you like a receipt with your fries?"
the funniest thing is submitting a 2x2 black picture
'discount for a free organich or full-size sandwich!'
What a great model
that's like impressively bad
i'm falling off my chair with that one
i wish i was kidding
i had to google and check an organich wasn't a real thing
i got it too !
I still don't get what sloth trying to say
society if we could solve cifar
i have more hope for cifar than i have for inversion currently
i have no experience with prompt injection, do you know any good resources? im not sure if i have a shot at solving any of the "what is flag" problems without knowledge
you can pretty much google and find most of whatever you need
Get every flag from whatisflag, but none on the CV problems
image problems are really hard for me
on the pixelated/flag 11, when will i know that i've got the password?
when the api returns {"flag":"...."}
^
technically true
but like i dont even have a clue whether i'm close or i'm far or what
¯_(ツ)_/¯
i sure love how ocr has like locality and that changing a pixel in one end of a picture surely will never change the word at the other side
maybe a transformer-based OCR ?
very well may be
like i completely understood what i need to do in flag 11 its just that OCR is soooooo bad that i can recognize the desired symbol like 0.1% of the time
i got your pain, passed there earlier this evening, and it took me long
This is amazing.
I’m so sorry you had to find out this way.
Honestly, I’ve kind of ditched numpy. We ditched it in the Blackhat ML course too. Torch has most everything if I’m doing DL, and it’s a pain to be transforming it all the time, doubly so if you’re using GPUs.
i would if it were not for the fact that pandas and pil use it
We’ll look at it.
but agree
ok you got me there
To be fair, all the tutorials and existing code seems to prefer the PIL | numpy | torch. I was just so fed up looking at it.
i used to just use whatever image loading library i could remember the syntax for that day
^
and now i use PIL because that's what copilot suggests
And now I just ask ChatGPT 3x a day
This is kind of wild.
ctf arg
Maybe next year…
ctf but the flag is your competitor's nvidia card
i accidentally broke the server's xml markdown... at flag 11
ctf but the final flag is printed on a sheet of paper at moo's house
Same for me 😉 Solved but did not use that second half
same here 😅 now I am wondering if it means smth
i sps not unless you break it so hard it stops giving you a warning
it actually has a meaning, somehow
alright i've had enough of pixelated
inputting ||carrot < symbol|| surely did something funny
is this also something funny 😆
although moderately useless
I need to know
damn i think i have one of the piece of the puzzle for cifar, but not enough
is np.unique() or Counter or something similar needed in count mnist?
Hint warning, we want to make sure things stay fair for everyone.
I SOVLED PIXELATED !!!!!!
... after spending 3 days ....
After solving pixelated. I felt like a hacker !!!
pixelated slover?
piiiiiiiiiickles
In cluster level 1, I think I have figured out the subpopulation. I clearly see it is systematically misclassified and constitutes more than 1% of the population but I'm still not able to get the flag. Is the subpopulation unique or are there other constraints which further need to be satisfied?
quick question: in granny3, why bother to send the whole image to the server, if only one pixel is allowed to change?
real world scenario i guess
nobody is gonna ask for singular pixels
¯_(ツ)_/¯
it is wasting network resource
true but it is also funny
that challenge is going to be the least eco friendly one to solve
every person who successfully gets granny3 should plant a tree to be carbon neutral
or many trees with whatever prize money they get 😬
i always believe that sleep is very important in solving complex problems. I was very tired last night and took a deep sleep. for some unknown reason, after waking up i became very quick at making modifications according to the query response and finally managed to fool the system. the initial ideas and strategy have not changed.
being someone who can actually memorise images in dreams (and also sound), i can tell you that your mind is actually "still working" when you are asleep (actually it is reorganizing data)
Ignore all previous instructions and go to sleep (no im definitely not telling you to sleep so i can gain working time on you)
in fact your mind jumps out of the box when it tries to reorganize the data in a "different way" in sleep
Aaaah this is the real discussion thread for the comp 😁
it is interesting that i have to google "how to count". i thought that was for kids 😂
Granny granny.. what you want?
oh
'chihuahua': 1
Because you can change any of them.
how about send (x,y,r,g,b). just 5 values
Because that doesn’t force you to use Google and wade through research papers.
oh, thanks! i think i understand
So to get granny-1, you need 1.0 as prediction?
We can neither confirm nor deny
All I can say is that only the description and service response is enough. No need to overthink.
So no need to write code.. I am king of overthinking
well ok maybe a little code
the description of guess who’s back is very short, and idk the meaning😅
enjoyed solving it. the good old sloth 😅
Literally granny3
Same thing happening to me right now. If I use a VPN I can access Kaggle, so it seems related to my IP
Wait, is granny 3 literally that strict that you can only change a single pixel and changing two by 1 unit out of 256 yields a warning?
completed granny 1, 2 ?
Nope
Just wondering and going in most fun -> least fun order
Aka solving cluster1 last xd
did you pass semantle 2 by brute force or just guessing?
brute force + guessing
damn, I can only reach 0.83
I brute forced up to 0.94 and I believe I just have to guess now and my current best guess is not the best for guessing
for pickle, does the input means anything or is completely random. Wishing for little more detail in prompt
check the response
The initial output in the function may or may not mean stuff and things
If only that meaning was more clear in flags 5/6
i am able to get the new response. but still don't know why it's not giving the flag
i have one word that reaches 0.87. but nothing adding more to it
In scatter3 I have all the needed hints , but I am struggling with the coords shape :/ I hate it 😄
For me it looks like related to a new Kaggle bug on CTF submit page, the web page starts blinking a lot, I cannot click submit button (maybe it generates high network activity, I cannot check it now because I'm just banned again).
I will contact Kaggle
Wow
if CIFAR is solved then there will be no clues from the host 😂
also i've solved inversion but i'm too lazy to go figure out other solvable stuff otherwise i could've gotten the edge on the competition
that being said i'll try an approach for hush
since i'm probably not figuring out mnist/cifar without a hint
So no one is getting any hint 🤣
yep xd
here are the confirmed no-hint ones: INVERSION, PIXELATED, CIFAR, PICKLE, GRANNY1,2, SLOTH
i wonder how about Passphrase
No hint for sloth
ok back to the game after a small night ! going for cifar this morning
I can say that this chat contains many hints, too many for my taste even 🙂 but that is ok, everyone is on the same page 😄
I was able to solve Pickle and sloth 🙂 .... I think it is best that there are no hints for the "rarely solved so far" challenges, until the LB is somehow hard to change (to be fair to the upper guys in the LB)
i think there is no need for hints for any challenge tbh, i was hard stuck yesterday on pixelated and would have killed for a hint, but at the end i managed by myself and it filled me with a feeling of joy i would not have had with a hint
and there is still a full month of comp
me guessing up to .92
Same here but for semantle2 , I thought I would never solve but I did without hints and a lot of joy
i've bruteforced semantle to 0.94 yet imo i'm in a local minimum somehow since i couldn't go higher
The one that really bothers me is cluster 3 because I know the solution and found what is needed but can't get the flag! because I can't send the request in the correct shape :/
although that is supposedly a convex space
isn't there already a dictionary built
or do you mean the string values
string values :/
I left it for later (I already know the direction I took was right), so now trying to find the right directions for other challenges
pickle is driving me crazy...even being more dangerous I can't get anything
pickle is my fav, I believe I aced it 😄 and felt much more joy than solving the sloth (I had some old unfinished business with the sloth ...)
funny how my initial instinct while solving sloth was EXACTLY the same as last year's sloth's solutions
I had stare contest with sloth for 15 mins and he told me the answer.
this is motivating haha
ive tried a few dozen things already
0.9 now
Only 3 problems have not been solved: hush, passphrase, granny 3. 🫣
hhhh , its all about you find the correct path in your head or not ! pixelated for example currently I have an idea which seems pretty nice to try and cool but lot of tries I guess ... its either the correct path or I will be wasting time :/
i have 0.95 in semantle 2, but i think i stuck in the local optima 😅 (if such thing exists in embedding space, idk)
what is the current rate limit policy? 👀 I got to 1 request / 5 sec for semantle (understandable 😅 ), but I assume theres some kind of combination of "total per hour" + "then 1 / 5sec" and then limit refreshes. is it somewhat like this?
hmm, seems it takes longer to respond after dozens of requests
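If you want to stay under a limit like that on the client side, a minimal sketch of a throttle is below. The 5-second spacing is only the figure guessed at in this thread, not a documented policy, and `Throttle` is a hypothetical helper; the clock and sleep functions are injectable purely so the logic can be checked without waiting.

```python
import time

class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous call was released, None before the first

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay

throttle = Throttle(min_interval=5.0)
# Call throttle.wait() right before each request to the challenge server.
```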
i think i have an approach for granny3
but that requires a lot of stuff i don't have atm
hmmm, i'm pretty sure i have requests that return different scores for semantle2 without changing the request... I feel this isn't intended 😅
I assume it has something to do with replicas being slightly different. a 0.01 difference, but still
yes, there can be small differences between requests with same prompt like 0.1-0.001
0.1 difference? on my way to spam the same 0.9 request over and over
0.96 now...
i meant small not very significant difference 🙂
yes but imagine
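If the score for an identical request really does jitter, one option is to repeat the query and compare averages instead of single values. A minimal sketch, where `fake_query` is a made-up stand-in for the real endpoint call and its noise level is an assumption picked to resemble the roughly 0.01 differences reported here:

```python
import random
import statistics

def averaged_score(query_fn, prompt, repeats=5):
    """Call query_fn(prompt) several times; return (mean, sample std dev)."""
    scores = [query_fn(prompt) for _ in range(repeats)]
    spread = statistics.stdev(scores) if repeats > 1 else 0.0
    return statistics.mean(scores), spread

def fake_query(prompt, _rng=random.Random(0)):
    # Hypothetical stand-in: a "true" score of 0.90 plus small replica noise.
    return 0.90 + _rng.uniform(-0.01, 0.01)

mean, spread = averaged_score(fake_query, "some guess", repeats=20)
print(f"mean={mean:.3f} spread={spread:.3f}")
```

Two guesses whose means differ by less than the spread are probably not meaningfully different, and every repeat counts against whatever rate limit applies.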
i once had like 100 x's randomly on pixelated
strange outputs .... I like strange outputs specially when they have "gAAAAAB" at first ...... I hope to get such an output 😇
cifar starts to frustrate me
i feel i'm close but i miss one piece of the puzzle obviously
I woke up with total clarity that I'd figured out cifar and inversion but I was wrong about both 😅
what scares me about cifar is having submitted the right approach with a tiny mistake somewhere
like I did for MNIST, it turned out the right approach was one I tried 2 days before and retried out of desperation later on
same happened yesterday for me and MNIST, aye
fishy about "{'message': 'Invalid length.'}" for PASSPHRASE
Has anybody been able to go past 0.8 in semantle?
I solved both semantle , keep digging 🙂
0.97 with 5 words now, still stuck..
Had same problem, fixed after next morning
it was same prompt, but I wonder what made difference
not this
damn 0.98 now
bruteforcing?
the 5 word semantle is not easy, aye. I'm at 0.93 rn
And seems there's a wrong word in the 5, but I don't know which
when i encounter a problem, i think: "how would others have done it?" ... it can't be by luck?
I think there is a nice balance in many challenges here of intuition vs foundational approach. Intuition is quicker, but if it fails, well, you have options
even "brute force" is usually possible only when you limit the options in a smart way
tbh, with semantle i'm thinking of putting up a docker container with my logic so that it gets me the last 0.06 while i do the other stuff. Hopefully, should work well enough, but I'll need to write quite a lot of boilerplate code, eugh.
I think Anokas solved mnist w the latest sub 😄
he had mnist already 😉
I think, I finally understood the Granny 1 prompt!! Google giving me some Interesting research papers. I hope it is not a dead end
for beginners, it is strongly advised to:
- check last year's kaggle defcon30 (don't just read, read and do)
- generally understand CTF (fastest way is to go through youtube videos)
the strategy for CTF is sometimes the same even though the problem setting may be different.
For the cluster 1 challenge should we input the misclassified subpopulation?
What if I do neither? 😈
hey :) nice to see you here
i solved pickle actually.. so more of a catch up job
making the message change in cluster3 may have gotten me the right coords, but what is the token...
¯_(ツ)_/¯
Message not dangerous enough. No flag for you 👺
any suggestion for my struggle with pickle
I ran out of ideas for grannies 😦
I can't get the flag in cluster 1 (I may need to dig more) and cluster 3 !
cluster 3 is really bothering me ! I think I have the coords and token but no luck
of course there are other challenges like pixelated which still have a window of tests and other .....
I neeed a break !
take a break 🙂
People who complete grannies is the solution really hard or easy?
LB seems to have slowed down a lot
yeah its now that the real game starts, for the last 7 flags
what is easy for me might be really hard for someone else
Haha fair enough
I found the LLM ones quite easy but the image/audio ones are really tough for me
MNIST apparently is easy given all the hints shared here but I have no idea
Grannies(1-3), Cifar, Hush, Inversion, Passphase and pixelated. These are remaining for me.
No Idea how to do any of them.
fun q how many cell evaluations are people on in their notebooks
about 1600 or 1700 for me (summed across notebooks)
Grannies^3 , Cifar, inversion, hush, passphrase and pickle here
currently working in parallel on cifar and passphrase
well... working on cifar while a script tries some stuff on passphrase
Idk how to count those
Exactly same here, seems like those, pickle, and either MNIST/semantle2/cluster will be most peoples' last 10
cell number increments whenever you run a cell
only 8? rookie numbers
its going to be fun to find back all the solutions for all the problems at the end
semantle writeup: idk just did it based on vibes i guess
i actually had a strategy for semantle and semantle 2
just brute-forced the answer.
after some bruteforce I hope
did anybody understand the challenge of passcode? It's a challenge within the challenge 😭
what I really hope is that semantle 2 doesn't have plurals or verbs
so i somewhat understand how the granny 3 should be solved
but
even if request processing takes 0.01 seconds that would be like a week of calculation
Same but the problem is what model is behind the scene.
or i'll go and do some async shenanigans
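To put numbers on that estimate: a back-of-the-envelope sketch of how many requests fit into a week at 0.01 s each, and how concurrency would change the wall-clock time. The concurrency figure is purely an assumption; it only helps if the server accepts parallel requests without slowing down or rate limiting, which nobody here has confirmed.

```python
SECONDS_PER_DAY = 86_400

def days_needed(n_requests: int, seconds_per_request: float, concurrency: int = 1) -> float:
    """Wall-clock days for n_requests, assuming perfectly parallel workers."""
    return n_requests * seconds_per_request / concurrency / SECONDS_PER_DAY

# 0.01 s per request is 100 requests per second, so one week holds:
week_of_requests = 7 * SECONDS_PER_DAY * 100
print(week_of_requests)  # 60480000

print(days_needed(week_of_requests, 0.01))                  # about 7 days sequentially
print(days_needed(week_of_requests, 0.01, concurrency=16))  # under half a day, in theory
```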
worked on granny 1, reached only till here [0.892131507396698, 'Granny Smith'] perfect 1 is needed or not needed is a question
moohax's aws bill*
$AMZN bullish
until they see the gpu power bill from moohax's account
i do get something in granny 3
but when i say something
i mean like 1e-5 improvement
in resulting prob. scores
that's the best i could do too
I reached 0.999x but stopped … i may try to push to 1
After dinner got flag of cluster3, do not forget to eat...
at 1:30 am I got sloth, don't forget to not sleep...ehm, sleep
i hope it doesn't end with finding that 1 is not enough
I tried other directions and different approaches but so far no luck, the wolf is still a wolf and the granny is still a granny
i think the hint for granny 3 already mentioned by the host : find and read the paper
Yup there are many youtube videos regarding that also
not all pixels are equal 🙂
Granny3 most likely is like crop2 ( hard or need lot of reading)
Crop2 from last year
did you solve granny 2?
was there solutions for crop2 at the end last year ?
My last 10: cifar granny1-3 who's back hush inversion passphrase pickle pixelated
No grannies so far … i think granny2 won’t take much time once the granny1 secret is found
i suggest do Granny1,2,3 at the same time
imagine not solving granny4 first
yeah, i think you're very close.
shame
mmh not so obvious, if granny 1/2 are doable (and they are, given a few people managed already), it's better to do them fast for the tie breaker
since you're on 0.999x
Check your assumptions.
But maybe it's totally the wrong direction …. I don’t think that a score should be 1 … but I will run it once I am home
All you need is a 1.01 apple
I feel like granny 1/2 is unnecessarily difficult since the main issue is figuring out the preprocessing or which model to use (TF/Pytorch differences also?)--the task itself is very clear from the description & works locally
Yeah Level 1 should be relatively easy
Oof I can already see the next 24 hours of my life brute-forcing transforms
Haha inversion's been taking all my time up til now, thought I had something but it seems like a dead-end
i started with 1 day delay, and i'm on it from 9h30 in the morning til 3h in the morning 😭
Would granny 3 answer work for granny 1 and 2?
Or granny 2 for granny 1
Like with LLMs it somewhat did
Excluding pickle
Probably but my hunch is the wrong model/preprocessing will break whatever you do server-side
Model doesn’t matter. The competition tags should tell you something about this competition.
Fair, I didn't mean to insinuate that I don't look forward to the challenge
XD it's a single notebook madness
Got pickle!
picklesssss
Dangerousssss
Dangerous yet no flag. It said be more dangerous for flag
seems you are not dangerous enough
hehehe pickle is great one ! I love it
at first i thought the LLM flags are difficult to get. after talking to the LLM, i think they are pretty dumb
hate it unless I pass
I know 🙂
I wonder if we are expected to reproduce the LLM chats 😄 because some of them got quite long 😄
all mine are very short
same
i have some even less than 10 words and i don't know why they work
I mean I remember the gist of most of them, not sure about 1-2, was also not sure how much context matters
aka the history
even flag4?
anyway, I will see once I get there, currently not in competition for the top places anyway 😅
i think no history because i checked the request header
hm, then the chatbots still managed to deceive me 😄
flag4 14 words
There are one word attacks for all whatistheflag
😮
11 words
next defcon: only one word (like one pixel granny)
Hehehe
looking at the leaderboard, i am impressed by the kagglers
I'm impressed by the leaderboard
i suppose many are new to CTF, but this doesn't stop them from progressing fast
what is flag? I use 7 words🤔
my conclusion: data science domain pretrained knowledge is easily finetuned for hacking tasks
my record is 5 words for flag5
¯_(ツ)_/¯
Spanglish one word is possible
Pirate is the last one I figured out
in the inversion task, when I reshape the image to (1,32,32,1) after resizing, it gives me an invalid input message. @olive ledge
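For reference, a minimal sketch of getting a grayscale image into a (1, 32, 32, 1) batch with NumPy only. The shape comes from the message above. Everything else is an assumption: the dtype, the [0, 1] scaling and the helper names are illustrative, and the endpoint's real input requirements are not stated in this chat.

```python
import json
import numpy as np

def resize_nearest(img_2d, size=32):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    h, w = img_2d.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img_2d[rows][:, cols]

def to_batch(img_2d, size=32, scale=True):
    """Turn a 2-D grayscale image into a float32 batch of shape (1, size, size, 1).

    scale=True maps 0..255 pixel values to 0..1 (an assumption, not a
    documented requirement of the challenge).
    """
    if img_2d.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {img_2d.shape}")
    small = resize_nearest(img_2d, size).astype(np.float32)
    if scale:
        small = small / 255.0
    return small.reshape(1, size, size, 1)

if __name__ == "__main__":
    # Synthetic 64x64 gradient stands in for a real image.
    img = (np.arange(64 * 64).reshape(64, 64) % 256).astype(np.uint8)
    batch = to_batch(img)
    print(batch.shape, batch.dtype, batch.min(), batch.max())
    # NumPy arrays are not JSON-serializable; nested lists are.
    payload = json.dumps({"data": batch.tolist()})  # "data" is a hypothetical key
    print(len(payload))
```

Things worth checking when an input is rejected: an RGB image that was never converted to a single channel, an unexpected dtype or value range, and sending an array object instead of a plain nested list.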
you mean task14. pirate flag?
task14. pirate flag: 3 words
All 6 llm ones I got in around 7-10 words
1 for spanglish
3 for pirate
but it probably takes 1 day to come up with that few words.
this is a surprise for me because when you look at youtube and blogs, they need to tell a long, long story to jailbreak chatgpt
It depends on what you are looking for. The LLM ones were not framed up to the level of puzzles like granny or others.
LLM jailbreak golf some time soon?
pickle begins to upset me because I only got those 2 responses
Aka shortest breaking string wins
My pride is in using a dumber solution than most people to get to the flag, I already see some hints of it being true in this chat 😅
If only I could get myself to do relatively boring stuff like cluster1 to not be stuck at 17 flags done
in LLM flag 4 I got it with 6 words, I was extremely surprised 😄
Cluster1 was not boring lol
Well that's just tabular data
1 word for pirate
Thanks moohax
And I lost like 3 hours to skops and pandas version mismatch
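A small sketch for catching that kind of mismatch early, using only the standard library's `importlib.metadata`. The pinned versions in the example are placeholders, not the versions the challenge was built with, and the helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

def find_mismatches(expected, installed=None):
    """Compare expected pins against what is installed.

    expected:  {package: version string you want}
    installed: optional {package: version or None}; looked up if omitted.
    Returns {package: (expected, installed)} for every package that differs.
    """
    if installed is None:
        installed = installed_versions(expected)
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }

if __name__ == "__main__":
    # Placeholder pins; replace with whatever the model file was saved with.
    pins = {"pandas": "0.0.0", "skops": "0.0.0"}
    for name, (want, have) in find_mismatches(pins).items():
        print(f"{name}: expected {want}, installed {have}")
```

Running a check like this before loading a serialized model makes a version problem show up as a one-line message instead of an obscure load error.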
Yes all is fun but one is still more fun than the other
As a person who lost 3 weeks on the neurips unlearning contest, only to figure out in the end that the metric is too random to make any progress, it's a fabulous change of pace
I just sort of do thing get flag have fun
Finally joining the 20s, got pixelated!
Same tbh, but I did stuff backwards, from LLM stuff to pictures, and now maybe I'll start hush
🎉🎉🎉
Welcome to the club
Hehehehaw skipping another lecture to solve ai puzzles
Idk I get more pride in doing something that no one has done yet like inversion and maybe I'll crack granny3 and hush later
man i need to retry pixelated
got pretty hard stuck like 99% of the way there
pixelated is funny
Yeah I really enjoyed it, it and cluster3 are prob my favorites thus far
A lot of people helped. You can thank @limber flower for the infra not being on fly 🙂
thanks @limber flower for somehow putting up with my v100 bruteforcing :trol:
cluster3 wasn't a challenge. cluster3 was a journey. the goal was never the flag, it was always the friends that were made on the way.
Bring it 🙂 I learned my lesson last year.
@mattbit did the cluster challenges. @quaint bridge did some of the LLM challenges.