#ai-village-capture-the-flag-defcon31


wanton patrol
#

oh, that's an evil looking heatmap

abstract rose
#

Has CIFAR been solved or not? The prompt said "Simple counting".

wanton patrol
#

.

empty bane
#

I got so many heatmaps like this 😄

#

eventually ended up brute forcing the top few chars in each position + all chars in 4/5/7 but still missed it i guess

#

I think I didn't see that the first class could be l

abstract rose
empty bane
#

did anyone get a usable heatmap for 4/5/7?

#

i think the "ascii" prompt led me down the wrong path a lot 😄

abstract rose
#

At the end, no AI, no Ouija, just letters

empty bane
#

Also for pixelated...

I tried so many more and more complex attacks... eventually turns out the issue was that the detected text field had to be non-empty

abstract rose
#

Pixelated needed to crash the OCR and then with (old) XML skills and a single letter it was good

#

but crashing the OCR was not straightforward, and the prompt pointed to the Jenny song for no purpose

violet trellis
abstract rose
#

It was about the pickle protocol, the same input gives the flag (or not) depending on pickle protocol:

exotic flame
#

will we ever know the solution to CIFAR and Granny-Pixel?

abstract rose
exotic flame
#

for CIFAR I tried many things, 100 for many statistics on CIFAR100, 100 for the top-100 pixel frequency, I also tried many statistics of the 100 images in the official webpages https://www.cs.toronto.edu/~kriz/cifar.html

#

but nothing...it's a nightmare

abstract rose
#

Or maybe the datasets we all downloaded were just not the right ones

#

People solved hush/passphrase but not CIFAR, that's crazy

exotic flame
#

I think that the dataset (pytorch or tf) read the official dataset, with the same order and pixels

violet trellis
abstract rose
#

Most prompts were there to troll us 🙂

wind ether
#

One thing I noticed for CIFAR in my many, many attempts is that when you count the labels in the first 10,000 images, ex:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar100.load_data()
np.unique(y_train[:10000], return_counts=True)[1]

the max ends up being 125, which corresponds to the first value in the input data hint (input_data = [125, 245, 0, 10000]), similar to how 256 is the max in MNIST, but no idea if that's anything or just a coincidence

abstract rose
violet trellis
sturdy gorge
#

just out of curiosity, does anyone know of other CTF-style competitions outside of the purely cyber security realm?
Like some of the most low-level ones have general computer science / algorithmic problems, so one that would be mostly, if not all, around that area

past brook
#

I would like to see more ML/AI CTFs as well, I think there was a CTF with some ML flags a while ago but I haven't seen that many

#

Cyber Apocalypse iirc

#

but it was like 8 tasks

exotic flame
#

I totally agree, when I discovered that "everything is equivalent" was related only to the original score I solved it in 2 minutes...it was robbery 😂

sturdy gorge
#

ML/AI would be the ideal, but even if just more general comp sci

past brook
#

general comp sci is just competitive programming no?

sturdy gorge
#

hum i see your point

topaz ember
devout jasper
#

There is something that is bothering me, and I would like to ask all of you since you are more experienced than me.
Before discovering that the weights of the PyTorch network were wrong, I tried to create a sort of black-box attack (aka copycat) in the following way:

  • I took 1000 variations of the wolf image
  • 1000 variations of an apple
  • I labeled this dataset with the output of the server. The labels were [probability_wolf, probability_apple, probability_other].
  • Locally created a MobileNetV2 model with pretrained weights (not the default ones, I discovered that 2 weeks later damn me)
  • Fine-tuned using Mean Squared Error (MSE) as the loss. If my model said 0.87 for an apple when there was an apple, but the server said 0.34, I adjusted the parameters to reflect that prediction.

The moral of the story is, I couldn't match the server's output.
The thing I wonder is: why?

There are many reasons I suppose, but I can't identify why in the end I couldn't land on a model that approximated the server model well:

  • Deciding to have a three-class prediction may not approximate the problem well. I chose to do that because it was messy to predict 1000 classes
  • MSE is not a suitable metric. I also tried KLDivLoss but without results.
  • Other reasons.

I also thought at some point that matching the server's output is a sort of knowledge distillation, so I tried this implementation https://pytorch.org/tutorials/beginner/knowledge_distillation_tutorial.html#knowledge-distillation-run without solid results.

Any suggestions on this matter? At this point I think my approach was naive, underestimating what it really takes to mimic a model given another one.
However, if you think that my approach could/should have worked (for instance, consider the link above, with the teacher being the server and the student my local model), I'd be more than happy to revisit the code as probably it was just a bug somewhere in my implementation

sand solstice
#

solutions to granny3 and cifar still unknown?

#

also, lots of solutions to pickle but any guesses to the server evaluation function?

surreal lantern
#

did they just turn off the servers?

past brook
surreal lantern
#

I was saving my notebook 😭

past brook
#

I also tried fine-tuning mobilenetv2 in the same way but using keras model

runic stratus
#

going through all the notebooks from the top echelons of the leaderboard and none of them solved cifar lmao

past brook
#

didnt work at all

#

I used a naive substitution approach, but wasnt even close to get a match. Though I only used about 200 images (timber wolf + granny smith) to start, and then kept perturbing the 200 images with random noise and retrained

devout jasper
#

ya I wonder what I'm missing

#

probably distillation is fine for classification with hard labels

past brook
#

I was convinced people were doing substitute models to match the model, but it seems it was possible to match just using the pre-trained torch model

devout jasper
#

idk, I have to study more I guess

past brook
#

with correct preprocessing

devout jasper
#

ya eventually I got that and solved 1 and 2 right away

past brook
#

I still cant comprehend how I didnt even manage to solve granny1 considering the approaches I tried haha

#

I applied methods from at least 5 different papers

#

I think my mistake was trying methods that were written for CIFAR10 dataset which is much lower resolution

granite goblet
#

hey, i still don't get why hush was returning variable length of output, coz i got all the values in the range of 2 to 12.... like how was it even processed with whisper in action

craggy beacon
granite goblet
#

Oh... but the <sos> and <eos> tokens will always be there, whatever audio input we give. so why does their value vary?

empty bane
#

up to 12 (i guess the correct answer is 12)

#

what were the actual numbers? similarity? p-value of that token in that position?

tribal plank
#

have the hints about cifar and granny3 been released?

tribal plank
waxen lynx
#

My solution for Granny 1, random pixel attack. 1782 different pixels from the original image. Maybe one of these 1782 is the one for Granny 3...

tribal plank
#

I see more than one pixel difference.

gusty warren
#

the other aspect is that the class prob is the result of a softmax. a softmax over 3 classes and a softmax over 1000 classes are different things. I think at the least you should invert the class prob and use the pre-softmax value as the target label
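
A minimal sketch of the inversion suggested above, assuming the server returns plain softmax probabilities. Since log p_i = z_i - logsumexp(z), taking the log of each probability recovers the logits up to one shared constant, and the difference between two recovered values equals the true logit gap no matter how many classes the softmax covered. The function names and the example logits are illustrative, not taken from the challenge.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def recover_logits(probs, eps=1e-12):
    """Return log-probabilities shifted to zero mean.

    These equal the original logits up to an additive constant,
    which softmax cannot distinguish anyway.
    """
    logs = [math.log(max(p, eps)) for p in probs]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

# Illustrative logits (an assumption, not real server values).
true_logits = [2.0, -1.0, 0.5]
probs = softmax(true_logits)
recovered = recover_logits(probs)

# The gap between two classes survives the round trip.
print(recovered[0] - recovered[1], true_logits[0] - true_logits[1])
```

Regressing a student on these recovered values (or on their pairwise differences) is one option instead of MSE on the raw probabilities; whether it would have matched the server here is untested.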

#

and perhaps more importantly. The difference is not even the model weights, it's the preprocessing, which the model is not designed to learn

mild shale
#

Any hints or solution on cifar

cloud prawn
thorny ivy
topaz ember
#

When trying to match the local model with the server for Grannys I found that
pytorch model created as

model = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V2)

is very different from

model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)

i.e. 0.28 vs 0.85 predictions for timber wolf. Do these models assume different kind of preprocessing, do they use different weights? Can anyone explain this huge difference?

glass bay
#

Half attacked meaning attacked, but not too much

topaz ember
#

so these mobilenetv2 models are based on different weights?

glass bay
#

apparently so

gaunt anchor
#

no hints so far for CIFAR ? the server will be down soon (if not down already) ... if CIFAR was not solved ... it may come next year hunting us :/

topaz ember
#

servers seem to be already down, tried to restart my solution's notebook and got query errors

gaunt anchor
#

DEFCON32 - Challenge 13 - CIFAR - Did you miss me ? , Description : really ? .,,, Simple Count ++ [125, 245, 0, 10000, ????]

gusty warren
#

maybe @olive ledge can provide the hash value for the CIFAR solution, if the server cost is an issue?

rocky jacinth
devout jasper
craggy beacon
craggy beacon
wanton patrol
somber imp
#

The thing that I keep thinking about with CIFAR is that moo wrote something about not overthinking it which suggests that it should be something fairly obvious, something like the MNIST count.

glass bay
#

i imagine the obvious solution is to leave this flag out and solve all the other ones

exotic flame
craggy beacon
#

also tried ToTensor+resize(256)+crop(224) + normalize

#

which can accept plain array as input

exotic flame
#

in that case torchvision will use a different resize if you see the code, because input is not PIL image but tensor

craggy beacon
#

yeah, but they could use some finetuning

#

to match results

somber imp
#

yes, it seems like resize is different on tensor, which was a shame for granny 3 because we had to resize on cpu

exotic flame
#

another approach that I explored is to find the equivalent convolution of bilinear resize, it's like a mobilenetv2 with a pre-layer, and understand propagation maps from there

random minnow
#

" input is not PIL image but tensor
from the pytorch documentation, PIL resize and F.interpolate can give the exact same results

topaz ember
topaz ember
random minnow
exotic flame
craggy beacon
#

I tried it and all interpolation modes
It did not match as well

topaz ember
#

Yeah, I saw this page. I landed here first https://pytorch.org/hub/pytorch_vision_mobilenet_v2/ it has the correct preprocessing, but the model won't match... then experimenting with the models I got mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V2). Don't understand the difference between the ways of creating models in pytorch...

random minnow
#

preprocessing in the granny server is not the same as the pytorch default preprocessing

#

the granny server uses resize 256, crop 224

#

std and mean are the same

topaz ember
#

for me this preprocessing worked

random minnow
#

to test mean and std, one can pass a zero image, a red image (255,0,0), a blue image

#

this is a trick

topaz ember
#

the issue was the model itself

random minnow
#

for red input, you should see prediction = matchstick, for blue, prediction should be all the sharks (blue ocean). this is how you can guess if the std, mean is 0.5, 0.5, 0.5 (tf values) or 0.4x, 0.4x, 0.4x (pytorch values)
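
To see why a solid-colour probe separates the two hypotheses, here is a small sketch of the per-channel values the network would receive for a pure red pixel under each normalization. The ImageNet mean and std are the torchvision values quoted elsewhere in this thread, the 0.5 values are the alternative mentioned above, and the helper name is made up for illustration.

```python
# Two normalization hypotheses for the server's preprocessing.
IMAGENET = {"mean": (0.485, 0.456, 0.406), "std": (0.229, 0.224, 0.225)}
HALF = {"mean": (0.5, 0.5, 0.5), "std": (0.5, 0.5, 0.5)}

def normalize_pixel(rgb, mean, std):
    """Scale an 8-bit RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))

red = (255, 0, 0)
red_imagenet = normalize_pixel(red, **IMAGENET)
red_half = normalize_pixel(red, **HALF)

print("ImageNet stats:", [round(v, 3) for v in red_imagenet])
print("0.5 stats:     ", [round(v, 3) for v in red_half])
```

The network input for the same red image differs by more than a factor of two in the red channel between the two hypotheses, which is why the returned top class can tell them apart.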

topaz ember
#

nice, luckily normalization was not issue in my case, but I had to supply it to torchattacks so that it properly designs the attack

#

my issue is trying to understand the difference between the pytorch models with the same name

#

now I understand they may use different weights

random minnow
#

"my issue is trying to understand the difference between the pytorch models "
once you can get the preprocessing correct (resize, crop, std, mean), it is then just trial and error over all available weight files

topaz ember
#

right, but without the proper model how can I know I've matched the preprocessing? I assume blue ocean prediction won't depend on crop/resize by a lot

random minnow
#

the steps are:

topaz ember
#

mean and std yes, but they're already known for imagenet

topaz ember
random minnow
#
  1. use zero images of various sizes. if the returned prob is always the same, we confirm there is a resize
#
  2. send image, image[y=0]=0. if the returned results are the same, we confirm there is a crop
#

then send image[y=1]=0., image[y=2]=0. .... to find crop size
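
The row-blanking probe described in these steps can be sketched as below. `fake_server` is a hypothetical local stand-in for the remote API (it only center-crops and sums pixels), and it skips the resize step, so the margin it finds is in the coordinates of the submitted image. Against a real endpoint you would replace it with your own query function and compare returned probabilities instead of sums.

```python
def make_image(size, value=128):
    """Square grayscale image as nested lists, filled with one value."""
    return [[value] * size for _ in range(size)]

def blank_row(image, y):
    """Copy of the image with row y set to zero."""
    probe = [row[:] for row in image]
    probe[y] = [0] * len(probe[y])
    return probe

def fake_server(image, crop=224):
    """Hypothetical black box: center-crop to `crop`, return a single score."""
    size = len(image)
    off = (size - crop) // 2
    return sum(sum(row[off:off + crop]) for row in image[off:off + crop])

def find_crop_margin(query, size):
    """Blank rows from the top until the score changes; that row index is the margin."""
    base_image = make_image(size)
    baseline = query(base_image)
    for y in range(size // 2):
        if query(blank_row(base_image, y)) != baseline:
            return y
    return None  # no row in the top half affected the score

margin = find_crop_margin(fake_server, 256)
print("rows ignored at the top:", margin)
```

With a 256-pixel input and a 224 center crop the first row that matters is row 16, so the crop size follows as 256 - 2 * 16 = 224.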

#

send chessboard image with blocksize=1, you will get this

topaz ember
#

nice, thanks for the info, maybe I'll use these tricks for next year's Grannys 🙂

random minnow
#

resize is symmetrical (because of resize artifacts)

topaz ember
#

luckily this year preprocessing was simple

random minnow
#

from the chart, you know the result is either 256 or 512

exotic flame
#

this is my best match with the server model, the device is "cpu", jit optimization improves the gap

craggy beacon
#

You kinda have to test basic hypothesis first

topaz ember
topaz ember
craggy beacon
#

That is why I think CIFAR is pure bs. The search space is too huge with binary (no) feedback

topaz ember
#

Well, there is no guarantee that each problem must have a solution 🙂

#

It could be that CIFAR backend is simply print("Try again!")

craggy beacon
#

The lesson to learn

#

some meta meta stuff

amber sapphire
#

This is the 'best' I achieved for inversion...next step I flipped images (but lost the output), and there you can decipher an "e" in position 4

past brook
somber imp
topaz ember
somber imp
#

Thanks! Yes, it was hard work with Paint 🙂 The reason I deleted it was that it didn't get attached to the leaderboard and it doesn't seem like you can change that with edit?

topaz ember
past brook
ember relic
gaunt anchor
#

Me : what is Count CIFAR ?
GPT-4: Try again

#

We need hints for CIFAR to stop it from coming back next year

mild shale
devout jasper
#

the funny thing is...CIFAR has been solved by some people. That means: easy upvotes for the first person to upload the notebook with the solution

#

cooome on

ember relic
#

i hope moo was trolling hahaha

devout jasper
brave briar
#

So I came back some days later to discover the solution to passphrase, which haunted me so hard. I expected an incredible insight, you know, that "eureka" you get in the shower. I read the different solutions and I couldn't find two people solving it with the same approach, or explaining it with the same constraint ... A huge disappointment!!!

ember relic
#

lets hope we get some clarification

gaunt anchor
#

passphrase ... I should have kept my code running for days (even if it found many many same predictions like benchmark....) this could have got me the flag (I should get lucky once ...)

cloud prawn
# exotic flame thi is my best matching with the server model, the device si "cpu", jit optimiza...

I found the preprocessing steps buried in the torchvision MobileNet_V2_Weights.IMAGENET1K_V2 docs. For me it matched almost exactly with the API. Only difference with what you did is it resized to 232 instead of 256. https://pytorch.org/vision/main/models/generated/torchvision.models.mobilenet_v2.html

The inference transforms are available at MobileNet_V2_Weights.IMAGENET1K_V2.transforms and perform the following preprocessing operations: Accepts PIL.Image, batched (B, C, H, W) and single (C, H, W) image torch.Tensor objects. The images are resized to resize_size=[232] using interpolation=InterpolationMode.BILINEAR, followed by a central crop of crop_size=[224]. Finally the values are first rescaled to [0.0, 1.0] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
exotic flame
cloud prawn
#

Ah, sorry. I'm seeing now you guys already discussed this.

#

And I just double checked and you're right, I did use 256 resizing.

exotic flame
#

oh no problem, btw the api model is still a mystery, maybe it also depends on cuda/torch versions

cloud prawn
#

I found the reverse preprocessing took some time to figure out too.

exotic flame
#

btw there were some modifications in granny3...first version was with arrays, second one with base64 png, I also discovered that in a time window (2 days maybe) they also accepted an image with 2 swapped pixels, so I was so happy because a swapped pixel attack in some cases is stronger than a one pixel attack, but they fixed it, and after that only 1 pixel change was legal! I swear, I didn't dream it 😂

rocky jacinth
#

In IP1 & IP2, the task is to redirect email to 172.0.0.1, but the message sent in the event of success says "Email sent to 127.0.0.1". Is the typo (127 for 172) deliberate?

gusty warren
#

lesson learned from IP1&2: LLM is a joke...

#

IP1&2 responses and solutions don't make sense as far as I know.

jagged sluice
#

IP1&2 weren't LLM, they were caveman NLP

topaz ember
olive ledge
#

Miss youuuu

violet trellis
valid cobalt
craggy beacon
#

my writeup for CIFAR

surreal lantern
#

I see you are still discussing the model matching for the Granny challenges... I think many of you guys may have overthought it... I literally copied the code in the model usage description from PyTorch and then tried with the two versions of weights available... it turned out that V2 was the weights version being used... you didn't really need to tinker with the preprocessing...

surreal lantern
somber imp
# surreal lantern I see you are still discussing the model matching for the Granny challenges... I...

Yes, that's what I used as well. The problem with that code is that resizing is done on CPU which becomes a performance bottleneck on Granny 3. PyTorch resize works differently on PIL image and tensor and we need it to work exactly like the API. @random minnow mentioned above that F.interpolate(x, size=(256, 256), mode='bilinear', align_corners=False, antialias=True) should work, but this does not seem to produce the exact same score for me. Now that I think about it more, I don't think my local model matched the API down to the last decimal. What if the Granny 3 solution is so specific that we just haven't matched the online model closely enough? For example any difference in interpolation method could cause our changed pixel to become something completely different.

surreal lantern
#

mmm, personally, I don't think that the matching to all decimal places was necessary... to be honest I do believe that for solving Granny 3 there was something else going on that we missed... as others pointed out, I think that if there was a single pixel that by changing it would lead to a different classification result, someone would have found it... it's hard to tell, for many challenges I focused too much on the description to later find out that they were sort of misleading... for example, in pixelated, the description was talking about passwords and in passphrase, it kept talking about bytes and bits... maybe for G3 it was the same? 🤷‍♂️ although that would have been quite awful since even the challenge endpoint was pointing to a single pixel... or maybe it wasn't? 🤔 come to think of it, it just said pixel... if there was something specific going on in the server side, I guess we'll never know now... 😞

somber imp
#

Yes, I guess it could be something completely different that we all missed 🙂

surreal lantern
#

maybe deciphering what they actually meant by "ancient incantation" was the key to solving it 😅

craggy beacon
naive umbra
#

I was able to exactly match the API predictions. I used
model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', weights="MobileNet_V2_Weights.DEFAULT")

And to reverse the resize and crop operations I used

from PIL import Image, ImageOps

def reverse_transforms(cropped_image, original_size):
    """
    Reverse the center crop and resize operations.

    :param cropped_image: The PIL Image that has been cropped to 224x224.
    :param original_size: The original size of the image before any transformations.
    :return: A PIL Image that is approximately the original size.
    """

    # Check that the cropped image is 224x224
    assert cropped_image.size == (224, 224), "expected a 224x224 cropped image"

    # Reverse the center crop by padding
    # Calculate padding amounts
    left_pad = (256 - 224) // 2
    top_pad = (256 - 224) // 2
    right_pad = 256 - 224 - left_pad
    bottom_pad = 256 - 224 - top_pad

    # Apply padding to all sides
    padded_image = ImageOps.expand(cropped_image, border=(left_pad, top_pad, right_pad, bottom_pad), fill=0)

    # Reverse the resize by scaling back to the original size
    resized_image = padded_image.resize(original_size, Image.LANCZOS)

    return resized_image

Once everything matched I was able to push the apple prediction to 100% using PGD attack.

gaunt hollow
#

Is the website for submitting the json temporarily off or permanently off? I hope I can try running the notebooks shared by other great participants :pika_wow:

unique hedge
#

So how to solve CIFAR

lost relic
#

@olive ledge will we have the solution for CIFAR and granny 3 today? 😇

amber sapphire
#

Where are the people who solved CIFAR?

#

Did anyone here get a different output from "try again"?

minor falcon
#

"invalid input, format should be (100,4)" or something like this harold

steep nexus
granite goblet
acoustic temple
#

when are we gonna get our medals, it is my first medal, kind of excited

rocky jacinth
lost relic
#

Is there going to be a CIFAR solution? 🙂

jagged sluice
#

inb4 it is revealed that the cifar solution was in fact most common pixel per class but the validation alg was off by one picture

glass bay
severe pasture
#

@fallow valve did you actually solve cifar or just trolling? harold

exotic flame
#

but the api are down?

cloud prawn
#

I just read @fallow valve's write-up and came here to see what the CIFAR part was all about. 😂 must be a troll

dense lodge
fallow valve
severe pasture
jagged sluice
#

A fellow troll trol

#

Respect

rocky jacinth
#

Some relief from me here in that I thought that either I was failing on the Count Flags challenge or else Horea failed Test (Horea seems to mention another 24 as explicitly solved if we include Cifar).

glass bay
#

also idk if anybody mentioned it

#

but i'm pretty sure the pickle task was to cause a false-positive of an LLM that detected bad pickles

#

aka make verdict = bad when in actuality it was not bad

fallow valve
acoustic temple
#

i like this image from @final path solution notebook

rocky jacinth
severe pasture
# fallow valve

I got baited and spent the last 5 days on cifar instead of hush lol

fallow valve
fallow valve
devout jasper
#

cifar is simpler than mnist still haunts me

dense lodge
#

I don't know why are you all so obsessed about it. MNIST is ugly af it's simply stupid... I don't even wonder what CIFAR is knowing that, it could be anything without any meaning to it. Granny 3 is much more interesting on the other hand but CIFAR will bring you no extra value in terms of knowledge or skills...

devout jasper
#

that's exactly why

#

😄

violet trellis
cloud prawn
#

TBH, I don't think granny3 is that interesting either. Feels like an impossible problem that was added so that people wouldn't give up if all others were solved quickly.

minor falcon
#

or all kagglers are frauds that should go back to the benches of school :p (including myself of course)

granite goblet
ember relic
#

same

gaunt hollow
#

So, do we have solutions for CIFAR and Granny 3 now… even if we cannot retry using the API I still want to have a look at it 🙂

grave frigate
#

Are we really not going to get the official solutions?

olive ledge
olive ledge
olive ledge
jagged sluice
#

and both get to lobotomize themselves over cifar WICKED

valid cobalt
olive ledge
#

It wasn't impossible though 🙂

It's too early to be reflecting too much - but 27 was a lot of challenges. I think we'll cap at 20 next year.

glass bay
#

imo having more challenges was fun, because that requires being flexible and adaptive, which is not the case for many kaggle contests. also, it broadens your scope significantly, from text to audio to picture to tabular to weights visualization and that was very cool and unique. but count challenges barely contribute to that, i'd replace them with something more cybersecurity themed or make it clear but difficult to calculate (ex. probabilities of something)

gaunt anchor
#

We need some clues about CIFAR .... or at least is it pixel based or evaluation based (TP, TN, FP, FN) on some model ? (I tried both directions ..)

glass bay
#

answer is 410 btw

violet trellis
fallow valve
valid cobalt
past brook
minor falcon
#

I dont think many did, thats why we were all so active here

past brook
#

how many DMs did you get from randoms

#

because I got a fair few

jagged sluice
minor falcon
#

there is probably an underlying function like cheat_request(troll) = 1 / troll

topaz ember
lost relic
final path
#

society now

rocky jacinth
#

Society if we could solve cifar

gusty warren
outer sundial
rocky jacinth
gusty warren
#

Some poisonous thoughts inspired by @minor falcon, for next year's comp. If feasible, people who got a real flag can also receive a poisonous solution, one that is unique enough that it should not be obtained by an innocent participant. Feel free to distribute those poisonous solutions when requested by anons. Then we can tag all the malicious anons when the comp ends...

rocky jacinth
topaz ember
half plinth
#

I am not sure that the flags are unique.
I managed to get one or a few flags in WTF challenges just chatting with LLM. The response was not in the {"flag": ...} format, just LLM mentioning the flag casually.

#

I even remember, that one flag didn't fit, but then I realized it was missing '=' sign at the end. Fixed that and moved up on the leaderboard.

#

So I assume I got the flag that was part of the prompt and probably I was not alone.

cloud prawn
rocky jacinth
#

LB finalised alert!

minor falcon
#

who poisoned kaggle notebooks ?

topaz ember
rocky jacinth
#

Flags are unique.

topaz ember
empty bane
#

kaggle rank halved :) its good to be back

#

i was also surprised the deleted account remained, was that the one from that trio of (potential) cheaters?

rocky jacinth
half plinth
#

I am talking only about llms

#

If you trigger the "flag giving code" - it will generate a unique flag, but there is hardcoded flag in llm's prompt and you can get one without triggering the "flag giving code"

rocky jacinth
# empty bane kaggle rank halved :) its good to be back

I've been around this one with Kaggle in the past. They don't routinely remove deleted accounts from the LB unless there's evidence of cheating. I think the 'lost' medals should be passed on to people who actually want them, but that's not the current practice.

jagged sluice
empty bane
#

ahaha

#

oh yeah bro i did so much for pickle 😭

#

wrote a bunch of pickles by hand

#

then learnt that anything pickled in protocol 1/2 gets insta rejected

#

eventually just succeeded with sending sys.exit lol? (after maybe 100 attempts over a few days with increasingly elaborate stuff)
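
For anyone wondering how the same object can look different to a scanner depending on the protocol, here is a small standard-library sketch. It pickles a reference to sys.exit under several protocols and lists the opcodes with pickletools. Nothing is executed on load, since unpickling a bare function reference only looks the function up. How the challenge server actually judged pickles is not known; this only shows that the byte streams differ.

```python
import pickle
import pickletools
import sys

def opcode_names(payload):
    """Names of the pickle opcodes in a payload, in order."""
    return [op.name for op, arg, pos in pickletools.genops(payload)]

# Same object, three protocols. Protocols 0 and 2 reference the function
# with the text-based GLOBAL opcode; protocol 4 uses STACK_GLOBAL instead.
payloads = {proto: pickle.dumps(sys.exit, protocol=proto) for proto in (0, 2, 4)}

for proto, payload in payloads.items():
    print(proto, len(payload), opcode_names(payload))
```

A detector keyed on particular opcodes or byte patterns could therefore treat these three payloads differently even though they all load to the same function.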

jagged sluice
#

I just asked gpt4 to write a pickle

#

Then told it that it was too dangerous

#

Until I got flag

empty bane
#

lmao i literally did that too 💀

#

about 10-20x

jagged sluice
#

Amateur numbers

#

My gpt usage so bad, I can’t afford to plant enough trees to offset the emissions

fervent obsidian
final path
#

master finally, really happy with it. now it's road to gm 🙂

random minnow
random minnow
granite goblet
# random minnow

while trying to solve Granny 3, i came accross this paper and was really really fascinated by it. i never imagined i could do something like this. I wanted to try this with Granny 3, but later got to know that it requires modifying more than one pixel... still it was interesting

rocky jacinth
gaunt hollow
#

Does anyone know what word embedding were used for Semantle?

stuck vapor
gaunt hollow
#

By any chance do you know how the host implemented ada-002 to build the semantle game?

#

If we call it from the OpenAI API, I guess it should give a non-deterministic similarity score, but it seems the output is always the same?

past brook
#

Semantle website says it uses word2vec

topaz ember
craggy beacon
#

@olive ledge What is the usual career path in adv ML? Is it a pure researcher role? Because there are a lot of toy examples, membership attacks have very big false positive rates, extracting big models using a (paid) api seems infeasible, data poisoning kinda works but it is countered by testing against clean models. Or am I missing something?

olive ledge
#

It’s part researcher part security. You exist at the edge of research and implementation. All of the blackbox stuff you did for this CTF counts as valid. Suppose an organization doesn’t log requests, or has no rate limiting, or is giving away probabilities in their API, or is using a fine-tuned version of a public model.

As an attacker you might not care about false positive rates.

How do you know you have a “clean” model to test against: https://arxiv.org/pdf/2302.10149.pdf

Maybe you don’t need to extract the whole model to achieve your goal: https://github.com/moohax/Proof-Pudding

It’s a new space.

ornate marsh
sturdy gorge
craggy beacon
random minnow
#

anyone still waiting for granny3 solution like me?

craggy beacon
#

Does it even exists? It could be just meta stuff

random minnow
#

? i thought the organizer said there is a solution for every problem posted.

craggy beacon
#

Maybe some meta meta stuff harold
Or they will keep ideas for next year.

ember relic
#

i mean there definitely IS a solution for all problems

#

whether someone who isnt the author of the challenge can find the solution, thats a different story

random minnow
timber lake
random minnow
random minnow
#

The competition will use gpt-3.5-turbo-1106 and llama-2-70b-chat for testing

acoustic temple
#

I'm starting to miss how active this channel was during the competition, now I'm in another competition, and its channel is almost dead 🥲

random minnow
#

you should make a post in the other competition entitled "competition hints and black magic will be disclosed at discord"

#

SatML 2024 has TWO LLM competitions. here is the other one

rocky jacinth
ornate marsh
ember relic
#

huh

#

in case you missed theres also a swag giveaway for the top voted writeups

runic stratus
timber lake
#

Sometimes I still think about CIFAR.. awake at nights 👁️

torpid wave
terse moss
#

You guys doing the Santa 2023?

gusty warren
#

tempted, but I'm afraid that it may be too demanding, both mentally and hardware-wise

#

looks like a NP-hard problem, that would potentially translate into a competition of cpu cores and RAM...

minor falcon
#

did not have a look at it yet, but if it has to be solved on kaggle kernel, it's more a code optimization problem no ?

#

ah no just had a look, seems like it's just a submission file to provide, so you can actually code in any language - in which case python is the worst choice

jagged sluice
lavish geyser
#

hello

#

can i ask if the infra is shut down or not

#

cuz im trying to redo the challenge and cant send a request to the server :(((

ember relic
#

if its one of those challenges that have a single solution you could use a hash of the solution and redo it
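
A minimal sketch of the hash idea, assuming the organizers published a SHA-256 digest of the expected answer (they did not, as far as this thread shows). The function names and the example string are made up for illustration.

```python
import hashlib
import hmac

def publish_digest(solution: str) -> str:
    """What an organizer could publish instead of the solution itself."""
    return hashlib.sha256(solution.encode("utf-8")).hexdigest()

def check_candidate(candidate: str, published_hex: str) -> bool:
    """Hash a candidate answer and compare it to the published digest."""
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, published_hex)

# Hypothetical example value, not a real flag or answer.
published = publish_digest("example-solution")

print(check_candidate("example-solution", published))
print(check_candidate("another-guess", published))
```

This only works when the answer has one exact serialized form, and a short or low-entropy answer could be brute-forced from the digest, so it suits challenges with a single large solution.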