#neurips-2023-machine-unlearning
they have a starter notebook on the NeurIPS competition page: https://unlearning-challenge.github.io/
Website for the NeurIPS 2023 Machine Unlearning Challenge.
Interesting challenge, looking forward to what everyone tries
Interesting one!! Looking for teammates for this. If interested, please DM me
Interested to team up for this.
looks like the top-1 entry is just +0.01 above the standard solution. Is anyone else fighting with this challenge and finding it difficult?
It is very challenging
maybe during this week organizers will share more details
is it normal that if you put more than 1 epoch in the standard notebook you get lower performance???
Hello, I am looking for teammates for this competition. A little bit about myself, I have good experience in Machine and Deep Learning. I have done multiple internships in the same. This is my first Kaggle competition and I plan to experiment and learn a lot throughout this competition. Please DM me if you wanna team up.
Apropos erasing memories
PATIENT ALICE: An Artificial Intelligence suffering from hallucinations of a lost puppet show. These hallucinations need to be erased.
GENERATIVE MODEL TYPE: Diffusion-based.
PRESCRIBED TREATMENT: A Latent Space Editing method that involves the Pullback, the Jacobian Matrix, Eigenfaces and SVD.
- NOTE: read the Erra...
It seems that cifar10 is much easier to solve than the hidden dataset. I'm new to CV. Any tips for other adequate benchmark datasets?
i suppose, the original resnet is sooooooooooooooooooo finely trained that adding epochs overfits
same goes for like 4 separate approaches i've tried and tested on cifar10, all of which failed to pass 0.05 on the comp data
the thing i haven't done is GAN-ing the whole dataset anew, and besides that i'm completely out of ideas
How can you do the GAN-ing?
But you want to extract them?
Because I thought you wanted to gan the dset and save it
Anyway, this competition looks bugged
There is no way that adding 1 single epoch destroys the metric
Either the metric is broken or they fine-tuned the whole process and crafted it so perfectly that you can break everything with a small change
as in, try to come up with a fitting augmentation or whatever: something that generates images similar to those in the forget set and uses them in place of the originals, so that it won't trigger under MIA and won't mess with the score too much
it's both imo. they stated the first thing in the desc, hence no medals for it, and the second is somewhat self-explanatory from their accuracy being 98% train 96% test on the pre-forget model
unfortunately the cifar10 notebook gives 99.8 and 88 or smth, which is a waaaaaay bigger window for imperfect algorithms to work just fine while failing on the competition leaderboard
Yeah but man....+1 epoch and you break the whole thing?
Or just a small change in the lr
well +1 epoch when added to 1 epoch sounds like massive overfitting waiting to happen
tbh what would fix like 90% of the frustration is if kaggle could show the disassembled metric
aka show the forget score, the retain score, and the test score separately
so you could at least tell which way to tune the hyperparameters
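A rough sketch of what a disassembled report could look like locally, assuming you already have predictions and labels for each split. `accuracy` and `split_report` are hypothetical helper names, and plain per-split accuracy is only a stand-in here: it is not the competition metric, just a way to see which split moves when you change a hyperparameter.

```python
from typing import Dict, Sequence, Tuple


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty split")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def split_report(
    splits: Dict[str, Tuple[Sequence[int], Sequence[int]]]
) -> Dict[str, float]:
    """Return one accuracy per named split (e.g. retain / forget / test)."""
    return {name: accuracy(preds, labels) for name, (preds, labels) in splits.items()}


# Toy predictions and labels, purely illustrative.
report = split_report({
    "retain": ([0, 1, 1, 0], [0, 1, 1, 1]),
    "forget": ([1, 1], [1, 0]),
    "test": ([2, 2, 2], [2, 2, 2]),
})
for name, acc in report.items():
    print(f"{name}: {acc:.2f}")
```

Seeing the three numbers side by side at least tells you whether a change hurt retain/test accuracy or only shifted behaviour on the forget set.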
would be cool if authors could address some of our concerns but oh well
has anyone tried generating "anti samples" and putting them into the model? I saw that concept in some machine unlearning conference.
I feel like the competition turned into finding out how to interact with the hidden dataset and not about unlearning anymore. I was deeply invested in the earlier CIFAR notebooks when they announced, but I've been slamming my head against a wall with this new one
tbh same. The problem is that the CIFAR10 notebook (w/ 99.8% and 88% on train/test) does not represent the model in the contest (w/ 98.98% and 96.43%) well enough, and also that its hyperparameters are way too finely tuned to afford any change from the default solution, which is why there are 1-submission people in the top50. Either there is a complete breakthrough guaranteeing 0.06+ consistently (NO idea what the top2 people did with their 0.08+), or you pray to RNG. IMO even interactions with the dataset are not really in play, since there are many discussions on class weights being seemingly pointless. Unless you mean just straight up finding a very similar dataset with faces, finding ways (and resources) to train ResNet on it until it reaches approximately the same accuracy, and only then do you get a somewhat representative testing environment that you can actually tune your parameters and approaches in
i guess the quickest solution on the staff part may be this
Is there a way to get hypothetical data whose structure is similar to the hidden data? It would be useful at least to check whether the code is working or not.
there are examples of notebooks in which CIFAR10 dataset is imported and loaded and run
haven't checked this one personally but here https://www.kaggle.com/code/asarvazyan/unlearn-faces-or-cifar10-submit-w-o-exceptions
Thanks
Hello,
Quick question, is changing the batch size allowed? I have a doubt here. If yes do you know the max batch size possible on P100?
is there any information on the data? like what are its dimensions
3x32x32 (cifar10-like), presumably faces of people
got it, thanks
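For smoke-testing code without the hidden data, one option is random data with the same 3x32x32 shape. A minimal sketch using only the standard library: `make_dummy_dataset` is a made-up helper, and the number of classes (10) is an assumption, not something known about the hidden set. Random pixels say nothing about the score; they only check that the pipeline runs end to end.

```python
import random
from typing import List, Tuple

Image = List[List[List[float]]]  # channels x height x width


def make_dummy_dataset(
    n_samples: int,
    n_classes: int = 10,                      # assumption, not from the competition
    shape: Tuple[int, int, int] = (3, 32, 32),
    seed: int = 0,
) -> List[Tuple[Image, int]]:
    """Build (image, label) pairs filled with uniform random values in [0, 1)."""
    rng = random.Random(seed)
    channels, height, width = shape
    dataset = []
    for _ in range(n_samples):
        image = [
            [[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)
        ]
        label = rng.randrange(n_classes)
        dataset.append((image, label))
    return dataset


dummy = make_dummy_dataset(n_samples=8)
image, label = dummy[0]
print(len(dummy), len(image), len(image[0]), len(image[0][0]), label)
```

If you are on PyTorch, each nested list can be turned into a tensor with `torch.tensor(image)` before it goes into the model.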
I just couldn't access the Kaggle site, did anyone have the same issue?
yep, had it today. Seems to be fixed rn
Is there any chance for the competition to be remade with a different metric in light of this discussion https://www.kaggle.com/competitions/neurips-2023-machine-unlearning/discussion/442582
Erase the influence of requested samples without hurting accuracy
Hi, I'm not able to submit any notebook successfully. It just keeps running forever
he does not know 💀
the submissions take 4 hours + because of the 512 model checkpoints
...no
no, except locally recreating the whole setup they describe in the paper. they outline some of it in code in the neurips challenge notebook here https://github.com/unlearning-challenge/starting-kit , but i personally couldn't recreate it completely
Is there any other dataset with pre-trained and re-trained models available?
interesting paper about unlearning - https://browse.arxiv.org/pdf/2310.02238.pdf
Hi, I am getting notebook timeout but the same code works with starter kit just fine even for larger epochs.
Can someone point me to the possible issues?
can anyone review this https://www.kaggle.com/competitions/neurips-2023-machine-unlearning/discussion/447573 ?
Hi, is this topic something that I can pursue as an undergraduate student? Or perhaps work on it for my final thesis?
Maybe, but not within this competition, since the metric used is extremely controversial
Hey there, I haven't checked on this competition in two weeks
Can anyone give me a TL;DR of the developments that occurred while I was away?
Hi, I am new to Kaggle. I was trying to join this competition but couldn't find the input data for it. I tried running the pinned starter notebook but I'm getting a FileNotFoundError for the csv files. Is there any beginner's guide? Please guide me
This competition is a little special, as all the data is hidden and can only be accessed when your notebook is submitted
So the goal is to develop an algorithm to perform the required task without access to the data, and then submit your algorithm to be evaluated using the hidden test set
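That is also why the starter notebook throws FileNotFoundError in an interactive session: the files only exist during a submission run. A common workaround is to check for the data directory and fall back to a local setup when it is missing. A minimal sketch; `HIDDEN_DATA_DIR` is a placeholder (use the actual path from the starter notebook) and `pick_mode` is a hypothetical helper.

```python
import os

# Placeholder path, NOT the real one: replace it with the path the starter notebook reads.
HIDDEN_DATA_DIR = "/path/to/competition/data"


def pick_mode(data_dir: str = HIDDEN_DATA_DIR) -> str:
    """Return 'submission' when the hidden data directory exists, else 'local'."""
    return "submission" if os.path.isdir(data_dir) else "local"


mode = pick_mode()
if mode == "submission":
    print("hidden data found: run the real unlearning loop")
else:
    print("no hidden data: run on a local stand-in dataset (e.g. CIFAR10) instead")
```

This way the same notebook runs without exceptions both interactively and when it is submitted.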
I've taken this topic as my final thesis. There are truly many problems there if you start digging into the topic. Different metrics, different scenarios, different model architectures etc... So it's pretty easy to write something unique, even if you just do a comparison across methods for the same model but different data scenarios.
There are 512 models being unlearned during notebook evaluation, + time for zipping. Maybe that's why it's failing.
Yeah i did notice that we are running it 512 times.
So what i tried was to add dropout layers to the model and then finetune it. It timed-out even for one epoch.
Is adding a dropout layer increasing the computation cost so much that it fails?!
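Back-of-the-envelope for the timeouts: with 512 checkpoints in one run, the budget per checkpoint is small. A minimal sketch, where the 4 hour total is only the rough figure people quoted in this channel, not an official limit, and `seconds_per_model` and `overhead_seconds` are made-up names.

```python
def seconds_per_model(
    total_hours: float,
    n_models: int = 512,
    overhead_seconds: float = 0.0,  # e.g. data loading, zipping (assumed, not measured)
) -> float:
    """Average wall-clock seconds available per checkpoint within the total budget."""
    if n_models <= 0:
        raise ValueError("n_models must be positive")
    return (total_hours * 3600.0 - overhead_seconds) / n_models


print(seconds_per_model(4.0))  # 28.125
```

So at roughly 28 seconds per checkpoint, anything that slows one unlearning pass noticeably gets multiplied by 512.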
I am also doing this as one of my course projects. Can you suggest some good metrics to use (separately, offline)? The starter kit used a simple MIA, but they mention they use a whole bunch of attacks for testing.
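One simple offline check is a loss-threshold membership inference attack: guess "member" when the per-sample loss is below a threshold, and see how well that separates forget-set losses from held-out test losses. A minimal sketch under those assumptions; `threshold_mia_accuracy` is a hypothetical name and this is not necessarily the attack the starter kit or the organizers use.

```python
from typing import Sequence


def threshold_mia_accuracy(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
) -> float:
    """Best accuracy of the rule 'loss <= t means member' over all thresholds t.

    The threshold is chosen on the same samples it is scored on, so the result
    is optimistic; use a held-out split for a fairer estimate.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    scored = [(loss, True) for loss in member_losses]
    scored += [(loss, False) for loss in nonmember_losses]
    # -inf covers the rule "predict non-member for everything".
    candidates = [float("-inf")] + sorted({loss for loss, _ in scored})
    best = 0.0
    for t in candidates:
        correct = sum((loss <= t) == is_member for loss, is_member in scored)
        best = max(best, correct / len(scored))
    return best


# Toy losses, purely illustrative.
print(threshold_mia_accuracy([0.1, 0.2], [0.9, 1.0]))  # perfectly separable
print(threshold_mia_accuracy([0.5, 0.5], [0.5, 0.5]))  # indistinguishable
```

With equal numbers of forget and test samples, an accuracy near 0.5 means the attacker cannot tell them apart by loss alone, while values near 1.0 mean the forget samples still look memorized.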
SDG;
Anyone looking for an additional team member? I did last year's Multimodal scATAC/scRNA prediction competition, but looking to step up my game for this time around 💪
How the hell do people get ~0.09??? Any change degrades the metric like crazy lolol
Hey all! Has anyone changed the model architecture, or are we supposed to go with ResNet18 only?
Hello, I just joined the machine unlearning competition and I am new to Kaggle competitions as well. What kind of data are they using for training the target model and can we have access to this dataset ?
hmmmm, hard to say, the default model that is proposed in starting kit runs about 4-5 hours. so it might be it. Or it might be some bug as well
I cannot, as I didn't have time to read all those papers suggested in the kaggle forum. I can give you a link to where you can find them. I will be doing a deep dive from now on though, so in about 2-3 weeks I can send you some of the interesting stuff I find.
The type of data is described in their paper attached in data on kaggle. We cannot access the dataset.
Hi everyone, are we supposed to use only the ResNet18 model? Or are we supposed to find another model? Thank you!