Hi guys, I'm trying to improve the robustness of a well-trained model.
Goal
I have a relatively small set of adversarial examples (about 1,000 CIFAR-10 images) and a ResNet50 model with about 96% accuracy on the clean data. The goal is to improve the model's accuracy on these adversarial examples without sacrificing accuracy on the clean dataset.
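Since the goal involves two numbers, it helps to measure both the same way before and after any fine-tuning. Below is a minimal PyTorch sketch of that measurement. `clean_loader` and `adv_loader` are hypothetical names for DataLoaders over the clean CIFAR-10 test set and the ~1,000 adversarial examples; they are assumptions, not part of my actual code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def accuracy(model: nn.Module, loader: DataLoader, device: str = "cpu") -> float:
    """Top-1 accuracy of `model` over every batch in `loader`."""
    model.eval()  # disable dropout and use stored BatchNorm statistics
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total


def report(model: nn.Module, clean_loader: DataLoader, adv_loader: DataLoader,
           device: str = "cpu") -> dict:
    """Both metrics the goal cares about (loader names are hypothetical)."""
    return {
        "clean_acc": accuracy(model, clean_loader, device),
        "adv_acc": accuracy(model, adv_loader, device),
    }
```

Calling `report(model, clean_loader, adv_loader)` before fine-tuning and again after each epoch shows whether adversarial accuracy is rising at the cost of clean accuracy.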
What I did
I found that adversarial fine-tuning could serve this purpose, so I fine-tuned the model on the small adversarial dataset for about 5 epochs with a small learning rate (SGD, learning rate 1e-5, weight decay 2e-5). However, the fine-tuned model only reaches about 25% accuracy on the adversarial dataset, and its accuracy on the clean dataset drops to about 80% (possibly lower; I don't remember exactly).
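For reference, the fine-tuning described above corresponds roughly to the loop below. This is a minimal PyTorch sketch under assumptions: `fine_tune` and `adv_loader` are hypothetical names, the loss is plain cross-entropy, and momentum is left at PyTorch's SGD default of 0 because the setup above does not specify it.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def fine_tune(model: nn.Module, adv_loader: DataLoader, epochs: int = 5,
              lr: float = 1e-5, weight_decay: float = 2e-5,
              device: str = "cpu") -> list:
    """Fine-tune `model` on adversarial examples; return the mean loss per epoch."""
    model.to(device)
    # Momentum is an assumption: torch.optim.SGD defaults to momentum=0.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    history = []
    for _ in range(epochs):
        model.train()  # note: train mode also updates BatchNorm running statistics
        running, n = 0.0, 0
        for images, labels in adv_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running += loss.item() * labels.size(0)
            n += labels.size(0)
        history.append(running / n)
    return history
```

In my setup, `model` would be the pretrained ResNet50 and `adv_loader` would wrap the ~1,000 adversarial CIFAR-10 images; the defaults mirror the hyperparameters listed above.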
Question
So what is the correct way to do adversarial fine-tuning? I suspect I'm missing something in the fine-tuning setup. Should I use a smaller learning rate, or a larger adversarial dataset? Any advice or links to useful resources would be appreciated. Thanks in advance.