#sklearn: accuracy vs balanced accuracy


flat oar

Are there any reasons to ever use accuracy over balanced accuracy for assessing a model, considering that balanced accuracy is equal to accuracy for balanced datasets (if I understood the sklearn documentation correctly)?

mystic thunder
# flat oar Are there any reasons to ever use *accuracy* over *balanced accuracy* for assess...

There are a few things to consider. First, we never evaluate a model on a single metric. We always look at it from several angles, depending on the problem statement. With imbalanced classes, we'd look more closely at recall and precision, since they are class-specific and let us decide on the trade-offs.

Second, when dealing with imbalanced datasets, we will likely treat them to some extent so that the classes end up relatively balanced. You probably won't get exactly the same number of records for each class, so the discrepancy between balanced accuracy and accuracy becomes less intuitive.

Third, evaluation is not just for ourselves but also for communication. The relationship between accuracy, recall, and precision is well established and easy to communicate, even among data folks. Balanced accuracy is less so, because it is an average (of per-class recall), and it does not necessarily get more interpretable as you add more classes. So at most, I'd report it as an additional evaluation metric, not as a replacement.
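To make the difference concrete, here is a minimal sketch in plain Python. The toy labels and the helper names `accuracy` and `balanced_accuracy` are made up for illustration. The second helper computes the unweighted mean of per-class recall, which is the quantity sklearn's `balanced_accuracy_score` reports by default.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / support[c] for c in support) / len(support)


# Imbalanced toy data (hypothetical): 90 negatives, 10 positives,
# and a "model" that always predicts the majority class.
y_true_imb = [0] * 90 + [1] * 10
y_pred_imb = [0] * 100

# Balanced toy data (hypothetical): 50 per class, 40 and 45 of them correct.
y_true_bal = [0] * 50 + [1] * 50
y_pred_bal = [0] * 40 + [1] * 10 + [1] * 45 + [0] * 5

acc_imb = accuracy(y_true_imb, y_pred_imb)
bal_imb = balanced_accuracy(y_true_imb, y_pred_imb)
acc_bal = accuracy(y_true_bal, y_pred_bal)
bal_bal = balanced_accuracy(y_true_bal, y_pred_bal)

print(f"imbalanced: accuracy={acc_imb:.2f} balanced={bal_imb:.2f}")  # 0.90 vs 0.50
print(f"balanced:   accuracy={acc_bal:.2f} balanced={bal_bal:.2f}")  # 0.85 vs 0.85
```

On the balanced data the two numbers agree, as the question assumed. On the imbalanced data they diverge sharply, and only the per-class recalls (1.0 and 0.0 here) show which class is being missed.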

flat oar
mystic thunder
# flat oar Ok, I see. But when tuning hyperparameters using a grid search, we need to choos...

Not necessarily. You have to think about the pros and cons. I don’t remember much about the sklearn implementation, but it should be flexible enough. By choosing balanced accuracy, you’re only measuring how many of the true cases are recovered in each class (per-class recall, averaged over the classes). But there is often a cost associated with false positives and false negatives that you’d need to account for. F1 is used more often for the generic case; see the discussion here: https://datascience.stackexchange.com/questions/73974/balanced-accuracy-vs-f1-score
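For reference, sklearn's `GridSearchCV` takes a `scoring` argument, so the metric used for tuning is a choice rather than a fixed default. The sketch below is only an illustration: the synthetic dataset, the logistic regression model, and the `C` grid are assumptions, not anything from this thread. It tracks several metrics at once and uses `refit` to pick which one selects the final model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, roughly 90/10 imbalanced binary problem (illustrative only).
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    # Several metrics are recorded for every candidate; the dict keys are our own labels.
    scoring={"bal_acc": "balanced_accuracy", "f1": "f1", "acc": "accuracy"},
    # Only the metric named in `refit` decides best_params_ and the refitted model.
    refit="f1",
    cv=5,
)
search.fit(X, y)

print("best params by F1:", search.best_params_)
for name in ("bal_acc", "f1", "acc"):
    print(name, search.cv_results_[f"mean_test_{name}"])
```

Swapping `refit="f1"` for `refit="bal_acc"` changes the selection criterion without rerunning a different search, which makes it easy to see whether the choice of metric actually changes the selected hyperparameters.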

Then again, depending on the problem, you may care specifically about precision, or only about optimizing recall at a precision of 96% or higher. Deciding which one to use is a mental exercise you have to work through for each problem.
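A rough sketch of the "recall at precision 96%+" idea, using sklearn's `precision_recall_curve` to scan decision thresholds. The dataset, the model, and the variable names are assumptions for illustration. Only the 0.96 target comes from the message above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

target_precision = 0.96  # the precision floor mentioned above

# Synthetic imbalanced data and a simple model (illustrative only).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

precision, recall, thresholds = precision_recall_curve(y_test, scores)

# precision and recall have one more entry than thresholds: the final point
# (precision=1, recall=0) has no threshold attached, so it is dropped here.
ok = precision[:-1] >= target_precision
if ok.any():
    # Among thresholds that meet the precision floor, take the highest recall.
    best = int(np.argmax(np.where(ok, recall[:-1], -1.0)))
    print(
        f"threshold={thresholds[best]:.3f} "
        f"precision={precision[best]:.3f} recall={recall[best]:.3f}"
    )
else:
    print("no threshold reaches the precision target on this data")
```

In practice the threshold would be chosen on a validation split rather than the test split, so that the reported recall is not optimistically biased.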