I am working on a binary classification problem in which the goal is to predict a single target variable. The dataset is extremely imbalanced. So far, I have tried:
- Combined several closely similar columns
- Used the variance inflation factor (VIF) to remove features with a high degree of multicollinearity
- Used `SelectKBest` to select the 10 best features
- Heavily downsampled the majority class (0s) on the selected features to make the target classes more balanced
- Created association rules with mlxtend, then kept the features that most often appeared as antecedents in rules with my target variable as the consequent
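The VIF filtering and downsampling steps above can be sketched roughly as follows. This is a minimal NumPy-only sketch under stated assumptions, not the code used in the post: the helper names `vif`, `drop_high_vif`, and `downsample_majority` are hypothetical, the VIF cutoff of 10 is a common rule of thumb rather than a value from the post, the data is synthetic, and all features are assumed numeric. VIF uses the standard definition `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing column `j` on the other columns plus an intercept.

```python
import numpy as np

# Hypothetical helpers (names are not from the post), sketched with NumPy only.


def vif(X):
    """Variance inflation factor for each column of a numeric 2-D array X.

    VIF_j = 1 / (1 - R_j^2) = SS_tot / SS_res, where R_j^2 comes from an
    ordinary least-squares regression of column j on all other columns
    plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept column plus every feature except j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # Constant or perfectly collinear columns get an infinite VIF.
        if ss_tot == 0 or ss_res <= 1e-12 * ss_tot:
            out[j] = np.inf
        else:
            out[j] = ss_tot / ss_res
    return out


def drop_high_vif(X, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until every remaining
    VIF is <= threshold; return the indices of the kept columns."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        keep.pop(worst)  # position in `keep`, i.e. original column keep[worst]
    return keep


def downsample_majority(X, y, ratio=1.0, seed=0):
    """Keep every class-1 row and a random ratio * n_minority subset of class-0 rows."""
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    n_keep = min(len(idx0), int(ratio * len(idx1)))
    kept0 = rng.choice(idx0, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([kept0, idx1]))
    return X[idx], y[idx]


# Synthetic toy data, for illustration only: column 2 is almost a + b.
rng = np.random.default_rng(42)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(scale=0.01, size=n)
X = np.column_stack([a, b, c])
y = (rng.random(n) < 0.02).astype(int)  # roughly 2% positives

keep = drop_high_vif(X, threshold=10.0)
X_sel = X[:, keep]
X_bal, y_bal = downsample_majority(X_sel, y, ratio=1.0)
print(keep, np.bincount(y), np.bincount(y_bal))
```

The `SelectKBest` step would normally sit between the two calls, for example `SelectKBest(f_classif, k=10).fit_transform(X_sel, y)` from `sklearn.feature_selection`; it is left out here to keep the sketch dependency-free. `downsample_majority` is assumed to be applied to the training split only, so that the evaluation data keeps its original class ratio.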
For each of these, I trained an XGBoost classifier and a decision tree classifier. The F1-score for class 0, which is by far the majority class, is in the 0.8s. For the minority class, however, the F1-score has never gone above 0.05.
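To make the per-class numbers concrete: F1 is computed separately for each class by treating that class as the positive label, so `F1 = 2*TP / (2*TP + FP + FN)` can be high for class 0 and near zero for class 1 from the same set of predictions. The pure-Python sketch below is illustrative only: `per_class_f1` is a hypothetical helper, and the confusion counts are made up, not taken from the post. scikit-learn's `f1_score(y_true, y_pred, average=None)` returns the same per-class values.

```python
def per_class_f1(y_true, y_pred, labels=(0, 1)):
    """Return {label: F1}, treating each label in turn as the positive class.

    F1 = 2 * precision * recall / (precision + recall)
       = 2 * TP / (2 * TP + FP + FN)
    """
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical counts: 980 negatives and 20 positives.
# 300 negatives are wrongly flagged as 1; 10 of the 20 positives are caught.
y_true = [0] * 980 + [1] * 20
y_pred = [1] * 300 + [0] * 680 + [1] * 10 + [0] * 10

scores = per_class_f1(y_true, y_pred)
print({k: round(v, 3) for k, v in scores.items()})  # {0: 0.814, 1: 0.061}
```

With these hypothetical counts, 300 false positives barely dent the class-0 score, because class 0 still has 680 true positives, but they swamp the 10 true positives of class 1.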
Does anyone have advice on how to raise the minority-class F1-score?