#Classification for Binary Dataset #classification


bitter beacon

I am working on a binary classification problem with a single target variable, and the dataset is extremely imbalanced. So far, I have tried:

  1. Combining several closely related columns
  2. Using the VIF metric to remove features that show a high degree of multicollinearity
  3. Using SelectKBest to find the 10 best features
  4. Heavily downsampling the majority class (0s) so that the target variable is more balanced
  5. Creating association rules with mlxtend, then keeping the features that most often had my target variable as the consequent
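
For reference, steps 3 and 4 might look roughly like the sketch below. It is not the exact pipeline from this post: the data is a synthetic stand-in from `make_classification`, `downsample_majority` is a hypothetical helper, `f_classif` is assumed as the SelectKBest score function, and the downsampling is assumed to be applied to the training split only so that the test set keeps the original class ratio.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): roughly 5% positives.
X, y = make_classification(n_samples=4000, n_features=30, n_informative=8,
                           weights=[0.95, 0.05], random_state=0)

# Hold out a test set that keeps the original, imbalanced class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def downsample_majority(X, y, ratio=1.0, seed=0):
    """Hypothetical helper: keep every minority row (label 1) and a random
    sample of majority rows (label 0), `ratio` majority rows per minority row."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    n_keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    kept = rng.choice(majority_idx, size=n_keep, replace=False)
    idx = np.concatenate([minority_idx, kept])
    rng.shuffle(idx)
    return X[idx], y[idx]


# Balance the training split only; the test split is left untouched.
X_bal, y_bal = downsample_majority(X_train, y_train)

# Pick the 10 highest-scoring features using the balanced training data.
selector = SelectKBest(score_func=f_classif, k=10).fit(X_bal, y_bal)

clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(selector.transform(X_bal), y_bal)

# Score the minority class (label 1) on the imbalanced test set.
y_pred = clf.predict(selector.transform(X_test))
minority_f1 = f1_score(y_test, y_pred, pos_label=1)
print(f"minority-class F1 on the untouched test set: {minority_f1:.3f}")
```

Fitting the selector and the classifier on the balanced training rows while scoring on the original distribution keeps the reported F1 comparable to what the model would see in practice.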

For each of these, I ran an XGBoost classifier and a decision tree classifier. However, the F1-score for class 0, which is by far the majority class, is in the 0.8s, while the F1-score for the minority class has not gotten above 0.05.
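
To make the per-class numbers concrete, here is a toy illustration (made-up labels, not results from this dataset) of how the two F1 scores are computed separately with scikit-learn. The "model" here is assumed to predict the majority class for every row.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels (assumption): 95 negatives and 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate "model" that always predicts the majority class.
y_pred = np.zeros(100, dtype=int)

# average=None returns one F1 score per class, ordered by label (0, then 1).
# zero_division=0 defines the score as 0 when a class is never predicted.
per_class = f1_score(y_true, y_pred, average=None, zero_division=0)
print(f"F1 for class 0: {per_class[0]:.3f}")
print(f"F1 for class 1: {per_class[1]:.3f}")
```

On data this skewed, the class 0 score stays high even though no positive is ever found, so the class 1 score is the one worth tracking.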

Does anyone have any advice on how to get this number higher?

void trench

It seems like you’ve already implemented various techniques to address the imbalanced dataset. To further improve performance, you might consider:

  1. Experiment with different algorithms: Try other classifiers such as Random Forest, Support Vector Machines, or a different gradient boosting implementation (XGBoost is already one) to see if they perform better on your dataset.
  2. Tune hyperparameters: Optimize the hyperparameters of your chosen algorithm(s) with grid search or random search to find the best combination for your dataset.
  3. Cross-validation: Make sure you're using cross-validation so the performance estimate is robust and overfitting is easier to spot.
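
The three suggestions can be combined in one step. The following is a minimal sketch rather than a tuned setup: the data is synthetic, the parameter grid is an arbitrary small example, and the scorer is assumed to be the F1 of the minority class (label 1), since that is the number the question is about.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the real data (assumption): roughly 10% positives.
X, y = make_classification(n_samples=1500, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)

# Score each candidate by the F1 of the minority class, not overall accuracy.
minority_f1 = make_scorer(f1_score, pos_label=1, zero_division=0)

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Small illustrative grid; real searches usually cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=param_grid, scoring=minority_f1, cv=cv)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"cross-validated minority-class F1: {search.best_score_:.3f}")
```

Swapping `RandomForestClassifier` for another estimator only requires changing the estimator and the keys in `param_grid`.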