#RandomForrest ML

12 messages

native oxide

I'm happy to help. What specific questions do you have?

heady hill

I would start by framing the problem. Do you have current loan data? How familiar are you with measuring performance, assessing a model, and tuning it? Are you creating this model from scratch?

gentle flume
(replying to native oxide)

Hi, thank you so much. I have already done the following:

  • Cleaned the data (from LendingClub)
  • Removed unnecessary columns (43 of the original 151 columns are left)

After this, I did the following, but I am not sure this is the right way to do it.

  • Divided the data into X and y (X = all columns except loan status; y = loan status)
  • Split X and y with train_test_split
  • Took the 5 best features using SelectKBest from sklearn
  • Defined the RandomForest model and fitted it on the selected features
  • Evaluated with predict_proba(X_test)
  • Created a ROC curve
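The steps above can be sketched roughly as follows. This is only a sketch under assumptions: the real LendingClub table is not in the thread, so `make_classification` generates a stand-in with 43 feature columns and an imbalanced 0/1 "loan status"; `test_size`, `n_estimators`, `random_state` and the `f_classif` scorer (SelectKBest's default) are assumed choices, not necessarily what was used. One deliberate difference: SelectKBest is placed inside a `Pipeline`, so features are selected using the training split only and the test set stays unseen.

```python
# Sketch of the described workflow on synthetic stand-in data (assumption:
# the real cleaned LendingClub columns are not available here).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Stand-in for X (43 feature columns) and y (loan status encoded as 0/1).
X, y = make_classification(n_samples=2000, n_features=43, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

# Hold out a stratified test set before any feature selection happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# SelectKBest inside the pipeline is fitted on the training data only.
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)

# Probability of the positive class, scored on the held-out test set only.
test_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, test_scores)
test_auc = roc_auc_score(y_test, test_scores)
print(f"Test ROC AUC: {test_auc:.3f}")
```

`fpr` and `tpr` are what you would pass to a plotting call to draw the ROC curve; `model.named_steps["select"].get_support()` shows which 5 columns were kept.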

I am not sure this is the right approach because my ROC curve looks too good, which probably means the model is overfitted?
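A high AUC on held-out data is not by itself a sign of overfitting; overfitting shows up as a gap between training and held-out performance. One hedged way to check whether a near-perfect score is stable is to cross-validate the whole pipeline, so SelectKBest and the forest are refitted inside every fold. The sketch below again assumes synthetic stand-in data and arbitrary parameters. If the real data gives AUCs close to 1.0 in every fold, target leakage is a likelier suspect than overfitting: a column that is only known after the loan outcome (for example payment or recovery amounts such as `recoveries` or `total_rec_prncp`, if those are among the 43 kept columns).

```python
# Cross-validated ROC AUC for the whole select-then-fit pipeline
# (synthetic stand-in data; parameters are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=43, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

# Feature selection and the forest are refitted from scratch in each fold.
pipe = make_pipeline(SelectKBest(k=5),
                     RandomForestClassifier(n_estimators=200, random_state=0))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_aucs = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

print("AUC per fold:", fold_aucs.round(3))
print(f"Mean {fold_aucs.mean():.3f} +/- {fold_aucs.std():.3f}")
```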

Thank you again!

gentle flume
heady hill

What is the orange line?

gentle flume
heady hill

How come it says feature selection?

gentle flume
heady hill

That’s a good ROC curve.


Is this predicting on X_train or X_test?
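The question matters because a random forest with default settings grows its trees until the leaves are pure, so it largely memorises its training rows and a ROC curve built from `predict_proba(X_train)` will look near-perfect almost regardless of the data. A minimal sketch of the comparison, assuming synthetic stand-in data (5 features, some label noise via `flip_y`) in place of the real selected features:

```python
# Compare ROC AUC on the training split vs the held-out split
# (synthetic stand-in data; parameters are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.8, 0.2], flip_y=0.05,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Same model, scored on rows it has seen versus rows it has not.
train_auc = roc_auc_score(y_train, forest.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])
print(f"Train AUC: {train_auc:.3f}  Test AUC: {test_auc:.3f}")
```

Only the test-set curve says anything about how the model will do on new loans; a large train/test gap is the usual symptom of overfitting.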

native oxide