RandomForrest ML | Learn AI Together | Page 1

native oxide Dec 12, 2022, 5:42 AM

#

I'm happy to help. What specific questions do you have?

heady hill Dec 12, 2022, 9:05 AM

#

I would start by framing the problem. Do you have current loan data? How familiar are you with measuring performance, assessing a model, and tuning it? Are you creating this model from scratch?

gentle flume Dec 12, 2022, 9:46 AM

#

native oxide I'm happy to help. What specific questions do you have?

Hi, thank you so much. I already did this:

Cleaned the data (from LendingClub)
Removed unnecessary columns (43 of the original 151 columns are left)

After this, I did the following, but I am not sure this is the way to do it.

Divided the data into X and y (X = data minus loan status; y = loan status)
Split X and y with train_test_split
Took the 5 best features using the SelectKBest tool from sklearn
Defined the RandomForest model and fitted it with the selected data
Evaluated with predict_proba(X_test)
Created a ROC curve
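
A minimal sketch of the steps listed above, for readers following along. It is not the poster's actual code: the synthetic data from `make_classification` stands in for the cleaned LendingClub table, and the scoring function `f_classif`, the split size, and the forest settings are assumptions. The one design choice worth noting is that SelectKBest sits inside a pipeline, so the feature selection is fitted on the training split only and the test split stays unseen until evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumption: synthetic stand-in for the cleaned loan data (43 feature columns).
X, y = make_classification(
    n_samples=2000, n_features=43, n_informative=8, random_state=0
)

# Split first, so nothing fitted later can see the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature selection and the forest are fitted together on the training split.
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=5),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

# Probability of the positive class on the held-out test split.
test_scores = model.predict_proba(X_test)[:, 1]

# Points of the ROC curve (one per threshold) and the area under it.
fpr, tpr, thresholds = roc_curve(y_test, test_scores)
test_auc = roc_auc_score(y_test, test_scores)
print(f"test ROC AUC: {test_auc:.3f}")
```

If SelectKBest were instead fitted on the full X before splitting, the selected features would already have seen the test labels, which can make the test ROC curve look better than it should.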

The reason I am not sure this is the right way is that my ROC curve looks too good, I think, which probably means the model is overfitted?

Thank you again!

#

gentle flume Dec 12, 2022, 9:47 AM

#

heady hill I would start framing the problem, do you have current loan data? How familiar a...

I do have that, but I am very new to all of this. I have some sample code from lectures, but other than that, everything is from scratch.

heady hill Dec 12, 2022, 9:49 AM

#

What is the orange line?

gentle flume Dec 12, 2022, 10:04 AM

#

heady hill What is the orange line?

The orange line should be the performance at each threshold.

heady hill Dec 12, 2022, 10:04 AM

#

How come it says feature selection?

gentle flume Dec 12, 2022, 10:19 AM

#

heady hill How come it says feature selection?

Because that is just the name I gave it, since I used the features from SelectKBest 🙂

heady hill Dec 12, 2022, 2:20 PM

#

That’s a good ROC curve.

#

Is this predicting on X train or X test?

native oxide Dec 13, 2022, 5:32 AM

#

gentle flume Hi, thank you so much. I already did this: - Cleaned the data (from LendingClub)...

^ +1. Is this curve the performance on the train dataset or the test dataset?
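
A quick way to answer the train-versus-test question is to compute the ROC AUC on both splits and compare them. This is a sketch under assumptions, not the code from the thread: the data is synthetic (`make_classification` with some label noise via `flip_y`), and the forest settings are arbitrary. A near-perfect score on the training split alongside a clearly lower score on the test split is the usual sign that a curve was drawn from training data, or that the model is overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic, noisy stand-in for the loan data.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Score the same fitted model on the rows it was trained on and on unseen rows.
train_auc = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(f"train ROC AUC: {train_auc:.3f}")
print(f"test  ROC AUC: {test_auc:.3f}")
```

Only the test figure says anything about how the model will behave on new loans; the training figure mostly reflects how well the forest memorised the rows it was fitted on.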
