#What techniques should I follow to get good results from RandomForestClassifier probabilities?

boreal niche Feb 18, 2024, 11:37 AM

Hello, I'm wondering in which cases I need to use the following in my script: MinMaxScaler, CalibratedClassifierCV and GridSearchCV (without slowing down the script's load time too much).

My first guess is that we don't need MinMaxScaler; we just need to run GridSearchCV once for each dataset and use CalibratedClassifierCV to check which option gives the lowest log loss (uncalibrated, sigmoid or isotonic).

Also, if you have any more suggestions for improving my predict_proba output, I'd be really thankful.
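
For reference, the comparison described above (uncalibrated vs. sigmoid vs. isotonic, judged by log loss) could be sketched roughly as follows. This is only an illustration under assumptions: the data is synthetic (`make_classification`), and the `make_forest` helper, the split sizes and the forest settings are placeholders, not anything from the original script.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); replace with your own features and labels.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_forest():
    # Hypothetical helper: a fresh, identically configured forest per candidate.
    return RandomForestClassifier(n_estimators=100, random_state=0)


candidates = {
    "uncalibrated": make_forest(),
    # CalibratedClassifierCV fits the forest on cross-validation folds of the
    # training data and learns the calibration map on the held-out parts.
    "sigmoid": CalibratedClassifierCV(make_forest(), method="sigmoid", cv=5),
    "isotonic": CalibratedClassifierCV(make_forest(), method="isotonic", cv=5),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Lower log loss means better probability estimates on the held-out test set.
    scores[name] = log_loss(y_test, model.predict_proba(X_test))

best = min(scores, key=scores.get)
print(scores, "-> lowest log loss:", best)
```

Which variant wins depends on the dataset; isotonic calibration generally needs more data than sigmoid to avoid overfitting, so it is worth checking on a split that was not used for fitting or calibration.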

neon osprey Feb 18, 2024, 10:50 PM

These are all for different purposes... although I'm pretty sure GridSearch in general is an abandoned approach

Whether to scale depends on your data and model (tree-based models like random forests are largely insensitive to feature scaling). If you do scale, you should also check out different scalers.
Not sure what you mean by GridSearch for each dataset, but I guess you mean tuning hyperparameters on every fold of a K-fold split. That's fine.
I'd suggest looking at other hyperparameter tuning methods. GridSearch is a brute-force method that can miss a lot of good parameter values (think of the points between the grid nodes).
Calibration should be applied after you've trained your model, just to calibrate the predicted probability distribution.
It matters less for test-set results and more for actual real-world results. Don't get too hung up on it.
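
As one possible alternative to an exhaustive grid, scikit-learn's `RandomizedSearchCV` samples parameter values from distributions, so it can land on values between grid points while keeping the number of fits fixed. The sketch below is only an illustration: the synthetic data, the parameter ranges and `n_iter=10` are assumptions, not recommendations from the thread.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption); replace with your own dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Distributions to sample from, instead of a fixed grid of values.
# randint(low, high) draws integers from low (inclusive) to high (exclusive).
param_distributions = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(2, 12),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,               # number of sampled configurations; caps the runtime
    scoring="neg_log_loss",  # optimize probability quality rather than accuracy
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_log_loss = -search.best_score_  # the scorer is negated, so flip the sign
print(best_params, best_log_loss)
```

Scoring with `neg_log_loss` ties the search directly to the quality of `predict_proba`, and `n_iter` gives a simple knob for trading search time against coverage. The tuned forest can then be calibrated afterwards if that lowers log loss on held-out data.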