#playground-series-s4e1
I'm also interested in the question in general
I feel like there's an interesting bias-variance question here, but I'm not 100% sure
I usually do some experiments with cross validation, then once I find good parameters and data transformations, I train the model on the whole dataset.
thanks for your insight, that's my routine as well, I'm interested to see if there are other opinions!
I have a question
for the challenge my pipeline is the following:
- I have a file that takes care of feature generation, it generates many features with various methods
- I have a file that trains many models with cross validation and generates oof predictions as well as test predictions
- I have a file that stacks predictions using optuna: it creates 5 folds as in cross validation, finds good weights with optuna using the score on the training part of each fold, and the final cross val score is computed on the validation part of each fold
I had a submission that scored 0.890 on the public leaderboard
then I added some features, trained more models and calculated the cross val score in file number 3. it was significantly higher, but the score on the public leaderboard is significantly lower (0.884)
I do not understand how this is possible. how can I overfit if I always look at scores on validation sets that were not used for training? seems odd to me.
in part 2, I didn't train the models with any parameters that had been fine-tuned by cross validation on the new features
and I only calculate the cross val score once in part 3 to stack predictions, I don't optimise it over multiple repetitions
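A rough sketch of the stacking step described above, in case it helps to compare setups. This is not the poster's actual code: the data is synthetic, the metric is assumed to be ROC AUC, and a plain random search stands in for Optuna so the example needs only the standard library. All function names are hypothetical. Weights are searched on the training part of each fold and scored on the held-out part.

```python
import random

def auc(y_true, scores):
    """ROC AUC by pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def blend(preds, weights):
    """Weighted average of several models' prediction lists."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total
            for i in range(len(preds[0]))]

def search_weights(y, preds, n_trials, rng):
    """Random search over blend weights (stand-in for an Optuna study)."""
    best_w, best_score = None, -1.0
    for _ in range(n_trials):
        w = [rng.random() + 1e-9 for _ in preds]
        score = auc(y, blend(preds, w))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

def nested_blend_cv(y, oof_preds, n_folds=5, n_trials=30, seed=0):
    """Fit weights on the training part of each fold, score on the held-out part."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    fit_scores, val_scores = [], []
    for val in folds:
        held_out = set(val)
        fit = [i for i in idx if i not in held_out]
        w, fit_score = search_weights(
            [y[i] for i in fit],
            [[p[i] for i in fit] for p in oof_preds],
            n_trials, rng)
        val_pred = blend([[p[i] for i in val] for p in oof_preds], w)
        fit_scores.append(fit_score)
        val_scores.append(auc([y[i] for i in val], val_pred))
    return sum(fit_scores) / n_folds, sum(val_scores) / n_folds

# Synthetic out-of-fold predictions from three models of differing quality.
data_rng = random.Random(42)
y = [data_rng.randint(0, 1) for _ in range(300)]
oof = [[yi + data_rng.gauss(0, sd) for yi in y] for sd in (0.8, 1.0, 1.5)]

fit_auc, val_auc = nested_blend_cv(y, oof)
print(f"mean AUC on weight-fitting folds: {fit_auc:.4f}")
print(f"mean AUC on held-out folds:       {val_auc:.4f}")
```

The held-out number is the one to compare against the leaderboard. The score on the weight-fitting folds is the optimisation target, so it tends to be optimistic.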
Yeah, the public leaderboard uses such a small portion of the test results that the scoring may be unstable
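A small simulation of that instability, as a sketch only. The test set is synthetic, the metric is assumed to be ROC AUC, and the 20% / 80% fractions are illustrative placeholders rather than the competition's actual split. It scores one fixed set of predictions on many random subsets and compares how much the score moves around.

```python
import random
import statistics

def auc(y_true, scores):
    """ROC AUC by pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subset_auc_spread(y, scores, fraction, n_draws, rng):
    """Standard deviation of AUC across random subsets of the given size."""
    k = int(len(y) * fraction)
    values = []
    for _ in range(n_draws):
        idx = rng.sample(range(len(y)), k)
        values.append(auc([y[i] for i in idx], [scores[i] for i in idx]))
    return statistics.pstdev(values)

rng = random.Random(0)
n_test = 600  # synthetic "test set" size, chosen only to keep the run fast
y = [rng.randint(0, 1) for _ in range(n_test)]
scores = [yi + rng.gauss(0, 1.0) for yi in y]  # one fixed, imperfect model

spread_small = subset_auc_spread(y, scores, 0.2, 50, rng)  # "public"-sized subset
spread_large = subset_auc_spread(y, scores, 0.8, 50, rng)  # "private"-sized subset
print(f"AUC std on 20% subsets: {spread_small:.4f}")
print(f"AUC std on 80% subsets: {spread_large:.4f}")
```

The same model gets a noticeably wider range of scores on the small subsets, so a difference of a few thousandths between two submissions on a small public split can be noise.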
interesting
so you should only look at the cross val score when making final submissions?
Yes, I do
I also have more or less the same issue. My first model got 0.888 on the test set, but went down to 0.886 on the leaderboard. Then I did some feature extraction (mainly combining geography and gender and binning age, salary, and credit) and model improvements and got 0.890, but 0.885 on the leaderboard. The number of FP/FN in the confusion matrix is the same in all cases. I'm sharing my notebook, which is public and has a lot of detailed explanations. Any feedback is welcome, and I hope it helps others learn and improve.
https://www.kaggle.com/code/bmart80/bank-churn-dataset-votingclassifier
If you like it, please upvote and feel free to connect with me
https://www.linkedin.com/in/benitomzh/
Final score after the end, on the entire test set, will likely be closer to CV. Good luck
This happened on another competition that recently closed - Predicting Writing Quality. There was a HUGE change from the public to private leaderboard and the people who trusted their CV scores ended up on top in the end. I'm not sure if this is always the case.
Yeah, that’s true. I was shocked to see that one competitor jumped from 865th place on the public leaderboard to 2nd on the private one.
that makes it fun
https://www.kaggle.com/code/samvelkoch/s4e1-ml-public-scores-ensembles-mlwave // how bad an idea is it to build an auto-encoder on top of the top 50 public submission scores?
was total fun, thanks for hosting, looking forward to the next one 🙂
Hello, hope my notebook will be helpful for someone! https://www.kaggle.com/code/kapturovalexander/kapturov-s-solution-of-ps-s4e1