#playground-series-s5e3
Me too!
Me too
Me too
Hello everyone, can anyone please help?
How can I handle this imbalanced data?
You can address imbalanced data using techniques like data augmentation, resampling, or applying class weights during model training.
I don't think oversampling is necessary in this case. You can also use StratifiedKFold, which keeps the target's class distribution in each fold the same as in the full dataset.
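Here's a rough sketch of the class-weight plus StratifiedKFold idea, in case it helps. The data is synthetic (an arbitrary 25/75 class split from `make_classification`, not the competition set), and `LogisticRegression` is only a stand-in model, so treat it as an illustration rather than a recipe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced stand-in data (assumption: NOT the competition data).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.25, 0.75], random_state=0
)

# StratifiedKFold keeps the class ratio (almost) identical in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_aucs = []
fold_pos_rates = []
for train_idx, valid_idx in skf.split(X, y):
    # class_weight="balanced" reweights classes inversely to their frequency.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[valid_idx])[:, 1]
    fold_aucs.append(roc_auc_score(y[valid_idx], proba))
    fold_pos_rates.append(y[valid_idx].mean())

print("positive rate per fold:", np.round(fold_pos_rates, 3))
print("mean AUC:", round(float(np.mean(fold_aucs)), 4))
```

The printed positive rate should be nearly the same in every fold, which is the point of stratifying.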
I'm also doing it...
Hmmm, two people have 1.0000 on the leaderboard...
Yeah
Anyone want to team up for this competition? Rainfall prediction
I think there's some gaming of the system, because even some of the better out-of-the-box fits are closer to 0.89.
You can read the discussion posts. There was some leaderboard probing, basically, and there's a public notebook with the results. It just means you can't trust the public leaderboard.
Outside of leaderboard probing, what do we think is the "best" score so far?
Yup, it's a textbook case of Goodhart's law. When people know the metric they're being scored against, they start figuring out ways to game it.
Without spending a huge amount of time on this I was able to get a 0.88 pretty reasonably. So that score range is likely legit. I spent like maybe an hour on a weekend.
There's one data point with a missing entry that you have to figure out how to handle. I did a simple nearest neighbor replacement.
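For anyone wondering what a nearest neighbor replacement could look like, here's a minimal sketch using scikit-learn's `KNNImputer`. The column names and values are made up, and `n_neighbors=2` is an arbitrary choice, so this isn't necessarily the exact approach described above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame with hypothetical column names and values (not the competition data).
df = pd.DataFrame({
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_b": [10.0, 20.0, np.nan, 40.0, 50.0],
})

# Each missing value is replaced by the mean of that feature over the
# n_neighbors closest rows, with distance measured on the observed features.
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(filled)
```

In a real workflow you'd probably fit the imputer on the training rows and call `transform` on the rows that contain the gap.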
I'm getting a huge discrepancy between my CV and LB: 0.89 on CV and 0.82 on LB. Anyone else?
That's the area-under-the-curve metric on your cross-validation, correct?
yes
What did you do about the missing point?
I haven't done anything with that yet. I was trying to get a baseline, but it seems weird that I'm getting such a big difference with only 1 missing point.
It does add quite a bit since huge outliers can wreck some common metrics. Try fixing that first and see what you get.
Especially since the LB dataset is only about what? 700 entries?
The public LB dataset is currently only 146 rows of data.
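To get a feel for how noisy AUC is on that few rows, here's a small simulation. Everything in it is simulated (labels, scores, the 0.8 noise level), so it only shows the general effect and says nothing about the actual LB split.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated labels and scores (assumption: NOT the competition data).
n_total = 5000
y = rng.integers(0, 2, size=n_total)
scores = y + rng.normal(0.0, 0.8, size=n_total)  # noisy but informative scores

full_auc = roc_auc_score(y, scores)

# Score the very same predictions on many random 146-row subsets.
subset_aucs = []
for _ in range(500):
    idx = rng.choice(n_total, size=146, replace=False)
    subset_aucs.append(roc_auc_score(y[idx], scores[idx]))
subset_aucs = np.array(subset_aucs)

print(f"AUC on all rows: {full_auc:.3f}")
print(
    f"146-row subsets: min {subset_aucs.min():.3f}, "
    f"max {subset_aucs.max():.3f}, std {subset_aucs.std():.3f}"
)
```

The same predictions can land several points of AUC apart depending on which 146 rows get scored, so a gap between CV and public LB isn't surprising on its own.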
I have been getting these huge differences too: 0.88-0.89 CV vs. 0.84-0.85 LB.
Hello, I am a beginner in Data Science and very new to Kaggle competitions, and I've been pretty stuck on how to improve my score in this contest as most of my attempts have seen no improvements.
Currently, my approach has been engineering features, adding temporal features, selecting the best features, and then using an XGBoost model to make my predictions.
However, I cannot seem to increase my public LB score past .82 (which I got by just throwing the unprocessed dataframes into my XGB), and many of my attempts to improve my public LB score end up making it worse. For example, I tried engineering more features, but that decreased my public LB score from ~.82 to ~.81. I tried doing forward feature selection, but that also decreased my LB score by .01.
I'm pretty stuck here because I don't know what I'm doing wrong or if I'm unknowingly doing a common beginner's mistake. I don't really understand how other people's XGB models are getting public LB scores of above 0.85. I'm not sure if my feature engineering is lacking, if I'm using the wrong model, if other submissions are overfitting the public LB, or something else. Any advice helps!
My current notebook: https://www.kaggle.com/code/michael927/rainfall-pred
You can check the OOF (out-of-fold) method brought up in the discussion. It should be helpful, but be careful not to overfit your model.
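If the OOF idea is new to you, here's a minimal sketch: every row gets a prediction from a model that never saw it during training, and you score all of those predictions together. The data is synthetic and `GradientBoostingClassifier` is just a scikit-learn stand-in for XGBoost, so both are assumptions, not what the discussion posts used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in data (assumption: NOT the competition data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = GradientBoostingClassifier(random_state=0)

# Each row's probability comes from the fold model that did not train on it.
oof_proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

oof_auc = roc_auc_score(y, oof_proba)
print(f"OOF AUC: {oof_auc:.4f}")
```

Comparing this OOF score before and after a change (new features, feature selection) is usually a steadier signal than a small public LB.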
Pretty much what I expected. 0.003-0.009 was the difference between 500th and 1st
Spent an hour and got about 0.006 below first place. Can't complain about that
what was your approach?
Hello.
Big shake up, from 2.5k to 222
Damn, can't complain either though
Will we get access to the winning solution of this competition? If yes, where? In the discussion or here?
PS: I’m new here.
Small Neural Network using a training procedure we use in my day job. Imputer to fill in the missing data.