#playground-series-s5e3 | Kaggle

glossy crescent Mar 6, 2025, 5:15 PM

Hi, I am doing this Kaggle contest.

subtle river Mar 6, 2025, 9:54 PM

Me too!

hazy moss Mar 7, 2025, 1:28 AM

Me too

maiden nova Mar 7, 2025, 2:57 AM

Me too

quartz fable Mar 8, 2025, 5:54 AM

Hello everyone... can anyone please help?

How can I handle this imbalanced data?

hazy moss Mar 9, 2025, 12:35 PM

quartz fable How can I handle this imbalanced data?

You can address imbalanced data using techniques like data augmentation, resampling, or applying class weights during model training.
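A minimal sketch of the class-weight option, assuming scikit-learn and synthetic data (roughly 25% / 75% classes) standing in for the competition files. The "balanced" heuristic gives each class a weight of n_samples / (n_classes × class count), so the minority class counts more in the loss. XGBoost has a comparable knob (`scale_pos_weight`), not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in (not competition data): roughly 25% class 0, 75% class 1.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.25, 0.75], random_state=0)

# "balanced" weights = n_samples / (n_classes * count_per_class),
# so the rarer class receives the larger weight.
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes, weights))
print(class_weight)

# Most scikit-learn classifiers take the weights directly via class_weight.
model = LogisticRegression(class_weight=class_weight, max_iter=1000)
model.fit(X, y)
```

Passing `class_weight="balanced"` to the model does the same computation internally; computing it explicitly just makes the numbers visible.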

nimble briar Mar 13, 2025, 4:02 AM

quartz fable How can I handle this imbalanced data?

I don't think oversampling is necessary in this case. You can also use StratifiedKFold, which keeps the same target class distribution in every fold as in the full dataset.
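A quick sketch of what stratification buys you, assuming scikit-learn; the 150/450 label split is made up for illustration. With these counts each validation fold gets exactly 30 zeros and 90 ones, so every fold keeps the dataset's 75% positive rate.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced target: 150 zeros and 450 ones (25% / 75%).
y = np.array([0] * 150 + [1] * 450)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features; only y drives the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_rates = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Share of class 1 in this validation fold; stratification keeps it at 0.75.
    rate = y[val_idx].mean()
    fold_rates.append(rate)
    print(f"fold {fold}: {len(val_idx)} rows, positive rate {rate:.3f}")
```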

mint burrow Mar 13, 2025, 5:32 PM

I'm also doing it...

hazy moss Mar 14, 2025, 3:17 AM

Hmmm, two people have 1.0000 on the leaderboard...

quartz fable Mar 17, 2025, 6:51 PM

hazy moss Hmmm, two people have 1.0000 on the leaderboard...

yeahh

haughty hatch Mar 19, 2025, 1:27 AM

Anyone want to team up for this competition? Rainfall prediction

wheat aurora Mar 20, 2025, 12:18 AM

hazy moss Hmmm, two people have 1.0000 on the leaderboard...

I think there's some gaming of the system, because even the better out-of-the-box fits score closer to 0.89.

lethal fox Mar 22, 2025, 4:53 PM

wheat aurora I think there's some gaming of the system because even some of the better out of...

You can read the discussion posts; there was basically some leaderboard probing, and there's a public notebook with the results. It just means you can't trust the public leaderboard.

vital oak Mar 22, 2025, 4:59 PM

Outside of leaderboard probing, what do we think is the "best" score so far?

wheat aurora Mar 22, 2025, 6:02 PM

lethal fox You can read the discussion posts; there was basically some leaderboard probing....

Yup, it's a textbook case of Goodhart's law: when people know the metric they're being scored against, they start figuring out ways to game it.

wheat aurora Mar 22, 2025, 6:06 PM

vital oak Outside of leaderboard probing, what do we think is the "best" score so far?

Without spending a huge amount of time on this, I was able to get 0.88 fairly easily, so that score range is likely legit. I spent maybe an hour on a weekend.

There's one data point with a missing entry that you have to figure out what to do with. I did a simple nearest-neighbor replacement.
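A sketch of a nearest-neighbor fill using scikit-learn's KNNImputer. The column names and values are hypothetical, and this isn't necessarily the exact replacement used above: the missing value is simply copied from the row that is closest on the remaining columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame with hypothetical columns; one value is missing, as in the thread.
df = pd.DataFrame({
    "pressure":      [1015.0, 1016.0, 1010.0, 1011.0],
    "humidity":      [80.0,   82.0,   60.0,   61.0],
    "winddirection": [40.0,   50.0,   200.0,  np.nan],
})

# With n_neighbors=1, the gap is filled with the value from the single closest row,
# where distance is measured on the columns that both rows have present.
imputer = KNNImputer(n_neighbors=1)
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```

Distances here use raw values, so on real data you'd usually scale the features first so that no single column dominates the neighbor search.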

vital oak Mar 22, 2025, 6:15 PM

wheat aurora Without spending a huge amount of time on this I was able to get a 0.88 pretty r...

I'm getting a huge discrepancy between my CV and LB: .89 on CV and .82 on LB. Anyone else?

wheat aurora Mar 22, 2025, 6:17 PM

vital oak I'm getting a huge discrepancy between my CV and LB. .89 on CV and .82 on LB. an...

That's your area-under-the-curve (AUC) metric on your cross-validation, correct?

vital oak Mar 22, 2025, 6:17 PM

yes

wheat aurora Mar 22, 2025, 6:18 PM

What did you do for the missing point?

vital oak Mar 22, 2025, 6:18 PM

I haven't done anything with it yet. I was trying to get a baseline, but it seems weird that I'm getting such a big difference with only 1 missing point.

wheat aurora Mar 22, 2025, 6:19 PM

It can matter quite a bit, since huge outliers can wreck some common metrics. Try fixing that first and see what you get.

Especially since the LB dataset is only, what, about 700 entries?

cold otter Mar 24, 2025, 7:34 AM

The public LB dataset is currently only 146 rows.

I've been getting these huge differences too: 0.88-0.89 on CV vs. 0.84-0.85 on LB.
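To get a feel for how noisy a 146-row score is, here's a sketch that re-scores one fixed set of predictions on many random 146-row subsets, assuming scikit-learn and NumPy. The labels and scores are simulated (about 75% positives, full-data AUC around 0.87), not competition data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated labels (~75% positive) and scores from a model of fixed quality:
# positives score 1.6 higher on average, plus unit-variance noise.
n = 100_000
y = (rng.random(n) < 0.75).astype(int)
scores = y * 1.6 + rng.normal(size=n)

full_auc = roc_auc_score(y, scores)

# Re-score the same predictions on many random 146-row subsets
# (the public LB size mentioned above).
subset_aucs = []
for _ in range(1000):
    idx = rng.choice(n, size=146, replace=False)
    if y[idx].min() == y[idx].max():
        continue  # AUC is undefined if a subset contains only one class
    subset_aucs.append(roc_auc_score(y[idx], scores[idx]))

subset_aucs = np.array(subset_aucs)
lo, hi = np.percentile(subset_aucs, [5, 95])
print(f"AUC on all rows: {full_auc:.3f}")
print(f"146-row AUC, 5th-95th percentile: {lo:.3f}-{hi:.3f}")
```

The model never changes between subsets, so any spread in the 146-row numbers is pure sampling noise, which is one reason a few hundredths of CV/LB disagreement on a split this small isn't by itself alarming.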

glossy wind Mar 30, 2025, 9:29 PM

Hello, I am a beginner in Data Science and very new to Kaggle competitions, and I've been pretty stuck on how to improve my score in this contest, as most of my attempts haven't improved it.

Currently, my approach has been engineering features, adding temporal features, selecting the best features, and then using an XGBoost model to make my predictions.

However, I can't seem to push my public LB score past .82 (which I got by just throwing the unprocessed dataframes into my XGB), and many of my attempts to improve it end up making it worse. For example, engineering more features decreased my public LB score from ~.82 to ~.81, and forward feature selection also decreased it by .01.

I'm pretty stuck here because I don't know what I'm doing wrong or whether I'm unknowingly making a common beginner's mistake. I don't really understand how other people's XGB models are getting public LB scores above 0.85. I'm not sure whether my feature engineering is lacking, I'm using the wrong model, other submissions are overfitting the public LB, or something else. Any advice helps!

My current notebook: https://www.kaggle.com/code/michael927/rainfall-pred

whole furnace Mar 31, 2025, 2:35 AM

You can check the OOF (out-of-fold) method brought up in the discussion. It should be helpful, but be careful not to overfit your model.
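One common reading of the OOF approach, sketched with scikit-learn on synthetic data: every training row is predicted by the fold model that didn't train on it, and a single AUC is computed over all those out-of-fold predictions. The logistic regression is only a placeholder for whatever model you use, and the train/test split of the synthetic data is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in: the first 1500 rows play "train", the last 500 play "test".
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.25, 0.75], random_state=1)
X_train, y_train, X_test = X[:1500], y[:1500], X[1500:]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros(len(X_train))        # one out-of-fold prediction per training row
test_pred = np.zeros(len(X_test))   # test predictions averaged over the fold models

for train_idx, val_idx in skf.split(X_train, y_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[train_idx], y_train[train_idx])
    # Each training row is predicted only by the model that did not see it.
    oof[val_idx] = model.predict_proba(X_train[val_idx])[:, 1]
    test_pred += model.predict_proba(X_test)[:, 1] / skf.get_n_splits()

# One CV score computed over all out-of-fold predictions.
oof_auc = roc_auc_score(y_train, oof)
print(f"OOF AUC: {oof_auc:.4f}")
```

The overfitting caveat still applies: if you tune many variants against this one OOF number, it slowly stops being an honest estimate.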

wheat aurora Apr 1, 2025, 1:37 AM

Pretty much what I expected. 0.003-0.009 was the difference between 500th and 1st

Spent an hour and got about 0.006 below first place. Can't complain about that

glossy wind Apr 1, 2025, 3:04 AM

wheat aurora Spent an hour and got about 0.006 below first place. Can't complain about that

what was your approach?

ember thunder Apr 1, 2025, 3:51 AM

Hello.

cold otter Apr 1, 2025, 10:11 AM

Big shake-up: from 2.5k to 222.
Damn, can't complain either, though.

ashen ruin Apr 1, 2025, 3:23 PM

Will we get access to the winning solution of this competition? If so, where: in the discussion or here?

PS: I’m new here.

wheat aurora Apr 1, 2025, 4:30 PM

glossy wind what was your approach?

A small neural network, trained with a procedure we use at my day job, plus an imputer to fill in the missing data.
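The actual training procedure isn't described here, so this is only a generic sketch of "small neural network plus imputer" using scikit-learn's MLPClassifier in a pipeline. The layer size, regularization, and median imputation are assumptions, and the data is synthetic with one value knocked out to mimic the single missing entry.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with one missing value.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.25, 0.75], random_state=0)
X[5, 3] = np.nan

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = make_pipeline(
    SimpleImputer(strategy="median"),  # fills NaNs with medians learned on the training split
    StandardScaler(),                  # MLPs train more reliably on standardized inputs
    MLPClassifier(hidden_layer_sizes=(16,), alpha=1e-3, early_stopping=True,
                  max_iter=500, random_state=0),
)
model.fit(X_tr, y_tr)

val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation AUC: {val_auc:.4f}")
```

Putting the imputer inside the pipeline means it is refit on each training split, so cross-validation doesn't leak statistics from held-out rows.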