#playground-series-s5e1
Hi!
greetings
anyone doing it?
I just looked in... planning to, anyone else doing it?
Yes, I am on it.
It is a time series forecasting problem, which is different from the usual regression / classification.
There is more focus on linear models than the usual boosted trees.
Hey!
I am just new to Kaggle and I entered into this competition!
I created 3 notebooks but the public score is around 2-3!
Help me!
This is a good notebook to get started: https://www.kaggle.com/code/cabaxiom/s5e1-previous-years-baseline-no-model
This discussion is another good way to get started: https://www.kaggle.com/competitions/playground-series-s5e1/discussion/554331
hello! i am also doing this competition. this is time series data, so should i use an lstm? (i am new at kaggle)
It seems simpler models (ridge regression) are doing better on this dataset (not many features), so I would start here for this one.
feature engineering + simple linear model @sharp wren
I would recommend starting with the notebooks:
https://www.kaggle.com/code/cabaxiom/s5e1-previous-years-baseline-no-model
https://www.kaggle.com/code/cabaxiom/s5e1-eda-and-linear-regression-baseline
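For anyone wondering what "feature engineering + simple linear model" can look like in practice, here is a minimal sketch. It is not taken from the linked notebooks. The data is synthetic, the `date` and `num_sold` column names are assumed to match the competition files, and the helper `make_date_features`, the log target, and the MAPE check are illustrative choices only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge


def make_date_features(dates: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: turn a datetime column into calendar features."""
    doy = dates.dt.dayofyear
    # Smooth yearly seasonality encoded as one sine/cosine pair.
    seasonal = pd.DataFrame(
        {
            "sin_doy": np.sin(2 * np.pi * doy / 365.25),
            "cos_doy": np.cos(2 * np.pi * doy / 365.25),
        },
        index=dates.index,
    )
    # One-hot day of week (columns dow_0 .. dow_6).
    weekday = pd.get_dummies(dates.dt.dayofweek, prefix="dow").astype(float)
    return pd.concat([seasonal, weekday], axis=1)


# Synthetic daily sales (an assumption, NOT the competition data):
# a weekend uplift, a yearly cycle, and multiplicative noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": pd.date_range("2010-01-01", "2013-12-31", freq="D")})
weekend = (df["date"].dt.dayofweek >= 5).astype(float)
season = np.sin(2 * np.pi * df["date"].dt.dayofyear / 365.25)
noise = rng.lognormal(0, 0.05, len(df))
df["num_sold"] = 200 * (1 + 0.3 * weekend) * (1 + 0.2 * season) * noise

X = make_date_features(df["date"])
y = np.log(df["num_sold"])  # log target, so multiplicative effects become additive

# Hold out the last year so validation only looks forward in time.
train = df["date"].dt.year < 2013
valid = ~train

model = Ridge(alpha=1.0)
model.fit(X[train], y[train])
pred = np.exp(model.predict(X[valid]))

actual = df.loc[valid, "num_sold"]
mape = float(np.mean(np.abs(pred - actual) / actual))
print(f"validation MAPE: {mape:.3f}")
```

On the real data you would probably add more features (country, store, product, holidays), but the overall shape stays the same: build features from the date, fit a linear model, validate on a later period.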
okk, thanksss
Just wanted to say that I am only starting with the ML stuff, got through the good entry course my company sent me to, and these notebooks are a really fantastic resource for getting a better grip on what can be done with the data. Thank you for these links!
yes those are great notebooks!
This one is great too: https://www.kaggle.com/code/ravi20076/playgrounds5e01-public-baseline-v1
although it is less relevant to this particular competition, it is generally very useful
thanks! It's my first competition and at this point I got to discord to look for examples to study others' approach and methods. I was happy yesterday that I submitted the entry and wasn't last. Now looking at these, well, I know why I'm not anywhere higher yet :D Damn, these are satisfying to read. Super motivating to set aside this hour or two a day to just practice!
Hey! This is my first time participating in Kaggle competitions and so I'm a little bit confused about the data formatting. So, even in the training data, there are dates that appear to be just ##### and some datapoints for the number sold are missing. Are the missing sales numbers 0s or are we meant to fill them in?
Hey! All dates are there, maybe some formatting issue in your IDE? And yes, there are some missing values in num_sold - apparently one of the main tasks is finding out HOW to fill them :) William posted really good example notebooks from the pros that I'm studying, check them out!
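A quick way to check both points yourself is to load the file with pandas and count the gaps. This is only a sketch: the tiny inline CSV stands in for the real training file, and the `date` and `num_sold` column names are assumed to match it. A cell shown as ##### is often just a spreadsheet column that is too narrow, so parsing the dates in code is a more reliable check.

```python
import io

import pandas as pd

# Tiny stand-in for the training file (made-up rows, NOT real competition data).
csv_text = """date,num_sold
2010-01-01,120
2010-01-02,
2010-01-03,95
"""

# On the real data you would pass the path to the training CSV instead.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

n_missing_dates = int(df["date"].isna().sum())    # dates that failed to parse or are empty
n_missing_sold = int(df["num_sold"].isna().sum())  # empty cells are read as NaN, not 0

print(df.dtypes)
print(f"missing dates: {n_missing_dates}, missing num_sold: {n_missing_sold}")
```

Note that pandas reads an empty `num_sold` cell as NaN rather than 0, so whether to treat it as zero or impute something else stays your decision.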
Could you explain why you classify this as a time series forecasting problem and how it differs from regression? I tried a regression-based approach using stacked ensemble and achieved around 0.15. I would appreciate your insight on this difference.
Because we have to predict future sales from past sales data. So you want to avoid predicting past sales from future sales or things like that.
This is why the classical k-fold split might not be the best one here. Some have used a grouped K-fold by year with good results, I think.
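A grouped K-fold by year could be set up roughly like this with scikit-learn's `GroupKFold`. It is a sketch on a synthetic date range, not anyone's actual competition setup, and the `date` column name is an assumption.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Synthetic daily rows covering four calendar years (illustrative only).
df = pd.DataFrame({"date": pd.date_range("2010-01-01", "2013-12-31", freq="D")})
years = df["date"].dt.year  # the group label: every row of a year stays together

gkf = GroupKFold(n_splits=4)
folds = []
for train_idx, valid_idx in gkf.split(df, groups=years):
    train_years = set(years.iloc[train_idx])
    valid_years = set(years.iloc[valid_idx])
    folds.append((train_years, valid_years))
    print(f"train on {sorted(train_years)}, validate on {sorted(valid_years)}")
```

Each validation fold is one whole year, so rows from that year never show up in its training set. Some folds still train on later years than they validate on, so if you want to strictly avoid using the future to predict the past, validating only on the final year(s) is a stricter alternative.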
That makes sense, thanks for the response
Just wanted to congratulate all of you on completing the Playground Series S5E1 challenge! This was my first Kaggle competition, and wow, it was much more difficult than I had imagined! 😅 My biggest takeaway is to start early next time! I got off to a late start and felt a bit left out of the community conversations and iterative improvements.
That said, I learned a ton and had a lot of fun. I’d love any feedback or advice from those willing to share! Here’s my notebook: https://www.kaggle.com/code/josephnehrenz/regression-s5e1-lightgbm-sales-forecast-in-r
Thanks!