#playground-series-s4e9 | Kaggle | Page 1

sick bramble Sep 6, 2024, 6:31 PM

#

Does anybody have some thoughts/advice they'd be willing to share in terms of how to preprocess the data before feeding it into a model? For instance, there are 57 unique car brands and 1897 models. It doesn't seem like a good idea to one-hot-encode these features. What would be a better method for preprocessing? It's been a while since I worked on a Kaggle competition dataset, so I need to jog my memory! 😁

fast sky Sep 6, 2024, 9:42 PM

#

Hello

fast sky Sep 6, 2024, 11:03 PM

#

sick bramble Does anybody have some thoughts/advice they'd be willing to share in terms of ho...

col_name      unique_vals
brand         57
model         1897
fuel_type     7
engine        1117
transmission  52
ext_col       319
int_col       156
accident      2
clean_title   1

Yes, you are right. I am also thinking of something to tackle this.

worthy thunder Sep 8, 2024, 5:39 PM

#

sick bramble Does anybody have some thoughts/advice they'd be willing to share in terms of ho...

Hello! The way I approached that column: if you check the data in the "model" column, you will see some specific keywords like "base", "premium", "limited" and so on. Perhaps you can create a new feature from those keywords to reduce the number of categories, and then apply one-hot encoding.
For the "brand" column, if you check its value_counts you'll see that some categories are rare and might introduce noise (ofc that's my assumption), so you may decide on a threshold and, if a brand's occurrence count is below it, group those brands into another value, perhaps "others".

However I'd like to remind you that I'm not an expert, barely a junior DS myself 😅
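
A rough sketch of the two ideas above, assuming pandas. The rows, the keyword list, and the `MIN_COUNT` threshold are all made up for illustration and would need tuning on the real data:

```python
import pandas as pd

# Made-up rows for illustration; the real train.csv has far more brands and models.
df = pd.DataFrame({
    "brand": ["Ford", "Ford", "Ford", "BMW", "BMW", "Lotus", "Maybach"],
    "model": ["F-150 XLT", "Mustang Premium", "Explorer Limited",
              "X5 Base", "330 i Premium", "Evora 400", "S 650 Base"],
})

# 1) Collapse "model" into a handful of trim keywords (this list is a guess).
TRIM_KEYWORDS = ["base", "premium", "limited"]

def trim_level(model_name: str) -> str:
    """Return the first known trim keyword found in the model name."""
    tokens = model_name.lower().split()
    for keyword in TRIM_KEYWORDS:
        if keyword in tokens:
            return keyword
    return "unknown"

df["trim"] = df["model"].map(trim_level)

# 2) Group brands seen fewer than MIN_COUNT times into "others".
MIN_COUNT = 2  # hypothetical threshold
counts = df["brand"].value_counts()
rare_brands = counts[counts < MIN_COUNT].index
df["brand_grouped"] = df["brand"].where(~df["brand"].isin(rare_brands), "others")

# Both new columns are low-cardinality, so one-hot encoding stays small.
encoded = pd.get_dummies(df[["trim", "brand_grouped"]])
print(encoded.columns.tolist())
```

The same `rare_brands` set should be reused on the test data, otherwise the grouped categories will not line up between train and test.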

fast sky Sep 9, 2024, 3:56 PM

#

worthy thunder Hello! The way I approached to that column is, if you check the data in "model" ...

Did you participate in the competition?

worthy thunder Sep 9, 2024, 4:39 PM

#

fast sky Did you participate in the competition?

I've joined but haven't submitted anything yet, still exploring & feature engineering the columns

sick bramble Sep 16, 2024, 5:51 PM

#

Sorry for the slow response on my end. Thanks for your thoughts here. I did a very simple regression model as a baseline using 3 features, brand, model year, and mileage. I ended up doing a numerical encoding of the brand based on the average sale price. 1 being the cheapest models and 4 being the most expensive. I thought this would do a reasonable job at predicting, but my RMSE values are crap! I did practice doing a prediction of the test data and submitting it. I plan to try out some more advanced models and see what happens. I'd love to hear any breakthroughs anybody had. I'm enjoying working on this, though! 😃
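
For anyone wanting to try the same kind of brand encoding, here is one possible way to do it, assuming pandas. The message above doesn't say how the four tiers were built, so splitting brands into quartiles of their average price is an assumption, and the rows are made up:

```python
import pandas as pd

# Made-up training rows; the column names mirror the ones discussed above.
train = pd.DataFrame({
    "brand": ["Kia", "Kia", "Ford", "Ford", "BMW", "BMW", "Porsche", "Porsche"],
    "price": [9000, 11000, 19000, 21000, 39000, 41000, 79000, 81000],
})

# Average sale price per brand, computed on the training data only.
brand_mean = train.groupby("brand")["price"].mean()

# Split the brands into 4 equal-sized groups: 1 = cheapest, 4 = most expensive.
brand_tier = pd.qcut(brand_mean, q=4, labels=[1, 2, 3, 4]).astype(int)

# Map each row's brand to its tier. A brand missing from the training data
# would come out as NaN here and need a fallback value.
train["brand_tier"] = train["brand"].map(brand_tier)
print(train)
```

Because the tiers come from the target itself, computing them inside cross-validation folds rather than on the full training set gives a more honest validation score.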

distant python Sep 20, 2024, 7:10 PM

#

Are we supposed to discuss details like this? I've submitted something using numerical columns, model year and mileage only, the results are far from spectacular. I have some ideas but not sure how much I will manage in the next 10 days.

sick bramble Sep 23, 2024, 8:39 PM

#

distant python Are we supposed to discuss details like this? I've submitted something using num...

This is a playground challenge, so I think it's fine for us to discuss. Maybe I'm wrong...

#

Something I realized after digging into this data some more is that it was generated from a model trained on another dataset with 4009 samples. I think the model must have been crap, because it predicted a Ford F-150 has a sale price of $2.954M! A Dodge Ram also went for the same price. That is ridiculous! Kind of frustrating imho.

worthy thunder Sep 27, 2024, 4:33 PM

#

sick bramble Something I realized after digging into this data some more is that it was gener...

do you scale the price column when you train your model?

distant python Sep 29, 2024, 3:58 PM

#

sick bramble Something I realized after digging into this data some more is that it was gener...

yep there are certain weird outliers in terms of price, I was struggling to understand what they have in common to get the same high price
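
One quick way to look at what the repeated high-price rows have in common, as a sketch assuming pandas. The rows are made up, and the top price is just the roughly $2.954M figure quoted above, rounded:

```python
import pandas as pd

# Made-up rows; 2_954_000 stands in for the ~$2.954M price mentioned above.
df = pd.DataFrame({
    "brand": ["Ford", "Dodge", "Ford", "BMW", "Kia"],
    "model_year": [2018, 2015, 2012, 2020, 2016],
    "price": [2_954_000, 2_954_000, 18_500, 42_000, 11_900],
})

# Count how many listings share each exact price; repeated extremes stand out.
price_counts = df["price"].value_counts()
repeated = price_counts[price_counts > 1]

# Pull the rows at the top price and see which columns differ between them.
top_rows = df[df["price"] == df["price"].max()]
varying = top_rows.nunique()

print(repeated)
print(varying)
```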

#

anyway, I've made a small improvement from where I was 10 days ago. For me it is very much a playground to improve my knowledge of regression and to understand how Kaggle works in general.

sick bramble Sep 30, 2024, 5:58 PM

#

worthy thunder do you scale the price column when you train your model?

My first linear regression model had some negative price predictions. So, what I did was log-transform the price data, train a model on that, and then apply the inverse of the log (i.e. the exponential) to the predictions. That way none of my predictions were negative. My best result was with an XGBoost model. I'm at 1845 on the leaderboard. Not great, but not horrible either.
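
A minimal sketch of that log-transform round trip, using plain NumPy least squares on synthetic data rather than the competition set or the actual model used above:

```python
import numpy as np

# Synthetic data (not the competition set): price decays with mileage.
rng = np.random.default_rng(0)
mileage = rng.uniform(5_000, 200_000, size=200)
price = 40_000 * np.exp(-mileage / 80_000) * rng.lognormal(0.0, 0.1, size=200)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(mileage), mileage])

# Baseline: fit on raw price. Extrapolated predictions can go below zero.
coef_raw, *_ = np.linalg.lstsq(X, price, rcond=None)

# Log approach: fit on log(price) instead.
coef_log, *_ = np.linalg.lstsq(X, np.log(price), rcond=None)

new_mileage = np.array([10_000.0, 150_000.0, 400_000.0])
X_new = np.column_stack([np.ones_like(new_mileage), new_mileage])

pred_raw = X_new @ coef_raw          # dollars, may be negative
pred_log = np.exp(X_new @ coef_log)  # exp undoes the log, always positive

print(pred_raw)
print(pred_log)
```

`np.log1p` and `np.expm1` are a common alternative pair when the target can be zero.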

distant python Sep 30, 2024, 7:29 PM

#

but log doesn't just guarantee positive prices, it also makes the relationship non-linear

#

which is probably correct in this case, but might not be in other cases with a positive-only target
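
To make that point concrete, here is a tiny sketch with hypothetical coefficients for a model of the form log(price) = a + b * mileage. The numbers are invented; only the shape of the effect matters:

```python
import math

# Hypothetical coefficients for log(price) = a + b * mileage.
a = math.log(40_000)
b = -1 / 80_000

def predict(mileage: float) -> float:
    """Price in dollars implied by the log-linear model."""
    return math.exp(a + b * mileage)

# Each extra 10,000 miles multiplies the price by the same factor, exp(b * 10_000)...
ratio_low = predict(20_000) / predict(10_000)
ratio_high = predict(110_000) / predict(100_000)

# ...so the drop in dollars shrinks as the car gets cheaper.
drop_low = predict(10_000) - predict(20_000)
drop_high = predict(100_000) - predict(110_000)

print(ratio_low, ratio_high)
print(drop_low, drop_high)
```

So the model is linear in log space but multiplicative in dollars, which only suits targets where effects really do scale with the price level.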

fast sky Oct 1, 2024, 10:07 AM

#

Hello all. I just wanted to know: are late submissions evaluated on both the private set and the public set?

worthy thunder Oct 1, 2024, 11:01 AM

#

fast sky Hello all. I just wanted to know that are the late submissions be evaluated on p...

AFAIK it is evaluated on both, but it doesn't change your ranking. Someone can correct me if I'm wrong

fast sky Oct 1, 2024, 11:20 AM

#

worthy thunder AFAIK it is evaluated on both but it doesn't change your ranking, someone can co...

That means I will get both the scores simultaneously?

worthy thunder Oct 1, 2024, 11:24 AM

#

fast sky That means I will get both the scores simultaneously?

that was my experience from another competition that was already past its deadline, yes

fast sky Oct 1, 2024, 11:35 AM

#

worthy thunder that was my experience from another due competition, yes

Yes you are right. I am getting both the scores.

pulsar crown Oct 2, 2024, 9:55 AM

#

Hey everyone,

This is Harsh and I am currently working as an AI researcher and have experience in machine learning. I recently started participating in Kaggle and would really appreciate it if you could take a look at my notebook for the Used Car Regression competition and provide feedback. Here’s the link: https://www.kaggle.com/code/harshsharma1128/used-car-regression/notebook?scriptVersionId=199146028.

Any insights or suggestions would be incredibly helpful. Thanks in advance!

used-car-regression

Explore and run machine learning code with Kaggle Notebooks | Using data from Regression of Used Car Prices

raven spruce Oct 2, 2024, 1:32 PM

#

Am I allowed to post pictures here?

#

I am getting the error "Evaluation metric raised an unexpected error" while submitting my CSV file

#

Can anyone please help me

fringe aspen Oct 3, 2024, 4:18 AM

#

raven spruce Am I allowed to post pictures here?

yes you can