#um-game-playing-strength-of-mcts-variants

lilac oasis
#

Hi guys, it's my first time here, and also my first time making a competition submission.

I don't know how to make one. I've seen videos on YouTube, but they all just upload a file, and here it doesn't work that way.

So if anyone can help me, I'd appreciate it.

ruby jetty
#

Hi, I'd like some information about the test dataset.

#

So can I assume the test samples only contain games played 15, 30, or 45 times?

north wren
#

I can't guarantee with 100% certainty that it's only 15/30/45 times, but you may expect a similar distribution as in training data (where the vast majority of <game, agent1, agent2> 3-tuples had 15 plays, and any with more plays were rare).

Note also that a row having more plays only means that the target label becomes a bit less noisy / more reliable (because we have more plays to average our noisy simulation results over). Apart from that, it shouldn't really matter.
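To put a rough number on "less noisy": suppose each play scores +1 (agent1 wins), 0 (draw) or -1 (agent1 loses) and the row's label is the average over its plays. That is an assumption about how the target is built, not something confirmed here, and the win/draw probabilities below are purely illustrative. Under those assumptions the label's standard error shrinks like 1/sqrt(n):

```python
import math


def label_std_error(p_win: float, p_draw: float, n_plays: int) -> float:
    """Standard error of the mean outcome over n_plays independent plays.

    Assumes each play scores +1 (agent1 wins), 0 (draw) or -1 (agent1 loses)
    with fixed probabilities p_win, p_draw and p_loss = 1 - p_win - p_draw.
    """
    p_loss = 1.0 - p_win - p_draw
    mean = p_win - p_loss
    # E[X^2] = p_win + p_loss, because wins and losses square to 1 and draws to 0.
    variance = (p_win + p_loss) - mean ** 2
    return math.sqrt(variance / n_plays)


for n in (15, 30, 45):
    print(n, round(label_std_error(0.5, 0.1, n), 3))
```

With these hypothetical probabilities the standard error drops from about 0.24 at 15 plays to about 0.14 at 45, so rows with more plays have a cleaner target but otherwise look like any other row.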

clear glade
#

No module named 'kaggle_evaluation'

#

This is my first Kaggle competition. How can I import files into the notebook so that I can use import statements like that?
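A minimal sketch for the "No module named 'kaggle_evaluation'" case, assuming the package ships inside the competition data folder and that the data is attached to the notebook. The path below is an assumption built from the channel's competition name, and `add_package_root` is a hypothetical helper, not part of any Kaggle API. It only puts that folder on `sys.path` if the package directory is actually there:

```python
import sys
from pathlib import Path

# Assumed location of the attached competition data; adjust if yours differs.
DEFAULT_ROOT = Path("/kaggle/input/um-game-playing-strength-of-mcts-variants")


def add_package_root(root: Path = DEFAULT_ROOT, package: str = "kaggle_evaluation") -> bool:
    """Prepend `root` to sys.path if it contains a `package` directory.

    Returns True when the directory was found (and is now importable), else False.
    """
    if (root / package).is_dir():
        if str(root) not in sys.path:
            sys.path.insert(0, str(root))
        return True
    return False


if add_package_root():
    import kaggle_evaluation  # noqa: F401  (only runs when the folder exists)
else:
    print("kaggle_evaluation not found; is the competition data attached?")
```

If the helper returns False, the usual suspect is that the competition data isn't attached to the notebook, or that it lives under a different path than the one assumed here.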

clear glade
#

So I have a question. For the LightGBM models, are most people using more or less the same LightGBM template and essentially just changing the seeds and the number of folds? I'm sort of new to this kind of data science, but there seems to be a pretty large spread on the leaderboard. I understand the split between ensembles and a simple single-model approach, and that a single model can probably go through more iterations than an ensemble within the allotted time. But I'm unsure how a simple model can be optimized further if people are (if I understand correctly) mostly using Optuna to find their hyperparameters.

#

Essentially, I'm asking whether everyone is using similar or identical hyperparameters found via Optuna, whether there are other ways to find the hyperparameters that people higher up on the LB are using, or whether it mainly comes down to fold distribution and the seeds used.

wooden elk
#

Finding the best hyperparameters for a single model is certainly an important step, but it's far from the only way to improve your score. This competition is a pretty good showcase of one of the fundamentals of data science: overfitting.

Ensemble methods are pretty good because they combine the different ways multiple algorithms learn, and those can strengthen each other. The issue with them isn't the time they take (though finding the best hyperparameters can be annoying), but rather that they can overfit more easily. Also, problems are often simple enough that simpler methods work just as well with less risk of overfitting. Not in this case, though.

To counteract overfitting, you can use k-fold cross-validation on your own validation split. However, another important realization is that the test set isn't the same as a random sample of the train set. If you split your training set randomly to create a validation set, the validation set will contain some games that the training set also has, so the model gets rewarded for learning who won those exact games. But the test set has completely new games that were never seen before, so you want a training set and validation set where no game in the validation set is present in the training set. (Or use GroupKFold if you're cross-validating.)
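A minimal sketch of the game-grouped split with scikit-learn's `GroupKFold`, assuming you pass whatever column identifies the game as `groups`. The toy game names and the `group_folds` helper are hypothetical, for illustration only. Every row of a given game lands on the same side of each split:

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def group_folds(game_ids, n_splits=5):
    """Yield (train_idx, valid_idx) pairs where no game appears on both sides."""
    dummy_X = np.zeros((len(game_ids), 1))  # GroupKFold only needs the row count here
    yield from GroupKFold(n_splits=n_splits).split(dummy_X, groups=game_ids)


# Toy data: 6 rows from 3 hypothetical games.
game_ids = np.array(["chess", "chess", "go", "go", "hex", "hex"])
for train_idx, valid_idx in group_folds(game_ids, n_splits=3):
    train_games = set(game_ids[train_idx].tolist())
    valid_games = set(game_ids[valid_idx].tolist())
    assert train_games.isdisjoint(valid_games)  # no game leaks into validation
    print(sorted(valid_games))
```

A random `KFold` on the same rows would happily put one "chess" row in training and the other in validation, which is exactly the leak described above.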

So everyone is pretty certain that ensemble methods with GroupKFold are the way to go. How to proceed from there is less clear. Despite the many, many games in the training set, the ensemble can still somehow find ways to overfit a little using all the columns. So people have started looking at feature selection to prevent that and also to steer the models a little. The question is: which features are important for all games, including unseen ones?
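One way (of many) to ask "which features matter for unseen games" is to measure permutation importance only on games the model never trained on. A minimal sketch with scikit-learn, where a plain `LinearRegression` stands in for the real ensemble and synthetic data stands in for the competition features (both assumptions for illustration):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupShuffleSplit


def unseen_game_importance(model, X, y, game_ids, n_repeats=10, seed=0):
    """Permutation importance measured on games held out from training."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=seed)
    train_idx, valid_idx = next(splitter.split(X, y, groups=game_ids))
    model.fit(X[train_idx], y[train_idx])
    # Drop in score when each column is shuffled, on held-out games only.
    result = permutation_importance(
        model, X[valid_idx], y[valid_idx],
        n_repeats=n_repeats, random_state=seed,
        scoring="neg_root_mean_squared_error",
    )
    return result.importances_mean


rng = np.random.default_rng(0)
n_rows = 600
game_ids = rng.integers(0, 60, size=n_rows)  # 60 synthetic "games"
X = rng.normal(size=(n_rows, 2))
y = 0.5 * X[:, 0] + 0.05 * rng.normal(size=n_rows)  # only column 0 is informative
importances = unseen_game_importance(LinearRegression(), X, y, game_ids)
print(importances)
```

Features whose importance stays near zero (or goes negative) on held-out games are candidates to drop; the same loop can be repeated over several group folds to make the estimate less dependent on which games ended up in validation.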

There's also feature engineering: supplying the models with info they can't get themselves. LudRules may be useful, but a lot of that info is already in the columns. Some people have also looked at generating more data.

clear glade
#

wow

#

thank you very much for the in-depth answer!

#

I've heard, though, that generating more data isn't necessarily going to help with training.

#

Maybe it's useful for closing the gap between CV and LB, though?

wooden elk
#

They might have already put every game possible (that's not in the test set, which we can't generate) inside the training set.

#

If you can generate data of new games, then yes, this would close the gap between CV and LB. (It might theoretically also make the gap larger if the LB test set is somehow skewed, but it'd still be better even if it doesn't seem like it.)

muted rampart
#

Can anyone help me? Whenever I submit, it shows an inference error and gets rejected.

digital turtle
#

Hi,
I'm able to run a solution offline, but when I try to run it inside Kaggle, I get the following error:

GatewayRuntimeError: (, '<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "Exception calling application: No OpenCL device found"\n\tdebug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Exception calling application: No OpenCL device found", grpc_status:2, created_time:"2024-09-22T21:46:33.734053034+00:00"}"\n')

This makes me think it's some internal error and that I need to register some RPC to get the actual error. Please help.

trail dust
#

This comp is so funny: compared to my baseline, I've tried different approaches, and whether they worsen or improve CV (even substantially), the LB is always X - 0.001, X, or X + 0.001 🤣

I sense some CV overfitting, or my model is too stable 😂 (which I doubt). Can CV even be trusted lol? The variation I see in CV doesn't show up on the LB.

warped perch
#

This competition is funny

muted rampart
#

I still can't submit my solution. Can anyone help me?

brisk geyser
#

Two methods I've seen for debugging MCTS server errors:

  1. Run your predict function from a cell below the MCTS server implementation. Get that cell working without errors. Once the standalone cell calls the predict function successfully, then try the MCTS server. This method gives you verbose errors from the predict function. (Bonus: the documentation says anything below the MCTS server will be ignored during scoring runs.)

  2. Instead of running predictions on the test data, try running the predict function on the training data. Change the local MCTS implementation by pointing it at a training CSV without the target column. This gives you an estimate of how long the server takes to complete such a task, and it surfaces other errors that the 3-observation test set cannot.
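Method 2 could look roughly like this outside the server. It's a sketch assuming a pandas DataFrame of training rows, a target column named `utility_agent1` (an assumption; use whatever your target column is called), and a predict function that takes a batch and returns one value per row. The real server's predict signature may differ, so `dry_run` is a hypothetical helper. It feeds small batches, checks each output, and times the whole run:

```python
import time

import numpy as np
import pandas as pd

TARGET = "utility_agent1"  # assumed target column name; adjust to your data


def dry_run(predict, train: pd.DataFrame, batch_size: int = 3):
    """Call `predict` on target-free training batches, checking shape and timing.

    `predict` stands in for your own predict function: it takes a DataFrame
    batch and returns one prediction per row.
    """
    features = train.drop(columns=[TARGET])
    outputs = []
    start = time.perf_counter()
    for begin in range(0, len(features), batch_size):
        batch = features.iloc[begin:begin + batch_size]
        preds = np.asarray(predict(batch), dtype=float)
        if preds.shape != (len(batch),):
            raise ValueError(f"rows {begin}+: expected {len(batch)} preds, got shape {preds.shape}")
        if not np.all(np.isfinite(preds)):
            raise ValueError(f"rows {begin}+: non-finite predictions")
        outputs.append(preds)
    elapsed = time.perf_counter() - start
    return np.concatenate(outputs), elapsed


# Usage with a toy frame and a dummy predict that always returns 0.
toy = pd.DataFrame({"feat": [0.1, 0.2, 0.3, 0.4], TARGET: [1.0, -1.0, 0.0, 0.5]})
preds, seconds = dry_run(lambda batch: np.zeros(len(batch)), toy)
print(preds, f"{seconds:.4f}s")
```

Scaling `elapsed` from the full training set down to the expected test size gives a rough feel for whether you're near the time limit, and any shape or NaN problem shows up as a readable Python traceback instead of a gateway error.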

Hopefully one of those helps...

trail dust
#

I have a (set of) features that gives a 0.006 improvement on CV, but the LB is slightly worse.

Not sure what's actually happening lol. I've scanned my code more than 5 times and still found no bugs or points of leakage, so maybe I should trust CV 🤣

#

But it's pretty alarming that the public NN gets 0.432 CV (0.440 without the leak) and 0.435 LB (the gap isn't that big). I didn't test the version without the leak on LB, but I expect it to be the same.

I have 0.404 CV and 0.428 LB. Meanwhile, I also have another version from earlier with 0.419 CV and 0.430 LB. It feels strange that all the stuff I added doesn't contribute that much... can CV even be trusted lol
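The arithmetic behind that complaint, using only the CV/LB pairs quoted in these messages and assuming the metric is an error score where lower is better (RMSE, if I recall this competition correctly):

```python
# (CV, LB) pairs quoted in the messages; lower is assumed to be better.
runs = {
    "public NN": (0.432, 0.435),
    "earlier version": (0.419, 0.430),
    "current version": (0.404, 0.428),
}

for name, (cv, lb) in runs.items():
    print(f"{name}: LB - CV gap = {lb - cv:.3f}")

# How much of the CV improvement between the two own versions showed up on LB.
cv_gain = runs["earlier version"][0] - runs["current version"][0]
lb_gain = runs["earlier version"][1] - runs["current version"][1]
print(f"CV gain {cv_gain:.3f}, LB gain {lb_gain:.3f}, ratio {lb_gain / cv_gain:.2f}")
```

The gaps come out to 0.003, 0.011, and 0.024, and the 0.015 CV gain bought only 0.002 on LB, roughly 13% of it.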

#

*Different ways of splitting the dataset for CV also yielded a significant CV improvement, far more than 0.002.

trail dust
#

maybe I haven’t investigated enough haha

twin hornet
#

@trail dust hi 👋

trail dust
#

hopefully this split correlates with LB now lol, cos it's actually much harder to get CV improvements with it than with the standard GroupKFold that public notebooks use 😅

ruby jetty
#

It's too hard to predict; I think the dataset is too small to predict anything T.T

river leaf
#

Is there a paper or some other written documentation of the dataset?

shell flicker
#

Hi everybody! Is an MLP suitable for this competition? I tried to implement one, but my model isn't learning at all and I'm running out of ideas to fix it...
