#um-game-playing-strength-of-mcts-variants | Kaggle | Page 1

lilac oasis Sep 10, 2024, 8:46 PM

#

Hi guys, it's my first time here and also my first time making a competition submission.

I don't know how to do it. I've seen videos on YouTube, but they all just upload a file, and that doesn't work here.

So if anyone can help me, I would be grateful.

wooden elk Sep 11, 2024, 10:19 AM

#

lilac oasis hi guys its my first time here and also making my first time competition sumbit ...

https://www.kaggle.com/code/inversion/mcts-variants-getting-started For this competition you can simply edit this notebook and add your training/prediction code to the train/predict functions.


ruby jetty Sep 13, 2024, 7:31 AM

#

Hi, I'd like to know some information about the test dataset.

#

When I looked at https://www.kaggle.com/competitions/um-game-playing-strength-of-mcts-variants/discussion/532514, it said the test samples were generated with the same method.


#

Then can I assume the test samples only contain games played 15, 30, or 45 times?

wooden elk Sep 13, 2024, 1:10 PM

#

ruby jetty Then can I think test samples play games with 15, 30, 45times only?

You would have to ask the creators, but it's likely the same job scripts were used with the same parameters, so yes probably.

north wren Sep 13, 2024, 1:15 PM

#

I can't guarantee with 100% certainty that it's only 15/30/45 times, but you may expect a similar distribution as in training data (where the vast majority of <game, agent1, agent2> 3-tuples had 15 plays, and any with more plays were rare).

Note also that a row having more plays only means that the target label becomes a bit less noisy / more reliable (because we have more plays to average our noisy simulation results over). Apart from that, it shouldn't really matter.
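To make the noise point concrete, here is a small Python simulation. It assumes (this isn't stated above) that a row's label is the average over its plays of per-play results scored +1 for a win, 0 for a draw and -1 for a loss, and it uses a made-up matchup with a 50% win rate and 10% draw rate. The spread of the averaged label across repeated simulations shrinks roughly as 1/sqrt(n_plays), so 45-play rows are noticeably less noisy than 15-play rows, while their expected value is unchanged.

```python
import random
import statistics


def simulate_utility(win_p: float, draw_p: float, n_plays: int, rng: random.Random) -> float:
    """Average result for agent1 over n_plays games (+1 win, 0 draw, -1 loss); scoring is an assumption."""
    total = 0
    for _ in range(n_plays):
        r = rng.random()
        if r < win_p:
            total += 1          # win
        elif r < win_p + draw_p:
            total += 0          # draw
        else:
            total -= 1          # loss
    return total / n_plays


def label_spread(n_plays: int, repeats: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the averaged label over many simulated rows of the same matchup."""
    rng = random.Random(seed)
    samples = [simulate_utility(0.5, 0.1, n_plays, rng) for _ in range(repeats)]
    return statistics.stdev(samples)


for n in (15, 30, 45):
    print(n, round(label_spread(n), 3))
```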

clear glade Sep 17, 2024, 9:40 AM

#

No module named 'kaggle_evaluation'

#

This is my first Kaggle competition. How can I import files into the notebook to be able to use such import statements?
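A likely cause is that the folder containing the `kaggle_evaluation` package isn't on Python's import path. A minimal sketch, assuming the competition data is attached to the notebook and ships the package at the root of `/kaggle/input/um-game-playing-strength-of-mcts-variants`; that path and the `mcts_inference_server` module name are assumptions, so check the Getting Started notebook linked above for the exact ones.

```python
import importlib
import os
import sys


def add_dir_to_path(directory: str) -> bool:
    """Put `directory` on sys.path (if it exists) so packages inside it become importable."""
    if not os.path.isdir(directory):
        return False
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return True


# Assumed location of the attached competition data (hypothetical; verify in your notebook).
COMP_DIR = "/kaggle/input/um-game-playing-strength-of-mcts-variants"

if add_dir_to_path(COMP_DIR):
    # Assumed module name; equivalent to `import kaggle_evaluation.mcts_inference_server`.
    mcts_server = importlib.import_module("kaggle_evaluation.mcts_inference_server")
else:
    print("Competition data not found: add it to the notebook via 'Add Input' first.")
```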

clear glade Sep 18, 2024, 8:21 AM

#

So I have a question. For the LightGBM models, are most people using more or less the same template and only changing the seeds and the number of folds? I'm fairly new to this kind of data science, but there seems to be a pretty large variation on the leaderboard. I understand the split between ensembles and the simple single-model approach, and that a single model can probably go through more iterations within the allotted time than an ensemble. But I'm unsure how a simple model can be optimized further if people are (if I understand correctly) mostly using Optuna to find their hyperparameters.

#

Essentially, I'm asking whether everyone is using similar or identical hyperparameters found via Optuna, whether there are other ways to find the hyperparameters that people higher up on the LB are using, or whether it mainly comes down to the fold distribution and the seeds used.

wooden elk Sep 18, 2024, 3:10 PM

#

clear glade so i have a question. For the LightGBM models, are most people using more or les...

Finding the best hyperparameters for a single model is certainly an important step, but it's far from the only way to increase score. This competition is a pretty good show of one of the fundamentals of Data Science: overfitting.

Ensemble methods are pretty good, because they combine the different kinds of learning of multiple algorithms, which can strengthen each other. The issue with them isn't the time they take (though finding the best hyperparameters can be annoying), but rather that they can overfit more easily. Also, problems are often simple enough that simpler methods work just as well with less risk of overfitting. Not in this case, though.

To counteract overfitting, you can use cross-validation on your own validation set. However, another important realization is that the test set isn't like a random split of the train set. If you split your training set randomly to create a validation set, the validation set will contain some games that the training set also has, so the model gets rewarded for learning who won those exact games. But the test set has completely new games that were never seen before, so you want to create a training set and a validation set where no game in the validation set is present in the training set. (Or use GroupKFold if you're doing k-fold cross-validation.)
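A minimal sketch of the game-grouped split described above, using scikit-learn's GroupKFold on a toy frame. Treating `GameRulesetName` as the game identifier is an assumption about the column naming, and the rows are illustrative only; swap in whatever column identifies a game in your copy of train.csv.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Toy stand-in for train.csv: two rows per game (values are illustrative, not real data).
train = pd.DataFrame({
    "GameRulesetName": ["Chess", "Chess", "Go", "Go", "Hex", "Hex", "Nim", "Nim"],
    "feature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "utility_agent1": [0.2, -0.1, 0.5, 0.0, -0.3, 0.4, 0.1, -0.2],
})

X = train[["feature"]]
y = train["utility_agent1"]
groups = train["GameRulesetName"]

gkf = GroupKFold(n_splits=4)
for fold, (train_idx, valid_idx) in enumerate(gkf.split(X, y, groups=groups)):
    train_games = set(groups.iloc[train_idx])
    valid_games = set(groups.iloc[valid_idx])
    # Every validation game is unseen during training, mimicking the hidden test set.
    assert train_games.isdisjoint(valid_games)
    print(fold, sorted(valid_games))
```

With a plain random KFold, rows from the same game would land on both sides of the split, which is exactly the optimistic validation described in the post.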

So everyone is pretty certain that ensemble methods with GroupKFold are the way to go. How to proceed from there is less clear. Despite the many, many games in the training set, the ensemble can still somehow find ways to overfit a little using all the columns. So people have started looking at feature selection to prevent that and also to steer the models a little. The question is: which features are important for all games, including unseen ones?
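One simple, low-risk form of feature selection is dropping columns that carry no information at all, i.e. columns with a single distinct value across the training data. The sketch below is a generic pandas helper, not what any particular leaderboard entry does; which columns to protect (IDs, targets) is up to you.

```python
import pandas as pd


def drop_uninformative_columns(df: pd.DataFrame, protected: tuple = ()) -> pd.DataFrame:
    """Drop columns with at most one distinct value (NaN counts as a value), except protected ones."""
    to_drop = [
        col for col in df.columns
        if col not in protected and df[col].nunique(dropna=False) <= 1
    ]
    return df.drop(columns=to_drop)


# Usage on a toy frame: "a" is constant and "c" is all-missing, so both are dropped.
toy = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3], "c": [None, None, None], "Id": [7, 7, 7]})
print(drop_uninformative_columns(toy, protected=("Id",)).columns.tolist())
```

Constant columns can't help any model, so removing them only shrinks the search space; deciding which *informative* columns generalize to unseen games is the harder question raised above.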

There's also feature engineering: supplying the models with info they can't get themselves. LudRules may be useful, but a lot of that info is already in the columns. Some people have also looked at generating more data.
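As a hedged illustration of feature engineering on LudRules, the sketch below derives a few crude text features. Ludii rule descriptions are nested parenthesised expressions, so nesting depth and keyword counts are cheap proxies for rule complexity. The keyword list is purely hypothetical, and whether any of this adds information beyond the existing columns is untested.

```python
import re

import pandas as pd

# Hypothetical keyword list; inspect LudRules to choose tokens that actually matter.
KEYWORDS = ("piece", "move", "capture", "hop", "slide")


def max_paren_depth(text: str) -> int:
    """Deepest level of parenthesis nesting in a rules string."""
    depth = best = 0
    for ch in text:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
    return best


def ludrules_features(rules: pd.Series, keywords=KEYWORDS) -> pd.DataFrame:
    """Length, nesting depth and case-insensitive whole-word keyword counts per rules string."""
    feats = pd.DataFrame(index=rules.index)
    feats["rules_len"] = rules.str.len()
    feats["rules_depth"] = rules.map(max_paren_depth)
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        feats[f"kw_{kw}"] = rules.str.count(pattern, flags=re.IGNORECASE)
    return feats


example = pd.Series(['(game "X" (players 2) (rules (play (move Slide))))'])
print(ludrules_features(example))
```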

wooden elk Sep 18, 2024, 3:13 PM

#

clear glade Essentially I am asking if everyone is using similar or the same hyperparameters...

So the best scores are indeed in large part because those people took the time and energy to find the best hyperparameters/seeds/distributions. But they might also have good insights on feature selection, feature engineering or data generation that they're not sharing, which puts them over the edge. And if they don't have such insights already, they will in the future.

clear glade Sep 18, 2024, 3:14 PM

#

wow

#

thank you very much for the in-depth answer!

#

I have heard, though, that generating more data isn't necessarily going to help with training.

#

Maybe it's useful for closing the gap between CV and LB, though?

wooden elk Sep 18, 2024, 3:18 PM

#

clear glade thank you very much for the in-depth answer!

Yw ❤️

wooden elk Sep 18, 2024, 3:19 PM

#

clear glade I have heard though that the generation of more data isnt necessarily going to h...

I haven't looked into it, but generating data would only be useful if you can generate data of games different from the training set.

#

They might have already put every game possible (that's not in the test set, which we can't generate) inside the training set.

#

If you can generate data of new games, then yes, this would close the gap between CV and LB. (It might theoretically also make the gap larger if the LB test set is somehow skewed, but it'd still be better even if it doesn't seem like it.)

muted rampart Sep 19, 2024, 11:08 AM

#

Can anyone help me? Whenever I submit, it shows an inference error and gets rejected.

digital turtle Sep 22, 2024, 10:05 PM

#

Hi,
I can run my solution offline, but when I try to run it inside Kaggle, I get the following error:

GatewayRuntimeError: (, '<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "Exception calling application: No OpenCL device found"\n\tdebug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Exception calling application: No OpenCL device found", grpc_status:2, created_time:"2024-09-22T21:46:33.734053034+00:00"}"\n')

This makes me think it's some internal error and that I need to register some RPC to get the actual error. Please help.

trail dust Sep 23, 2024, 3:03 AM

#

digital turtle Hi, I am able to run a solution offline but when i try to run it inside Kaggle, ...

This is probably a LightGBM error... I've seen it before in past competitions.

digital turtle Sep 23, 2024, 7:11 PM

#

trail dust This is probably a LightGBM error…have seen it before in past competitions

Thanks, is there any way to see the exact error?

digital turtle Sep 23, 2024, 7:52 PM

#

trail dust This is probably a LightGBM error…have seen it before in past competitions

OK, so Claude to the rescue: it's an error related to GPU vs. CPU. I was able to fix it by enabling the GPU.
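If you'd rather be able to run the same notebook with or without a GPU, one option is to request the GPU only when one appears to be present. LightGBM's `device_type` parameter accepts "cpu" and "gpu" (the "gpu" backend uses OpenCL, which matches the "No OpenCL device found" message above). Detecting a GPU by looking for `nvidia-smi` on the PATH is an assumed heuristic, not something LightGBM provides, and it doesn't guarantee an OpenCL device is usable.

```python
import shutil


def lightgbm_params(base: dict, prefer_gpu: bool = True) -> dict:
    """Copy `base` and set LightGBM's device_type to "gpu" only if a GPU seems to be available."""
    params = dict(base)  # leave the caller's dict untouched
    # Heuristic (assumption): an NVIDIA driver tool on the PATH suggests a GPU is attached.
    has_gpu = shutil.which("nvidia-smi") is not None
    params["device_type"] = "gpu" if (prefer_gpu and has_gpu) else "cpu"
    return params


base_params = {"objective": "regression", "learning_rate": 0.05}
print(lightgbm_params(base_params))
```

The resulting dict can then be passed to `lightgbm.train` or to a scikit-learn style `LGBMRegressor(**params)`.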

trail dust Sep 29, 2024, 1:37 PM

#

This comp is so funny: compared to my baseline, I tried different approaches, and whether they worsen or improve CV (even substantially), the LB is always X - 0.001, X, or X + 0.001 🤣

I sense some CV overfitting, or my model is too stable 😂 (which I doubt). Can CV even be trusted, lol? The variation I see in CV is not present on the LB.

warped perch Sep 30, 2024, 10:35 PM

#

This competition is funny

warped perch Sep 30, 2024, 10:37 PM

#

trail dust This comp is so funny, compared to my baseline, I tried different approaches, wh...

I tried many things that I was almost sure would improve performance, and they didn't; some even made it worse. Meanwhile, random choices that I didn't imagine could help surprisingly did 😅

muted rampart Oct 3, 2024, 1:12 AM

#

I still can't submit my solution. Can anyone help me?

digital turtle Oct 3, 2024, 10:53 PM

#

muted rampart Still i can't submit my solution can anyone help me?

What is the issue?

digital turtle Oct 3, 2024, 10:53 PM

#

warped perch I tried many things that I was almost sure it would enhance performance and it d...

Yeah, same here.

muted rampart Oct 4, 2024, 10:11 AM

#

digital turtle what is the issue ?

Inference error

muted rampart Oct 5, 2024, 9:15 AM

#

muted rampart Inference error

Anyone?

brisk geyser Oct 7, 2024, 12:10 AM

#

Two methods I've seen for debugging MCTS server errors:

1. Run your predict function from a cell below the MCTS server implementation. Get that cell working without errors. Once the standalone cell calls the predict function successfully, try the MCTS server. This method yields verbose errors from the predict function. (Bonus: the documentation says anything below the MCTS server will be ignored during scoring runs.)
2. Instead of running predictions on the test data, try running the predict function on the training data: point the MCTS local implementation at a training CSV without the target column. This gives you an idea of how long the server takes to complete such a task and surfaces other errors that the 3-row test set cannot.

Hopefully one of those helps...
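The second method can be sketched like this. Everything here is hypothetical scaffolding: `predict` stands in for your own function (the real server calls it with its own arguments and batching, so adapt the call), and the list of target columns to strip is an assumption about train.csv. The idea is simply to call your code on target-free training rows, time it, and sanity-check the output shape before involving the server.

```python
import time

import pandas as pd

# Assumed target columns in train.csv; adjust to match your copy of the data.
TARGET_COLS = ["num_wins_agent1", "num_draws_agent1", "num_losses_agent1", "utility_agent1"]


def predict(test: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for your own predict function."""
    return pd.DataFrame({"Id": test["Id"], "utility_agent1": 0.0})


def dry_run(train: pd.DataFrame, n_rows: int = 1000):
    """Call predict directly on training rows with targets removed, and time the call."""
    sample = train.drop(columns=[c for c in TARGET_COLS if c in train.columns]).head(n_rows)
    start = time.perf_counter()
    preds = predict(sample)
    elapsed = time.perf_counter() - start
    # Cheap checks that catch many "inference error" causes before submitting.
    assert len(preds) == len(sample), "predict must return one row per input row"
    assert preds["utility_agent1"].between(-1, 1).all(), "utility is expected to lie in [-1, 1]"
    return preds, elapsed


toy_train = pd.DataFrame({
    "Id": [0, 1, 2],
    "feature": [0.1, 0.2, 0.3],
    "num_wins_agent1": [3, 1, 2],
    "utility_agent1": [0.5, -0.5, 0.0],
})
preds, seconds = dry_run(toy_train)
print(preds, f"{seconds:.4f}s")
```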

trail dust Oct 13, 2024, 1:07 PM

#

I have a (set of) features that gives a 0.006 improvement on CV, but the LB is slightly worse.

Not sure what's actually happening, lol. I scanned my code more than 5 times and still found no bug or any point of leakage. Maybe I should trust CV 🤣

#

But it's pretty alarming that the public NN gives 0.432 CV (0.440 without the leak) and 0.435 LB (the gap isn't that big). I didn't test the version without the leak on the LB, but I expect it to be the same.

I have 0.404 CV and 0.428 LB. Meanwhile, I also have an earlier version with 0.419 CV and 0.430 LB. It feels strange that all that added stuff doesn't contribute much... can CV even be trusted, lol
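Taking the scores quoted above at face value (the competition metric is RMSE, so lower is better), the arithmetic shows why this is alarming: the CV-to-LB gap grows from about 0.003 for the public NN to 0.024 for the current version, and a 0.015 CV gain between versions turns into only 0.002 on the LB.

```python
# (CV, LB) pairs quoted in the post above.
results = {
    "public NN": (0.432, 0.435),
    "current version": (0.404, 0.428),
    "earlier version": (0.419, 0.430),
}

# Positive gap = LB worse than CV.
gaps = {name: round(lb - cv, 3) for name, (cv, lb) in results.items()}

cv_gain = round(results["earlier version"][0] - results["current version"][0], 3)
lb_gain = round(results["earlier version"][1] - results["current version"][1], 3)

for name, gap in gaps.items():
    print(f"{name}: LB - CV = {gap:+.3f}")
print(f"earlier -> current: CV gain {cv_gain:.3f}, LB gain {lb_gain:.3f}")
```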

#

*Different ways of splitting the dataset for CV also yielded significant improvements (well above 0.002).

trail dust Oct 13, 2024, 2:13 PM

#

maybe I haven’t investigated enough haha

twin hornet Oct 13, 2024, 2:57 PM

#

@trail dust hi 👋

trail dust Oct 22, 2024, 3:03 AM

#

trail dust I have a (set of) features which gives 0.006 improvement on CV, but LB is slight...

No improvement in my custom split lol

trail dust Oct 22, 2024, 3:25 AM

#

Hopefully this split correlates now, lol, because it's actually much harder to get CV improvements with it than with the standard GroupKFold that public notebooks use 😅

ruby jetty Oct 23, 2024, 9:56 AM

#

It's too hard to predict; I think the dataset is too small to predict anything T.T

river leaf Oct 29, 2024, 5:43 AM

#

Is there any paper or other written document on the dataset?

north wren Oct 29, 2024, 4:06 PM

#

river leaf Is there any paper or some kind of written document on the dataset?

Not on the final version of the dataset used in the competition. But https://arxiv.org/abs/2406.09242 describes an early version of the dataset. Based on some of the lessons we learned, as described there, we generated an improved version of the dataset (also simply a larger one) and completely new data for the test set, and that's what's used in the competition.

river leaf Oct 29, 2024, 4:23 PM

#

north wren not on the final version of the dataset used in the competition. But https://arx...

wow... thanks a bunch @north wren

shell flicker Nov 26, 2024, 8:18 AM

#

Hi everybody! Is an MLP suitable for this competition? I tried to implement one, but my model isn't learning at all and I'm running out of ideas to fix it...

shell flicker Nov 26, 2024, 11:23 AM

#

shell flicker Hi everybody! Does a MLP suit for this competition? I tried to implement one but...

Never mind, I've fixed my issue; I just used the wrong activation in my hidden layers.
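The post doesn't say which activation was wrong, so the following is only a sketch of one common, working setup: ReLU in the hidden layer and a tanh output, chosen on the assumption that the target (`utility_agent1`) lies in [-1, 1]. It is a tiny NumPy MLP trained with full-batch gradient descent on synthetic toy data, meant to show the forward/backward pass and a loss that actually decreases, not a competitive model.

```python
import numpy as np


def train_mlp(X, y, hidden=32, lr=0.05, epochs=500, seed=0):
    """One-hidden-layer MLP (ReLU hidden, tanh output) trained on MSE; returns per-epoch losses."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))  # He init suits ReLU
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: ReLU hidden layer, tanh output keeps predictions in [-1, 1].
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0.0)
        out = np.tanh(h @ W2 + b2)[:, 0]
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for mean squared error.
        d_out = (2.0 / n) * err * (1.0 - out ** 2)       # through tanh, shape (n,)
        dW2 = h.T @ d_out[:, None]
        db2 = d_out.sum(keepdims=True)
        d_h = (d_out[:, None] @ W2.T) * (z1 > 0)          # through ReLU
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient descent step.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return losses


# Synthetic toy regression target in [-1, 1] (not competition data).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, (256, 4))
y = np.tanh(X[:, 0] - 0.5 * X[:, 1] * X[:, 2])
losses = train_mlp(X, y)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

A quick way to spot the "not learning at all" symptom is to print the loss curve like this: if it stays flat from the first epoch, check the activations (for example, a saturating or output-style activation used in hidden layers) before tuning anything else.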