#um-game-playing-strength-of-mcts-variants
https://www.kaggle.com/code/inversion/mcts-variants-getting-started For this competition you can simply edit this notebook and add your training/prediction code to the train/predict functions.
Hi, I'd like some information about the test dataset.
According to https://www.kaggle.com/competitions/um-game-playing-strength-of-mcts-variants/discussion/532514, the test samples were generated with the same method.
So can I assume that each test sample was also played only 15, 30, or 45 times?
You would have to ask the creators, but it's likely the same job scripts were used with the same parameters, so yes probably.
I can't guarantee with 100% certainty that it's only 15/30/45 times, but you may expect a similar distribution as in training data (where the vast majority of <game, agent1, agent2> 3-tuples had 15 plays, and any with more plays were rare).
Note also that a row having more plays only means that the target label becomes a bit less noisy / more reliable (because we have more plays to average our noisy simulation results over). Apart from that it shouldn't really matter
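To put a rough number on "less noisy", here is a minimal sketch. It assumes the plays are independent and that each play's outcome lies in [-1, 1], so its standard deviation is at most 1. Both are assumptions for illustration, not statements about how the dataset was generated. Under them, the standard error of the averaged label shrinks as 1/sqrt(n):

```python
import math

def worst_case_standard_error(n_plays: int, max_std: float = 1.0) -> float:
    """Upper bound on the standard error of a mean over n_plays independent plays.

    max_std is an assumed bound on the per-play standard deviation
    (1.0 if every outcome lies in [-1, 1]).
    """
    return max_std / math.sqrt(n_plays)

# Compare the play counts mentioned above.
for n in (15, 30, 45):
    print(n, round(worst_case_standard_error(n), 3))
```

Going from 15 to 45 plays only tightens the bound by a factor of sqrt(3), which fits the point that the extra plays shouldn't matter much.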
No module named 'kaggle_evaluation'
This is my first Kaggle competition. How can I import files into the notebook to be able to use such import statements?
So I have a question. For the LightGBM models, are most people using more or less the same template and essentially only changing the seeds and the number of folds? I'm fairly new to this kind of data science, but there seems to be a pretty large variation on the leaderboard. I understand the split between ensembles and the simple one-model approach, and that a single model can probably go through more iterations within the allotted time than an ensemble. What I'm unsure about is how a simple model can be optimized further if people are (if I understand correctly) mostly using Optuna to find their hyperparameters.
Essentially, I'm asking whether everyone is using similar or identical hyperparameters found via Optuna, whether people higher up on the LB have other ways of finding hyperparameters, or whether it mainly comes down to the fold distribution and the seeds used.
Finding the best hyperparameters for a single model is certainly an important step, but it's far from the only way to increase score. This competition is a pretty good show of one of the fundamentals of Data Science: overfitting.
Ensemble methods are pretty good because they combine the different kinds of learning of multiple algorithms, which can strengthen each other. The issue with them isn't the time they take (tho finding the best hyperparameters can be annoying), but rather that they can overfit more easily. Also, problems are often simple enough that simpler methods work just as well with less risk of overfitting. Not in this case tho.
To counteract overfitting you can use k-fold cross-validation with your own validation sets. However, another important realization is that the test set isn't the same as a random sample split off from the train set. If you split your training set randomly to create a validation set, the validation set will contain some games that the training set also has, so the model gets rewarded for learning who won those exact games. The test set, however, has completely new games never seen before, so you want a training set and validation set where no game in the validation set is present in the training set. (Or use GroupKFold if you're cross-validating.)
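A minimal sketch of that kind of split with scikit-learn's `GroupKFold`, grouping rows by game. The game names, features and target here are toy placeholders, not the competition's actual columns:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 12 rows from 4 hypothetical games, 3 rows (agent pairings) per game.
games = np.repeat(["game_a", "game_b", "game_c", "game_d"], 3)
X = np.arange(len(games)).reshape(-1, 1)  # placeholder features
y = np.zeros(len(games))                  # placeholder target

cv = GroupKFold(n_splits=4)
folds = list(cv.split(X, y, groups=games))

for fold, (train_idx, valid_idx) in enumerate(folds):
    train_games = set(games[train_idx])
    valid_games = set(games[valid_idx])
    # Every game is entirely on one side of the split.
    assert train_games.isdisjoint(valid_games)
    print(fold, sorted(valid_games))
```

Because whole games are held out, the validation score measures how well the model does on games it has never seen, which is closer to what the test set asks for.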
So everyone is pretty certain ensemble methods with GroupKFold are the way to go. How to proceed is less clear. Despite the many, many games in the training set, the ensemble method can still somehow find ways to overfit a little using all the columns. So people have started looking at feature selection to stop that and also steer the models a little. The question is: what features are important for all games, including those unseen?
There's also feature engineering: supplying the models with info they can't get themselves. LudRules may be useful, but a lot of that info is already in the columns. Some people have also looked at generating more data.
So the best scores are in large part indeed because they took the time and energy to find the best hyperparameters/seeds/distribution. But they might also have good insights on feature selection, feature engineering or generating data which they're not sharing which puts them over the edge. And if they don't already, they will in the future.
wow
thank you very much for the in-depth answer!
I have heard though that generating more data isn't necessarily going to help with the training
maybe it's useful for closing the gap between CV and LB though?
Yw ❤️
I haven't looked into it, but generating data would only be useful if you can generate data of games different from the training set.
They might have already put every game possible (that's not in the test set, which we can't generate) inside the training set.
If you can generate data of new games, then yes, this would close the gap between CV and LB. (It might theoretically also make the gap larger if the LB test set is somehow skewed, but it'd still be better even if it doesn't seem like it.)
Can anyone help me? Whenever I submit, it shows an inference error and the submission gets rejected.
Hi,
I am able to run a solution offline, but when I try to run it inside Kaggle I get the following error:
GatewayRuntimeError: (, '<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "Exception calling application: No OpenCL device found"\n\tdebug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Exception calling application: No OpenCL device found", grpc_status:2, created_time:"2024-09-22T21:46:33.734053034+00:00"}"\n')
This leads me to think this is some internal error and that I need to register some RPC to get the actual error. Please help.
This is probably a LightGBM error…have seen it before in past competitions
Thanks, is there any way to see the exact error?
OK, so Claude to the rescue: it's an error related to GPU vs CPU. I was able to fix it by enabling the GPU.
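If the cause is what the message suggests (LightGBM being asked to use a GPU in a session that has none), the other way around it is to pin the device to CPU instead of enabling the GPU. A minimal sketch, assuming the model is configured through a params dict; `with_device` and the example params are hypothetical, while `device_type` (alias `device`) is the LightGBM parameter that selects `cpu`, `gpu` or `cuda`:

```python
def with_device(params: dict, device_type: str) -> dict:
    """Return a copy of LightGBM params with the device set explicitly."""
    if device_type not in {"cpu", "gpu", "cuda"}:
        raise ValueError(f"unsupported device_type: {device_type!r}")
    # "device" is an alias of "device_type"; drop it so the two cannot conflict.
    out = {k: v for k, v in params.items() if k != "device"}
    out["device_type"] = device_type
    return out

# Hypothetical params copied from a notebook that was tuned with a GPU.
gpu_params = {"objective": "regression", "learning_rate": 0.05, "device": "gpu"}
cpu_params = with_device(gpu_params, "cpu")
print(cpu_params)
```

Whichever device you pick, the accelerator setting of the submission notebook has to match it.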
This comp is so funny, compared to my baseline, I tried different approaches, whether they worsen or improve CV (even substantially), the LB is always X - 0.001, X or X + 0.001 🤣
I sense some cv overfitting or my model is too stable 😂 (which I doubt) can CV even be trusted lol. The variation I see in the CV is not present in LB
This competition is funny
I tried many things that I was almost sure would improve performance, and they didn't; they even made it worse compared to random choices that I never imagined could help but surprisingly did 😅
I still can't submit my solution, can anyone help me?
what is the issue ?
yeah the same
Inference error
Anyone?
Two methods I've seen for debugging MCTS server errors:
- Run your predict function from a cell below the MCTS server implementation. Get the cell working without errors. Once the standalone cell successfully calls the predict function, try the MCTS server. This method yields verbose errors from the predict function. (Bonus: the documentation says anything below the MCTS server will be ignored during scoring runs.)
- Instead of running predictions on the test data, try running the predict function on the training data. Change the MCTS local implementation by pointing it at a training csv without the target column. This gives you a sense of how long the server takes to complete such a task and raises other errors that the 3-observation test set cannot.
Hopefully one of those helps...
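A rough sketch combining both ideas: call predict directly on training rows with the target removed, and print the full traceback if it fails. The helper name, the one-argument predict signature and the toy column names are all assumptions for illustration; adapt them to whatever your notebook's predict function actually takes:

```python
import traceback

import pandas as pd

def smoke_test_predict(predict_fn, train_df, target_col, n_rows=100):
    """Call predict_fn directly on training rows with the target column dropped.

    Returns the predictions, or None after printing the full traceback.
    """
    features = train_df.drop(columns=[target_col]).head(n_rows)
    try:
        preds = predict_fn(features)
    except Exception:
        # Outside the server wrapper you get the real stack trace.
        traceback.print_exc()
        return None
    if len(preds) != len(features):
        raise ValueError(f"expected {len(features)} predictions, got {len(preds)}")
    return preds

# Hypothetical stand-ins for the real training data and predict function.
toy_train = pd.DataFrame({"feature_1": [0.1, 0.2, 0.3], "target": [1.0, -1.0, 0.0]})

def toy_predict(features):
    return [0.0] * len(features)

print(smoke_test_predict(toy_predict, toy_train, target_col="target"))
```

Raising `n_rows` toward the full training set also gives a feel for how long inference takes on more than a handful of rows.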
I have a (set of) features which gives 0.006 improvement on CV, but LB is slightly worse
not sure what's actually happening lol, I scanned my code more than 5 times and still found no bug or point of leakage, maybe I should trust CV 🤣
but it's pretty alarming that the public NN gives 0.432 CV (0.440 without leak) and 0.435 LB (the gap isn't that big) - didn't test the version without leak on LB but I expect it to be the same
I have 0.404 CV and 0.428 LB. Meanwhile I also have another version with 0.419 CV and 0.430 LB from earlier. Feels strange that all that added stuff doesn't contribute that much...can CV even be trusted lol
*different ways of splitting the dataset for CV also yielded a significant improvement >>> 0.002
maybe I haven’t investigated enough haha
@trail dust hi 👋
No improvement in my custom split lol
hopefully this split correlates now lol, cos it’s actually much harder to get CV improvements compared to the standard GroupKFold that public notebooks are using 😅
It's too hard to predict; I think the dataset is too small to predict anything T.T
Is there any paper or some kind of written document on the dataset?
not on the final version of the dataset used in the competition. But https://arxiv.org/abs/2406.09242 describes an early version of the dataset. Based on some of the lessons we learned as described in there, we generated an improved version of the dataset (also just a plain larger one), and completely new data for the test set, and that's what's used in the competition.
wow... thanks a bunch @north wren
Hi everybody! Is an MLP suitable for this competition? I tried to implement one but my model isn't learning at all and I'm running out of ideas to fix this...
Never mind, I've fixed my issue: I had just used the wrong activation in my hidden layers