#playground-series-s4e7 | Kaggle | Page 1

teal skiff Jul 1, 2024, 3:52 AM

#

Is this the largest dataset ever used in a playground competition? 🤔

#

11 million rows lol

teal skiff Jul 1, 2024, 4:24 AM

#

This definitely increases the complexity of the challenge and seems like a fun one to participate in

sleek pagoda Jul 1, 2024, 9:16 AM

#

Yup, this is the largest dataset ever in the Playground Series competitions

#

I've tried it roughly on 1M rows of the dataset and got 87% accuracy so far

teal skiff Jul 3, 2024, 1:29 PM

#

I’m trying to make 0.87-0.88 with at most simple approaches, without GBDT or AutoML, let’s see if it’s possible. If I achieve it I will make my kernel public, Efficiency approach holyfuck

sick kiln Jul 3, 2024, 2:46 PM

#

teal skiff I’m trying to make 0.87-0.88 with at most simple approaches, without GBDT or Aut...

Go for it!

dim lodge Jul 8, 2024, 9:36 PM

#

teal skiff I’m trying to make 0.87-0.88 with at most simple approaches, without GBDT or Aut...

I'm trying the same thing lol

teal skiff Jul 12, 2024, 4:26 PM

#

oof I think I hit a wall with the logistic-regression-based approach. 0.88938 with 3 min CPU runtime though 🙂

muted crescent Jul 21, 2024, 1:07 PM

#

Hello, everyone. This is my first time trying Kaggle, and I'm having a lot of fun here. I tried TabNet since it is a well-known DL architecture for tabular data, but got a very poor AUC score of 0.78. Training took a long time, but the model didn't learn much. Is there a mistake somewhere (hyperparameter settings, maybe?), do I need more computing resources, or does TabNet simply not fit this data well? Also, I have fitted the pretrainer, but can't find a way to plug it into the TabNetClassifier() class.

              Model  Accuracy   ROC AUC
0          CatBoost  0.877547  0.866438
1          LightGBM  0.877851  0.865248
2           XGBoost  0.877555  0.864283
3  keras-classifier  0.876747  0.854385
4            TabNET  0.877401  0.778539

The hyperparameters I used are below.

import torch
from pytorch_tabnet.tab_model import TabNetClassifier

clf = TabNetClassifier(
    n_d=64,  # Width of the decision prediction layer
    n_a=64,  # Width of the attention embedding
    n_steps=5,  # Number of decision steps
    gamma=1.5,  # Relaxation coefficient for feature reuse across masks
    n_independent=2,  # Independent Gated Linear Unit layers at each step
    n_shared=2,  # Shared Gated Linear Unit layers at each step
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=1e-3),
    mask_type='entmax',  # Use "sparsemax" for comparison
    scheduler_params={"step_size": 50, "gamma": 0.9},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    verbose=1,
)

clf.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    eval_name=['test'],
    eval_metric=['accuracy'],
    max_epochs=100,
    patience=30,
)
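On the pretrainer question above: in pytorch-tabnet, a fitted `TabNetPretrainer` is usually handed to the classifier through the `from_unsupervised` argument of `fit`, not through the `TabNetClassifier()` constructor. The sketch below is a minimal, untuned illustration of that flow. The helper name `fit_with_pretraining`, the `X_valid`/`y_valid` split, and the learning rates and `pretraining_ratio` are assumptions for illustration, not the poster's actual setup.

```python
def fit_with_pretraining(X_train, y_train, X_valid, y_valid):
    """Pretrain TabNet without labels, then fine-tune a classifier from it.

    Hypothetical helper: inputs are assumed to be NumPy arrays.
    """
    # Imported inside the function so the sketch can be read and loaded
    # even where torch / pytorch-tabnet are not installed.
    import torch
    from pytorch_tabnet.pretraining import TabNetPretrainer
    from pytorch_tabnet.tab_model import TabNetClassifier

    # Step 1: self-supervised pretraining (labels are not used here).
    pretrainer = TabNetPretrainer(
        optimizer_fn=torch.optim.Adam,
        optimizer_params=dict(lr=2e-2),  # assumed value, not tuned
        mask_type="entmax",
    )
    pretrainer.fit(
        X_train=X_train,
        eval_set=[X_valid],
        pretraining_ratio=0.8,  # assumed share of features masked for reconstruction
    )

    # Step 2: supervised training, initialised from the pretrained model.
    clf = TabNetClassifier(
        optimizer_fn=torch.optim.Adam,
        optimizer_params=dict(lr=1e-3),
        mask_type="entmax",
    )
    clf.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        eval_name=["valid"],
        from_unsupervised=pretrainer,  # this is where the pretrainer is plugged in
    )
    return clf
```

Whether pretraining actually helps on this dataset is not something the thread establishes; it would need to be checked against a classifier trained from scratch.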

strange jewel Jul 23, 2024, 4:44 PM

#

I guess you should change your eval metric to auc

#

eval_metric=['auc'],
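One way to see why the metric choice matters: in the table above every model sits near 0.877 accuracy while ROC AUC ranges from 0.78 to 0.87, which is the pattern you would expect if one class dominates. The pure-Python sketch below uses made-up labels (12 positives out of 100, an assumption chosen only to echo that accuracy level, not the real class ratio) to show two scorers with identical accuracy but very different AUC.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, scores))
    return hits / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample count, so demo-sized only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 12 positives, 88 negatives (illustrative, not the competition data).
y_true = [1] * 12 + [0] * 88

# Scorer A gives every row the same low score: it cannot rank anything.
constant_scores = [0.1] * 100

# Scorer B ranks all positives above all negatives, but stays under 0.5,
# so after thresholding it also predicts the majority class everywhere.
ranked_scores = [0.4] * 12 + [0.1] * 88

acc_constant = accuracy(y_true, constant_scores)
acc_ranked = accuracy(y_true, ranked_scores)
auc_constant = roc_auc(y_true, constant_scores)
auc_ranked = roc_auc(y_true, ranked_scores)

print(f"constant: accuracy={acc_constant:.2f}, auc={auc_constant:.2f}")
print(f"ranked:   accuracy={acc_ranked:.2f}, auc={auc_ranked:.2f}")
```

Both scorers get the same accuracy, so early stopping on accuracy cannot tell them apart, while AUC separates them clearly. That is consistent with the suggestion to monitor `auc` during training.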