#playground-series-s4e7
This definitely increases the complexity of the challenge and seems like a fun one to participate in
Yup, this is the largest dataset ever in the Playground Series competitions
I've tried it on roughly 1M rows and got 87% accuracy so far
I'm trying to reach 0.87-0.88 with only simple approaches, without GBDT or AutoML; let's see if it's possible. If I achieve it, I will make my kernel public. Going for an efficient approach
Go for it!
I'm trying the same thing lol
Oof, I think I hit a wall with the logistic regression approach. 0.88938 with 3 min CPU runtime though 🙂
Hello, everyone. This is my first time trying Kaggle, and I'm having a lot of fun here. I tried TabNet since it is a well-known DL architecture for tabular data, but got a very poor AUC score of 0.78. It took a long time to train, but not much learning happened. Is there a mistake somewhere (hyperparameter settings, maybe?), do I need more computing resources, or does TabNet simply not fit this data well? Also, I have fitted the pretrainer, but can't find a way to plug it into the TabNetClassifier() class.
   Model             Accuracy  ROC AUC
0  CatBoost          0.877547  0.866438
1  LightGBM          0.877851  0.865248
2  XGBoost           0.877555  0.864283
3  keras-classifier  0.876747  0.854385
4  TabNET            0.877401  0.778539
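A note on reading the table above: accuracy is almost identical across all five models while ROC AUC varies a lot. One possible reason (an assumption, not something confirmed in this thread) is class imbalance: if nearly every predicted probability stays below 0.5, accuracy just reflects the share of the majority class, whereas AUC only looks at how well positives are ranked above negatives. Here is a minimal pure-Python sketch of that effect. The labels and scores are synthetic, and the 12% positive rate is a made-up value for illustration.

```python
import random

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of rows where the thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, scores))
    return hits / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)

# Hypothetical imbalanced labels: about 12% positives.
y_true = [1 if random.random() < 0.12 else 0 for _ in range(2000)]

# "Ranked" scores: positives tend to score higher, but every score is below 0.5.
scores_ranked = [0.1 + 0.1 * t + random.uniform(0.0, 0.2) for t in y_true]

# "Flat" scores: no relation to the label, also all below 0.5.
scores_flat = [random.uniform(0.0, 0.4) for _ in y_true]

acc_ranked = accuracy(y_true, scores_ranked)
acc_flat = accuracy(y_true, scores_flat)
auc_ranked = roc_auc(y_true, scores_ranked)
auc_flat = roc_auc(y_true, scores_flat)

print(f"ranked: accuracy={acc_ranked:.4f}  auc={auc_ranked:.4f}")
print(f"flat:   accuracy={acc_flat:.4f}  auc={auc_flat:.4f}")
```

Both score sets predict the negative class for every row at a 0.5 threshold, so their accuracies are equal, while their AUCs differ. If something similar is happening here, monitoring AUC during training is probably more informative than monitoring accuracy.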
The hyperparameters I used are as below.
import torch
from pytorch_tabnet.tab_model import TabNetClassifier

clf = TabNetClassifier(
    n_d=64,            # Width of the decision prediction layer
    n_a=64,            # Width of the attention embedding
    n_steps=5,         # Number of sequential decision steps
    gamma=1.5,         # Relaxation factor for feature reuse across steps
    n_independent=2,   # Number of independent Gated Linear Unit layers per step
    n_shared=2,        # Number of shared Gated Linear Unit layers per step
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=1e-3),
    mask_type='entmax',  # Use "sparsemax" for comparison
    scheduler_params={"step_size": 50, "gamma": 0.9},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    verbose=1,
)
clf.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    eval_name=['test'],
    eval_metric=['accuracy'],  # early stopping monitors this metric
    patience=30,
    max_epochs=100,
)