#tabular | Kaggle | Page 1

dreamy pumice Sep 1, 2023, 9:30 PM

#

outer gazelle Sep 1, 2023, 10:29 PM

#

https://tenor.com/view/cry-gif-25866484

Tenor

next sky Sep 1, 2023, 10:32 PM

#

yesss it worked

outer gazelle Sep 1, 2023, 10:37 PM

#

The saddest channel on the saddest app I’ve ever used. 😢😢😢😭😭😭

next sky Sep 1, 2023, 10:41 PM

#

OMG 😰

hushed sparrow Sep 2, 2023, 12:33 AM

#

outer gazelle The saddest channel on the saddest app I’ve ever used. 😢😢😢😭😭😭

Go for a trifecta? The saddest user on the saddest app surfing the saddest channel?

blazing pebble Sep 2, 2023, 11:35 AM

#

I am working with tabular data from an old prediction competition. Will I need to submit my notebook and will the results be ranked?

sick basin Sep 2, 2023, 1:18 PM

#

blazing pebble I am working with tabular data from an old prediction competition. Will I need t...

Hi @blazing pebble, you will not be ranked on the leaderboard, but you will get your public and private scores.

blazing pebble Sep 2, 2023, 1:20 PM

#

sick basin Hi @blazing pebble you will not be ranked on the leaderboard, but you wil...

What's the difference between public and private?

sick basin Sep 2, 2023, 1:29 PM

#

blazing pebble What's the difference between public and private

The public LB is computed on a portion of the test set; the private LB is computed on the remainder of the test set. One reason is to measure how well your model generalizes (overfitted or not).
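
A minimal sketch of the public/private split described above, assuming accuracy as the metric and a 30% public share. The function names, `public_fraction`, `seed`, and the toy labels are all hypothetical; the real split and metric are chosen per competition.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def split_leaderboard_scores(y_true, y_pred, public_fraction=0.3, seed=0):
    """Score one submission on a public subset and on the private remainder.

    public_fraction and seed are illustrative assumptions, not Kaggle values.
    """
    indices = list(range(len(y_true)))
    random.Random(seed).shuffle(indices)
    n_public = int(len(indices) * public_fraction)
    public_idx, private_idx = indices[:n_public], indices[n_public:]
    public = accuracy([y_true[i] for i in public_idx],
                      [y_pred[i] for i in public_idx])
    private = accuracy([y_true[i] for i in private_idx],
                       [y_pred[i] for i in private_idx])
    return public, private


# Toy labels and predictions, made up for illustration.
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
public_score, private_score = split_leaderboard_scores(y_true, y_pred)
print(f"public={public_score:.2f} private={private_score:.2f}")
```

A large gap between the two scores is one hint that a model was tuned too closely to the public portion.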

blazing pebble Sep 2, 2023, 1:32 PM

#

sick basin The public LB is computed on a portion of the test set, the private is computed ...

Okay I understand. Thanks for this.

sick basin Sep 4, 2023, 3:56 PM

#

sick basin The public LB is computed on a portion of the test set, the private is computed ...

How to Use Kaggle:
https://www.kaggle.com/docs/competitions

Competitions Documentation

Find challenges for every interest level

#

solemn plover Sep 4, 2023, 6:06 PM

#

if anyone wants to experiment https://arxiv.org/abs/2209.15421 shows some promise (tabular data augmentation)-- it can help with really unbalanced datasets (or at least it did in my very unscientific test of 2 datasets)

arXiv.org

TabDDPM: Modelling Tabular Data with Diffusion Models

Denoising diffusion probabilistic models are currently becoming the leading
paradigm of generative modeling for many important data modalities. Being the
most prevalent in the computer vision community, diffusion models have also
recently gained some attention in other domains, including speech, NLP, and
graph-like data. In this work, we investi...

blazing pebble Sep 6, 2023, 2:22 PM

#

I have another question. The dataset is normalized. Do you have any references for dealing with outliers on normalized data?

wary ridge Sep 6, 2023, 7:44 PM

#

blazing pebble I have another question. The dataset is normalized. Do you have any references ...

I think that should depend on the kind of problem you're solving,
though a popular method is to set a cap/threshold on your data. Do that preferably before normalizing your data.
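
A rough sketch of the cap-then-normalize order suggested above, assuming percentile caps at the 5th and 95th percentiles and z-score standardization. Both choices, and the name `cap_then_standardize`, are illustrative assumptions rather than a recommendation for any particular dataset.

```python
import statistics


def cap_then_standardize(values, lower_pct=5, upper_pct=95):
    """Clip values to percentile caps, then standardize the clipped values."""
    # quantiles(n=100) returns 99 cut points; index i - 1 is the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    capped = [min(max(v, low), high) for v in values]
    mean = statistics.fmean(capped)
    std = statistics.pstdev(capped)
    if std == 0:
        # All values identical after capping: nothing to scale.
        return [0.0 for _ in capped]
    return [(v - mean) / std for v in capped]


# Toy column with one extreme value, made up for illustration.
raw = list(range(1, 100)) + [10_000]
scaled = cap_then_standardize(raw)
print(max(scaled), min(scaled))
```

Capping first keeps the extreme value from inflating the mean and standard deviation that the normalization step uses.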

dark kiln Sep 29, 2023, 3:08 PM

#

Hello, does anyone here love LightGBM?

sick basin Sep 29, 2023, 8:44 PM

#

dark kiln Hello, does anyone here love LightGBM?

Hello, why not XGBoost?!

dark kiln Sep 29, 2023, 8:47 PM

#

XGBoost is amazing too of course

sick basin Sep 29, 2023, 8:49 PM

#

Just wanted to ask: what's special about LightGBM that makes you prefer it?

dark kiln Sep 29, 2023, 8:54 PM

#

Just to be clear, I am not saying "LightGBM is better than XGBoost". That said, I find LightGBM faster. It uses Exclusive Feature Bundling, which, from what I've read, helps with sparse features.
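
A toy illustration of the idea behind Exclusive Feature Bundling: two sparse columns that are never non-zero on the same row are merged into one column by shifting the second column's values past the first column's range. This is a simplified sketch, not LightGBM's implementation; `bundle_exclusive` is a hypothetical helper, and it assumes non-negative integer (already binned) values.

```python
def bundle_exclusive(col_a, col_b):
    """Merge two columns that are never non-zero on the same row."""
    if any(a != 0 and b != 0 for a, b in zip(col_a, col_b)):
        raise ValueError("columns conflict: both are non-zero on some row")
    # Shift col_b's values past col_a's range so the two stay distinguishable.
    offset = max(col_a)
    return [
        a if a != 0 else (b + offset if b != 0 else 0)
        for a, b in zip(col_a, col_b)
    ]


# Two mutually exclusive sparse columns, made up for illustration.
col_a = [0, 3, 0, 1, 0, 0]
col_b = [2, 0, 0, 0, 4, 0]
bundled = bundle_exclusive(col_a, col_b)
print(bundled)  # [5, 3, 0, 1, 7, 0]
```

Values 1 to 3 in the bundled column come from `col_a` and values above 3 come from `col_b`, so a tree can still split on either original feature while scanning one column instead of two.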

sick basin Sep 29, 2023, 9:00 PM

#

You are right, but in my experience XGBoost works better with sparse features.

#

Also, I should add: based on the “No Free Lunch” theorem, the choice between XGBoost and LightGBM depends on the specific problem and the available data.

dark kiln Sep 30, 2023, 1:06 AM

#

I agree, it depends on the data and the problem. Sometimes, when the data contains many categorical features, CatBoost would be a good alternative.

hushed sparrow Sep 30, 2023, 1:14 AM

#

So the question here was who loves LightGBM and we ended up with a conclusion that everyone loves at least some kind of GBMs?

sick basin Sep 30, 2023, 7:52 AM

#

hushed sparrow So the question here was who loves LightGBM and we ended up with a conclusion th...

Ended up with the conclusion that loving a specific algorithm is not a good idea !

dark kiln Sep 30, 2023, 9:30 AM

#

The conclusion that: Tabular ---> GBM

dreamy pumice Mar 24, 2024, 10:44 AM

#

Still one month to go in my community competition aiming at understanding and improving ML on tabular data: https://www.kaggle.com/competitions/bench-tab-v1/leaderboard

Tabular Data: are gbdt still outperforming DL?

Multi-task benchmark to evaluate the performance of ML models. Competition based on: https://arxiv.org/pdf/2207.08815.pdf

#

it seems to provide a good overview of the state of gbdts for tabular data

terse gale Mar 25, 2024, 11:24 AM

#

hushed sparrow So the question here was who loves LightGBM and we ended up with a conclusion th...

i hate GBM

dreamy pumice Mar 25, 2024, 7:06 PM

#

terse gale i hate GBM

oh no

true phoenix Apr 20, 2024, 7:38 AM

#

So for tabular data gradient boosting is the way to go? 👀

dreamy pumice Apr 21, 2024, 3:11 PM

#

Yes

silent sand Nov 11, 2024, 10:54 AM

#

Hi, I am Abdullah. I am an ML engineer and want to join a team to participate in Kaggle competitions.

true yoke Jan 9, 2025, 1:45 PM

#

https://www.nature.com/articles/s41586-024-08328-6 TabPFN v2: new foundation model for tabular data with super impressive results on tables up to 10Kx500!

Nature

Accurate predictions on small data with a tabular foundation model

Nature - Tabular Prior-data Fitted Network, a tabular foundation model, provides accurate predictions on small data and outperforms all previous methods on datasets with up to 10,000 samples by a...

worldly pagoda May 11, 2025, 12:19 PM

#

Has anyone come up with a method for sensitivity analysis that isn't overly dependent on the model itself?

I was thinking of fitting a separate polynomial-searching model and taking the gradients of the target with respect to each feature, in an attempt to find non-linear trends independent of the production deep learning model. Just observing the production model's gradients doesn't tell me what I want to know. Otherwise, I'm not sure what else is out there.

brisk plinth Sep 5, 2025, 4:10 PM

#

Job Title: Part-Time Senior AI/ML Engineer (Remote)

We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.

Requirements:
-Minimum of 7–10 years of professional software development experience

-Proven experience working effectively in a remote environment

-Advanced English proficiency (C1 or higher); an American accent is preferred

-Availability to work 10–15 hours per week during EST or CST business hours

If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384

jovial kernel Sep 15, 2025, 6:57 PM

#

Hi, @everybody
I have a question. I'm training ML models for a 3-class classification problem. The classes have a similar number of samples, but the predictions are skewed.
The first and second classes are predicted, though with low precision, and the third class is never predicted. What's the reason? I can't find it.
Earlier, when I applied reinforcement learning with the three classes assigned to three actions, one action was never selected either.
The model predicts forex EUR/USD.

jovial kernel Nov 10, 2025, 5:53 PM

#

I'm looking for a US developer to collaborate with. If anybody is interested, please DM me.

regal echo Mar 11, 2026, 3:52 AM

#

Dataset on student learnings

https://www.kaggle.com/datasets/mabubakrsiddiq/students-learning-trajectory