#tabular
yesss it worked
The saddest channel on the saddest app I've ever used.
OMG
Go for a trifecta? The saddest user on the saddest app surfing the saddest channel?
I am working with tabular data from an old prediction competition. Will I need to submit my notebook and will the results be ranked?
Hi @blazing pebble, you will not be ranked on the leaderboard, but you will get your public and private scores.
What's the difference between public and private?
The public LB is computed on a portion of the test set; the private LB is computed on the remainder of the test set. One reason for the split is to measure how well your model generalizes (whether it overfitted or not).
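A minimal sketch of the public/private idea, for anyone who wants to see it in code. The 30% public fraction, the random split, the accuracy metric, and the toy labels are all assumptions made for illustration; the actual split and metric differ per competition.

```python
import random

def split_public_private(n_rows, public_fraction=0.3, seed=0):
    """Randomly assign test-row indices to a public and a private subset."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * public_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])

def accuracy(y_true, y_pred, indices):
    """Fraction of correct predictions on the given rows."""
    return sum(y_true[i] == y_pred[i] for i in indices) / len(indices)

# Toy labels and predictions (made-up data, for illustration only).
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

public_idx, private_idx = split_public_private(len(y_true))
public_score = accuracy(y_true, y_pred, public_idx)
private_score = accuracy(y_true, y_pred, private_idx)
print(f"public={public_score:.2f} private={private_score:.2f}")
```

A large gap between the two scores is a hint that the model was tuned too closely to the public portion.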
Okay I understand. Thanks for this.
How to Use Kaggle:
https://www.kaggle.com/docs/competitions
Find challenges for every interest level
if anyone wants to experiment https://arxiv.org/abs/2209.15421 shows some promise (tabular data augmentation)-- it can help with really unbalanced datasets (or at least it did in my very unscientific test of 2 datasets)
Denoising diffusion probabilistic models are currently becoming the leading
paradigm of generative modeling for many important data modalities. Being the
most prevalent in the computer vision community, diffusion models have also
recently gained some attention in other domains, including speech, NLP, and
graph-like data. In this work, we investi...
I have another question. The dataset is normalized. Do you have any references for dealing with outliers on normalized data?
I think that depends on the kind of problem you're solving,
though a popular method is to set a cap/threshold on your data. Preferably do that before normalizing your data.
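A rough sketch of "cap first, then normalize". The choice of IQR fences (1.5 times the interquartile range) as the cap and z-scoring as the normalization are assumptions for illustration; a fixed percentile or a domain threshold would work the same way. The sample values are made up.

```python
import statistics

def cap_with_iqr_fences(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def standardize(values):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Made-up feature with one extreme value.
raw = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 500.0]

capped = cap_with_iqr_fences(raw)
z_capped = standardize(capped)   # cap first, then normalize
z_raw = standardize(raw)         # normalize without capping, for comparison

# Spread of the nine ordinary points after each approach.
print(z_raw[8] - z_raw[0], z_capped[8] - z_capped[0])
```

Without the cap, the single extreme value inflates the standard deviation and squeezes the ordinary points into a narrow band; capping first keeps their spread usable.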
Hello, does anyone here love LightGBM?
Hello, why not XGBoost?!
XGBoost is amazing too of course
Just wanted to ask: what's special about LightGBM that makes you prefer it?
Just to be clear, I am not saying "LightGBM is better than XGBoost". That said, I find LightGBM faster. It uses Exclusive Feature Bundling; I read about it, and it helps with sparse features.
You are right, but in my experience XGBoost works better with sparse features.
I should also add that, based on the "No Free Lunch" theorem, the choice between XGBoost and LightGBM depends on the specific problem and the available data.
I agree, it depends on the data and the problem. Sometimes, when the data contains many categorical features, CatBoost is a good alternative.
So the question here was who loves LightGBM, and we ended up with the conclusion that everyone loves at least some kind of GBM?
Ended up with the conclusion that loving a specific algorithm is not a good idea!
The conclusion that: Tabular ---> GBM
Still one month to go in my community competition aiming at understanding and improving ML on tabular data: https://www.kaggle.com/competitions/bench-tab-v1/leaderboard
Multi-task benchmark to evaluate the performance of ML models. Competition based on: https://arxiv.org/pdf/2207.08815.pdf
it seems to provide a good overview of the state of gbdts for tabular data
i hate GBM
oh no
So for tabular data gradient boosting is the way to go?
Yes
Hi, I am Abdullah. I am an ML engineer and want to join a team to participate in Kaggle competitions.
https://www.nature.com/articles/s41586-024-08328-6 TabPFN v2: new foundation model for tabular data with super impressive results on tables up to 10Kx500!
Has anyone come up with a method for sensitivity analysis that isn't overly dependent on the model itself?
I was thinking of fitting a separate polynomial-searching model and taking the gradients of the target with respect to each feature, in an attempt to find non-linear trends independent of the production deep learning model. Just observing the production model's gradients doesn't tell me what I want to know. Otherwise, I'm not sure what else is out there.
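One way the polynomial idea could look, as a hedged sketch: fit a quadratic surrogate of the target against a single feature by least squares, then read off its derivative as a local sensitivity. The function names, the degree-2 choice, and the toy data are all hypothetical, and a one-feature-at-a-time fit ignores interactions between features.

```python
def solve_3x3(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a + b*x + c*x**2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                # sums of x^0..x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve_3x3(A, t)

def surrogate_slope(coeffs, x):
    """Derivative of the fitted quadratic at x: d/dx (a + b*x + c*x^2)."""
    _, b, c = coeffs
    return b + 2 * c * x

# Toy data generated from y = 3 + 0.5*x + 2*x**2 (made up for illustration).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 + 0.5 * x + 2 * x ** 2 for x in xs]

coeffs = fit_quadratic(xs, ys)
print(coeffs, surrogate_slope(coeffs, 1.0))
```

Because the surrogate is fit directly to the data (or to any model's predictions), the slopes do not depend on the internals of the production model.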
Job Title: Part-Time Senior AI/ML Engineer (Remote)
We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.
Requirements:
-Minimum of 7-10 years of professional software development experience
-Proven experience working effectively in a remote environment
-Advanced English proficiency (C1 or higher); an American accent is preferred
-Availability to work 10-15 hours per week during EST or CST business hours
If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we'd love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384
Hi, @everybody
I have a question. I'm training ML models for a 3-class classification problem where the number of samples per class is similar, but the predictions are skewed.
The first and second classes are predicted, though with low precision, and the third class is never predicted. What's the reason? I can't find it.
Earlier, when I applied reinforcement learning with the three classes assigned to three actions, one action was never selected either.
This is a prediction model for forex EUR/USD.
I'm looking for a US developer to collaborate with. If anybody is interested, please DM me.
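A small diagnostic sketch for the three-class question above: it does not explain why a class is never predicted, but it makes the symptom measurable by counting actual versus predicted labels per class alongside precision and recall. The helper name and the toy labels are hypothetical.

```python
from collections import Counter

def per_class_report(y_true, y_pred, classes):
    """Per-class actual count, predicted count, precision, and recall."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    report = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = pred_counts[c]
        actual = true_counts[c]
        report[c] = {
            "actual": actual,
            "predicted": predicted,
            # Precision is undefined when the class is never predicted.
            "precision": tp / predicted if predicted else None,
            "recall": tp / actual if actual else None,
        }
    return report

# Made-up balanced labels where class 2 is never predicted.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1, 1]

report = per_class_report(y_true, y_pred, classes=[0, 1, 2])
never_predicted = [c for c, r in report.items() if r["predicted"] == 0]
print(report)
print("never predicted:", never_predicted)
```

Running the same report on the training set and on held-out data shows whether the missing class is absent everywhere or only out of sample.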
Dataset on student learnings
https://www.kaggle.com/datasets/mabubakrsiddiq/students-learning-trajectory