#playground-series-s4e10
Hello, Mr. Sanwal
Hello, guys!
hi
is this the one for the loan payment?
I am getting the error "Evaluation metric raised an unexpected error" when submitting my CSV file.
Can anyone please help me?
Hi, I am a beginner data analyst. Can anyone explain why the sample output given in the dataset has 'loan_status' values of 0.5, when it should be either 1 or 0? Please help me understand what I am missing.
Have you found a solution for this? I have had it before, and I solved it by removing the index column from my submission file.
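A minimal sketch of that fix, assuming the submission is built with pandas. The `id` column name and the values are made up for illustration; only `loan_status` is mentioned in this thread. The CSV is written to an in-memory buffer here so the result can be inspected.

```python
import io

import pandas as pd

# Hypothetical predictions: the ids and probabilities are invented for this example.
submission = pd.DataFrame({
    "id": [58645, 58646, 58647],
    "loan_status": [0.91, 0.08, 0.47],
})

# index=False stops pandas from writing the row index as an extra unnamed column,
# which is one way a submission file can end up with an unexpected shape.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, the same call would be `submission.to_csv("submission.csv", index=False)`.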
Ohh, I'll try that, wait.
It's just a sample; the actual labels are only 1 or 0.
It worked, thanks a lot !!!
perfect! Glad it helps!
If you want to improve your score with the same model, try submitting the probabilities (predict_proba) instead of the classes (predict). The metric here is ROC-AUC, which depends on how the predictions are ranked, so the finer granularity of probabilities often scores higher by several percentage points.
Oh, Interesting! Worth a try, Thanks !!
Hi! It's because we can predict the probability of the outcome rather than restricting the predictions to 0 or 1, depending on your preference.
You can obtain predictions as probabilities by using the {model}.predict_proba function.
In conclusion, feel free to try both approaches and see which gives the better score for the same model. I have mostly used probabilities.
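To make the difference concrete, here is a small sketch with made-up labels and scores. The `roc_auc` helper is written for this example (it is not a library function) and computes ROC-AUC as the fraction of positive/negative pairs ranked correctly, with ties counted as half. The 0.5 threshold for the hard classes is also an assumption.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Invented example data: true labels and the model's predicted probabilities.
y_true = [0, 0, 0, 1, 1, 1]
proba = [0.10, 0.30, 0.60, 0.40, 0.70, 0.90]

# Hard classes, as predict would return them with a 0.5 threshold.
hard = [1 if p >= 0.5 else 0 for p in proba]

auc_proba = roc_auc(y_true, proba)
auc_hard = roc_auc(y_true, hard)
print(f"probabilities: {auc_proba:.3f}, hard classes: {auc_hard:.3f}")
```

On this toy data the probabilities rank 8 of 9 pairs correctly, while the thresholded classes create ties and score 6 of 9. The size of the gap on real data will differ.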
hello, I joined the competition Loan Approval Prediction. Is there somebody who can help me to understand how to get the original data? Spanish speakers are welcome.
The link to the original data is given in the competition.
This is the link:
https://www.kaggle.com/datasets/chilledwanker/loan-approval-prediction
Thank you @vague yoke
Hello all, I'm new to ML and trying to understand how to get a decent score in this competition. I have a basic model with 2 hidden layers (16 and 10 features). I use ReLU for the hidden layers and sigmoid for the output layer. (There is also a batch norm before the activation function.) After a fairly short while training, my training loss stopped decreasing and started oscillating. What are some ways to get around this? I've tried increasing the number of hidden layers and the number of features in each hidden layer, but I'm not making progress; it still fluctuates around the same loss. I'm using the Adam optimizer, so I'm not sure if I really need to tune the learning rate more. How can I tell whether I am overfitting the training data or have hit some local minimum? For reference, submissions from this model score around 83-87%. Probably a dumb question, but if I'm already overfitting the training set, does that mean my only other option is to add some regularization to my loss function? I'm using the cross_entropy loss function from PyTorch.
Hi! You can try using callbacks such as ReduceLROnPlateau, which lowers the learning rate if the loss doesn't go down any further for X epochs. Additionally, use the EarlyStopping callback with the restore_best_weights parameter set to True, which will restore the model parameters from the best epoch, so you don't need to worry about the fluctuation issue.
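The callbacks named above are Keras callbacks. PyTorch ships a scheduler with the same idea (`torch.optim.lr_scheduler.ReduceLROnPlateau`) but no built-in early stopping, so that part is usually written by hand. Below is a framework-free sketch of the early-stopping logic. The `EarlyStopper` class, the loss values, and the dictionary standing in for `model.state_dict()` are all hypothetical.

```python
import copy


class EarlyStopper:
    """Tracks the best validation loss and keeps a copy of the matching model state."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)  # snapshot of the best weights
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses that improve and then oscillate.
val_losses = [0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.44]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    state = {"epoch": epoch}  # stand-in for model.state_dict() in PyTorch
    if stopper.step(loss, state):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss, stopper.best_state)
```

In a real PyTorch loop, the saved state would be reloaded with `model.load_state_dict(stopper.best_state)` after the loop ends. Monitoring a held-out validation loss rather than the training loss also helps answer the overfitting question: training loss still falling while validation loss rises is the usual sign.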
Lastly, take a look at Kaggle's free "Intro to Deep Learning" course, which will give you a better understanding of neural networks.
Our objective in this competition is to maximize ROC-AUC score.
You're welcome
Hi, I'm new to this.
I have some questions about the dataset:
- Is person_income the annual income?
- Is person_emp_length the person's employment length in years?
- Are the columns loan_intent, loan_amnt, and loan_int_rate for the loan they are applying for, the one whose approval we are trying to predict?
- Is there anything about the loan tenure?
- How is loan_grade determined?
The Playground dataset is synthetic. Explore relationships, but don't go too far down the rabbit hole of the data-generating process.
Is the answer we are supposed to give a binary 0 or 1, or can it be a value between 0 and 1?
Hi, is anyone using polynomial features in tree-based models? Does the score improve? I am asking because I see lots of people using polynomial features, but I don't really think they would help, since tree-based models build interactions naturally.
Float, not int: the probability that a person gets the loan approved (label 1). That way it can be used to compute ROC-AUC.
CV might improve a little. Find a way to iterate quickly and prove it for yourself.
I never saw scores improve from using polynomial features. person_income looks closer to a normal distribution after taking the log, but that doesn't increase scores much in the end.
Anybody else try upsampling with SMOTE/ADASYN to fix the class imbalance? The difference between the two methods wasn't statistically significant for me (chi-squared test), but I'd be interested to see how other people approached this. My public score was ~95%
A waste of time here.
Hi guys, I'm currently trying to get my models to 96 on the last day if possible. I achieved a 93 with a random forest model, and I've been trying to build a blend of random forest, XGBoost, and CatBoost with a logistic regression as the meta-model, but the blend keeps performing poorly. I'm not sure what I'm doing wrong, but I would deeply appreciate any tips!
Apologies for it being very disorganized, I'm trying to get better about keeping things tidy
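For reference, here is a minimal stacking sketch using scikit-learn's `StackingClassifier`. It is not the poster's code: the data is synthetic, and `GradientBoostingClassifier` stands in for XGBoost and CatBoost so the example needs only scikit-learn. Two things that commonly hurt a stack are feeding the meta-model hard class labels instead of probabilities and fitting it on in-sample base-model predictions. The `stack_method` and `cv` settings below address both.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the competition data (an assumption).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        # Stand-in for XGBoost / CatBoost in this sketch.
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",  # meta-model sees probabilities, not hard labels
    cv=5,                          # meta-model is fit on out-of-fold predictions
)
stack.fit(X_train, y_train)

# Score with probabilities, since the competition metric is ROC-AUC.
valid_proba = stack.predict_proba(X_valid)[:, 1]
valid_auc = roc_auc_score(y_valid, valid_proba)
print(f"validation ROC-AUC: {valid_auc:.3f}")
```

Comparing this validation score against each base model on the same split shows whether the blend is actually adding anything.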