#🚢┊titanic | Kaggle | Page 2

red ether Jul 10, 2025, 4:56 PM

#

Hey everyone, so I'm working on the Titanic dataset and trying to impute some missing values in the Age column, but I can't wrap my head around how to impute them. Could someone explain or give any advice on this?

plush vortex Jul 10, 2025, 6:54 PM

#

Are we allowed to share a GitHub or Streamlit Cloud URL to answer a question?

compact mesa Jul 12, 2025, 3:23 AM

#

Hey everyone!
I’ve just published a new Kaggle notebook:
"Titanic Survival Prediction using Machine Learning"
I used various ML models and feature engineering to predict passenger survival, and got a solid score on the leaderboard!

Would love it if you could check it out and drop an upvote if you find it helpful! 🙌

🔗 https://www.kaggle.com/code/mrmelvin/titanic-survival-prediction-using-machine-learning

Thanks a ton for the support! 💙

Titanic Survival Prediction using Machine Learning

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

outer rover Jul 13, 2025, 1:13 AM

#

hello guys, I'm just getting started with the Titanic. I have the basics of ML and I especially know about the transformer architecture; however, I read that it maybe isn't the best fit for this challenge. Should I stick with a random forest, or do you think a transformer could be a good fit?

wary heron Jul 14, 2025, 6:21 PM

#

outer rover hello guys, i'm just getting started w/ the titanic, I have basics of ML and I e...

Try it at least; it could be a fit.

outer rover Jul 14, 2025, 6:22 PM

#

ok thks

somber dragon Jul 15, 2025, 1:58 AM

#

Hey guys! I’m pretty new to Kaggle competitions and currently working on the Titanic dataset. I’ve got a few things I’m confused about and hoping someone can help:
1- Preprocessing Test Data
In my train data, I drop useless columns (like Name, Ticket, Cabin), fill missing values, and use get_dummies to encode Sex and Embarked. Now, when working with the test data, do I need to apply exactly the same steps? Like the same encoding and all that? Does the model expect train and test to have exactly the same columns after preprocessing?
2- Using Target Column During Training
Another thing — when training the model, should the Survived column be included in the features?
What I’m doing now is:
Dropping Survived from the input features
Using it as the target (y)
Is that the correct way, or should the model actually see the target during training somehow? I feel like this is obvious but I’m doubting myself.
3- How Does Kaggle Submission Work?
Once I finish training the model, should I:
Run predictions locally on test.csv and upload the results (as submission.csv)? OR
Just submit my code and Kaggle will automatically run it on their test set?
I’m confused whether I’m supposed to generate predictions locally or if Kaggle runs my notebook/code for me after submission.

minor heart Jul 15, 2025, 11:10 AM

#

somber dragon Hey guys! I’m pretty new to Kaggle competitions and currently working on the Tit...

Hey Destiny, I got you!

1. So for your preprocessing, I recommend putting all the cleaning steps you went through for the training set into a function. And then yes, you'll apply all of that again on the testing set. Just note that the testing set does not have the target column (Survived); everything else will be the same (though check to make sure there aren't any new category values in the testing set that your cleaning does not account for).
2. No! You should separate the target column into a "y" variable, and the rest of the columns can go in an "X" variable. So yes, you are doing it the correct way. You do want to keep y around so you have a way to check your predictions, though.
3. For your predictions, you'll submit a CSV file with 2 columns (PassengerId and Survived); the competition page details the exact format. When you output it, make sure you have index=False, though. [e.g. results.to_csv('Titanic_Predictions_Random_Forest.csv', index=False)]
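A minimal sketch of the "same cleaning function for train and test" idea, assuming the standard Titanic column names. The `preprocess` helper, the tiny inline frames (stand-ins for train.csv and test.csv), and the "S" default for Embarked are all hypothetical choices for illustration:

```python
import pandas as pd

def preprocess(df: pd.DataFrame, age_median: float) -> pd.DataFrame:
    """Hypothetical cleaning function, applied identically to train and test."""
    out = df.drop(columns=["Name", "Ticket", "Cabin"], errors="ignore")
    out["Age"] = out["Age"].fillna(age_median)       # statistic learned from train only
    out["Embarked"] = out["Embarked"].fillna("S")    # assumed default port
    return pd.get_dummies(out, columns=["Sex", "Embarked"])

# Toy stand-ins for train.csv / test.csv
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Embarked": ["S", "C", "Q", None],
})
test = pd.DataFrame({
    "PassengerId": [892, 893],
    "Sex": ["male", "female"],
    "Age": [34.5, None],
    "Embarked": ["Q", "S"],
})

age_median = train["Age"].median()

# Target goes into y; it is never part of the feature matrix X.
y = train["Survived"]
X = preprocess(train.drop(columns=["Survived", "PassengerId"]), age_median)

# Same steps on test, then force the same columns in the same order.
X_test = preprocess(test.drop(columns=["PassengerId"]), age_median)
X_test = X_test.reindex(columns=X.columns, fill_value=0)
```

The `reindex` call is what handles a category that appears in train but not in test (here, Embarked "C"): the missing dummy column is added and filled with 0.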

#

Anyways, I just did the titanic dataset myself. I used Decision Trees and Random Forest. Ended up getting 0.75 and 0.76 respectively. Though I only used n_estimators and depth as hyperparameters for RF, so I'll go through it again with more parameters. Curious to see which models perform the best for this classification assignment.

somber dragon Jul 15, 2025, 3:26 PM

#

minor heart Hey Destiny, I got you! 1. So for your preprocessing, I recommend putting all th...

thank you so much michael, I do understand now

#

now i am facing this problem :

#

In my notebook, I save the submission file like this:

# Option 1
submission = test[['PassengerId', 'Survived']]
submission.to_csv('submission.csv', index=False)

# Option 2 (also tried)
submission = test[['PassengerId', 'Survived']]
submission.to_csv('/kaggle/working/submission.csv', index=False)

I double-checked, the file looks like this:

PassengerId  Survived
0          892         0
1          893         0
...
(418, 2)
PassengerId    0
Survived       0
dtype: int64

It appears correctly in the output folder in Kaggle after running, but when I submit the notebook, I still get: "Submission CSV Not Found."

Any idea what could be wrong? Does Kaggle expect any specific step to detect it?

Thanks in advance!

#

and yea, i got the reason now
While submitting, Kaggle runs the cells from the start, so any error or typo will stop the submission, even though I could run the cell myself without errors.

minor heart Jul 16, 2025, 5:52 AM

#

somber dragon and yea, i got the reason now while submiting, kaggle will run the cells from th...

Ah great. But yeah, when I first saw your message I read it as you submitting the notebook instead of the CSV file. You have it working now?

opal carbon Jul 16, 2025, 3:38 PM

#

Hello everyone! I'm here to share the code I used to make predictions and submit my solution for the Titanic competition.

I'm Brazilian, so I initially wrote the notebook in Portuguese. Depending on the interest or feedback, I’d be happy to create an English version to make it more accessible for everyone.

Using a model that combines Random Forest and SVM, I achieved an accuracy of 78.23%.
Here’s the link:

🔗 https://www.kaggle.com/code/marcomata/titanic-submission

Titanic_submission


quartz sonnet Jul 16, 2025, 4:14 PM

#

opal carbon Hello everyone! I'm here to share the code I used to make predictions and submit...

That's interesting. VotingClassifier is one option. Another option is to average the two models' predicted probabilities of Survived to determine the outcome. Another option is to train a new model with the probabilities of the two models as predictors.
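The probability-averaging option can be sketched in a few lines. The probabilities below are made-up placeholders for what two already-fitted models would return; in scikit-learn, `VotingClassifier(voting="soft")` performs this kind of averaging for you:

```python
# Hypothetical predicted P(Survived = 1) for four passengers from two fitted models
proba_rf = [0.80, 0.35, 0.55, 0.10]
proba_svm = [0.60, 0.45, 0.40, 0.20]

# Average the two probabilities, then apply the usual 0.5 cut-off
avg = [(a + b) / 2 for a, b in zip(proba_rf, proba_svm)]
pred = [int(p >= 0.5) for p in avg]
print(pred)
```

Note how the third passenger flips: the random forest alone would say "survived" (0.55), but the averaged probability falls below 0.5.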

opal carbon Jul 16, 2025, 5:57 PM

#

quartz sonnet That's interesting. VotingClassifier is one option. Another option is to use th...

One thing I noticed is that my model classified a lot of men as survivors (20).
Maybe if I adjust the logic so that a prediction of '1' is only made when the model's confidence is above 70% or something similar, my final accuracy could improve.
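A small sketch of that idea, assuming you already have predicted probabilities of survival; the numbers and the `predict_with_threshold` helper are hypothetical. Whether a stricter threshold actually improves accuracy is something to check on validation data, not something this snippet shows:

```python
def predict_with_threshold(probas, threshold=0.5):
    """Predict 1 only when P(Survived = 1) reaches the threshold."""
    return [int(p >= threshold) for p in probas]

# Hypothetical model confidences for five passengers
probas = [0.92, 0.71, 0.65, 0.52, 0.30]

default_pred = predict_with_threshold(probas)        # usual 0.5 cut-off
strict_pred = predict_with_threshold(probas, 0.7)    # only confident positives
print(default_pred, strict_pred)
```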

quartz sonnet Jul 18, 2025, 4:40 PM

#

In fact. A simple decision tree can achieve "good" results in the titanic dataset. Simple rules with age + gender, etc

oak spindle Jul 22, 2025, 2:20 PM

#

I am trying to get started with projects. I tried the Titanic dataset and I am getting 82% accuracy... I don't know if that's a good start or not?

can anyone guide me here what i did wrong ?

bronze path Jul 22, 2025, 2:27 PM

#

And can anybody guide me how to participate and what to do in titanic competitions as I am new to ai stuff and a beginner

hallow gust Jul 22, 2025, 3:40 PM

#

oak spindle i am trying to getting start with projects and i tried with titanic dataset and ...

If you are in the top 30% of the leaderboard, then that is good for a beginner

lament badge Jul 23, 2025, 4:25 PM

#

Kaggle is the best platform to practice ML. As a beginner, you should go with a logistic regression model in the Titanic competition. I have 79 percent accuracy 🫡

quartz sonnet Jul 23, 2025, 5:06 PM

#

oak spindle i am trying to getting start with projects and i tried with titanic dataset and ...

You did well. In this competition, the test labels are known, and some people use them to get better results.

oak spindle Jul 26, 2025, 1:36 AM

#

quartz sonnet You did well. In this competition, the test labels are known, and some people us...

I have a question: I dropped some columns because I thought they might not be helpful... but I actually don't know whether I should drop them or not.

Is there any way to check which columns I should drop?

#

@opaque lotus sir please delete this, it's scam

quaint copper Jul 28, 2025, 11:40 PM

#

Hello! Im new here as i started to learn ML quite recently, so be patient 😉

I've used 4 types of models: DT, KNN, MLP and XGboost.

Using stratified kfold with 5 splits and some hyperparameter optimization (just basic grid search with not much exploration) i got validation accuracies on the XGboost and MLP around 0.82 - 0.84.

However, when I run the models in the test df, the kaggle scores of XGboost and MLP drop to 0.76-0.77, and my best model becomes a DT with 0.782.

Is this normal behavior? I know the number of observations is not that large and therefore a few bad predictions will have impact. Still, i wasnt expecting a gap like this.

rigid storm Jul 30, 2025, 6:33 AM

#

I am new to ML and am trying to grasp the basics. If anyone could please tell me what kind of checkpoints should i set for myself so that i know that i am doing the right thing and actually understanding this stuff.

ornate valve Aug 4, 2025, 11:16 AM

#

@quaint copper Training on your PC gives quite different results than on Kaggle: Accuracy: 0.877778

median patio Aug 5, 2025, 4:59 AM

#

Just finished my Titanic submission on Kaggle — scored 79% accuracy using Random Forest!
I’ve shared all the steps and reasoning behind each move in my notebook:
🔗 https://www.kaggle.com/code/nishchalpandey/titanic-survival-prediction-random-forest

Would love any feedback you guys have 🙌
And if you find it useful, a quick upvote would mean a lot! 🚀

Titanic Survival Prediction | Random Forest


prisma abyss Aug 5, 2025, 2:21 PM

#

Hey guys I am new to ML and want to build models , I wanted to start with kaggle , are there any yt playlists that help me understand concepts of ML and cover basics

#

And I would love to collaborate , help and learn .

hushed fox Aug 10, 2025, 4:20 PM

#

prisma abyss Hey guys I am new to ML and want to build models , I wanted to start with kaggle...

Hello, you can start with the Iris flower classification, it is a classic
I am still starting out myself and this was really great. I'd advise you to watch the tutorial from the "projects data science" channel on YouTube; it's about 5 videos, and they are short but useful and don't take long.
hope this helps 👍

prisma abyss Aug 10, 2025, 5:14 PM

#

hushed fox Hello, you can start with the Iris flower classification, it is a classic I am s...

Thanks for the suggestion 😊, I will definitely check it out

solemn stream Aug 12, 2025, 10:01 PM

#

Hello , I am new to ML and want to build models , i am looking to create a small team to start try out some of the Kaggle challenges. I'd like to start fairly simple with things like classification or regression problems.
Thanks 😇

frank jasper Aug 20, 2025, 6:33 AM

#

Hi everyone! 👋
I’m new here, and I wanted to share a lightweight, rule-based framework I’ve been working on — called AdaptoFlux.

It currently uses simple, human-readable arithmetic operations — like addition, subtraction, multiplication, and division — along with basic transformations (e.g., ±1, negation, and value copying).
No machine learning models. No gradients. Just pure math 🧮

Despite its simplicity, it typically achieves over 70% accuracy on the leaderboard — and with further tuning, sometimes even higher!

I’ve put together a fully runnable notebook with:
✅ One-click execution
✅ Step-by-step breakdown of how the rules are built
✅ Visualization of the logic flow and decision process

Check it out here:
👉 https://www.kaggle.com/code/gugu12138/adaptoflux-on-titanic

Would love to hear your thoughts or feedback!
And if you find it interesting, an upvote would mean a lot 🙌

P.S. This is just the starting point — the framework is designed to support more complex operations in the future, so it’s not limited to basic arithmetic. Excited to explore where this can go! 🚀

Looking forward to learning and collaborating with all of you!

hushed moss Aug 25, 2025, 1:26 PM

#

Guys i used random forest classifier algo for titanic and got 0.791 score.
how can i do better ?

regal lily Aug 27, 2025, 1:31 PM

#

https://github.com/Akanakis1/Titanic_Machine_Learning_from_Disaster

quartz sonnet Aug 28, 2025, 6:52 PM

#

quaint copper Hello! Im new here as i started to learn ML quite recently, so be patient 😉 I'...

The titanic dataset is small. You are getting an optimistic estimate from your cv score. What you can try is to see how your cv scores and lb scores correlate from simple to more complex models

cosmic pumice Aug 29, 2025, 1:06 PM

#

Hi everyone, I used two MLP models for prediction and assigned a weight of 0.5 to each model's prediction for the final output. But now my score is only 0.7799. What should I do next to improve it?

#

material_goose

lucid frost Aug 30, 2025, 6:16 AM

#

Hello, i sent my first titanic submission and after 18h it still has a score of 0.000; Do you guys know how long it usually takes to receive a score?
Many thanks!

cosmic pumice Aug 30, 2025, 6:38 AM

#

lucid frost Hello, i sent my first titanic submission and after 18h it still has a score of ...

I think you might have submitted the wrong file. Normally, you wouldn’t get a score of zero unless you’re somehow predicting the exact opposite of the correct answers. Generally, the score appears right after you submit it.

lucid frost Aug 30, 2025, 7:38 AM

#

cosmic pumice I think you might have submitted the wrong file. Normally, you wouldn’t get a sc...

thank you! i gotta check what went wrong

cosmic pumice Aug 30, 2025, 1:45 PM

#

what happen

civic harness Sep 5, 2025, 4:30 AM

#

Hi everyone,

I am trying to build my data science skills using Kaggle competition datasets. I submitted my predictions for the Titanic dataset. My accuracy score is 77% even after multiple submissions. I checked the leaderboard and some people have a score of 100%. So I was wondering, is there a way to compare their notebooks with mine?

sacred sleet Sep 5, 2025, 12:50 PM

#

civic harness Hi everyone, I am trying to build my data science skills using Kaggle competit...

Hi! Unfortunately, you can only check the code of users who published their notebooks in the "Code" section of each challenge.

But 77% is a great score! (The 100% scores are only from users who cheated: the list of who survived is available on Wikipedia, so instead of building a machine learning model they just entered the values by hand and got 100%.)

civic harness Sep 5, 2025, 3:48 PM

#

@sacred sleet thank you for the information.

All other friends, please let me know if your accuracy was more than 77% . I would be happy to discuss with you all 😊

velvet harness Sep 5, 2025, 7:26 PM

#

Looking for a teammate for this challenge. So far the best submissions achieved are the following:

XGBoost - 79.42%
Linear Regression - 77.03%

The final goal is to achieve >80% accuracy.
We can share some ideas for EDA and Feature Engineering

velvet harness Sep 5, 2025, 7:37 PM

#

lucid frost Hello, i sent my first titanic submission and after 18h it still has a score of ...

check if your csv file matches the submission template
the values within the csv file have to be ints and you may have floats. astype(int) should fix the problem
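A quick sketch of the `astype(int)` fix, using a made-up three-row frame in place of real predictions. Calling `to_csv` without a path returns the CSV text, which makes it easy to eyeball the format before writing a file:

```python
import pandas as pd

# Hypothetical predictions that came back as floats
submission = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Survived": [0.0, 1.0, 0.0],
})

# Cast to integers so the file contains 0/1 rather than 0.0/1.0
submission["Survived"] = submission["Survived"].astype(int)

csv_text = submission.to_csv(index=False)  # no path: returns the CSV as a string
print(csv_text)
```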

spare heart Sep 6, 2025, 6:23 PM

#

velvet harness Looking for a teammate for this challenge. So far the best submissions achieved ...

Hi @velvet harness you can dm me I have similar score

cosmic pumice Sep 8, 2025, 7:40 AM

#

civic harness Hi everyone, I am trying to build my data science skills using Kaggle competit...

Hey, my accuracy is 77% too. I used a linear regression model.

knotty holly Sep 9, 2025, 12:45 PM

#

cosmic pumice Hey, my accuracy is 77% too. I used a linear regression model.

Hi, I'm new to this challenge and to the subject in general. Out of curiosity, why did you choose linear regression instead of logistic, given it is a classification problem?

hollow jasper Sep 13, 2025, 6:36 PM

#

@quartz scarab is it possible to remove these advertisements from here. Happy to moderate if you're in need.

hollow jasper Sep 13, 2025, 6:37 PM

#

velvet harness Looking for a teammate for this challenge. So far the best submissions achieved ...

Logistic regression

low stone Sep 13, 2025, 6:52 PM

#

Looking for a partner who wants to collaborate for the titanic competition pls dm if interested

uneven monolith Sep 15, 2025, 12:07 PM

#

Hello, could you please point me to a simple and easy-to-understand code for the Titanic competition so that I can learn from it? Thank you in advance.

hollow jasper Sep 15, 2025, 6:16 PM

#

uneven monolith Hello, could you please point me to a simple and easy-to-understand code for the...

it depends what you mean, you can ask ChatGPT for a simple logistic regression model that'll give you 76-77% on the first run. from there, squeezing out percentage points requires different techniques, feature engineering, hyperparameter tuning or all of the above.

uneven monolith Sep 15, 2025, 6:22 PM

#

hollow jasper it depends what you mean, you can ask ChatGPT for a simple logistic regression m...

Okay, I will try this. Thank you very much.

paper ivy Sep 26, 2025, 1:30 PM

#

Just curious, what scoring method should one use when hyperparameter tuning the models? I'm thinking accuracy, because that seems to be what the leaderboard uses and the imbalance is not that huge (so no balanced_accuracy needed?). But I'm kinda new to this, so... 🙂
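To make the difference between the two metrics concrete, here is a hand-rolled sketch on made-up labels (6 negatives, 4 positives). The helper functions are illustrative; in scikit-learn the corresponding `scoring` strings are "accuracy" and "balanced_accuracy":

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical labels: class 0 is the majority
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(acc, bal)
```

The two numbers diverge only to the extent that the model treats the classes differently; with mild imbalance they tend to stay close, which is consistent with tuning on plain accuracy when that is also what the leaderboard reports.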

hollow jasper Sep 26, 2025, 5:02 PM

#

paper ivy Just curious, what scoring method should one use when hyperparameter tuning the ...

hyper parameter tuning does almost nothing on titanic. it's feature engineering + hybrid models that do best

small anvil Sep 26, 2025, 11:42 PM

#

I have a tricky question about grid search, ES (early stopping), and hyperparameter tuning. With the same model (XGB) and features, I discovered by accident a set of hyperparameters that gives 79% accuracy, while with techniques like grid search and ES the best hyperparameters I found give 77%. The problem I can identify is that I'm using cross-validation to evaluate each set of hyperparameters, but that score doesn't correlate linearly with the accuracy on the submissions.

#

So I'm very confused. I mean, if grid search and early stopping don't find the best set of hyperparameters, do I have to assume I did something wrong, or is it down to the nature of the data, like different patterns in the train and test sets that don't allow me to say "an improvement in cross-validation represents an improvement in the submission"?

#

I know hyperparameter tuning is not the big thing on this dataset, but I would still like to understand what is happening here.

hollow jasper Sep 27, 2025, 1:07 AM

#

small anvil I have a tricki cuestion about grid search, ES, and hiperparameters tunning stuf...

yeah hyperparameter tuning won't do much. 79% is just a bit above the two-rule "model"

    import numpy as np
    import pandas as pd
    from pathlib import Path

    # Paths are assumptions; point them at your own data and output folders.
    TEST_CSV = Path("test.csv")
    OUT_SUB = Path(".")

    def predict_gender_rule(df):
        """Return 1 (Survived) if female, 0 (Not Survived) if male."""
        return np.where(df["Sex"] == "female", 1, 0)

    def main() -> None:
        # The rule needs no fitting, so only the test set is loaded.
        test = pd.read_csv(TEST_CSV)

        pred_test = predict_gender_rule(test)

        submission = pd.DataFrame(
            {
                "PassengerId": test["PassengerId"].astype(int),
                "Survived": pred_test,
            }
        )

        out_path = OUT_SUB / "submission_gender_rule.csv"
        submission.to_csv(out_path, index=False)
        print(f"[done] wrote {out_path.resolve()}")

    if __name__ == "__main__":
        main()

small anvil Sep 27, 2025, 1:11 AM

#

hollow jasper yeah hyperparameter tuning won't do much. 79% is just bit above the two rule "mo...

yeah, i get it

#

but my question is

#

Is it normal for that kind of "blind spot" to exist when searching for hyperparameters? Which criteria should I use to search, or to define a search range?

#

How should I proceed when an improvement in cross-validation corresponds to a worse accuracy on submissions?

paper ivy Sep 27, 2025, 8:13 PM

#

hollow jasper yeah hyperparameter tuning won't do much. 79% is just bit above the two rule "mo...

What is a good score then? Looking at the leaderboard, 79% seems to be a fairly good score...

small anvil Sep 27, 2025, 8:39 PM

#

paper ivy What is a good score then? Looking at the leaderboard, 79% seems to be a fairly ...

If your score can be matched by a heuristic rule, that doesn't necessarily mean your model has bad accuracy; it probably means machine learning is maybe not the right approach in cost/efficiency terms

small anvil Sep 27, 2025, 8:41 PM

#

paper ivy What is a good score then? Looking at the leaderboard, 79% seems to be a fairly ...

look https://www.kaggle.com/code/yoni2k/top-3-with-only-4-features-no-data-leakage#Assumptions: this guy says 81.8% puts you in the top 3% without data leakage

paper ivy Sep 27, 2025, 10:22 PM

#

Thanks a lot!

broken juniper Sep 27, 2025, 10:26 PM

#

Hi guys, I need a piece of advice, I'm really mad trying to fix this but no luck

I had 0,83 and something accuracy with predictions on validation data when I split the training set
I uploaded it to the competition and got 0,77

I asked ChatGPT and it? he? advised to check the tree max depth, and also use cross-validation where I should look to decrease the difference between best and worst score. Which I did

In 5 runs of cross validation I got:
Mean accuracy: 0.8305
Difference between max and min: 0.0449

So basically it should not be worse than 0.785

I uploaded the new submission and got even worse result of 0.76555

What do I do to get more relevant results on public data? 😭
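The arithmetic behind that "should not be worse than" figure, plus a rough noise estimate, can be sketched as below. The fold range is not a lower bound on the leaderboard score. The binomial standard error used here is an assumption (it treats the test rows as independent draws and says nothing about overfitting to the validation folds); 418 is the test-set size from the `(418, 2)` shape shown earlier in the thread:

```python
import math

cv_mean = 0.8305     # mean CV accuracy quoted above
cv_spread = 0.0449   # max - min across the 5 folds, quoted above

# The reasoning in the message: mean minus the fold spread
naive_floor = cv_mean - cv_spread

# Rough sampling noise of an accuracy measured on n_test rows
n_test = 418
std_err = math.sqrt(cv_mean * (1 - cv_mean) / n_test)
low, high = cv_mean - 2 * std_err, cv_mean + 2 * std_err

print(f"naive floor: {naive_floor:.4f}")
print(f"std error:   {std_err:.4f}")
print(f"~2 SE band:  {low:.4f} to {high:.4f}")
```

A leaderboard score that lands well below this band suggests the gap is not just sampling noise, which fits the overfitting explanation given elsewhere in the thread.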

small anvil Sep 28, 2025, 2:05 AM

#

broken juniper Hi guys, I need a piece of advice, I'm really mad trying to fix this but no luck...

That's exactly what I was asking about lately. It seems the cross-validation score and the submission score have no linear correlation, or maybe the model overfits even with good cross-validation accuracy. If you find out what is happening, please comment in this channel, because I'm facing the same problem.

slow tangle Sep 29, 2025, 3:53 PM

#

anybody has an idea why simple logistic regression is doing better than NN

small anvil Sep 30, 2025, 3:29 AM

#

I had some advances with the problem, is clearly overfitting.

slow tangle Sep 30, 2025, 8:37 AM

#

Ive tried using regularizations with best lambda for cross validation set but still worse in test set than logistic regr.

#

one thing I haven’t done is parsing Name.. mostly just ignored it

#

It might be that married passengers with kids were less likely to survive, and titles like Master might indicate a higher chance of survival…
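Parsing the title out of Name can be sketched with a regular expression, assuming the "Surname, Title. Given names" layout used in the dataset. The `extract_title` helper is hypothetical; the sample names follow that layout:

```python
import re

def extract_title(name: str) -> str:
    """Return the text between the comma and the first period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"

names = [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
    "Palsson, Master. Gosta Leonard",
]

titles = [extract_title(n) for n in names]
print(titles)
```

The resulting column can then be encoded like any other categorical feature; whether it adds signal is something to verify with cross-validation.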

hollow jasper Sep 30, 2025, 1:33 PM

#

small anvil look https://www.kaggle.com/code/yoni2k/top-3-with-only-4-features-no-data-leaka...

6 years old, also there's a ton of leakage and cheating in this since results are known. Any legit model in above 83% seems to be a big stack of models with a mix of rules and sending the rest to a gradient boosting algo.

Like Chris Deotte's score of 84.6%: https://www.kaggle.com/code/cdeotte/titanic-wcg-xgboost-0-84688. This is a really complicated solution for a beginner, which is why I don't recommend that people who are still learning try to optimize much beyond basic logistic regression and XGBoost.

do some feature engineering, do some model building, submit and move on to a new problem, imo!

sage spoke Sep 30, 2025, 6:03 PM

#

Hi, how are you? I’m working on the Titanic project and did stacking (XGBoost, Random Forest, and Logistic Regression) and finalized with a Soft Voting ensemble (XGBoost, Random Forest, Logistic Regression plus the previous Stacking). I got evaluation results of VotingClassifier (Soft Voting)
Accuracy: 0.8492
ROC AUC: 0.8764.

However, my ranking is very low (Score: 0.76315), and I don’t understand why — I thought these were good results. Could someone please suggest how to improve? I’m still learning!

My Code is here: https://www.kaggle.com/code/lorrancintra/titanic-4-hybrid-ensemble-final

small anvil Sep 30, 2025, 7:27 PM

#

hollow jasper 6 years old, also there's a ton of leakage and cheating in this since results ar...

Sry i don't get it. U say that approach have data leakage? cuz i read his methodology and i was not able to recognize any type of data leakage.

hollow jasper Sep 30, 2025, 9:55 PM

#

small anvil Sry i don't get it. U say that approach have data leakage? cuz i read his method...

Which approach? Chris? Not really leakage but it's implicit reverse engineering. The models that were stacked performed very well on the same test data. It's pretty easy to see how this wouldn't happen in a majority of settings. But it's kaggle, not real life so go for it

#

Chris is just crafty as heck and was able to make the best of it given years of submissions before him

hollow jasper Sep 30, 2025, 10:21 PM

#

sage spoke Hi, how are you? I’m working on the Titanic project and did stacking (XGBoost, R...

I feel bad that you went to all that trouble with that result. Your feature engineering doesn't look to add any signal to the log reg base line model with no feature engineering, it may have added more noise

#

My best was light feature engineering and a stable xgboost

#

Embarked is def not a feature I found useful, why'd you train your model on it

#

Couldn't look at it too long, looks very AI generated, you need to drop alot of useless / redundant features that aren't adding any signal though!

small anvil Sep 30, 2025, 11:02 PM

#

hollow jasper Chris is just crafty as heck and was able to make the best of it given years of ...

Which score you consider a good result to achieve without data leakage, crazy ensembles or reverse engineering?

hollow jasper Oct 1, 2025, 12:00 AM

#

small anvil Which score you consider a good result to achieve without data leakage, crazy en...

Anything above 81%

#

Try some other comps before going too crazy on titanic tho, it'll help you develop different techniques

small anvil Oct 1, 2025, 2:02 AM

#

@hollow jasper I have one more question. I was reading about early stopping in grid search and have been trying it, but as far as I can see it has no native integration with cross-validation. Is it actually a good idea to integrate early stopping with cross-validation? If it's not worth it, how do you avoid overfitting with early stopping?

paper ivy Oct 1, 2025, 9:00 AM

#

sage spoke Hi, how are you? I’m working on the Titanic project and did stacking (XGBoost, R...

Your model is probably overfitted to the training data, that's the reason for the big score difference.

Some ideas:

Don't encode everything ordinally; try OHE (one-hot encoding). This does not matter for tree-based models, but for Logistic Regression it does. Similarly, bin ages and fares for logistic regression, and maybe also log-scale the fare
Speaking of Logistic Regression: Are you ever using that model (not the meta learner, the other one)?
You're using xgb both in the stacking classifier as well as the voting, of course this weights xgb heavily
Have you ever calculated something like cohen's kappa for the models?
Try extracting more features, explore which actually have predictive power (correlation, feature importances in random forest) and drop the ones you don't actually use
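The encoding ideas from the first suggestion can be sketched with pandas and NumPy. The toy frame, the age bin edges, and the bin labels are hypothetical choices for illustration, not tuned values:

```python
import numpy as np
import pandas as pd

# Toy stand-in for a few Titanic columns
df = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S"],
    "Age": [4.0, 22.0, 38.0, 65.0],
    "Fare": [7.25, 71.2833, 0.0, 26.55],
})

# Log-scale the fare; log1p handles a fare of 0 safely
df["FareLog"] = np.log1p(df["Fare"])

# Bin ages into coarse groups (assumed edges)
df["AgeBin"] = pd.cut(
    df["Age"],
    bins=[0, 12, 18, 40, 60, 120],
    labels=["child", "teen", "adult", "middle", "senior"],
)

# One-hot encode the categorical columns instead of ordinal codes
encoded = pd.get_dummies(df[["Embarked", "AgeBin", "FareLog"]],
                         columns=["Embarked", "AgeBin"])
print(encoded.columns.tolist())
```

One-hot columns avoid implying an order such as C < Q < S, which a linear model like logistic regression would otherwise treat as meaningful.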

sage spoke Oct 1, 2025, 1:03 PM

#

hollow jasper Embarked is def not a feature I found useful, why'd you train your model on it

You're right, Zach. I'll work on approaching the system in a more appropriate way. I saw your work — you went through the features one by one. Thanks for the tip!

hollow jasper Oct 1, 2025, 2:06 PM

#

sage spoke You're right, Zach. I'll work on approaching the system in a more appropriate wa...

Thanks, There's many people who have spent a long time on this set since it's been out for awhile. It's a cool toy example, but nothing to stay stuck on beyond putting a submission together that uses a 0 leakage and clean preprocess, fit, transform, predict , submit with any score above or even equal to the baseline of 76.5%

#

As a beginner at least, if it's just for fun and you're very experienced, it can be fun to re-visit and try to go for high scores

#

I'd also make sure you're using a clean repo structure and doing commits. One thing I noticed about your note book is your file paths looked a bit messy. Pathlib is OS agnostic and is very nice to collaborate with.

This is all null if u don't plan on ever being in a collaborative environment, but I figure most people are
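A short pathlib sketch of what OS-agnostic paths look like. The folder names are hypothetical; the `TRAIN_CSV` and `OUT_SUB` names mirror the gender-rule snippet posted earlier in the thread:

```python
from pathlib import Path

# Hypothetical project layout: data/ for inputs, outputs/submissions/ for results
DATA_DIR = Path("data")
TRAIN_CSV = DATA_DIR / "train.csv"
TEST_CSV = DATA_DIR / "test.csv"
OUT_SUB = Path("outputs") / "submissions"

# The "/" operator joins path parts with the right separator for the current OS
out_path = OUT_SUB / "submission.csv"

# Before writing, the folder could be created with:
# OUT_SUB.mkdir(parents=True, exist_ok=True)
print(out_path)
```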

hollow jasper Oct 1, 2025, 2:15 PM

#

small anvil @hollow jasper i have one more question. I was reading about the Early S...

Well, early stopping and CV serve different purposes. CV is for evaluating how the model will generalize to unseen data; early stopping in a grid search tells you when you're hitting a point of no improvement, so it's best to go back to the hyperparameters that produced the best score

If they could be "integrated", that's pure data leakage since validation set is to be treated as unseen

#

We can only train our model with training data, and we treat every validation set the same as test aka non train

#

That's why those don't mesh

#

Good question though

ionic lotus Oct 4, 2025, 9:21 PM

#

Hi everyone. Got my titanic score to a 0.775 any tips on how to increase my score

sacred sleet Oct 5, 2025, 8:30 AM

#

ionic lotus Hi everyone. Got my titanic score to a 0.775 any tips on how to increase my scor...

You can try multiple types of models like XGBoost and try different settings; you can also learn about data engineering

#

But 0.775 is a very good score. I suggest you try other competitions with bigger datasets, like the #🏠┊house-prices-advanced-regression-techniques competition

worthy forum Oct 6, 2025, 2:55 PM

#

what are common ways to get high > 0.9 score?

#

is there a specific algorithm? some key features that need to be engineered?

#

i did two runs. One on my first try (0.74) and the second after reading MEG RISDAL's post (0.77)

#

Is this good enough for a beginner? Should I move on to another competition?

sacred sleet Oct 6, 2025, 6:25 PM

#

worthy forum what are common ways to get high > 0.9 score?

0.9 is impossible; the people who succeeded cheated by manually entering the data from Wikipedia

worthy forum Oct 6, 2025, 6:44 PM

#

sacred sleet >0.9 is impossible, the peoples how succed cheated by putting manualy the data f...

Ok now i feel better about my model's performance 😅

sacred sleet Oct 6, 2025, 6:44 PM

#

worthy forum Ok now i feel better about my model's performance 😅

You can x)

trail breach Oct 7, 2025, 11:00 AM

#

🚢

hollow jasper Oct 10, 2025, 12:37 AM

#

worthy forum Ok now i feel better about my model's performance 😅

Your model performed on par with the no-model rule: if female, lives; if male, dies

#

It's ok though most do

hard tiger Oct 10, 2025, 11:30 AM

#

Link for this competition, anyone?

worthy forum Oct 11, 2025, 4:15 PM

#

hollow jasper Your model performed on par with the no model if female lives if male dies

but i guess my model would generalize better if we change the data?

#

like data with 20 female 80 male

hollow jasper Oct 11, 2025, 7:55 PM

#

worthy forum but i guess my model would generalize better if we change the data?

Well, there is no way to know, which is why trying to eke out percentages on Titanic is rather silly. We'll never see another Titanic, and you'll only ever have that one set of test data to evaluate on. It's a good place to start, spin up your first real Jupyter notebook, and do your first real EDA, but beyond that, don't waste your time optimizing this set.

steep gazelle Oct 13, 2025, 8:16 AM

#

Hi! I just made my first submission and scored 0.76555. Any tips to improve this? Or should I move on to other competitions/projects? Also, this project made me realize how important math is for this field. Please let me know which concepts and subjects I can focus on.

peak cypress Oct 13, 2025, 2:11 PM

#

Why did you need maths? What model did you use?

hollow jasper Oct 13, 2025, 6:56 PM

#

steep gazelle Hi! I just made my first submission and Scored 0.76555 Any tips to improve this?...

That's equal to the heuristic model: if male, dead; if female, alive. This isn't a project to dwell on. You have roughly 800 rows to train and validate on (tiny) and roughly 400 rows to test (tiny).

Trying to optimize it isn't worth it

#

The heart of this project is about building a clean machine learning workflow.

Use VS Code and make a remote repo on GitHub with regular commits (with proper commit messages) for practice.

Make a nice EDA notebook (either a Python interactive file or Jupyter) that covers the basic ideas of feature engineering and includes a full spectrum of analysis on what you have.

vast aspen Oct 15, 2025, 8:10 PM

#

I created a wonderful Streamlit app about predicting Titanic survivors 🚢

You can check it out at: https://app-app-titanic-data-bdwwycbgdejsmtuv4ntkss.streamlit.app/

I'm really interested in your opinions.
Thank you.

kind jungle Oct 16, 2025, 6:54 AM

#

Can anyone provide me with the titanic test.csv and train.csv. Unfortunately I cannot download it from the kaggle.com website. Kaggle support cannot help... Thanks a lot in advance...

cedar depot Oct 16, 2025, 7:27 AM

#

kind jungle Can anyone provide me with the titanic test.csv and train.csv. Unfortunately I c...

Source: Kaggle https://share.google/LiH1qRrt24FREscc0
Download from here, in "Data" tab you can find all files

torpid cradle Oct 16, 2025, 10:06 AM

#

It's my first time participating in this type of competition on Kaggle. I do have experience with machine learning projects. Are there any tips for a beginner, like which steps I need to pay attention to?

steep gazelle Oct 17, 2025, 6:02 AM

#

peak cypress Why did you need maths? What model did you use?

Just to get a deeper understanding of the models I am using. I used Logistic Regression, a KNN classifier, a Random Forest classifier, and SVC.

steep gazelle Oct 17, 2025, 6:04 AM

#

hollow jasper That's equal to heuristic model if male, dead if female live. This isn't a proje...

Understood, I should move on to other projects then. Since I made it in a Jupyter notebook, I will add comments and descriptions and push it to GitHub.

hollow jasper Oct 17, 2025, 6:18 PM

#

steep gazelle Understood I should move to other projects then since, I made it in a jupyter no...

Yes. If you are aiming to be a professional, there are quite a few things that are not tested on Kaggle that you absolutely should have mastered: GitHub, ETL (which leaves you flexible for smaller teams where data engineering is not yet streamlined), and managing cloud platforms.

I'm being paid extremely well (TC) because I joined a startup that needed an MLE and a data engineer/analyst in one person.

pastel axle Oct 18, 2025, 11:51 AM

#

Hey

pastel axle Oct 19, 2025, 3:45 AM

#

Can anyone help me to enter the competition

#

Please

elder timber Oct 28, 2025, 10:29 PM

#

worthy forum is this good enough for a beginner? Should i move on to another competition?

I feel your pain. I got to .779 and can't seem to break past it. My params were n_estimators=200, depth=10, and I had [Pclass, SibSp, Parch, Fare, Sex, Embarked]. How did you get there?

stiff gull Oct 30, 2025, 4:24 PM

#

sacred sleet >0.9 is impossible, the peoples how succed cheated by putting manualy the data f...

Oh, I thought those people were great data scientists

cedar depot Nov 4, 2025, 5:08 AM

#

continuation: #5dgai-introductions message

@ivory lotus ,
First, do you have prior knowledge of or experience with Python, NumPy, Matplotlib, and pandas?

#

If yes then try to read this:
https://www.kaggle.com/competitions/titanic and watch the attached youtube video. And share here if you face any doubts/issues.

#

@ivory lotus talk here please

#

That channel is for introductions, this is the right channel 🐣

#

So as you mentioned you have no experience, I followed this roadmap that I would suggest you to follow too:

Kaggle has a bunch of courses and guides at https://kaggle.com/learn

Intro to Programming Course
Prerequisites: None.
https://share.google/68HMIU2sPcz7jXpKn
Python Course
Prerequisites: Intro to Programming Course, if you have no programming knowledge
https://share.google/tlKbeLBPrcKsfnSQZ
Pandas Course
Prerequisites: Python Course
https://www.kaggle.com/learn/pandas
Intro to Machine Learning
Prerequisites: Python Course
https://share.google/bTm7E168S656eQRzO

Or alternatively, you can find a video course on some other platform like YouTube and follow that..

elder narwhal Nov 9, 2025, 5:23 AM

#

@tight robin

#

@tight robin sir, please accept my request on LinkedIn, or send me a request, and please share your experience. I am 17 years old.

smoky wing Nov 9, 2025, 7:25 AM

#

Hii

scarlet socket Nov 10, 2025, 2:56 PM

#

everyone stay away from this guy, i think he is trying to hack

ivory eagle Nov 11, 2025, 10:49 AM

#

Follow me on LinkedIn: www.linkedin.com/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=shivraj-patel-7b40bb382

gleaming shard Nov 15, 2025, 4:55 AM

#

hello guys

#

how do we do the submission?

static wyvern Nov 22, 2025, 7:03 PM

#

Hey guys!! Getting my hands on the Titanic competition. Check out my recent submission using Random Forest. I dropped the columns ['PassengerId', 'Name', 'Embarked', 'Cabin', 'Ticket']: https://www.kaggle.com/code/abhangkolte/titanic-randomforest

#

Am thinking of keeping the name the next time. Experiment a little with the Random Forest first.

tame vessel Dec 13, 2025, 2:02 AM

#

Hi, does anyone know if I can use what I learned in the House Prices Advanced course for the prediction in this exercise? Thanks

spare temple Dec 24, 2025, 6:26 PM

#

Hi, new here. Looking for friends and other beginners to learn together with, and an accountability mate.

nova furnace Dec 31, 2025, 7:19 AM

#

Hello, I'm new to ML. I'd love to connect with ambitious friends to grow and learn together

errant mantle Jan 1, 2026, 9:40 AM

#

Good day! I am a player stuck in the newbie village ! 😊

bitter oriole Jan 2, 2026, 3:27 PM

#

Hello everyone, I'm Silver and I'm a beginner I just started today and I'd love friends that we can work together to make it easier for each other

jaunty shard Jan 2, 2026, 5:27 PM

#

Hi everyone, I'm David and I'm a beginner in Data science.
Let's connect and learn from each other and probably end up working together.

midnight pulsar Jan 4, 2026, 7:45 PM

#

jaunty shard Hi everyone, I'm David and I'm a beginner in Data science. Let's connect and lea...

Hi, I'm a beginner too! If you want to learn with me, let me know 😁 I just know the basics of data science, but I want to learn a lot about this topic

jaunty shard Jan 5, 2026, 10:21 AM

#

midnight pulsar Hi, I'm a beginner too! If you want to learn with me, let me know 😁 I just know...

Nice brother.

versed sandal Jan 13, 2026, 3:02 PM

#

Hi everyone, I'm tyros. I've studied machine learning algorithms and deep learning, and I'm currently focusing on computer vision (CV) learning. However, I've never participated in Kaggle competitions before, and my problem-solving thinking and coding skills are just average. I hope to meet like-minded friends here to learn and grow together.

versed sandal Jan 13, 2026, 3:03 PM

#

versed sandal Hi everyone, I'm tyros. I've studied machine learning algorithms and deep learni...

I hope we can grow together. If possible, we can connect with each other as friends—so we can chat, discuss and solve problems together.

severe zealot Jan 15, 2026, 12:11 AM

#

midnight pulsar Hi, I'm a beginner too! If you want to learn with me, let me know 😁 I just know...

Same

glad ingot Jan 16, 2026, 7:31 PM

#

Hey everyone! I am Akarshi and have some experience in Data Science. I'd love to connect with like-minded people to work together on projects and exchange ideas.

solid holly Jan 20, 2026, 1:25 PM

#

Hi everyone! I’m a CS student starting my ML journey and using the Titanic competition as my “hello world” for data science (RIP Titanic, thank you for the dataset 🫡). I'm on preprocessing, basic models and hyperparameter tuning rn.
Feel free to say hi and connect.

hybrid spire Jan 22, 2026, 5:55 PM

#

errant mantle Good day! I am a player stuck in the newbie village ! 😊

What is newbie village

shut oxide Jan 23, 2026, 1:26 PM

#

can we participate in the competition as an individual or as a team?

dry rivet Jan 27, 2026, 7:06 PM

#

solid holly Hi everyone! I’m a CS student starting my ML journey and using the Titanic compe...

Hi, Padmesh. I've just started with Titanic as my introduction to datasets and data science as well. Would love to discuss more.

oblique rain Jan 31, 2026, 6:10 AM

#

dry rivet Hi, Padmesh. I've just started with Titanic as my introduction to datasets and d...

In my opinion, don't mention this dataset on your resume, because most companies don't rate it highly. You can use it for training purposes.

calm hill Feb 14, 2026, 2:31 AM

#

Does anyone know how to increase the titanic ML score on kaggle? I'm still learning

frail terrace Feb 14, 2026, 4:43 PM

#

would you guys appreciate a notebook baseline template which you can easily iterate on?

calm hill Feb 15, 2026, 2:56 AM

#

Have you ever tried up to 80%?

#

Thanks

paper ivy Feb 16, 2026, 1:29 PM

#

You probably need more feature engineering. In the end I ended up using:

- Title
- Embarked
- Sex
- Pclass
- HasCabin
- IsChild (Age < 12.0)
- AgeMissing
- SmallGroup (group size by ticket between 2 and 4, both included)
- FamSize (family size)
- Age
- FarePerPerson (log-scaled, although this does not matter for tree-based models)
- Survival Rate (of the group, either by surname or by ticket, mean if both could be calculated; make sure you have no data leakage)

This achieved 80%+ (82% with a three-model ensemble). There are probably better or simpler models, but it might be a good start. Titanic score is mostly feature engineering + avoiding overfitting
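
As a rough illustration of how several of the features listed above could be derived with pandas, here is a sketch. The exact definitions are assumptions, not paper ivy's code: `FamSize` is taken as `SibSp + Parch + 1`, the ticket group size is the number of rows sharing a `Ticket`, and `FarePerPerson` is `Fare` divided by that group size before a `log1p`. The rows are illustrative, and `add_features` is a hypothetical helper name. The group survival rate is left out because it needs out-of-fold computation to avoid leakage.

```python
import numpy as np
import pandas as pd

# Illustrative rows in the shape of Kaggle's train.csv (not the full dataset).
df = pd.DataFrame({
    "Name":   ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley",
               "Heikkinen, Miss. Laina", "Palsson, Master. Gosta Leonard",
               "Palsson, Miss. Torborg Danira"],
    "Age":    [22.0, 38.0, np.nan, 2.0, 8.0],
    "SibSp":  [1, 1, 0, 3, 3],
    "Parch":  [0, 0, 0, 1, 1],
    "Cabin":  [None, "C85", None, None, None],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282", "349909", "349909"],
    "Fare":   [7.25, 71.28, 7.925, 21.075, 21.075],
})


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # Title is the token between the comma and the first period in Name.
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    out["HasCabin"] = out["Cabin"].notna().astype(int)
    out["AgeMissing"] = out["Age"].isna().astype(int)
    # A comparison with NaN is False, so unknown ages get IsChild = 0.
    out["IsChild"] = (out["Age"] < 12.0).astype(int)
    # Assumption: family size counts the passenger as well.
    out["FamSize"] = out["SibSp"] + out["Parch"] + 1
    # Number of rows that share each ticket.
    group_size = out["Ticket"].map(out["Ticket"].value_counts())
    out["SmallGroup"] = group_size.between(2, 4).astype(int)
    # Assumption: fare is split evenly across the ticket group, then log-scaled.
    out["FarePerPerson"] = np.log1p(out["Fare"] / group_size)
    return out


feats = add_features(df)
print(feats[["Title", "HasCabin", "AgeMissing", "IsChild", "FamSize", "SmallGroup"]])
```

On the real data the ticket counts would normally be computed over train and test together or fitted on train only, depending on how strict you want to be about leakage.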

calm hill Feb 16, 2026, 1:37 PM

#

paper ivy You probably need more feature engineering. In the end I ended up using: - Title...

Wow

#

I didn't expect that

#

Thank you

leaden portal Feb 24, 2026, 10:53 AM

#

hi, i am maxing out at ~82% with both XGB and linear SVM, any tips to break that wall?

whole sedge Feb 25, 2026, 5:35 PM

#

I'm doing really badly with 0.63157, using only numerical features taken straight from the dataset combined with RandomForestClassifier. Any suggestions on how to nudge my score and skill up bit by bit?

steep swift Feb 26, 2026, 12:23 PM

#

Hi guys, I have covered supervised and unsupervised ML, and right now I am learning deep learning. I'd love to discuss concepts, and I also want to participate in Kaggle competitions. If any of you have some experience or knowledge, please help me out.

dreamy ingot Mar 5, 2026, 12:21 PM

#

Good morning everyone, I’m new to learning Data Science ready for the experience, Feel free to send some encouraging feedback
Thank you all ❤️

drifting pilot Mar 5, 2026, 12:52 PM

#

I have created an ensemble notebook that evaluates, tests, and tunes several algorithms to choose a winner.

My notebook can be accessed here if it helps:
https://www.kaggle.com/code/rommelsharma/titanic-machine-learning-lr-rf-xgb-lgb-gbm

The algorithms include:

LR,
RF,
XGB,
LGB,
GBM

The steps I followed are:

1. Load Data: CSV → pandas DataFrames
2. EDA Visualisations: 8 charts covering class balance, sex, age, fare, family size
3. Feature Engineering: 27+ features, no data leakage, stats fitted on train only
4. Encoding + Normalisation: LabelEncoder + StandardScaler (fit on train, transform both)
5. Algorithm Comparison: 5-fold stratified CV across LR, RF, XGB, LGB, GBM
6. Hyperparameter Tuning: Optuna TPE + MedianPruner; cache-first, ~30–50% faster
7. OOF Predictions: 5-fold out-of-fold predictions, zero leakage into meta-learner
8. Weighted Blend: Nelder-Mead optimised weights across XGB + LGB + GBM
9. Stacking: Logistic Regression meta-learner trained on OOF predictions
10. Final Models: Re-trained on full training data + saved via joblib
11. Feature Importance: Gain-based importance for XGBoost and LightGBM
12. ROC + AUC Charts: Per-fold AUC progression + ensemble comparison
13. Confusion Matrix: Accuracy + F1 on the best OOF ensemble
14. Submission CSV: PassengerId, Survived for Kaggle submission

For feature engineering I had 27+ features, no data leakage — stats fitted on train only

There are several visualizations too.

I used Optuna to tune the models. Since tuning takes a lot of time, I cache the tuned models. If you want to change the logic of just one model, delete that model's cached file and rerun the program: that model will be retrained, and the other models will be loaded from the cache.

I have provided extensive documentation so that it's easy for anyone to read. Feel free to fork it and make your improvements.

I hope you find it useful.
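
To make the out-of-fold (OOF) and stacking steps above more concrete, here is a minimal sketch with scikit-learn. It is not the notebook's code: it swaps in RandomForest and scikit-learn's GradientBoosting as base models instead of XGB/LGB, and it uses synthetic data from `make_classification` as a stand-in for the engineered Titanic features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for engineered features (not real Titanic data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X_train, y_train, X_test = X[:200], y[:200], X[200:]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
base_models = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

# Each OOF value is predicted by a model that never saw that row during fitting.
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models.values()
])

# The meta-learner is trained only on OOF predictions, so no labels leak into it.
meta = LogisticRegression().fit(oof, y_train)

# For test rows: refit each base model on all training data, then stack.
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models.values()
])
final_pred = meta.predict(test_features)
print("OOF shape:", oof.shape, "| test predictions:", final_pred.shape)
```

The key design point is that the meta-learner never sees a base-model prediction for a row that the base model was trained on.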

hidden ivy Mar 17, 2026, 2:59 AM

#

Heyy im curious. What's considered a good score for this dataset?

#

beginner's level of course, but I mean what would be the highest attainable?

#

is 90%+ an achievable mark?

#

I am currently at 83-85% range on F1-scores using a RandomForestClassifier.

quartz sonnet Mar 17, 2026, 9:49 AM

#

@hidden ivy it's a very good score, you can go to the next competition.

woeful scaffold Mar 17, 2026, 1:12 PM

#

quartz sonnet @hidden ivy it's a very good score, you can go to the next competition...

Bro check dm

desert remnant Mar 17, 2026, 10:16 PM

#

I need help to understand something.

#

After feature engineering, I ran a simple RF model to test

rf = RandomForestClassifier(
    n_estimators=300,
    max_depth=None,
    random_state=SEED
)

rf.fit(X_train, y_train)
pred = rf.predict(X_val)
accuracy_score(y_val, pred)

Result: 0.7988826815642458

Classification report shows: accuracy 0.80

Cross validation:

scores = cross_val_score(
    rf,
    X,
    y,
    cv=5
)

print("Mean CV Accuracy:", scores.mean())
print("Std:", scores.std())

Result:
Mean CV Accuracy: 0.8092461239093591
Std: 0.04010742324092075

After that I did some tuning and got my cross validation to:
Mean CV Accuracy: 0.8215805661917017
Std: 0.03339597851131273

Then I submitted the output CSV and Kaggle showed Score: 0.75598

I don't understand, none of my tests shows something close to 75%.
What am I doing wrong? Am I missing something?

#

Confusion Matrix also shows 80%

quartz sonnet Mar 18, 2026, 8:03 AM

#

desert remnant After feature engineering, I ran a simple RF model to test ```python rf = Random...

@desert remnant the Titanic dataset is small, so your CV score is overoptimistic. You should track your CV score and your LB score and make sure they correlate. You can also try using 10 folds instead of 5.

desert remnant Mar 18, 2026, 2:43 PM

#

quartz sonnet @desert remnant the titanic dataset is a small dataset, so your CV score i...

Thank you, I tried 10 folds and still returns 80%.
Is there another way to get a more realistic score for Titanic? Since CV is not reliable

quartz sonnet Mar 19, 2026, 8:10 AM

#

desert remnant Thank you, I tried 10 folds and still returns 80%. Is there another way to get a...

Track your CV score and your LB score and make sure they correlate, and try not to overfit by reducing the complexity of your models. Is the SD across your folds similar to the gap you see on the public LB?

#

CV assumes that folds are representative of unseen data and have stable distribution. These assumptions break easily on small datasets like the titanic one.

Do you have high variance across splits? Check the scores of each fold. If you have 0.75 in some folds, 0.75 in public LB is expected.

Model | CV Score | LB Score
RF v1 | 0.80 | 0.75
RF v2 | 0.82 | 0.75

If CV improves but LB stays flat or declines, you may have overfitting.

For example, settings like these tend to overfit on a dataset this small:

RandomForestClassifier(
    n_estimators=3000,
    max_depth=20,
    min_samples_leaf=1
)
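
One way to see this kind of overfitting is to compare training accuracy with cross-validated accuracy for a deep forest and a constrained one. The sketch below uses noisy synthetic data of roughly Titanic size (an assumption, not the real dataset), fewer trees than above to keep it quick, and a hypothetical helper `train_and_cv_accuracy`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, noisy stand-in roughly the size of the Titanic training set.
X, y = make_classification(n_samples=800, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=0)


def train_and_cv_accuracy(model):
    """Return (accuracy on the data the model was fit on, mean 5-fold CV accuracy)."""
    cv_acc = cross_val_score(model, X, y, cv=5).mean()
    train_acc = model.fit(X, y).score(X, y)
    return train_acc, cv_acc


deep = RandomForestClassifier(n_estimators=200, max_depth=20,
                              min_samples_leaf=1, random_state=0)
shallow = RandomForestClassifier(n_estimators=200, max_depth=4,
                                 min_samples_leaf=5, random_state=0)

deep_train, deep_cv = train_and_cv_accuracy(deep)
shallow_train, shallow_cv = train_and_cv_accuracy(shallow)

print(f"deep:    train={deep_train:.3f}  cv={deep_cv:.3f}")
print(f"shallow: train={shallow_train:.3f}  cv={shallow_cv:.3f}")
```

A large gap between the training and CV numbers is the warning sign; the constrained forest usually shows a much smaller gap.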

Are you using any feature engineering that could lead to data leakage?
What to do:
Are you doing mean encoding, binning, or target encoding before splitting? Try to use a pipeline instead, so those statistics are fit on the training folds only.
Using KFold? Try StratifiedKFold or RepeatedStratifiedKFold.
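
A minimal sketch of that advice, assuming scikit-learn: preprocessing lives inside a `Pipeline`, so imputation and encoding are refit on each training fold, and `RepeatedStratifiedKFold` gives per-fold scores you can inspect. The data is random with Titanic-like column names (not the real dataset), and the model settings are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Random stand-in with Titanic-like columns (values are synthetic).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "Age": np.where(rng.random(n) < 0.2, np.nan, rng.uniform(1, 70, n)),
    "Fare": rng.uniform(5, 100, n),
    "Sex": rng.choice(["male", "female"], n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
})
# Synthetic label: mostly determined by Sex, with about 20% of labels flipped.
y = ((X["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)

preprocess = ColumnTransformer([
    # The median is learned from the training part of each fold only.
    ("num", SimpleImputer(strategy="median"), ["Age", "Fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex", "Embarked"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)),
])

# 5 folds repeated 3 times gives 15 scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Per-fold accuracy:", np.round(scores, 3))
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}  min={scores.min():.3f}")
```

Looking at the minimum and the spread, not just the mean, shows whether a public LB score a few points below the CV mean is within normal fold-to-fold variation.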

deep cedar Mar 20, 2026, 9:03 PM

#

Hii~ what's the theoretical accuracy ceiling on Titanic? I'm at 0.78 with a custom neural net but the leaderboard has people at 1.0. Is the test set just small enough that overfitting looks like perfect accuracy, or am I missing something about the dataset? 🥺

desert remnant Mar 20, 2026, 9:41 PM

#

quartz sonnet CV assumes that folds are representative of unseen data and have stable distribu...

Thank you for your help. Your questions pointed me in the right direction.
It turns out I was dealing with some data leakage and potential overfitting.
Now when I print accuracy_score() I get a result pretty close to LB score, differing by only 1%.

cursive yew Mar 22, 2026, 7:34 AM

#

guys i got 0.78 with rf is that any good? first timer

craggy swan Mar 22, 2026, 8:51 AM

#

cursive yew guys i got 0.78 with rf is that any good? first timer

its not the best, but a good score on RF ig....keep it up!

quartz sonnet Mar 23, 2026, 7:44 AM

#

It feels like cheating, but if you are interested... Look here https://www.kaggle.com/code/cdeotte/titantic-mega-model-0-84210

green trench Apr 7, 2026, 12:20 AM

#

dangg

#

i feel like thats cheating tho

blissful tundra Apr 8, 2026, 6:15 AM

#

Guys can anyone tell me about this titanic contest

uneven stratus Apr 10, 2026, 1:13 AM

#

blissful tundra Guys can anyone tell me about this titanic contest

Watch the video on kaggle's website, it explains everything you need to know

polar sandal Apr 13, 2026, 6:30 PM

#

hey i've been learning and working in ML for a while, but i'm curious as to how people develop the "intuition" in this field; I'd really appreciate it if anyone could kindly guide me on it using this competition problem as an example. Like how do you approach it, how to think?

vapid dust Apr 14, 2026, 4:54 PM

#

cursive yew guys i got 0.78 with rf is that any good? first timer

it means that, you can use basic ML models, Feature engineering...

jaunty cliff Apr 19, 2026, 6:14 PM

#

ive been trying to work with the dataset and trying out various approaches to get a higher score. is there something im missing like should i move on after a certain score or keep trying for like over 90?

paper ivy Apr 22, 2026, 4:14 PM

#

I think trying for 80%+ definitely has some benefits, much beyond that is probably not worth it

obsidian geyser May 1, 2026, 8:49 AM

#

Hi guys. Just started with the Titanic challenge. Following the instructions and copy-pasting leads to errors. Is it intentional, or are the instructions just outdated?

sonic zodiac May 6, 2026, 12:06 PM

#

Teach me I'm new here

thorn tusk May 17, 2026, 12:16 AM

#

is there a vc where others are also looking at the information about the titanic predictions?

thorny arrow May 17, 2026, 8:32 PM

#

I think it looks like a hackathon project

thorny arrow May 23, 2026, 12:38 PM

#

Is it allowed to use Jupyter notebook or colab notebook?

upbeat iris May 25, 2026, 9:14 AM

#

hello i am new to kaggle can anyone give me overview of the competition like from where to start

wicked prawn May 28, 2026, 2:58 PM

#

Hello! My name is Pankaj and I am an AI and DS enthusiast. I am a student of Data Science at Bellevue University. I am here to start learning and discussing more about Predictive modeling.

deft star May 29, 2026, 11:50 AM

#

I just see a bunch of people talking to themselves lol

potent spruce May 30, 2026, 2:34 AM

#

https://www.kaggle.com/code/udaken10/feature

My name is Ken, Hi