final cliff Aug 7, 2023, 11:05 PM

#

What was everyone's experience like with the titanic competition? It's a very popular competition for those who are just getting started. I'd love to hear about your past experience, or current if you're getting started!

acoustic storm Aug 8, 2023, 7:16 AM

#

final cliff What was everyone's experience like with the titanic competition? It's a very po...

It was really cool, and I really enjoyed participating in it, I even got a gold medal for getting so many upvotes for my notebook

final cliff Aug 8, 2023, 7:54 PM

#

acoustic storm It was really cool, and I really enjoyed participating in it, I even got a gold ...

Hah, awesome. Gold medal on titanic is a nice achievement 🙂

acoustic storm Aug 8, 2023, 8:35 PM

#

final cliff Hah, awesome. Gold medal on titanic is a nice achievement 🙂

Thanks Jonathan, I appreciate that

daring sundial Aug 9, 2023, 4:24 AM

#

My past experience with Titanic was great. It was what nudged me towards getting a Notebooks Expert title

desert epoch Aug 14, 2023, 7:12 PM

#

hello dear kagglers, I'm a complete beginner 101 and happy to join the titanic competition. any Godfather to mentor me? any help will be appreciated

final cliff Aug 15, 2023, 5:10 PM

#

Hi Frank, your test post went through just fine. Going to delete it now to keep the channel on topic.

strong pumice Aug 16, 2023, 12:28 AM

#

@desert epoch have you started the titanic exploration yet? I can help

golden schooner Aug 16, 2023, 8:49 AM

#

Hello everyone, I am a baby Kaggler it would be great if someone could guide me with the Titanic challenge

strong pumice Aug 16, 2023, 4:40 PM

#

@golden schooner I can guide

#

@golden schooner if you are still looking for assistance maybe we can link up and go over some things. I use Rstudio

golden schooner Aug 16, 2023, 4:41 PM

#

strong pumice @golden schooner I can guide

Great

#

So how do you want to link up? Should we have a Google meet?

#

I'm not familiar with Rstudio but I can catch up

strong pumice Aug 16, 2023, 4:41 PM

#

@golden schooner yes yes.

#

@golden schooner what program script are you familiar with

golden schooner Aug 16, 2023, 4:42 PM

#

Python

strong pumice Aug 16, 2023, 4:43 PM

#

@golden schooner okay no problem now is this your first time using Kaggle

golden schooner Aug 16, 2023, 4:44 PM

#

Yes it is ...

final cliff Aug 16, 2023, 6:54 PM

#

Thanks for helping out, @strong pumice !

strong pumice Aug 16, 2023, 6:59 PM

#

@golden schooner I am available now

#

@final cliff no problem at all

robust rivet Aug 16, 2023, 8:04 PM

#

Has anyone tried k-means on the titanic dataset, I haven't seen it anywhere.

ionic arrow Aug 17, 2023, 3:26 AM

#

robust rivet Has anyone tried k-means on the titanic dataset, I haven't seen it anywhere.

I have tried it, but score around 6.5

#

I wonder whether it would work if I used the Name feature to find people in the same family?

robust rivet Aug 17, 2023, 9:54 AM

#

ionic arrow I have tried it, but score around 6.5

yeah, for clustering, but they may already have a cluster within themselves.

safe quarry Aug 17, 2023, 5:00 PM

#

robust rivet Has anyone tried k-means on the titanic dataset, I haven't seen it anywhere.

I have, the MI score was really low though so I don't think it's that useful. I got similar predictions from my model when giving a random number for the cluster.
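
The check described above (adding a k-means cluster label as a feature and measuring its mutual information with the target) could look roughly like this. It is a minimal sketch on synthetic stand-in data: the `Age`/`Fare` values, the random labels, and the choice of 4 clusters are all assumptions for illustration, not anything posted in this thread.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric Titanic columns (hypothetical values).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({"Age": rng.uniform(1, 80, n), "Fare": rng.uniform(5, 200, n)})
y = rng.integers(0, 2, n)  # random labels, so the cluster should carry little signal

# k-means is distance based, so scale the columns before clustering.
X_scaled = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)

# Mutual information between the (discrete) cluster label and the target.
mi = mutual_info_classif(
    clusters.reshape(-1, 1), y, discrete_features=True, random_state=0
)
print("MI of cluster feature:", mi[0])
```

An MI score close to zero suggests the cluster label tells the model little about the target, which would match getting similar predictions from a random number.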

#

Hi I'm just wondering if a no-cheated accuracy score of 83% is considered good (as in I should be proud) for the Titanic Competition or if It's average, high average, low, bad etc. I just wanna know if I should keep working on it

rough mortar Aug 18, 2023, 8:14 PM

#

Hello to everyone, I have recently written an article on hackernoon that is about analyzing titanic dataset, i hope you'll like it. https://hackernoon.com/how-likely-was-one-to-survive-on-the-titanic

How Likely Was One to Survive on the Titanic? | HackerNoon

Only 38% of the passengers survived this devastating event, prompting me to wonder about the individuals who were aboard the Titanic that fateful night.

final cliff Aug 18, 2023, 8:37 PM

#

Great work, @rough mortar. Thanks for sharing!

rough mortar Aug 18, 2023, 8:43 PM

#

final cliff Great work, @rough mortar. Thanks for sharing!

Thanks

pastel cosmos Aug 20, 2023, 10:32 AM

#

Hi everyone, I saw in one of the posted notebooks for this competition where the creator of that notebook changed values into categories before fitting models.

For example, the values of the age feature were changed to 1, 2, 3, 4 where 1 was the youngest and 4 was the oldest and the age feature was dropped entirely.
Is this a recommended practice for numeric columns? Or is it a different way of normalizing data? Can I leave the age feature as-is? Or is this just one of many ways that can be tried before fitting?

ionic arrow Aug 20, 2023, 11:58 AM

#

pastel cosmos Hi everyone, I saw in one of the posted notebooks for this competition where the...

I think that feature can be turned into an ordinal feature so it will be more meaningful than a non-ordinal one
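
For reference, turning a numeric column into ordered groups 1 to 4 can be sketched with `pd.qcut`. This is only an illustration under assumptions: the toy ages and the choice of four equal-sized bins are made up, and the real `Age` column has missing values that would need filling before the integer cast.

```python
import pandas as pd

# Toy ages (hypothetical); the real column would be train_data["Age"] after imputation.
ages = pd.Series([2, 10, 20, 30, 40, 50, 60, 70], name="Age")

# Four quantile-based bins labelled 1 (youngest) to 4 (oldest).
age_bin = pd.qcut(ages, q=4, labels=[1, 2, 3, 4]).astype(int)

print(pd.DataFrame({"Age": ages, "AgeBin": age_bin}))
```

Binning is one option among several. Tree-based models can usually work with the raw numeric column as well, so trying both and comparing validation scores is a reasonable way to decide.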

fiery marsh Aug 21, 2023, 10:35 AM

#

Am I calculating a score right?

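from sklearn.ensemble import RandomForestClassifier

# Drop the label and the ID / free-text columns from the features
X_train = train_data.drop(["Survived","Ticket","PassengerId","Name","Cabin"], axis=1)
Y_train = train_data["Survived"]
X_test = test_data.drop(["PassengerId","Name","Cabin","Ticket"], axis=1).copy()
random_forest = RandomForestClassifier(max_depth=4)
random_forest.fit(X_train, Y_train)
# Accuracy on the same rows the model was fitted on (training accuracy)
acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)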

I checked a score based on X_train and Y_train.
Should I use X_test and Y_test from test.csv?
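
On the question above: `test.csv` comes without a `Survived` column, so there is no `Y_test` to score against locally. A common alternative is to hold out part of the training data and score on that. Below is a minimal sketch on synthetic stand-in data; the column values, the label rule, and the 80/20 split are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a few numeric Titanic columns (hypothetical values).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 200, n),
})
# Label loosely tied to Sex and Pclass, plus some noise.
y = (((X["Sex"] == 1) & (X["Pclass"] < 3)) | (rng.random(n) < 0.2)).astype(int)

# Hold out 20% of the labelled rows; the model never sees them during fit.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(max_depth=4, random_state=0)
model.fit(X_tr, y_tr)

train_acc = model.score(X_tr, y_tr)   # optimistic: data the model has seen
val_acc = model.score(X_val, y_val)   # closer to what a submission would score
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
```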

finite ermine Aug 21, 2023, 2:18 PM

#

Hey, I'm new in ML and I'm working on Titanic competition. What is the best possible way to deal with the missing cabin values? I don't think I should drop the rows because there is a lot of missing values.
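
One way to keep the rows while still using `Cabin` is to turn the missingness itself into a feature, and optionally keep the deck letter. This is a minimal sketch on a toy frame; the `HasCabin` and `Deck` column names and the `"U"` placeholder for unknown are hypothetical choices, not a recommendation from this thread.

```python
import numpy as np
import pandas as pd

# Toy Cabin values (hypothetical); most real rows are missing.
df = pd.DataFrame({"Cabin": ["C85", np.nan, "E46", np.nan, "B57 B59"]})

# 1 if a cabin was recorded, 0 otherwise.
df["HasCabin"] = df["Cabin"].notna().astype(int)

# First character is the deck letter; "U" marks unknown.
df["Deck"] = df["Cabin"].str[0].fillna("U")

print(df)
```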

mint path Aug 22, 2023, 6:51 PM

#

Survival rates by age did correlate to children first, but survival rates by fare group also show that the expensive titanic seats were prioritized on the lifeboats. More insights from @rough mortar's HackerNoon story about his Titanic entry: https://hackernoon.com/how-likely-was-one-to-survive-on-the-titanic

Screenshot_2023-08-22_at_12.48.26_PM.png

Screenshot_2023-08-22_at_12.48.37_PM.png

How Likely Was One to Survive on the Titanic? | HackerNoon

Only 38% of the passengers survived this devastating event, prompting me to wonder about the individuals who were aboard the Titanic that fateful night.

final cliff Aug 22, 2023, 7:14 PM

#

mint path Survival rates by age did correlate to children first, but survival rates by far...

Hey HackerNoon, glad to have you here! Thanks for sharing! 👋

vagrant pebble Aug 23, 2023, 7:26 PM

#

Thank you

eternal mantle Aug 24, 2023, 4:07 PM

#

is it too late to start this competition? i just found out about it

sinful notch Aug 24, 2023, 4:27 PM

#

It’s always available

#

And it has a two-month rolling leaderboard

eternal night Aug 24, 2023, 9:41 PM

#

@strong pumice please help me on the Titanic competition

strong pumice Aug 24, 2023, 11:29 PM

#

Okay

minor turtle Aug 25, 2023, 1:29 AM

#

Hey! I am up for that too, if you want

ionic arrow Aug 25, 2023, 3:32 AM

#

Can someone suggest the best model for this competition?

sinful notch Aug 25, 2023, 7:23 AM

#

^^

#

Keep getting 60-70% with sklearn’s LogisticReg and SVC

eternal night Aug 25, 2023, 10:17 AM

#

@strong pumice Give your time and online place.

eternal night Aug 25, 2023, 6:48 PM

#

@minor turtle thank you for showing interest in helping me with the competition. @strong pumice and I have agreed on a date for the meeting. Perhaps it would be super if you were copied in.

tepid lark Aug 25, 2023, 9:00 PM

#

Hey guys! Saransh this side, whats up?

north pagoda Aug 26, 2023, 6:09 AM

#

If anyone wants to finish this competition from scratch.
I have released my notebook that uses "Polynomial Regression from scratch"
https://www.kaggle.com/code/jackksoncsie/polynomial-regression-from-scratch

Polynomial Regression from scratch

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

meager flare Aug 26, 2023, 1:23 PM

#

sinful notch Keep getting 60-70% with sklearn’s LogisticReg and SVC

What about Neural Networks?

sinful notch Aug 26, 2023, 2:44 PM

#

meager flare What about Neural Networks?

I’m trying to learn ML from the ground up rn and I’m just avoiding NN/DL for now. Just classical ML & ensemble methods.

#

Imo I’ve heard ppl say NN is overpowered for structured tabular data like this one

desert epoch Aug 26, 2023, 2:58 PM

#

north pagoda If anyone wants to finish this competition from scratch. I have released my note...

Damn, even though I had some issues understanding stuff from that notebook and would still just import the models from the scikit-learn library, the fact that I was even able to follow it through somewhat tells me that I am in fact learning Python and ML. Not at a high level, but I am learning, and that is what matters!

#

If you showed me this notebook like a month ago, I wouldn't have understood like ANYTHING at all

steel thorn Aug 26, 2023, 4:01 PM

#

how to solve titanic data for accuracy 1.0

sinful notch Aug 26, 2023, 5:18 PM

#

^^

mild echo Aug 26, 2023, 5:24 PM

#

Hello all, I'm Tamunotonye Samuel Solomon Inioribo,

I am new to Kaggle competition and would like to be part of the Titanic. Though I have done a couple of solo projects on ML, I will be glad to be part of a team for this... I can work on R-studio, Jupyter and other IDEs.

Thanks

fringe mesa Aug 29, 2023, 7:52 PM

#

hi+

desert epoch Aug 29, 2023, 9:00 PM

#

hello

hearty veldt Aug 30, 2023, 5:54 PM

#

I couldn't find any way to create a team for the Titanic competition. I read the docs but I cannot find the Team Tab or Team section. Can someone help me by pointing out the link or the button to create a team?

swift igloo Aug 30, 2023, 7:35 PM

#

https://www.kaggle.com/competitions/titanic/team

Titanic - Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

#

Let me know if this helps @hearty veldt

hearty veldt Aug 31, 2023, 11:54 AM

#

swift igloo Let me know if this helps @hearty veldt

Hi ! thanks ! but how can I make a team? We are three developers who want to work as a team. We couldn't find how to define or how to make the team. I am sure that this is a basic question, but we can't figure out how to solve it

plush musk Aug 31, 2023, 12:27 PM

#

hearty veldt Hi ! thanks ! but how can I make a team, we are three developers that we want to...

Create a notebook, and while saving, you get the option to keep it private. In that tab, enter your team members as the collaborators (with permission to view, edit). Hope this helps!

hearty veldt Aug 31, 2023, 2:23 PM

#

plush musk Create a notebook, and while saving, you get the option to keep it private. In t...

Thanks a lot ! I could do it

small crescent Aug 31, 2023, 8:19 PM

#

https://www.kaggle.com/code/vanessah26/titanic-79-accuracy-using-rfc
Hi everyone, I'm a CS student and started learning about data science this Summer. I joined Kaggle in the middle of August and I've been learning a lot from this community! I just finished my first competition. Please check out my first notebook and give some advice or feedback.
Much appreciated :) Happy data analyzing!

Titanic - 79% accuracy using RFC

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

pulsar sail Sep 2, 2023, 3:48 AM

#

Hello Everyone . Great to be finally active on Kaggle
https://www.kaggle.com/code/aniketsiraswal/titanic-machine-learning-from-disaster
85 accuracy using Logistic-Regression Model

Titanic - Machine Learning from Disaster

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

dim ruin Sep 2, 2023, 8:52 AM

#

🎀 Hello everyone;

In this analysis, I explore the Titanic dataset through Exploratory Data Analysis (EDA), conduct statistical analysis, and build predictive models to understand and predict passenger survival on the Titanic. This project incorporates Kaggle's Titanic dataset for comprehensive insights and predictions .

Please check out my notebook and give some advice or feedback. If you like it, please don't forget to upvote it. Happy data analyzing!

🔗 https://www.kaggle.com/code/huseyincenik/titanic-eda-statistical-analysis-and-prediction

Titanic : EDA ,Statistical Analysis and Prediction

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

grim hornet Sep 3, 2023, 6:58 AM

#

Hello everyone,
I am new to kaggle, today itself I participated in titanic challenge and want someone to guide me. I am really a novice to the field of machine learning.

ionic arrow Sep 3, 2023, 7:22 AM

#

My first notebook, just code but I spent many hours to get good accuracy, I want to share it with all of you. I hope it will be useful for you in this competition! Please review it and give me feedback so I can improve it! Thank you so much
https://www.kaggle.com/code/hoanglongroai/79-accuracy-from-titanic-disaster#Classification-model

79% Accuracy from Titanic Disaster

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

dim ruin Sep 3, 2023, 12:15 PM

#

ionic arrow My first notebook, just code but I spend many hours to have good accuracy, I wan...

Hello @ionic arrow . Your notebook is very good for machine learning. If you want to improve the notebook, you can try to ignore the warnings.

ionic arrow Sep 3, 2023, 12:19 PM

#

dim ruin Hello @ionic arrow . Your notebook is very good for machine learning...

Thank you for your advice 🙏🏻, I will fix that 🥰

fluid ocean Sep 5, 2023, 11:20 PM

#

Hi all, I have absolutely no idea how to start this challenge. Any advice on the learning material that I should use?

ionic arrow Sep 6, 2023, 4:46 AM

#

fluid ocean Hi all, I absolutely have no idea how to start this challenge any advices on the...

Try my notebook https://www.kaggle.com/code/hoanglongroai/79-accuracy-from-titanic-disaster#Classification-model

79% Accuracy from Titanic Disaster

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

fervent orchid Sep 6, 2023, 8:34 PM

#

# Define the bin edges and labels

bin_edges = [0, 10, 25, 45, 55, 100]  # Define your desired age bins
bin_labels = ['Children', 'Young', 'Adult', 'Late Adult', 'Old']

# Use pd.cut() to bin the Age column

train_data['AgeGroup'] = pd.cut(train_data['Age'], bins=bin_edges, labels=bin_labels)

fluid ocean Sep 8, 2023, 2:08 PM

#

fervent orchid # Define the bin edges and labels bin_edges = [0, 10, 25, 45, 55, 100] # Define...

do you mean I need categories for the age to start?

#

I dropped the following columns because I think they are completely unnecessary and cannot be used: Cabin, Ticket, Name. Am I right here? Should I drop Embarked as well?

#

How do I determine which model and features to use? To me it seems like a regression task. Are there metrics that I can use?

grim granite Sep 8, 2023, 5:15 PM

#

My Titanic Solutions:

#

🔗 https://www.kaggle.com/code/touhidurrr/predict-survival-in-titanic-with-decision-forests

Predict Survival in Titanic with Decision Forests

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

#

🔗 https://www.kaggle.com/code/touhidurrr/predict-survival-in-titanic-with-deep-learning

Predict Survival in Titanic with Deep Learning

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

glad stone Sep 9, 2023, 4:42 AM

#

My Titanic solution notebook: with hyperparameter tuning, accuracy is 94%.
Please evaluate it ....
🔗https://www.kaggle.com/code/harshpatelind13/titanic-machine-learning-from-disaster-13

Titanic-Machine Learning from Disaster_13

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

grim granite Sep 9, 2023, 3:21 PM

#

Thanks @glad stone , I have read and liked your Notebook.
Can you explain to me the usage of some of these Classes like:

from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score,classification_report

What does StandardScaler do and how does KNeighborsClassifier work? I would like to know more about them.

chrome rain Sep 9, 2023, 5:44 PM

#

glad stone My Titanic solution Notebook ,by Hyperparameter tuning accuracy is 94%. Please e...

I got surprised when you said you are getting 94% accuracy. I checked your notebook and found that what you are doing is incorrect. Your 'y_test' is not the ground truth, you have taken it from the gender submission file which is not correct. You need to submit your model to the competition to get the correct accuracy.

#

If you want to tune your hyperparameters, then split the initial data set into train and test, for example, like some 8:2 ratio and then perform tuning on that test dataset.
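
The split-then-tune idea above can be sketched as follows. It is a minimal example under assumptions: the data is synthetic (`make_classification`), and the random forest, the `max_depth` grid, and the 5-fold setting are illustrative choices rather than anything from the notebook being discussed. Here the tuning itself uses cross-validation inside the 80% part, and the 20% part is only touched once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the labelled training data.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# 8:2 split of the labelled data; the 20% part stays untouched during tuning.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Try a few depths with 5-fold cross-validation on the 80% part only.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 4, 6]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_tr, y_tr)

# Accuracy of the best setting on rows it has never seen.
holdout_acc = grid.score(X_te, y_te)
print("best params:", grid.best_params_, "hold-out accuracy:", round(holdout_acc, 3))
```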

glad stone Sep 9, 2023, 6:42 PM

#

chrome rain I got surprised when you said you are getting 94% accuracy. I checked your noteb...

@chrome rain Thank you for evaluating it and identifying my error 🫡.

glad stone Sep 9, 2023, 6:43 PM

#

chrome rain If you want to tune your hyperparameters, then split the initial data set into t...

I have worked on it and now its accuracy comes out 82.6%.
🔗https://www.kaggle.com/code/harshpatelind13/titanic-machine-learning-from-disaster-13

Titanic-Machine Learning from Disaster_13

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

keen kayak Sep 9, 2023, 7:49 PM

#

grim granite Thanks @glad stone , I have read and liked your Notebook. Can you exp...

StandardScaler is used to rescale numeric features so that they have mean=0 and variance=1. Helps avoid giving undue importance to features with large magnitude over those with small magnitude. More details in the scikit-learn docs https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling.

scikit-learn

6.3. Preprocessing data

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream esti...
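
A small illustration of the rescaling described above, using made-up `Age` and `Fare` values (the numbers are assumptions for the example). After `fit_transform`, each column has mean 0 and standard deviation 1. This matters for distance-based models such as `KNeighborsClassifier`, where an unscaled large-magnitude column would otherwise dominate the distance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns on very different scales: Age and Fare (toy values).
X = np.array([
    [22.0, 7.25],
    [38.0, 71.28],
    [26.0, 7.92],
    [35.0, 53.10],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (x - column mean) / column std

print("column means:", X_scaled.mean(axis=0))
print("column stds: ", X_scaled.std(axis=0))
```

On real data the scaler would be fitted on the training rows only and then applied to the test rows with `scaler.transform`.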

grim granite Sep 9, 2023, 10:01 PM

#

keen kayak `StandardScaler` is used to rescale numeric features so that they have mean=0 an...

Oh, so it is similar to keras.Normalizer()

#

Also, I was wondering how this scaler function treats categorical values or mapped categorical values.

#

For example, I converted male, female to 0, 1 and keras.Normalizer() was treating it as -0.6-ish value and +1.4-ish value.

#

Is that ok?

grim granite Sep 9, 2023, 10:08 PM

#

glad stone @chrome rain Thank you for evaluating it and identifying my error 🫡.

I was also confused as for how you got the accuracy values without submitting the results. It seems like I was right. I didn’t notice that you were using gender_submission.csv though.
82.6% is still like 4% better than me. Did you submit it this time?

#

Ok, I just checked your Notebook and it says your score is 0.76555, which means the accuracy is 76.5%.

I think you are not clear about how accuracy works. The accuracy you get during training is heavily biased and means little even if it is 100%. We cannot decide that much from it since feeding the data a model has already seen and calculating accuracy based on that is heavily biased.

For example, if you only do the example problems from your textbook, and to test your skills I give you the same problems you always do, the scores are heavily biased even if you get 100%, because you could just memorise the answers and write them down as is. Only when you are given new, unseen math can we be sure that your skills apply generally to any math of a similar nature that you have never seen.
That is why training and testing data is kept different. You cannot judge a model by the data it has already seen. You have to let it predict results based on new unseen data to get a sense of prediction ability.

So, you should always submit your model's predictions before telling others your model's accuracy.

keen kayak Sep 10, 2023, 7:53 AM

#

grim granite Also, I was wondering how does this scaler function treats categorical values or...

Yes, StandardScaler and a keras Normalization layer have a similar purpose.

I don't know whether it's okay to rescale mapped categorical values. I usually separate numeric & categorical preprocessing e.g. with ColumnTransformer.
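
Separating numeric and categorical preprocessing with `ColumnTransformer` can be sketched like this. The toy frame and the choice of `OneHotEncoder` for `Sex` are assumptions for illustration. The numeric columns are standardized while the categorical column becomes plain 0/1 indicator columns that are not rescaled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows with two numeric columns and one categorical column.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Sex": ["male", "female", "female", "male"],
})

pre = ColumnTransformer(
    [
        ("num", StandardScaler(), ["Age", "Fare"]),                # scale numeric only
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex"]),  # 0/1 indicators
    ],
    sparse_threshold=0,  # always return a dense array
)

X = pre.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 2 one-hot columns
```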

tender shale Sep 10, 2023, 8:56 AM

#

Hello! How do you interpret parch and sibsp in the dataset? I'm having trouble interpreting them because they are only numbers. For example, since the majority of the passengers have a parch of 0, does that mean that all of them have a nanny accompanying them, or are they alone? What about adults whose parch is 0? Is it a nanny?
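
For context, the competition's data dictionary defines `sibsp` as the number of siblings or spouses aboard and `parch` as the number of parents or children aboard, and notes that some children travelled only with a nanny, which is why their `parch` is 0. So a `parch` of 0 just means no parent or child was aboard. A common way to use the two counts together is sketched below on toy rows; the `FamilySize` and `IsAlone` names are hypothetical feature names.

```python
import pandas as pd

# Toy rows (hypothetical values).
df = pd.DataFrame({"SibSp": [1, 0, 3, 0], "Parch": [0, 0, 2, 1]})

# Relatives aboard plus the passenger themselves.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# 1 if the passenger has no recorded relatives aboard.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

print(df)
```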

grim granite Sep 10, 2023, 9:57 AM

#

keen kayak Yes, `StandardScaler` and a keras `Normalization` layer have a similar purpose....

Thanks! Anyways, Can you look into my Notebooks and give me some advice for how to increase their accuracy?

#

I am currently sad that my accuracy does not go beyond 80% for titanic.

#

Here is the link: #❓┊ask-a-question message

fluid ocean Sep 11, 2023, 12:30 AM

#

My titanic solution got 74%: https://github.com/valimikayilov/Titanic_ML

GitHub

GitHub - valimikayilov/Titanic_ML: Titanic - Machine Learning from ...

Titanic - Machine Learning from Disaster Challenge attempt on kaggle - GitHub - valimikayilov/Titanic_ML: Titanic - Machine Learning from Disaster Challenge attempt on kaggle

soft patio Sep 11, 2023, 5:05 AM

#

Do deep learning techniques work better for the titanic or do simpler techniques like random forest work better?

#

I was able to achieve around 80% accuracy with a random forest and hyperparameter tuning

#

would I get better result using a simple ANN, or would I be better off just using the random forest from sklearn?

coarse plume Sep 11, 2023, 6:32 AM

#

you got 80% on the submission?

strange ridge Sep 11, 2023, 7:46 AM

#

Hi I am starting titanic competition so I need a team for it

soft patio Sep 11, 2023, 12:47 PM

#

coarse plume you got 80% on the submission?

No 80 percent on a testing set

#

Using stratified split

safe orbit Sep 11, 2023, 3:52 PM

#

Hey guys, I got 0.78229 on my first submission. Could anyone look over my code and offer some suggestions? This is my first ML project and I want to also make a YouTube video on how I built it out and such, I know I still need to leave a lot more comments/documentation and clean up a few sections

https://www.kaggle.com/ryannolan1/titanic-wip-78-accuracy

Titanic WIP 78% Accuracy

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

keen jasper Sep 11, 2023, 9:51 PM

#

Hello Everyone😁
I just finished working on this competition and actually enjoyed it very much!
in my notebook I focused on EDA, feature engineering, and diagnosing missing values.
Feedback is an essential step for learning, so I would love to hear your input, guys!

https://www.kaggle.com/code/leen98/eda-and-feature-engineering-the-titanic-sinking

EDA and feature engineering the Titanic sinking

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

strange canyon Sep 13, 2023, 4:33 PM

#

Hello everyone,
I got a 0.559 on my first submission. I tried to use variations of the dataset but I wasn't able to increase the score. What are key factors to look out for in the data preparation which could increase the performance of the model? I use a RandomForestClassifier and the accuracy on the train data is around 0.9, which confuses me even more because I don't understand how the difference between the accuracy on the training data (0.9) and the test data (0.559) can be so big. I hope someone can help me with these problems!
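
A gap like this is typical when a flexible model is scored on the rows it was trained on. Cross-validation gives a less optimistic estimate before submitting. The sketch below uses synthetic, deliberately noisy data (`make_classification` with `flip_y=0.2`), so the numbers are assumptions for illustration only. A submission score far below the cross-validated one may also point to the test set being preprocessed differently from the training set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Noisy synthetic data: 20% of the labels are flipped at random.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, flip_y=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Each fold is scored on rows the model did not see while fitting.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# Training accuracy: scored on the same rows used for fitting.
train_acc = model.fit(X, y).score(X, y)

print(f"train accuracy: {train_acc:.3f}")
print(f"5-fold CV accuracy: {cv_scores.mean():.3f}")
```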

wide shoal Sep 13, 2023, 7:49 PM

#

strange ridge Hi I am starting titanic competition so I need a team for it

count me in!

sturdy bridge Sep 14, 2023, 3:43 AM

#

i saw a zero score after submission is it even possible

safe orbit Sep 14, 2023, 4:41 PM

#

Probably formatted it wrong
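
For reference, the submission file is a CSV with exactly two columns, `PassengerId` and `Survived`, with one row per passenger in `test.csv` and `Survived` written as 0 or 1. A minimal sketch with made-up ids and predictions (the values below are placeholders, not real output):

```python
import pandas as pd

# Placeholder ids and predictions; the real ids come from test_data["PassengerId"].
test_ids = pd.Series(range(892, 897), name="PassengerId")
preds = [0, 1, 0, 0, 1]

submission = pd.DataFrame({"PassengerId": test_ids, "Survived": preds})
submission["Survived"] = submission["Survived"].astype(int)  # 0/1, not floats or booleans

# index=False keeps the pandas row index out of the file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Writing to disk would use `submission.to_csv("submission.csv", index=False)`. Misaligned ids, an extra index column, or predictions stored as floats or strings are worth checking when a score looks wrong.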

strange ridge Sep 14, 2023, 8:15 PM

#

need partners for this project

safe orbit Sep 15, 2023, 12:52 AM

#

Let me know what you think, I recorded a full 2 hour walkthrough of my code: https://www.youtube.com/watch?v=6IGx7ZZdS74&ab_channel=RyanNolanData

YouTube

Ryan Nolan Data

Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)

Welcome to my data science journey through the Kaggle Titanic - Machine Learning from Disaster Project!

In this video, we'll dive deep into the world of data analysis, feature engineering, and machine learning to predict passenger survival rates on the Titanic.

As Kaggle states: "The competition is simple: use machine learning to create a mode...

#

I want to make a part 2 with improvements, so if you see any mistakes or ways I can make it better please lmk

#

Notebook: https://www.kaggle.com/code/ryannolan1/titanic-wip-9-12

manic sentinel Sep 16, 2023, 2:59 PM

#

hello everyone,
I got 76.79% accuracy on Titanic Competition.

link :- https://www.kaggle.com/code/dinanksoni/titanic-76-79

Titanic 76.79%

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

vapid nymph Sep 17, 2023, 3:35 PM

#

Hey everyone, I scored a 77% on the Titanic Dataset using a random forest. I'm seeing that the top scorers on the leaderboard have a perfect 100%. Would this count as overfitting the data? Is it possible to actually score a 100%?

sleek minnow Sep 17, 2023, 5:29 PM

#

vapid nymph Hey everyone, I scored a 77% on the Titanic Dataset using a random forest. I'm s...

No it’s impossible, they cheat

signal solar Sep 17, 2023, 5:30 PM

#

vapid nymph Hey everyone, I scored a 77% on the Titanic Dataset using a random forest. I'm s...

Let's just say if Olympic sank the next day it would not be 100% :)

tough portal Sep 17, 2023, 6:03 PM

#

Hello Kagglers!! I have just joined Kaggle and this is my very first competition. I was following the tutorial to get started but when I copied the code for women and men who survived to find out the percentage it is showing me an error. I exactly copied it from the tutorial. Can anyone help with this?

vast summit Sep 20, 2023, 12:43 AM

#

https://www.kaggle.com/code/nishitkaul88/titanic-solution-first

Titanic_Solution_first

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

#

completed the challenge...... hope you guys find this helpful

ivory magnet Sep 26, 2023, 1:40 AM

#

tough portal Hello Kagglers!! I have just joined Kaggle and this is my very first competition...

Which tutorial were you using?

#

And which error is it showing to you?

tough portal Sep 26, 2023, 8:25 AM

#

It’s working now

#

Thank you

near peak Sep 26, 2023, 11:17 AM

#

Hello all, i am new to kaggle and also to data science 🤪
starting with this competition now, are there any teams or is there anyone who would like to team up?

ebon sun Sep 28, 2023, 1:05 PM

#

I’ve found it’s so hard to get to .80 on the submission haha, but alas that’s my goal. Got to 0.78 last night. Anyone wanna chat to see if they have ideas to get my random forest to 0.80?

ebon sun Sep 29, 2023, 3:38 PM

#

https://www.kaggle.com/code/m000sey/random-forest4

#

Here's my 0.78229 score that, for the life of me, I can't improve. Let me know if there's anything I can do to push it forward.

near peak Sep 30, 2023, 5:41 PM

#

ebon sun I’ve found it’s so hard to get to .80 on the submission haha, but alas that’s my...

Did u try with any other model?

ebon sun Sep 30, 2023, 5:41 PM

#

near peak Did u try with any other model?

No, I was trying to get to 80 with only random forest, but if i can't crack the case, I might try another. Do you suggest any?

near peak Sep 30, 2023, 5:43 PM

#

ebon sun No, I was trying to get to 80 with only random forest, but if i can't crack the ...

Knn or support vector might give higher score, u can try once with that

ebon sun Sep 30, 2023, 5:44 PM

#

Have you used one of those models with this one?

rain nimbus Oct 1, 2023, 11:24 AM

#

near peak Hello all, i am new to kaggle and also to data science 🤪 starting with this com...

Would like to..🤝

cunning abyss Oct 1, 2023, 1:27 PM

#

what's the score for default submission (just copy/pasting the tutorial)?

near peak Oct 1, 2023, 4:09 PM

#

ebon sun Have you used one of those models with this one?

Yea i got better result with decision tree, but later i tried to change the hyper parameter and got score better for decision tree as well as for random forest (here i am talking about the individual score of the model and not the kaggle submission score 🤪)
I am still trying to improve my overall submission score

jovial bronze Oct 1, 2023, 6:41 PM

#

chatgpt told me you can get high 80s low 90s without cheating, is that true?

safe orbit Oct 1, 2023, 7:00 PM

#

If anyone needs a notebook to look at, just got 0.79: https://www.kaggle.com/ryannolan1/titanic-voting-classifier-0-78947

Titanic - Voting Classifier 0.78947

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

#

Releasing the video + notes next week

#

have the original vid still here: https://www.youtube.com/watch?v=6IGx7ZZdS74&t=3724s&ab_channel=RyanNolanData

YouTube

Ryan Nolan Data

Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)

Welcome to my data science journey through the Kaggle Titanic - Machine Learning from Disaster Project!

In this video, we'll dive deep into the world of data analysis, feature engineering, and machine learning to predict passenger survival rates on the Titanic.

As Kaggle states: "The competition is simple: use machine learning to create a mode...

obsidian pasture Oct 4, 2023, 2:14 AM

#

Hi, my score right now is 0.78 in the leaderboard, I am using deep learning, my question is if this dl approach is suitable for the challenge or is it better to use traditional Ml like random forest or similar ? thanks !

merry marlin Oct 4, 2023, 6:46 AM

#

how do I confirm my account? I can't find my country in the phone number codes to send the confirmation SMS

safe orbit Oct 4, 2023, 11:30 AM

#

@obsidian pasture try multiple models. A voting classifier gave me the best results although I didn’t use deep learning
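
A voting classifier combines several models and predicts from their pooled votes or averaged probabilities. The sketch below is only an illustration on synthetic data; the three base models, their settings, and soft voting are assumptions and not the configuration used in the notebook mentioned here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared Titanic features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

voter = VotingClassifier(
    estimators=[
        # Scale inside the pipeline for the models that are sensitive to feature scale.
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="soft",  # average predicted probabilities instead of counting hard votes
)

scores = cross_val_score(voter, X, y, cv=5, scoring="accuracy")
print("5-fold CV accuracy:", scores.mean().round(3))
```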

obsidian pasture Oct 6, 2023, 12:03 AM

#

safe orbit @obsidian pasture try multiple models. A voting classifier gave me the best ...

Thanks !

restive kestrel Oct 7, 2023, 11:48 AM

#

Is it possible for two ppl to use the same model but get different results?

whole kindle Oct 7, 2023, 12:16 PM

#

yes...

restive kestrel Oct 7, 2023, 2:09 PM

#

Has anyone gotten a score above 0.77511? If so, which method did you use?

safe orbit Oct 7, 2023, 3:57 PM

#

Yes I have a 0.79

#

And way possible to have different scores

#

@restive kestrel https://youtu.be/KzK1pifa2Vk?si=6umfhORMZyolBXTd

YouTube

Ryan Nolan Data

Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic...

Today we are taking a look at how I was able to improve my Titanic Kaggle score up to a 0.79, which was good enough for the top 9%.

I showcase all the code changes and what I would still improve on, if I had more time.

I'll be adding notes to the Kaggle Notebook if interested.

Kaggle Notebook: https://www.kaggle.com/code/ryannolan1/titanic-v...

restive kestrel Oct 7, 2023, 5:20 PM

#

thx @safe orbit

safe orbit Oct 7, 2023, 5:31 PM

#

Part 1 is also uploaded as well as all the model tutorials so check them all out

#

Housing predictions is being worked on now as well as writing scores

#

But writing vid will have to wait till Jan. I won’t win but I think it’s better that way

desert epoch Oct 8, 2023, 7:25 AM

#

Hi everyone! Is this the right page for the titanic competition? First time in Kaggle for me and not an expert on discord 🙃

safe orbit Oct 8, 2023, 12:14 PM

#

Yes

west bolt Oct 10, 2023, 8:47 PM

#

I wanted to get a sense of how good my result is - using NN with some hyperparameter tuning, scoring 78% tops on the leaderboard. I understand that random forest probably performs better on this dataset, and I'm not currently using ticket #/name to identify groups.

Is that a good score within those constraints, or does that indicate that there are issues with my architecture/feature engineering/etc?

steep edge Oct 12, 2023, 1:32 PM

#

https://www.youtube.com/watch?v=fATVVQfFyU0&t=2098s

YouTube

NeuralNine

Titanic Survival Prediction in Python - Machine Learning Project

In this video we build a model, which predicts titanic survivors with a decent accuracy.

Kaggle Challenge: https://www.kaggle.com/c/titanic

#

Was following this tut but during the applying of the pipeline for all the pre processing in the train set its giving me reshape error

lost ermine Oct 13, 2023, 5:42 PM

#

steep edge Was following this tut but during the applying of the pipeline for all the pre p...

try to see if you have all column name and size same

steep edge Oct 13, 2023, 5:43 PM

#

lost ermine try to see if you have all column name and size same

Yes everything is same as the video...I dont understand how this tut code always runs for them but whenever we try we face problems
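
One common cause of a reshape error in scikit-learn preprocessing (an assumption here, since the traceback isn't shown) is passing a 1-D column such as `df["Age"]` to a transformer that expects a 2-D table. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

X = pd.DataFrame({"Age": [22.0, np.nan, 26.0]})  # made-up values
imputer = SimpleImputer(strategy="mean")

# Single brackets give a 1-D Series, which scikit-learn rejects.
try:
    imputer.fit_transform(X["Age"])
    raised = False
except ValueError as err:
    raised = True
    print("1-D input failed:", str(err).splitlines()[0])

# Double brackets keep a 2-D DataFrame with one column, which works.
filled = imputer.fit_transform(X[["Age"]])
print(filled.shape)
```

If the shapes already look right, comparing library versions with the tutorial's is the next thing worth checking.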

wraith ridge Oct 24, 2023, 7:54 PM

#

I know I'm a bit new, but in order to submit my csv, I should just click "Submit Predictions" and upload my csv, right? I've tried 3 different browsers, different security settings, verified my account, and a different computer, but I can not click that button. Am I missing something?

gaunt cedar Oct 25, 2023, 11:30 AM

#

Hey everyone, I'm new to Kaggle. I've created a deep NN model for this project and I've got 80% accuracy on my validation set. However, I'm yet to fine tune the model so I'm hoping to get slightly better results. I wanted to ask if there's a way to check the bayes error for this project? What is the highest accuracy that has been achieved for this model without cheating. I saw the leaderboards where people have achieved accuracy = 1 , which is certainly not possible.

keen quartz Oct 26, 2023, 1:30 AM

#

gaunt cedar Hey everyone, I'm new to Kaggle. I've created a deep NN model for this project a...

How could someone cheat?

gaunt cedar Oct 26, 2023, 9:54 AM

#

keen quartz How could someone cheat?

I'm not sure how could someone cheat, but given the nature of the data, I don't think 100% accuracy is achievable with any model.

#

Would anyone currently working on this project like to discuss? I've done some fine tuning to the model and so far I'm at 82-83% accuracy. Today I'll build out multiple models with randomly sampled hyperparameter values and I'm targeting 85% accuracy.

safe orbit Oct 28, 2023, 8:32 PM

#

Ye a lot of people cheat top of leaderboard it’s stupid

#

On this project

#

Not all projects

safe orbit Oct 28, 2023, 8:56 PM

#

Also @gaunt cedar I’m currently learning PyTorch, would def like to see what you’re working on. I did standard ML models from scikit-learn, XGBoost, and a voting classifier, and it got a top 10% score

shadow cave Oct 30, 2023, 11:25 AM

#

Hello

#

I need help with the prediction model of Titanic survival data

arctic rune Oct 30, 2023, 12:01 PM

#

@shadow cave i might not be able to help you but can i know the issue ?

#

how to increase accuracy ?

#

i don't know where to start

shadow cave Oct 30, 2023, 12:14 PM

#

@arctic rune thank you for your response. I need help with the feature process of how you could take any value as a feature.

arctic rune Oct 30, 2023, 12:19 PM

#

shadow cave <@732644278097936466> thank you for your response. I need help with the feature ...

ok...i don't know how to do that

shadow cave Oct 30, 2023, 12:30 PM

#

Are you also a beginner?

arctic rune Oct 30, 2023, 1:08 PM

#

yes, very beginner

heavy raptor Oct 30, 2023, 6:55 PM

#

Hello @shadow cave and @arctic rune I just started working with this dataset yesterday after joining kaggle. If either or both of you are interested in working together message me and we can work through it.

shadow cave Oct 31, 2023, 2:35 AM

#

@heavy raptor ok

shadow cave Oct 31, 2023, 3:40 PM

#

@heavy raptor bro let me know i'm working on it. We can perform it together.

gaunt cedar Oct 31, 2023, 10:23 PM

#

safe orbit Also <@703714726701695075> I’m currently learning PyTorch, would def like to see...

@safe orbit sorry I wasn't available for the past few days. It's great to hear that you're working with pytorch. I myself am interested in learning pytorch for some reasons. Would love to see your working.

steep edge Nov 4, 2023, 3:34 AM

#

gaunt cedar Hey everyone, I'm new to Kaggle. I've created a deep NN model for this project a...

Bro over here I am getting exhausted just applying and learning the regular classifiers and you're making NN models?

gaunt cedar Nov 4, 2023, 4:07 AM

#

steep edge Bro over here I am getting exhausted just applying and learning the regular clas...

Lol I'm exhausted too bro. But good thing I've got to 85%+ accuracy.

quiet needle Nov 4, 2023, 1:21 PM

#

steep edge Bro over here I am getting exhausted just applying and learning the regular clas...

Same here everyone is talking about dl and staff while i am getting confused about confusion matrix

gaunt cedar Nov 4, 2023, 2:11 PM

#

quiet needle Same here everyone is talking about dl and staff while i am getting confused abo...

What's your project status of titanic?

steep edge Nov 4, 2023, 3:15 PM

#

quiet needle Same here everyone is talking about dl and staff while i am getting confused abo...

Confused about confusion matrix... 🙂

quiet needle Nov 4, 2023, 3:16 PM

#

gaunt cedar What's your project status of titanic?

I havent finished yet.

steep edge Nov 4, 2023, 3:16 PM

#

gaunt cedar Lol I'm exhausted too bro. But good thing I've got to 85%+ accuracy.

Can you tell me what is the advantage u get applying NN instead of normal Supervised learning?

gaunt cedar Nov 4, 2023, 4:03 PM

#

steep edge Can you tell me what is the advantage u get applying NN instead of normal Superv...

What do you mean by normal Supervised learning?

steep edge Nov 4, 2023, 4:05 PM

#

gaunt cedar What do you mean by normal Supervised learning?

Like just applying classifier algorithms after pre processing

gaunt cedar Nov 4, 2023, 4:06 PM

#

You need to elaborate a little. I'm also using classification using a neural network.

steep edge Nov 4, 2023, 4:08 PM

#

What i want to say is that what are the advantages ur getting using NN instead we can just use regular classification without NN

#

Why make it more complex?

gaunt cedar Nov 4, 2023, 4:11 PM

#

I'm assuming that you're asking why not just take the input features and pass them through a classification algorithm directly, let's say binary classification. And use the output that we get to make a prediction, right?

steep edge Nov 4, 2023, 4:12 PM

#

gaunt cedar I'm assuming that you're asking why not just take the input features and pass th...

Yes

gaunt cedar Nov 4, 2023, 4:12 PM

#

Okay, so here's a problem with that...

#

Let's say you have this data and you have to classify whether an input is 1 or 0...

#

All you do is train your algorithm using binary classification and it will learn a decision boundary (which is the straight line here) that separates the 2 classes. Now your input gets the value 1 or 0 depending on which side of the decision boundary it lies.

#

Makes sense?

steep edge Nov 4, 2023, 4:17 PM

#

Yes

#

This is a very basic problem got it

gaunt cedar Nov 4, 2023, 4:17 PM

#

Okay so now let's say I give you this dataset...

#

How do you draw a straight line (decision boundary) to separate the 2 classes?

steep edge Nov 4, 2023, 4:18 PM

#

We can simply use random forest with this

#

Why do we need a straight line? The distance of all points from the straight line determines the loss

gaunt cedar Nov 4, 2023, 4:19 PM

#

Yes you can, but what if you want to use classification algorithm instead.

steep edge Nov 4, 2023, 4:21 PM

#

All of these algorithms are classification algo

gaunt cedar Nov 4, 2023, 4:24 PM

#

gaunt cedar Yes you can, but what if you want to use classification algorithm instead.

Sorry I shouldn't have said classification algorithm here, I misunderstood your question.

gaunt cedar Nov 4, 2023, 4:25 PM

#

steep edge Can you tell me what is the advantage u get applying NN instead of normal Superv...

By this you meant use something like random forest instead of NN right?

steep edge Nov 4, 2023, 4:26 PM

#

gaunt cedar By this you meant use something like random forest instead of NN right?

Yes

gaunt cedar Nov 4, 2023, 4:27 PM

#

Never mind, I misunderstood what you were saying. But the answer to that is, yes you can. As far as I know, using something like random forest can be way more efficient here, both in compute and in time. And we may well get the same results as with a deep NN for this titanic dataset.

#

I'm just using NN for the sake of practicing.

steep edge Nov 4, 2023, 4:29 PM

#

Yes i knew it just wanted to know

#

Thnx

gaunt cedar Nov 4, 2023, 4:30 PM

#

no prob. So what's the status of your project?

steep edge Nov 4, 2023, 4:30 PM

#

Hmm, I am applying the algorithms; all preprocessing is done

gaunt cedar Nov 4, 2023, 4:31 PM

#

right.

winter mica Nov 6, 2023, 12:34 PM

#

Hey! I'm wrangling with the 'Cabin' data (or lack thereof) in the titanic set. I'm toying with the idea of playing detective, e.g. using ticket numbers or fare details to guess the missing cabins. Or maybe taking a shortcut by plugging in the most common cabin for each class as a starting point, but there are a lot of missing values. I'm curious about any other possible approaches. How would you handle this? Looking forward to your insights.
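
Two low-effort options that avoid guessing individual cabins, sketched on a made-up frame: flag whether a cabin was recorded at all, and keep only the leading deck letter with a placeholder for missing values. The column names `HasCabin` and `Deck` and the placeholder `"U"` are hypothetical choices.

```python
import pandas as pd

# Made-up rows shaped like the Titanic columns.
df = pd.DataFrame({
    "Pclass": [1, 3, 1, 3, 2],
    "Cabin": ["C85", None, "E46", None, "F33"],
})

# 1 if a cabin was recorded, 0 otherwise; missingness itself may carry signal.
df["HasCabin"] = df["Cabin"].notna().astype(int)

# First character of the cabin code is the deck; "U" marks unknown.
df["Deck"] = df["Cabin"].str[0].fillna("U")

print(df)
```

Whether either feature helps is something to check with cross-validation rather than assume.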

winter mica Nov 6, 2023, 9:32 PM

#

🤷‍♀️ Anyone there???

arctic rune Nov 7, 2023, 11:02 AM

#

winter mica 🤷‍♀️ Anyone there???

i'm here

winter mica Nov 7, 2023, 11:33 AM

#

arctic rune i'm here

sad_panda

arctic rune Nov 7, 2023, 11:47 AM

#

winter mica <:sad_panda:1138924108705431643>

why you sad

winter mica Nov 7, 2023, 1:33 PM

#

arctic rune why you sad

Nobody responded to my message....

arctic rune Nov 7, 2023, 1:34 PM

#

winter mica Nobody responded to my message....

I didn't respond to it because i didn't understand what you were saying and I don't have the knowledge to help you

winter mica Nov 7, 2023, 1:35 PM

#

arctic rune I didn't respond to it because i didn't understand what you were saying and I do...

Ahh I see... :(

steep edge Nov 7, 2023, 5:31 PM

#

winter mica Hey! I'm wrangling with the 'Cabin' data (or lack thereof) in the titanic set...

Feature engineering?

winter mica Nov 7, 2023, 6:11 PM

#

steep edge Feature engineering?

yeah

oblique perch Nov 7, 2023, 9:50 PM

#

what are titanic and spaceship titanic?

oblique perch Nov 7, 2023, 10:06 PM

#

so these are like simple datasets to start with as a beginner?

safe orbit Nov 8, 2023, 2:16 AM

#

yes

desert epoch Nov 8, 2023, 3:08 AM

#

.

safe orbit Nov 8, 2023, 11:57 AM

#

If you need help I made 2 vids

#

https://youtu.be/6IGx7ZZdS74?si=pAUb2ExALjp3r67b

#

https://youtu.be/KzK1pifa2Vk?si=oumSUHJFHgBibtwB

desert epoch Nov 8, 2023, 6:42 PM

#

THIS IS GREAT ! thank you!

safe orbit Nov 9, 2023, 6:49 PM

#

No problem

jovial bronze Nov 11, 2023, 6:48 PM

#

should you normalize categorical data being fed into a neural network like `df[column] = (df[column] - column_mean) / column_std`? the numbers are like 0 to 10... normalizing makes sense to me for like Age but for like Cabin prefix I don't know. except that it seems weird to not normalize some columns if I normalize others

winter mica Nov 12, 2023, 12:57 PM

#

jovial bronze should you normalize categorical data being fed into a neural network like `df[c...

Normalization is usually applied to numerical features for efficient training but is not typically needed for categorical features. It may seem a bit unusual, but it's entirely acceptable to apply normalization selectively based on the type of data you're working with. And for categorical features, especially one-hot encoded ones, the binary nature already provides a kind of normalization.
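
The selective approach described above can be sketched with scikit-learn's `ColumnTransformer`: scale the numeric columns and one-hot encode the categorical one. The tiny frame and the `Deck` column are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows; "Deck" stands in for something like a cabin prefix.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
    "Deck": ["U", "C", "U", "C"],
})

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["Age", "Fare"]),  # zero mean, unit variance
        ("cat", OneHotEncoder(), ["Deck"]),          # one 0/1 column per category
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

The first two output columns are the scaled numbers; the remaining columns are the 0/1 indicators, which are left unscaled.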

jovial bronze Nov 13, 2023, 6:41 PM

#

Right was just reading about one-hot encoding... I think I'll try that one out thank you

void birch Nov 14, 2023, 7:20 PM

#

Hi everyone, im a uni student trying to get started in data science. What sort of pre-requisite knowledge would I need to get started, specifically this (Titanic) competition?

gaunt mulch Nov 15, 2023, 1:36 PM

#

void birch Hi everyone, im a uni student trying to get started in data science. What sort o...

I think you should learn some basics of Python and sklearn, not too much but a little. After that, you need to take some machine learning courses, and Andrew Ng is a nice teacher.

twilit cloak Nov 17, 2023, 6:41 AM

#

Hii all I am Bhimana i am new to kaggle and ML competitions . Hope i will mingle with you soon

hollow brook Nov 17, 2023, 7:36 AM

#

I want to learn all about this machine learning and data analysis from where should I start?
I don't have any prior knowledge about coding stuff where I should start from...

sage grove Nov 18, 2023, 7:05 PM

#

Hey try starting with learning pandas and Numpy and some basic knowledge on statistics

safe orbit Nov 19, 2023, 12:43 PM

#

@hollow brook https://youtu.be/6IGx7ZZdS74?si=_7O8l1JJTPNHc8AE

#

This is my titanic project. Everything in this vid I’ve covered on my YouTube channel

urban girder Nov 19, 2023, 5:14 PM

#

Hi everyone, I have a question: how come there are so many 100 percent scores on the leaderboard? As a beginner, that sounds practically impossible to me unless the test data is exploited in some way. I have built many models with feature engineering, and 82 percent has been my highest score.

safe orbit Nov 19, 2023, 6:11 PM

#

They cheated

urban girder Nov 20, 2023, 2:59 AM

#

safe orbit They cheated

Yeah, that seems to be the case. Then why don't they get removed from the leaderboard?

safe orbit Nov 20, 2023, 11:01 AM

#

It’s a training competition

desert epoch Nov 21, 2023, 12:57 PM

#

In how many hours can this be done

obsidian pasture Nov 22, 2023, 2:25 AM

#

Hi everyone, I have a question not directly related to this competition but more about embedded models, I’ve heard that for most of the competitions to get a decent result (0.9) embedded models are the right way. Is that true ?

desert epoch Nov 22, 2023, 10:30 AM

#

desert epoch In how many hours can this be done

?

desert epoch Nov 23, 2023, 9:38 PM

#

I have a score of around 0.779 in LB. Does anyone have any idea on how to improve it?

ruby pagoda Nov 26, 2023, 7:07 PM

#

gridsearch

#

worked for me
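
A minimal sketch of a grid search with scikit-learn's `GridSearchCV`, assuming a random forest and a small made-up grid. The data is synthetic, so the printed numbers say nothing about Titanic itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for preprocessed features (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Every combination below is tried: 2 x 3 = 6 candidates.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation per candidate
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The best cross-validated score is usually a bit optimistic compared with the leaderboard, since the grid was picked using the same folds.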

flat rapids Nov 28, 2023, 3:47 AM

#

how is the name column given? is it with place and name?

scarlet mauve Nov 29, 2023, 8:09 PM

#

flat rapids how is the name column given? is it with place and name?

No, name column is distinct and consists of Mr/Ms name
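
For reference, `Name` values in the Titanic data follow a `Surname, Title. Given names` pattern, so the title can be pulled out with a regular expression. A small sketch on three names from the training set:

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
])

# Capture the text between the comma and the first period.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
print(titles.tolist())  # ['Mr', 'Mrs', 'Miss']
```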

#

Hi guys! I need some advice.

I have some knowledge of sklearn, pandas, numpy, and matplotlib, but I can't measure it. I want to understand where I am and how I can measure it.

Maybe you can recommend some courses that cover basic/intermediate/advanced use of these libs.

Thanks a lot!

cobalt grove Dec 4, 2023, 7:43 PM

#

Hello there !
I am new to data analysis
I have questions that let's say I download the Titanic data okay , what should I do ? let's say I calculated the median for the data what is the purpose ? I don't have the logic of the data analyst . Could someone please help me or give me resources to learn these concepts ?

cobalt grove Dec 4, 2023, 7:44 PM

#

safe orbit <@1157967075160117339> https://youtu.be/6IGx7ZZdS74?si=_7O8l1JJTPNHc8AE

I will watch that I think I will find answers for my questions
Thank you 🌹

safe orbit Dec 4, 2023, 7:47 PM

#

sweet, and I have 2 parts so should help

deep holly Dec 7, 2023, 5:33 PM

#

i've been getting 0.57177 for the kaggle submissions even if the .csv file had different results. anyone else experiencing this?

pastel ether Dec 8, 2023, 3:01 AM

#

hollow brook I want to learn all about this machine learning and data analysis from where sho...

You should learn python

lament sand Dec 8, 2023, 9:40 AM

#

hi

#

I want to learn Python with a person

toxic venture Dec 8, 2023, 12:56 PM

#

I've recently done the tutorial Titanic Competition, and wanted to redo it with an ML model. However, my model is now getting a 0 public score. Idk where I'm going wrong or how to test…

Here is the link to my notebook https://www.kaggle.com/code/abishekjayan/this-is-where-it-starts

This is where it starts

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster
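
When a submission scores 0 or never changes, it can help to check the file's shape before looking at the model. Kaggle's Titanic submission expects 418 rows with exactly two columns, `PassengerId` and `Survived`, where `Survived` is 0 or 1. The helper below is a hypothetical sketch, not part of any Kaggle tooling, and the three-row frames are stand-ins for the real files:

```python
import pandas as pd

def check_submission(sub, test_ids):
    """Return a list of problems; an empty list means the file looks sane."""
    if list(sub.columns) != ["PassengerId", "Survived"]:
        return ["columns must be exactly PassengerId, Survived"]
    problems = []
    if len(sub) != len(test_ids):
        problems.append(f"expected {len(test_ids)} rows, got {len(sub)}")
    if set(sub["PassengerId"]) != set(test_ids):
        problems.append("PassengerId values do not match the test set")
    if not pd.api.types.is_integer_dtype(sub["Survived"]):
        problems.append("Survived should be integers, not floats or booleans")
    elif not sub["Survived"].isin([0, 1]).all():
        problems.append("Survived must contain only 0 and 1")
    return problems

# Stand-in ids; in practice use test["PassengerId"] from test.csv.
test_ids = pd.Series([892, 893, 894])
good = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})
bad = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0.2, 0.9, 0.4]})

good_problems = check_submission(good, test_ids)
bad_problems = check_submission(bad, test_ids)
print(good_problems)
print(bad_problems)
```

Writing raw probabilities or `True`/`False` into `Survived` is one plausible way to end up with a surprising score, which is why the dtype check comes first.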

coarse cedar Dec 8, 2023, 3:21 PM

#

I just saw some code for titanic. It's too overwhelming. Is it always like that?

reef obsidian Dec 13, 2023, 1:52 AM

#

The titanic competition is one of the simpler datasets and codebases. Which code are you looking at? The top rated notebook is pretty simple to follow. It uses a RandomForestClassifier model to do the prediction.

marsh lava Dec 13, 2023, 8:31 AM

#

Hi, I made a notebook for beginners that builds a model using the OpenAI API.
https://www.kaggle.com/code/yutodennou/tips-open-interpreter-titanic
With this notebook, a good XGBoost model was generated automatically for the Titanic example.

✍Tips - Open-Interpreter_Titanic

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

sage grove Dec 13, 2023, 12:36 PM

#

cobalt grove Hello there ! I am new to data analysis I have questions that let's say I downlo...

Hey @cobalt grove i would like to answer your questions:

-First, try to understand the data given to you: what each feature tells you about the passenger, e.g. what SibSp means.
-Get the data type of each column and determine which columns are continuous and which are categorical.
-Check for null values and deal with them; SimpleImputer is the basic way to do this. Use the median for continuous data and the mode for categorical data.
-We use the median rather than the mean because it is less affected by outliers (a boxplot shows these).
-Then learn which graphs are used to compare which types of data.
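
The first few steps above can be sketched like this, assuming pandas and scikit-learn and a made-up five-row frame. The value 80 plays the outlier, to show why the median is the safer fill for `Age`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up rows shaped like two Titanic columns.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, 80.0],
    "Embarked": ["S", "C", None, "S", "S"],
})

print(df.dtypes)                # which columns are numeric vs. text
missing_before = df.isna().sum()
print(missing_before)           # null count per column

# The outlier drags the mean up to 40.75, while the median stays at 30.5.
age_mean = df["Age"].mean()
age_imputer = SimpleImputer(strategy="median")
df["Age"] = age_imputer.fit_transform(df[["Age"]]).ravel()

# Mode for the categorical column; SimpleImputer(strategy="most_frequent")
# is the scikit-learn counterpart of this pandas one-liner.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df)
```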

#

For prediction use ML algorithms

#

https://medium.com/@umesh.ramanathan2004/titanic-kaggle-competition-655c273f2dfa

Medium

Titanic Kaggle Competition

Hi all, the Titanic ship which was a British passenger, sank in the North Atlantic Ocean on 15 April 1912 after striking an iceberg during…

#

Refer to my Medium article for a detailed explanation

cobalt grove Dec 13, 2023, 12:56 PM

#

sage grove Hey <@608368964757618698> i would like to answer your questions: -First try to ...

🥹

sage grove Dec 13, 2023, 12:57 PM

#

Yeah dude any doubts just ping me on my dms

cobalt grove Dec 13, 2023, 12:57 PM

#

sage grove Yeah dude any doubts just ping me on my dms

Thank you 🌹

tidal goblet Dec 13, 2023, 2:27 PM

#

Hi friends 👋
I have recently completed my Data Science, machine learning and AI course. The final exam had 25 questions and I got 23/25 correct.
I have also played with the heart.csv dataset, the titanic dataset and other datasets from kaggle through google colab and a mobile terminal application.

tranquil shoal Dec 16, 2023, 12:13 PM

#

hey @everyone even i am in top 10% still i didnt get a bronze medal in the titanic competition whats the deal ?

tranquil shoal Dec 16, 2023, 12:14 PM

#

tranquil shoal hey @everyone even i am in top 10% still i didnt get a bronze medal in the titan...

my notebook : https://www.kaggle.com/code/ayeshairshadcoder/titanic

Titanic

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

runic finch Dec 17, 2023, 6:14 AM

#

tranquil shoal my notebook : https://www.kaggle.com/code/ayeshairshadcoder/titanic

Not sure, but it could be that medals aren't awarded on the titanic as it's a recurring/rolling competition.. just my guess though

twilit cloak Dec 21, 2023, 5:45 AM

#

Hello guys, I just started competitions on Kaggle with the Titanic problem. I am unable to understand which algorithm to apply here. How do I go about succeeding in learning with competitions?

sage grove Dec 21, 2023, 8:13 AM

#

Hey @twilit cloak You can refer to my kaggle notebook with full clear instructions given out there.

https://www.kaggle.com/code/umbro10/titanic-data-analysis-0-78-accuracy

Any doubts just ping me

Titanic Data Analysis - 0.78 Accuracy

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

marsh lava Dec 22, 2023, 3:31 AM

#

Hi, I made a notebook for beginners that builds a model using the Gemini API.
With this notebook, a good ensemble model was generated automatically for the Titanic competition.

https://www.kaggle.com/code/yutodennou/tips-auto-model-generate-by-gemini-api

✍Tips - Auto Model Generate by Gemini API

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

desert epoch Dec 27, 2023, 1:20 PM

#

Good Afternoon, I'm new to the whole competition thing, I wanted to know if I have to build the model from scratch.

buoyant frost Dec 27, 2023, 1:29 PM

#

No, you just use external libraries and call them

desert epoch Dec 27, 2023, 5:04 PM

#

how can I improve the accuracy ? i used logistic regression function which is built in sklearn.linear_model. when i counted the number of correctly classified examples and divided it by the total number of examples, i got 0.6067

desert epoch Dec 27, 2023, 5:04 PM

#

buoyant frost No, you just use external libraries and call them

thanks alot

buoyant frost Dec 27, 2023, 5:07 PM

#

desert epoch how can I improve the accuracy ? i used logistic regression function which is bu...

by doing a lot of things: changing or improving the way you do feature engineering, the parameters in your models, and how you tune hyperparameters
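
As a reference point before tuning: roughly 62% of passengers in the training set did not survive, so always predicting 0 already scores about 0.62, and an accuracy near 0.61 suggests the model is not yet beating that. A sketch of the comparison on synthetic data with a similar class balance (the data and the split are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with about 62% of rows in class 0 (assumption).
X, y = make_classification(
    n_samples=400, n_features=6, weights=[0.62, 0.38], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = accuracy_score(y_val, baseline.predict(X_val))
model_acc = accuracy_score(y_val, model.predict(X_val))
print(f"baseline: {baseline_acc:.3f}  logistic regression: {model_acc:.3f}")
```

If your model lands at or below the baseline, the usual suspects are unscaled or unencoded features and rows misaligned with their labels.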

wet drift Dec 31, 2023, 8:26 AM

#

Is there any way to see other's code? I really want to know what did the 1.0 accuracy one do differently?

gilded tusk Dec 31, 2023, 4:19 PM

#

It is supposedly impossible to reach an accuracy of 1.0, the guy is probably cheating to get that result.

safe orbit Dec 31, 2023, 8:52 PM

#

@wet drift 100% is impossible, but I got a top 10% score and shared my code

#

https://youtu.be/6IGx7ZZdS74

#

https://youtu.be/KzK1pifa2Vk

wet drift Jan 1, 2024, 1:57 AM

#

gilded tusk It is supposedly impossible to reach an accuracy of 1.0, the guy is probably che...

What about top scorers? Isn't there a way to see their code?

wet drift Jan 1, 2024, 1:58 AM

#

safe orbit https://youtu.be/6IGx7ZZdS74

Thanks a lot bro. I will definitely check it out.

safe orbit Jan 1, 2024, 2:08 AM

#

only if they shared it

wet drift Jan 1, 2024, 2:19 AM

#

Okay, Thank you once again

desert epoch Jan 1, 2024, 6:35 PM

#

safe orbit https://youtu.be/6IGx7ZZdS74

Saved your Playlist for future purpose. Thanks

safe orbit Jan 1, 2024, 6:36 PM

#

no problem

north iron Jan 6, 2024, 12:01 AM

#

safe orbit <@1181931342821330995> 100% is impossible, but I got a top 10% score and shared ...

Will try this competition

safe orbit Jan 6, 2024, 1:07 AM

#

GL

quick mango Jan 6, 2024, 2:57 AM

#

safe orbit <@1181931342821330995> 100% is impossible, but I got a top 10% score and shared ...

get the answers
ez

buoyant frost Jan 6, 2024, 10:42 AM

#

why is it impossible?

#

@safe orbit @gilded tusk

#

And how can you cheat for a competition?

safe orbit Jan 6, 2024, 11:03 AM

#

No model will predict 100% as some of the results are random

#

People hard code the results

#

Change excel file

buoyant frost Jan 6, 2024, 1:27 PM

#

Then why does that work for some competitions and not for others?

#

For instance MLCV doesn't have a 100 % in #🚀┊spaceship-titanic but rather almost a 99 %

hexed sail Jan 9, 2024, 7:46 PM

#

Sorry if dumb question, new to kaggle, if it's known to be impossible to get 100% why aren't the 1.00000 scores simply removed from the leaderboard?

#

It gives the impression to newcomers that 100% is achievable.

low linden Jan 9, 2024, 8:02 PM

#

hexed sail It gives the impression to newcomers that 100% is achievable.

It's not impossible: even if the results weren't datamined, a model could just get very lucky and guess through the noise. In practice, though, if a model ever scores 100%, something is probably wrong with the testing methods; a perfect score is a red flag in data science.

Also people (read: interviewers) care about this problem about as much as hello world.

#

I trust a 95% so much more than a 100% in almost every real-world problem. BUT technically it is theoretically possible, I would just have to do an ablation study to examine how the model's able to achieve that score as well as a comprehensive study of the test set.

hexed sail Jan 9, 2024, 8:17 PM

#

Great info thanks!

low linden Jan 9, 2024, 8:50 PM

#

Np, now that I backscroll there are a lot of questions on this.

thin epoch Jan 9, 2024, 10:48 PM

#

does anyone know the theoretical highest score achievable with logistic regression?

#

i'm following andrew ng's course and implemented my own model, and have gotten to 76.79% accuracy but hope to get to mid - 80s at least

#

but I'm not sure if this is possible with just logistic regression and some more feature engineering, or if I need to use an entirely different model

buoyant frost Jan 9, 2024, 11:41 PM

#

is there a way to know that?

#

like it would be cool to know it for sure, the theoretical highest score achievable with a model in particular

#

including all the set of the possible hyperparameters and feature engineering you can do

#

short answer from what i've googled is no

#

long answer is:

#

buoyant frost Jan 9, 2024, 11:45 PM

#

buoyant frost including all the set of the possible hyperparameters and feature engineering yo...

plus these sets are possibly infinite

low linden Jan 9, 2024, 11:58 PM

#

buoyant frost is there a way to know that?

No, there is no limit to the number of values even a single continuous hyperparameter can be, and the solution space is only approximately smooth. That coupled with the randomness injected in during many ML algorithms by design makes this impossible.

#

But, that is an interesting question.

#

I suppose you could limit a continuous hyperparameter to the 64-bit* float limit on most machines but that would still only be an approximation of reality.

thin epoch Jan 10, 2024, 12:21 AM

#

oh guys i kinda meant practically instead of theoretically

#

has anyone managed to get good (mid 80s i mean) results with just logistic regression on this channel?

buoyant frost Jan 10, 2024, 12:29 AM

#

i suggest reviewing public notebooks related to this dataset

#

maybe someone did get a score around that using logistic regression

buoyant frost Jan 10, 2024, 12:32 AM

#

pulsar sail Hello Everyone . Great to be finally active on Kaggle https://www.kaggle.com/cod...

@thin epoch looks like you got lucky this time

thin epoch Jan 10, 2024, 12:40 AM

#

buoyant frost <@362329119322800128> looks like you got lucky this time

damn i should've just looked up haha, thank you so much

#

i shall inspect this man / woman's code

#

i also have been watching some videos and I realized how deep feature engineering really goes

#

i should probably try doing more of that

buoyant frost Jan 10, 2024, 12:41 AM

#

yeah feature engineering is in many cases the factor that makes your accuracy improve substantially

thin epoch Jan 10, 2024, 12:42 AM

#

quick question -- do you guys combine ur testing and training data when figuring out columns (for stuff like one hot encoding, etc..) the reason I ask is because for example sometimes the training data doesn't have certain values within a column that are present within the testing data, and this causes issues with one hot encoding

#

like is that considered a good practice

buoyant frost Jan 10, 2024, 12:45 AM

#

thin epoch quick question -- do you guys combine ur testing and training data when figuring...

Hello, this is Bing. I can help you with your question about handling categorical variables with different values in training and testing data. 😊

Generally, it is not a good practice to combine your testing and training data when figuring out columns, because this can lead to data leakage, which means that your model may learn information from the test data that it should not have access to. This can result in overfitting, which means that your model performs well on the test data, but poorly on new data.

One way to handle categorical variables with different values in training and testing data is to use label encoding, which means that you assign a numerical value to each category, such as 0, 1, 2, etc. This way, you can avoid creating too many new features with one-hot encoding, and also handle the case where there are new categories in the test data that are not in the training data. However, label encoding may introduce some ordinality, which means that the model may assume that there is some order or ranking among the categories, which may not be true.

Another way to handle categorical variables with different values in training and testing data is to use feature engineering, which means that you transform or create new features from the existing ones, based on some domain knowledge or analysis. For example, if your categorical variable is related to time, such as "era", you may be able to convert it to a numerical variable by using the year or the period as a proxy. This way, you can reduce the number of categories and also capture some meaningful information from the variable.

Have a nice day! 😊

Source: Conversation with Bing, 10/01/2024

#

tldr; no

low linden Jan 10, 2024, 12:52 AM

#

thin epoch quick question -- do you guys combine ur testing and training data when figuring...

If I can't train on it, it does not exist to me.
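
A minimal sketch of one way to handle the case thin epoch describes, where the test data contains a category the training data lacks, without combining the two sets. It assumes scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"`; the toy `Embarked` values are made up for illustration and are not taken from the real files.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy frames (hypothetical values): the test set has an Embarked value "Q"
# that never appears in the training set.
train = pd.DataFrame({"Embarked": ["S", "C", "S", "C"]})
test = pd.DataFrame({"Embarked": ["S", "Q"]})

# Fit on the training data only. Categories unseen at fit time are encoded
# as all zeros at transform time instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["Embarked"]])

train_encoded = encoder.transform(train[["Embarked"]]).toarray()
test_encoded = encoder.transform(test[["Embarked"]]).toarray()

print(encoder.categories_)  # categories learned from train only
print(test_encoded)         # the "Q" row is all zeros
```

The encoder never sees the test set during fitting, so the columns are defined by the training data alone and both frames end up with the same width.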

thin epoch Jan 10, 2024, 3:22 AM

#

thanks

thin epoch Jan 10, 2024, 9:18 PM

#

pulsar sail Hello Everyone . Great to be finally active on Kaggle https://www.kaggle.com/cod...

i may be incorrect but I don't think this has 85% accuracy... I ran your code and it gave me a submission with 76%

#

please let me know if I am mistaken

buoyant frost Jan 11, 2024, 9:17 AM

#

you probably aren't

thin epoch Jan 11, 2024, 6:17 PM

#

rip

#

85% with logistic regression

#

too good to be true

thin epoch Jan 13, 2024, 5:40 PM

#

Three days of constant effort

#

aids feature encoding

#

and I have reached 0.79425

#

im a bit disappointed because I expected better results tbh for the amount of work I put in, but at least this is somewhat closer to my goal of 82% sad_panda

#

and now I increased the number of iterations which should make the model better, and the score went down to 75% 😂 😂 😂 😭 😭 😭

trail dune Jan 16, 2024, 5:54 PM

#

thin epoch and now I increased the number of iterations which should make the model better,...

did you use ann

thin epoch Jan 19, 2024, 6:59 PM

#

no not yet

drifting tapir Jan 24, 2024, 8:17 AM

#

there are 12 columns in train.csv

#

so do I need to make an AI that takes 11 inputs and spit out 1 result?

drifting tapir Jan 24, 2024, 9:04 AM

#

```
Feature names unseen at fit time:
- Age_0.17
- Age_0.33
- Age_11.5
- Age_18.5
- Age_22.5
- ...
Feature names seen at fit time, yet now missing:
- Age_0.42
- Age_0.67
- Age_11.0
- Age_20.5
- Age_23.5
- ...
```

#

does anyone know what's wrong with age in the column for test.csv data?

#

oh... is it because tree models are basically key-value path like structure

#

it needs to know exactly THAT 0.17 age in the previous training data

#

ok I get it now

#

ok so tree model is stupid

#

or not really meant for this type of prediction
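
Column names like `Age_0.17` suggest that `Age` itself was one-hot encoded, so every distinct age became its own column and the train and test frames ended up with different columns. Tree models split numeric features on thresholds, so they do not need to have seen an exact age before. A rough sketch, assuming pandas and a few made-up rows in place of the real files: encode only the categorical columns, keep `Age` numeric, and align the test columns to the training columns.

```python
import pandas as pd

# Toy rows (hypothetical values) standing in for train.csv and test.csv.
train = pd.DataFrame({
    "Age": [0.42, 22.0, 38.0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})
test = pd.DataFrame({
    "Age": [0.17, 34.5],
    "Sex": ["male", "female"],
    "Embarked": ["S", "S"],
})

categorical = ["Sex", "Embarked"]  # Age is deliberately left out

# Only the listed columns are expanded; Age stays a single numeric column.
X_train = pd.get_dummies(train, columns=categorical)
X_test = pd.get_dummies(test, columns=categorical)

# Give the test frame exactly the training columns, in the same order.
# Columns missing from test (here Embarked_C and Embarked_Q) are filled with 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(list(X_train.columns))
```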

remote nimbus Jan 31, 2024, 2:44 AM

#

Hey all

#

Finally got discord auth to work, excited to discuss titanic with people who work on it

#

I got .791 accuracy, not sure if that's good, but it took me a while!

storm saddle Jan 31, 2024, 4:58 AM

#

In my local machine I got the accuracy of 92.3 using the xg boost model. But when I upload the CSV file it's saying I have 0.58 accuracy

Why please can anyone tell me what's the problem

remote nimbus Jan 31, 2024, 5:25 AM

#

@storm saddle you're overfitting the training set

#

your code is mastering the training data at the expense of generalizing to the test data

#

also, with a score that low, keep in mind you're doing worse than just saying female = alive, male = dead, which gives 75% acc
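
The gender rule mentioned above is easy to score locally before trying anything more complex. A small sketch, assuming pandas; the six passengers are invented for illustration, so the printed accuracy is that of the toy data and not the leaderboard figure.

```python
import pandas as pd

# Six made-up passengers (hypothetical values, not real Titanic rows).
train = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "male", "female"],
    "Survived": [1, 0, 0, 1, 1, 0],
})

# Rule: predict 1 (survived) for every woman and 0 for every man.
baseline_pred = (train["Sex"] == "female").astype(int)

# Accuracy is the fraction of predictions that match the label.
baseline_acc = (baseline_pred == train["Survived"]).mean()
print(f"baseline accuracy: {baseline_acc:.3f}")
```

Running the same two lines on the real training frame gives a floor that any trained model should beat.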

storm saddle Jan 31, 2024, 6:25 AM

#

remote nimbus also, with a score that low, keep in mind you're doing worse than just saying fe...

how much you got it actually ?

storm saddle Jan 31, 2024, 6:25 AM

#

remote nimbus @storm saddle you're overfitting the training set

got that point

remote nimbus Jan 31, 2024, 6:27 AM

#

storm saddle how much you got it actually ?

i got to .791

#

I could've gotten further but moved on

storm saddle Jan 31, 2024, 6:30 AM

#

fine bruh got it will take care from next time.

remote nimbus Jan 31, 2024, 6:32 AM

#

ok

remote loom Feb 2, 2024, 3:15 PM

#

i just started with this dataset, and im still a bit lost, can someone help me

remote nimbus Feb 2, 2024, 9:04 PM

#

@remote loom What's up

remote loom Feb 3, 2024, 4:44 AM

#

remote nimbus @remote loom What's up

im just starting out and this is my first project, how do i figure out which metrics to use and predict the survival? the problem i did before this was a car sales one

remote nimbus Feb 3, 2024, 4:45 AM

#

yo

#

well the metric you'll want to train on is the one they're going to judge you on, which is accuracy

remote loom Feb 3, 2024, 1:31 PM

#

what should i do here?

remote nimbus Feb 3, 2024, 6:01 PM

#

@remote loom you're confusing your test dataset with y

#

Think of it this way. You're given X_train, y_train (the Survived labels provided on training set). For test you're only given X_test, not the y_test labels

#

I assume you're trying to split your training set into training and validation? That would look more like:

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.2)

#

(Also I'd recommend specifying stratify=y_train in that, but we can get to that once you fix the basic misunderstanding)

#

Here's an example which should make it clearer:
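import pandas as pd
from sklearn.model_selection import train_test_split

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

# Labels (Survived) exist only for the training set
X_train = train_data[features]
y_train = train_data['Survived']

# The test set has the same features but no Survived column
X_test = test_data[features]

# Hold out 20% of the training rows as a validation set
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, stratify=y_train, test_size=0.2, random_state=1)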

iron nest Feb 12, 2024, 3:24 PM

#

What is the baseline for titanic competition? What percentage of model prediction is considered good for this challenge?

iron nest Feb 12, 2024, 5:30 PM

#

I started the titanic competition just around a week ago. Why it says I crossed the deadline? How many days are allowed for this competition submission?

remote nimbus Feb 13, 2024, 1:06 AM

#

iron nest I started the titanic competition just around a week ago. Why it says I crossed ...

go to https://www.kaggle.com/competitions/titanic

Titanic - Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

remote nimbus Feb 13, 2024, 1:07 AM

#

iron nest What is the baseline for titanic competition? What percentage of model predictio...

80% is an impressive score. 75% is an easy score, as simple as just looking at gender. 85% is about the best you can get without cheating or super-luck

iron nest Feb 13, 2024, 6:08 AM

#

remote nimbus 80% is an impressive score. 75% is an easy score, as simple as just looking at ...

Thanks Ben.

hearty stream Feb 17, 2024, 3:00 PM

#

Hey everyone, this would be my first kaggle competition. So before diving into it, I just wanted to confirm if I understood the problem statement correctly.
So the idea is to calculate survival rate of the passengers. The features could be any, gender is one of them, which I guess is default.
Can we use any model to implement it? Like I went through some courses which showed Linear, decision tree, etc?
Am I missing anything?

runic finch Feb 17, 2024, 4:04 PM

#

hearty stream Hey everyone, this would be my first kaggle competition. So before diving into i...

Nope you’re correct. You can use any model you see fit to predict which passengers survive and which ones perish

mellow roost Feb 20, 2024, 2:04 AM

#

What is the purpose of the YOLO model under the "Models" section of the competition? https://www.kaggle.com/competitions/titanic/models
I know that we can use whatever model we would like, so just wondering (particularly because it seems that it is an image recognition model)

Also, what is the performance metric of this competition? Is it just submission based?

remote nimbus Feb 20, 2024, 11:45 PM

#

mellow roost What is the purpose of the YOLO model under the "Models" section of the competit...

Performance metric is based on accuracy of submission yea

#

not sure what the yolo model is

tacit flax Feb 25, 2024, 6:23 PM

#

how do people get 100% percent accuracy ? is that rly possible ?

quartz scarab Feb 26, 2024, 3:18 AM

#

tacit flax how do people get 100% percent accuracy ? is that rly possible ?

It's trivial to just look up the real answers for who lived/died and submit them to get 100% without using machine learning. That's why this is just a tutorial competition and not a real one. You should just ignore people who have 100%, all results on titanic are wiped every few months anyway, it's just for learning.

tacit flax Feb 26, 2024, 7:33 AM

#

quartz scarab It's trivial to just look up the real answers for who lived/died and submit them...

ohhh ok , thanks !

mellow pulsar Feb 29, 2024, 2:12 PM

#

Hey everyone! I've just started working on this competition properly and have a question.

Given that the dataset is already split into training and testing sets, does it still make sense to use cross validation?

In my head it feels like using cross validation on the training set won't be very beneficial because each iteration will have relatively small training sets

half ledge Mar 2, 2024, 2:59 AM

#

Anyone want to collaborate? I am very new at this

earnest steppe Mar 2, 2024, 3:16 AM

#

mellow pulsar Hey everyone! I've just started working on this competition properly and have a ...

Yes, it still makes sense to use a validation split so that you know how your model performs on unseen data. In a real-life project, the test data is not touched until the very end, when it is used to evaluate the model's accuracy. On Kaggle, I think there is a limit on the number of submissions in 24 hours, so if you want to try more models than that limit, you won't be able to evaluate them all through submissions. So it's advised to use cross validation.
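
As a rough illustration of the cross validation being recommended, assuming scikit-learn and a synthetic dataset of the same size as the Titanic training set (891 rows) in place of the real features: with 5 folds, each fit still trains on about 80% of the rows, which addresses the worry about small training sets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the 891-row training set (not the real data).
X, y = make_classification(n_samples=891, n_features=7, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5 stratified folds: each model trains on about 80% of the rows and is
# scored on the remaining 20%, so every row is used for validation once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores.mean(), scores.std())
```

The mean gives an estimate of accuracy on unseen data without spending a submission, and the spread across folds shows how noisy that estimate is.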

plush musk Mar 2, 2024, 5:38 PM

#

Got 72% accuracy using RandomForestClassifier, is this good?

earnest steppe Mar 3, 2024, 2:13 AM

#

plush musk Got 72% accuracy using RandomForestClassifier, is this good?

I think you can improve the accuracy with some tuning

plush musk Mar 3, 2024, 3:24 AM

#

I'll try my best

restive patrol Mar 4, 2024, 9:01 AM

#

Hey guys
So I was trying something cool wanted to share with you all.
I used LLM (GPT4) to predict the outcome of each row, I fed a good set of facts about our data and I am getting an accuracy of 78%.

If someone wants to know more feel free to dm, might share soon on twitter too

#

"input": {
"PassengerId": 944,
"Pclass": 2,
"Name": "Hocking, Miss. Ellen Nellie""",
"Sex": "female",
"Age": 20.0,
"SibSp": 2,
"Parch": 1,
"Ticket": "29105",
"Fare": 23.0,
"Cabin": null,
"Embarked": "S"
},

#

"output": {
"Analysis": "The passenger whose details we are analyzing was a young adult (20 years old) female belonging to the 2nd passenger class. She boarded the Titanic from Southampton, she was not traveling alone but had two siblings or spouse on board, and one parent or child. Her fare belongs to the 3rd quartile. Unfortunately, we don't have data on the cabin assigned to her.",
"conclusion": "Based on the provided data and the survival rates, we know that females (74.2%) had a much higher survival rate than males. In addition, while the overall survival rate in the 2nd class was lower than the 1st class, young adult female passengers in the 2nd class had a respectable survival rate of 43.28%. The survival rate from Southampton was 33.7%, and for those traveling with family, it's slightly better at 50.6%. Being in the 3rd quartile fare might have slightly improved her chances as its survival rate is 45.5% compared to lower fares. Lack of cabin data gives her a lower rate of survival at 30%. Given the survival rates, the chances of this passenger's survival seems quite positive.",
"Answer": "Survive",
"Facts Used": "Gender, age, Pclass, Embarked, SibSp, Fare, Cabin",
" Additional Facts that might have helped": "Deck level would have given us more insights into the survival rate, as would knowing more about the relative ages and classes of her siblings/spouse and parent/child on board."
}

#

Haven't yet fine tuned the model, that would help give parameters the weightage

finite ermine Mar 5, 2024, 4:25 PM

#

Hey everyone, I'm a beginner in machine learning and I've been working on Titanic competition lately. I used logistic regression and my score is 0.59.

#

I believe the model is overfitting. I tried regularization and selecting fewer features but it didn't work. What can I do to improve my model?

#

my notebook: https://www.kaggle.com/code/kutayozdur/titanic-logreg-1/notebook

Titanic-LogReg_1

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

crisp laurel Mar 7, 2024, 3:17 PM

#

finite ermine I believe there is an overfit, I tried regularization and selecting less feature...

I have basically the same problem, only that I'm using a different model.

woven crater Mar 7, 2024, 7:37 PM

#

hi guys, i just want to check for this dataset is it possible to get score 1?

#

just by using ML

#

i know this question have been asked multiple times sorry for asking repeated question

quartz scarab Mar 7, 2024, 9:03 PM

#

@woven crater Getting a perfect score using ML is unrealistic / basically impossible. If you think about the nature of the problem you can also intuitively understand this - survival on the titanic was not something you could perfectly predict from the information you are using to infer these judgements.

Scores of 1 on the leaderboard are simply people looking up the answers (since it's a real historical event). This is why the titanic is simply a tutorial problem.

woven crater Mar 7, 2024, 9:11 PM

#

Thanks for answering my question!

unique hawk Mar 11, 2024, 8:40 AM

#

Hello everyone, I am taking my first steps in data analysis with Python and I don't know how to solve this. I guess I am making a mistake with the location of the file, but the only thing I did was copy from the tutorial. Sorry for the inconvenience, and thank you all very much.

#

Screenshot_2024-03-11-09-35-57-395_com.android.chrome.jpg

vivid shuttle Mar 11, 2024, 9:25 AM

#

unique hawk Hello everyone, I am taking my first steps in data analysis with Python and I am...

Hi Francisco. Can you confirm if the data set train.csv is present in the input section?

#

I have created a notebook from the competition and executed the commands same as yours. It works perfectly fine for me

unique hawk Mar 11, 2024, 9:31 AM

#

vivid shuttle Hi Francisco. Can you confirm if the data set train.csv is present in the input ...

Hi Yogita, I don't know how to do that, I'm lost

unique hawk Mar 11, 2024, 9:57 AM

#

Screenshot_2024-03-11-10-45-14-671_com.android.chrome.jpg

vivid shuttle Mar 11, 2024, 1:26 PM

#

unique hawk Hi Yogita, I don't know how to do that, I'm lost

Is it possible for you to try the Desktop/Website version once? In that version the "Input" section is in the top right corner. I have never used it on mobile

unique hawk Mar 11, 2024, 1:49 PM

#

vivid shuttle Is it possible for you to once try Desktop/Website version. In this case the "In...

Thanks a lot Yogita 🫰

cyan delta Mar 11, 2024, 5:02 PM

#

Hey, I am new to ML and learning it by watching YT videos and MOOCs. I am looking for a Mentor/Guide/Buddy who can share his/her experience with me and help me become a better ML practitioner.

brisk lintel Mar 13, 2024, 8:58 AM

#

Hey guys, I wanted to ask what classifiers should I use to get higher accuracy? So far the highest accuracy I have got is 0.7799, with an optimized Decision Tree. I have also used Logistic regression, KNN, SVM, and Random forest, but they got lower accuracy.

sage grove Mar 14, 2024, 5:45 AM

#

brisk lintel Hey guys, I wanted to ask what classifiers should I use to get higher accuracy? ...

That's about the highest possible, dude. Other techniques involve merging other datasets to get more details, and thus predicting with higher accuracy

brisk lintel Mar 14, 2024, 5:51 AM

#

How can we merge other datasets? Did you mean shuffling of datasets or just merging other public Titanic datasets? If so doesn't that violate the term of the competition?

#

I also heard there is a way to merge different ML models to achieve greater accuracy? Is this true?
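
Combining several models is usually called ensembling. A minimal sketch of one form of it, majority voting, assuming scikit-learn's `VotingClassifier` and a synthetic dataset in place of the Titanic features; the three base models and their settings are arbitrary choices for illustration, and an ensemble is not guaranteed to beat its best member.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the real Titanic features).
X, y = make_classification(n_samples=500, n_features=7, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
ensemble.fit(X_train, y_train)

valid_acc = ensemble.score(X_valid, y_valid)
print(f"ensemble validation accuracy: {valid_acc:.3f}")
```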

sage grove Mar 14, 2024, 6:11 AM

#

brisk lintel How can we merge other datasets? Did you mean shuffling of datasets or just merg...

There is an extension of this dataset which people use to get the higher accuracy

brisk lintel Mar 14, 2024, 6:37 AM

#

I see

neat citrus Mar 16, 2024, 6:53 PM

#

i scored 0.78229 in titanic predictions. how are people scoring 1.000???

tawdry raptor Mar 18, 2024, 9:58 PM

#

maybe it's an overfit result

daring plover Mar 21, 2024, 8:04 PM

#

This was my first competition and I tried to use XGboost by obtaining an accuracy of 0.83, since I am new to Data Science and ML can you give me some feedback on my work?
my notebook: https://www.kaggle.com/code/davidg960/xgboost-classifier

hushed steppe Mar 22, 2024, 6:34 PM

#

Really enjoyed doing the competition tutorial.. are there little hints on techniques to try to boost my score? I don't want the answers, just a nudge in the right direction

#

Right now I am using Google Gemini to help me out

hushed steppe Mar 22, 2024, 8:05 PM

#

Here is a notebook on feature engineering that I found useful: https://www.kaggle.com/code/gunesevitan/titanic-advanced-feature-engineering-tutorial/notebook?scriptVersionId=27280410

Titanic - Advanced Feature Engineering Tutorial

wide yoke Mar 26, 2024, 2:18 PM

#

Hey guys, I'm new here and managed to get 84% at best, is that a reasonable score & should I move to something else, or would you advise me to keep looking to improve?

wide yoke Mar 26, 2024, 2:38 PM

#

actually, that 84% is on my validation set. When submitting, I'm at 75%, which is 2% lower than my previous attempt at 77%

I'm a bit confused: I've spent time trying to improve the validation set, making sure of no data leakage, and it ended up being worse at test set.

What's the recommended process to avoid that in the future?

wraith spruce Mar 28, 2024, 8:29 AM

#

Hey guys, I uploaded a notebook on missing data in Titanic.
Explore the notebook now and be part of the quest to reveal the untold tales hidden within the depths of the Titanic dataset!
https://www.kaggle.com/code/sakshisatre/titanic-s-missing-data-navigating-null-values

"Titanic's Missing Data: Navigating Null Values"

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic Dataset

final cove Mar 31, 2024, 5:27 AM

#

Can we download the datasets and run it on our own computers? I have a windows computer and linux computer.

#

My windows computer is a Dell PowerEdge R720 server running Windows Server 2022. 2 x 10 core processors threaded which makes it 40 threads. 3.5 TB HardDrive and 128 GB RAM.

quartz scarab Apr 1, 2024, 12:40 AM

#

final cove Can we download the datasets and run it on our own computers? I have a windows c...

Yep!

final cove Apr 2, 2024, 4:08 AM

#

I was trying to follow the tutorial on this page: https://www.kaggle.com/code/alexisbcook/titanic-tutorial and saw that the package sklearn is deprecated in PyPI.

Titanic Tutorial

#

When I ran pip install sklearn it says it is deprecated and to use scikit-learn.

#

Nevermind. I had to install other packages.

safe orbit Apr 3, 2024, 7:17 PM

#

If anyone wants a vid to follow, I made this a few months ago: https://www.youtube.com/watch?v=6IGx7ZZdS74 Also have a p2 improving my model

YouTube

Ryan Nolan Data

Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)

Welcome to my data science journey through the Kaggle Titanic - Machine Learning from Disaster Project!

In this video, we'll dive deep into the world of data analysis, feature engineering, and machine learning to predict passenger survival rates on the Titanic.

As Kaggle states: "The competition is simple: use machine learning to create a mode...

acoustic oracle Apr 5, 2024, 1:19 AM

#

I have a question - why doesn't a pytorch model work for this competition

#

I tried making a neural network and getting predictions out of it but I repeatedly get 0 score

acoustic oracle Apr 5, 2024, 1:55 AM

#

i do get an f1 score of 0.65

#

but my accuracy is still 0

#

i would greatly appreciate the help!

pseudo stratus Apr 6, 2024, 1:25 AM

#

neat citrus i scored 0.78229 in titanic predictions. how are people scoring 1.000???

Sorry it's late, but give this a read:
https://www.kaggle.com/code/carlmcbrideellis/titanic-leaderboard-a-score-0-8-is-great
I was super confused at first too and I thought that my scores were pretty lacklustre.

Titanic leaderboard: a score > 0.8 is great!

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

acoustic oracle Apr 6, 2024, 4:17 AM

#

pseudo stratus Sorry it's late, but give this a read: https://www.kaggle.com/code/carlmcbrideel...

Damn even though this was not addressed to me - Thanks for this! I was one of those folks who submitted float values and got a 0.00000 😭
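
A short sketch of the fix for the float-values problem mentioned above, assuming the model outputs survival probabilities: threshold them and cast to integers so the `Survived` column holds 0/1 rather than values like 0.61. The probabilities are invented for illustration, and the 0.5 threshold is an assumed default.

```python
import numpy as np
import pandas as pd

# Hypothetical model outputs: survival probabilities for four test passengers.
passenger_ids = [892, 893, 894, 895]
probabilities = np.array([0.08, 0.61, 0.5, 0.93])

# Threshold at 0.5 and cast to int so the column holds 0/1, not 0.0/1.0.
survived = (probabilities >= 0.5).astype(int)

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": survived})

# to_csv without a path returns the CSV text; pass a filename to write the file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```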

hazy owl Apr 6, 2024, 10:59 AM

#

Ayo guys I had a question what is a good score for a beginner in this competition?

#

Nvm I saw the message above

pseudo stratus Apr 6, 2024, 3:08 PM

#

hazy owl Ayo guys I had a question what is a good score for a beginner in this competitio...

Anything above 0.78 I'd say

pseudo stratus Apr 6, 2024, 3:08 PM

#

acoustic oracle Damn even though this was not addressed to me - Thanks for this! I was one of th...

Hahaha yeah same

acoustic oracle Apr 6, 2024, 3:46 PM

#

pseudo stratus Anything above 0.78 I'd say

Damn I got 0.68899 -need to refine it

pseudo stratus Apr 7, 2024, 1:02 AM

#

acoustic oracle Damn I got 0.68899 -need to refine it

I have a pretty high standard for good haha don't sweat it too much 😅

acoustic oracle Apr 7, 2024, 1:06 AM

#

pseudo stratus I have a pretty high standard for good haha don't sweat it too much 😅

no because i guess the modal submission has 0.75 so you are right

#

I toned my model down that may be a reason why I got a lower score (maybe)

#

I'll try to get at least a 0.75 - refining

hazy owl Apr 7, 2024, 5:50 AM

#

pseudo stratus I have a pretty high standard for good haha don't sweat it too much 😅

I got near 0.61 and 0.62 for 1st two tries and 0.71 for the third since I changed my algorithm a bit

plush raven Apr 7, 2024, 12:37 PM

#

I got 0.67 yesterday)
It was my first attempt and I know only about LinearRegression. Got float results and converted them to 1 if abs(x) >= 0.5 else 0

acoustic oracle Apr 8, 2024, 12:14 AM

#

acoustic oracle no because i guess the modal submission has 0.75 so you are right

took a couple tries but got it to 0.75

green sparrow Apr 8, 2024, 3:48 AM

#

What are the next steps to improving? I got a 0.77, and I basically just did everything that I learned and took notes on from the pandas and intro to machine learning kaggle courses, so I just did a random forest with optimized max_leaf, and filled NaN values with the mean, and that was it

silk oracle Apr 8, 2024, 7:36 AM

#

Hello, I am new here, and trying with this "Titanic dataset". Upon examination, I've noticed a significant number of missing values in the 'Cabin' column. I believe that the cabin data could potentially offer valuable insights into survival probabilities. So, I am stuck in a dilemma regarding whether to discard this column or not. what are your thoughts on this matter?
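
One middle ground between keeping and discarding `Cabin` is to derive simpler features from it. A minimal sketch, assuming pandas; the rows are made up in the style of the dataset, and `HasCabin`, `Deck`, and the `"U"` placeholder are hypothetical names chosen for illustration.

```python
import pandas as pd

# A few Cabin values in the style of the dataset (hypothetical rows).
df = pd.DataFrame({"Cabin": ["C85", None, "E46", None, "B96 B98"]})

# 1 if a cabin was recorded, 0 if the value is missing.
df["HasCabin"] = df["Cabin"].notna().astype(int)

# First letter of the cabin string (the deck); "U" marks unknown.
df["Deck"] = df["Cabin"].str[0].fillna("U")

print(df)
```

Whether these features help can then be checked on a validation split before deciding to drop the column.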

hazy owl Apr 9, 2024, 10:03 AM

#

pseudo stratus Anything above 0.78 I'd say

I have got 0.784 but I can't go beyond that I did feature selection and hyper parameter tuning too 😭

hazy owl Apr 9, 2024, 10:04 AM

#

silk oracle Hello, I am new here, and trying with this "Titanic dataset". Upon examination,...

Try feature selection on it I think

#

Edit the new one I submitted is 0.78947

lost gazelle Apr 9, 2024, 10:41 AM

#

Hey all, new to ML and this is my first ever contest. Just learned a bit of KNN and Naive bayes and I could achieve an accuracy of 0.665. Long way to go it seems!

#

What's the minimum accuracy for it to be considered "good" informally? 0.75?

acoustic oracle Apr 9, 2024, 9:56 PM

#

hazy owl I have got 0.784 but I can't go beyond that I did feature selection and hyper pa...

What algorithm are you using?

plush raven Apr 10, 2024, 12:14 AM

#

plush raven I got 0.67 yesterday) It was my first attempt and I know only about LinearRegre...

Changed LinearRegression to LogisticRegression with the same feature selection and got 0.77

hazy owl Apr 10, 2024, 5:30 PM

#

acoustic oracle What algorithm are you using?

Random forest classifier with feature selection and hyper parameter tuning

acoustic oracle Apr 10, 2024, 5:31 PM

#

hazy owl Random forest classifier with feature selection and hyper parameter tuning

damn

#

thanks

#

lemme guess scikit learn

hazy owl Apr 10, 2024, 5:47 PM

#

acoustic oracle lemme guess scikit learn

Yes

signal dawn Apr 11, 2024, 11:57 PM

#

Hi everyone, I used Random forest classifier too, and got a score of 0.97, with 7 features and pd.get_dummies() for categorical features. How can I check my model is not overfitting?

silk oracle Apr 12, 2024, 2:01 PM

#

plush raven Changed LinearRegression to LogisticRegression with the same feature selection a...

How much score did you get with Linear regression?

plush raven Apr 12, 2024, 2:02 PM

#

silk oracle How much score did you get with Linear regression?

0.67

silk oracle Apr 12, 2024, 2:09 PM

#

plush raven 0.67

LinearRegression.score(X_test, Y_test) is this the metric we have to measure??

green sparrow Apr 12, 2024, 5:03 PM

#

signal dawn Hi everyone , I used Random forest classifier too , and got a score of 0.97 , ...

Your 0.97 is the actual scoreboard score given by the competition right? If I understand overfitting correctly, it is not possible to overfit on test data when you don't have the true test results. Overfitting happens when your training score is very high because the model memorized the training data along with its noise etc., so it fails to generalize. But if you are running it on the test data, where you aren't even provided with the true answers, and they check it and give you a score of 0.97, then I feel like the 0.97 cannot be an indication of overfitting right?

I am new too, so take what I am saying with a grain of salt.
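
A common local check for overfitting is to hold out part of the training data and compare the score on the rows the model was fitted on with the score on the held-out rows. A rough sketch, assuming scikit-learn and a noisy synthetic dataset in place of the Titanic features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 20% label noise (not the real Titanic data).
X, y = make_classification(n_samples=891, n_features=7, flip_y=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on rows the model has seen
valid_acc = model.score(X_valid, y_valid)  # accuracy on held-out rows
gap = train_acc - valid_acc

print(f"train={train_acc:.3f} valid={valid_acc:.3f} gap={gap:.3f}")
```

A training score far above the validation score suggests the model has memorized the training rows; a score computed only on the training rows says little about how a submission will do.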

acoustic oracle Apr 12, 2024, 8:03 PM

#

signal dawn Hi everyone , I used Random forest classifier too , and got a score of 0.97 , ...

a 0.97 score is great on Kaggle

#

I got a 0.75 but I used a neural network

lost gazelle Apr 17, 2024, 12:17 PM

#

signal dawn Hi everyone , I used Random forest classifier too , and got a score of 0.97 , ...

If your leaderboard score is 0.97, then you're all good.

lost gazelle Apr 17, 2024, 12:18 PM

#

acoustic oracle I got a 0.75 but I used a neural network

Same, i got 0.76 to 0.79 everytime i used a NN

vast hingeBOT Apr 17, 2024, 12:18 PM

#

lost gazelle Apr 17, 2024, 12:20 PM

#

Damn I didn't know we could play games

acoustic oracle Apr 17, 2024, 12:45 PM

#

lost gazelle Same, i got 0.76 to 0.79 everytime i used a NN

Damn nice- how big was your network and what activation functions did you use?

lost gazelle Apr 18, 2024, 5:22 PM

#

acoustic oracle Damn nice- how big was your network and what activation functions did you use?

(32, relu, validation split 20%) × 3 layers and last layer sigmoid iirc

acoustic oracle Apr 18, 2024, 7:05 PM

#

lost gazelle (32, relu, validation split 20%) × 3 layers and last layer sigmoid iirc

Damn nice. I guess I have a very big NN cause mine goes to 128 and I don't use sigmoid (I use it only when I need to get the prediction labels)

acoustic oracle Apr 18, 2024, 7:05 PM

#

lost gazelle (32, relu, validation split 20%) × 3 layers and last layer sigmoid iirc

You use BCE loss or BCE with Logits loss?

kind fox Apr 18, 2024, 8:43 PM

#

hi guys, im doing a project for class on this dataset and need to understand how the data was collected and the original purpose of it. I couldn't find anything besides competition rules on kaggle. Does anyone have an idea?

lost gazelle Apr 19, 2024, 7:21 PM

#

acoustic oracle You use BCE loss or BCE with Logits loss?

Bce loss
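
For context on the question above: in PyTorch, `BCELoss` expects probabilities (after a sigmoid), while `BCEWithLogitsLoss` expects raw scores and applies the sigmoid internally. The sketch below re-implements both in NumPy, purely to show that they compute the same quantity; the function names and the example numbers are hypothetical.

```python
import numpy as np

def bce_from_probabilities(p, y):
    """Binary cross-entropy when the model already outputs sigmoid probabilities."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logits(z, y):
    """Same loss computed directly from raw scores (logits), in a stable form."""
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

logits = np.array([2.0, -1.0, 0.5, -3.0])  # hypothetical raw model outputs
labels = np.array([1.0, 0.0, 1.0, 0.0])

probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
loss_a = bce_from_probabilities(probs, labels)
loss_b = bce_from_logits(logits, labels)

print(loss_a, loss_b)
```

The two values agree up to floating-point error, so the choice mainly decides whether the last layer of the network includes the sigmoid.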

lost gazelle Apr 19, 2024, 7:23 PM

#

kind fox hi guys, im doing a project for class on this dataset and need to understand how...

It's literally the titanic. the data must have been collected by the rescuers for obvious purposes

lost gazelle Apr 19, 2024, 7:25 PM

#

acoustic oracle Damn nice. I guess I have a very big NN cause mine goes to 128 and I don't use s...

Make a for loop and try for 3 layers with 16, 32, 64 and 128 nodes each layer. Check the loss at each and you'll see that the 32 node one probably has the least. Also in this for loop you can control different learning rates, num of epochs, validation split, etc etc
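
The loop described above might look roughly like this. It is a sketch under several assumptions: scikit-learn's `MLPClassifier` stands in for whatever framework was actually used, the data is synthetic, and validation accuracy is compared instead of loss.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (not the real Titanic features).
X, y = make_classification(n_samples=500, n_features=7, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

results = {}
for width in (16, 32, 64, 128):
    # Three hidden layers of equal width, ReLU activations.
    net = MLPClassifier(hidden_layer_sizes=(width, width, width),
                        activation="relu", max_iter=300, random_state=0)
    net.fit(X_train, y_train)
    results[width] = net.score(X_valid, y_valid)  # validation accuracy

best_width = max(results, key=results.get)
print(results, best_width)
```

Learning rate, number of epochs, and validation split can be varied the same way by adding them to the loop.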

acoustic oracle Apr 19, 2024, 7:39 PM

#

lost gazelle Make a for loop and try for 3 layers with 16, 32, 64 and 128 nodes each layer. C...

wdym by look at the loss for each layer? Also how many input features do you take? I take 6. Also do you use batches?

lost gazelle Apr 20, 2024, 8:16 PM

#

acoustic oracle wdym by look at the loss for each layer? Also how many input features do you tak...

I don't remember how many features i took in but yea around 7 8 ignoring names and some others. And I didn't mean loss for each layer, i meant for each model.

acoustic oracle Apr 20, 2024, 8:17 PM

#

lost gazelle I don't remember how many features i took in but yea around 7 8 ignoring names a...

ah ok

spring coyote Apr 23, 2024, 8:56 AM

#

hi everyone, i wanna do the titanic competition with a friend but we cant seem to find a way to team up

#

according to some forums there should be a 'team' tab next to the rules tab, but there isnt

#

and when i tried to share my kaggle notebook with him, i wasnt able to enter my phone number. clicking into the field did not allow me to enter digits

hoary pawn Apr 23, 2024, 9:22 AM

#

Hey everyone, I'm currently stuck at 0.82 acc using Logistic Regression. I'm using 7 features, mapped the categorical ones and used the mean to fill the nans in the numerical columns. Should i try to improve the model even more or try another one? Any tips for improving this one?

sharp island May 7, 2024, 6:42 AM

#

Hello everyone! I'm new to Kaggle. I recently worked with a few other students on this competition. But I just noticed, this one doesn't have a "Teams" tab. How do we submit as a team for this competition?

wanton steeple May 7, 2024, 12:53 PM

#

sharp island Hello everyone! I'm new to Kaggle. I recently worked with a few other students o...

invite them

sharp island May 7, 2024, 1:38 PM

#

as i said, there's no team tab

#

also this link https://www.kaggle.com/competitions/titanic/team seems to lead to a 404 page

velvet mist May 7, 2024, 1:47 PM

#

sharp island also this link https://www.kaggle.com/competitions/titanic/team seems to lead to...

that's weird! it's working for me.

#

I think you should contact support for this one

sharp island May 8, 2024, 12:28 AM

#

huh. okay thanks ^_^

simple bone May 8, 2024, 12:54 AM

#

sharp island also this link https://www.kaggle.com/competitions/titanic/team seems to lead to...

Sometimes that happens, but it goes back to normal after a while

#

It happens to me a lot with datasets

pliant flume May 13, 2024, 7:39 PM

#

Hey, need help

#

I can't find the option to join anywhere

#

I can't see the Teams option either

feral ruin May 15, 2024, 4:54 AM

#

Hey guys,
I am a beginner and I need help fixing my TensorFlow model that I created to participate in this competition. I have described everything in detail here: https://www.kaggle.com/competitions/titanic/discussion/502611. Could someone please take a look?

sharp island May 16, 2024, 4:10 AM

#

pliant flume Neither I can see the teams option

yeah. same problem here... maybe you could send the support team a message but i doubt they would answer

sinful plover May 17, 2024, 6:31 AM

#

Hi! I'm new to Kaggle. I'm looking for teammates for the Titanic competition, and also for anyone kind enough to help me with machine learning

vernal creek May 17, 2024, 10:48 AM

#

hello, i am also starting with this titanic dataset

sinful plover May 19, 2024, 5:34 AM

#

vernal creek hello, i am also starting with this titanic dataset

maybe we can work together?

nimble badge May 22, 2024, 10:01 AM

#

sinful plover Hi! im new to the kaggle. im looking for teamates for titanic competition and a...

Me too! Can we work together?

high lance May 22, 2024, 1:56 PM

#

sinful plover Hi! im new to the kaggle. im looking for teamates for titanic competition and a...

Same here, let's work together

twin crescent May 22, 2024, 5:49 PM

#

sinful plover Hi! im new to the kaggle. im looking for teamates for titanic competition and a...

me too

vernal creek May 23, 2024, 9:45 AM

#

let's start

uncut stirrup May 25, 2024, 2:23 AM

#

I'm starting today

nocturne vapor May 27, 2024, 8:10 AM

#

I am new to this field and I am starting with the
IBM AI Engineering Professional Certificate course on Coursera (https://www.coursera.org/professional-certificates/ai-engineer)
Any suggestions?

Coursera

IBM AI Engineering

Offered by IBM. Launch your career as an AI engineer. Learn how to provide business insights from big data using machine learning and deep ... Enroll for free.

storm trail May 27, 2024, 3:40 PM

#

sinful plover Hi! im new to the kaggle. im looking for teamates for titanic competition and a...

hello i am also starting with titanic dataset

sweet shore May 27, 2024, 6:19 PM

#

Yeah me too started just now

left palm May 27, 2024, 8:18 PM

#

Hi, me too started with titanic dataset

sweet shore May 28, 2024, 4:59 AM

#

I'm just confused about how to start

orchid anvil May 28, 2024, 5:15 PM

#

Hello guys, I am brand new on Kaggle and just finished this challenge as my first challenge ever. I got a decent score of round about 83% accuracy. Now I am looking for improvements on my methods. Is there some common ground on how this problem should be approached? I am basically looking for a state of the art or best practice version where I can learn some new tricks.

mortal marten May 31, 2024, 12:13 PM

#

Hi guys

#

I also did my first submission just now on titanic.

#

I scored a 0 though. Idk what I did wrong.

#

For some reason, the actual accuracy (not the train.csv one but the test.csv one) of my bernoulli NB model is 100%

#

Can anyone help me with this?

pale field May 31, 2024, 2:09 PM

#

Did your answer CSV have the two columns?

#

https://www.kaggle.com/code/carlmcbrideellis/titanic-leaderboard-a-score-0-8-is-great

Titanic leaderboard: a score > 0.8 is great!

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

#

Or that

#

Did you convert to ints? @mortal marten

mortal marten May 31, 2024, 2:11 PM

#

pale field Did your answer CSV have the two columns?

Yes. It doesn't accept the file if it has fewer or more columns than PassengerId and Survived

mortal marten May 31, 2024, 2:12 PM

#

pale field Did you convert to ints? @mortal marten

I converted to floats. Does it make a difference?

pale field May 31, 2024, 2:12 PM

#

The survived needs to be int
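
A minimal sketch of that fix, assuming pandas; the passenger IDs and predictions below are made up for illustration. It casts the Survived column to int before writing the two-column CSV, so the file contains 0/1 rather than 0.0/1.0.

```python
import io

import pandas as pd

# Hypothetical values; in practice these come from test.csv and model.predict(...).
passenger_ids = [892, 893, 894]
predictions = [0.0, 1.0, 0.0]  # floats, as some pipelines produce

submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    # Cast to int so the file holds 0/1 instead of 0.0/1.0.
    "Survived": pd.Series(predictions).astype(int),
})

# Written to an in-memory buffer here so the sketch needs no files on disk.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

With a real file, `submission.to_csv("submission.csv", index=False)` writes the same content to disk.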

mortal marten May 31, 2024, 2:12 PM

#

I see. I converted everything to floats

mortal marten May 31, 2024, 2:13 PM

#

mortal marten For some reason, the actual accuracy (not the train.csv one but the test.csv one...

What about this?

#

I actually used many types of models just to compare them. My xgboost random forest model also got only one prediction wrong.

sweet shore May 31, 2024, 6:23 PM

#

Hello Jinay Vora, I also worked on it 2 days ago

mortal marten Jun 1, 2024, 3:43 AM

#

sweet shore Hello jinay Vora iam also worked on it 2 days before

What was your method

sweet shore Jun 1, 2024, 3:56 AM

#

I did it by decision tree classifier

mortal marten Jun 1, 2024, 2:46 PM

#

sweet shore I did it by decision tree classifier

Nice. I found my problem. As stated by @pale field, I submitted 'Survived' as a float datatype instead of int. I corrected it and got a 0.77 score. I still don't know why my model predicts all of them correctly.

#

I used 19 different models so that I can compare all of them with each other.

#

1 model got 100%. 1 model only predicted one wrong.

#

Not the training accuracies but the actual accuracies

#

Training accuracy was like 86

sweet shore Jun 1, 2024, 2:52 PM

#

Did you use logistic regression

mortal marten Jun 1, 2024, 3:03 PM

#

@sweet shore I used
Gaussian NB
Bernoulli NB
Complement NB
Multinomial NB
Decision Tree Classifier
Random forest classifier
Xgboost classifier
Xgboost random forest classifier
Adaboost classifier
Logistic regression
K nearest neighbours
Bagging classifier
Hard voting classifier
Soft voting classifier
Stacking classifier
SVC classifier
And 3 more I don't remember.

coarse hill Jun 1, 2024, 3:59 PM

#

Hi anyone looking to work on a dataset together.
I'm very new to kaggle and wanted to kick start my journey.

nocturne vapor Jun 3, 2024, 7:45 AM

#

hey guys, I was working on this dataset and I have
used Random forest without tuning
features = all - (name, ticket, cabin)
but accuracy at submission was 77%
Any suggestion??

indigo radish Jun 3, 2024, 5:04 PM

#

Try logistic regression!

#

It is a very standard approach towards this data, and maybe for this Random Forest algorithm, try dropping the name column as it adds no value to the data!

#

Try looking into how the different features correlate with each other. Try plotting the data, do more EDA, and see what you can derive from the data.

nocturne vapor Jun 4, 2024, 1:15 AM

#

Ok I will try

nocturne vapor Jun 4, 2024, 6:56 AM

#

I looked at the correlations using a heatmap.
After that I chose 3 features for prediction, ['Pclass', 'Sex', 'Fare'], but the accuracy didn't improve. When I submitted on Kaggle it was 77%

#

I don't know how people are getting 100% accuracy

mortal marten Jun 4, 2024, 8:46 AM

#

nocturne vapor I don't know how people are getting 100% accuracy

I got 100% accuracy. Even I dont know how I got it.

wraith flare Jun 4, 2024, 5:34 PM

#

mortal marten I got 100% accuracy. Even I dont know how I got it.

Which features did you give the Titanic model?

#

I got only 82% accuracy

wraith flare Jun 4, 2024, 5:37 PM

#

mortal marten @sweet shore I used Gaussian NB Bernoulli NB Complement NB Multinomia...

Which algorithm gave you 100 percent accuracy?

mortal marten Jun 4, 2024, 6:20 PM

#

wraith flare Which features you give in titanic model?

I used gender, age, fare, class and embarked

#

And the algorithm was Bernoulli NB

mortal marten Jun 5, 2024, 5:04 AM

#

I converted the Embarked values S, Q, C to 1, 2, 3

wraith flare Jun 5, 2024, 2:58 PM

#

mortal marten I converted embarked of SQC to 1 2 3

With LabelEncoder or OrdinalEncoder

mortal marten Jun 5, 2024, 2:58 PM

#

Default settings lmao

wraith flare Jun 5, 2024, 2:59 PM

#

mortal marten Default settings lmao

Using dictionary

mortal marten Jun 5, 2024, 2:59 PM

#

wraith flare Using dictionary

Dataframe

#

Random state 42 and test size 0.3 or 0.2 idk

wraith flare Jun 5, 2024, 3:00 PM

#

mortal marten Random state 42 and test size 0.3 or 0.2 idk

Yes i did this

mortal marten Jun 5, 2024, 3:00 PM

#

Nice

wraith flare Jun 5, 2024, 3:01 PM

#

Maybe with LabelEncoder the codes for S, Q, C won't match your values. Let me check it out

mortal marten Jun 5, 2024, 3:07 PM

#

wraith flare Maybe due to LabelEncoder the value SQC will mismatch with the value.let me chec...

I manually converted it to 1, 2, 3

#

Then dropped the original sqc column
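
To make the difference between the two encodings concrete, here is a small sketch assuming pandas and scikit-learn; the four-row frame and the column names `Embarked_manual` and `Embarked_label` are made up for illustration. A manual dictionary gives S, Q, C the codes 1, 2, 3, whereas LabelEncoder assigns codes by sorted class order (C, Q, S), so the numbers do not line up.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Embarked": ["S", "Q", "C", "S"]})

# Manual mapping: the codes are chosen explicitly.
manual_map = {"S": 1, "Q": 2, "C": 3}
df["Embarked_manual"] = df["Embarked"].map(manual_map)

# LabelEncoder sorts the classes first: C -> 0, Q -> 1, S -> 2.
encoder = LabelEncoder()
df["Embarked_label"] = encoder.fit_transform(df["Embarked"])

# Drop the original text column once the encoded versions exist.
df = df.drop(columns=["Embarked"])
```

Whichever encoding is used, the same mapping has to be applied to both train.csv and test.csv, otherwise the codes mean different things in each file.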

nocturne vapor Jun 6, 2024, 5:12 AM

#

mortal marten I used gender, age, fare, class and embarked

ok I will try these features but I have used those features after seeing Heatmap correlation

mortal marten Jun 6, 2024, 5:13 AM

#

nocturne vapor ok I will try these features but I have used those features after seeing Heatmap...

I see. I just checked simple correlation and then used them.

wraith flare Jun 6, 2024, 1:25 PM

#

When I predict on test.csv the accuracy drops to 0.76. Why?

mortal marten Jun 6, 2024, 7:13 PM

#

wraith flare

Did you remove outliers?

#

I think my model has some problem that I'm too much of a beginner to understand

coarse brook Jun 8, 2024, 4:53 AM

#

Okay, so I have my predictions
the bottom is the survival rate laid out in the train data, and the top is the prediction from the test data. Its a smaller size dataset, so would it be different, or?

plain belfry Jun 15, 2024, 11:54 AM

#

Hi i'm new. Perhaps a silly question; i'm confused on how i can measure accuracy of my model if there is no y_test data

half grail Jun 17, 2024, 2:21 PM

#

plain belfry Hi i'm new. Perhaps a silly question; i'm confused on how i can measure accuracy...

split train_data into train_data and test_data

desert epoch Jun 20, 2024, 10:52 PM

#

plain belfry Hi i'm new. Perhaps a silly question; i'm confused on how i can measure accuracy...

Hello there, before testing you could split your train set into train and validation. When you measure on the validation set, you get an idea of what the outcome on the test set might be.
Something else you can do is k-fold cross-validation over the entire train dataset using cross_val_score.
I hope this helps.
Thanks
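
Both suggestions can be sketched as follows, assuming scikit-learn. The data here is synthetic (a stand-in for the preprocessed train.csv features and labels), and logistic regression is just a placeholder model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the labelled training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Option 1: hold out 20% of the labelled rows as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)

# Option 2: 5-fold cross-validation over all labelled rows.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy"
)
mean_cv_accuracy = cv_scores.mean()
```

Either number is only an estimate of the leaderboard score; the labels for test.csv stay hidden, so the real test accuracy is known only after submitting.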

young zenith Jun 24, 2024, 8:06 PM

#

coarse hill Hi anyone looking to work on a dataset together. I'm very new to kaggle and want...

Hi Manu, I'm new here too

young zenith Jun 26, 2024, 7:16 AM

#

Hey guys, how long does it take for the competition submission score to appear?

restive granite Jun 29, 2024, 5:16 AM

#

Glad to have built the model today. It was a pure headache, honestly, but I did learn many things along the way

livid lantern Jul 2, 2024, 10:11 AM

#

My model does not make predictions very accurately, is there a recommendation for an educational video (with an easy level of English) that will make predictions that are close to 100%?

icy wadi Jul 3, 2024, 6:53 PM

#

livid lantern My model does not make predictions very accurately, is there a recommendation fo...

Provide more information: what accuracy did you get? Which model did you use, and how did you handle your data?

rotund sail Jul 4, 2024, 11:38 PM

#

mortal marten And the algorithm was Bernoulli NB

I don't understand intuitively why the embarked location would impact the survival rate. I know this is what the model predicts, but it's just not that intuitive

versed bone Jul 5, 2024, 12:58 PM

#

wraith flare

use grid search on the regularization term u may get it up a bit
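
A rough sketch of that idea, assuming scikit-learn and assuming the model is a logistic regression (the thread does not say which model is meant). The data is synthetic and the grid of `C` values is an arbitrary example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed training data.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# In scikit-learn, C is the inverse regularization strength:
# a smaller C means a stronger penalty.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)

best_c = search.best_params_["C"]
best_cv_accuracy = search.best_score_
```

`best_cv_accuracy` is the mean cross-validated accuracy for the winning `C`, which is a fairer guide than training accuracy when deciding what to submit.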

median shale Jul 6, 2024, 2:38 AM

#

Hi, does anyone know if there's another link for Alexis Cook’s Titanic Tutorial? The one given on the platform is not working.

median shale Jul 6, 2024, 3:30 AM

#

median shale Hi, anyone know if there's another link for Alexis Cook’s Titanic Tutorial? , th...

I've found it, in case anyone else needs it: https://www.kaggle.com/code/alexisbcook/titanic-tutorial

Titanic Tutorial

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

restive granite Jul 6, 2024, 5:48 AM

#

median shale I've found it, in case anyone else need it: https://www.kaggle.com/code/alexisbc...

Thanks

mortal marten Jul 6, 2024, 7:34 PM

#

rotund sail I don't understand intuitively why the embarked location would impact survival r...

Embarked location may help classify rich and poor people. Like there is one place where the 1st class passengers were in majority.

#

Idk, or I'm just taking a shot in the dark

humble ferry Jul 7, 2024, 2:39 AM

#

mortal marten Embarked location may help classify rich and poor people. Like there is one plac...

But why is it important? When I try to add this feature, it worsens my score on the test set but improves it on the train set. I think we already have Pclass that determines who's poor and who's rich. So, maybe this feature is redundant, or I'm doing something wrong.

mortal marten Jul 7, 2024, 4:06 AM

#

humble ferry But why is it important? When I try to add this feature, it worsens my score on ...

As i said, shot in the dark

junior needle Jul 7, 2024, 10:26 AM

#

Hey fellas ,
I got an accuracy of 0.78708 (V17). I have done feature engineering and data cleaning, trained different models, and picked the one that looks best to me for now. How can I improve my score? I am a beginner. On the CV set I am getting about 0.8715, but I don't know why the test score is so much lower. Any reasons?
Also, when do you stop: when you reach 100%, or 80%+?

neon raft Jul 7, 2024, 1:44 PM

#

you don't have to achieve 100%; that would be a case of overfitting

#

you can predefine a benchmark for yourself.

desert epoch Jul 10, 2024, 11:29 AM

#

neon raft you don't have to achieve 100% that would be a case of overfitting

He's talking about the test dataset probably

desert epoch Jul 10, 2024, 3:45 PM

#

junior needle Hey fellas , I got an accuracy of 0.78708 V17 , i have done feature engineering ...

I get the same issue too 😆 no matter what I do, it looks like it gets even worse.
https://www.kaggle.com/code/richarddev/titanic-survival-prediction

Titanic: Survival Prediction

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

crystal reef Jul 10, 2024, 11:42 PM

#

i get like 77% using random forests

gray charm Jul 11, 2024, 2:03 PM

#

i'm Completely new to kaggle can anyone help me to start? How to start the challange?

serene plover Jul 13, 2024, 11:16 AM

#

gray charm i'm Completely new to kaggle can anyone help me to start? How to start the chall...

Just go to code, follow any workbook, make a copy, and submit.

wooden lintel Jul 13, 2024, 11:16 AM

#

serene plover Just go to code, follow any workbook, make a copy, and submit.

is there a step by step so that we can learn as well

sweet steppe Jul 13, 2024, 9:48 PM

#

hi there folks looking for some help. The titanic test data is missing values. what are some recommended ways of handling that? I am using decision tree model. Deletion of rows with missing data won't work since the output must be 418 rows.

red socket Jul 14, 2024, 3:13 AM

#

sweet steppe hi there folks looking for some help. The titanic test data is missing values. w...

Hey, I suggest you use either XGBoost or Gradient Boost, they will give you a higher accuracy than decision tree. Plus, you should fill in the missing values in the test data with either the mean or mode. And can anyone check out my code, so I can improve my accuracy? https://www.kaggle.com/code/vishalyginny/titanic
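
A minimal sketch of the mean/mode filling, assuming pandas; the tiny frames below are made up. The fill values are computed on the training data and reused for the test data, so no test rows are dropped and the output keeps its full length.

```python
import numpy as np
import pandas as pd

# Made-up miniature versions of train.csv and test.csv.
train = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Embarked": ["S", "C", "S", np.nan],
})
test = pd.DataFrame({
    "Age": [np.nan, 47.0],
    "Fare": [np.nan, 7.0],
    "Embarked": ["Q", np.nan],
})

# Mean for numeric columns, mode (most frequent value) for the categorical one.
fill_values = {
    "Age": train["Age"].mean(),
    "Fare": train["Fare"].mean(),
    "Embarked": train["Embarked"].mode()[0],
}

train_filled = train.fillna(fill_values)
test_filled = test.fillna(fill_values)  # same row count as test
```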

Titanic

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

red socket Jul 14, 2024, 3:15 AM

#

wooden lintel is there a step by step so that we can learn as well

maybe try to follow the tutorials in the beginner competitions?

manic dew Jul 17, 2024, 8:14 PM

#

Hi guys, I just submitted my first Titanic prediction by following the tutorial, but I am wondering where to go from here. I don't think it is over, right? Thanks

stone basin Jul 20, 2024, 10:01 AM

#

manic dew Hi guys I just submit the first prediction titanic with the tutorial, but I am w...

After submitting the first prediction, we have to learn how to implement a machine learning model ourselves, then submit an updated second version.

opal nebula Jul 25, 2024, 3:19 PM

#

Hello all, I am trying to do my first Kaggle project and I am having a basic issue: I am not able to see the "Join competition" button. Any suggestions?

opal nebula Jul 25, 2024, 5:07 PM

#

Hello all, I followed the cookbook and used a random forest model, and it generated a public score of 0.77751. Are there any suggestions on how I can make it better?

fickle geyser Aug 4, 2024, 2:23 AM

#

opal nebula Hello All, I am trying to do my first kaggle project, I am having basic issue, I...

Check if your account is verified. A Phone number is required to verify your account.

gaunt cradle Aug 12, 2024, 12:51 PM

#

me. Im beginner in this field.

stone knoll Aug 13, 2024, 4:53 AM

#

here

stone knoll Aug 13, 2024, 5:00 AM

#

livid lantern My model does not make predictions very accurately, is there a recommendation fo...

100% accuracy is not a good sign. At that point, you should take overfitting into consideration.

livid lantern Aug 15, 2024, 6:18 PM

#

stone knoll 100% accuracy is not good.at that time, you should take overfitting into conside...

Yes, I have learned a lot since my last post about the problem and now I understand that training should be interrupted when the error function of the validation sample starts to increase, but I still haven't dared to solve the problem again....
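
The stopping rule described above can be sketched in plain Python. The function name `early_stopping_epoch`, the `patience` parameter, and the loss values are hypothetical, chosen only for illustration: training stops once the validation loss has failed to improve for `patience` consecutive epochs, and the best epoch so far is kept.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch seen before stopping.

    Iteration ends once the validation loss has not improved
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss keeps rising: stop training here
    return best_epoch


# Hypothetical per-epoch validation losses.
losses = [0.70, 0.55, 0.48, 0.47, 0.49, 0.52, 0.60]
best = early_stopping_epoch(losses, patience=2)
```

A small `patience` avoids stopping on a single noisy epoch; in a real training loop the model weights from the best epoch would be saved and restored.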

#

Can you tell me what accuracy I, as a beginner with two months of experience in machine learning, should strive for? (I can start solving it later.) I am already finishing a fairly large course on machine learning and am beginning to suspect that rough training on this problem is unnecessary, so I will have to take it up again in the near future

near bolt Aug 15, 2024, 7:59 PM

#

Does anyone know what we are supposed to do in feature engineering?

stone knoll Aug 16, 2024, 8:22 AM

#

livid lantern can you tell me what accuracy I, as a beginner with two months of experience in ...

Sorry, I am a beginner too. The max accuracy I have got is 80%. In fact, there is still a long way to go to increase the accuracy, but I have no ideas right now. I'm so sorry that I can't help you.

stone knoll Aug 16, 2024, 8:28 AM

#

mortal marten @sweet shore I used Gaussian NB Bernoulli NB Complement NB Multinomia...

Hello, can you share your code? Having heard you got a high accuracy, I am very interested in learning how you implemented it. Thanks!

mortal marten Aug 16, 2024, 8:43 AM

#

stone knoll hello, can you share your codes. hearing you got a high accuracy , i have a gre...

https://www.kaggle.com/code/jinayvora25/titanic-ml-models

This is my first ml model. So please don't mind the messy code. Also let me know if you find any errors.

Titanic ML Models

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

stone knoll Aug 16, 2024, 9:14 AM

#

mortal marten https://www.kaggle.com/code/jinayvora25/titanic-ml-models This is my first ml m...

thanks, it's so kind of you! what is your test accuracy?

stone knoll Aug 16, 2024, 9:17 AM

#

mortal marten https://www.kaggle.com/code/jinayvora25/titanic-ml-models This is my first ml m...

what does this graph mean?

mortal marten Aug 16, 2024, 10:03 PM

#

stone knoll what does this graph mean?

Good question. I don't remember

#

Oh wait it's the sibling one

#

I plotted the sibling column with respect to the survived

supple galleon Aug 16, 2024, 10:26 PM

#

Hey guys, I am starting in kaggle and Idk how to start in the titanic competition

mortal marten Aug 17, 2024, 6:20 AM

#

supple galleon Hey guys, I am starting in kaggle and Idk how to start in the titanic competitio...

Refer to other submitted notebooks. Also use your EDA fundamentals.

#

Try and visualize the data as much as possible. Understand the data thoroughly.

daring plover Aug 19, 2024, 4:04 PM

#

Can you tell me why, when I train a model (XGBoostClassifier) with hyperparameter tuning, it gives me an accuracy score lower than the one obtained without tuning?

viral skiff Aug 24, 2024, 7:27 PM

#

daring plover Can you tell me why when I train a model (XGBoostClassifier) with hyperparameter...

Is it overfitting?

small wolf Aug 28, 2024, 11:34 AM

#

Hey guys, in this tutorial im following: https://www.kaggle.com/code/amitkumarjaiswal/beginner-s-tutorial-to-titanic-using-scikit-learn/notebook
there is a reference to #1, #2, and #3, any idea what theyre talking about with these numbers,

Heres an example blurb,:

Beginner's Tutorial to Titanic using Scikit-learn

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

stone knoll Sep 1, 2024, 6:35 AM

#

Has anyone here gotten a perfect prediction?

umbral garden Sep 7, 2024, 9:59 PM

#

Hello everyone, I'm (new and) learning ML and this is my first time joining a Kaggle competition, I am stuck in an error for hours, and any help is welcome! Challenge: Titanic - Machine Learning from Disaster, needs help with feature engineering pipeline error. (I'm new to implementing a pipeline like this) What I'm trying to achieve: Age is a predictor variable, from 0.0 to 80.0, and also contains NaN. I want to bin this feature, first assigning NaN to number 999, then binning like: 0-1 is "Infants", "1-4" is "Toddlers", ..., "100-inf" is "Unknown", then, One-hot encoding the features. However, when I try to run a random forest model (image 4), I got the following error because my AgeBinningTransformer (image 1) is incorrect: `Cell In[53], line 24, in AgeBinningTransformer.transform(self, X)
22 X_copy = X.copy()
23 # Ensure 'Unknown' label is correctly handled
---> 24 X_copy['AgeGroup'] = pd.cut(X_copy['Age'], bins=self.bins, labels=self.labels, include_lowest=True)
25 return X_copy

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices` Any suggestion is welcome, thank you 🥲

fossil skiff Sep 8, 2024, 11:49 AM

#

Hi All, new member from Sheffield UK.

My question re Titanic comp is - are people genuinely achieving 100% on Titanic dataset competition? Seems like a stretch to reach. Is it a loophole? What score should we be aiming for before moving on?

fossil skiff Sep 8, 2024, 12:04 PM

#

umbral garden Hello everyone, I'm (new and) learning ML and this is my first time joining a Ka...

You should convert the age column to numeric.

fossil skiff Sep 8, 2024, 12:07 PM

#

umbral garden Hello everyone, I'm (new and) learning ML and this is my first time joining a Ka...

Do you use LLMs to help you with errors like this or do you class that as cheating? You can always ask the LLM to help you get to the answer without being explicit

mental trout Sep 8, 2024, 10:28 PM

#

umbral garden Hello everyone, I'm (new and) learning ML and this is my first time joining a Ka...

I might have to look at the full code to suggest a correct diagnosis. However, I managed to make it work by using two separate approaches: The first one is by using KBinsDiscretizer (image 1) to bin into 5 groups (for example) via ordinal encoding, and the next one is a modified version of your code (image 2) which then feeds the pipeline for processing (image 3). Hope it helps.
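
Since the images are not in the log, here is one possible shape for a `pd.cut`-based transformer, assuming pandas and scikit-learn. This is a sketch and not a diagnosis of the IndexError above: the bin edges past the first two and the label names are invented, and missing ages are labelled "Unknown" directly instead of going through a 999 placeholder.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class AgeBinningTransformer(BaseEstimator, TransformerMixin):
    """Bin Age into labelled groups; missing ages become 'Unknown'."""

    def __init__(self, bins=(0, 1, 4, 12, 18, 60, 100),
                 labels=("Infant", "Toddler", "Child", "Teen", "Adult", "Senior")):
        self.bins = bins
        self.labels = labels

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        X_copy = X.copy()
        # Coerce to numeric so stray strings become NaN instead of raising.
        ages = pd.to_numeric(X_copy["Age"], errors="coerce")
        groups = pd.cut(ages, bins=list(self.bins), labels=list(self.labels),
                        include_lowest=True)
        # NaN ages fall outside every bin; give them an explicit category.
        groups = groups.cat.add_categories("Unknown").fillna("Unknown")
        X_copy["AgeGroup"] = groups
        return X_copy


df = pd.DataFrame({"Age": [0.5, 3.0, 30.0, np.nan, 80.0]})
binned = AgeBinningTransformer().fit_transform(df)
one_hot = pd.get_dummies(binned["AgeGroup"], prefix="AgeGroup")
```

Note that `pd.cut` needs exactly one fewer label than bin edges; a mismatch there is a common source of errors with this function.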

umbral garden Sep 8, 2024, 10:42 PM

#

mental trout I might have to look at the full code to suggest a correct diagnosis. However, I...

Hi! Thank you for helping! For that particular error I managed to resolve it... I still need to work on the preprocessing step tho. Also, I made my workbook public: https://www.kaggle.com/code/bigsmallmediumpotato/titanic-ml-challenge

Titanic ML challenge

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

umbral garden Sep 8, 2024, 10:46 PM

#

fossil skiff Do you use LLMs to help you with errors like this or do you class that as cheati...

you mean ChatGPT? it's great when I know what code I'm looking for, so I just enter the prompt and let it return what I'm expecting. It's great automation. It can't replace debugging process though, and independent thinking. However, it is helpful to enter the prompt: in simple terms, explain [a line of code]. followed by the prompt: an example

mental trout Sep 8, 2024, 11:13 PM

#

umbral garden Hi! Thank you for helping! For that particular error I managed to resolve it... ...

Cool, I'll give it a look.

umbral garden Sep 8, 2024, 11:19 PM

#

mental trout Cool, I'll give it a look.

Ty! you're welcome to collaborate on this if you want 🙂

mental trout Sep 8, 2024, 11:55 PM

#

umbral garden Ty! you're welcome to collaborate on this if you want 🙂

Sure, I'm still relatively new in Kaggle so you might have to guide me with some of the rules and regulations lol.

umbral garden Sep 9, 2024, 12:00 AM

#

This is my first challenge as well, I think I can add you as a collaborator with your Kaggle username. rules wise, team submissions are allowed (per leaderboard).

mental trout Sep 9, 2024, 1:23 AM

#

umbral garden This is my first challenge as well, I think I can add you as a collaborator with...

Kaggle username is shahriarrahman10, I'm also connecting with you on Discord.

digital latch Sep 15, 2024, 3:50 PM

#

Hello guys, is anyone doing the Titanic challenge? I'm kind of new to ML coding, and after two days of coding I have successfully reached an accuracy of 73% to 75% (using naive Bayes) on the validation set that I split off from the train set.
I have tried to work with the test set and have completed the data preprocessing, but since y_test (the Survived column) is missing, I can't evaluate the model on the test set.

small wolf Sep 15, 2024, 3:53 PM

#

digital latch Hello guys , is anyone doing titanic challenge ? I'm kind of new to ml coding, a...

you should split your training set into two parts, one for test and one for training. The test set provided is just the tests you need to run for submission

#

oh i guess you knew that actually lol,

#

you'll know your accuracy when you submit

digital latch Sep 15, 2024, 4:01 PM

#

small wolf you should split your training set into two parts, one for test and one for trai...

yeah , I don't know how to apply the model to the test set since I don't know which true y value to compare the y predicted value for the test set

small wolf Sep 15, 2024, 4:02 PM

#

once you submit your csv it will give you the accuracy. They don't give you the true y for the test set because then you could just submit that

#

check out this section on the competition page

#

https://www.kaggle.com/c/titanic

digital latch Sep 15, 2024, 4:36 PM

#

small wolf once you submit your csv it will give you the accuracy, they dont give you the t...

ah thank you , I submitted my submission file , I got 67.7% accuracy.

fiery palm Sep 17, 2024, 5:18 AM

#

Hello @everyone, just made my first submission woohoo . Had an accuracy of 76% using the Random Forest classifier 🌲 . I replaced all the NaN values in the Age feature using 0. This is probably not a good idea 🥲 . I selected the min_samples_split, max_depth and n_estimators by plotting a few graphs, but my feature engineering skill is not developed yet. Any advice on how to increase the accuracy? 🙂
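
One common alternative to filling Age with 0 is a group-wise median. This is a sketch assuming pandas, with made-up rows, and it swaps in a different technique from the one described in the message: each missing age is replaced by the median age of passengers with the same Pclass and Sex.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column names as train.csv.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Age": [38.0, np.nan, 42.0, 20.0, 24.0, np.nan],
})

# Median age within each (Pclass, Sex) group, aligned to every row.
group_median = df.groupby(["Pclass", "Sex"])["Age"].transform("median")

# Fall back to the overall median if a whole group has no known ages.
df["Age"] = df["Age"].fillna(group_median).fillna(df["Age"].median())
```

Filling with 0 tells the model those passengers are newborns, which distorts any split on Age; a median keeps the filled values inside the typical range.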

umbral garden Sep 20, 2024, 7:58 PM

#

fiery palm Hello @everyone, just made my first submission woohoo . H...

Hi, you can look through section 2 (feature engineering) of my notebook for a summary of what I did. My highest score is 0.789 using categorical boosting. Hope this helps! https://www.kaggle.com/code/bigsmallmediumpotato/titanic-ml-challenge-top-10

Titanic ML challenge - Top 10%

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

simple meteor Sep 21, 2024, 1:18 PM

#

Man, I'm already stuck at the first line of code. Can someone please guide me here? I'm not sure what I did wrong..

#

here is the first block of code that got cut off at the top:

rigid parcel Sep 21, 2024, 10:59 PM

#

simple meteor Man, I'm already stuck at the first line of code. Can someone please guide me he...

You need to run import pandas as pd for the data to be read in. That line is present in the first block of code that got cut off so maybe you forgot to run it

arctic pilot Sep 22, 2024, 1:38 AM

#

I got 0.76555 accuracy in the titanic competition using XGBoost model, how can I improve the accuracy ?

strong sandal Sep 26, 2024, 6:22 AM

#

hey guys, it has been suggested that I use KNN or MICE to impute the missing values for age. Is this optimal?

mental trout Sep 26, 2024, 4:06 PM

#

strong sandal hey guys, i have been suggested to use knn or mice to impute the missing value f...

Since age is one of the primary contributors, it is generally a good approach to impute using a regression model like RFR, XGboost, Knn Reg, etc.
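
As a rough illustration of the KNN option, here is a sketch using scikit-learn's `KNNImputer` on made-up rows. This is the imputer variant rather than a separately trained regressor: each missing Age is filled with the mean Age of the nearest rows, measured on the other columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Made-up numeric rows with the same column names as train.csv.
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 2],
    "SibSp": [0, 1, 0, 4, 0],
    "Fare": [80.0, 75.0, 8.0, 30.0, 13.0],
    "Age": [40.0, np.nan, 22.0, np.nan, 30.0],
})

# Each missing value is replaced by the mean of that column
# over the 2 nearest rows that have the value present.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```

The distances here are computed on unscaled columns, so Fare dominates; scaling the features first is usually worth trying. Whether this beats a simple median is something to check with cross-validation rather than assume.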

hexed birch Sep 26, 2024, 7:46 PM

#

Hey guys, if anyone can help me out on how to improve I'd be forever grateful! So, this is the train and test data before one-hot encoding the embarked and social status features. Before creating X and y, I dropped Name and ID, and that was it! Also, I'm using RandomForest for the clf!

#

My goal is to get at least 80%+ before using multiple classifiers, and often the outcome for this one is 77-79

glad sapphire Sep 27, 2024, 2:26 PM

#

simple meteor Man, I'm already stuck at the first line of code. Can someone please guide me he...

I think you need to import pandas as pd in first line

steep crag Sep 29, 2024, 5:35 AM

#

Hey everyone! I am new here. I recently tried a decision tree and a random forest algo on the dataset and had a score of 0.75 and 0.77 respectively. Can someone please share how they got a score of 1 and what algorithm/ process they followed?

hexed birch Sep 29, 2024, 8:26 PM

#

steep crag Hey everyone! I am new here. I recently tried a decision tree and a random fores...

I did manage to see one example of someone reaching that 1.0, but he used multiple classifiers, TensorFlow and a voting system!

vocal island Oct 1, 2024, 4:56 AM

#

hey guys, I'm new here. I want to know how we select a particular algorithm. Is it based on the dataset?

#

and what is exploratory data analysis? What is its use?

mystic egret Oct 2, 2024, 2:36 AM

#

How do I update my regression parameter values as I add or delete data?

maiden cove Oct 2, 2024, 11:27 PM

#

Anyone have the floor plans of the Titanic? I'm splitting up the Cabin row so I can make a floor column and a roomNumber column, but I can't find floor plans to see if the room number matters for nearness to stairs\exit.
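
A small sketch of that split using only the standard library. The function name `split_cabin` and the choice to keep only the first cabin when several are listed are assumptions made for illustration.

```python
import re


def split_cabin(cabin):
    """Return (deck, room_number) from a Cabin value, or (None, None)."""
    if not isinstance(cabin, str):
        return None, None  # missing values (NaN) are floats, not strings
    # First deck letter followed by optional digits, e.g. "C85".
    match = re.search(r"([A-Za-z])(\d+)?", cabin)
    if match is None:
        return None, None
    deck = match.group(1).upper()
    room = int(match.group(2)) if match.group(2) else None
    return deck, room


examples = ["C85", "C23 C25 C27", "D", float("nan")]
parsed = [split_cabin(value) for value in examples]
```

With a pandas DataFrame, applying this function to the Cabin column row by row would yield the two new columns; rows with a missing Cabin simply get `None` for both.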

azure wharf Oct 4, 2024, 2:32 AM

#

Regarding this problem, am I supposed to try whatever models I want and choose the one with the best result? Incidentally, for this problem I'll be choosing a decision tree, but for other problems should I try more than one model?

hexed birch Oct 4, 2024, 6:24 PM

#

maiden cove Anyone have the floor plans of the Titanic? I'm splitting up the Cabin row so I ...

For me at least, Cabin was dropped since 77% of the column is missing.

sharp roost Oct 8, 2024, 8:15 PM

#

Hey, guys. Could you share an insight on how to select a proper model for the Titanic task and similar tasks?

digital latch Oct 9, 2024, 8:13 AM

#

sharp roost Hey, guys. Could you share an insight on how to select a proper model for the Ti...

I don't know how correct I am, but from what I understand, the choice of model depends on the task. In the Titanic dataset we are asked to predict whether a passenger survives (True or 1) or perishes (False or 0). Since there are only two possible outcomes, this is a binary classification problem.
If there were more than two outcomes, like 0, 1 or 2, it would be a multiclass classification problem.
For classification, some standard algorithms are KNN, Naive Bayes and logistic regression, among others. We select the model that has the lowest cost function value [think of this as error], a shorter training time (takes less time to compute) and a higher accuracy score.
On the other hand, we could also classify using neural networks for more robustness and flexibility.
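
One way to turn the comparison above into something concrete is to score each candidate with cross-validation and pick the best mean accuracy. This is only a sketch: the data is synthetic (make_classification stands in for prepared Titanic features) and the three candidates are just the ones named in the message.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (an assumption; swap in your own X, y).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "naive_bayes": GaussianNB(),
}

# Mean 5-fold accuracy per model; higher is better.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```

The scaler sits inside the pipeline so that it is refit on each training fold, which keeps the validation folds out of the preprocessing step.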

dusky furnace Oct 11, 2024, 3:43 PM

#

Hey guys. How can I submit my Colab notebook on Kaggle?

fossil skiff Oct 13, 2024, 3:25 PM

#

umbral garden you mean ChatGPT? it's great when I know what code I'm looking for, so I just en...

IMO - it's not as black and white as that. They can increase independent thinking productivity tenfold if used creatively.

proven flower Oct 17, 2024, 9:11 PM

#

Hi, I see the score is 1.0. Does that mean the accuracy is 100%? If so, could this indicate the model is overfitting?

tiny zenith Oct 20, 2024, 10:58 PM

#

hello, I'm just getting started on my titanic submission and I can't figure out how to use the notebook. When I try to type 'import', the entire line disappears and so does the cursor. then I try to click on the code cell to edit the code again, and the entire line I click on disappears. It's pretty much impossible to write any code. Does anyone know what I'm doing wrong?

keen halo Oct 27, 2024, 3:27 PM

#

hexed birch Hey guys, if anyone can help me out on how to improve I'd be forever grateful! S...

I wasn't able to get over 78% either, even using multiple classifiers (LR, DT, KNN, RF, XGB). I defined Fare/Pclass and FamilySize features and binned the ages. Have you found other features or tried other algorithms?

hexed birch Oct 30, 2024, 1:38 PM

#

Sorry about the delay, I'll take a good look at that dataset later in the day, but as far as I remember, I did try many. I used this flowchart https://scikit-learn.org/stable/machine_learning_map.html, but at the end of the day, the RF seems to perform considerably better than any other. As far as feature engineering goes, I kept it pretty simple. One thing I remember making a small difference, but an improvement nonetheless, is using a grid search to find the best hyperparameters for the estimator. So take a look at that as well!
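
For anyone who hasn't run a grid search before, here is a small sketch using scikit-learn's GridSearchCV on a random forest. The data is synthetic and the parameter grid is an arbitrary illustration, not the grid used in the notebook being discussed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for prepared Titanic features (an assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative grid; real grids depend on the data and the time budget.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation per combination
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

search.best_estimator_ is refit on all of the data with the winning parameters, so it can be used directly for predictions afterwards.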

pseudo stratus Nov 3, 2024, 10:11 AM

#

keen halo I wasn't able to go over 78% also using multiple classifier with LR, DT, KNN, RF...

I got 81 using a neural network, no stacking. Definitely possible to get higher.

keen halo Nov 4, 2024, 7:51 AM

#

pseudo stratus I got 81 using a neural network, no stacking. Definitely possible to get higher.

Ah ok, I am trying to do without NN, but thank you for the answer

sly trellis Nov 7, 2024, 7:08 PM

#

Hi guys, What do you consider a good score (satisfactory)?

tough linden Nov 7, 2024, 11:09 PM

#

depends

#

95 is really hard to push past

#

80 isnt a bad number though

knotty knot Nov 8, 2024, 12:22 AM

#

Hi folks, I'd like to know if any of you have seen very high accuracy on BOTH the train and cross-validation sets, and nonetheless about 10% less on the test set submitted to Kaggle. Since I see no particular issues with how I engineered the test set features to align them with those of the trained model (one-hot encoding, dropped features, etc.), the only remaining assumption is that the model is overfitting. But then why does that model perform well on the cross-validation set?
Just to give you an order of magnitude:

acc on train set: 86%
acc on cv set: a bit over 85%
acc on Kaggle submission: a bit over 76% ... -.-

gentle marten Nov 8, 2024, 2:08 AM

#

knotty knot Hi folks, i'd like to understand if someone of you experienced to have a very hi...

Hey, I am a beginner as well, but I encountered something like this while working on the playground series for this month. You might want to look at increasing the number of folds (if you're using a stratified k-fold). Just throwing this out there.

#

I was encountering a large difference between CV and leaderboard scores with 5 folds (around 140k entries in the dataset). Then, after doing some parameter optimization trials with 8 splits instead, my leaderboard score almost exactly matched my CV score.
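
A rough sketch of the experiment described above: run the same model under stratified k-fold with 5 and then 8 splits and compare the mean and spread of the fold scores. The data and model are placeholders, and more folds are not guaranteed to close a CV/leaderboard gap; this only shows how to check.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data and a simple model as placeholders (assumptions).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

results = {}
for n_splits in (5, 8):
    # Stratification keeps the class ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[n_splits] = (fold_scores.mean(), fold_scores.std())
    print(n_splits, "folds: mean/std =", results[n_splits])
```

A large standard deviation across folds is a hint that a single CV number should not be trusted too much on a dataset as small as Titanic.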

sly trellis Nov 8, 2024, 10:50 AM

#

tough linden 80 isnt a bad number though

I see
Just starting off, so I wanted to know when I can move on from one competition to another.

#

Can anyone suggest how I can fill null values in the Cabin column?
I see a pattern that cabins starting with C, D, E, F have a higher chance of survival compared to others, but I don't find any relation between fare, class and cabin.

Also, can anyone tell me if they are able to use fare and age? I don't see them being useful.

desert epoch Nov 10, 2024, 8:45 PM

#

knotty knot Hi folks, i'd like to understand if someone of you experienced to have a very hi...

To get a better understanding of how the submission is evaluated, I suggest you use the Matthews correlation coefficient (MCC) to evaluate your training results, as this is a better reflection of the submission scores.
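
For reference, a small sketch computing MCC next to plain accuracy with scikit-learn. The labels are made up for illustration. Note that the Titanic leaderboard itself is scored on accuracy, so MCC here is a supplementary check rather than the competition metric.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Made-up labels: 6 non-survivors and 4 survivors.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)      # fraction of correct predictions
mcc = matthews_corrcoef(y_true, y_pred)   # ranges from -1 to 1, 0 is chance level

print(f"accuracy={acc:.3f}  mcc={mcc:.3f}")
```

MCC takes all four cells of the confusion matrix into account, so it tends to be less flattering than accuracy when one class is predicted poorly.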

slender oyster Nov 14, 2024, 8:25 AM

#

Hello, I'm a beginner, just started learning data analytics last month. I used an XGBoost model on my train dataset with some engineered features. I've had four submissions and the highest I got so far is only at 77.99%. I've tried different kinds of engineered features to improve the model but retained only those that seem to work. For a beginner, is it possible to push past 80% or that requires a bit more advanced knowledge?

hazy falcon Nov 14, 2024, 10:35 AM

#

slender oyster Hello, I'm a beginner, just started learning data analytics last month. I used a...

my friend used some decision trees and made it to 77%

#

you definitely can push past 80%

ripe ridge Nov 14, 2024, 12:04 PM

#

slender oyster Hello, I'm a beginner, just started learning data analytics last month. I used a...

Can you let me know if you succeed in pushing past 80%? I'm in the exact same situation!

hazy falcon Nov 15, 2024, 1:10 PM

#

everyone is now a farm merge valley player

sinful rain Nov 23, 2024, 5:02 AM

#

Z

whole garden Dec 4, 2024, 4:41 PM

#

What are the prerequisites for this exercise?

twin ember Dec 10, 2024, 10:44 AM

#

Hello everyone, I am new to this competition and Kaggle. I made an LGBM model that had 80% accuracy; however, my public score is 0.0000. Why is this happening?

solid girder Dec 12, 2024, 1:04 PM

#

Hi! I'm a newbie with datascience. I'm looking to partner up with people to discuss and do datascience projects. I'm looking for people who are interested in understanding why something works the way it does, not just bumbling through to increase accuracy scores. I've finished the IBM Data Science course and now doing the Titanic project. Anyone interested to work with me?

dark stone Dec 14, 2024, 6:12 AM

#

solid girder Hi! I'm a newbie with datascience. I'm looking to partner up with people to disc...

i'm down!

solid girder Dec 16, 2024, 10:56 AM

#

dark stone i'm down!

I'll DM you.

tacit skiff Dec 16, 2024, 1:59 PM

#

Best tutorial for mastering this comp?

#

I know that 100% score is cheating

fresh spear Dec 16, 2024, 7:04 PM

#

Hey, I'm a software engineering student and I want to improve my score (0.77751). It's for a class and I am only allowed to use logistic regression. Does anyone have suggestions for how to improve my feature engineering? Maybe share what you did in your code? Thanks

#

https://www.kaggle.com/code/amitbarkama/mlparttwo

MLPartTwo

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

fresh spear Dec 17, 2024, 1:06 AM

#

I would like to know how I can do hyperparameter tuning correctly.
And what should I use? Grid search / Bayesian / Hyperopt / Optuna?

safe orbit Dec 19, 2024, 8:54 PM

#

https://youtu.be/6IGx7ZZdS74

YouTube

Ryan & Matt Data Science

Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)

Welcome to my data science journey through the Kaggle Titanic - Machine Learning from Disaster Project!

In this video, we'll dive deep into the world of data analysis, feature engineering, and machine learning to predict passenger survival rates on the Titanic.

As Kaggle states: "The competition is simple: use machine learning to create a mode...

#

https://youtu.be/t-INgABWULw
https://youtu.be/LrCylIe0RJM

@fresh spear @tacit skiff

YouTube

Ryan & Matt Data Science

Mastering Hyperparameter Tuning with Optuna: Boost Your Machine Lea...

In this comprehensive tutorial, we delve deep into the world of hyperparameter tuning using Optuna, a powerful Python library for optimizing machine learning models. Whether you're a data scientist, machine learning enthusiast, or just looking to improve your model's performance, this video is packed with valuable insights and practical tips to ...

YouTube

Ryan & Matt Data Science

Hands-On Hyperparameter Tuning with Scikit-Learn: Tips and Tricks

Welcome to our comprehensive guide on hyperparameter tuning with Scikit-Learn! 🚀

In this tutorial, we'll dive deep into the world of machine learning model optimization. If you're looking to take your data science skills to the next level and boost your model's performance, you're in the right place.

Interested in discussing a Data or AI proje...

tacit skiff Dec 19, 2024, 8:55 PM

#

Lucky boy

#

I will watch them all

safe orbit Dec 19, 2024, 8:56 PM

#

@whole garden Start with this playlist, https://www.youtube.com/playlist?list=PLcQVY5V2UY4LNmObS0gqNVyNdVfXnHwu8

YouTube

Scikit-Learn Tutorials - Master Machine Learning

Scikit-Learn is a powerful and user-friendly machine learning library in Python that provides a wide array of tools for creating, training, and evaluating ma...

tacit skiff Dec 19, 2024, 8:56 PM

#

Without ad blocker

safe orbit Dec 19, 2024, 8:56 PM

#

thanks man

#

I'll have spaceship vid out in feb i think. Working on some new ml vids this month though

fresh spear Dec 20, 2024, 9:39 AM

#

is there a way to pass 0.8 with log reg?

#

0.787 is the best I achieved

dark stone Dec 25, 2024, 2:51 AM

#

I made my first submission following the guide and I got a 0.7751. Is anyone else a beginner in the process of raising their score? I'm looking to collab :)

Titanic Tutorial

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

thorny flare Dec 26, 2024, 5:18 AM

#

solid girder Hi! I'm a newbie with datascience. I'm looking to partner up with people to disc...

hey i am also in a somewhat similar position can you add me too

remote quail Jan 2, 2025, 2:09 AM

#

Hello, I am trying to save and it keeps saying it failed. When I do a quick save it works, but I cannot submit to the competition. Please help.

fresh spear Jan 3, 2025, 2:11 PM

#

remote quail hello, i am trying to save and it keeps saying failed, when i do a quick save it...

you might have an error in the code

#

if you don't write the predictions to the CSV, you can't submit it
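
A minimal sketch of writing the submission file. The Titanic competition expects a CSV with the columns PassengerId and Survived; the ids and predictions below are hypothetical placeholders for what would normally come from test.csv and a fitted model.

```python
import pandas as pd

# Hypothetical stand-ins: in a real notebook these come from test.csv
# and from model.predict(...).
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": predictions})

# index=False keeps the row index out of the file.
submission.to_csv("submission.csv", index=False)
print(submission.head())
```

In a Kaggle notebook the file has to be written during a full save and run for it to show up as an output that can be submitted.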

remote quail Jan 3, 2025, 2:12 PM

#

Hello Amit, I have managed to get it to submit, thanks for the response

fresh spear Jan 3, 2025, 2:12 PM

#

no problem

dark stone Jan 5, 2025, 6:25 AM

#

thorny flare hey i am also in a somewhat similar position can you add me too

i'm also still looking for people to discuss with!

eager cedar Jan 6, 2025, 11:55 AM

#

Hello, Do I just need to press the submit button or do I need to ask for some kind of permission to join this titanic thing?

neat bough Jan 8, 2025, 4:09 AM

#

dark stone i'm also still looking for people to discuss with!

me too!

neat bough Jan 8, 2025, 4:28 AM

#

Hey, I'm a beginner in this Titanic project. I followed Alexis Cook's Titanic Tutorial and submitted, but I am unsure how to improve/progress forward. Anyone have any advice on where to look next?

Titanic Tutorial

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

eager cedar Jan 10, 2025, 6:52 AM

#

try paid competitions

tribal pivot Jan 10, 2025, 11:36 PM

#

neat bough Hey, I'm a beginner in this Titanic project. I followed Alexis Cook's [Titanic T...

in the video they explained that looking at the data set and looking online can help you understand better what to look for

tribal pivot Jan 11, 2025, 12:01 AM

#

also looking at the submissions or forums on kaggle could help you out with different and unique ideas

tribal pivot Jan 11, 2025, 12:48 AM

#

This should also be a big help
https://www.kaggle.com/learn/intro-to-machine-learning

Learn Intro to Machine Learning Tutorials

Learn the core ideas in machine learning, and build your first models.

green harbor Jan 12, 2025, 9:42 AM

#

Hi, I am new to machine learning. Can anyone explain why we need to normalize and standardize the data?

copper knot Jan 13, 2025, 8:28 AM

#

green harbor Hi, I am new to Machine Learning anyone can explain why we need to Normalize and...

Source: https://www.geeksforgeeks.org/what-is-data-normalization/
Why do we need Data Normalization in Machine Learning?
There are several reasons for the need for data normalization as follows:

Normalisation is essential to machine learning for a number of reasons. Throughout the learning process, it guarantees that every feature contributes equally, preventing larger-magnitude features from overshadowing others.
It enables faster convergence of algorithms for optimisation, especially those that depend on gradient descent. Normalisation improves the performance of distance-based algorithms like k-Nearest Neighbours.
Normalisation improves overall performance by addressing model sensitivity problems in algorithms such as Support Vector Machines and Neural Networks.
Because it assumes uniform feature scales, it also supports the use of regularisation techniques like L1 and L2 regularisation.
In general, normalisation is necessary when working with attributes that have different scales; otherwise, the effectiveness of a significant attribute that is equally important (on a lower scale) could be diluted due to other attributes having values on a larger scale.
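
To make the scaling point concrete, here is a short sketch with two features on very different scales (think Age and Fare). The numbers are illustrative only. StandardScaler rescales each column to zero mean and unit variance, MinMaxScaler rescales it to the range 0 to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two illustrative features on different scales.
X_train = np.array([[22.0, 7.25], [38.0, 71.28], [26.0, 7.92], [35.0, 53.10]])
X_test = np.array([[30.0, 8.05]])

# Standardization: fit on the training data only, then reuse on test data.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Min-max normalization: each training column ends up in [0, 1].
minmax = MinMaxScaler().fit(X_train)
X_train_mm = minmax.transform(X_train)

print(X_train_std.round(2))
print(X_train_mm.round(2))
```

Fitting the scaler on the training set and only transforming the test set keeps information from the test data out of the preprocessing.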

GeeksforGeeks

Data Normalization Machine Learning - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

#

Starting with the Titanic survival prediction about 2 years after signing up with Kaggle. Better late than never.

full jewel Jan 13, 2025, 11:45 AM

#

Hi, just a quick question. I submitted some data sets, just to experiment. After looking at the leaderboard, I noticed that a lot of people have 100% accuracy. Is this even possible, or did they just use historical data to give the correct answer for each passenger?

copper knot Jan 13, 2025, 12:21 PM

#

full jewel Hi, just a quick question. I submitted some data sets, just to experiment. After...

That's what I would think happened

copper knot Jan 13, 2025, 12:23 PM

#

full jewel Hi, just a quick question. I submitted some data sets, just to experiment. After...

I just updated the basic gender/sex logistic regression to familiarize myself with the submission process, and its score is 0.76555

full jewel Jan 13, 2025, 12:46 PM

#

OK, thank you very much, I was really confused about that.

hollow wren Jan 13, 2025, 9:32 PM

#

Titanic’s fate highlighted the flaw in this plan. 🚢

People say that the Titanic wasn't equipped with enough lifeboats to accommodate everyone in case it sank.

But the plan was, if we ran into trouble, other ships in the area would come to our aid, and we needed enough lifeboats to ferry people in shifts from our ship to the others.

This data not only reflects the social dynamics of the early 20th century but also serves as a reminder of the ongoing need to address inequalities in crisis situations. It's crucial for modern safety protocols to ensure fairness and prioritize human life regardless of socioeconomic status, gender or employment role.

What lessons do you think we can draw from the Titanic tragedy that are applicable to today's society? 👇

Original post: https://www.linkedin.com/posts/arshmankhalid_titanic-kaggle-survival-activity-7284676098452254721-GBWG?utm_source=share&utm_medium=member_desktop

regal lily Jan 16, 2025, 8:51 PM

#

I finished my homework! ✅
https://www.kaggle.com/code/alexandroskanakis/titanic-survived-classifier

Titanic Survived Classifier

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

fresh spear Jan 17, 2025, 10:58 PM

#

https://www.kaggle.com/code/amitbarkama/logistic-regression

Logistic Regression

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

#

https://www.kaggle.com/code/amitbarkama/knn-gnb-lda-pca

KNN , GNB, LDA, PCA

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

wicked abyss Jan 20, 2025, 1:22 AM

#

After about 25 submissions, I’ve finally landed on a notebook I’m happy with. It scored pretty well, and feels like a good stopping point as I move on to the next project. I’m still getting started with Kaggle competitions, so I’d love any feedback you might have—or if you find my approach useful or worthy of praise, an upvote would mean a lot!

Here’s the link to my notebook: https://www.kaggle.com/code/josephnehrenz/classification-titanic-random-forest-model-in-r

Thanks in advance, and good luck to everyone still working on the challenge!

Classification | Titanic Random Forest Model in R

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

pine sonnet Jan 20, 2025, 7:50 PM

#

wicked abyss After about 25 submissions, I’ve finally landed on a notebook I’m happy with. It...

As a complete n00b, I'm curious how your previous submissions scored, either in the competition or on your validation set. Most of the feature engineering things I've tried so far seemed to make things worse (using XGBoost as the model), so I'm curious how big of an impact your features made as you added them in later submissions

wicked abyss Jan 21, 2025, 6:05 AM

#

pine sonnet As a complete n00b, I'm curious how your previous submissions scored, either in ...

Hey @pine sonnet,

I totally understand your experience — I had something similar happen. I initially scored 79% with a fairly basic model, and when I started adding more features, my results actually got worse. At first, I thought I was on the right track, but it turned out I was running into issues with overfitting.

As I added more complex features and ramped up cross-validation and parameter tuning, my results really tanked. What I learned is that for this competition, the sweet spot seems to be finding a balance:

Adding some meaningful features to improve the model, but not going overboard.
Avoiding over-tuning to the training data so the model still generalizes well to unseen data.

It’s tempting to throw in every feature you can think of, but for this challenge, simplicity with a little refinement seems to work better than full complexity. An overtuned/overdone model can easily translate to 5%-10% prediction accuracy loss for this data.

distant linden Jan 23, 2025, 1:46 PM

#

wicked abyss After about 25 submissions, I’ve finally landed on a notebook I’m happy with. It...

Hello, I'm a noob in machine learning and I'm trying hard to understand all the mechanics. Your notebook is very helpful. There's just one thing I don't understand: you explain a lot of statistics to show the correlation between columns and to underline the link between survival rate and gender, title, etc. However, I don't understand where you use your graphs and calculations in your model to predict the test set.

simple egret Jan 23, 2025, 11:42 PM

#

Hello, I am a newbie to Kaggle. What are the next steps after the Titanic tutorial?

wicked abyss Jan 24, 2025, 3:48 AM

#

distant linden Hello, I'm noob in machine learning and I try hard to understand all the mecanic...

Hi @distant linden ,

Thank you so much for taking the time to check out my notebook! 😊 I’m really glad to hear that you found it helpful—your feedback means a lot. Regarding your question, you’re absolutely right that the graphs and calculations I included are primarily used to explore the relationships between features and the target variable, as well as to confirm the value of those features for prediction. For example, when we see a strong relationship between Sex and survival rate in the graphs, this insight suggests that Sex is an important feature. I take these insights and incorporate them by creating or refining features for the model. For instance:

Transforming features:
If we notice patterns in the Age distribution, we might group it into bins or take a log transformation.

Creating interaction terms:
If two variables (like Sex and Pclass) show combined effects, I might include an interaction term like Sex * Pclass.

In essence, the visualizations aren’t directly used in the model but guide which features I create or prioritize. They act as a bridge between data understanding and model performance. As you mentioned you're new to machine learning (welcome! 🎉), I hope this helps clarify things. If there’s a specific part you’d like me to expand on, let me know, and I’d be happy to help further.
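
The two ideas above (binning Age and combining Sex with Pclass) could look roughly like this in pandas. This is a Python sketch, not the R code from the notebook, and the bin edges, labels and column names are illustrative assumptions.

```python
import pandas as pd

# Tiny illustrative frame using the dataset's column names.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, 38.0, 4.0, 61.0],
})

# Transforming a feature: group Age into bins (edges are arbitrary here).
df["AgeBin"] = pd.cut(
    df["Age"],
    bins=[0, 12, 18, 60, 120],
    labels=["child", "teen", "adult", "senior"],
)

# Interaction feature: one category per Sex/Pclass combination.
df["Sex_Pclass"] = df["Sex"] + "_" + df["Pclass"].astype(str)

print(df)
```

Whether either feature helps is something to check on a validation set; the plots only suggest that it is worth trying.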

P.S. If you found the notebook helpful, I’d really appreciate an upvote—working toward my first bronze medal has been quite the journey! 😊

simple egret Jan 25, 2025, 12:33 AM

#

Asking for feedback on my kernel
Hello, I have recently started on titanic challenge and I finished tutorial. Then, I tried doing some preprocessing to clean up and improve the accuracy. However, accuracy decreased. What is the problem with my code? I attached the link to my notebook.
https://www.kaggle.com/code/eidenspark/notebook6b8d8cd056
Thanks!

notebook6b8d8cd056

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

wicked abyss Jan 25, 2025, 12:55 AM

#

Hi @simple egret, nice work and welcome to kaggle! I'm fairly new myself but think I can help you out quite a bit. There's a lot I can share, it may be better for you to check out my notebook linked about 5 comments up:

https://www.kaggle.com/code/josephnehrenz/classification-titanic-random-forest-model-in-r

It's written in R but super easy to follow, with plenty of documentation explaining the process, I think if you spend a few minutes reviewing it, you'll be able to make a number of connections and know what to do going forward.

High level, you can definitely do more than cut missing values and run the model! I'd suggest starting by imputing the missing values—there are plenty of methods for that, so it's worth exploring. After that, dive into the data and try creating some interesting features that might boost the model's predictive power.
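
As one simple starting point for imputation, here is a sketch that fills missing Age with the training median and missing Embarked with the most common training value. The toy frames are made up, and median/mode is only one of many possible strategies.

```python
import numpy as np
import pandas as pd

# Toy train/test frames with gaps (values are made up).
train = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S", "Q"],
})
test = pd.DataFrame({"Age": [np.nan, 47.0], "Embarked": ["Q", None]})

# Learn the fill values from the training data only.
age_median = train["Age"].median()
embarked_mode = train["Embarked"].mode()[0]

# Apply the same fill values to both frames.
for frame in (train, test):
    frame["Age"] = frame["Age"].fillna(age_median)
    frame["Embarked"] = frame["Embarked"].fillna(embarked_mode)

print(train)
print(test)
```

Using the training statistics for the test set as well keeps the two sets consistent, which matters once the features are fed to a model.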

Hope this helps get you started! Feel free to ask if you have any questions, and if you find it helpful, I'd really appreciate an upvote as I'm questing for my first bronze medal! Good luck! 🚀

pine sonnet Feb 3, 2025, 6:15 PM

#

wicked abyss Hey @pine sonnet, I totally understand your experience...

I think part of what I'm wondering is whether seeing no improvement on an individual feature is a sign that that feature shouldn't be included, or whether it's the kind of thing where you only see the benefit when you've added everything in together. Looking at what I was doing, it looks like I had added a feature equivalent to your FarePP, but I didn't see any benefit to adding it, I got exactly the same accuracy on a validation set with vs. without. I'm not sure whether that means I should leave it out, or whether it might matter in interaction with something else. If you drop FarePP from your model, does it get worse?

wicked abyss Feb 4, 2025, 4:54 AM

#

Hey @pine sonnet, you've definitely hit on the heart of the modeling process here! Deciding when to add, drop, or create interactions between features is often the trickiest part. There aren't clear-cut rules, it's more about experimenting, asking the right questions (which you are doing right now), and iterating based on what the model is telling you. You're right, sometimes a feature doesn't show an immediate benefit on its own, but it could have an impact in combination with other features.

As for FarePP or any feature, I wouldn't drop it just yet based on a single validation result. A lot of times, the true value of a feature emerges only after a few changes have been made elsewhere in the model. This is why you see such a variety of approaches across the 15k+ entrants. It truly is all about finding your personal model "touch." Sometimes it goes great and sometimes not so much.

Personally, I try to limit my changes to one or two things at a time, so I can trace back any performance dips and better understand what went wrong. I also like to document those updates in the comment section when submitting results, so if a change hurts the score, I can easily roll it back and try something else. It's normal to submit many, many files for any given competition. For example, I'm already into the teens or 20s for the new Playground competition for February that opened a few days ago.

So keep experimenting and trust the process! There’s no one-size-fits-all solution here, and it’s all about finding that balance.

potent smelt Feb 18, 2025, 9:57 AM

#

Hey all!
While going through the Titanic train data I noticed that there are a couple of people whose listed age is far too high to be true. Mr. Patrick Connors, for example, was supposedly 705 yrs old. How do I deal with such obviously false data? Do I exclude these extreme ages? In this case I can also change it by hand because you can find the true age online. But should I expect that the data is wrong in more cases that I haven't found yet? Finding wrong information in the more "normal" looking entries seems to be difficult.
thanks for any answer.
cheers Laurin

viscid thorn Feb 18, 2025, 2:51 PM

#

potent smelt Hey all! While going through the titanic train data i noticed, that there are a ...

Treat them like outliers.
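
A small sketch of one way to do that: flag ages above a plausibility threshold and mark them as missing so they can be imputed later. The names and the threshold of 100 are assumptions for illustration. If a value like 705 shows up, it may also be worth checking how the CSV was parsed, since a dropped decimal point would produce exactly this kind of value; that is a guess, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the Age column matters here.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [29.0, 705.0, 0.42, np.nan],
})

# Flag implausible ages (threshold is an assumption).
implausible = df["Age"] > 100
print(df[implausible])

# Treat flagged values as missing instead of dropping the whole row.
df.loc[implausible, "Age"] = np.nan
print(df)
```

Setting the value to missing keeps the passenger's other columns available for training.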

mortal abyss Feb 18, 2025, 9:21 PM

#

Hi all, I am starting my Kaggle journey with this Getting Started Competition.

Tonight will be all about setting up my dev environment and creating a first benchmark submission without any data preprocessing/cleaning to start from.

What are some of your methods of operation when starting a competition? Do you submit a first benchmark, or do most of you try to squeeze out as much accuracy as possible from the first try?

Have a good evening everyone!

dim ledge Feb 21, 2025, 1:09 PM

#

Because I am coming back to ML from a hiatus, I used my first attempt to try my best from memory alone as a recall exercise. Then I started to take ideas from forums like this and make improvements from there.

viral oyster Feb 21, 2025, 1:19 PM

#

Ready to open my first notebook

steady dawn Mar 2, 2025, 10:37 PM

#

Hi all, I was wondering if someone would be open to helping me with the titanic tutorial? I entered the two lines of code into the second code cell (copy+paste). The data is coming out but not in a table format, is that ok? I also don't see a third code cell.

hybrid eagle Mar 3, 2025, 12:27 AM

#

Hi everyone! I just completed my first Kaggle submission for the Titanic competition. I’d love your feedback on my notebook: https://www.kaggle.com/code/amrkabbary/titanic-survival-prediction-a-beginner-s-guide. Any tips or suggestions for improvement are welcome!

cold ravine Mar 4, 2025, 7:01 AM

#

i've submitted my first notebook as well

median venture Mar 9, 2025, 12:47 PM

#

On the tutorial, my random forest model code is continuously running

wary willow Mar 13, 2025, 8:48 PM

#

I am starting this project and almost through the tutorial, does anyone want to partner up with me on this? Looking to collaborate

visual plank Mar 15, 2025, 11:22 AM

#

wary willow I am starting this project and almost through the tutorial, does anyone want to ...

Hello, I would love to do that! Do you have contact details?

#

Hello everybody! I am a complete newbie to Kaggle and starting to get my hands on the Titanic competition. Before starting, I realized there are several modeling approaches and models to use before submitting. I would like to use the best models to get at least a 0.8 score. Are there any notebooks or links I can look at to learn?

wary willow Mar 15, 2025, 8:54 PM

#

visual plank Hello I would love to do that! You have contacts?

I just sent you a direct message

scenic kayak Mar 16, 2025, 4:42 PM

#

wary willow I just sent you a direct message

Please send me the resources too. I am new to Kaggle

uncut fox Mar 20, 2025, 1:51 AM

#

Hello, I'm a newbie. Anyone want to learn together?

eager cedar Mar 21, 2025, 10:44 AM

#

Yo what's the highest possible ceiling for titanic dataset without cheating?

digital nacelle Mar 21, 2025, 12:29 PM

#

Hello, I'm a beginner, and when I tried to solve this problem, I faced some issues with my evaluation process.

I used XGBoost Classifier to predict, and applied 5-fold cross-validation to evaluate my results. In my cross-validation, 8 out of 10 folds achieved an accuracy higher than 80%, while the remaining folds had around 75%. However, when I submitted my predictions, my score dropped significantly to around 70%, which was much worse than expected.

Could anyone give me some advice on how to improve my test set accuracy? Thank you in advance!

eager cedar Mar 21, 2025, 11:39 PM

#

send something like this

#

it's hard to give advice based on your words alone since we haven't seen any of the data

digital nacelle Mar 22, 2025, 10:47 AM

#

eager cedar

I've fixed it. I think my mistake was using cut and qcut separately for the train and test sets.
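
A minimal sketch of one way to avoid that mismatch, assuming pandas and a made-up `Fare` column: learn the bin edges from the training set with `pd.qcut(..., retbins=True)`, then reuse the same edges on the test set with `pd.cut`. The data values and the `FareBin` column name are hypothetical, and widening the outer edges to infinity is an illustrative choice, not something described in the thread.

```python
import pandas as pd

# Toy stand-ins for the Fare column; the values are made up for illustration.
train = pd.DataFrame({"Fare": [7.25, 8.05, 13.0, 26.0, 30.0, 71.28, 151.55, 512.33]})
test = pd.DataFrame({"Fare": [7.9, 14.5, 60.0, 600.0]})

# Learn quartile edges from the training set only.
train["FareBin"], edges = pd.qcut(train["Fare"], q=4, labels=False, retbins=True)

# Widen the outer edges so unseen test values still land in a bin.
edges[0], edges[-1] = float("-inf"), float("inf")

# Reuse the training edges on the test set instead of calling qcut again.
test["FareBin"] = pd.cut(test["Fare"], bins=edges, labels=False)

print(train["FareBin"].tolist())
print(test["FareBin"].tolist())
```

Calling `qcut` separately on each set would compute different edges for train and test, so the same bin number could mean different fare ranges in the two files.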

eager cedar Mar 22, 2025, 11:36 AM

#

great!

blazing yew Mar 22, 2025, 4:48 PM

#

Hi everyone! I have seen many people with a score of 1.00. Is it really possible to predict 100% of the passengers?

eager cedar Mar 25, 2025, 12:46 AM

#

Nope, they probably downloaded the ground truths, there are guides on how to do that in kaggle, either way Titanic is not about getting 1s or high score, it's just something you do to warm up

signal geyser Mar 25, 2025, 3:01 AM

#

Hello everyone,

I attempted the titanic survival challenge in kaggle. I was hoping to get some feedback regarding my approach. I'll summarize my workflow:

Performed exploratory data analysis, heatmaps, analyzed the distribution of numeric features (addressed skewed data using log transform and handled multimodal distributions using combined rbf_kernels)
Created pipelines for data preprocessing like imputing, scaling for both categorical and numerical features.
Created SVM classifier and random forest classifier pipelines
Metrics used were accuracy, precision, recall, and ROC AUC score
Performed random search hyperparameter tuning
Cross validation score of svm was slightly higher than random forest
Testing score of random forest was 0.78229
Testing score of svm was 0.53588

I think some flaws in my notebook are not performing feature extraction, feature selection and missing outlier analysis. I would appreciate any feedback provided. I really want to improve and perform better in the coming competitions.

link to my kaggle notebook:https://www.kaggle.com/code/jayasuryanmutyala/titanic-survival/notebook

Thanks in advance!

titanic_survival

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

strong hornet Mar 26, 2025, 8:44 AM

#

Hi there, is it possible to form a team in the Titanic competition?

wary heron Mar 26, 2025, 10:56 AM

#

Impossible

surreal gyro Mar 26, 2025, 12:25 PM

#

If I get good in this compatitoan wiill I get nobal prize?

wary heron Mar 26, 2025, 12:51 PM

#

surreal gyro If I get good in this compatitoan wiill I get nobal prize?

Nope though u will get typo master prize

surreal gyro Mar 26, 2025, 12:52 PM

#

wary heron Nope though u will get typo master prize

is that prastigous?

eager cedar Mar 29, 2025, 3:42 AM

#

signal geyser Hello everyone, I attempted the titanic survival challenge in kaggle. I was hop...

You imputed using median on numerical pipeline, this applies to age since it's the only feature that has missing values, have you checked the graph? see what happens if you dump all those 177 missing values on median.

You applied log transform, did you check whether it actually fixed skewness?

You applied Standard scaler which uses Z score and assumes normal distribution, did you check whether your distributions are normal or even close to normal?

Pipelines, from my understanding, are made for automating data preprocessing when you expect new data to arrive regularly with the same features, and the transformations are decided only after carefully analyzing how to handle each feature properly. What I'm trying to say is that pipelines aren't necessary for projects like this one; they are used in industry to automate the cleaning process so that people won't have to redo it every time a new dataset is produced. I often see people use them the wrong way, just plugging in a simple imputer and so on to make the cleaning process instant, but I think this is very wrong, as every feature should be handled differently based on the data analysis. I may be wrong in my understanding, though 🙂

My feedback is to use formulas and techniques with intent, spend more time analyzing your data, and know when and when not to use statistical treatments

#

I upvote tho for support 🙂

signal geyser Mar 29, 2025, 5:42 AM

#

eager cedar You imputed using median on numerical pipeline, this applies to age since it's t...

Yes, thank you. I understood your points. Also, when I applied the log transform it did not make much difference for some features. If I remember correctly, out of all the numerical features only one looked skewed, and it fixed that feature, but the others looked very much multimodal. I still don't really have a solid idea of how to properly address multimodal distributions, so I'll read some articles online and try it again. Also, I don't know how to upvote. Thanks again for the feedback.

eager cedar Mar 29, 2025, 5:52 AM

#

you are using tree based model and an SVM with RBF Kernel so multimodal wont matter, but skewness does a bit. it's not a problem for tree but when you have something extreme like the ones in Fare when most of your values are around less than 100 and then you got like 2 data that has 512 fares it MIGHT become a problem, so you might wanna check whether those extreme values affect the model you are using or not
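
One way to check whether a log transform actually reduced skewness is to measure it before and after. This is a small sketch using only the standard library; the fare values are made up to mimic a few extreme points near 512, and `skewness` is a hypothetical helper computing the population skewness (third standardized moment).

```python
import math

def skewness(values):
    """Population skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5

# Made-up fares: mostly small values plus two extreme ones.
fares = [7.25, 7.9, 8.05, 10.5, 13.0, 15.5, 26.0, 30.0, 52.0, 80.0, 512.33, 512.33]

# log1p(x) = log(1 + x), which also handles a fare of 0 safely.
logged = [math.log1p(f) for f in fares]

print("skew before:", round(skewness(fares), 2))
print("skew after: ", round(skewness(logged), 2))
```

If the number barely moves for a feature, the transform is probably not buying anything for that feature.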

#

so one solid piece of advice that I apply to myself too is to always ask myself 'why', 'why do I need this', 'why do I do that', 'why is this necessary', 'why do I choose this instead of that model' etc

signal geyser Mar 29, 2025, 5:59 AM

#

eager cedar you are using tree based model and an SVM with RBF Kernel so multimodal wont mat...

I will definitely do it the next time

eager cedar Mar 29, 2025, 5:59 AM

#

NO, not "next time" bro, always haha

signal geyser Mar 29, 2025, 6:00 AM

#

I build a very simple solution based on minimal knowledge about the models when I worked on this

signal geyser Mar 29, 2025, 6:00 AM

#

eager cedar NO, not "next time" bro, always haha

Yup definitely

eager cedar Mar 29, 2025, 6:00 AM

#

it's a nice notebook, definitely has stuff to improve upon but it has almost everything from start to finish

signal geyser Mar 29, 2025, 6:03 AM

#

eager cedar it's a nice notebook, definitely has stuff to improve upon but it has almost e...

I try implementing everything that I learn from the Hands-On Machine Learning book by Aurélien Géron in Kaggle challenges. It's still a work in progress haha. I recently finished learning SVM so I will definitely do better.

#

Also do you have any advice for choosing a deep learning framework ? @eager cedar

#

I have some experience with pytorch before working on some simple projects

eager cedar Mar 29, 2025, 6:07 AM

#

Nah, treat all of these framework as a tool, everything is just applied mathematics and some frameworks might have stuffs you need that others dont have, you can switch back and forth between these frameworks

signal geyser Mar 29, 2025, 6:07 AM

#

eager cedar Nah, treat all of these framework as a tool, everything is just applied mathemat...

I agree 100 percent

eager cedar Mar 29, 2025, 6:09 AM

#

that is one important thing too, basically AI is just applied math since most of the coding is handled by dedicated libraries and frameworks, but you still need to fully grasp the math behind them

signal geyser Mar 29, 2025, 6:09 AM

#

I always had it confusing which one to just follow since I see pytorch has been gaining a lot of popularity over the last few years but the book implements everything in tensorflow. I'm thinking of just following the book and picking up pytorch again

wary heron Mar 29, 2025, 6:11 AM

#

signal geyser I always had it confusing which one to just follow since I see pytorch has been ...

Pick pytorch asap

signal geyser Mar 29, 2025, 6:11 AM

#

eager cedar that is one important thing too, basically AI is just applied math since most of...

Yeah very true I feel the mathematical intuition behind the models is really important and gives a lot of knowledge behind the scenes for the model

eager cedar Mar 29, 2025, 6:11 AM

#

well tensorflow might be a bit complex to look at while pytorch is easier on the eyes, either way both have uses

signal geyser Mar 29, 2025, 6:12 AM

#

Do you recommend any books or courses I can follow for pytorch ?

eager cedar Mar 29, 2025, 6:12 AM

#

what can I say, choose what works best for you

#

hmm I dont know about books but I know some videos in youtube

signal geyser Mar 29, 2025, 6:14 AM

#

eager cedar what can I say, choose what works best for you

I have some experience prior with working on pytorch but a very foundational level

signal geyser Mar 29, 2025, 6:14 AM

#

wary heron Pick pytorch asap

Will do that.

eager cedar Mar 29, 2025, 6:15 AM

#

I grinded this 25 hrs video like a year ago
https://www.youtube.com/watch?v=V_xro1bcAuA

but most of my skills came from actually trying to use it

YouTube

freeCodeCamp.org

PyTorch for Deep Learning & Machine Learning – Full Course

Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.

✏️ Daniel Bourke developed this course. Check out his channel: https://www.youtube.com/channel/UCr8O8l5cCX85Oem1d18EezQ

🔗 Code: https://github.com/mrdbourke/pytorch-deep-learning
🔗 Ask a question: htt...

signal geyser Mar 29, 2025, 6:15 AM

#

eager cedar I grinded this 25 hrs video like a year ago https://www.youtube.com/watch?v=V_xr...

I followed the same video haha when I learned it too

#

I don't think he covered nlp though

eager cedar Mar 29, 2025, 6:16 AM

#

I have 0 patience so I just take up what I know and build something and just come back to some videos when I'm totally lost

#

one of the things I wish I had done back when I was still learning is specialize in 1 thing and focus on it instead of trying everything haha

I did Computer Vision and NLP, among a bunch of other stuff

signal geyser Mar 29, 2025, 6:18 AM

#

eager cedar I have 0 patience so I just take up what I know and build something and just com...

That's a good approach tbh

eager cedar Mar 29, 2025, 6:18 AM

#

but everything worked out in the end, I just had to deepen each of these specific areas

signal geyser Mar 29, 2025, 6:20 AM

#

I'll try revisiting my basics in pytorch and work on some simple projects to deepen my understanding as well

tiny jungleBOT Mar 29, 2025, 6:20 AM

#

rapid.roll.off has been warned

Reason: Bad word usage

eager cedar Mar 29, 2025, 6:21 AM

#

whaaat

signal geyser Mar 29, 2025, 6:21 AM

#

eager cedar but everything worked out in the end, I just had to deepen each of these specifi...

Thanks again bro I really appreciate it for the advice and guidance. I'll definitely improve in next attempts in kaggle

eager cedar Mar 29, 2025, 6:21 AM

#

signal geyser Thanks again bro I really appreciate it for the advice and guidance. I'll defini...

No problem

spice ledge Apr 2, 2025, 12:52 PM

#

https://www.kaggle.com/competitions/titanic/discussion/571172
can anyone else check my question?

Titanic - Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

limpid rune Apr 3, 2025, 8:00 AM

#

👍

sage bridge Apr 14, 2025, 10:44 PM

#

wassup everyone

hollow widget Apr 19, 2025, 4:47 PM

#

I don’t know why, but I can’t seem to boost my score. Any tips from you all?

empty patio Apr 20, 2025, 12:49 AM

#

Hello everybody, this is my first time being on Kaggle, what skills do I need know before tackling Titanic?

eager cedar Apr 20, 2025, 3:03 AM

#

hollow widget I don’t know why, but I can’t seem to boost my score. Any tips from you all?

can you give more details

eager cedar Apr 20, 2025, 3:04 AM

#

empty patio Hello everybody, this is my first time being on Kaggle, what skills do I need kn...

Data Analysis, Statistics, Machine Learning

hollow widget Apr 20, 2025, 4:24 AM

#

eager cedar can you give more details

I tried to tweak my model for the Titanic competition a bunch of times, but I kept hitting the same score.

eager cedar Apr 20, 2025, 5:16 AM

#

what is your common score? can you tell me the steps you did before 'tweaking' your model?

hollow widget Apr 20, 2025, 6:30 AM

#

My common score is 0.69, and I followed the common steps before modifying my model, such as splitting the data, handling NaNs, and some other tasks. I obtained the same score after using XGBClassifier instead of RandomForestClassifier, although I didn't use early_stopping_rounds.

eager cedar Apr 20, 2025, 7:28 AM

#

You see my friend 90% of your score will come from Data Analysis and the remaining 10% is handled by hyperparameter tuning. While I can't say for sure how effectively you handled the preliminary steps, here is a checklist you can use for a self-check:

Data Cleaning ->
Did you properly handle missing data and/or duplicates? What I mean by "properly handle" is that you didn't just fill in the missing values blindly; the strategy used is guided by analysis and statistics. For example:

Filling missing values in Age ->
Did you just fill it with the median/mode? Did you check its distribution before and after filling?

Did you make a smart imputation by analyzing the data, e.g. extracting titles from names and seeing how these titles correspond to specific age brackets, or did you try to impute the missing values with machine learning instead? Did its distribution make sense after imputation?

#

EDA ->
Did you analyze what factors contribute to your target? Or did you just fill the missing blanks and then feed it to the model, hoping it magically outputs a high score?

What are the major driving features for survival? What contributes to it? What features do not contribute to it? Did you test the hypothesis using statistics? What are the results?

Feature Engineering and Selection ->
How do the features correlate to the target and to each other? Can we make a new feature to better capture a pattern? Did we select the best set of features according to our goal? Are the features in a format, and processed in a way, that satisfies the assumptions of the models we are going to use?

Model Training ->
Let's run a baseline, check feature importance and see which features contribute less, more, or not at all. Do our metric results reflect our expected outcome? What can we improve? Does our model overfit or underfit, etc.

Hyperparameter tuning ->
Based on the results of the baseline model analysis, do we need regularization? Maybe we need more trees or a limit on max depth... Let's use Bayesian optimization and see what ranges of parameters are the best, etc.

There's so much going on other than filling the blanks and running the data through a machine learning model. This is just a summary of all the things you can do; believe me, there is tons of stuff that can guide you in building the proper model and the data needed for it.
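
The "extract titles from names" idea in the checklist can be sketched roughly as follows, using only the standard library. The passenger rows are hypothetical, `extract_title` is an illustrative helper, and filling with the per-title median is just one possible strategy; the medians should be learned from training rows only.

```python
import re
from statistics import median

# Hypothetical rows in the shape of the Name and Age columns.
passengers = [
    {"Name": "Doe, Mr. John", "Age": 22.0},
    {"Name": "Roe, Mr. Richard", "Age": 35.0},
    {"Name": "Poe, Mr. Peter", "Age": None},
    {"Name": "Doe, Miss. Jane", "Age": 26.0},
    {"Name": "Roe, Master. Tim", "Age": 2.0},
    {"Name": "Poe, Master. Sam", "Age": None},
]

def extract_title(name):
    """Return the text between the comma and the first period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"

# Collect the known ages per title.
ages_by_title = {}
for p in passengers:
    if p["Age"] is not None:
        ages_by_title.setdefault(extract_title(p["Name"]), []).append(p["Age"])

title_median = {title: median(ages) for title, ages in ages_by_title.items()}
overall_median = median(a for ages in ages_by_title.values() for a in ages)

# Fill each missing age with the median of its title group,
# falling back to the overall median for titles never seen with an age.
for p in passengers:
    if p["Age"] is None:
        p["Age"] = title_median.get(extract_title(p["Name"]), overall_median)

print([(extract_title(p["Name"]), p["Age"]) for p in passengers])
```

Whether this beats a plain median is something to verify by comparing the Age distribution before and after filling.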

hollow widget Apr 20, 2025, 8:48 AM

#

I sincerely appreciate your help. I attempted some modifications and obtained an accuracy of 79% in the spaceship-titanic competition.

green shale Apr 22, 2025, 9:50 AM

#

heyyy https://www.kaggle.com/code/lakshay5312/titanic-eda/notebook can anybody check out my notebook and tell me what parts I have missed? this is my first ml project and i tried to learn all the steps of preprocessing with it.

Titanic EDA

#

my accuracy resulted in 72% approx but why

vestal island Apr 26, 2025, 8:41 PM

#

All I did was copy/paste what was instructed but got this. Any ideas on why?

deft mural Apr 30, 2025, 10:49 PM

#

vestal island All I did was copy/paste what was instructed but got this. Any ideas on why?

The test_data variable needs to be defined first, it seems that train_data was defined so see where it is and do the same for test_data

ocean pivot May 2, 2025, 10:10 AM

#

does anyone know what is gender submission dataset ???

charred tinsel May 2, 2025, 2:49 PM

#

ocean pivot does anyone know what is gender submission dataset ???

That is a baseline model prediction based only on the 'Sex' attribute of the dataset: it predicts that all males die and all females survive. This achieves ~0.786 CV and 0.76555 LB... Gender is the most important attribute along with a few others.
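
As a rough illustration of that baseline, here is a sketch in plain Python. The five rows are made up, and `gender_baseline` is a hypothetical helper name; on the real train.csv the same rule would be applied to the `Sex` column.

```python
# Made-up rows; real data would come from train.csv.
rows = [
    {"Sex": "male", "Survived": 0},
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "male", "Survived": 1},
    {"Sex": "male", "Survived": 0},
]

def gender_baseline(row):
    """Predict survival (1) for females and death (0) for males."""
    return 1 if row["Sex"] == "female" else 0

correct = sum(gender_baseline(r) == r["Survived"] for r in rows)
accuracy = correct / len(rows)
print(f"baseline accuracy on the toy rows: {accuracy:.2f}")
```

Any model that cannot beat this rule on the leaderboard is not yet using the other features effectively.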

waxen sage May 3, 2025, 2:28 AM

#

Anybody else tuning a RandomForestClassifier atm? Just looking for someone to DM / bounce ideas/questions off of as I try to increase my score. Currently at 83 using StratifiedKFold / cross_val_score but I've been observing my submission test scores being consistently 3% less -- Is this normal or a sign I'm overfitting?

sonic nacelle May 3, 2025, 3:20 AM

#

@everyone Question for the Group - I have been working through a Data Science bootcamp and wanted to keep sharpening my skills. I found this contest and figured I would give it a try...however, I am a little intimidated by the fact I am still fairly new to this Data Science arena. Does anyone have any thoughts on this?

waxen sage May 3, 2025, 3:25 AM

#

sonic nacelle @everyone Question for the Group - I have been working through a Data Science ...

I'm new as well George -- Dive in!

charred tinsel May 3, 2025, 5:27 AM

#

waxen sage Anybody else tuning a RandomForestClassifier atm? Just looking for someone to D...

Yeah. I think that is normal to have that difference in scores when you get around 80% accuracy. Its pretty difficult to get beyond that score with a plain model.
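
To get a feel for how noisy a cross-validation score is, it can help to look at the spread across folds and not only the mean. A sketch assuming scikit-learn is available; the data is synthetic from `make_classification` (a stand-in, not the Titanic features), and the model settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the prepared features (an assumption, not real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# The spread across folds hints at how much a single score can move by chance.
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

If the gap between CV and leaderboard score is within a couple of fold standard deviations, it may simply be noise on a dataset this small.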

ocean pivot May 3, 2025, 5:27 AM

#

do you know which data is compared with our predicted data?

charred tinsel May 3, 2025, 5:29 AM

#

ocean pivot do you know which data is compared with our predicted data?

Test labels... which should not be revealed

ocean pivot May 3, 2025, 5:30 AM

#

the same test labels which are given in the test.csv right?

charred tinsel May 3, 2025, 5:31 AM

#

ocean pivot the same test labels which are given in the test.csv right?

No, test.csv has all the features used for prediction but it doesn't have the column 'Survived' which are the labels to compute your score

ocean pivot May 3, 2025, 5:32 AM

#

I mean that when we submit our predictions on kaggle how do they measure accuracy there has to be some reference

charred tinsel May 3, 2025, 5:36 AM

#

ocean pivot I mean that when we submit our predictions on kaggle how do they measure accurac...

Yes. The test.csv, if you notice, doesn't have the column 'Survived'... You have to predict the value of this column for the test set. Kaggle has the actual values but they won't be revealed
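
For reference, a submission is just a two-column file pairing each test `PassengerId` with a predicted `Survived` value, the same layout as gender_submission.csv. A minimal sketch with the standard library; the three IDs and predictions are illustrative, and an in-memory buffer is used here in place of a real file.

```python
import csv
import io

# Illustrative values: a few test PassengerIds and made-up predictions.
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

# Swap the buffer for open("submission.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["PassengerId", "Survived"])
writer.writerows(zip(passenger_ids, predictions))

submission_text = buffer.getvalue()
print(submission_text)
```

Kaggle compares the `Survived` column of this file against the hidden labels to compute the score.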

ocean pivot May 3, 2025, 5:40 AM

#

Thanks actually I got confused with that gender submission dataset

#

I got accuracy of 76 any suggestion how can I improve?

charred tinsel May 3, 2025, 5:51 AM

#

ocean pivot I got accuracy of 76 any suggestion how can I improve?

Start with basic baselines. Try to find which are the important features, how some features interact and try to improve over your baseline

jolly matrix May 5, 2025, 8:08 AM

#

Hello everyone. In short, the system scored 0.78468 but the cross_val_score(cv=342) function says 0.82115. Is this even legal? How is it counted? My work:
https://www.kaggle.com/code/leleleonid/titanic-data-type-optimization-randomforest

Titanic: Data Type Optimization & RandomForest

charred tinsel May 5, 2025, 4:59 PM

#

jolly matrix Hello everyone. In short, the system scored 0.78468 but the cross_val_score(cv=342) f...

For a small dataset like this, this difference might be normal. Although possibly you can reduce that difference, but to increase the score you need to engineer better features.

waxen sage May 7, 2025, 10:33 PM

#

jolly matrix Hello everyone. In short, the system scored 0.78468 but the cross_val_score(cv=342) f...

@jolly matrix: To add to @charred tinsel 's advice, one thing that might also help is accounting for the distribution shift between training and test sets. Have you looked into this? FYI I'm working through this now so DM-me if interested in talking out ideas and sharing observations. ✊

sturdy sinew May 13, 2025, 1:42 PM

#

sonic nacelle @everyone Question for the Group - I have been working through a Data Science ...

I am new as well. Let's dive in and have some fun!

frosty hamlet May 14, 2025, 9:57 AM

#

Hello guys I'm new

Notice me 😭

#

By the way guys I have a question

I just got started with the titanic prediction.

Wanted to find out how y'all dealt with missing data especially in the AGE column.

Should I use the age mean to fill the missing part?

The missing values is quite a lot tho

frosty hamlet May 14, 2025, 10:32 AM

#

eager cedar You see my friend 90% of your score will come from Data Analysis and the remaini...

Very helpful thanks a lot

timid magnet May 14, 2025, 11:53 AM

#

frosty hamlet By the way guys I have a question I just got started with the titanic predict...

Just visualize the data, it will become clear

frosty hamlet May 14, 2025, 3:58 PM

#

timid magnet Just visualize the data, it will become clear

Okay I should visualise before filling missing spaces?

I'll do that thanks

frosty hamlet May 16, 2025, 1:35 PM

#

eager cedar You imputed using median on numerical pipeline, this applies to age since it's t...

After filling the 177 missing values in the age column with median the skewness actually increased " (0.51)" compared to the way it was "(0.36)" when there were still missing values

I had hoped that filling the missing values with median would have given a distribution closer to normal based on logic/common sense

Log transformation and square root transformation didn't help matters.

Planning to try other methods I find on the net but what would you suggest?
I hope I'm not the only one getting such issues tho

charred tinsel May 16, 2025, 4:11 PM

#

frosty hamlet After filling the 177 missing values in the age column with median the skewness ...

Imputing with a constant value will generally disturb the distribution of that feature if there are many missing values, as frequency will peak at that imputed value.. this may or may not affect the model you use (depends). To preserve closely the original distribution, you may want to use a better strategy, like you can try to use other features to predict Age.
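
The spike described above is easy to reproduce on synthetic data. In this sketch the ages are randomly generated (not the real column); the counts 714 observed and 177 missing are assumptions chosen to mirror the proportions discussed in the thread.

```python
import random
from statistics import median, pstdev

random.seed(0)

# Synthetic "observed" ages; the real Age column would be used in practice.
observed = [max(0.5, random.gauss(30, 14)) for _ in range(714)]
n_missing = 177

# Fill every missing value with the single median of the observed ages.
med = median(observed)
filled = observed + [med] * n_missing

# Share of rows that now sit exactly on the median value.
share_at_median = filled.count(med) / len(filled)

print(f"std before: {pstdev(observed):.2f}, after: {pstdev(filled):.2f}")
print(f"share of rows at the median after filling: {share_at_median:.1%}")
```

The standard deviation shrinks and roughly a fifth of the rows pile up on one value, which is the distortion a histogram would show as a tall bar at the median.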
But if the feature is not so important, you may be just wasting some of your time..

frosty hamlet May 16, 2025, 6:56 PM

#

charred tinsel Imputing with a constant value will generally disturb the distribution of that f...

I see , thanks for the quick response 🥹

Just for clarity,

you're suggesting I focus rather on selecting an effective predicting model to use that won't have issues with skewed distribution rather than focusing on normalising the age distribution right?

charred tinsel May 16, 2025, 8:49 PM

#

frosty hamlet I see , thanks for the quick response 🥹 Just for clarity, you're suggesting...

Maybe you can do that.. but on this particular dataset, i would say exploring and understanding the data in general is much more important than anything.

frosty hamlet May 17, 2025, 12:23 PM

#

charred tinsel Maybe you can do that.. but on this particular dataset, i would say exploring an...

I see

Thanks for the insights 🫶

waxen sage May 17, 2025, 11:08 PM

#

waxen sage @jolly matrix: To add to @charred tinsel 's advice, one thing th...

I may need to cancel my advice but want to hear others' opinions 🙏

charred tinsel May 18, 2025, 3:52 AM

#

waxen sage I may need to cancel my advice but want to hear others' opinions 🙏

Hey btw, what's the progress? You were working on that.. I was assuming both have very similar distributions.

waxen sage May 18, 2025, 7:42 AM

#

charred tinsel Hey btw, what's the progress? You were working on that.. I was assuming both hav...

I had discovered back on the 7th that distribution for some variables was different between training and test sets and had been accounting for it since then, until I realized today that accounting for distribution shift between training and test sets is a type of data leakage (IMO) because per competition rules the test set is supposed to be "unseen" data. I've begun adjusting my analysis to assess distribution shift between training/test folds of training set data only.
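
One possible way to assess shift using training data only is to compare each fold against the rest of the training set with a two-sample Kolmogorov-Smirnov statistic. This is a sketch with the standard library; `ks_statistic` is a hypothetical helper, the feature values are randomly generated, and the fold count of 5 is arbitrary.

```python
import random

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        # Advance each pointer past all values <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

random.seed(0)

# Made-up training-set values for one numeric feature.
train_values = [random.gauss(30, 14) for _ in range(500)]
random.shuffle(train_values)

n_folds = 5
folds = [train_values[k::n_folds] for k in range(n_folds)]

gaps = []
for k, held_out in enumerate(folds):
    rest = [v for j, fold in enumerate(folds) if j != k for v in fold]
    gaps.append(ks_statistic(held_out, rest))
    print(f"fold {k}: KS statistic vs. rest = {gaps[-1]:.3f}")
```

A value near 0 means the fold looks like the rest of the training data; values near 1 mean the two samples barely overlap.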

charred tinsel May 18, 2025, 8:52 AM

#

waxen sage I had discovered back on the 7th that distribution for some variables was differ...

Ohk

#

But as long as you are not really peeking too much into the test data, and are using standard techniques for correcting distribution shift with information from the training set only, I don't consider it data leakage.

thick pagoda May 18, 2025, 6:17 PM

#

eager cedar

what did this end up scoring after submission - curious seeing these nice numbers I'm at 0.79665 on my 3rd submission (my F1 is always close to my score so far going from 0.77 on my first try).

thick pagoda May 19, 2025, 11:18 AM

#

waxen sage

If you actually peek at the submission distribution (i.e. use the unseen test‐set statistics) and bake that back into your training logic or final thresholds, you’ve leaked information from the test set.

waxen sage May 19, 2025, 2:17 PM

#

thick pagoda If you actually peek at the submission distribution (i.e. use the unseen test‐se...

Thanks for confirming @thick pagoda - I came to the same conclusion! Better late than never 🤦‍♂️

thick pagoda May 19, 2025, 11:03 PM

#

Ofc np I'm new to this stuff but I instantly presumed most of the 1.0 (100%) models exploit this and its not the spirit of the exercise!
How did your models do? @waxen sage

waxen sage May 19, 2025, 11:20 PM

#

thick pagoda Ofc np I'm new to this stuff but I instantly presumed most of the 1.0 (100%) mod...

I'm new as well! I love this comp - perfect way to dev skills. Revisions so far have clocked in between 75-78%; I got lucky with one 79% submission early on but haven't been able to reproduce and was really just throwing darts blind at the time. I'm currently working through a revision and hoping for some gains this week! 🤞

thick pagoda May 20, 2025, 12:04 AM

#

That's really good I think. I'm at 79.6%, and now the problem is I saw that 'the answer' to get "more" isn't anything code or model specific, so I'll stop there for now. If the model is generalizing at that range, it's a good model imo 🤷‍♂️

#

it's a good lesson in feature engineering above everything. I went in with lots of fancy code, and although I did a pretty decent EDA phase and played with UMAP, I missed some key connections in the data needed to get those 80+ scores... good luck!

stiff marsh May 21, 2025, 6:08 PM

#

frosty hamlet After filling the 177 missing values in the age column with median the skewness ...

"I had hoped that filling the missing values with the median would result in a distribution closer to normal, based on logic and common sense."

This is generally true for datasets or features with less than 5% missing values.

However, in the case of the age column, missing values account for about 20% (train dataset only).

Anyway, they say a picture is worth a thousand words, so I did some visualizations to help you understand better:

What happens when you use the median to impute the 177 missing values is that all those values are dumped at a single point, which greatly distorts the distribution. The main goal of imputation is to fill missing values in a way that resembles the original distribution.

#

this is my second discord btw, my first account got hacked, anyway the score with using only gender as a feature in this problem is 76 percent, you can use that as a comparison with your current model, and the highest legit score I've seen so far without cheating is probably from cdeotte - 84%

frosty hamlet May 21, 2025, 6:20 PM

#

stiff marsh "I had hoped that filling the missing values with the median would result in a d...

Thanks a lot for the response Nixon

Do you feel using the median to fill 177 missing values is the wrong approach in this case, since the result is far from resembling the original distribution?

So in cases like this I should probably use smart imputation like you showed in the images you sent

Where I fill the missing values with values around the median instead of filling solely with the median

frosty hamlet May 21, 2025, 6:20 PM

#

stiff marsh this is my second discord btw, my first account got hacked, anyway the score wit...

So sorry about that

frosty hamlet May 21, 2025, 6:22 PM

#

stiff marsh this is my second discord btw, my first account got hacked, anyway the score wit...

Hmmm okay I see

Still working on mine, will compare when done

Thanks

stiff marsh May 21, 2025, 6:30 PM

#

frosty hamlet Thanks a lot for the response Nixon Do you feel using the median to fill 177 mi...

Yes you will always decide how you impute your missing values based on the insight you gain after analyzing the data and not resort automatically to simple imputations unless the missing data is small enough that it wont distort the distribution

frosty hamlet May 21, 2025, 6:43 PM

#

stiff marsh Yes you will always decide how you impute your missing values based on the insig...

I see

Now I know my mistake. Thanks for the tip Nixon I'm super grateful 🫶😊

stiff marsh May 21, 2025, 6:49 PM

#

waxen sage

No. By "accounting", I'm assuming you are referring to covariate shift. Checking whether there is a difference in feature distributions is okay.

stiff marsh May 21, 2025, 6:49 PM

#

frosty hamlet I see Now I know my mistake. Thanks for the tip Nixon I'm super grateful 🫶😊

You're welcome 🫡

charred tinsel May 21, 2025, 8:38 PM

#

stiff marsh this is my second discord btw, my first account got hacked, anyway the score wit...

After the leaderboard changed to scoring on 100% of the test set some years ago, the scores dropped.. now the highest achievable score would be around 83% I believe.

#

I am currently at 81.8%

timid magnet May 22, 2025, 1:39 PM

#

stiff marsh "I had hoped that filling the missing values with the median would result in a d...

Okay so you mean rather than blindly imputing median to missing values, we should analyze and visualize the data and then impute missing values

#

So if I have outliers in my data should I blindly go for median or like what?

charred tinsel May 22, 2025, 2:53 PM

#

timid magnet Okay so you mean rather than blindly imputing median to missing values, we shoul...

Try to understand the data in general... how that feature with missing values correlates with other features, or how the missingness of the feature may be correlated to another feature (this would help you to streamline your strategy).
Sometimes imputing with the median may be perfectly sufficient, sometimes it may be the only good option available, and other times it may even make things worse. That is only possible to find out once you understand the data.

frosty hamlet May 23, 2025, 10:38 AM

#

charred tinsel I am currently at 81.8%

Impressiveee 🎉

waxen sage May 31, 2025, 11:07 PM

#

waxen sage

Poll: For this competition, is accounting for distribution shift between training and test (submission) data sets an example of data leakage?

Votes for the leading answer: 2 of 4 total

sullen torrent Jun 1, 2025, 4:30 PM

#

Hi chat, Im a new learner on kaggle and im trying to make a notebook submission for the titanic survivor prediction competition. But even though my output file is created and visible, the competition won't accept the notebook when I click "Create Submission"
any idea why? I can send screenshots if necessary

charred tinsel Jun 3, 2025, 3:59 PM

#

Hey everyone
Here is how I achieved 82.5% accuracy on test set. I tried to apply the most of what I've learned through my own experimentations, along with insights I gained from public notebooks/discussions.

https://www.kaggle.com/code/a00000100/titanic-machine-learning-from-disaster

Do check it out.

waxen sage Jun 5, 2025, 2:32 PM

#

sullen torrent Hi chat, Im a new learner on kaggle and im trying to make a notebook submission ...

I encountered this issue when my notebook had an error and wasn't able to execute completely. Have you confirmed no errors are visible via the Kaggle UI when scrolling through your notebook?

wanton quest Jun 6, 2025, 2:32 PM

#

Anyone get score 0.8+?

raw sierra Jun 6, 2025, 3:15 PM

#

wanton quest Anyone get score 0.8+?

i got .796 :(

#

using rf

radiant isle Jun 7, 2025, 4:37 AM

#

charred tinsel Hey everyone Here is how I achieved **82.5%** accuracy on test set. I tried to a...

what is the submission accuracy u got?

charred tinsel Jun 7, 2025, 5:25 AM

#

Submission accuracy is 0.8253.
CV was ~0.85
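
For anyone wondering how a CV number like that is usually produced, here is a rough sketch with scikit-learn. It runs on synthetic data from `make_classification`, and the model choice, fold count, and seeds are all assumptions for illustration; it is not the code from the linked notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (891 rows, the same size as the Titanic training set).
X, y = make_classification(n_samples=891, n_features=8, n_informative=4, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A CV score somewhat above the submission score, as reported here, is common: any tuning done against the CV folds tends to make the CV estimate a little optimistic.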

pure lark Jun 12, 2025, 12:38 PM

#

wanton quest Anyone get score 0.8+?

With logistic regression, the best I could get was 0.77272.

abstract anchor Jun 13, 2025, 2:58 AM

#

What did y'all do with the Cabin column? Did you guys just drop it?

#

I don't even see any relevance of the Cabin feature to our target. Should I drop it, or what should I do?

pure lark Jun 13, 2025, 5:33 AM

#

abstract anchor what did yall do with cabin column? did you guys just drop it ?

No, the cabin is actually useful. Most of the cabin data is missing, but if you look through the data, the cabins that do have values carry prefixes, like C85, E46, etc. The C and E represent the decks. The lifeboats were kept above the higher decks, i.e. closest to A, B, and C, so people on decks A, B, and C had a better chance of survival than those on E, F, and G, and of course a better chance than people who didn't have cabins. Those passengers (who lived in dormitories rather than cabins) probably account for the missing values, which are so numerous because the rich people who had cabins were fewer.

#

When I added this deck feature along with some more features my score went from 0.7751 to 0.78708
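
A small sketch of how a deck feature like this could be pulled out of `Cabin` with pandas. The sample rows are toy values in the dataset's format, and the `"U"` label for unknown decks is an arbitrary placeholder I picked, not something from the data.

```python
import numpy as np
import pandas as pd

# Toy Cabin values in the same format as the dataset (NaN = missing).
df = pd.DataFrame({"Cabin": ["C85", np.nan, "E46", "B57 B59 B63", np.nan, "G6"]})

# The first character of the cabin string is the deck letter.
# Missing cabins get the placeholder "U" (unknown), a hypothetical label.
df["Deck"] = df["Cabin"].str[0].fillna("U")

# One-hot encode the deck so most sklearn models can use it.
deck_dummies = pd.get_dummies(df["Deck"], prefix="Deck")

print(df)
print(deck_dummies.columns.tolist())
```

Keeping "unknown" as its own category lets the model learn from the missingness instead of throwing those rows or the column away.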

molten grove Jun 20, 2025, 4:14 PM

#

I got 82% accuracy using RandomForest.

molten grove Jun 22, 2025, 4:26 PM

#

pure lark When I added this deck feature along with some more features my score went from ...

That sounds cool. I'm already getting 82 using RF without cabin. But this insight sounds useful. I'll try using it in the next iteration. Thanks for it bro.

frosty hamlet Jun 24, 2025, 11:04 AM

#

charred tinsel Hey everyone Here is how I achieved **82.5%** accuracy on test set. I tried to a...

Impressive 🎉

still vector Jun 30, 2025, 12:29 AM

#

0.787 with a binary classification NN. I might try some more feature engineering later. For now I did HasCabin?, one-hot encoded titles (Mr, Mrs, Miss, Other) and Embarked, and combined Parch + SibSp = FamilySize.
I want to try ranking the decks, and changing the 'Other' title to their respective genders.
Fun challenge, would recommend.
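
Roughly what those features could look like in pandas. This is a sketch on a few illustrative rows, not the actual code behind the 0.787 score; the regex for pulling the title out of `Name` is one common approach and an assumption here, and `FamilySize` follows the message above (Parch + SibSp, without counting the passenger).

```python
import numpy as np
import pandas as pd

# A few illustrative rows in the dataset's column format.
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
        "Uruchurtu, Don. Manuel E",
    ],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", "C"],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
})

# HasCabin: 1 if a cabin value is present, else 0.
df["HasCabin"] = df["Cabin"].notna().astype(int)

# Title: the text between the comma and the first period in Name.
title = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
# Keep the three common titles, lump everything else into "Other".
df["Title"] = title.where(title.isin(["Mr", "Mrs", "Miss"]), "Other")

# FamilySize as described in the message: Parch + SibSp.
df["FamilySize"] = df["SibSp"] + df["Parch"]

# One-hot encode the categorical columns.
features = pd.get_dummies(
    df[["HasCabin", "FamilySize", "Title", "Embarked"]],
    columns=["Title", "Embarked"],
)
print(features)
```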

lime fable Jun 30, 2025, 4:41 PM

#

i got 70% Accuracy using Gradient Boosting Classifier

dusk terrace Jul 4, 2025, 7:48 AM

#

lime fable i got 70% Accuracy using Gradient Boosting Classifier

i got 78% using logistic

blissful aurora Jul 4, 2025, 5:29 PM

#

Hi,
I've got 75% accuracy. My code is on GitHub; do you have any advice to improve it?
https://github.com/Jeremy-Duval-PhD/Kaggle_Titanic

blissful aurora Jul 4, 2025, 5:30 PM

#

molten grove I got 82% accuracy using RandomForest.

Is your code public ?

molten grove Jul 4, 2025, 5:30 PM

#

blissful aurora Is your code public ?

Not yet. But I'm planning to make it soon.

#

I'll let you know once it's out.

blissful aurora Jul 4, 2025, 5:33 PM

#

Thank you !

cedar gust Jul 4, 2025, 11:13 PM

#

Hey guys, quick question: I haven't previously worked on anything too substantive in AI/ML, so I was wondering whether this Titanic project is doable for me? Any tips/suggestions on where to start are also appreciated!

lime fable Jul 5, 2025, 12:31 PM

#

dusk terrace i got 78% using logistic

i used algorithm evaluation

fallen carbon Jul 6, 2025, 8:35 PM

#

cedar gust Hey guys, quick question, I was wondering that if I haven't previously worked on...

Everyone needs to start somewhere, and this is a pretty good problem to get started on. Try getting familiar with general visualisation tools and Pandas, figure out how to manipulate the data & select some features that seem useful, then try running some Sklearn classifiers on it & assessing accuracy.
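
To make that workflow concrete, here is a bare-bones sketch using pandas and scikit-learn. It runs on randomly generated Titanic-like data so it is self-contained; with the real competition you would load the provided training CSV with `pd.read_csv` instead. The generated columns, the made-up survival rule, and the choice of logistic regression are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic Titanic-like data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),            # values 1, 2, 3
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n).round(),
    "Fare": rng.uniform(5, 100, size=n).round(2),
})
# Made-up rule plus noise so there is something for the model to learn.
signal = (df["Sex"] == "female").astype(int) * 2 - (df["Pclass"] - 2) + rng.normal(0, 1, size=n)
df["Survived"] = (signal > 1).astype(int)

# Select a few features and turn Sex into a number.
X = df[["Pclass", "Age", "Fare"]].copy()
X["IsFemale"] = (df["Sex"] == "female").astype(int)
y = df["Survived"]

# Hold out part of the data to estimate accuracy.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")
```

Swapping `LogisticRegression` for another sklearn classifier only changes the import and the `model = ...` line, which makes it easy to compare a few models on the same split.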

cedar gust Jul 6, 2025, 9:45 PM

#

fallen carbon Everyone needs to start somewhere, and this is a pretty good problem to get star...

thank you

blissful aurora Jul 9, 2025, 1:51 PM

#

cedar gust Hey guys, quick question, I was wondering that if I haven't previously worked on...

It's a good way to start. I can give you the link to my commented code if you want.

cedar gust Jul 9, 2025, 8:48 PM

#

blissful aurora It's a good way to start. I can give you the link to my commented code if you wa...

Thank you, if I end up needing it I'll definitely shoot u a dm, appreciate you