#🚀┊spaceship-titanic | Kaggle | Page 1

keen ibex Aug 9, 2023, 4:29 PM

#

Hi, how are the leaderboard scores calculated?

desert crescent Aug 9, 2023, 7:35 PM

#

keen ibex Hi, how are the leaderboard scores calculated?

Classification accuracy: https://www.kaggle.com/competitions/spaceship-titanic/overview/evaluation

Spaceship Titanic

Predict which passengers are transported to an alternate dimension

keen ibex Aug 10, 2023, 5:01 AM

#

Well, it can’t be that my predictions score 0 even when using multiple models. There must be something I am doing wrong

#

I have read all the details but still not sure why any score would be zero (statistically it is impossible)

keen ibex Aug 10, 2023, 5:19 AM

#

I found the error. The True and False values are case-sensitive, and I was using all caps
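A minimal sketch of why all-caps labels can score exactly zero. It assumes the scorer compares label text exactly, which is an assumption based on the report above, since Kaggle's scorer is not shown in this thread. The labels are made up for illustration.

```python
# Hypothetical answer key and model output; the real key is held by Kaggle.
truth = ["True", "False", "True", "False"]
predictions = [True, False, True, True]

def accuracy(submitted, expected):
    """Fraction of labels that match exactly, compared as text."""
    matches = sum(s == e for s, e in zip(submitted, expected))
    return matches / len(expected)

all_caps = [str(p).upper() for p in predictions]   # "TRUE" / "FALSE"
as_written = [str(p) for p in predictions]         # "True" / "False"

print(accuracy(all_caps, truth))    # 0.0
print(accuracy(as_written, truth))  # 0.75
```

Under that assumption, even a good model scores 0 when the label spelling differs, which would explain a result that looks statistically impossible.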

young basalt Aug 10, 2023, 7:03 PM

#

I have a question: why don't we drop VRDeck, RoomService, FoodCourt, and ShoppingMall along with PassengerId and Name?

#

Things like these should not matter in training the model, or am I missing something?

keen ibex Aug 10, 2023, 7:22 PM

#

young basalt things like these should not matter in training the model or am i missing someth...

The utilization of specialized services may serve as an indicator that certain people have access to unique resources, thereby increasing their likelihood of being transported. We must consider the potential existence of biases or specific skills that these individuals may leverage—either their own or those of others—when employing these services

young basalt Aug 10, 2023, 7:28 PM

#

keen ibex The utilization of specialized services may serve as an indicator that certain p...

thnx man

zinc grove Aug 12, 2023, 8:29 AM

#

Anyone interested in teaming up?

coarse wave Aug 14, 2023, 3:27 PM

#

zinc grove Anyone interested in teaming up?

yes,

white halo Aug 15, 2023, 5:12 PM

#

Glad to see you all teaming up! Please post in our https://discord.com/channels/1101210829807956100/1130572338182762657 channel so others have the opportunity to team up as well!

untold storm Aug 16, 2023, 2:57 PM

#

Yo guys! Lol, finally a place to connect to other people, geez. It's been a lonely 7 months, and 3 of them doing Data Analytics and ML. I'm from South Africa by the way, it's great to meet you all! And I just joined this competition

white halo Aug 16, 2023, 6:53 PM

#

untold storm Yo guys! Lol, finally a place to connect to other people, geez. It's been a lone...

Welcome to our community 🙂

untold storm Aug 16, 2023, 6:54 PM

#

white halo Welcome to our community 🙂

Thanks brother!! Lol I got a long way to go in this and I'm wondering where the summaries are to get up to scratch. I'm totally new to tech

tame cargo Aug 17, 2023, 7:38 PM

#

I'm so happy with my solution for null values in the expenditure columns of the test data

#

I just had to say lol

#

I haven't started my feature engineering yet, but I'm not on a team either, but anyways, it seems that the utility expenditure is quite important after all...

Screenshot_2023-08-17_at_8.40.16_p.m..png

tame cargo Aug 18, 2023, 1:39 PM

#

Hmm anybody got any ideas for further feature creation?

hoary monolith Aug 18, 2023, 1:54 PM

#

hey completely new to machine learning anyone want to team up

#

#👥┊looking-for-a-team

#

What would be the reasoning behind filling the NaN values for Cabin?

tame cargo Aug 18, 2023, 5:13 PM

#

hoary monolith what will be thought behind the fill nan values for cabin?

After data processing and converting to integers, I just filled in random values from min() to max(). There is probably a better solution, but I think this should suffice.
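A rough sketch of the random fill described above, assuming the deck has already been converted to integer codes. The column values and the fixed seed are made up for illustration, and this is not the poster's actual code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the fill is repeatable

# Hypothetical integer-coded deck column with two missing entries.
deck = pd.Series([0, 3, np.nan, 5, np.nan], name="CabinDeck")

low, high = int(deck.min()), int(deck.max())
missing = deck.isna()

filled = deck.copy()
# rng.integers excludes the upper bound, hence high + 1.
filled[missing] = rng.integers(low, high + 1, size=int(missing.sum()))
filled = filled.astype(int)

print(filled.tolist())
```

Only the missing positions are touched; the known values stay as they were.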

hoary monolith Aug 18, 2023, 5:15 PM

#

What would be the approach for converting to integers?

tame cargo Aug 18, 2023, 9:32 PM

#

For converting, a simple mapping did the trick!

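# Pattern is {category: number}; add one entry per remaining deck letter.
# Named deck_map so the built-in map() is not shadowed.
deck_map = {'A': 0}
df['CabinDeck'] = df['CabinDeck'].map(deck_map)  # unmapped values become NaN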

untold storm Aug 18, 2023, 11:31 PM

#

Hi everyone, I am struggling to really get the ball rolling for this project. Is there anyone willing to help me? I am able to do most of the basics and then a mixture of other things, which isn't good enough. I am struggling with EDA and then categorical encoding. Please send a DM if anyone is willing to help. Much appreciated. I'm based in South Africa and my timezone is GMT+2. I will put in extra hours to accommodate anyone willing to help me get through the Spaceship Titanic. I am still a noob, but I have 3 months' worth of self-taught skill.

hoary monolith Aug 19, 2023, 10:32 AM

#

tame cargo For converting, a simple mapping did the trick! ``` map={'categorical':numerical...

thank you so much

#

My test score is way too low. How can I improve it? I am using Random Forest Classifier

zinc grove Aug 19, 2023, 5:00 PM

#

Hey everyone,

I'm thrilled to share that my work for the Spaceship Titanic competition has achieved an impressive Top 5% rank on the leaderboard! 🥳🏆

In this Notebook: https://www.kaggle.com/code/ishanpurohit/top-5-solution-with-detailed-explanation, I've poured countless hours into conducting in-depth analysis, crafting insightful visualizations, and implementing advanced modeling techniques. From uncovering hidden patterns in missing data to optimizing feature engineering, my notebook showcases a comprehensive approach that has propelled me into the top echelon of participants.

If you're looking for a comprehensive guide to conquering the Spaceship Titanic challenge, complete with strategies to improve model performance and achieve remarkable insights, this is the notebook you've been waiting for.

I invite you all to check out my notebook and explore the journey that led to this achievement. Your support means the world to me, so if you find it valuable, please consider giving it an upvote! 🙌👍

tame cargo Aug 19, 2023, 8:02 PM

#

zinc grove Hey everyone, I'm thrilled to share that my work for the Spaceship Titanic comp...

damn that notebook is too good! Completely changed my perspective of this competition. I might start from scratch again!

zinc grove Aug 20, 2023, 10:25 AM

#

tame cargo damn that notebook is *too* good! Completely changed my perspective of this comp...

Thank you for your feedback and comment.

cold bone Aug 22, 2023, 9:18 PM

#

untold storm Hi everyone, I am struggling to really get the ball rolling for this project. Is...

Hi! - What worked for me in the past is to look at others' notebooks (@zinc grove shared one in this channel), learn from them, come back to my own notebook and make modifications (vs. spending an enormous amount of time trying to perfect my code). This method enabled me to stay positive, learn from others, make changes, keep the momentum, and repeat. 🤗

zinc grove Aug 23, 2023, 6:27 AM

#

cold bone Hi! - What worked for me in the past is to look at other's notebook ( @zinc grove...

I agree. I reviewed many notebooks and discussions before implementing and finding what works best. I am glad my notebook was helpful.

cold bone Aug 23, 2023, 11:40 AM

#

untold storm Hi everyone, I am struggling to really get the ball rolling for this project. Is...

Another suggestion is to start small; make small progress based on one learning module at a time. When I started with Kaggle, I took one Kaggle Learn course, picked a dataset that interested me, and applied what I learned by analyzing it. The post below talks about the approach in more detail. Hope you find it helpful. 🤗

https://www.kaggle.com/discussions/getting-started/393853

Getting Started with Data Science | How to leverage Kaggle resources to Maximize Learning .

untold storm Aug 26, 2023, 7:04 PM

#

Hey guys, thanks for the help! Lol, 2 weeks later and I finally got somewhere in the Spaceship Titanic challenge. Now my model is trained and is 78% accurate. But now I'm stuck: I don't know how to deal with the missing column that should be predicted in test.csv 🤣😅😅 Please, any suggestions would help. It took me 2 weeks to get to this point and now I'm totally clueless

trail tangle Aug 28, 2023, 4:36 PM

#

Could neural networks potentially be used to solve this problem

#

Or is there not enough data

trail tangle Aug 30, 2023, 9:04 PM

#

Update: It was...

mighty lake Aug 31, 2023, 9:46 PM

#

untold storm Hey guys thanks for the help! Lol 2 weeks later and finally got somewhere in the...

I recommend looking through discussions.

Especially...
https://www.kaggle.com/competitions/spaceship-titanic/discussion/315987

karmic pivot Sep 4, 2023, 1:52 PM

#

Hello! After I converted some columns into numerical values (or normalized some of them), do I still need to do the same on test.csv? Or should we not edit test.csv? Thank you in advance for your answer!

limpid furnace Sep 5, 2023, 10:09 AM

#

karmic pivot Hello! After I converted some columns into numerical values (or I normalized som...

Yeah bro

karmic pivot Sep 5, 2023, 10:23 AM

#

limpid furnace Yeah bro

It's a question with two choices. It's a bit difficult to understand what you meant by "yeah bro"

limpid furnace Sep 5, 2023, 10:26 AM

#

You should edit test.csv too
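A minimal sketch of what "editing test.csv too" can look like in pandas: every statistic and mapping is learned from the training data, then the same transformation is applied to both tables. The tiny frames and the `preprocess` helper are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical miniature train/test tables.
train = pd.DataFrame({"Age": [20.0, 30.0, None, 40.0],
                      "HomePlanet": ["Earth", "Mars", "Europa", "Earth"]})
test = pd.DataFrame({"Age": [25.0, None],
                     "HomePlanet": ["Mars", "Earth"]})

# Learn every statistic and mapping from the training data only.
age_median = train["Age"].median()
age_filled = train["Age"].fillna(age_median)
age_mean, age_std = age_filled.mean(), age_filled.std()
planet_codes = {name: code for code, name in
                enumerate(sorted(train["HomePlanet"].dropna().unique()))}

def preprocess(df):
    """Apply the train-derived imputation, scaling, and encoding to any table."""
    out = df.copy()
    out["Age"] = (out["Age"].fillna(age_median) - age_mean) / age_std
    # Categories never seen in train would become NaN here.
    out["HomePlanet"] = out["HomePlanet"].map(planet_codes)
    return out

train_prepared = preprocess(train)
test_prepared = preprocess(test)
print(test_prepared)
```

The model then sees test columns on the same scale and with the same codes as the ones it was trained on.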

karmic pivot Sep 5, 2023, 10:38 AM

#

limpid furnace You should edit test.csv too

All right, thanks a bunch!

limpid furnace Sep 5, 2023, 10:40 AM

#

No problem

limpid furnace Sep 5, 2023, 4:39 PM

#

This is my notebook for this competition. I used some basic techniques to extract features and fill NA values, but I get pretty good accuracy. I didn't do much data exploration or EDA, but I am sharing my notebook with all of you so you can find something interesting in EDA and improve on it. If you have any advice for me, please comment and I will read all of your comments. And if you are interested in my notebook, please support me by voting; I would really appreciate that. Thank you so much!!
https://www.kaggle.com/code/hoanglongroai/80-accuracy-spaceship-titanic#Make-prediction

80% Accuracy Spaceship Titanic

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

lavish pulsar Sep 9, 2023, 1:41 AM

#

Anyone interested in teaming up with me?

scenic furnace Sep 10, 2023, 10:23 AM

#

limpid furnace This is my notebook about this competition. I have used some basic technique to ...

Nice and clean notebook, well done!
You should consider replacing missing values before datatype conversion. {Series,DataFrame,..}.astype(bool) converts NaN values to True - which is not a good idea especially for the VIP column, where most of the values are False.
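A small sketch of the pitfall described above, using a made-up VIP column. The replacement value False is only an assumption chosen for illustration; another fill strategy may suit the data better.

```python
import numpy as np
import pandas as pd

# Hypothetical VIP column as it might look after reading the CSV.
vip = pd.Series([False, True, np.nan, False], name="VIP")

# Converting first: the missing entry silently becomes True.
naive = vip.astype(bool)
print(naive.tolist())   # [False, True, True, False]

# Replacing the missing entry first, then converting.
safe = vip.map(lambda v: False if pd.isna(v) else bool(v))
print(safe.tolist())    # [False, True, False, False]
```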

limpid furnace Sep 10, 2023, 11:43 AM

#

scenic furnace Nice and clean notebook, well done! You should consider replacing missing values...

Thank you so much for giving me advice!!! 🥰🙏🏻 I will cover that later

trail crypt Sep 15, 2023, 10:34 AM

#

Going to look into this competition soon. Pretty similar to normal Titanic problem I assume?

junior cobalt Sep 16, 2023, 5:45 PM

#

Need partner for this project , a team

mystic jungle Sep 19, 2023, 3:45 AM

#

My notebook with accuracy of 78.6% on "Spaceship Titanic"
🔗https://www.kaggle.com/code/harshpatelind13/spaceship-titanic-13

Spaceship Titanic__13

foggy dirge Sep 23, 2023, 7:00 PM

#

hello everyone,
I achieved an accuracy of 78.04% on the spaceship Titanic.
https://www.kaggle.com/dinanksoni/spaceship-titanic-78-04

Spaceship Titanic 78.04%

lyric basin Sep 26, 2023, 2:54 PM

#

hey can you help me with something

#

Like, I face a problem every time: how to decide which algorithm to use

sudden rivet Sep 29, 2023, 6:37 PM

#

Hey y'all,
I'm trying my best at feature engineering the spaceship-titanic.

I want to find the best way at tackling the NaN values and impute rather than just delete them. And I ran across this discussion post https://www.kaggle.com/competitions/spaceship-titanic/discussion/315987#2461774

How the heck did they identify so many great and powerful relationships/rules?

sudden rivet Sep 30, 2023, 8:28 PM

#

Woohoo! I just scored a 0.78536 using Random Forest 🌳 on the space-titanic. I gotta say this one's feature engineering was so rigorous... https://www.kaggle.com/code/m000sey/space-random-forest/edit/run/144798140 I am going to try a few more hyperparameters to see if I can inch up the score

carmine crescent Oct 10, 2023, 8:53 PM

#

i did the normal EDA on the dataset, now i cant think of what to do next???

sudden rivet Oct 10, 2023, 8:56 PM

#

carmine crescent i did the normal EDA on the dataset, now i cant think of what to do next???

What are your insights from the EDA?

#

Anything stick out to you? Did you try any transformations of the data?

vale heron Oct 11, 2023, 3:22 PM

#

Hey! I'm looking for some feedback on my submission notebook for Spaceship Titanic. I've been studying Python for only 2 weeks and I'm currently studying Google Analytics on Coursera to become a data analyst.
https://www.kaggle.com/code/sebastienmotionstats/spaceship-titanic-sub-motionstats

carmine crescent Oct 12, 2023, 12:30 AM

#

sudden rivet What are your insights from the EDA?

Yes. Now I am thinking that maybe RandomForests would work here, since there is no direct relation between the data and the desired output.

#

Am I thinking right?

carmine crescent Oct 14, 2023, 7:40 AM

#

Need to check that out

carmine crescent Oct 14, 2023, 9:15 AM

#

Woah, I got 0.75 w/o it

thin hearth Oct 23, 2023, 11:26 AM

#

anyone team-up?

teal shadow Oct 24, 2023, 5:09 AM

#

I cant get over 0.75

teal shadow Oct 24, 2023, 5:09 AM

#

thin hearth anyone team-up?

I can team up , but I am not that experienced

thin hearth Oct 24, 2023, 5:10 AM

#

No problem let's join

#

can you share me your notebook link

teal shadow Oct 24, 2023, 5:13 AM

#

thin hearth No problem let's join

dm me

teal shadow Oct 24, 2023, 5:52 AM

#

I tried to ensemble 3 models (LR, random forest, and gradient boosting) but only managed to boost my result from like 0.77 to 0.78

opaque sedge Oct 30, 2023, 7:59 PM

#

Anyone interested on teaming up on this spaceship titanic, can you pls DM me?

quasi pier Nov 8, 2023, 1:25 AM

#

#

Got 79% at the first shot, then never got higher...

tiny mantle Nov 14, 2023, 11:49 AM

#

quasi pier

Having exact same problem

#

My model keeps improving in validation score but keeps decreasing in Kaggle score

#

If someone is willing to look at my code I would be really grateful. I am doing something very stupid and I don’t know what

quiet widget Nov 15, 2023, 12:09 AM

#

Does CatBoost perform better than XGBoost on this dataset?

#

With XGBoost I managed to get 0.803

onyx bridge Nov 21, 2023, 3:36 PM

#

I got 0.71

thick flare Nov 22, 2023, 6:15 PM

#

tiny mantle If someone is willing to look at my code I would be really greatful. I am doing ...

Share the link

autumn token Nov 22, 2023, 10:11 PM

#

I got 78%

placid meadow Dec 9, 2023, 7:54 PM

#

Has anyone tried neural network to solve this problem?

plain nimbus Dec 14, 2023, 8:59 PM

#

I got 97.96 with decision tree 🙂

#

This is my notebook: https://www.kaggle.com/code/tomedison/spaceship-titanic

spaceship-titanic

#

but my score is 0

#

anyone know why?

#

thanks

terse verge Dec 19, 2023, 4:00 PM

#

Hello, I'm following this notebook: https://www.kaggle.com/code/oscardata963/spaceship-titanic-notebook

And I'm trying to do it in R.
So far I have something like this: [Photo 2]

Which works but I want the logarithmic scale and the tight_layout where the values are pretty much tight rather than what I have like this: [Photo 3]

Can someone help me? I can't find anything on the internet and I've tried everything. One of the problems I have in my professional journey is the fact that I can't solve things when I'm stuck and I have nobody to solve my problems. Please ping me.

Spaceship Titanic Notebook

#

Here is the code for you to modify:

par(mfrow=c(1,3), mar=c(4, 4, 2, 1))
for (col in colnames(train)) {
  if (class(train[[col]]) == "numeric") {
    hist(train[[col]], main=col, xlab="")
  }
}

#

If I try to add the hist() parameter log = 'y', it gives me these two errors:

#

Which means I have nulls, plus the 'is not a graphical parameter' error, which I don't really understand. Anyway, it is impressive to me that Python is able to detect and avoid the nulls when plotting with matplotlib (because the dtype is object not numeric or double) and tightens the layout with so little code, while I have to do all this. I just don't know what to do. Please, someone help me.

tacit meadow Dec 21, 2023, 8:24 AM

#

is it possible to get a perfect score in this competition?

terse verge Dec 22, 2023, 12:11 PM

#

could somebody help me with my question?

#

that's the problem with my journey in coding and data science I can't solve my problems when I'm stuck

#

and nobody helps me

#

I've already also asked in the kaggle forum

#

if you can recommend me a place where somebody can help me? sad_panda

#

#🚀┊spaceship-titanic message

buoyant igloo Dec 24, 2023, 2:28 AM

#

The top two scores on the leaderboard are 0.98 and 0.96, far ahead of the pack at ~0.82, 🤔 could these scores be achieved "gaming the system" by systematically changing submissions to infer where the errors are, or is there a key insight that almost everyone missed?

terse verge Dec 24, 2023, 10:29 AM

#

Probably the latter

terse verge Dec 24, 2023, 11:05 AM

#

#

Well the first one definitely used an iterative method 😂

terse verge Dec 24, 2023, 12:39 PM

#

Why is python making transported a numerical value?

#

like automatically

buoyant igloo Dec 24, 2023, 5:03 PM

#

terse verge Why is python making transported a numerical value?

The numerical values are correlations, e.g. correlation between Age and Transported = -0.075, correlation ranges from 1 to -1 https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

Pearson correlation coefficient

In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has ...

terse verge Dec 24, 2023, 5:03 PM

#

that doesn't answer my question

buoyant igloo Dec 24, 2023, 5:05 PM

#

terse verge that doesn't answer my question

see numeric_only parameter of corr() bool, default False

Include only float, int or boolean data.

terse verge Dec 24, 2023, 5:05 PM

#

what

#

i don't understand you

buoyant igloo Dec 24, 2023, 5:09 PM

#

I guess I don't understand your question. You show a call to corr() which returns a table of numerical values and ask why Transported is converted to a number. The input value of Transported True/False is converted to a number because the numeric_only parameter of corr() defaults to False, the output associated with Transported is numbers because correlations are numbers.
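A minimal sketch of how column dtype decides what ends up in the correlation table, using made-up values. It passes numeric_only=True explicitly because the default has differed between pandas versions, and it assumes a pandas version where corr() accepts that parameter.

```python
import pandas as pd

# Hypothetical rows; only the dtypes matter here.
df = pd.DataFrame({
    "Age": [24.0, 58.0, 33.0, 16.0],
    "Transported": [False, True, False, True],   # no missing values -> bool
    "VIP": [False, None, True, False],           # one missing value -> object
})

print(df.dtypes)

# bool columns count as numeric (True = 1, False = 0); object columns do not.
corr = df.corr(numeric_only=True)
print(list(corr.columns))   # ['Age', 'Transported']
```

A True/False column with even one missing value is stored as object, so it drops out of the table until the missing values are filled and the column is converted.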

terse verge Dec 24, 2023, 5:10 PM

#

OK, then why are CryoSleep and VIP not converted to numeric? Because they have NaNs while Transported doesn't?

#

i guess that answers my first question tho

#

now

#

I want to do the same thing in R

#

I have so far something like this but I don't know how to add the transported one

#

chatgpt is telling me this but I don't know why transported is saying it has size 0 when i have done str(train) and see obviously that the variable is still in the dataset

#

Here is the code if you want to help me:

# Select numeric columns
numeric_columns <- sapply(train, is.numeric)

# Calculate the correlation matrix for numeric columns
cor_matrix <- cor(train[, numeric_columns], use = "complete.obs")

cor_matrix

#

Anyway, I solved it but 😓

#

I can't believe the fact that doing that transformation changes the data that much

#

#

Is that a problem?

#

I guess R does the calculations differently than Python, hence those differences, but I don't know whether that might be a problem when training

#

I'll continue with the project while I wait for your answer

#

Now I see it's not only Transported, it's the whole df. Seems like cor() in R treats it differently than corr() in pandas :(

#

i hope that is not a problem, but the values are pretty different
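One possible source of the difference, offered as a guess rather than a diagnosis: R's use = "complete.obs" drops every row that has a missing value in any column, while pandas corr() drops missing values pair by pair. A small sketch with made-up numbers shows that the two conventions can give different coefficients for the same data.

```python
import numpy as np
import pandas as pd

# Hypothetical values; Spa has two missing entries.
df = pd.DataFrame({
    "Age":         [20.0, 30.0, 40.0, 50.0, 60.0],
    "RoomService": [0.0, 10.0, 15.0, 40.0, 45.0],
    "Spa":         [5.0, np.nan, np.nan, 1.0, 0.0],
})

pairwise = df.corr()            # pandas default: drop NaN pair by pair
listwise = df.dropna().corr()   # comparable to R's use = "complete.obs"

print(pairwise.loc["Age", "RoomService"])
print(listwise.loc["Age", "RoomService"])
```

Age and RoomService have no missing values themselves, yet their coefficient changes once rows are dropped because of Spa. Neither convention is wrong; they answer slightly different questions.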

terse verge Dec 24, 2023, 7:54 PM

#

plain nimbus but my score is 0

https://developers.google.com/machine-learning/crash-course/classification/accuracy

#

terse verge Dec 25, 2023, 10:02 AM

#

https://www.kaggle.com/discussions/questions-and-answers/463429

Hey guys, I posted a question about how to create my custom transformer in R.

Custom Transfomer Question in R | Kaggle

#

Any help would be appreciated

terse verge Dec 26, 2023, 12:59 PM

#

Can anyone help me? https://www.kaggle.com/discussions/questions-and-answers/463648

Feature engineering question in R

Transforming pipelines in R | Spaceship Titanic question | Kaggle

#

It's very basic

terse verge Dec 27, 2023, 10:38 AM

#

Can somebody help me with this code?

library(mlr3pipelines)
library(mlr3)

# Define the numerical pipeline
num_pipeline <- po("scale", id = "num_scale") %>>%
  po("impute", id = "num_impute", param = list(strategy = "median"))

# Define the categorical pipeline
cat_pipeline <- po("encode", id = "cat_encode", param = list(method = "1hot"))

# Define the column transformer
full_pipeline <- po("branch", id = "branch") %>>%
  po("pipe", id = "num_pipe", num_pipeline, col_roles = list(num = num_attribs)) %>>%
  po("pipe", id = "cat_pipe", cat_pipeline, col_roles = list(cat = cat_attribs)) %>>%
  po("ccombine", id = "ccombine")

As you can see, I'm trying to create pipelines in R

flint stag Dec 28, 2023, 2:15 PM

#

Hello, Looking for a partner or team to join for this project. I'm experienced software engineer (10 years) with couple of months experience in ML. Let me know please if anyone is interested.

terse verge Dec 29, 2023, 2:33 PM

#

https://www.kaggle.com/discussions/questions-and-answers/464266

Transformation pipelines in R question | Spaceship Titanic competition.

#

If someone can help me, I would appreciate it

#

It's a feature engineering question

#

caret and its functions; the fact that I cannot abstract the pipelines from the data is what bugs me

buoyant igloo Dec 29, 2023, 6:06 PM

#

flint stag Hello, Looking for a partner or team to join for this project. I'm experienced s...

Hi Sanjeev -- I am also experienced s/w engineer + ML beginner. I have colab notebook which does complete processing from input data to prediction with pytorch. It gets 0.79 score which is around the middle of the leaderboard, where the top credible score is around 0.82. For me, 0.79 is good enough. I'm happy to share the notebook with some ideas how to improve the score by 1-2% if you're interested.

mellow crag Dec 29, 2023, 6:53 PM

#

Hello and happy holidays!
I'm a hobbyist here, so excuse my beginner's questions, as I am from the business sector. 🙂
In the Spaceship Titanic, I see that PassengerId is composed of the group and the number within the group. I would like to create a category labeled IsGroup indicating whether that person is in a group or not (True or False). I was thinking about separating gggg and pp, then counting the gggg occurrences: if the count is higher than 1, IsGroup == True, else IsGroup == False. Is this feasible? Is there a more elegant solution?

buoyant igloo Dec 29, 2023, 7:11 PM

#

mellow crag Hello and happy holidays! I'm an Hobbyist here, so excuse my beginners questions...

I'd suggest a group_size category might be better: it carries more information for a classifier and includes the special case group_size==1, which would be False in your scheme. Then another category could be family_size, using last name to identify families (how to handle missing names needs a bit of thought). In pandas you can do this with Series.str.split() with underscore as the delimiter, then value_counts() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.value_counts.html. My philosophy was to look for a shortcut solution like value_counts(), but if I couldn't find it quickly, then write a dumb loop over the rows or columns in Python. There is no compelling need to optimize speed or code size when engineering a small feature table like this.
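A short sketch of the str.split() plus value_counts() idea, assuming PassengerId has the gggg_pp form described in the question. The IDs are made up, and the column names Group, GroupSize, and IsGroup are only illustrative.

```python
import pandas as pd

# Hypothetical passenger IDs in gggg_pp form.
df = pd.DataFrame({
    "PassengerId": ["0001_01", "0002_01", "0003_01", "0003_02", "0004_01"],
})

# The part before the underscore identifies the group.
df["Group"] = df["PassengerId"].str.split("_").str[0]

# value_counts() gives passengers per group; map() attaches it to each row.
df["GroupSize"] = df["Group"].map(df["Group"].value_counts())

# The True/False flag from the question follows directly from the size.
df["IsGroup"] = df["GroupSize"] > 1

print(df)
```

Keeping GroupSize retains the information that IsGroup would throw away, and the flag can still be derived from it if needed.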

inland summit Dec 31, 2023, 9:11 PM

#

Hello. I'm working on the spaceship titanic competition and I have a technical question. I'm using the suggested notebook to guide me through this exercise. I'm at the section which validates the Random Forest model using the validation set (not the OOB validation. I've already completed that part). The 'rf.evaluate' function returns two values: Loss = 0 and Accuracy = 0.792. Can someone explain the difference between these two?

terse verge Jan 1, 2024, 1:39 AM

#

inland summit Hello. I'm working on the spaceship titanic competition and I have a technical q...

Certainly! In the context of machine learning models, "Loss" and "Accuracy" are two commonly used metrics to evaluate the performance of a model. Let's break down what each of these metrics represents:

Loss:
- The loss is a measure of how well the model is performing on a specific task. It quantifies the difference between the predicted values and the actual values (ground truth).
- The goal during training is typically to minimize the loss. Different types of models and tasks use different loss functions. For example, in classification problems, cross-entropy loss is commonly used.
- In the context of the Titanic competition, the loss is likely calculated based on the predictions of whether a passenger survived or not compared to the actual survival status.
Accuracy:
- Accuracy is a measure of the overall correctness of the model's predictions. It is the ratio of correctly predicted instances to the total instances.
- It is one of the most straightforward metrics and is expressed as a percentage. An accuracy of 0.792 means that approximately 79.2% of the predictions made by the model on the validation set are correct.
- While accuracy is informative, it may not be the only metric to consider, especially in imbalanced datasets. For example, if only a small percentage of passengers survived, a model that predicts all passengers as not surviving might still achieve a high accuracy but may not be useful.

In summary, during the evaluation of a model:

- Loss provides a more detailed and task-specific measure of how well the model is performing.
- Accuracy provides a general measure of overall correctness but may not be sufficient in all cases, especially in imbalanced datasets.

For a more comprehensive evaluation, you may also consider exploring other metrics such as precision, recall, F1 score, or the area under the ROC curve (AUC-ROC), depending on the specific goals and characteristics of your problem.
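To make the distinction concrete, here is a small sketch in plain Python. The labels and predicted probabilities are made up, and cross-entropy is used only as one common example of a loss; it is not necessarily what the notebook's evaluate function reports.

```python
import math

def accuracy(y_true, probs, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, probs))
    return hits / len(y_true)

def cross_entropy(y_true, probs):
    """Mean binary cross-entropy (log loss); lower is better."""
    total = sum(math.log(p) if y else math.log(1.0 - p)
                for y, p in zip(y_true, probs))
    return -total / len(y_true)

y_true = [1, 0, 1, 1]
confident = [0.95, 0.05, 0.90, 0.40]   # hypothetical model A
hesitant = [0.55, 0.45, 0.60, 0.40]    # hypothetical model B

for name, probs in [("A", confident), ("B", hesitant)]:
    print(name, accuracy(y_true, probs), round(cross_entropy(y_true, probs), 3))
```

Both hypothetical models get the same three of four predictions right, so their accuracy is identical, but the loss is lower for the model whose correct predictions are more confident.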

#

ChatGPT apparently is only trained with the og titanic

#

but you get the idea

#

Remember another thing

#

In the context of the Titanic competition on platforms like Kaggle, you are correct. The actual survival status of passengers in the test set is typically not provided to participants. During the competition, participants train their models on the training set, where the ground truth (actual survival status) is known, and they validate the performance of their models using a validation set.

The loss reported during training and evaluation is calculated based on the predictions made by the model compared to the known outcomes in the validation set. This is done to get an estimate of how well the model is likely to perform on unseen data. The exact evaluation metric used for loss depends on the competition, but it's often related to the classification task at hand.

Once participants are satisfied with their models and want to make predictions on the test set, they use their trained models to predict the outcomes for the test set. The actual outcomes for the test set are not provided during the competition. Participants then submit their predictions to the competition platform, which evaluates those predictions based on the true outcomes held by the platform. The platform uses these evaluations to calculate the final performance metrics, and participants are ranked on the public leaderboard accordingly.

In summary, during the competition, participants do not have access to the true outcomes for the test set. They use the training set and validation set to train and evaluate their models, and the final evaluation is based on the unseen test set when submitting predictions to the competition platform.

buoyant igloo Jan 2, 2024, 2:52 PM

#

inland summit Hello. I'm working on the spaceship titanic competition and I have a technical q...

Usually, loss is a way to calculate error that is used in training neural networks. Random forests are a quite different type of classifier. A random forest does not use this kind of loss function in training, so I would guess that here loss is reported as zero as a placeholder to preserve the same function return as other classifiers in Keras. Instead, random forests use "purity", which measures how well a given node in a tree divides samples into different classes. In scikit-learn, the purity metric can be Gini impurity (default) or entropy; I don't know Keras, maybe it uses the term "loss" for purity in a random forest, in which case the reported loss would be the average purity at the lowest node in the trees. It would be a bit surprising for the training loss to be exactly zero for a challenging classification problem like this, which again suggests the zero is just a placeholder.
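For reference, a small sketch of the two purity measures named above, written in plain Python on made-up node contents. It illustrates the formulas only and is not how any particular library implements them.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_k^2): 0 for a pure node, 0.5 for a 50/50 binary node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """-sum(p_k * log2(p_k)): 1 bit for a 50/50 binary node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

mixed = [True, True, False, False]   # hypothetical node with both classes
pure = [True, True, True, True]      # hypothetical node with one class

print(gini_impurity(mixed))  # 0.5
print(gini_impurity(pure))   # 0.0
print(entropy(mixed))        # 1.0
```

A tree picks the split that lowers these values the most in the child nodes, which is a different mechanism from minimizing a training loss by gradient descent.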

broken cape Jan 4, 2024, 2:00 AM

#

Hello, i am getting a rather unusual problem with my submissions, somehow I am getting a score of 0 while sample submission is getting a score of 0.49. I know my model predictions are not that bad😅
It'll be really helpful if someone could guide me through this! Thanks

terse verge Jan 4, 2024, 7:14 AM

#

This could be for many reasons. The most common one is that the submission only reads the bool as True or False

Check whether you are predicting the values as numeric, or as bool but in all uppercase, and change it.
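A minimal pandas sketch of that fix, assuming the model outputs 0/1 and the submission needs PassengerId and Transported columns. The IDs and predictions below are made up.

```python
import pandas as pd

# Hypothetical IDs and 0/1 model outputs.
submission = pd.DataFrame({
    "PassengerId": ["0013_01", "0018_01", "0019_01"],
    "Transported": [1, 0, 1],
})

# Cast to bool so to_csv writes the text True / False rather than 1 / 0.
submission["Transported"] = submission["Transported"].astype(bool)

csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv instead of capturing the text writes the same content to disk for upload.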

rustic kernel Jan 20, 2024, 7:39 AM

#

Hello, I'm working on the Spaceship Titanic and something is unclear to me.

#

After constructing the age distribution, I noticed a peak at zero value. I'm trying to understand whether these are missing values or if there are just a lot of children in the dataset. Upon further inspection of the rows with zero age, I observed that they don't spend anything, but they may end up on different planets and either survive or not survive in the end.

There are two main hypotheses: either these are indeed children, or they could be spaceship personnel.

I'm not quite sure how to test this hypothesis, and I would be grateful if more experienced professionals could guide me in the right direction.

Thank you in advance!

vague sparrow Jan 24, 2024, 4:17 PM

#

Hey! I just made a question and I hope anyone from here could help me! I tried to complete this competition with Tensorflow and I want to improve my score https://www.kaggle.com/competitions/spaceship-titanic/discussion/470564

faint parcel Jan 26, 2024, 7:32 AM

#

Hi everyone, I just started out and want to ask about this competition. We have to predict the column 'transported' but there is no column as such in train.csv or test.csv. How will I train the model then?

random idol Feb 4, 2024, 10:53 PM

#

Hello everyone.
I think someone has already noticed and even taken advantage of it. But I'm at a dead end.
These are scatterplots of Cabin_number and Group_number depending on some categorical features (and target feature). They form lines, and some even belong entirely to one class - for example in Cabin_deck feature. The situation is similar with Cabin_deck and Home_Planet. I want to use this to fill in the missing data, but have not had success with this yet.
If you have any ideas, please write.

cursive jetty Feb 8, 2024, 3:06 AM

#

Hi everyone, this is my first time entering a competition. My name is Hitesh an aspiring Data Scientist and I'm about to start learning ML. I want to learn through a more practical approach and then learn the theoretical aspect of ML simultaneously hence I'm here.

If anyone can help me getting started, it would be a huge help. Cheers!

vivid flicker Feb 13, 2024, 11:39 AM

#

Can anyone please help? About 2% of the values in the test dataset of spaceship-titanic are missing, so how do I predict for those rows?

turbid jasper Feb 16, 2024, 2:16 PM

#

can someone explain why this is happening

dire cradle Feb 16, 2024, 5:55 PM

#

Would love it if someone here could give their insight on my work!
https://www.kaggle.com/code/satvshr/spaceship-titanic-using-gridsearchcv-xgboost

Spaceship Titanic using GridSearchCV (XGBoost)

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

inner pulsar Feb 18, 2024, 9:29 PM

#

turbid jasper can someone explain why this is happening

there's a string in one of the columns that the model isn't able to understand; use some encoding to change it to a number
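
One possible encoding, sketched with one-hot columns from `pandas.get_dummies`. The tiny frame below is invented for illustration, and one-hot encoding is only an assumption about what would suit the model in question; an ordinal encoding is another option.

```python
import pandas as pd

# Toy data: one string column and one numeric column.
df = pd.DataFrame({
    "HomePlanet": ["Earth", "Europa", "Mars", "Earth"],
    "Age": [24.0, 58.0, 33.0, 16.0],
})

# Replace the string column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["HomePlanet"], dtype=int)
print(encoded)
```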

vague bay Feb 18, 2024, 10:34 PM

#

does sample submission contain correct answers?

#

I want to evaluate my answer without submitting code

worthy moat Feb 20, 2024, 8:54 PM

#

New notebook exploring Spaceship-Titanic transported status! Check it out and upvote if you find it helpful: https://www.kaggle.com/code/seifwael123/titanic-data-analysis-a-comprehensive-exploration
Thanks!

Titanic Data Analysis: A Comprehensive Exploration

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

fair linden Feb 25, 2024, 6:15 PM

#

is my model tripping ?

#

I get around 80% accuracy but still , I have rly high doubts that Cabin_num is the most important feature

gusty cipher Mar 7, 2024, 1:13 PM

#

HI please check this notebooks and give me your feedback. If possible please upvote.

https://www.kaggle.com/code/bvvkarthik/spaceship-titanic-0-80-beginner-friendly

🚀SpaceShip-Titanic-(~0.80)-Beginner-Friendly

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

echo falcon Mar 16, 2024, 4:13 PM

#

I am getting the score as zero even if I converted the predictions to integers.

worthy moat Mar 18, 2024, 2:00 AM

#

https://www.kaggle.com/code/seifwael123/spaceship-titanic-eda-modelling-optuna

Spaceship Titanic 🚢: EDA | Modelling | Optuna

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

coarse wave Apr 1, 2024, 1:49 AM

#

Hello, everyone. I just joined this competition and I got a public score of 0.78. I want to get higher accuracy. Can anyone help me?

fickle stirrup Apr 7, 2024, 9:17 AM

#

Can anyone explain why I'm getting 0000 public score?

knotty crescent Apr 7, 2024, 10:20 PM

#

Hi, Everyone. This was my first competition, and I got a public score of 0.72. I'm looking forward to working to get this higher. I have been going through the lessons, and for the first time, I am getting my head around machine learning. The other times I have tried it, it did not make sense or I did not know why I was doing something.
I just realise that there is still so much to learn

solar relic Apr 10, 2024, 7:52 AM

#

can anyone please tell https://www.kaggle.com/code/aadishsharma/spaceship-prediction/edit
where i am making error in this

Spaceship-Prediction

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

solar relic Apr 10, 2024, 9:33 AM

#

solar relic can anyone please tell https://www.kaggle.com/code/aadishsharma/spaceship-predic...

got it thanks

sullen scarab May 3, 2024, 5:04 PM

#

anyone interested in doing this project together

iron ivy May 11, 2024, 10:04 AM

#

Hi all, I am looking at the solutions, but I cannot understand why only the following features have been selected and the features "Age" and "RoomService" have been left out. I mean, shouldn't numerical columns be preprocessed?

compact spruce May 19, 2024, 2:33 AM

#

https://colab.research.google.com/drive/1t1sdCvx9Gl3WPjCqWQMXDzbn0dX1x_Ej?usp=drive_link
Could anyone give me some tips to improve my code?

Google Colab

vale viper May 22, 2024, 9:45 AM

#

Hi, do you know why when I submit it gives me 0 score? Thank you! https://www.kaggle.com/code/davidg960/space-adventure

arctic estuary May 28, 2024, 12:56 AM

#

vale viper Hi, do you know why when I submit it gives me 0 score? Thank you! https://www.ka...

Hello, I think you need to convert your predictions to True/False instead of 0/1

#

sample submission

vital yew Jun 1, 2024, 12:40 PM

#

i ended up doing several transformations to passengerid however they seem legit in the end, but not legit enough. any suggestions?

#

ok so i did find a mistake, the id's were different but I replaced them w the original values directly from the test dataset and the error persists

raven skiff Jun 12, 2024, 6:18 PM

#

Hello, I have done this spaceship prediction and got 78%. Is there any way to get a higher percentage?

#

I mean accuracy score

old sail Jun 13, 2024, 1:08 PM

#

raven skiff Hello, i have done this spaceship prediction,i got 78% is their any way to get m...

Try with different algorithms

raven skiff Jun 14, 2024, 6:17 PM

#

I have tried with randomtreeclasifier,svm's, adaboost . But all got 78% and below

#

Which algorithm you have used?

woven shell Jun 16, 2024, 11:39 AM

#

Good morning to everybody. I'm new to Kaggle and I'm happy to be here

olive ginkgo Jun 24, 2024, 2:26 PM

#

hey i did titanic model using multiple linear regression, got accuracy score of 78%

hoary plover Jun 29, 2024, 5:18 AM

#

olive ginkgo hey i did titanic model using multiple linear regression, got accuracy score of ...

Did you do feature engineering?

ancient smelt Jul 3, 2024, 6:49 PM

#

olive ginkgo hey i did titanic model using multiple linear regression, got accuracy score of ...

I think it's a bad decision to use linear regression for that data. You should use classifiers for making categorical predictions (True/False, Red/White/Green, etc.) instead of using regression! Regression is used for making continuous predictions, like house prices, lengths, or anything else that you can measure or calculate.
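
A small sketch of the difference on invented data: a linear regression returns an unbounded number, while a classifier returns a class label. `LogisticRegression` is used here only as one example of a classifier, not as a recommendation from the thread.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: one feature, with the label switching from False to True.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([False, False, False, True, True, True])

reg = LinearRegression().fit(X, y.astype(float))
clf = LogisticRegression().fit(X, y)

# Far from the training range the regression output leaves [0, 1] ...
reg_out = reg.predict([[10.0]])
# ... while the classifier still answers with one of the two labels.
clf_out = clf.predict([[10.0]])
print(reg_out, clf_out)
```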

hoary plover Jul 4, 2024, 11:07 AM

#

ancient smelt i think it's a bad decision to use linear regression for that data. You should t...

there is no "transported" feature in the training data

#

so how did you train the model?

ancient smelt Jul 4, 2024, 3:20 PM

#

hoary plover there is no "transported" feature in the training data

I need more code or an explanation of what you are doing in your analysis. Judging by your message, you could have made one of 3 possible mistakes:

If you are asking for data['transported'], you have a naming mistake: you should write 'Transported' instead of the lowercase version. Everything is case sensitive in Python.
When you ask for (for example) X_train['Transported'], you will get an error, because you should split your data into features and target first: (for example) X = df.drop(columns=['Transported']) and y = df['Transported'], and after that use train_test_split. Your Transported column will then be in the y_train variable.
You are trying to access a variable that does not contain your 'Transported' column.
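
The second point can be sketched as follows. The frame is a made-up stand-in for train.csv with only two feature columns, and the 75/25 split and seed are arbitrary choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for train.csv.
df = pd.DataFrame({
    "Age": [39.0, 24.0, 58.0, 33.0, 16.0, 44.0, 26.0, 28.0],
    "Spa": [0.0, 549.0, 6715.0, 3329.0, 565.0, 291.0, 0.0, 0.0],
    "Transported": [False, True, False, False, True, True, True, True],
})

# Features are everything except the target; the target is its own Series.
X = df.drop(columns=["Transported"])
y = df["Transported"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(X_train.columns.tolist(), y_train.name)
```

After the split, `X_train['Transported']` raises a `KeyError` by design, because the target lives only in `y_train` and `y_val`.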

hoary plover Jul 4, 2024, 4:02 PM

#

ancient smelt I need more code or explanation what are you doing into analysis. Judging by you...

Got u

#

U did wit logistic regression?

ancient smelt Jul 4, 2024, 6:12 PM

#

hoary plover U did wit logistic regression?

You didn't read my first post, the next conv doesn't make sense

hoary plover Jul 4, 2024, 11:45 PM

#

ancient smelt You didn't read my first post, the next conv doesn't make sense

I already extracted the column “transported “ from the data to y

#

Chill oleh

winter lava Jul 11, 2024, 5:21 PM

#

I was able to achieve a 71% accuracy with minimal Feature engineering utilizing pretrained LLM embedding models to obtain feature vectors from the textualized data and fitting classifiers to those vectors, if anyone is interested. https://www.kaggle.com/code/liamdavies1/space-titanic-using-llm-enhanced-feature-embedding

Space Titanic using LLM Enhanced Feature Embedding

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

#

Also, don't get me wrong, that is not a good accuracy for this problem. It is just demonstrating a different approach

hoary plover Jul 20, 2024, 5:43 PM

#

winter lava I was able to achieve a 71% accuracy with minimal Feature engineering utilizing ...

What was the reason for such low accuracy?

pale plank Jul 29, 2024, 6:43 PM

#

Hey guys, did any of you feature engineer the cabin to separate it into three columns? Like a/b/c, each in a separate column and one-hot encoded. Did this show better results?
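
A sketch of that split, assuming Cabin values look like `deck/num/side`. The sample values and the new column names `Deck`, `Num` and `Side` are illustrative, and whether this improves the score is not something this snippet can show.

```python
import pandas as pd

df = pd.DataFrame({"Cabin": ["B/0/P", "F/1/S", None, "G/12/S"]})

# Split on "/" into three columns; a missing Cabin stays missing in all three.
parts = df["Cabin"].str.split("/", expand=True)
parts.columns = ["Deck", "Num", "Side"]
parts["Num"] = pd.to_numeric(parts["Num"])

# One-hot encode the two categorical parts; Num stays numeric.
encoded = pd.get_dummies(parts, columns=["Deck", "Side"], dtype=int)
print(encoded)
```

Rows with a missing Cabin end up with zeros in every indicator column, so they remain distinguishable from any real deck or side.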

hoary plover Aug 6, 2024, 11:50 PM

#

Hey everyone,
I hope you are all doing well. I recently completed the "Titanic Spaceship" model.
I have shared the code in the Kaggle Notebook along with a brief explanation.

If you have any questions regarding the code or any suggestions, feel free to message me.

Kaggle Notebook Link: https://www.kaggle.com/code/mushei/spaceship-titanic-model-code

Spaceship Titanic Model Code

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

raw egret Aug 13, 2024, 3:20 PM

#

Hey guys, I tried to follow Samuel Cortinhas's notebook to complete this project, but I got stuck when dealing with missing values of surname. When I follow the code as shown above, I got the following error, could anyone help me with part of the code?

#

Thank you!

hoary plover Aug 16, 2024, 4:55 AM

#

raw egret Hey guys, I tried to follow Samuel Cortinhas's notebook to complete this project...

The error message means that you're trying to tell the sns.countplot() function to look for data at a specific position (or index) in your dataset, but that position doesn't exist. It's like trying to find a page in a book that isn't there.

prisma ridge Sep 23, 2024, 2:18 AM

#

Hey guys, I just started on this competition. I was going to impute the data since a lot of it is missing, but do any of you have an idea how to impute categorical data like destination? Usually with numerical data I would just impute it with the mean values. Thanks in advance!

upper sail Sep 23, 2024, 5:58 PM

#

prisma ridge hey guys i just started on this competition I was going to imputate the data sin...

Personally, I use the OrdinalEncoder from sklearn.preprocessing, which first converts all the categorical data into numeric data,
then I fill in the missing values of the whole dataset, which is now fully numeric.

upper sail Oct 1, 2024, 6:00 AM

#

Hello, can anyone help me improve my score? I don't know how to do feature engineering properly, so I didn't use it, but I developed a few ways to get an accuracy of 78% on Spaceship Titanic with data casting or feature dropping based on correlation.
I get a lower accuracy if I drop the features with lower correlation, and I found data casting more useful, where I convert all data types into a single data type (float in this case).
I also get a lower score if I split my train dataset in any ratio other than 80-20.
I played with these parameters, even using Random Forest, but I get a different accuracy from the Random Forest Classifier each time I run it.
Is there a way to optimize further? Also, can anyone help me with how to vectorize my dataset for parallel computing (just asking)?
https://www.kaggle.com/code/rishita00/space-titanic-indepth-classification/notebook

Space Titanic InDepth Classification

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

austere bay Oct 2, 2024, 2:35 PM

#

upper sail hello , can anyone help me improve my score , I don't know to feature engineer ...

You should expect slightly different accuracies if you run without seeding the random state. How different are they?

upper sail Oct 2, 2024, 3:05 PM

#

austere bay You should expect slightly different accuracies if you run without seeding the r...

I see, so I need to set a seed , for consistent result ?
Around 2% without changing anything. It varies between 78% to 80% , without any change.
With feature selection I get an accuracy ranging between 75% and 78%
With change in size of train and validation sets
70:30 , 77% to 79%
60:40 , 74% to 77%
90:10 , 76% to 77%
80:20 , 78% to 80% with Random Forest Classifier.

austere bay Oct 2, 2024, 3:40 PM

#

Yeah, that means you can just seed it. You get a different initial state each time you run the code.
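
A minimal sketch of seeding with scikit-learn's `random_state` parameter. The data is synthetic and the seed value 42 is arbitrary; the same parameter exists on `train_test_split`, which would also need fixing for the validation split to stay the same between runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, generated from a fixed seed so the example is repeatable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.5 * rng.normal(size=200) > 0


def fit_predict(seed):
    """Train on the first 150 rows and score the last 50 with a given seed."""
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    return model.fit(X[:150], y[:150]).predict_proba(X[150:])


# Two runs with the same seed give identical probabilities.
run_a = fit_predict(42)
run_b = fit_predict(42)
print(np.array_equal(run_a, run_b))
```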

native zinc Oct 28, 2024, 4:28 AM

#

hi, new to kaggle and new to competition, anyone who joined recently ?

#

can anyone suggest me if Roomservice, foodCourt, shopping mall and Spa can be ignored, or still need to be data processed for null values ?

tawdry ridge Oct 30, 2024, 7:35 PM

#

native zinc can anyone suggest me if Roomservice, foodCourt, shopping mall and Spa can be ig...

null values should be imputed na ?

#

how can u skip data (pre) processing u shan't in any case

compact grove Nov 2, 2024, 4:09 PM

#

Hi everyone, most people seem to be using blending in competitions. Is it really valuable?

dusk belfry Nov 3, 2024, 10:14 AM

#

compact grove hi everyone, there most of people using blending in competition. Is it really va...

Sometimes. Blending is good for combining model strengths, e.g. random forest and linear regression. Blending (mostly) never leads to your model performing worse than before, so it's always worth giving it a go.
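
One simple form of blending is averaging the predicted probabilities of two models, sketched below on synthetic data. Logistic regression is swapped in for the linear model because this is a classification task, and the equal 0.5/0.5 weights are an arbitrary assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data with a linear and a non-linear component.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 + 0.3 * rng.normal(size=300)) > 1
X_train, X_val = X[:200], X[200:]
y_train = y[:200]

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
linear = LogisticRegression().fit(X_train, y_train)

# Probability of the positive class from each model.
p_forest = forest.predict_proba(X_val)[:, 1]
p_linear = linear.predict_proba(X_val)[:, 1]

# Blend by averaging, then threshold to get True/False predictions.
p_blend = 0.5 * p_forest + 0.5 * p_linear
blend_pred = p_blend >= 0.5
print(blend_pred[:10])
```

The weights can be tuned on a validation set, which is where a blend may end up better or worse than its best single member.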

compact grove Nov 3, 2024, 11:54 AM

#

dusk belfry Sometimes. Blending is good for combining model strengths, e.g. random forest an...

Understand, thank you so much.

native zinc Nov 6, 2024, 2:25 AM

#

tawdry ridge how can u skip data (pre) processing u shan't in any case

thank you

prisma ridge Nov 7, 2024, 3:57 AM

#

anybody here uses imputer model like IterativeImputer, how do you guys go about processing or encoding features before feeding the data to an imputer model?

fallen pebble Nov 14, 2024, 7:35 PM

#

guys, is it normal that we only get the false ones ?

#

can i predict the true ones if i only get false one as a train sample

#

or maybe i misread the data

lapis portal Nov 18, 2024, 6:55 PM

#

fallen pebble guys, is it normal that we only get the false ones ?

Hey did you figure this out

#

It’s just the sample submission with false ones

#

So they’re just placeholders to show you the format of submission

final creek Nov 20, 2024, 9:32 AM

#

prisma ridge anybody here uses imputer model like IterativeImputer, how do you guys go about ...

Hi! Have you figured this out? I have the same question

fallen pebble Nov 23, 2024, 12:21 AM

#

yea, i dont know why i didnt see the right doc

#

somehow i got 78% of acc which is meh

ocean arch Nov 25, 2024, 3:55 PM

#

Hello everyone, i am new to kaggle competition. I am getting started with spaceship competition. Can anyone help, how to handle "nan" values in different columns ? shall that be removed or that also can be trained?

prisma ridge Dec 2, 2024, 3:04 AM

#

final creek Hi! Have you figured this out? I have the same question

I asked around a bit since then. Basically, you don't want to encode too much, or else the features won't be as good in your ML models. You can use imputation techniques and models that require you to encode and change your features the least; imputers like KNN or a simple imputer need the least preprocessing. With an iterative imputer it's a bit trickier, since you want to maintain the features' integrity, so you can do things like normalizing or scaling (StandardScaler or MinMaxScaler) and choose the correct encoding for each feature.

TL;DR: Don't over-preprocess features for an imputation model; it can affect the integrity of the features when they are later used in an ML model
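
A minimal sketch of the scale-then-impute idea with scikit-learn's `IterativeImputer`, which is still experimental and needs the explicit enabling import. The two-column array is invented, and chaining `StandardScaler` in front is just one of the options mentioned above.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy numeric data with one missing value in each column.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 10.0],
])

# StandardScaler ignores NaN when fitting and keeps it when transforming,
# so the imputer receives scaled features with the gaps still in place.
pipeline = make_pipeline(StandardScaler(), IterativeImputer(random_state=0))
X_filled = pipeline.fit_transform(X)
print(X_filled)
```

The output is in scaled units; `pipeline.named_steps["standardscaler"].inverse_transform(X_filled)` would bring it back to the original scale if needed.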

neon crystal Dec 8, 2024, 12:49 PM

#

Hey all, this might have already been discussed, but I completed this competition a bit ago and I was wondering if it would be possible to take the model that I have trained in this competition and host it on a private hugging face repo for the experience of hosting the model.

I'm a bit new to Hugging Face and was hoping I could gain some experience in hosting models there, and figured a completely fictional dataset would be a good place to start. I'm sure that Hugging Face has their own introductory models to host, but I figure having an agent hosted from start to finish using independent sources may be very beneficial.

eager sand Dec 24, 2024, 2:52 PM

#

Hello all ,
I am new here in Kaggle, and this is my first competition (and I am totally lost!)
Please, could someone tell me how you deal with missing values?
These are the number, should we just drop them ?
HomePlanet 201
CryoSleep 217
Cabin 199
Destination 182
Age 179
VIP 203
RoomService 181
FoodCourt 183
ShoppingMall 208
Spa 183
VRDeck 188
Name 200
Thanks a lot!
PS: If someone has a place in a team! I will be greatful.

astral violet Dec 31, 2024, 4:16 AM

#

eager sand Hello all , I am new here in Kaggle, and this is my first competition (and I am...

Try to think about whether there are any relationships between the features. Here's an idea: maybe try to see how all the expenditure features behave when 'Cryo'==True, and vice versa.
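
A sketch of how such a relationship could be used for filling gaps, on invented rows. It assumes the column is named `CryoSleep` (as in the list of missing counts above) and that passengers in cryosleep spend nothing, which should be checked on the real data before relying on it.

```python
import numpy as np
import pandas as pd

# Two of the spending columns are enough to show the idea.
spend_cols = ["RoomService", "Spa"]
df = pd.DataFrame({
    "CryoSleep": [True, False, True, False],
    "RoomService": [0.0, 120.0, np.nan, np.nan],
    "Spa": [np.nan, 35.0, 0.0, 10.0],
})

# Inspect the relationship first: total known spend per CryoSleep group.
print(df.groupby("CryoSleep")[spend_cols].sum())

# Fill missing spend with 0 only for passengers known to be in cryosleep.
asleep = df["CryoSleep"].eq(True)
df.loc[asleep, spend_cols] = df.loc[asleep, spend_cols].fillna(0.0)
print(df)
```

Missing spend for passengers who were awake is left untouched, so a separate strategy is still needed for those cells.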

astral violet Dec 31, 2024, 4:19 AM

#

prisma ridge I ask around a bit since then, so basically you don't want to encode too much or...

agree, in fact, I passed the completely unprocessed data into a HistGBC, (no param tuning whatsoever), got 0.79833, so any score lower than this means the pre-processing are actually hurting the score

astral violet Dec 31, 2024, 4:20 AM

#

ocean arch Hello everyone, i am new to kaggle competition. I am getting started with spaces...

please search and understand terms like MCAR, NMAR, and MAR, NaNs could have meaning

astral violet Dec 31, 2024, 4:23 AM

#

winter lava I was able to achieve a 71% accuracy with minimal Feature engineering utilizing ...

I fed the completely unprocessed data into a HistGBC, (no param tuning whatsoever), got 0.79833, with like 8 lines of code...

astral violet Dec 31, 2024, 4:23 AM

#

fair linden is my model tripping ?

maybe this feature leaked?

eager sand Dec 31, 2024, 7:23 AM

#

astral violet try to think about if there are any relationships between the features, here's a...

thank you so much , i'll try it

astral violet Dec 31, 2024, 11:17 AM

#

eager sand thank you so much , i'll try it

let's study hard together!

sleek bone Feb 2, 2025, 10:54 PM

#

Hello everyone! I am glad that I could join you. I am doing an academic project about this competition. However, it was only after we had been working on our project for a whole semester that my teacher realised that I am working on a fictional dataset, which might not be of great value to the company because the linear or non-linear relationships between the variables didn't hold up, and there was no need to study it. (I am sorry, but I didn't know that I needed to choose a real dataset.) My teacher asked me to justify the choice of studying this dataset, but I can't think of a justification. 😦 Could you please tell me why you chose to take this challenge, and the advantages of working on it? Or if you know the origin of this dataset, I would be truly grateful if you could tell me! virtual_hug Thank you in advance!

inner socket Feb 3, 2025, 12:30 AM

#

sleek bone Hello everyone! I am glad that I could join you. I am doing a academic project a...

Hello @sleek bone, I participated time ago in this competition, I think is an excellent case for the application of imputations techniques.

sleek bone Feb 3, 2025, 10:41 PM

#

Thank you virtual_hug I think so too! I mainly explored the MICE package in the R language. However, I thought that some other methods use the relations between the variables, and no such relations exist in a fictional dataset... Could I ask which techniques you used for this work? Thank you in advance!

jovial herald Mar 11, 2025, 6:57 AM

#

Hello everyone. I used a neural network model to join the competition. However, it seems that I am getting slightly different results when I run the code multiple times. May I ask how can I make neural network using tensorflow reproducible. Thank you!

I also posted a discussion in Kaggle where you can find the notebook that I used. Here is the link: https://www.kaggle.com/competitions/spaceship-titanic/discussion/567619

Spaceship Titanic

Predict which passengers are transported to an alternate dimension

humble pulsar Mar 17, 2025, 5:58 PM

#

Hey everyone. I recently completed the Spaceship Titanic competition on Kaggle, and this is my first project where I worked solo. Since I’m still studying this area, I would really appreciate any feedback or suggestions to improve my model. If you see anything I might have done wrong or areas that could use some adjustments, I’d love to hear your thoughts! Looking forward to learning more from the community. Thanks in advance!
My notebook: https://www.kaggle.com/code/laurasaraiva/spaceship-titanic

Spaceship Titanic

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

twilit rapids Mar 30, 2025, 11:54 AM

#

sleek bone Hello everyone! I am glad that I could join you. I am doing a academic project a...

Your teacher is short sighted

sleek bone Mar 30, 2025, 1:51 PM

#

twilit rapids Your teacher is short sighted

Could you please share why you say that? (Fortunately I have finished this project with him now, but I only achieved 0.803 as the score.)

twilit rapids Mar 31, 2025, 10:42 AM

#

It might not be a real-world dataset, but there's so much value in it. As a student, playing around with a dataset as a newbie will help you build skills. There is no point in doing a real-world project if the student's skills are still, let's just say, beginner level. Only if this academic project of yours is at master's degree level would it make sense.

sleek bone Apr 1, 2025, 8:37 AM

#

Thanks! Actually, my project is done on master's degree level...

inner tulip Apr 1, 2025, 7:07 PM

#

🔥

twilit rapids Apr 3, 2025, 2:58 AM

#

sleek bone Thanks! Actually, my project is done on master's degree level...

Then he is right after all LMFAO

delicate robin Apr 5, 2025, 5:23 PM

#

Started this challenge today, I have only worked on famous datasets like MNIST to learn (without any real practise) and jumping into this was disaster lol
Though I have achieved 0.799 score

how do I improve it ? Tried ensemble learning, Included names (thinking there just might be some relation setup), tried GridSearch etc

#

what else can one explore here to improve the model

dire merlin Apr 6, 2025, 4:24 PM

#

Hello, I was wondering if anyone could tell me why everyone uses the random forest approach to resolving this competition?

carmine wedge Apr 19, 2025, 4:46 PM

#

dire merlin Hello, I was wondering if anyone could tell me why everyone uses the random fore...

Isn't XGBoost better than Random Forest?

carmine wedge Apr 20, 2025, 8:54 AM

#

delicate robin Started this challenge today, I have only worked on famous datasets like MNIST t...

You should try feature engineering; I didn't use ensemble learning or grid search and still scored almost the same.

coarse wave Apr 23, 2025, 10:18 PM

#

Hey everyone! New year. This is my first project on kaggle. Any tips?

glacial fiber May 5, 2025, 4:44 PM

#

Hi everyone. My team and I are trying this competition and we used CatBoost along with basic feature engineering to get a pretty good score, but we noticed that our score gets worse when we do data imputation instead of leaving NaN's. Initially we filled NaN's with the mode of the column, then we added more sophisticated imputation by using patterns in the data (||Everyone from deck A/B/C is from Europa, everyone from cabin G is from Earth, everyone under 13 or in cryosleep spends 0 dollars||) but somehow this imputation makes our results worse consistently, and even the imputation by modes makes our results worse. This is very hard for me to understand - since the NaN's in this competition really look like the creators just blanked out random cells, it's hard to imagine that any pattern of where the NaN's are has any bearing on anything, and when reading others discussing this competition they say their results improve with imputation. Does anyone have any idea why this would happen? This is our notebook:

https://www.kaggle.com/code/samrohrer/spaceship-titanic-bad-imputation

Spaceship Titanic Bad Imputation

Explore and run machine learning code with Kaggle Notebooks | Using data from Spaceship Titanic

grave helm Jun 26, 2025, 4:52 PM

#

glacial fiber Hi everyone. My team and I are trying this competition and we used CatBoost alo...

Sorry I'm new, so grain of salt.

But the NaN values are likely being handled in SOME way, even if not explicitly from your direct instructions. Either rows are being dropped, using mean, or 0.

Whatever is happening is doing better than the mode calculation it seems.

glacial fiber Jun 26, 2025, 4:55 PM

#

grave helm Sorry I'm new, so grain of salt. But the NaN values are likely being handled in...

it's been a while since we were having this issue, but we didn't ever figure it out. Catboost uses the NaN's to try to derive information, but in this particular case since it's synthetic data and I'm pretty sure they blindly blanked out the same random percentage of each feature, it seems weird for the existence of the NaN's to contain any information

#

but maybe they just leaned towards NaN'ing out people who vanished or something

pearl basin Jul 17, 2025, 3:00 PM

#

Hey Kagglers and ML Enthusiasts! 👋
I’ve just published a new notebook where I built a model to predict Titanic survival using machine learning techniques like Logistic Regression, Random Forest.
It includes data cleaning, EDA, model comparison, and feature importance — beginner-friendly and easy to follow!

📘 Check it out here:
👉 Titanic Survival Prediction using Machine Learning

If you find it helpful, learned something new, or just want to support the work — a quick upvote would mean a lot! 💙

Let’s grow and learn together

timber harbor Sep 13, 2025, 11:11 AM

#

Hi, created a notebook on space titanic : Basic DS Framework 80% accuracy using Random Forest and Boosting algorithms, check it out:
https://www.kaggle.com/code/salahuddinbayassi/beginner-ds-framework-for-80-accuracy

stone forge Oct 28, 2025, 4:10 PM

#

Hi @everyone
I created a wonderful Streamlit app about the prediction of the Titanic survivors 🚢

You can check it out at: https://app-app-titanic-data-bdwwycbgdejsmtuv4ntkss.streamlit.app/

I'm really interested in your opinions.
Thank you.

tawny roost Nov 9, 2025, 7:25 AM

#

Hii

weak cave Nov 10, 2025, 7:27 AM

#

Hi

supple surge Nov 10, 2025, 3:04 PM

#

hi

tranquil sluice Nov 12, 2025, 8:29 AM

#

Hi

pastel depot Nov 15, 2025, 12:36 PM

#

Hi

rocky hazel Jan 5, 2026, 12:14 PM

#

Hi, I have a question about spaceship titanic. I first submitted a logistic model with simply filling NAs by zeros and no new features, resulting in around 0.79. However, as I used Gradient Boosting and added some observations like total spent > 0 and Cabin -> Deck, and filling NAs with some ideas, it dropped to about 0.74. I thought my ideas are valuable and will increase accuracy. Anyone with this experience?

rocky hazel Jan 5, 2026, 1:26 PM

#

Here is the code of 0.74
https://www.kaggle.com/code/sunghakheo/notebookce8bdbe95e

pine mauve Jan 15, 2026, 2:08 PM

#

rocky hazel Hi, I have a question about spaceship titanic. I first submitted a logistic mode...

You may have been lucky the first time. I obtained my best score last year while doing only some basic feature engineering. Now I've implemented many more but I cannot get near that score 🙂

keen tree Jan 23, 2026, 4:00 PM

#

hello all kagglers hope you all are great

untold yarrow Feb 14, 2026, 4:43 PM

#

would you guys appreciate a notebook baseline template which you can easily iterate on?

foggy tide Feb 15, 2026, 5:08 PM

#

untold yarrow would you guys appreciate a notebook baseline template which you can easily iter...

Go ahead mate

floral kite Feb 26, 2026, 12:30 PM

#

pearl basin Hey Kagglers and ML Enthusiasts! 👋 I’ve just published a new notebook where I b...

I am new to Kaggle, how can I access your notebook?

frozen cobalt May 11, 2026, 11:11 AM

#

Hello, new here