#child-mind-institute-detect-sleep-states

jade juniper
#

Hello everyone,
I'm finding it hard to understand the dataset. The training dataset doesn't contain a score, and the train series doesn't contain events. How are we supposed to build a model?

I'm kind of confused here. Can someone please explain in steps?

dry pecan
jade juniper
#

Thanks. What is the final/y variable we'll be predicting?

glass rune
#

Hey, can anyone give an idea of what is the feature 'steps' in the dataset?

broken mural
broken mural
# glass rune Hey, can anyone give an idea of what is the feature 'steps' in the dataset?

From what I'm seeing, it's a sequence number from beginning to end for a specific series_id.

For example, in the case of series_id 038441c925bb, it looks like each step equates to roughly a 5 second poll starting from 0.

It's a bit more obvious when you look at the submission format: they only ask for the step value for a specific series, so it's used as a kind of surrogate key to link across the two files.
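
A minimal sketch of that keying, assuming one reading roughly every 5 seconds as observed above (that interval is an observation, not a documented constant). The `anglez` values and the helper name are made up for illustration.

```python
from datetime import timedelta

# Assumption: roughly one reading every 5 seconds, as observed above.
SECONDS_PER_STEP = 5

def step_to_elapsed(step: int) -> timedelta:
    """Approximate time elapsed since step 0 of a series."""
    return timedelta(seconds=step * SECONDS_PER_STEP)

# Hypothetical sensor rows, keyed by (series_id, step).
readings = {
    ("038441c925bb", 0): {"anglez": 2.6},
    ("038441c925bb", 1): {"anglez": 2.7},
    ("038441c925bb", 720): {"anglez": -40.1},
}

# Hypothetical event row that carries only the key, not the sensor values.
event = {"series_id": "038441c925bb", "step": 720, "event": "onset"}

key = (event["series_id"], event["step"])
matched = readings[key]
elapsed = step_to_elapsed(event["step"])
print(matched, elapsed)  # step 720 is about one hour into the series
```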

glass rune
#

Thanks Rob, I just need to figure out how to reduce the data size before applying any model.

ripe needle
#

Hi! Can you guys help me get started with this competition?

quasi briar
#

For ppl who’ve had successful submissions, how long did it typically take for your submissions to be scored? Mine is taking awfully long and eventually runs out of memory

dusk ridge
#

I have prepared the basic model but I am confused about the score metrics and evaluation (Event Detection AP).
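
For anyone stuck on the same point, here is a simplified sketch of how an average precision for event detection can be computed. It handles one series, one event type and one tolerance only; the competition's Evaluation page is the authority on the real metric, which may combine several tolerances and event types. The function name and the sample numbers are made up.

```python
def event_ap(truth_steps, predictions, tolerance):
    """Average precision for one series, one event type, one tolerance.

    truth_steps: list of ground-truth steps.
    predictions: list of (step, score) pairs.
    A prediction counts as a hit if it lies within `tolerance` steps of a
    ground-truth event that has not been matched already.
    """
    unmatched = set(truth_steps)
    hits = 0
    precision_sum = 0.0
    # Visit predictions from most to least confident.
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    for rank, (step, _score) in enumerate(ranked, start=1):
        candidates = [t for t in unmatched if abs(t - step) <= tolerance]
        if candidates:
            # Consume the closest still-unmatched ground-truth event.
            unmatched.remove(min(candidates, key=lambda t: abs(t - step)))
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / len(truth_steps) if truth_steps else 0.0

# Made-up example: two true events, three scored predictions.
truth = [100, 500]
preds = [(105, 0.9), (300, 0.8), (490, 0.7)]
print(event_ap(truth, preds, tolerance=12))
print(event_ap(truth, preds, tolerance=5))
```

A looser tolerance lets the prediction at step 490 count as a hit for the event at step 500; a tighter one turns it into a miss, which is why the score drops.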

lavish anchor
#

Hi, I just started working on this competition. I have loaded the data, checked whether it needs cleaning, and am now about to do feature extraction once I get my GPU set up with Jupyter Notebook. Anyone want to work together for fun?

fathom zodiac
glass rune
#

Hello to the Kaggle community,

I'm currently participating in the competition and have a question regarding the accelerometer data collection frequency. The competition overview and documentation do not provide specific details on the frequency of Z-axis data logging.

Do you think the Z-axis accelerometer data is recorded at regular intervals, every 5 seconds, or does it represent the total acceleration over a 5 second time window?

Thanks in advance for your help!

bronze oriole
#

how does the scoring work? I'm quite confused

bronze oriole
glass rune
#

What?

bronze oriole
#

the score metrics and evaluation

hollow karma
#

Help us please !!!

#

@everyone

#

@glass rune have you figured it out?????

glass rune
#

I haven't looked at it yet

fathom zodiac
glass rune
#

@fathom zodiac Thank you for your answer, I think more and more that this is the case.

sour vale
#

Hi everyone, I am trying to understand the data and what we are dealing with. I do not have any technical expertise in the health-related field, so I do not understand the description of the data provided and how events are defined, with all the talk about inactivity for more than 30 mins and one event per night... I got a bit confused.
Plus, I read that this can be considered a time series problem, which triggered a "what?!" in my head, as I was seeing this as a classification problem. Any help with this, please?

glass rune
#

@sour vale No, I see this sequence as a signal, not a classification.

jade juniper
#

Hey everyone, I ran an RF classifier notebook. The notebook got submitted, but I see that submission.csv is empty. What am I missing here?

oblique fulcrum
#

Are you able to submit?
If you get a valid score then you are good to go.

jade juniper
#

No, I'm getting this error.

#

@oblique fulcrum So we're supposed to get something in submission, right?

oblique fulcrum
#

No, this is because your notebook is consuming too much RAM.

#

You can look at my baseline work

#

I have taken care of this

jade juniper
#

Please share the link here

#

But why is the sample submission empty? It doesn't make any sense to me.

halcyon sparrow
proven sinew
#

There are tips on discussions for ways to reduce memory usage...this is a very common problem in this comp

strange cove
#

Just wondering, is there an efficient way of loading and dealing with Parquet files?

cerulean galleon
#

I used pandas to process it as a dataframe

#

the files are pretty big though, I don't know if anyone has a way to deal with massive files?

jagged meteor
#

Polars is a pretty good alternative

proven sinew
#

I think polars (lazy evaluation) may be necessary for this comp especially if you're not using Carl McBride's condensed dataset. Pandas is too slow to do all the data manipulation

Also you can cast the values to the smallest datatype (e.g. int16 instead of int64) possible to save memory

#

basically - lazy evaluation doesn't load the dataframe into memory. The dataframe is a huge memory bottleneck

olive wyvern
#

Anyone willing to form a team for the CMI competition, please DM

humble sluice
remote pewter
viral timber
#

can anyone explain how to get model weights..

fleet reef
olive wyvern
#

Would it work with 8 GB of RAM?

oblique fulcrum
#

Kaggle kernels offer more than 8gb RAM

#

try them

#

You can also use Colab

edgy ferry
#

hey guys i’m using dask .compute for loading the files but I’m still getting out of memory errors when i submit

#

should I just ditch dask and use polars?

oblique fulcrum
#

Polars lazy execution may work but you will need to be very careful with your code

#

I would recommend training elsewhere and running inference in the Kaggle submission kernel for best results

edgy ferry
#

also how does it know which file is my submission file? Do I need to name it something specific?

oblique fulcrum
#

Please read the competition code instructions and it will be clear

#

You can download the training data and train your model on your local PC.
Then save your model, run inference in your Kaggle kernel, and submit.
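
A minimal sketch of that split, using a toy threshold "model" and a JSON file as stand-ins. With a real library you would use its own save and load mechanism, and upload the saved file so the submission notebook can read it. All names and numbers below are hypothetical.

```python
import json
import os
import tempfile

def fit_threshold(values, labels):
    """Toy 'training': pick the midpoint between the two class means."""
    asleep = [v for v, y in zip(values, labels) if y == 1]
    awake = [v for v, y in zip(values, labels) if y == 0]
    midpoint = (sum(asleep) / len(asleep) + sum(awake) / len(awake)) / 2
    return {"threshold": midpoint}

def predict(params, values):
    """Label a reading as asleep (1) when movement is below the threshold."""
    return [1 if v < params["threshold"] else 0 for v in values]

# --- Training side (local machine) ---
train_values = [0.01, 0.02, 0.30, 0.40]  # made-up movement readings
train_labels = [1, 1, 0, 0]              # 1 = asleep, 0 = awake
params = fit_threshold(train_values, train_labels)

model_path = os.path.join(tempfile.mkdtemp(), "model_params.json")
with open(model_path, "w") as f:
    json.dump(params, f)

# --- Inference side (submission notebook) ---
# Only the saved parameters are loaded; no training data is needed here.
with open(model_path) as f:
    loaded = json.load(f)

print(predict(loaded, [0.05, 0.35]))
```

The point of the split is that the memory-hungry training step never runs inside the submission kernel.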

edgy ferry
#

I didn’t even think of that

signal saddle
lilac brook
#

Polars definitely helped in processing the training set. Thanks to all the folks who put their starter notebooks on that

edgy ferry
#

Hey how on earth do I predict on the test set if I don’t have the labels? Do I use the series id to predict?

plain sorrel
#

Hey all, this will be my first real attempt at a kaggle competition!
Although I'm already 1 month late, I wish you all good luck ❤️

paper ivy
#

Hello everyone! This is my first code competition. How much time do your notebooks usually take to submit? My notebook submits within an hour, although I just used a simple RandomForest model without any data optimization, and it took 10 minutes to save a version. Should I optimize the data more deeply?

shrewd ruin
# paper ivy Hello everyone! This is my first code competition, and how much time do your not...

My notebook is taking longer than an hour to submit. I also used a Random Forest classifier, with some optimization using the Optuna framework. I have a write-up on this: https://bayoadejare.medium.com/91ad8af99c24?sk=cfe58e4cdfd5cbb59401bd3ebcf05500 I'm getting some submission errors, and an out-of-memory error when I try to optimize further, so I'm not sure how useful it is for this competition.

paper ivy
paper ivy
olive wyvern
mild plume
#

ah im back on kaggle : ) just started working on the sleep states problem... I turned the problem into a state prediction problem instead, because if you just keep the labels as events it's ugly, as the events are really sparse. idk what other approaches there are, heck, I haven't even submitted a solution. anyone who's in the same boat or better and wants to learn... im training an LSTM rn. my pipeline for this competition is not complete (evaluation and inference [submission] are left), nonetheless im looking to learn... especially, i want to get better at recurrent nets and maybe time series transformers too, but we saw way too much of that in the llm science exam lol
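
A minimal sketch of that relabeling and its inverse for inference time. The event names "onset" and "wakeup", the rule that a series starts awake, and the sample steps are assumptions of this sketch.

```python
def events_to_states(n_steps, events):
    """Expand sparse (step, event) pairs into one 0/1 label per step.

    1 means asleep. Steps before the first event are treated as awake,
    which is an assumption of this sketch.
    """
    states = [0] * n_steps
    current = 0
    event_at = {step: name for step, name in events}
    for step in range(n_steps):
        if step in event_at:
            current = 1 if event_at[step] == "onset" else 0
        states[step] = current
    return states

def states_to_events(states):
    """Recover (step, event) pairs from the points where the label flips."""
    events = []
    previous = 0
    for step, state in enumerate(states):
        if state != previous:
            events.append((step, "onset" if state == 1 else "wakeup"))
        previous = state
    return events

# Made-up example: asleep from step 3 up to, but not including, step 7.
sparse = [(3, "onset"), (7, "wakeup")]
dense = events_to_states(10, sparse)
print(dense)                    # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(states_to_events(dense))  # [(3, 'onset'), (7, 'wakeup')]
```

The dense labels give a model something to learn at every step; the second function turns predicted states back into the sparse events a submission needs.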

glass rune
jade juniper
#

Hello everyone, after switching to the reduced-memory dataset I'm trying to merge the parquet and csv files but getting memory issues. Also, the merge gives me 9500 rows, which I believe is incorrect. Can someone help me out: what is the final dataset used for ML?
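
One possible explanation for a small row count (a guess, since the merge code isn't shown) is an inner join, which keeps only the steps that carry an event. A minimal sketch with toy tables; the column names and values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-ins for the series (parquet) and events (csv) tables.
series = pd.DataFrame(
    {
        "series_id": ["a"] * 6,
        "step": range(6),
        "anglez": [1.0, 0.5, -20.0, -21.0, -19.5, 2.0],
    }
)
events = pd.DataFrame(
    {"series_id": ["a", "a"], "step": [2, 5], "event": ["onset", "wakeup"]}
)

# Inner join: only the steps that carry an event survive.
inner = series.merge(events, on=["series_id", "step"], how="inner")

# Left join: every sensor row is kept; steps without an event get NaN.
left = series.merge(events, on=["series_id", "step"], how="left")

print(len(inner), len(left))  # 2 6
```

A left join keeps the full series, so it costs more memory; doing it one series at a time is one way to keep that manageable.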

Also, how can I install CuML successfully?

mild plume
#

wanted to share some results i guess.. really hectic month but now i have some time, with 8 days left i made my first submission, really really bad results... just gotta make the validation functions and make the models better now, most of the software infrastructure is done.

wheat python
#

how big is the test dataset for calculating leader board score? and also how big is the final real test dataset?

crisp cloud
#

Hello, I'm currently finishing my first version and I'm trying to make a valid submission but the submission crashed and I'm trying to debug it.
The provided test_series.parquet is bothering me here: there are only 3 series and they only have a few data points each (and their very small size is very likely messing with my algo).

Should I consider that the real test events will look like the training ones, or should I expect them to look like the provided ones?

My crashing submission is giving me very little information on the cause of the crash, so I'm a bit desperate...

crisp cloud
# wheat python how big is the test dataset for calculating leader board score? and also how big...
  • Found in the "Data" tab: Note that this is a Code Competition, in which the actual test set is hidden. In this public version, we give some sample data in the correct format to help you author your solutions. The full test set contains about 200 series.
  • Found in the "Leaderboard" tab: This leaderboard is calculated with approximately 25% of the test data. The final results will be based on the other 75%, so the final standings may be different.
misty thicket
jade juniper
#

Is there any NB (code) where validation accuracy is calculated?