#child-mind-institute-detect-sleep-states | Kaggle | Page 1

jade juniper Sep 7, 2023, 11:39 AM

#

Hello everyone,
I'm finding it hard to understand the dataset. The training dataset doesn't contain a score, and the train series doesn't contain events. How are we supposed to build a model?

I'm kind of confused here. Can someone please explain in steps?

dry pecan Sep 8, 2023, 4:50 AM

#

jade juniper Hello everyone, I'm finding it hard to understand the dataset. As the training ...

From the dataset description:

"The dataset comprises about 500 multi-day recordings of wrist-worn accelerometer data annotated with two event types: onset, the beginning of sleep, and wakeup, the end of sleep."

It looks like you'll need to do some of your own data processing to get labels in there. Everything else can be a feature.

jade juniper Sep 9, 2023, 9:27 PM

#

Thanks. What is the final/y variable we'll be predicting?

glass rune Sep 9, 2023, 11:26 PM

#

Hey, can anyone give an idea of what is the feature 'steps' in the dataset?

broken mural Sep 10, 2023, 2:44 AM

#

jade juniper Thanks. What is the final/y variable we'll be predicting?

The "event" column of the train_events.csv as I understand it. Some stitchwork is needed to link everything together between the parquet data and the csv files.

broken mural Sep 10, 2023, 2:47 AM

#

glass rune Hey, can anyone give an idea of what is the feature 'steps' in the dataset?

From what I'm seeing, it's a sequence number from beginning to end for a specific series_id.

For example, in the case of series_id 038441c925bb, it looks like each step equates to roughly a 5 second poll starting from 0.

It's a bit more obvious when you look at the submission response -- they only ask for the step value for a specific series; so it's used as a type of surrogate key to link across the two files.
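A minimal sketch of that linking, assuming pandas and tiny made-up frames standing in for the parquet series data and train_events.csv. The column names `series_id`, `step`, and `event` come from this thread; the `signal` column, all values, and the "none" fill label are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the per-step series data (values are made up).
series = pd.DataFrame({
    "series_id": ["a"] * 6,
    "step": [0, 1, 2, 3, 4, 5],
    "signal": [0.9, 0.1, 0.0, 0.1, 0.8, 0.9],  # hypothetical sensor column
})

# Toy stand-in for the events file: one onset and one wakeup.
events = pd.DataFrame({
    "series_id": ["a", "a"],
    "step": [1, 4],
    "event": ["onset", "wakeup"],
})

# Left join on the (series_id, step) key keeps every step;
# steps without an event come back as NaN and are filled with "none".
labelled = series.merge(events, on=["series_id", "step"], how="left")
labelled["event"] = labelled["event"].fillna("none")

print(labelled)
```

A left join is used so the result has one row per step rather than one row per event, which is usually what a per-step model needs.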

glass rune Sep 10, 2023, 5:57 PM

#

Thanks Rob, I just need to figure out how to reduce the data size before applying any model

ripe needle Sep 11, 2023, 5:55 AM

#

Hii !! Can you guys help me get started with this competition !?

quasi briar Sep 11, 2023, 10:16 PM

#

For ppl who’ve had successful submissions, how long did it typically take for your submissions to be scored? Mine is taking awfully long and eventually runs out of memory

dusk ridge Sep 13, 2023, 6:25 PM

#

I have prepared the basic model but I am confused about the score metrics and evaluation (Event Detection AP).

lavish anchor Sep 13, 2023, 6:30 PM

#

Hi, I just started working on this competition. I have loaded the data and checked it for cleaning, and I'm about to do feature extraction once I get my GPU set up with Jupyter Notebook. Anyone want to work together for fun?

fathom zodiac Sep 15, 2023, 1:01 PM

#

quasi briar For ppl who’ve had successful submissions, how long did it typically take for yo...

During submission, mine takes ~1 hour with LightGBM (100 iterations) and 18 features from a memory-optimized, lightweight training set. When I'm only experimenting in a Kaggle notebook without submitting, the same setup runs in ~10 minutes. I guess the hidden test set is much larger than the training set.

glass rune Sep 16, 2023, 5:02 PM

#

Hello to the Kaggle community,

I'm currently participating in the competition and have a question regarding the accelerometer data collection frequency. The competition overview and documentation do not provide specific details on the frequency of Z-axis data logging.

Do you think the Z-axis accelerometer data is recorded at regular intervals, every 5 seconds, or does it represent the total acceleration over a 5 second time window?

Thanks in advance for your help!

bronze oriole Sep 16, 2023, 5:59 PM

#

how does the scoring work? I'm quite confused

bronze oriole Sep 16, 2023, 6:00 PM

#

dusk ridge I have prepared the basic model but I am confused about the score metrics and ev...

have you gotten this working?

glass rune Sep 16, 2023, 7:42 PM

#

What?

bronze oriole Sep 17, 2023, 1:27 AM

#

the score metrics and evaluation

hollow karma Sep 17, 2023, 6:13 AM

#

Help us please !!!

#

@everyone

#

@glass rune have you figured it out?????

glass rune Sep 17, 2023, 12:53 PM

#

I haven't looked at it yet

fathom zodiac Sep 18, 2023, 12:03 AM

#

glass rune Hello to the Kaggle community, I'm currently participating in the competition a...

Hi Olivier, I think it recorded the data at 23:30:05 rather than summing or aggregating the data from 23:30:00 to 23:30:05

glass rune Sep 18, 2023, 12:49 PM

#

@fathom zodiac Thank you for your answer, I think more and more that this is the case.

sour vale Sep 18, 2023, 7:23 PM

#

Hi everyone, I am trying to understand the data and what we are dealing with. I don't have any technical expertise in the health field, so I don't understand the description of the data or how events are defined, with all the talk about inactivity for more than 30 minutes and one event per night. I got a bit confused.
Plus, I read that this can be considered a time series problem, which triggered a "what?!" in my head, as I was seeing this as a classification problem. Any help with this, please?

glass rune Sep 20, 2023, 5:07 PM

#

@sour vale No, I see this series as a signal, not a classification.

jade juniper Sep 22, 2023, 10:12 PM

#

Hey everyone, I ran an RF classifier NB. The NB got submitted, but I see that submission.csv is empty. What am I missing here?

oblique fulcrum Sep 23, 2023, 6:26 AM

#

Are you able to submit
If you get a valid score then you are good to go

jade juniper Sep 23, 2023, 8:08 PM

#

No, I'm getting this error.

#

@oblique fulcrum So we're supposed to get something in submission, right?

oblique fulcrum Sep 23, 2023, 8:25 PM

#

No this is because your notebook is consuming too much ram

#

You can look at my baseline work

#

I have taken care of this

jade juniper Sep 23, 2023, 8:32 PM

#

Please share the link here

#

But why is the sample submission empty? It doesn't make any sense to me.

halcyon sparrow Sep 28, 2023, 6:09 AM

#

sour vale Hi everyone, I am trying to get to understand the data and what we are dealing w...

This is a time series problem. Not a time series forecasting problem but a time series classification problem. We are basically supposed to detect when a person goes to sleep and when the person wakes up.

proven sinew Oct 1, 2023, 11:20 AM

#

jade juniper But, why sample submission is empty? It doesn't make any sense to me.

The test file that is given to us is empty...that's correct

But after that, your code will be run on a hidden test set, which is not empty, and predictions will be made on that hidden test set. Your code used too much memory when run on their test set.

#

There are tips on discussions for ways to reduce memory usage...this is a very common problem in this comp

strange cove Oct 1, 2023, 7:23 PM

#

Just wondering, is there an efficient way of loading and dealing with Parquet files?

cerulean galleon Oct 2, 2023, 8:16 AM

#

I used pandas to process it as a dataframe

#

the files are pretty big though, I don't know if anyone has a way to deal with massive files?

jagged meteor Oct 2, 2023, 9:19 PM

#

Polars is a pretty good alternative

proven sinew Oct 3, 2023, 8:04 AM

#

I think polars (lazy evaluation) may be necessary for this comp especially if you're not using Carl McBride's condensed dataset. Pandas is too slow to do all the data manipulation

Also you can cast the values to the smallest datatype (e.g. int16 instead of int64) possible to save memory

#

basically - lazy evaluation doesn't load the dataframe into memory. The dataframe is a huge memory bottleneck
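For the datatype-casting tip above, here is a rough sketch of the memory effect. It uses NumPy only, not Polars, and the helper name `downcast_int` and the made-up step counter are assumptions for illustration.

```python
import numpy as np

def downcast_int(values):
    """Return values in the smallest signed integer dtype that holds them all."""
    arr = np.asarray(values)
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        if arr.min() >= info.min and arr.max() <= info.max:
            return arr.astype(dtype)
    return arr

steps = np.arange(20_000, dtype=np.int64)  # made-up step counter
small = downcast_int(steps)

print(steps.dtype, steps.nbytes)  # 8 bytes per value
print(small.dtype, small.nbytes)  # 2 bytes per value for this range
```

The range check matters: a column whose values exceed the int16 range would land on int32 here instead of silently overflowing.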

olive wyvern Oct 4, 2023, 12:42 PM

#

Anyone willing to make team for cmi competition please dm

humble sluice Oct 5, 2023, 11:50 AM

#

olive wyvern Anyone willing to make team for cmi competition please dm

I can be your teammate

remote pewter Oct 6, 2023, 9:51 PM

#

olive wyvern Anyone willing to make team for cmi competition please dm

I am in too. I am new to Kaggle and trying to understand how it all works. I would love to team up and start off.

viral timber Oct 9, 2023, 2:15 PM

#

can anyone explain how to get model weights..

fleet reef Oct 12, 2023, 4:16 PM

#

Webinars now:

https://us02web.zoom.us/j/85016786716?pwd=Q3hTYiswaWRlYmU4eCtBOFhlcHZTdz09

oblique fulcrum Oct 13, 2023, 10:53 AM

#

strange cove Just wondering, is there an efficient way of loading and dealing with Parquet fi...

Use Polars

olive wyvern Oct 15, 2023, 10:04 AM

#

Would it work in 8 GB ram

oblique fulcrum Oct 15, 2023, 11:10 AM

#

Kaggle kernels offer more than 8gb RAM

#

try them

#

You can also use Colab

edgy ferry Oct 15, 2023, 7:43 PM

#

hey guys i’m using dask .compute for loading the files but I’m still getting out of memory errors when i submit

#

should I just ditch dask and use polars?

oblique fulcrum Oct 15, 2023, 7:56 PM

#

Polars lazy execution may work but you will need to be very careful with your code

#

I would recommend that you train elsewhere and infer on the Kaggle submission kernel for best results

edgy ferry Oct 15, 2023, 7:58 PM

#

oblique fulcrum I will recommend you to train elsewhere and infer on the kaggle submission kerne...

what do you mean by this

#

also how does it know which file is my submission file? Do I need to name it something specific?

oblique fulcrum Oct 15, 2023, 8:11 PM

#

Please read the competition code instructions and you will be clear

#

You can download the training data and train your model on your local PC.
Save your model, then run inference in your Kaggle kernel and submit.
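A minimal sketch of that train-elsewhere, infer-on-Kaggle split, using only the standard library. The "model" is a trivial threshold stored in a dict, a stand-in for whatever real model you train; the file name `model.pkl` and all data values are hypothetical. How the saved file reaches the Kaggle kernel is not shown.

```python
import os
import pickle
import tempfile

# "Local" side: fit something and save it.
# A single threshold stands in for a real trained model.
train_signal = [0.9, 0.8, 0.1, 0.0, 0.1, 0.7]
train_asleep = [0, 0, 1, 1, 1, 0]

asleep_vals = [x for x, y in zip(train_signal, train_asleep) if y == 1]
awake_vals = [x for x, y in zip(train_signal, train_asleep) if y == 0]
model = {"threshold": (max(asleep_vals) + min(awake_vals)) / 2}

model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical name
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# "Kaggle" side: load the saved model and run inference only.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

test_signal = [0.05, 0.95]
predictions = [1 if x < loaded["threshold"] else 0 for x in test_signal]
print(predictions)
```

The point of the split is that the submission kernel never pays the time or memory cost of training, only of loading and predicting.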

edgy ferry Oct 15, 2023, 8:12 PM

#

oblique fulcrum You may download the training data and train your model on local pc Save your m...

hey that’s really smart

#

I didn’t even think of that

signal saddle Oct 15, 2023, 10:35 PM

#

edgy ferry also how does it know which file is my submission file? Do I need to name it som...

Yes, name your submission "submission.csv"
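A small sketch of writing that file with the standard library. The file name comes from this thread; the column layout (`row_id`, `series_id`, `step`, `event`, `score`) is an assumption here and should be checked against the competition's sample submission, and the prediction values are made up.

```python
import csv
import os
import tempfile

# Hypothetical predictions: (series_id, step, event, score).
predictions = [
    ("038441c925bb", 100, "onset", 0.8),
    ("038441c925bb", 5000, "wakeup", 0.6),
]

# A temp directory is used here; in a notebook this would be the working directory.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "submission.csv")

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["row_id", "series_id", "step", "event", "score"])  # assumed header
    for row_id, (series_id, step, event, score) in enumerate(predictions):
        writer.writerow([row_id, series_id, step, event, score])

# Read the file back to confirm what was written.
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```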

lilac brook Oct 16, 2023, 3:34 PM

#

Polars definitely helped in processing the training set. Thanks to all the folks who put their starter notebooks on that

edgy ferry Oct 16, 2023, 5:04 PM

#

Hey how on earth do I predict on the test set if I don’t have the labels? Do I use the series id to predict?

edgy ferry Oct 16, 2023, 10:55 PM

#

proven sinew The test file that is given to us is empty...that's correct But after that you...

thank you so much

plain sorrel Oct 17, 2023, 10:26 AM

#

Hey all, this will be my first real attempt at a kaggle competition!
Although I'm already 1 month late, I wish you all good luck ❤️

paper ivy Oct 17, 2023, 6:34 PM

#

Hello everyone! This is my first code competition, and how much time do your notebooks usually take to submit? My notebook is submitting within an hour, although I just used a simple RandomForest model without any data optimization and it took 10 minutes to save version. Should I optimize the data more deeply?

shrewd ruin Oct 18, 2023, 10:25 AM

#

paper ivy Hello everyone! This is my first code competition, and how much time do your not...

My notebook is taking longer than an hour for submission. I also used a Random Forest classifier, with some optimization using the Optuna framework. I have a write-up on this: https://bayoadejare.medium.com/91ad8af99c24?sk=cfe58e4cdfd5cbb59401bd3ebcf05500 I'm getting some submission errors, and out-of-memory errors when I try further optimization, so I'm not sure it is that useful for this competition.

Medium

Hyper-Parameter Tuning Workflow with Sci-kit Learn and Optuna

Kaggle — Child Mind Institute Detect Sleep States Dataset.

paper ivy Oct 18, 2023, 3:56 PM

#

shrewd ruin My notebook is taking longer than an hour for submission, I also used a Random F...

Thank you for your answer! It's really helpful!

paper ivy Oct 19, 2023, 4:59 AM

#

paper ivy Hello everyone! This is my first code competition, and how much time do your not...

Ok, I solved it. The problem was that I was downloading the entire test set at once, and it seems to be too large for that: it takes about 9 hrs and then the notebook is interrupted. I solved it by downloading the test set in parts and then concatenating them.
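The post doesn't show how the data was split, so this is only a sketch of the general pattern: process a few series at a time and concatenate the parts. `load_series` is a hypothetical stand-in for whatever actually reads one series, and the series IDs and chunk size are made up.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of items, each at most chunk_size long."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

def load_series(series_id):
    # Hypothetical loader: returns a few fake (series_id, step) rows.
    return [(series_id, step) for step in range(3)]

series_ids = ["s1", "s2", "s3", "s4", "s5"]  # made-up IDs

parts = []
for chunk in iter_chunks(series_ids, 2):
    part = []
    for sid in chunk:
        part.extend(load_series(sid))
    parts.append(part)  # only one chunk is being built at a time

# Concatenate the parts once everything has been processed.
combined = [row for part in parts for row in part]
print(len(parts), len(combined))
```

In practice each part would be reduced (for example to predictions) before being kept, so the concatenated result stays small.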

olive wyvern Oct 23, 2023, 8:03 AM

#

Goal is not to win but to learn
Looking for collaboration

https://www.kaggle.com/code/sb0702/model-tensorflow-deep-learning-model

Model 💡: TensorFlow Deep Learning Model

mild plume Nov 14, 2023, 4:39 PM

#

ah, I'm back on Kaggle : ) Just started working on the sleep states problem. I turned it into a state prediction problem instead, because if you keep the labels as events it gets ugly: the events are really sparse. I don't know what other approaches there are; heck, I haven't even submitted a solution. Anyone who's in the same boat, or further along, and wants to learn? I'm training an LSTM right now. My pipeline for this competition is not complete (evaluation and inference [submission] are left), but I'm looking to learn nonetheless. I especially want to get better at recurrent nets, and maybe time series transformers too, but we saw way too much of that in the LLM Science Exam lol
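One way to picture the events-to-states conversion, as a sketch only: it assumes each "onset" is followed by a "wakeup", and the function name and example steps are made up, not taken from anyone's actual pipeline.

```python
def events_to_states(n_steps, events):
    """Turn sparse (step, event) pairs into a dense 0/1 asleep flag per step.

    Assumes well-formed events: each "onset" is followed by a "wakeup".
    """
    states = [0] * n_steps
    onset_step = None
    for step, event in sorted(events):
        if event == "onset":
            onset_step = step
        elif event == "wakeup" and onset_step is not None:
            # Mark every step from onset up to (not including) wakeup as asleep.
            for s in range(onset_step, min(step, n_steps)):
                states[s] = 1
            onset_step = None
    return states

states = events_to_states(10, [(2, "onset"), (6, "wakeup")])
print(states)  # [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

With dense labels like this, every step has a target, so the model is no longer trained on a handful of positive rows among many empty ones.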

glass rune Nov 15, 2023, 3:46 PM

#

mild plume ah im back kaggle : ) just starting working on the sleep states problem... turne...

I'm kind of in the same boat. It's just that the data is too sparse to predict something tangible. Given how few events there are, a simple random forest is probably enough.

mild plume Nov 17, 2023, 1:43 PM

#

glass rune I kinda am in the same boat. Its just data is too sparse to predict something ta...

ahan

jade juniper Nov 24, 2023, 5:25 AM

#

Hello everyone, after switching to the reduced-memory dataset I'm trying to merge the parquet and CSV files, but I'm getting a memory issue. Also, the merge gives me 9500 rows, which I believe is incorrect. Can someone help me out: what is the final dataset used for ML?

Also, how can I install cuML successfully?

mild plume Nov 28, 2023, 2:37 PM

#

wanted to share some results i guess.. really hectic month but now i have some time, with 8 days left i made my first submission, really really bad results... just gotta make the validation functions and make the models better now, most of the software infrastructure is done.

wheat python Nov 29, 2023, 1:26 AM

#

how big is the test dataset for calculating leader board score? and also how big is the final real test dataset?

crisp cloud Nov 30, 2023, 3:12 PM

#

Hello, I'm currently finishing my first version and I'm trying to make a valid submission but the submission crashed and I'm trying to debug it.
The provided test_series.parquet is bothering me here: there are only 3 series, and they only have a few data points each (and their very small size is very likely messing with my algo).

Should I consider that the real test events will look like the training ones, or should I expect them to look like the provided ones?

My crashing submission gives me very little info on the cause of the crash, so I'm a bit desperate...

crisp cloud Nov 30, 2023, 3:14 PM

#

wheat python how big is the test dataset for calculating leader board score? and also how big...

Found in the "Data" tab: Note that this is a Code Competition, in which the actual test set is hidden. In this public version, we give some sample data in the correct format to help you author your solutions. The full test set contains about 200 series.
Found in the "Leaderboard" tab: This leaderboard is calculated with approximately 25% of the test data. The final results will be based on the other 75%, so the final standings may be different.

wheat python Nov 30, 2023, 3:56 PM

#

crisp cloud * Found in the "Data" tab: Note that this is a Code Competition, in which the ac...

Thanks a lot!

misty thicket Dec 3, 2023, 6:43 AM

#

crisp cloud Hello, I'm currently finishing my first version and I'm trying to make a valid s...

Maybe it's because you generated too many predictions, causing an exception to be thrown. I once ran into this problem, and when I reduced the number of generated predictions, the error was no longer reported. You can try reducing the number of predictions.
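The post doesn't say how the predictions were reduced. One possible approach, sketched here under that caveat, is to keep only the highest-scoring predictions per series; the function name, the dict layout, and the cutoff `k` are all assumptions.

```python
from collections import defaultdict

def keep_top_k(predictions, k):
    """Keep the k highest-scoring predictions for each series_id."""
    by_series = defaultdict(list)
    for pred in predictions:
        by_series[pred["series_id"]].append(pred)
    kept = []
    for preds in by_series.values():
        preds.sort(key=lambda p: p["score"], reverse=True)
        kept.extend(preds[:k])
    return kept

# Made-up predictions for two series.
preds = [
    {"series_id": "a", "step": 10, "score": 0.9},
    {"series_id": "a", "step": 20, "score": 0.2},
    {"series_id": "a", "step": 30, "score": 0.7},
    {"series_id": "b", "step": 5, "score": 0.4},
]

trimmed = keep_top_k(preds, 2)
print([p["step"] for p in trimmed])  # [10, 30, 5]
```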

crisp cloud Dec 5, 2023, 3:16 PM

#

misty thicket Maybe it's because you generated too many predictions, causing an exception to b...

Ok, I'll try thx!

jade juniper Dec 12, 2023, 1:35 PM

#

Is there any NB (code) where validation accuracy is calculated?