#child-mind-institute-problematic-internet-use | Kaggle | Page 1

high sonnet Sep 21, 2024, 12:49 AM

#

Shoutout to Tubotubo for an awesome starter notebook. https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/discussion/535121

Child Mind Institute — Problematic Internet Use

Relating Physical Activity to Problematic Internet Use

edgy apex Sep 25, 2024, 1:53 PM

#

I'm interested in joining a team!

short apex Sep 26, 2024, 2:31 PM

#

edgy apex I'm interested in joining a team!

Me too!

olive jay Sep 26, 2024, 5:55 PM

#

looking for a team.

astral olive Sep 27, 2024, 5:17 PM

#

Looking for a team. If anyone is interested, please DM me.

glass stream Sep 28, 2024, 8:03 AM

#

Can't we participate solo in the competition?

true phoenix Sep 28, 2024, 10:42 AM

#

glass stream Can't we participate solo in the competition?

As far as I know, you can

#

Did you read something that suggests the contrary?

glass stream Sep 28, 2024, 3:35 PM

#

true phoenix Did you read something that suggests the contrary?

Just wanted to confirm

#

I'm actually a bit confused about getting started with the project tbh

true phoenix Sep 28, 2024, 3:50 PM

#

glass stream I'm actually a bit confused about getting started with the project tbh

ah yeah I can understand that

#

I'm yet to try this kaggle comp, but in general a good way to start is to read the description ofc, but also the main discussion posts & possibly check out some of the notebooks on what other people are doing.

glass stream Sep 28, 2024, 3:52 PM

#

true phoenix I'm yet to try this kaggle comp, but in general a good way to start is to read t...

But checking what others are doing, isn't that cheating?

true phoenix Sep 28, 2024, 4:03 PM

#

glass stream But checking what others are doing, isn't that cheating?

not really

#

seeing how people load the data and create a submission is perfectly fine

#

I didn't mean exactly copying their code :P

#

but especially if you're new to kaggle, that can help with getting started

glass stream Sep 28, 2024, 4:34 PM

#

Ahh I see thanks

fierce edge Oct 1, 2024, 3:44 PM

#

Is there any null value in the data provided in the competition's .parquet file?

edgy apex Oct 1, 2024, 6:18 PM

#

fierce edge Is there any null value in the data provided in the competition's .parquet file?

No, I haven't seen any!

#

Does anyone know why there are 3960 ids in train.csv but only 996 directories with ids to work on? Are the other ids important in any way?

fierce edge Oct 2, 2024, 2:27 AM

#

edgy apex Does anyone know why there are 3960 ids in train.csv but only 996 directories...

same question

fierce edge Oct 2, 2024, 2:28 AM

#

edgy apex No, I haven't seen any!

train.csv contains too many missing values.
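A quick way to quantify this is to compute the share of missing values per column. The sketch below uses a toy frame standing in for train.csv; the column names `feature_1` and `feature_2` and the helper name `missing_report` are made up for illustration.

```python
import pandas as pd


def missing_report(df):
    """Return the fraction of missing values per column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy frame standing in for train.csv (column names are hypothetical).
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "feature_1": [1.0, None, None, None],
    "feature_2": [0.5, 0.7, None, 0.9],
})
report = missing_report(toy)
print(report)
```

Running the same function on the real train.csv would show which columns are mostly empty and which are nearly complete.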

edgy apex Oct 2, 2024, 4:08 AM

#

fierce edge train.csv contains too many missing values.

I'm trying to figure out why that is...
I mean, there must be some reason for it

fierce edge Oct 2, 2024, 4:09 AM

#

edgy apex I'm trying to figure out why that is... I mean, there must be some reason for it...

ok

twilit bison Oct 2, 2024, 10:52 PM

#

edgy apex I'm trying to figure out why that is... I mean, there must be some reason for it...

You could try a semi-supervised learning method to help with the unlabelled data
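One possible reading of this suggestion is self-training: fit on the labelled rows, then let the model label the unlabelled rows it is confident about. A minimal sketch using scikit-learn's `SelfTrainingClassifier` follows. It assumes unlabelled rows are the ones with a missing target, and it uses `LogisticRegression` only as a placeholder model; the helper name `fit_self_training` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier


def fit_self_training(X, y, threshold=0.9):
    """Fit a self-training classifier; rows with a NaN label count as unlabelled.

    scikit-learn marks unlabelled rows with -1, so NaN labels are converted first.
    """
    y = np.asarray(y, dtype=float)
    y_marked = np.where(np.isnan(y), -1, y).astype(int)
    # Placeholder base model; any classifier with predict_proba can be used.
    model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=threshold)
    model.fit(X, y_marked)
    return model
```

The `threshold` controls how confident a prediction must be before it is added as a pseudo-label; whether this helps on this dataset is untested here.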

tall notch Oct 3, 2024, 4:16 AM

#

twilit bison You could try a semi-supervised learning method to help with the unlabelled data

Which method have you tried?

twilit bison Oct 4, 2024, 4:24 PM

#

tall notch Which method have you tried?

I haven't yet, been too focused on gradient boosting models

tall notch Oct 4, 2024, 4:26 PM

#

twilit bison I haven't yet, been too focused on gradient boosting models

Are you planning to work on ser?

rapid lake Oct 5, 2024, 6:18 AM

#

Hi, I am a beginner in the field of DS and looking for a partner for the Child-mind-institute-problematic-internet-use Kaggle competition, and also for partners from whom I can learn.

slow nebula Oct 5, 2024, 9:50 AM

#

Hi there! Just a quick question: how do we interpret weekday in the parquet data? Does it start from Monday or Sunday?

slow nebula Oct 5, 2024, 9:52 AM

#

edgy apex Does anyone know why there are 3960 ids in train.csv but only 996 directories...

The data description page says that only a portion of the participants were asked to wear a wearable accelerometer for (allegedly) up to 30 days. Only those participants will have corresponding parquet data.
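A minimal sketch of how to check that split yourself. It assumes the accelerometer data sits in one directory per participant, named `id=<participant id>`, under a series folder; that layout and the helper name `split_ids_by_series` are assumptions, so adjust them to what the data tab shows.

```python
import csv
from pathlib import Path


def split_ids_by_series(train_csv, series_root):
    """Return (ids_with_series, ids_without_series) for the ids in train_csv.

    Assumes one sub-directory per participant named 'id=<id>' under series_root.
    """
    with open(train_csv, newline="") as f:
        train_ids = [row["id"] for row in csv.DictReader(f)]

    # Directory names look like 'id=00008ff9'; keep only the part after 'id='.
    series_ids = {
        p.name.split("=", 1)[1]
        for p in Path(series_root).iterdir()
        if p.is_dir() and p.name.startswith("id=")
    }

    with_series = [i for i in train_ids if i in series_ids]
    without_series = [i for i in train_ids if i not in series_ids]
    return with_series, without_series
```

If the layout assumption holds, `len(with_series)` should come out at the 996 mentioned above. The remaining ids still carry the tabular features in train.csv.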

edgy apex Oct 5, 2024, 9:53 AM

#

slow nebula The data description page says that only a portion of the participants were a...

Thank you so much!

slow nebula Oct 5, 2024, 9:54 AM

#

I highly recommend always having the data dictionary and the data description page open. You need to look things up constantly to make sense of all the fields

#

Although, spoiler alert, some descriptions are very inaccurate. Be critical and don't just take the description at face value

#

I suppose this is part of the competition, that's why they haven't corrected them

#

I would, however, really appreciate it if we are given some more information on the data columns. For example, the actual model of the ActiGraph used, the protocol for fitness endurance test, etc. I spent so many hours just to make sense of the data because of the scarcity of documentation from the competition

#

It's a wonderful learning experience, but extremely frustrating at times

#

I'm not sure, for example, whether the ActiGraph devices already implemented some form of vehicle detection algorithm or not.

#

And I have concerns about the implementation of GGIR package in wristpy, specifically for the dataset of this competition

#

The lack of true raw data really limits what we can do and what we can know

#

Fortunately wristpy does not do any further filtering, so the major information loss is the aggregation (and potentially arising from auto-calibration)

#

We also don't know the actual protocol of the accelerometer experiment

#

Like whether the device is on dominant or non-dominant hand

#

(By the way, if you need any help, I am happy to contribute to the wristpy repo. I have gone over the entire GGIR documentation and much of the relevant literature)

slow nebula Oct 5, 2024, 2:43 PM

#

slow nebula Hi there! Just a quick question: how do we interpret `weekday` in the parquet da...

Oh sorry I'm stupid. It was written on the data description page. I'm used to digging things up for this project so I was overcomplicating it lol

trail geyser Oct 7, 2024, 4:58 AM

#

Hi! I am Aakash, having experience in Web Development and now learning AI/ML.

This competition is the second one I've joined, and I got confused when I saw that ~21 columns are missing in the test set. Also, what's the deal with the parquet files?

Through the map provided, it seems like those 21-22 columns that are missing in the test df are very crucial in predicting the target variable.

What is your approach to this challenge? Looking at some notebooks, I found that someone has used those missing columns as target columns. It seems like a completely new concept to me.

Looking forward to hearing from you 🙂

toxic bane Oct 7, 2024, 4:42 PM

#

Hello everyone! Seems like my local CV score and public LB are not very correlated. Has anyone found good local settings which give at least some correlation between local CV and public LB?

sullen rain Oct 8, 2024, 3:26 AM

#

I saw 82 columns in the train dataset which we use to develop our model and was confused by that. I would love to work with someone/team on this competition. Please kindly DM me thank you

oak lynx Oct 9, 2024, 4:14 PM

#

Hello Folks!
I'm Dharmik and I'm a data scientist from India.
I've been working on this competition and basically managed to get the rough skeleton of the approach I had in mind ready.
The major question I had was around model performance. Has anyone been able to get decent performance without any fancy feature engineering? I want to brainstorm ideas on how to further improve performance on two major fronts:

Data -> includes imputation, feature engineering, and any transformations in general
Model -> this would mainly include hyperparameter tuning, or later exploring whether other algorithms provide better performance.

Happy to get in touch with anyone here and do a problem solving session!!

frozen spindle Oct 9, 2024, 8:02 PM

#

rapid lake Hi, I am a beginner in the field of DS and looking for a partner for Child-mind-...

What's your experience with coding? I am looking for good coders to team up with.

A PhD AI student this side

split shore Oct 10, 2024, 7:02 AM

#

who wants to team Up ?

twilit bison Oct 10, 2024, 10:52 PM

#

trail geyser Hi! I am Aakash, having experience in Web Development and now learning AI/ML. ...

Those 22 columns are not provided because they are questionnaire scores related to the target

olive bramble Oct 11, 2024, 5:08 AM

#

Did anyone attend this webinar? If yes, can you share some insights please?

#

Also, please guide me on how to access the recording of the webinar

blazing thicket Oct 11, 2024, 2:37 PM

#

oak lynx Hello Folks! I'm Dharmik and I'm a data scientist from India. I've been working on ...

yes I used imputation, the data seemed highly imbalanced

#

I spent a bit doing EDA and PCA yesterday, ping me if you get stuck

twilit bison Oct 11, 2024, 6:56 PM

#

Hi all, I wanted to ask how people are dealing with the outliers in the dataset and if anyone has used any transformations at all to improve performance?

gaunt smelt Oct 15, 2024, 7:00 PM

#

I don't know how to read this big dataset, since it takes too much time on my local machine, and on a Kaggle notebook it runs out of memory reading this many parquet files. Any idea how to tackle this? Link to the competition: https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/overview
I am trying to combine all these parquet files, please guide

worn grove Oct 16, 2024, 10:00 AM

#

gaunt smelt I don't know how to read this big dataset since it takes too much time in local ...

you can always refer to other notebooks and see how they deal with the dataset

pliant dock Oct 16, 2024, 11:56 AM

#

worn grove you can always refer to other notebooks and see how they deal with the dataset

are all submitted notebooks public?

worn grove Oct 18, 2024, 8:23 AM

#

pliant dock are all submitted notebooks public?

no, notebooks are public only if you turn that on in the notebook's settings

pliant dock Oct 18, 2024, 7:50 PM

#

Aii thanks

hexed mica Oct 19, 2024, 2:42 PM

#

olive bramble Did anyone attend this webinar? If yes, can you share some insights please?

they talked about outliers, metrics, CV score. It's worth a watch https://vimeo.com/1020745085

Vimeo: "Powering AI With Precision NVIDIA - Episode 2 - AI for Good with Child Mind Institute, Dell Technologies, and NVIDIA" by Dell Technologies Video

olive bramble Oct 20, 2024, 1:14 AM

#

Thank you very much dear @hexed mica for providing the source.

hexed mica Oct 20, 2024, 6:08 PM

#

slow nebula Hi there! Just a quick question: how do we interpret `weekday` in the parquet da...

it's explained in the data tab -> weekday - The day of the week, coded as an integer with 1 being Monday and 7 being Sunday.
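For reference, a small sketch of working with that coding. Python's `date.isoweekday()` happens to use the same 1 = Monday to 7 = Sunday convention; the names `WEEKDAY_NAMES` and `is_weekend` are just illustrative helpers.

```python
from datetime import date

# 1 = Monday ... 7 = Sunday, as quoted from the data tab above.
WEEKDAY_NAMES = {
    1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
    5: "Friday", 6: "Saturday", 7: "Sunday",
}


def is_weekend(weekday):
    """True for Saturday (6) and Sunday (7) under the 1=Monday coding."""
    return weekday >= 6


# date.isoweekday() follows the same convention, so it can index the mapping.
print(WEEKDAY_NAMES[date(2024, 10, 20).isoweekday()])
```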

austere field Oct 23, 2024, 2:21 AM

#

I'm having an issue when trying to submit to the competition.

I get the error: Submission Scoring Error
Your notebook generated a submission file with incorrect format. Some examples causing this are: wrong number of rows or columns, empty values, an incorrect data type for a value, or invalid submission values from what is expected.

I have searched the discussion to see if anyone else was having the issue: https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/discussion/541102#3025616 but there seems to be no obvious solution, at least to me 🙂

I assume what they are hinting at in the post is that there are entries which are in both test and train, however, having tried to remove duplicates from either as well as both and retried to submit, I get the same error.

My submission looks fine, to me regarding the test.csv:
id sii
00008ff9 0
000fd460 0
105258 0
00115b9f 0
0016bb22 0
001f3379 1
0038ba98 0
0068a485 0
0069fbed 0
0083e397 2
0087dd65 0
00abe655 0
00ae59c9 1
00af6387 0
00bd4359 0
00c0cd71 0
00d56d4b 0
00d9913d 0
00e6167c 0
00ebc35d 0

Anyone available for a little support?

merry nacelle Oct 24, 2024, 7:06 AM

#

austere field I'm having an issue when trying to submit to the competition. I get the error: ...

Is your file really in CSV (comma-separated) format? It looks space-separated in what you have shown here.
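A minimal sketch of writing and sanity-checking a submission file. It assumes the expected columns are `id,sii`, that ids are 8-character hex strings like most of those in the sample above, and that `sii` takes the values 0 to 3; all three are assumptions, and the helper names are hypothetical.

```python
import csv
import re

import pandas as pd

ID_PATTERN = re.compile(r"^[0-9a-f]{8}$")  # assumption: ids are 8 hex characters
VALID_SII = {"0", "1", "2", "3"}           # assumption: sii takes values 0-3


def write_submission(ids, preds, path="submission.csv"):
    """Write a comma-separated submission with columns id,sii."""
    sub = pd.DataFrame({"id": [str(i) for i in ids], "sii": [int(p) for p in preds]})
    sub.to_csv(path, index=False)
    return path


def find_problems(path):
    """Return a list of human-readable problems found in the file."""
    problems = []
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows or rows[0] != ["id", "sii"]:
        problems.append(f"unexpected header: {rows[0] if rows else None}")
    for n, row in enumerate(rows[1:], start=2):
        if len(row) != 2:
            problems.append(f"line {n}: expected 2 comma-separated fields, got {len(row)}")
        elif not ID_PATTERN.match(row[0]):
            problems.append(f"line {n}: suspicious id {row[0]!r}")
        elif row[1] not in VALID_SII:
            problems.append(f"line {n}: invalid sii {row[1]!r}")
    return problems
```

One thing such a check would flag in the sample above is the id `105258`, which is shorter than the others. Whether that is the actual cause of the scoring error is only a guess.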

gray wave Oct 24, 2024, 9:09 AM

#

Hi.. I am new to kaggle. I am trying to submit my notebook but the submit button shows disabled. Why is it so?

austere field Oct 24, 2024, 6:40 PM

#

merry nacelle Is your file really in CSV (comma-separated) format? It looks space...

Yes, it should be - same as if I download the test.csv and open it. Always good to check though 😄

📎 submission.csv 📎 inbox_16554585_7e5ff68ef4475a51b5d326b9cb582cef_submission.xls

split shore Oct 31, 2024, 5:57 AM

#

any team up ?

bronze swift Oct 31, 2024, 6:24 PM

#

Hello everyone, how are you planning to use the part-0.parquet files?

bronze swift Oct 31, 2024, 6:25 PM

#

gaunt smelt I don't know how to read this big dataset since it takes too much time in local ...

There is a notebook that shows how to read these big data files. It uses parallel computing.
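This is not the code from that notebook, just a sketch of one common pattern: instead of concatenating every raw series, reduce each participant's file to a single row of summary statistics, and do the reads in a thread pool. The directory layout `id=<id>/part-0.parquet` and the helper names are assumptions. The reader is passed in as `read_fn`, so it can be swapped for another parquet reader.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import pandas as pd


def summarize_one(path, read_fn=pd.read_parquet):
    """Read one participant's file and reduce it to a single row of statistics."""
    df = read_fn(path)
    stats = df.select_dtypes("number").describe().loc[["mean", "std", "min", "max"]]
    # Flatten the (statistic x column) table into one dict, e.g. 'mean_<column>'.
    row = {f"{stat}_{col}": stats.at[stat, col] for stat in stats.index for col in stats.columns}
    # Assumes the parent directory is named 'id=<participant id>'.
    row["id"] = Path(path).parent.name.split("=", 1)[-1]
    return row


def summarize_all(series_root, read_fn=pd.read_parquet, workers=4):
    """Build one summary row per participant without concatenating raw series."""
    paths = sorted(Path(series_root).glob("id=*/part-0.parquet"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda p: summarize_one(p, read_fn), paths))
    return pd.DataFrame(rows)
```

The resulting frame has one row per id and can be merged onto train.csv, which keeps memory use far below what combining all raw files would need.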

brave field Nov 1, 2024, 1:00 PM

#

gaunt smelt I don't know how to read this big dataset since it takes too much time in local ...

Use Polars to read the parquet files; it supports Hive-style partitioned reading. Have a look at the docs

balmy magnet Nov 5, 2024, 7:13 AM

#

Hi guys, just a small doubt. How many columns does the test set actually have? I am trying to reduce the dimensionality of the dataset and it throws an error stating the number of columns might actually be more.

blazing thicket Nov 5, 2024, 3:13 PM

#

bronze swift There is a notebook that shows how to read these big data files. It uses paralle...

I am using cuDF, lmk and I can provide code

#

for EDA

twilit bison Nov 5, 2024, 3:44 PM

#

balmy magnet Hi guys, just a small doubt. How many columns does the test set actually have? I am t...

Could it be that the one-hot encoded columns have a few values that aren't present in the train set, etc.?
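If that is the cause, one way around it is to encode both frames and then force the test frame onto the train frame's columns. A minimal sketch, with a hypothetical helper name and toy column names:

```python
import pandas as pd


def one_hot_aligned(train, test, columns):
    """One-hot encode both frames and force test to have train's columns."""
    train_enc = pd.get_dummies(train, columns=columns)
    test_enc = pd.get_dummies(test, columns=columns)
    # Add categories missing from test as all-zero columns and drop
    # categories that only appear in test, keeping train's column order.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc
```

Drop the target column from the train frame before calling this, otherwise it would be added to the test frame as a column of zeros.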

polar glacier Nov 7, 2024, 10:33 PM

#

I'm interested in joining a team!

bronze swift Nov 13, 2024, 3:59 PM

#

are you solving it as a regression problem or classification problem?

sly temple Nov 14, 2024, 5:21 AM

#

I've been working on it as a regression problem, which seems more flexible. I'm also using Polars LazyFrame streaming to stay within the 30GB RAM limit on Kaggle notebooks
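With a regression setup, the continuous outputs still have to be mapped back to integer classes for the submission. A minimal sketch, assuming the target is an ordinal score from 0 to 3; the cut points below are placeholders and would normally be tuned.

```python
import numpy as np

# Hypothetical cut points between classes 0|1, 1|2 and 2|3; these are often tuned.
DEFAULT_THRESHOLDS = (0.5, 1.5, 2.5)


def to_classes(scores, thresholds=DEFAULT_THRESHOLDS):
    """Map continuous regression outputs to integer classes 0..len(thresholds)."""
    return np.digitize(np.asarray(scores, dtype=float), thresholds)


print(to_classes([0.2, 0.5, 1.49, 2.7, 5.0]))  # prints [0 1 1 3 3]
```

The flexibility comes from the thresholds: they can be moved to trade off the classes against each other without retraining the model.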

pale tiger Nov 17, 2024, 7:38 PM

#

Can we participate in this competition without a team ?

mossy arrow Nov 18, 2024, 6:26 AM

#

pale tiger Can we participate in this competition without a team ?

Yes, we can.

thorn plaza Dec 4, 2024, 7:46 PM

#

Hi everyone,

I hope you're doing well. I recently completed a data science course on Udemy and am eager to enhance my skills further. As I am new to the field, could anyone kindly suggest ways to practice and work on projects that would help me improve?

I would truly appreciate any guidance, tips, or resources you could share to help me grow in this journey.

short apex Dec 5, 2024, 1:54 AM

#

Can we use LLM API in CIU competition?

final sparrow Dec 6, 2024, 11:47 AM

#

short apex Can we use LLM API in CIU competition?

No, you can't connect to the internet

near kelp Dec 10, 2024, 6:56 PM

#

thorn plaza Hi everyone, I hope you're doing well. I recently completed a data science cour...

Resources: 1. Copiously illustrated, will help you visualize just what algorithms are doing

#

You name it, he's thought about it ; strategies for every type of situation explored ; a toolbox for reference

hard igloo Dec 16, 2024, 8:38 PM

#

Hey all, I was recently working on my submission for this competition, but I seem to be running into an error when trying to submit. I noticed that other people are still submitting their predictions, but I had thought that because we were past the one deadline, it wasn't letting me submit anymore. I have gone through my code and run the notebook successfully, but every time I go to submit an entry, it says the notebook threw an exception. I looked through the logs, and they state that it ran successfully. Has anyone seen this before, or am I missing something that means I can't submit any more entries? I was able to submit entries before and have submitted quite a few, so I'm not sure what is going on. Any help would be greatly appreciated!

merry nacelle Dec 17, 2024, 4:14 PM

#

If I derive a dataset by preprocessing the existing competition dataset, will I be allowed to use that derived dataset directly in my submission notebook? Or does the submission notebook need to include the preprocessing code too?

bold hound Dec 19, 2024, 5:24 AM

#

What do we mean by overfitting the leaderboard?

#

Newbie here

patent panther Dec 19, 2024, 2:46 PM

#

Hello @bold hound ,

Overfitting the leaderboard means that a notebook has learned patterns specific to the public portion of the data but may not generalize well to the private portion. Public leaderboard scores are often computed using a small subset of the test data (for example, 20%), while the final scores are calculated on the larger, unseen hold-out portion (the remaining 80%).

As a result, some public notebooks will seem very accurate by blending multiple models to fit the public validation set very well. However, they can often end up learning noise rather than real patterns, causing them to perform poorly to very poorly when evaluated on the private dataset.

This is referred to as a "shake-up" on Kaggle. It varies depending on the dataset and the metric used. Sometimes, blending public notebooks leads to good results, but at other times, it causes severe overfitting and poor performance on the private leaderboard.

The recommended approach is to choose a local cross-validation strategy to test your local models and choose your best models based on this.
To determine if a shake-up will occur, people compare their local metrics to their public LB scores. When they correlate well, a shake-up is less likely. Here it seems a lot of people have noticed a strong difference, so a shake-up seems likely.
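A minimal sketch of such a local cross-validation loop. It assumes quadratic weighted kappa as the metric and uses `LogisticRegression` purely as a placeholder model; the helper name `cv_scores` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold


def cv_scores(X, y, n_splits=5, seed=0):
    """Return one quadratic weighted kappa score per stratified fold."""
    X, y = np.asarray(X), np.asarray(y)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, valid_idx in folds.split(X, y):
        model = LogisticRegression(max_iter=1000)  # placeholder model
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[valid_idx])
        scores.append(cohen_kappa_score(y[valid_idx], pred, weights="quadratic"))
    return scores
```

Looking at both the mean and the spread of the fold scores gives a sense of how noisy the estimate is, which matters when comparing it against a public LB score.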

Hope that helps

dark condor Dec 19, 2024, 6:35 PM

#

patent panther Hello @bold hound , Overfitting the leaderboard means that a noteboo...

Hi

pliant dock Dec 19, 2024, 8:20 PM

#

It's gonna be a shake-up fest tonight😂😂

#

How bad will the shake up be lol

pliant dock Dec 20, 2024, 12:25 AM

#

😂😂😂

upbeat mesa Dec 20, 2024, 2:36 AM

#

KNN imputation is all you need (from my experiments) 🤣
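For anyone who has not used it, a minimal sketch of KNN imputation with scikit-learn's `KNNImputer` on a toy array. The values and the helper name `knn_impute` are illustrative only.

```python
import numpy as np
from sklearn.impute import KNNImputer


def knn_impute(X, n_neighbors=5):
    """Fill NaNs with the mean of that feature over the nearest rows."""
    imputer = KNNImputer(n_neighbors=n_neighbors)
    return imputer.fit_transform(X)


# Toy data with two missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])
filled = knn_impute(X, n_neighbors=2)
print(filled)
```

In a real pipeline the imputer would be fitted on the training rows and then applied to the test rows with `transform`, rather than refitted.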

weary temple Dec 20, 2024, 3:04 AM

#

correct

bold hound Dec 20, 2024, 3:31 AM

#

Thank you very much @patent panther for the answer!

bold hound Dec 20, 2024, 3:32 AM

#

pliant dock It's gonna be a shake-up fest tonight😂😂

It was definitely a heartbreaking shake up 😂

upbeat mesa Dec 20, 2024, 4:04 AM

#

My selected sub was 2500th+ in public LB 😂

bold hound Dec 20, 2024, 5:31 AM

#

My rank came to be 1580 in public LB 😂

#

I was in silver medal category hours back 😂

somber swift Dec 20, 2024, 3:17 PM

#

I was staring at the screen in disbelief after the private LB was revealed. 😂 I expected a shake, but certainly not this.

pliant dock Dec 20, 2024, 3:28 PM

#

somber swift I was staring at the screen in disbelief after the private LB was revealed. 😂...

Congrats btw,

somber swift Dec 20, 2024, 4:27 PM

#

pliant dock Congrats btw,

Thank you ☺️