#child-mind-institute-problematic-internet-use

high sonnet
edgy apex
#

I'm interested in joining a team!

short apex
olive jay
#

looking for a team.

astral olive
#

Looking for a team. Anyone interested, please DM.

glass stream
#

Can't we participate solo in the competition?

true phoenix
#

Did you read something that suggests the contrary?

glass stream
#

I'm actually a bit confused about getting started with the project tbh

true phoenix
#

I'm yet to try this kaggle comp, but in general a good way to start is to read the description ofc, but also the main discussion posts & possibly check out some of the notebooks on what other people are doing.

glass stream
true phoenix
#

seeing how people load the data and create a submission is perfectly fine

#

I didn't mean exactly copying their code :P

#

but especially if you're new to kaggle, that can help with getting started

glass stream
#

Ahh I see thanks

fierce edge
#

Is there any null value in the data provided in the competition's .parquet file?

edgy apex
#

Does anyone know why there are 3960 ids in train.csv but only 996 directories with ids to work on? Are the other ids important in any way?

fierce edge
edgy apex
twilit bison
twilit bison
tall notch
rapid lake
#

Hi, I am a beginner in the field of DS and am looking for a partner for the Child-mind-institute-problematic-internet-use Kaggle competition, and also for partners from whom I can learn.

slow nebula
#

Hi there! Just a quick question: how do we interpret weekday in the parquet data? Does it start from Monday or Sunday?
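
For reference, here is a minimal sketch of how the common weekday numbering conventions differ, using only Python's standard library. It does not say which convention the parquet `weekday` column follows (nothing in this thread confirms that); the helper name `weekday_conventions` is hypothetical and is only meant for checking a known calendar date against the values in the data.

```python
from datetime import date


def weekday_conventions(d: date) -> dict:
    """Return the weekday number of `d` under three common conventions."""
    return {
        "monday0": d.weekday(),         # Monday=0 ... Sunday=6
        "iso_monday1": d.isoweekday(),  # Monday=1 ... Sunday=7
        "sunday0": d.isoweekday() % 7,  # Sunday=0 ... Saturday=6
    }


monday = date(2024, 1, 1)  # 1 January 2024 was a Monday
sunday = date(2024, 1, 7)  # the following Sunday

print(weekday_conventions(monday))
print(weekday_conventions(sunday))
```

If the data carries both a date-like field and the weekday number for the same row, comparing the two against this table is one way to tell the conventions apart.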

slow nebula
slow nebula
#

I highly recommend always having the data dictionary and the data description page open. You need to constantly look things up to make sense of all the fields

#

Although, spoiler alert, some descriptions are very inaccurate. Be critical and don't just take the description at face value

#

I suppose this is part of the competition, that's why they haven't corrected them

#

I would, however, really appreciate it if we are given some more information on the data columns. For example, the actual model of the ActiGraph used, the protocol for fitness endurance test, etc. I spent so many hours just to make sense of the data because of the scarcity of documentation from the competition

#

It's a wonderful learning experience, but extremely frustrating at times

#

I'm not sure, for example, whether the ActiGraph devices already implemented some form of vehicle detection algorithm or not.

#

And I have concerns about the implementation of the GGIR package in wristpy, specifically for the dataset of this competition

#

The lack of true raw data really limits what we can do and what we can know

#

Fortunately wristpy does not do any further filtering, so the major information loss is the aggregation (and potentially arising from auto-calibration)

#

We also don't know the actual protocol of the accelerometer experiment

#

Like whether the device is on dominant or non-dominant hand

#

(By the way, if you need any help, I am happy to contribute to the wristpy repo. I have gone over the entire GGIR documentation and much of the relevant literature)

slow nebula
trail geyser
#

Hi! I am Aakash, having experience in Web Development and now learning AI/ML.

This competition is the second one I've joined, and I got confused seeing that ~21 columns are missing in the test set. Also, what's the deal with the parquet files?

Going by the map provided, it seems like those 21-22 columns that are missing in the test df are crucial for predicting the target variable.

What is your approach to this challenge? Looking at some notebooks, I found that someone has used those missing columns as target columns. That is a completely new concept to me.

Looking forward to hearing from you 🙂

toxic bane
#

Hello everyone! Seems like my local CV score and public LB are not very correlated. Has anyone found good local settings which give at least some correlation between local CV and public LB?
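
One way to put a number on "not very correlated" is to compute the Pearson correlation between local CV and public LB scores across several submissions. The sketch below uses only the standard library; the score lists are made-up placeholders for illustration, not results from this competition.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)


# Hypothetical scores for five submissions (placeholder numbers only).
local_cv = [0.40, 0.42, 0.45, 0.47, 0.50]
public_lb = [0.44, 0.41, 0.46, 0.43, 0.45]

print(f"CV vs LB correlation: {pearson(local_cv, public_lb):.2f}")
```

With only a handful of submissions the estimate is noisy, so it is better read as a rough signal than as a precise measurement.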

sullen rain
#

I saw 82 columns in the train dataset which we use to develop our model and was confused by that. I would love to work with someone or a team on this competition. Please DM me, thank you.

oak lynx
#

Hello Folks!
I'm Dharmik and I'm a data scientist from India.
I've been working on this competition and have basically got a rough skeleton of the approach I had in mind ready.
My main question is about model performance. Has anyone been able to get decent performance without any fancy feature engineering? I want to brainstorm ideas on how to further improve performance on two major fronts:

  1. Data -> includes imputation, feature engineering, and any transformations in general
  2. Model -> this would mainly include hyperparameter tuning, or later exploring whether other algorithms perform better.

Happy to get in touch with anyone here and do a problem-solving session!!

frozen spindle
split shore
#

Who wants to team up?

twilit bison
olive bramble
#

Did anyone attend this webinar? If yes, can you share some insights please?

#

Also, please tell me how to access the recording of the webinar

blazing thicket
#

I spent a bit doing EDA and PCA yesterday, ping me if you get stuck

twilit bison
#

Hi all, I wanted to ask how people are dealing with the outliers in the dataset and if anyone has used any transformations at all to improve performance?
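
For illustration, two common ways of taming outliers are clipping values to chosen percentiles (winsorizing) and applying a `log1p` transform to right-skewed, non-negative features. The sketch below is standard-library only; the function names and the default percentile cut-offs are assumptions chosen for the example, not something recommended in this thread, and it assumes the input contains no missing values.

```python
import math


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower_q=1.0, upper_q=99.0):
    """Clip every value into the [lower_q, upper_q] percentile range."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


def log1p_transform(values):
    """Apply log(1 + x), which compresses a long right tail (needs x > -1)."""
    return [math.log1p(v) for v in values]


feature = [1, 2, 3, 4, 100]               # toy column with one extreme value
print(winsorize(feature, 0, 75))          # the extreme value is pulled down
print(log1p_transform([0.0, 9.0, 99.0]))
```

Tree-based models are usually less sensitive to this than linear or distance-based ones, so whether either step helps is something to check against local CV.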

gaunt smelt
worn grove
pliant dock
worn grove
pliant dock
#

Aii thanks

hexed mica
olive bramble
#

Thank you very much dear @hexed mica for providing the source.

hexed mica
austere field
#

I'm having an issue when trying to submit to the competition.

I get the error: Submission Scoring Error
Your notebook generated a submission file with incorrect format. Some examples causing this are: wrong number of rows or columns, empty values, an incorrect data type for a value, or invalid submission values from what is expected.

I have searched the discussion to see if anyone else was having the issue: https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/discussion/541102#3025616 but there seems to be no obvious solution, at least to me 🙂

I assume what they are hinting at in the post is that there are entries which are in both test and train. However, having tried removing duplicates from either as well as both and resubmitting, I get the same error.

My submission looks fine to me when compared with test.csv:
id sii
00008ff9 0
000fd460 0
105258 0
00115b9f 0
0016bb22 0
001f3379 1
0038ba98 0
0068a485 0
0069fbed 0
0083e397 2
0087dd65 0
00abe655 0
00ae59c9 1
00af6387 0
00bd4359 0
00c0cd71 0
00d56d4b 0
00d9913d 0
00e6167c 0
00ebc35d 0

Anyone available for a little support?
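
One detail in the pasted rows stands out: `105258` has six characters while every other id has eight, which is what an id such as `00105258` looks like after being parsed as an integer. That may or may not be the cause of the scoring error. Below is a minimal sketch of a pre-submission check, assuming that ids are 8-character strings and that `sii` must be one of 0, 1, 2, 3 (only 0, 1, 2 appear in the rows above, so the label set is an assumption). The function name `check_submission` and the sample text are hypothetical.

```python
import csv
import io


def check_submission(csv_text, expected_ids, id_len=8, valid_labels=(0, 1, 2, 3)):
    """Return a list of human-readable problems found in a submission CSV."""
    # Assumption: the header is exactly "id,sii" and ids are fixed-length strings.
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "sii"]:
        return [f"unexpected header: {reader.fieldnames}"]

    allowed = {str(v) for v in valid_labels}
    problems, seen = [], []
    for line_no, row in enumerate(reader, start=2):
        row_id, label = row["id"], row["sii"]
        if len(row_id) != id_len:
            problems.append(f"line {line_no}: id {row_id!r} is not {id_len} characters")
        if label not in allowed:
            problems.append(f"line {line_no}: sii {label!r} is not one of {sorted(allowed)}")
        seen.append(row_id)

    if len(seen) != len(set(seen)):
        problems.append("duplicate ids in submission")
    missing = sorted(set(expected_ids) - set(seen))
    extra = sorted(set(seen) - set(expected_ids))
    if missing:
        problems.append(f"ids missing from submission: {missing}")
    if extra:
        problems.append(f"ids not present in test set: {extra}")
    return problems


sample = "id,sii\n00008ff9,0\n105258,0\n00115b9f,1\n"
for problem in check_submission(sample, ["00008ff9", "00105258", "00115b9f"]):
    print(problem)
```

If the ids pass through pandas at any point, reading them with `pd.read_csv(path, dtype={"id": str})` keeps the leading zeros intact.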

merry nacelle
gray wave
#

Hi. I am new to Kaggle. I am trying to submit my notebook, but the submit button is disabled. Why is that?

austere field
split shore
#

Anyone want to team up?

bronze swift
#

Hello everyone, how are you planning to use the part-0.parquet files?

bronze swift
brave field
balmy magnet
#

Hi guys, just a small question: how many columns does the test set actually have? I am trying to reduce the dimensionality of the dataset, and it throws an error suggesting the number of columns might actually be higher.

blazing thicket
#

for EDA

twilit bison
polar glacier
#

I'm interested in joining a team!

bronze swift
#

Are you solving it as a regression problem or a classification problem?

sly temple
#

I've been working on it as a regression problem, which seems more flexible. I'm also using Polars with LazyFrame streaming to stay within the 30GB RAM limit on Kaggle notebooks
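
When the task is treated as regression, the continuous predictions still have to be turned back into class labels for the submission. A minimal sketch of that step follows, assuming four ordered classes 0 to 3 and hypothetical cut points at 0.5, 1.5 and 2.5; neither the helper name nor the thresholds come from this thread, and in practice the cut points could be tuned on validation folds.

```python
from bisect import bisect_right


def to_classes(preds, thresholds=(0.5, 1.5, 2.5)):
    """Map continuous predictions to integer classes 0..len(thresholds)."""
    cuts = sorted(thresholds)
    # bisect_right counts how many cut points are <= the prediction,
    # so a value exactly on a threshold falls into the upper class.
    return [bisect_right(cuts, p) for p in preds]


raw_predictions = [0.2, 0.5, 1.49, 2.7, -1.0]
print(to_classes(raw_predictions))
```

The flexibility mentioned above comes from being able to move the cut points without retraining the model.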

pale tiger
#

Can we participate in this competition without a team?

mossy arrow
thorn plaza
#

Hi everyone,

I hope you're doing well. I recently completed a data science course on Udemy and am eager to enhance my skills further. As I am new to the field, could anyone kindly suggest ways to practice and work on projects that would help me improve?

I would truly appreciate any guidance, tips, or resources you could share to help me grow in this journey.

short apex
#

Can we use LLM API in CIU competition?

final sparrow
near kelp
#
  1. You name it, he's thought about it ; strategies for every type of situation explored ; a toolbox for reference
hard igloo
#

Hey all, I was recently working on my submission for this competition, but I seem to be running into an error when trying to submit. I noticed that other people are still submitting their predictions, but I had thought that because we were past the one deadline, it wasn't letting me submit anymore. I have gone through my code and run the notebook successfully, but every time I go to submit an entry, it says the notebook threw an exception. I looked through the logs, and they state that it ran successfully. Has anyone seen this before, or am I missing something that means I can't submit any more entries? I was able to submit before and have submitted quite a few entries, so I'm not sure what is going on. Any help would be greatly appreciated!

merry nacelle
#

If I derive a dataset by preprocessing the existing competition dataset, will I be allowed to use that derived dataset directly in my submission notebook? Or does the submission notebook need to include the preprocessing code too?

bold hound
#

What do we mean by overfitting the leaderboard?

#

Newbie here

patent panther
#

Hello @bold hound ,

Overfitting the leaderboard means that a notebook has learned patterns specific to the public portion of the data but may not generalize well to the private portion. Public leaderboard scores are often computed using a small subset of the validation data (for example, 20%), while the final scores are calculated on the larger, untested hold-out dataset (the remaining 80%).

As a result, some public notebooks will seem very accurate by blending multiple models to fit the public validation set very well. However, they can often end up learning noise rather than real patterns, causing them to perform poorly to very poorly when evaluated on the private dataset.

This is referred to as a "shake-up" on Kaggle. It varies depending on the dataset and the metric used. Sometimes, blending public notebooks leads to good results, but at other times, it causes severe overfitting and poor performance on the private leaderboard.

The recommended approach is to choose a local cross-validation strategy to test your local models and choose your best models based on this.
To determine if a shake-up will occur, people compare their local metrics to their public LB scores. When the two correlate well, a shake-up is less likely. Here it seems a lot of people have noticed a strong difference, so a shake-up seems likely.

Hope that helps
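
As a rough illustration of the local cross-validation idea described above, here is a minimal k-fold index splitter using only the standard library. It is a sketch under simple assumptions (plain shuffled folds, no stratification or grouping); the function name `kfold_indices` is hypothetical.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds reproducible
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k, valid in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid


for fold_no, (train_idx, valid_idx) in enumerate(kfold_indices(10, n_splits=5)):
    # Fit on train_idx, score on valid_idx, then average the scores over folds.
    print(fold_no, sorted(valid_idx))
```

Averaging the metric over all folds gives a score that depends on the whole training set rather than on one small slice, which is why it tends to be a steadier guide than the public LB.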

pliant dock
#

it's gonna be a shake-up fest tonight 😂😂

#

How bad will the shake up be lol

pliant dock
#

😂😂😂

upbeat mesa
#

KNN imputation is all you need (from my experiments) 🤣
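
For anyone unfamiliar with the idea, here is a small hand-rolled sketch of KNN imputation in plain Python: each missing value is filled with the mean of that column over the k nearest rows, where distance is measured on the columns both rows have observed. The thread does not say which implementation was actually used, so this is only an illustration; the names `knn_impute` and `_distance` are hypothetical, and missing values are represented as `None`.

```python
import math


def _distance(a, b):
    """Euclidean distance over columns observed in both rows, rescaled by the
    share of columns used; returns inf when the rows share no observed column."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column over the k nearest rows."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows with column j observed and a finite distance.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap as None when no donor exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.2]]
print(knn_impute(data, k=2))
```

In practice scikit-learn's `KNNImputer` (with its `n_neighbors` parameter) covers the same idea with more options, and features are usually scaled first so that no single column dominates the distance.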

weary temple
#

correct

bold hound
#

Thank you very much @patent panther for the answer!

bold hound
upbeat mesa
#

My selected sub was 2500th+ in public LB 😂

bold hound
#

My rank came to be 1580 in public LB 😂

#

I was in silver medal category hours back 😂

somber swift
#

I was staring at the screen in disbelief after the private LB was revealed. 😂 I expected a shake-up, but certainly not this.

somber swift