empty lion Aug 3, 2023, 2:45 PM

#

Hey everyone! I'm Jerad, a developer for Kaggle. We made this channel to give people a way to get help with Kaggle, or ask other questions in general that the Kaggle staff or community may be able to answer.

If there's anything you're not sure about, let us know! I'll actually be hanging out here, and may be able to help out!

gaunt oasis Aug 8, 2023, 7:45 PM

#

Hi @indigo fulcrum, this post would be a better fit for the #🔗┊sharing-projects channel. Good luck on your journey to become a Notebook Expert!

sharp iris Aug 10, 2023, 4:29 AM

#

Hello Everyone,
I have a question about the Kaggle competition.
There are many pre-trained models already available. If I run one of those models on the competition's test data only, without doing any work on the train dataset, and submit the result, will that be acceptable? Or do I have to train the model first and then test my trained model on the test set?
Example:
There are many English speech recognition models on Hugging Face. Can I use those pre-trained models on the test set only, and if that produces a good score on the leaderboard, will it be acceptable in the competition?
I know it's a noob question. Help me☺️

Thanks
Aditta

deft stream Aug 10, 2023, 5:29 AM

#

@empty lion Is there a way I can change my Kaggle username? There had been a minor spelling error while creating the username.

verbal crest Aug 10, 2023, 5:33 AM

#

sharp iris Hello Everyone, I have a question about the Kaggle competition. There are many p...

Hey Aditta, some competitions have specific rules about which data and models you can use, so you need to check each one. But usually it is fine to use external models (from Hugging Face, Kaggle Models, or anywhere else). That said, you'll probably find that pre-trained models on their own will only get you part of the way to a good score.

verbal crest Aug 10, 2023, 5:34 AM

#

deft stream @empty lion Is there a way I can change my Kaggle username? There had...

Unfortunately at this time we do not support changing usernames. If you have extenuating circumstances you can contact Kaggle support to request a change at https://www.kaggle.com/contact


sharp iris Aug 10, 2023, 6:17 AM

#

verbal crest Hey Aditta, some competitions have specific rules about which data and models yo...

If I get a good score without any training, is that acceptable?
🤔 🤔 🤔
@verbal crest sir

verbal crest Aug 10, 2023, 6:47 AM

#

@sharp iris Yeah it's fine unless a specific competition has a specific rule against using certain models.

sharp iris Aug 10, 2023, 6:58 AM

#

Thanks

elder flower Aug 11, 2023, 10:29 PM

#

How long does it take for competition results to be verified in general?

verbal crest Aug 11, 2023, 10:55 PM

#

@elder flower Usually something like 2-7 days, sometimes longer depending on the competition.

coral pebble Aug 13, 2023, 6:11 AM

#

Does anyone have tips for reaching the Master tier or above on Kaggle?

Not sure if this is the right channel to ask though

open isle Aug 13, 2023, 6:47 AM

#

In the competition ranking or in some other ranking?
If it is about competitions, then you need to take part in many competitions, learn from the solutions of the winners of the past competitions and incorporate them into your approach.

coral pebble Aug 13, 2023, 6:48 AM

#

open isle In the competition ranking or in some other ranking? If it is about competitions...

In any category. I could not find many quality answers when searching on kaggle forums

open isle Aug 13, 2023, 6:49 AM

#

Okay, so the official requirements are the following. I'll give suggestions for each ranking separately

#

Datasets:
There are two main parts: collecting interesting data and promoting it.
So, the first step is to collect some data. There are already thousands of datasets on Kaggle, so you would need to find some interesting data that hasn't been collected yet. Another approach would be to share data for ongoing competitions: for example, sharing relevant external data, doing some processing on the competition data and so on.
But simply making a good dataset isn't enough - you need to get people's attention. The first step is to make the dataset presentable. When you create a dataset, you see a score for how well it is put together; it covers descriptions, metadata and other things. So be sure to fill in all the fields.
And after the dataset is ready, you need to promote it - post about it on the Kaggle forums and social media.

#

Discussions:
You need people to upvote your posts. 1 vote is bronze, 5 votes is silver, 10 votes is gold.
The "easiest" way to get upvotes is to be active on forums in an ongoing competition - share your insights, ask questions, participate in hot topics.
Some people simply share articles from the internet on Kaggle forums. It is a low-effort activity, but, unfortunately, it works.
Votes for the comments in the notebooks are counted too.

#

Notebooks:
Now, this becomes trickier. Personally I think that Notebooks (and Competitions) are much more competitive rankings than the two previous ones.
You need to make a good analysis, share it and get enough votes.
There are numerous ways to make good kaggle notebooks:

build a good model for an ongoing competition and share it
do an EDA (exploratory data analysis) for a competition or dataset and share it
And so on. What is important to know is that it is difficult to produce novel ideas, so many people try to get medals by joining a new competition and sharing a good analysis within the first 12-24 hours. It is tough, but doable.

It will take some time to be good at it, but it is definitely rewarding.

I'll share some resources to help you:
https://www.analyticsvidhya.com/blog/2020/12/exclusive-interview-with-andrey-lukyanenko/
https://www.youtube.com/watch?v=qKqLHs3J-Rc&ab_channel=AnalyticsVidhya

Analytics Vidhya

avcontentteam

Kaggle Grandmaster Series - Exclusive Interview with Andrey Lukyane...

In this Interview, Andrey Lukyanenko joins us today to give insight into his data science journey and what pitfalls to avoid in the start.

YouTube

Analytics Vidhya

Data Visualization in Data Science | DataHour | Analytics Vidhya

Visualization is the best method to grasp the complex and hidden results from the data. Analyzing the visualizations is better than calculating data statistics and various plots and techniques can be used to do so.

In this DataHour, Andrey will share the history of data visualization. After which he will explain about different plot types and ...


#

Competitions:
Now, this is the most difficult ranking on Kaggle. You need to take part in competitions and reach a high place in them. It is very difficult, and even experienced data scientists can fail. The important thing is to iterate over ideas fast, try many things and be prepared to spend a lot of time.

Here is a link where I talk about how I got a gold medal several years ago:
https://www.youtube.com/watch?v=rpClh8WmTdo&ab_channel=ChaiTimeDataScience
This channel has a lot of very useful interviews

YouTube

Chai Time Data Science

Interview with Kaggle Kernels Grandmaster #1: Artgor | Andrew Lukya...

Audio (Podcast Version) available here: https://anchor.fm/chaitimedatascience

In this episode, Sanyam Bhutani interviews the king of kaggle kernels, Grandmaster Andrew Lukyanenko Ranked #1 about his journey into Data Science, Kaggle. They also talk about his pipeline for writing kernels.

Follow:
Andrew Lukyaneko
https://twitter.com/AndLukyane...


#

That's it. If you have any further questions, I'll be happy to answer them

coral pebble Aug 13, 2023, 7:01 AM

#

Thank you so much for the detailed explanation kerneler

summer drum Aug 13, 2023, 3:54 PM

#

In the learning Python tab of Kaggle, chapter 6, there is something that confuses me.

```
claim.startswith(planet)
>>> True
```
When I try it myself in Jupyter Notebook, it returns False, with the exact same code.
Also, why must the thing between the () be defined?

#

open isle Aug 13, 2023, 4:35 PM

#

The argument of the startswith method should be a string, like "guud".

summer drum Aug 13, 2023, 4:57 PM

#

Oh, it seems the reason it returned True initially is that planet was defined as a string somewhere earlier.

#

thank you for your help
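
A minimal sketch of the behavior discussed above. The string values are assumptions chosen for illustration; the lesson defines its own `planet` and `claim` in an earlier cell. It shows that the result of `str.startswith` depends on what the argument variable currently holds, and that a non-string argument is rejected.

```python
# Assumed example values; the lesson defines its own strings in an earlier cell.
planet = "Pluto"
claim = "Pluto is a planet!"

# True because claim begins with the text stored in planet.
matches = claim.startswith(planet)

# A variable holding a different string gives a different result.
other_planet = "Mars"
no_match = claim.startswith(other_planet)

# Passing a non-string (for example a number) raises TypeError.
try:
    claim.startswith(42)
    type_error_raised = False
except TypeError:
    type_error_raised = True

print(matches, no_match, type_error_raised)
```

If `planet` was never assigned in the current session, the call fails with a NameError instead, which is why the cell that defines it has to run first.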

dire geyser Aug 14, 2023, 3:36 AM

#

summer drum In the learning Python tab of Kaggle, chapter 6, there is something that confuses me. ```cla...

This code should not return True!!!!!!!!!!!!!!!!!

summer drum Aug 14, 2023, 3:46 AM

#

#

This is the screenshot, and the link is https://www.kaggle.com/code/colinmorris/strings-and-dictionaries

Strings and Dictionaries

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

#

Can you guys explain what happened?


reef bough Aug 15, 2023, 6:11 AM

#

open isle Discussions: You need people to upvote your posts. 1 vote is bronze, 5 votes is ...

are medals and votes linearly related i.e. if someone gets 20 votes on a discussion post, does that mean they get 2 gold medals? Or once you cross the 10-vote threshold, do you only get one gold medal irrespective of how many votes you get?

open isle Aug 15, 2023, 6:12 AM

#

reef bough are medals and votes linearly related i.e. if someone gets 20 votes on a discuss...

You can get only one medal on a post/notebook/dataset/competition.
Here is the progression information: https://www.kaggle.com/progression

Kaggle Progression System

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.
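
The "one medal per post" answer can be sketched as a lookup, assuming the discussion thresholds quoted earlier in this chat (1, 5 and 10 votes). The function name and the idea of "counted votes" are illustrative; the sketch does not model which votes Kaggle actually counts.

```python
# Thresholds for discussion posts as quoted in this chat (an assumption here,
# not an authoritative copy of the progression rules).
DISCUSSION_THRESHOLDS = [("gold", 10), ("silver", 5), ("bronze", 1)]


def discussion_medal(counted_votes):
    """Return the single medal a post holds for a given number of counted votes."""
    for medal, minimum in DISCUSSION_THRESHOLDS:
        if counted_votes >= minimum:
            return medal
    return None


for votes in (0, 1, 5, 10, 20):
    print(votes, discussion_medal(votes))
```

A post with 20 counted votes still maps to a single gold medal; extra votes beyond the threshold do not add medals.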

desert tusk Aug 15, 2023, 6:17 AM

#

Why doesn't the ICR competition appear in the Competitions file of the Meta Kaggle dataset?

torpid cipher Aug 15, 2023, 6:20 AM

#

Hello, I'm new to kaggle and trying my luck with the CommonLit - Evaluate Student Summaries Competition. I wrote code for this and saved it in a submission.csv file at the end. But always get a scoring error. Can someone help me or give me a tip?

reef bough Aug 15, 2023, 6:21 AM

#

torpid cipher Hello, I'm new to kaggle and trying my luck with the CommonLit - Evaluate Studen...

What's the error you get? It may be because the schema of your submission.csv doesn't match what the competition expects. I encountered a similar error in another competition: when I saved the file, pandas by default wrote an additional index column, and leaving it out got the submission through.
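
One way to check for the schema mismatch described above is to compare the header row of your file against the sample submission. This is a sketch using only the standard library; the column names and file contents are hypothetical. With pandas, the extra column is usually avoided by saving with `to_csv(..., index=False)`.

```python
import csv
import io


def header_of(csv_text):
    """Return the first row (column names) of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def compare_headers(sample_text, submission_text):
    """Return (missing, unexpected) column names relative to the sample file."""
    expected = header_of(sample_text)
    actual = header_of(submission_text)
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in actual if c not in expected]
    return missing, unexpected


# Hypothetical file contents: one submission carries an extra unnamed index column.
sample = "student_id,content,wording\nabc,0.0,0.0\n"
with_index = ",student_id,content,wording\n0,abc,0.1,-0.2\n"
without_index = "student_id,content,wording\nabc,0.1,-0.2\n"

print(compare_headers(sample, with_index))
print(compare_headers(sample, without_index))
```

An empty-string name in the "unexpected" list is the typical trace of a saved index column.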

torpid cipher Aug 15, 2023, 6:22 AM

#

Submission Scoring Error

#

Save and Run all works and I get a submission.csv file but the upload doesn't work

reef bough Aug 15, 2023, 6:24 AM

#

torpid cipher Save and Run all works and I get a submission.csv file but the upload doesn't wo...

compare the columns of your submission.csv and the submission.csv from the competition.

torpid cipher Aug 15, 2023, 6:25 AM

#

well the output of my submission.csv and the required version looks the same. Or do I have to store this in a pandas dataframe?

reef bough Aug 15, 2023, 6:27 AM

#

open isle You can get only one medal on a post/notebook/dataset/competition. Here is the p...

Okay, I mean, if that's the case, low-effort posts that share articles seem like a good way to go for discussions. Basically, the more you post, the better the chances of getting a medal (at least bronze, although getting a gold may be difficult). Shared resources like a "Data Science cheat sheet" tend to do well in the discussion forums.

open isle Aug 15, 2023, 6:27 AM

#

Yes, in fact I remember that there were some discussion grandmasters who got their rank by sharing such articles

torpid cipher Aug 15, 2023, 6:29 AM

#

I do not understand what you mean

reef bough Aug 15, 2023, 6:34 AM

#

torpid cipher I do not understand what you mean

The above message was for a separate conversation. I am not sure exactly what could be the cause of your error; in my case, the submission error was generally because of a schema mismatch. My second guess was that the scoring metric may be undefined for some samples, for example the log of a negative number, but I briefly looked into the scoring metric of the CommonLit competition and it looks like they are using RMSE, which should be well defined for all samples.

reef bough Aug 15, 2023, 6:35 AM

#

torpid cipher I do not understand what you mean

Are you sure there isn't any more information about the error in the logs, like the full stack trace?

torpid cipher Aug 15, 2023, 6:40 AM

#

I actually have negative values in my predictions. Can it be that the MCRMSE is not implemented correctly?

#

I'm still a complete beginner. Please excuse me if I'm not doing everything right

#

So it can't be because of that, since both positive and negative values can occur
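
To illustrate why negative values are not a problem for this kind of metric, here is a small sketch of a mean columnwise RMSE. It assumes MCRMSE means "RMSE per target column, averaged over columns"; the data is made up and this is not the competition's own scoring code.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE over rows of equal-length numeric lists."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    col_rmse = []
    for j in range(n_cols):
        # Errors are squared, so the sign of targets and predictions does not matter.
        sq_err = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows))
        col_rmse.append(math.sqrt(sq_err / n_rows))
    return sum(col_rmse) / n_cols


# Made-up targets and predictions with negative values in both columns.
y_true = [[-1.0, 0.5], [0.0, -0.5]]
y_pred = [[-1.0, 1.5], [2.0, -0.5]]
score = mcrmse(y_true, y_pred)
print(score)
```

Because every error is squared before the square root is taken, the score is defined for any finite inputs, positive or negative.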

#

may I post some of the logs here?

limber moat Aug 15, 2023, 7:56 AM

#

How and where I can get a reason why my result disappeared from final leaderboard at ICR competition?

open isle Aug 15, 2023, 1:05 PM

#

Usually results disappear from the leaderboard when admins decide that there was some kind of rule breaking.

elder flower Aug 15, 2023, 3:03 PM

#

Should I reply to kaggle competition admin that sent me instructions after the competition to ask some questions about it?

open isle Aug 15, 2023, 3:05 PM

#

I think it would be a good idea

elder flower Aug 15, 2023, 3:48 PM

#

Are kaggle winnings considered like lottery wins or like income for taxation?

dense atlas Aug 15, 2023, 4:32 PM

#

Probably depends on your jurisdiction

solid dome Aug 19, 2023, 8:22 AM

#

Has anyone gotten their silver medal converted to bronze? Got a mail for achieving silver on a notebook which I saw on Kaggle itself. But now after 2 hours, the medal is again bronze. Any ideas about it? The votes are still the same!

reef bough Aug 19, 2023, 12:18 PM

#

solid dome Has anyone gotten their silver medal converted to bronze? Got a mail for achievi...

Someone may have downvoted the notebook. Even if the vote count looks the same, not all votes count the same towards medals.

solid dome Aug 19, 2023, 1:26 PM

#

Ohh I see

solid dome Aug 19, 2023, 3:31 PM

#

Yea that's what I thought so asked it here

reef bough Aug 20, 2023, 3:17 PM

#

ohh, didn't know about that, my bad. I guess, my comment is only applicable for comments and discussion then @solid dome

verbal crest Aug 21, 2023, 4:09 AM

#

solid dome Has anyone gotten their silver medal converted to bronze? Got a mail for achievi...

Sounds like we might have a bug where the email is being sent on different logic than the medal is awarded. I'll make sure we look into it! The vote logic is very well tested, so it's very likely the email is what went wrong here.

coral pebble Aug 21, 2023, 5:24 AM

#

@solid dome this is because someone retracted their upvote. He gave you the vote which is needed for the silver medal then deleted his account / retracted it a few hours later

solid dome Aug 21, 2023, 8:52 AM

#

verbal crest Sounds like we might have a bug where the email is being sent on different logic...

Hi @verbal crest , I have seen the medal on Kaggle and the notebook also showed the silver medal icon. Email as well Kaggle notifications both showed silver. Attaching photos of the Kaggle notification as well as the notebook.

verbal crest Aug 21, 2023, 8:53 AM

#

@solid dome The scenario bogoconic1 mentioned above is a very likely cause. This sort of thing happens all the time, of course. Our system constantly recalculates medals based on the requirements, and it is possible to lose or downgrade medals.

#

Typically if you wait a little bit, you'll get some more upvotes and it will upgrade again.

solid dome Aug 21, 2023, 8:54 AM

#

@verbal crest got it. Thanks for the clarification.

desert tusk Aug 23, 2023, 11:13 AM

#

why doesn't kaggle have any competition for audio for newbie?

verbal crest Aug 23, 2023, 4:19 PM

#

desert tusk why doesn't kaggle have any competition for audio for newbie?

Audio competitions are pretty rare on Kaggle, but I agree it would be pretty cool if we had a beginner competition in that category

desert tusk Aug 24, 2023, 7:19 AM

#

verbal crest Audio competitions are pretty rare on Kaggle, but I agree it would be pretty coo...

How can we advance it? I believe that we can take an open dataset and establish a competition.

slim meadow Aug 24, 2023, 7:54 AM

#

sorted_by_flavor_and_unitssold.to_markdown(max_rows = 20) This is throwing an error in Kaggle. What am I doing wrong?

#

max_rows int, optional
Maximum number of rows to display in the console.

copper carbon Aug 26, 2023, 5:43 PM

#

Hello, is it possible for a notebook to have >= 5 non-novice votes, but still not be awarded a bronze medal?

coral pebble Aug 28, 2023, 2:18 PM

#

copper carbon Hello, is it possible for a notebook to have >= 5 non-novice votes, but still no...

Yes

light nest Aug 28, 2023, 11:26 PM

#

hey guys , is it possible for ttest to return 0 as p-value ?

deft fox Aug 28, 2023, 11:34 PM

#

elder flower Are kaggle winnings considered like lottery wins or like income for taxation?

In the USA, lottery wins (and Kaggle wins) are considered an income. It may be different in other countries.

deft fox Aug 28, 2023, 11:36 PM

#

solid dome Has anyone gotten their silver medal converted to bronze? Got a mail for achievi...

Pretty sure that if a user who upvoted your notebook was deleted, their vote goes away. If you were that close to silver, you will get it again.

deft fox Aug 28, 2023, 11:38 PM

#

copper carbon Hello, is it possible for a notebook to have >= 5 non-novice votes, but still no...

Yes, happens all the time. Not sure if you want me to go into detail, but a short version is that people who upvote you frequently don't have their votes counted. Supposed to prevent the gaming of the voting system. I have hundreds of posts with >1 non-novice votes without a bronze medal, and probably dozens of posts >7-8 votes without a silver medal. Similar for >12-13 votes without a gold medal.

deft fox Aug 28, 2023, 11:41 PM

#

desert tusk How can we advance it? I believe that we can take an open dataset and establish...

You can create a competition of any kind you like. Yet featured audio competitions are rare, presumably because competition hosts deal with other types of data.

copper carbon Aug 29, 2023, 5:57 AM

#

deft fox Yes, happens all the time. Not sure if you want me to go into detail, but a shor...

that's fair, though i feel they should probably mention this on the progression system page (just like they do for discussions)

desert tusk Aug 29, 2023, 6:45 AM

#

deft fox You can create a competition of any kind you like. Yet featured audio competition...

How can I help with that? I can try to do my best.

deft fox Aug 29, 2023, 7:03 AM

#

desert tusk How can I help with that? I can try to do my best.

There is a guide about community competitions. https://www.kaggle.com/c/about/community

Kaggle Competitions

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

deft valley Aug 29, 2023, 7:13 AM

#

Not sure if this can be talked about but where is the unlearning competition? The announcement said it would be on kaggle ages ago and it has to be done before neurips.

desert tusk Aug 29, 2023, 9:18 AM

#

deft fox There is a guide about community competitions. https://www.kaggle.com/c/about/co...

If I want to take an open dataset for a specific problem, could that cause any problems? E.g. https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0

mozilla-foundation/common_voice_11_0 · Datasets at Hugging Face

#

I am not a lawyer and I don't have any understanding of that

elder warren Aug 29, 2023, 9:40 AM

#

Hi @everyone,
Can someone please explain to me why the first output is a signed zero, even though arithmetically both should be unsigned zeros?

dense atlas Aug 29, 2023, 9:57 AM

#

Float conversion… this is quite dangerous, esp with if conditions. Solution is to mind your type and use int() if you expect an int.
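
The original screenshot is not available, so the following is only a sketch of how a signed zero can arise in Python and why converting to `int` removes it. The expressions are assumptions for illustration, not the code from the question.

```python
import math

neg_zero = -1.0 * 0   # float arithmetic keeps the sign bit: -0.0
pos_zero = 1.0 * 0    # 0.0
int_zero = -1 * 0     # integer arithmetic has no signed zero: 0

print(neg_zero, pos_zero, int_zero)

# The two float zeros compare equal even though they print differently.
print(neg_zero == pos_zero)

# copysign exposes the sign bit that == ignores.
print(math.copysign(1.0, neg_zero))

# Converting to int drops the sign.
print(int(neg_zero))
```

Since `-0.0 == 0.0` is true, comparisons behave as expected; the difference mainly shows up in printed output and in operations that read the sign bit.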

coral pebble Aug 29, 2023, 11:19 AM

#

copper carbon that's fair, though i feel they should probably mention this on the progression ...

Votes that do not count towards medals also do not count towards points

abstract sequoia Aug 29, 2023, 4:29 PM

#

I need to evaluate the output of an ML model using MATLAB, but don't have a license. Can someone run a script for me?

tulip hare Aug 29, 2023, 4:32 PM

#

Could a server admin please change my server nickname to "Chris Akiki" ? 🙏

verbal crest Aug 29, 2023, 4:33 PM

#

tulip hare Could a server admin please change my server nickname to "Chris Akiki" ? 🙏

On this discord accounts are linked to Kaggle, so if you change your Kaggle profile name it will update here automatically.

tulip hare Aug 29, 2023, 4:34 PM

#

verbal crest On this discord accounts are linked to Kaggle, so if you change your Kaggle prof...

I would rather keep my full name on my Kaggle profile, but it's not a big deal. Thank you for letting me know 🙂

verbal crest Aug 29, 2023, 4:35 PM

#

Totally get the desire to differentiate. Right now we're sticking with the linking since we really want people to be able to find each other on Kaggle.com too.

vapid jungle Aug 29, 2023, 4:35 PM

#

Hey, I am doing a project with proteins and ligands in the form of mol2 and pdb files. Would anyone happen to know the best way to encode the files into a fixed-length vector while considering both structures?

frigid dove Aug 29, 2023, 5:30 PM

#

Hey, I need resources on deploying ML models - Pytorch, Tensorflow based ones.

I want to know the best industry practices followed in deployment.

Any book/article related with it is appreciated.

ashen reef Aug 29, 2023, 6:36 PM

#

@verbal crest , is it still unlinked

verbal crest Aug 29, 2023, 6:37 PM

#

@ashen reef It's linked now! 🎉

ashen reef Aug 29, 2023, 6:37 PM

#

Haa.. Finally, thanks..!!

stable pivot Aug 29, 2023, 6:52 PM

#

hey , anyone familiar with the flair library , i need some help !

raven peak Aug 29, 2023, 7:01 PM

#

Anyone got any experience with similarity score scripting with chroma vector store?

past spade Aug 29, 2023, 7:07 PM

#

I missed the BIPOC cohort deadlines. Does anyone know how often Kaggle organises such cohorts?

verbal crest Aug 29, 2023, 7:10 PM

#

@past spade Not sure when our next one will be, but I'd say it's probably about 6 months away or so. In the meantime you can learn a lot from all the helpful people here in the discord!

past spade Aug 29, 2023, 7:10 PM

#

verbal crest @past spade Not sure when our next one will be, but I'd say it's prob...

Oh okay cool, thanks!

lilac sierra Aug 29, 2023, 7:14 PM

#

What happened with plans to start the machine unlearning challenge in mid-August?
Will this competition appear in the near future?

verbal crest Aug 29, 2023, 7:26 PM

#

lilac sierra What happened with plans to start the machine unlearning challenge in mid-August...

It's in the works

slate locust Aug 29, 2023, 7:26 PM

#

I need help on how to use the lasio python Library in Kaggle. I thought these libraries were auto imported. I have tried

!pip install lasio

..on a separate cell but it didn't work. At one time, it gave a network connectivity error but my wifi was good and fast.

I couldn't find any console to input commands either.

Please I need guidance 🙏

placid valve Aug 29, 2023, 9:56 PM

#

hey there I am using R and Error in predict.xgb.Booster(xgb_model, test_dmatrix) :
Feature names stored in object and newdata are different! this pops up can anybody take a look at my code

torpid crater Aug 29, 2023, 9:59 PM

#

Hey all✌️ Anyone here ever messed around with creating synthetic datasets to train theorem based neural networks for search and rec.? Basically hierarchy logic of domain specific data. DM me would love any suggestions 👍

deft fox Aug 30, 2023, 12:21 AM

#

placid valve hey there I am using R and Error in predict.xgb.Booster(xgb_model, test_dmatrix)...

Not sure how anyone can look at your code when you didn't provide a link. Based on the error message Feature names stored in object and newdata are different! I would conclude that you have different features in train and test data (the matrix you are trying to predict). This is usually the ID column or something similar. So whatever features you are adding or removing in the train data, make sure the same operations are applied to test data.
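
The check suggested above can be sketched in a few lines. The question is about R, but this sketch is in plain Python with hypothetical column names; it does not call xgboost and only shows the idea of finding columns that exist on one side and putting test columns in the training order.

```python
def feature_mismatch(train_features, test_features):
    """Return names present in only one of the two feature lists."""
    only_train = [f for f in train_features if f not in test_features]
    only_test = [f for f in test_features if f not in train_features]
    return only_train, only_test


def align_row(row, train_features):
    """Reorder a test row (dict) to the training feature order, dropping extras."""
    return [row[name] for name in train_features]


# Hypothetical column names for illustration only.
train_features = ["age", "income", "score"]
test_features = ["id", "score", "age", "income"]

print(feature_mismatch(train_features, test_features))

test_row = {"id": 7, "score": 0.3, "age": 41, "income": 52000}
print(align_row(test_row, train_features))
```

Here the leftover `id` column in the test data is the mismatch; dropping it and using the training column order makes the two sides agree.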

deft hearth Aug 30, 2023, 12:32 AM

#

Hello Kaggle community, I want to know if I can invite people to this discord server.

verbal crest Aug 30, 2023, 12:35 AM

#

deft hearth Hello Kaggle community, I want to know if I can invite people to this discord se...

Yes, you are very welcome to! Our custom invite link is discord.gg/kaggle

primal wedge Aug 30, 2023, 1:36 AM

#

Do you know how to deploy an ML model on a website using onnxruntime with the Next.js framework

#

?

hot dock Aug 30, 2023, 3:01 AM

#

Hello guys. Just started doing Kaggle and I’m curious how do you guys handle large image datasets. I am currently working on the RSNA challenge. It’s some 400 gb so I don’t think it’s possible to download locally. What would be the best option for online computing with persistent storage?

deft fox Aug 30, 2023, 3:48 AM

#

hot dock Hello guys. Just started doing Kaggle and I’m curious how do you guys handle lar...

If you don't have the storage and the bandwidth to download these files - most people don't - my suggestion is to work with them directly on Kaggle. No need to download or move anything - simply write notebooks and do the training right there.

hidden estuary Aug 30, 2023, 5:35 AM

#

I'm a newbie data scientist. What is the best way for me to advance in this field? I'm also currently pursuing my master's.

subtle dagger Aug 30, 2023, 7:11 AM

#

Hello, I have recently come across a problem trying to use kaggle website as the headers and other parts of the UI are overlapping, making it difficult to use, as shown in the images below. I wanted to know if the problem is caused by settings in my browser (even though I have faced the same issue in both Chrome and Firefox) and how I can fix it.

graceful axle Aug 30, 2023, 8:32 AM

#

I am really interested to know more about clustering algorithms from people who have used them. For example, perhaps data is broken down by age, gender, race, country, language. Standard questions to ask on forms. I know that in clustering, principal components of the cluster grouping boundaries don’t necessarily align with the predefined categories that set the axes. In fact, discovering structures in the data is the point. I have only used clustering at the very beginner level though. To what extent do demographics data with unusual individuals result in outliers from any cluster? This is a question I’ve been curious about for some time. Since I’m new to this discord I’m not sure if it’s too far off topic or if it’s a reasonable learning question. Could you please let me know if I should delete? I think it is clearly on the subject of data science but not a specific kaggle competition.

#

I would love to know how this works if anyone knows though

#

Like a social scientist or someone

placid valve Aug 30, 2023, 8:35 AM

#

xgb_train <- xgb.DMatrix(data = as.matrix(train_data_main), label = a)
xgb_test <- xgb.DMatrix(data = as.matrix(test_data_main))

bst<-xgboost(data = as.matrix(train_data_main), label = a, max.depth = 6,
eta = 0.3, nthread = 2, nround = 100, objective = "reg:absoluteerror")

I have a code like this how can I easily optimize it

zinc walrus Aug 30, 2023, 11:03 AM

#

graceful axle I am really interested to know more about clustering algorithms from people who ...

Hi @graceful axle ! I attempted clustering using purchasing behavior (I know it’s not exactly demographics as you asked). The data I used does contain a bunch of people whose purchasing trends are what could be called outliers. My understanding is that people in one cluster aren’t going to be exactly alike, but more similar to those in the same cluster than to people who are in a different cluster. https://www.kaggle.com/code/mounikagoruganthu/mathematical-distance-in-ml

Mathematical Distance in ML

Explore and run machine learning code with Kaggle Notebooks | Using data from [Private Datasource]

slate locust Aug 30, 2023, 1:58 PM

#

Please I'm still stuck. Any kind assistance will be greatly appreciated.


hot dock Aug 30, 2023, 2:29 PM

#

what online gpu platform should I use? I want to use about 1 TB of persistent storage. Thanks in advance.

verbal crest Aug 30, 2023, 4:03 PM

#

subtle dagger Hello, I have recently come across a problem trying to use kaggle website as the...

The issue you are facing here is that none of our icons are loading and instead you are seeing their alt text on the page. I'm not sure what's causing them all to fail, I'll share it with the team.

subtle dagger Aug 30, 2023, 5:04 PM

#

verbal crest The issue you are facing here is that none of our icons are loading and instead ...

Thank you for your support.

vivid owl Aug 30, 2023, 6:53 PM

#

subtle dagger Hello, I have recently come across a problem trying to use kaggle website as the...

Interesting. I don't have the same issue. Maybe your browser setting?!

fleet ingot Aug 30, 2023, 8:07 PM

#

hi everyone, can anyone tell me that how can we extract data from mobile applications like API permissions just like the CSV file i attached . I need this for my thesis research. @verbal crest

📎 Android.csv

green haven Aug 30, 2023, 8:19 PM

#

fleet ingot hi everyone, can anyone tell me that how can we extract data from mobile applica...

Hey, what type of data are you interested in?

fleet ingot Aug 30, 2023, 8:20 PM

#

green haven Hey, what type of data are you interested in?

Which permissions an application requests.

pliant ermine Aug 30, 2023, 11:58 PM

#

Has anyone read the Kaggle Workbook? I was going to use it to see whether I can do my first Kaggle competition with it. It's from Packt and not that well known, or at least I hadn't heard about it before. It's available in a Humble Bundle now.

subtle dagger Aug 31, 2023, 4:27 AM

#

vivid owl Interesting. I don't have the same issue. Maybe your browser setting?!

Which browser setting do you think would bring such changes?

verbal crest Aug 31, 2023, 6:59 AM

#

subtle dagger Which browser setting do you think would bring such changes?

Where are you accessing Kaggle from? We've previously had issues with China's firewall blocking Google's CDN causing this bug specifically. Otherwise maybe something else is blocking that specific resource from loading. (We are also internally looking to try and fix this bug, but it might take a little while - it only happens rarely).

subtle dagger Aug 31, 2023, 8:33 AM

#

verbal crest Where are you accessing Kaggle from? We've previously had issues with China's fi...

I am accessing it from Ethiopia.

rotund moth Aug 31, 2023, 10:34 AM

#

Hello everyone! I am new in ML and did some basic models, feature engineering etc. Can anyone recommend me some basic knowledge competitions? I already did titanic and Spaceship competition. Thank you!

haughty basin Aug 31, 2023, 10:50 AM

#

Hello everyone , I am thinking to start on ASR and LLM . Can anyone please suggest me a proper roadmap to start it

charred scroll Aug 31, 2023, 11:58 AM

#

Hello, everyone, where can I submit a bug/improvement for the Learn notebooks on the platform?
Answer

Product Feedback | Kaggle

Product Feedback.

languid tartan Aug 31, 2023, 1:06 PM

#

Hello, I heard that Kaggle has demos in Google Cloud Next, how can I find those?

vivid owl Aug 31, 2023, 1:13 PM

#

languid tartan Hello, I heard that Kaggle has demos in Google Cloud Next, how can I find those?

I'd love to know which session that might be also. I searched the Session Library, but couldn't find it.

But the first step is to register for complimentary access to recorded sessions via a Digital Pass: https://cloud.withgoogle.com/next/

Experience Google Cloud Next ’23

Google Cloud Next ’23 is back - in person on August 29-31 in San Francisco! Connect with me and 15,000+ peers for product announcements, sneak peeks into future roadmaps, on-site demos with experts and partners, plus hands-on training & certification opportunities. g.co/cloudnext

fresh flume Aug 31, 2023, 1:31 PM

#

Hello, I have only one learning path in my Google Cloud Skills Boost and my Mentor @trim lotus informed me that I am supposed to have more than one. May I kindly request that this issue be resolved? Thank you.

#

green haven Aug 31, 2023, 2:07 PM

#

Hey, how are you? My name is Matviy and I am a high schooler from Ukraine. I would just like a quick word of advice. I got perfect accuracy on this model, so I thought it was perfect, but I googled it and Google said it is more than likely a false accuracy. Could you let me know what you think about this?

hot dock Aug 31, 2023, 2:40 PM

#

green haven Hey, how are you? My name is Matviy and I am a high schooler from Ukraine. I wou...

It's probably overfitting, meaning the model has "memorized" the dataset. Usually you need to split the data into two sets, one for training and one for validation/testing. You would take the accuracy from the second set.
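
A minimal sketch of that check, assuming scikit-learn and a synthetic dataset (the data, model, and parameters below are illustrative and not taken from the original question): compare accuracy on the training split with accuracy on the held-out split. A large gap between the two points to overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data: flip_y assigns random labels to about 20% of samples.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An unrestricted decision tree is able to memorize its training set.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = model.score(X_test, y_test)     # accuracy on held-out data

print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

If the held-out accuracy is also perfect, it may be worth checking whether the target (or a column derived from it) has leaked into the features.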

green haven Aug 31, 2023, 2:47 PM

#

Yeah, I did split the data with:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

sweet ice Aug 31, 2023, 3:18 PM

#

Hey Kagglers, if any senior data scientist or machine learning engineer would like to mentor me, I would be very happy and pleased with that.

rotund moth Aug 31, 2023, 3:25 PM

#

Can I use the chi2 test and the Pearson correlation coefficient in a dataset containing both numerical and categorical variables?
I have a dataset which contains both numerical and categorical variables, so can I use the two techniques separately to select features? For example, A, B, C, D, E are my columns, where A and B are categorical, so I'll use the chi2 test for them, whereas C and D are numerical, so I'll use the Pearson coefficient. E is my target and can be either categorical or numerical.

sweet ice Aug 31, 2023, 3:31 PM

#

rotund moth Can I use chi2 test and pearson correlation coefficient in dataset containing bo...

Yes, you can use the chi-squared test for categorical variables and the Pearson correlation coefficient for numerical variables to select features in a dataset. Adjust your approach for variable E based on whether it's categorical or numerical. Remember to interpret results carefully and consider the broader context of your data.
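
As a rough sketch of the two tests, assuming pandas and SciPy (the helper names below are hypothetical): the chi-squared test of independence fits a categorical feature paired with a categorical target, while Pearson's r fits a numerical feature paired with a numerical target. Other pairings need a different test.

```python
import pandas as pd
from scipy.stats import chi2_contingency, pearsonr

def chi2_pvalue(feature, target):
    """p-value of the chi-squared independence test for two categorical variables."""
    table = pd.crosstab(pd.Series(feature, name="feature"),
                        pd.Series(target, name="target"))
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

def pearson_r(feature, target):
    """Pearson correlation coefficient for two numerical variables."""
    r, _ = pearsonr(feature, target)
    return r

# Illustrative data only: A is categorical, C is numerical.
A = ["x"] * 20 + ["y"] * 20
E_categorical = [0] * 20 + [1] * 20
C = [1.0, 2.0, 3.0, 4.0, 5.0]
E_numerical = [2.0, 4.0, 6.0, 8.0, 10.0]

print(chi2_pvalue(A, E_categorical))  # a small p-value suggests A and E are related
print(pearson_r(C, E_numerical))      # values near +1 or -1 suggest a linear relationship
```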

rotund moth Aug 31, 2023, 3:33 PM

#

sweet ice Yes, You can use the chi-squared test for categorical variables and the Pearson ...

Thank you. Doesn't it affect my model if I use a test on some variables while excluding others?

sweet ice Aug 31, 2023, 3:45 PM

#

rotund moth Thank you. I want to know doesn’t it affect my model if I use some test on some ...

For datasets containing both categorical and numerical features, consider using metrics like F1 score and AUC-PR (Area Under the Precision-Recall Curve). These metrics are well-suited to handle the mixed nature of your data and provide valuable insights into your classification model's performance.

gleaming beacon Aug 31, 2023, 4:58 PM

#

I have a question, maybe a silly one... Can anyone tell me how effective DataCamp and Udemy courses are for data science?

thick glacier Aug 31, 2023, 6:28 PM

#

Hello everyone! I've just completed my first work on a classification algorithm using a spam email dataset. I would love to hear your thoughts and suggestions for any improvements I can make. Your insights would be greatly appreciated!

https://www.kaggle.com/dinanksoni/spam-email-classification

Spam Email Classification

Explore and run machine learning code with Kaggle Notebooks | Using data from Email Spam Classification Dataset CSV

graceful axle Aug 31, 2023, 6:29 PM

#

How to resolve this error?

py
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Error: 
ValueError: Expected 2D array, got 1D array instead:
array=[1232.  677.  221. ... 1294.  860. 1126.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

thick glacier Aug 31, 2023, 6:30 PM

#

reshape your x_train using array.reshape(1,-1)

graceful axle Aug 31, 2023, 6:30 PM

#

thick glacier reshape your x_train using array.reshape(1,-1)

Ahh , thanks

graceful axle Aug 31, 2023, 6:32 PM

#

thick glacier reshape your x_train using array.reshape(1,-1)

Did that, but it still didn't work

thick glacier Aug 31, 2023, 6:33 PM

#

Check which of your X_train and X_test is a 1D array, and change it into a 2D array.

graceful axle Aug 31, 2023, 6:35 PM

#

ok

deft fox Aug 31, 2023, 7:57 PM

#

graceful axle How to resolve this error? ```py py scaler = StandardScaler() scaler.fit(X_trai...

The answer was already provided in the error message: Reshape your data using array.reshape(-1, 1) rather than .reshape(1, -1). This is assuming you have a single column of data, or a single feature as described in the error. It is unlikely that you have a single sample, so array.reshape(1, -1) probably would not work. Also, instead of array you need to have X_train or X_test, meaning the actual name of the array.
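
A small sketch of that fix, assuming the data really is one feature stored as a 1D NumPy array (the numbers and the helper name are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative single-feature data stored as 1D arrays.
X_train = np.array([1232.0, 677.0, 221.0, 1294.0, 860.0, 1126.0])
X_test = np.array([500.0, 1000.0])

def scale_single_feature(train, test):
    """Reshape 1D arrays to (n_samples, 1), fit on train only, then transform both."""
    train_2d = np.asarray(train).reshape(-1, 1)
    test_2d = np.asarray(test).reshape(-1, 1)
    scaler = StandardScaler().fit(train_2d)
    return scaler.transform(train_2d), scaler.transform(test_2d)

X_train_scaled, X_test_scaled = scale_single_feature(X_train, X_test)
print(X_train_scaled.shape, X_test_scaled.shape)
```

If X_train is a pandas Series rather than an array, X_train.to_frame() gives the same 2D shape.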

rough cosmos Aug 31, 2023, 8:12 PM

#

Hi, I am new to Kaggle competitions and have a few questions. Are we allowed to use an LLM like Llama 2 for the CommonLit - Evaluate Student Summaries competition?
It says "Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models".
Can we use a model we train and upload to huggingface?
Thanks

deft fox Aug 31, 2023, 8:28 PM

#

rough cosmos Hi, I am new to kaggle competitions and have a few questions. Are we allowed to ...

Can we use a model we train and upload to huggingface? No, but you can use a model that you train and upload to Kaggle. The key is that you can't use an internet connection, but anything uploaded to Kaggle can be accessed directly even with internet turned off.

green haven Aug 31, 2023, 8:30 PM

#

@deft fox this is kinda random but you said you got your first computer in 1984. Was it the first macintosh?

deft fox Aug 31, 2023, 8:44 PM

#

green haven <@1050260888667557999> this is kinda random but you said you got your first comp...

Commodore 64. My first exposure to Macintosh was in 1989, which at that point was Mac II.

green haven Aug 31, 2023, 8:44 PM

#

Pretty cool, thanks for that fun fact.

regal plank Aug 31, 2023, 11:26 PM

#

I am wondering:

code:
from sklearn.tree import DecisionTreeRegressor

Why do we import the specific function instead of just: import sklearn

graceful axle Sep 1, 2023, 1:27 AM

#

deft fox The answer was already provided in the error message: **Reshape your data either...

👍

deft fox Sep 1, 2023, 2:53 AM

#

regal plank I am wondering: code: ``from sklearn.tree import DecisionTreeRegressor`` Why d...

You don't have to import individual names, but there are at least 2 reasons to do so: 1) it keeps your namespace small and makes it clear which parts of sklearn the script uses (the memory difference is usually negligible, since Python loads the containing module either way); 2) later in the script the imported name is called directly (DecisionTreeRegressor), while you would have to type the whole thing if only the package was imported (sklearn.tree.DecisionTreeRegressor). So, clearer imports and less typing.
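
To illustrate the two import styles without depending on sklearn, here is a small sketch using the standard-library json module. One caveat worth knowing: in CPython, importing a single name still loads the whole module, so the practical difference is mostly in how you refer to things afterwards.

```python
import sys

# Style 1: import one name from the module.
from json import dumps

# The whole json module was still loaded, even though only one name was bound.
json_loaded = "json" in sys.modules

# Style 2: import the module and use the dotted name.
import json

# Both styles refer to the very same function object.
same_function = json.dumps is dumps

print(json_loaded, same_function)
print(dumps({"a": 1}))       # short name
print(json.dumps({"a": 1}))  # dotted name
```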

graceful axle Sep 1, 2023, 5:08 AM

#

Is anyone here from Kaggle staff? I would like to DM about my payment.

rustic ether Sep 1, 2023, 5:51 AM

#

Hey guys! I've mostly been doing CV stuff, but I've been looking into Reinforcement Learning, especially with a robot simulation. Is there a pathway or are there free resources where I could look into deep RL with simulations in Unity?

gloomy egret Sep 1, 2023, 5:52 AM

#

I need to feed my data into an LLM, and I am using LoRA to do it, but I have a large amount of text data (nearly 500M tokens). Does that harm the accuracy or efficiency of the model in any way? If so, are there any other methods to input data into LLMs?

rough cosmos Sep 1, 2023, 7:50 AM

#

rustic ether Hey guys! I mostly been doing cv stuff, but I've been looking into Reinforcement...

Not sure about Unity, but Hugging Face has a pretty good intro course for RL

graceful axle Sep 1, 2023, 9:43 AM

#

Hello,

I'm currently working on a time series project, and I intend to employ the EMD+CNN technique for forecasting the output. Upon applying EMD to the training data, I obtained a total of 14 Intrinsic Mode Functions (IMFs). Consequently, I constructed my CNN neural network with dimensions (30100, 20, 14, 1), with 20 representing the window size. However, I encountered an issue when attempting to decompose the test data using EMD, as it produced only 11 IMFs. This inconsistency caused an error when trying to execute the CNN model.

I have two questions: Is there a method to enforce a consistent number of IMFs during the EMD decomposition process? If not, is there an automated way to select the most significant IMFs?

Please note that I am utilizing the EMD-signal library in Python.

Thank you

regal plank Sep 1, 2023, 10:14 AM

#

deft fox You don't have to import individual functions, but there are at least 2 reasons ...

Gotcha, thanks!

sage belfry Sep 1, 2023, 11:12 AM

#

Hi all Kaggle Family!!

I recently published the following post in the Q&A Forum about a two step model for document classification. It would be great if you can have a look and help me with this problem I'm trying to face since I'm a bit lost at this point . Thanks a lot in advance! 🙂

https://www.kaggle.com/discussions/questions-and-answers/436192#2418471

Two Stage Model for Document Classification | Kaggle

Two Stage Model for Document Classification.

regal plank Sep 1, 2023, 12:08 PM

#

How exactly can I check and compare?

If the prediction is only a price, how can I know which home was the input, and hence check it?

The prediction is an array of prices, so how can I check which price is for which home, or even which features are being applied?

If that makes sense.

worn crow Sep 1, 2023, 3:30 PM

#

Hi guys, newbie here. I submitted an answer for Digit Recognizer competition and it was accepted. Now I'm trying to use that model and create a website or desktop application. But I'm stuck. I tried to get a prediction using the test data using the below code, but an error was thrown. What can be the issue here? and as a start, how can I use this model on Gradio to make a simple digit recognizer. TIA!

test = X_test.iloc[0]
pred=model.predict(test)
print(pred)

The first image is the start of the error, and the second image is the end of the error.

thick glacier Sep 1, 2023, 4:20 PM

#

worn crow Hi guys, newbie here. I submitted an answer for Digit Recognizer competition and...

In this prediction you need a 2D array but you pass a 1D array.

#

Always read errors because the solutions are there.
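
A minimal sketch of turning one row into a batch of one, using only NumPy. The expected input shape depends on how the model was trained, which is not shown above, so both helpers below are assumptions: the first suits a model trained on flat feature vectors, the second suits a CNN trained on 28x28 single-channel images.

```python
import numpy as np

def as_single_sample(row):
    """Turn a 1D feature vector into a batch with one sample: shape (1, n_features)."""
    arr = np.asarray(row, dtype=float)
    if arr.ndim != 1:
        raise ValueError("expected a 1D row of features")
    return arr.reshape(1, -1)

def as_single_image(row, side=28):
    """Turn a flat row of side*side pixels into shape (1, side, side, 1)."""
    arr = np.asarray(row, dtype=float)
    return arr.reshape(1, side, side, 1)

# Illustrative stand-in for X_test.iloc[0]: 784 pixel values.
row = np.zeros(784)
print(as_single_sample(row).shape)
print(as_single_image(row).shape)
```

With pandas, X_test.iloc[[0]] (double brackets) also keeps the row as a 2D DataFrame instead of a 1D Series.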

dapper stratus Sep 1, 2023, 7:40 PM

#

did anyone mess around w yolo enough i can ask them a question

deft fox Sep 1, 2023, 7:45 PM

#

dapper stratus did anyone mess around w yolo enough i can ask them a question

By doing it this way you are basically asking someone to commit to answering your next question before they see it. I suggest you ask your question, and it may or may not be answered. That's the nature of these types of forums.

dapper stratus Sep 1, 2023, 7:46 PM

#

ok my bad

deft fox Sep 1, 2023, 7:46 PM

#

dapper stratus ok my bad

Nothing bad about it, just ask a question.

dapper stratus Sep 1, 2023, 7:46 PM

#

I have used YOLOv8 to train my object detection model on a bunch of pics of apples and bananas

#

it generated a train folder that looks like this

#

#

I am now facing difficulties trying to use this to test on new pics that I have

#

I am done with the training part, but I can't test on the pics I have of apples and so on

blazing thicket Sep 1, 2023, 9:47 PM

#

Hello everyone. I'm in the Intro to SQL course and facing some issues. Is anyone there to help me?

wheat furnace Sep 1, 2023, 11:29 PM

#

blazing thicket Hello Everyone. I;m in the Intro to SQL Course facing some issues anyone there t...

What's the issue?

blazing thicket Sep 2, 2023, 12:30 AM

#

wheat furnace What's the issue?

Hey. Thank you, but it's solved. 🙂

heavy oasis Sep 2, 2023, 4:54 AM

#

Where could I get data on fire events globally?

nova tulip Sep 2, 2023, 5:01 AM

#

I'm a data scientist working remotely. Does anybody have recommendations on which country to migrate to, considering taxes, culture and all of that? (Not really important, but I would prefer a cold country, though I'm still open to any country.)

slate pulsar Sep 2, 2023, 6:05 AM

#

Hi there,
I have performed EDA on a dataset, but the notebook is not shown in the notebook section of that dataset
how can I have my notebook there?

solar plank Sep 2, 2023, 10:48 AM

#

Hey there, I am just starting off with Kaggle,
is there any list/sheet for different Kaggle Datasets to practice for beginners (equivalent to LeetCode 75 for example) to learn and implement different ML approaches?

thorny mirage Sep 2, 2023, 11:44 AM

#

dapper stratus

Hey there, try model = YOLO("runs/detect/train5/weights/best.pt"), or look in similar subfolders such as runs/detect/train5.

#

A little trick to find it:

❯ find runs/detect | grep -i best.pt
runs/detect/train5/weights/best.pt
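
A Python version of the same trick, as a sketch: it searches a runs folder for best.pt files and lists the newest first. The helper name is hypothetical, and the commented usage assumes the ultralytics package with placeholder paths.

```python
from pathlib import Path

def find_best_weights(runs_dir="runs/detect"):
    """Return all best.pt files under runs_dir, most recently modified first."""
    root = Path(runs_dir)
    if not root.is_dir():
        return []
    candidates = [p for p in root.rglob("best.pt") if p.is_file()]
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)

for path in find_best_weights():
    print(path)

# Hypothetical usage with the ultralytics package (not run here):
# from ultralytics import YOLO
# model = YOLO(str(find_best_weights()[0]))
# results = model.predict(source="path/to/new_images", save=True)
```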

twin lion Sep 2, 2023, 1:33 PM

#

Hey, hope you all are doing well. I would like some ideas for projects that can help me land a job in machine learning

dense atlas Sep 2, 2023, 2:35 PM

#

slate pulsar Hi there, I have performed EDA on a dataset, but the notebook is not shown in th...

You need to make your notebook public (in the editing tab)

slate pulsar Sep 2, 2023, 3:30 PM

#

dense atlas You need to make your notebook public (in the editing tab)

Got it
Thanks

wraith ledge Sep 2, 2023, 11:44 PM

#

Is it a good idea to post a notebook on statistical methods for data analysis, such as distribution methods, to get more upvotes?

green haven Sep 3, 2023, 3:21 PM

#

Hahah I recall now I am enrolled in a course.

junior walrus Sep 3, 2023, 3:27 PM

#

@vast relic : i need some responses regarding machine learning survey forms can i post here ?

amber flax Sep 3, 2023, 5:40 PM

#

Hi! For the past few days, I've been trying to fine-tune a model using TPU parallelism / FSDP with a Kaggle TPU notebook. The reason I need to set up FSDP is because the model I'm using is very large (Openlm's open llama 3b v2). When I try to fine-tune it, I quickly run out of memory on the TPU. I'm not sure where to even begin with trying to get this to work, I was able to find this article in the documentation of Hugging Face Transformers Trainer, but I don't understand what I'm supposed to be doing...

Link: https://huggingface.co/docs/transformers/main_classes/trainer#pytorchxla-fully-sharded-data-parallel

My current code: https://www.kaggle.com/code/starblasters8/fine-tuning-llama

Any help would be greatly appreciated!!

Fine-Tuning-Llama

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

Trainer

hidden plinth Sep 3, 2023, 8:10 PM

#

Hi, has anyone come from an unrelated bachelors degree to a masters? Or have gotten into the field through alternative means other than achieving a Bachelors?

#

Currently getting a bachelors in an unrelated, but statistic heavy degree that I am completely uninterested in. I am looking to get into data science since the only thing I really enjoyed about my degree so far has been the stats lol.

unreal portal Sep 4, 2023, 12:13 PM

#

Two questions related to creating a kaggle dataset:

Isn't the data limit per dataset supposed to be 100GB? I currently have a dataset of size ~50GB and when trying to upload an additional ~16GB of data it says I'm exceeding the size limit.
I have uploaded my data in batches (see attached image) but want to unpack the individual folders so that all the data is in one single folder. How do i do this?

pliant rune Sep 4, 2023, 3:08 PM

#

Hello everyone, I'm currently working on estimating the market size of the retail credit market in South Africa, and I'm facing some challenges. I'd appreciate your insights and suggestions on which statistical models or methods might be suitable for this task. Additionally, if anyone has experience or expertise in market sizing, I'd be grateful for any guidance or best practices you can share. Thank you in advance for your assistance!

grave echo Sep 4, 2023, 3:08 PM

#

hidden plinth Hi, has anyone come from an unrelated bachelors degree to a masters? Or have got...

Hey there! I think you'd find a lot of people from non-CS backgrounds working in the DS / ML space. I have a bachelors in core electrical engineering but I didn't like the domain so I shifted to machine learning and general computer science on my own during undergrad. It just takes extra time and effort to make the shift. Easier to do it while you're a student.

pure burrow Sep 4, 2023, 4:46 PM

#

Hello everybody, I am new and I have a problem with an exercise notebook. I deleted some part of the initial code and I want to restart the file from the beginning.

hidden plinth Sep 4, 2023, 5:54 PM

#

grave echo Hey there! I think you'd find a lot of people from non-CS backgrounds working in...

Thanks for the reply! Good to know, I am thinking of pursuing a masters in DS but I think most of the programs have a ton of Prereqs

foggy monolith Sep 4, 2023, 6:00 PM

#

hidden plinth Hi, has anyone come from an unrelated bachelors degree to a masters? Or have got...

I am in a similar situation myself. I graduated with a degree in Biology but am trying to get into data science after taking biostatistics and using R for my senior thesis. I am pursuing my master's in biological data science, but I think, as in any field, learning the skills that you will likely need on the job and practicing them through your own projects will be crucial in landing a job.

hidden plinth Sep 4, 2023, 6:42 PM

#

foggy monolith I am in a similar situation myself. I graduated with a degree in Biology but am ...

I totally agree. I am a psych major but my school focuses heavily on research. I have taken about 6 different research stats classes. My fear is that most of my prereqs won't translate when I go to apply for a master's. Also, I have only ever used SPSS; we never really got to mess around with R or any database programs. From my understanding, STEM degrees have a far better chance of getting into those types of programs than social sciences.

zinc thistle Sep 4, 2023, 8:49 PM

#

is this normal?

#

when i run the code it sometimes has an error

#

and then it sometimes works

cedar violet Sep 4, 2023, 8:50 PM

#

zinc thistle is this normal?

what error do you get

zinc thistle Sep 4, 2023, 8:51 PM

#

#

im trying to do something with this data:
https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023

Most Streamed Spotify Songs 2023

Hottest Spotify Hits 🎵

cedar violet Sep 4, 2023, 9:04 PM

#

zinc thistle

Looks like there's a row that contains inconsistent data, in this case the long string, so your model cannot interpret it. Try checking the row that contains this particular value and see if you can drop it.

zinc thistle Sep 4, 2023, 9:29 PM

#

cedar violet looks like there's a row that contains an inconsistent data, in this case the lo...

found it, thanks

deft fox Sep 4, 2023, 9:36 PM

#

zinc thistle

The line at the top and the bottom tell you what the problem is. Decision trees want purely numerical data, and you seem to have a mix of numerical and categorical variables. All non-numerical features must be converted to numbers.
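
A rough sketch of both suggestions with pandas: find rows where a supposedly numerical column holds text, then convert the remaining categorical columns to numbers. The column names and values below are made up for illustration and are not the actual Spotify columns.

```python
import pandas as pd

def find_non_numeric_rows(df, column):
    """Return the rows where `column` holds a value that cannot be parsed as a number."""
    parsed = pd.to_numeric(df[column], errors="coerce")
    return df[parsed.isna() & df[column].notna()]

def to_numeric_features(df, numeric_cols, categorical_cols):
    """Coerce numeric columns, drop unparseable rows, one-hot encode categoricals."""
    out = df.copy()
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=numeric_cols)
    return pd.get_dummies(out, columns=categorical_cols)

# Illustrative data: the third "streams" value is a stray string.
df = pd.DataFrame({
    "streams": ["100", "250", "not_a_number_at_all"],
    "mode": ["Major", "Minor", "Major"],
})

print(find_non_numeric_rows(df, "streams"))
print(to_numeric_features(df, ["streams"], ["mode"]))
```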

zinc thistle Sep 4, 2023, 10:14 PM

#

zinc thistle found it, thanks

btw here is what was wrong if anyone is curious

inbox_16183763_aed775bd8d9e0b8f15c805c55818d957_csv.png

regal plank Sep 4, 2023, 11:39 PM

#

Hi, I am running a notebook and getting this error ?
I assume this is an older version, as I don't see " > | "

My account is phone verified, where is the connect to the internet setting ?

unreal salmon Sep 5, 2023, 1:16 AM

#

Hi there, I am trying to build a weather classification app with Streamlit. The problem is that my model is over 25 MB (about 87 MB at minimum), which GitHub doesn't allow as per their size restriction. I am thinking of using Git LFS to store a pointer to that file, but I read that Streamlit doesn't interact with Git LFS to fetch the large object from the LFS repository.
I need advice on how I can push my large file into the repo directly for my app to find and use it.

tardy lodge Sep 5, 2023, 1:17 AM

#

regal plank Hi, I am running a notebook and getting this error ? I assume this an older vers...

It's under "Notebook options" in the right-hand pane within the editor

vivid owl Sep 5, 2023, 2:15 AM

#

zinc thistle btw here is what was wrong if anyone is curious

Thank you for sharing. That’s a long string value. Good job catching it! It blended in so well with other numerical values. 🤣

tawdry thorn Sep 5, 2023, 11:21 AM

#

what do I do if my mentor has never shown up or answered my messages?

vivid owl Sep 5, 2023, 12:55 PM

#

tawdry thorn what do I do if my mentor has never shown up or answered my messages?

Hi @tawdry thorn - This is a channel for all the Kaggle members (14 million of them can view this post). I suggest you post KaggleX related topics in channels prefixed with "kx-" that stands for "KaggleX".

Read my message in a KaggleX channel here: https://discordapp.com/channels/1101210829807956100/1145761138248785982/1148605268599513108

thin sequoia Sep 5, 2023, 2:01 PM

#

Hey guys,
I have a general query
So, I started learning ML in June last year, then my college started and my first year was very rigorous, so I had to put my focus on it. Now that it's over, I want to resume my learning. What should I do, and which path do you recommend?
It will be really helpful for me!
Thanks 💛
More context: I did intro and intermediate ml courses on kaggle, participating in 2-3 beginner friendly competitions, and then started ML Specialization by Andrew Ng sir, did 2 courses and made a project.
Now that almost a year has passed, I am not able to recall most of the concepts, like how to handle bias and variance, gradient descent, entropy, etc.

Which path from following should I choose ?

Do a recap of both kaggle courses and a fast revision of both ML specialization courses, and participate in more competitions, also make projects.
Do the recap of kaggle courses and re-learn from both courses in ML specialization and participate in more competitions with projects.

Any other path you know which will help me better than above.
Or anything you would like to add on ?
It will be really helpful for me, I'll appreciate it!

cerulean basin Sep 5, 2023, 2:02 PM

#

Hello! I would like to fine tune LayoutLM using my own dataset of form images. These images are similar to those included in the Funsd dataset. I intend to annotate the data using the exact structure of the Funsd dataset. My question is regarding the block level annotations, do the bounding box coordinates of the block need to coincide with the bounding boxes coordinates returned by the OCR (in my case I'm using pyTesseract to get the box dimensions). The problem is that the blocks found by the pyTesseract do not always match the desired box boundaries.

charred rock Sep 5, 2023, 6:19 PM

#

Hello everyone, please what is the difference between Data science and machine learning? I am confused

wind ice Sep 5, 2023, 7:45 PM

#

In simpler words, Data Science is data driven decision making and Machine Learning focuses on learning from the data to train the models

charred rock Sep 6, 2023, 12:47 AM

#

wind ice In simpler words, Data Science is data driven decision making and Machine Learni...

Thank you so much 🙏

hard ether Sep 6, 2023, 4:15 AM

#

Can anyone recommend some coding/programming or machine learning internships for high school students?
The more the merrier!
Thank you in advance!

kindred rune Sep 6, 2023, 5:57 AM

#

I am a newbie deep learning enthusiast, and I am encountering a problem while creating and training a model. The accuracy of the model changes every time I run the code, and the change is sometimes substantial. I created the model by following the instructor and checked the code thoroughly for typos. Everything looks correct, but the accuracy changes with every instantiation of the model, which does not seem logical since I have already set a random seed for that model. Please see the attached screen recording to see the issue. Can anybody explain this to me?

silent kite Sep 6, 2023, 7:26 AM

#

zinc thistle is this normal?

Computers do not work with text, and neither do machine learning algorithms, because every machine learning algorithm is just a math formula.
1- Set test_size in the train_test_split function.
2- Convert your categorical features to numerical ones.

main mango Sep 6, 2023, 7:35 AM

#

kindred rune I am a newbie deep learning enthusiast, I am encountering a problem while creati...

Check out https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism for how to enable determinism (reproducible results), and its side-effects.

TensorFlow

tf.config.experimental.enable_op_determinism | TensorFlow v2.13.0

Configures TensorFlow ops to run deterministically.

kindred rune Sep 6, 2023, 7:42 AM

#

main mango Check out https://www.tensorflow.org/api_docs/python/tf/config/experimental/enab...

That was really helpful for me to understand the issue. thank you for your input and time.😇

deft fox Sep 6, 2023, 9:13 AM

#

kindred rune I am a newbie deep learning enthusiast, I am encountering a problem while creati...

When you run a model for a fixed number of epochs (25 in your case), sometimes the model learns more up to that point, and sometimes less. It is a function of how close the initial weights were to the actual solution. Instead, you should run the model with an arbitrarily large number of epochs, say 500, and use early stopping with a patience of 5-10 epochs. That means the training stops if the loss function doesn't decrease for 5-10 epochs. If done this way, you will get more reproducible results, and the number of training epochs will likely be different each time as well. That is normal.
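
The patience logic can be sketched in plain Python, independent of any framework. The function name and the loss values are illustrative, and it assumes the monitored quantity is a validation loss recorded once per epoch.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops at the first epoch where the loss has failed to improve
    by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch

# Illustrative loss curve: improves until epoch 2, then stalls.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]
print(early_stopping_epoch(losses, patience=3))
```

In Keras, the corresponding built-in is tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True), passed to model.fit through the callbacks argument.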

glossy moth Sep 6, 2023, 10:15 PM

#

Hi guys, I cannot import a library module in Kaggle, although it works on my local machine. Can someone help me please?
Thanks
@verbal crest

deft fox Sep 7, 2023, 12:35 AM

#

glossy moth Hi guys, I cannot import a library module in kaggle while working on my own loca...

Your problem is not in the import; it is in not being able to install skimpy. If you read the whole error message, you will see that for some reason it can't resolve names and connect to PyPI to fetch skimpy.

graceful axle Sep 7, 2023, 5:25 PM

#

Starting with ML, I have Pandas and Numpy done. What should I start learning in ML? Try to do something with Titanic dataset?

hidden plinth Sep 8, 2023, 1:45 AM

#

graceful axle Starting with ML, I have Pandas and Numpy done. What should I start learning in ...

are you referring to the courses offered on kaggle?

thick current Sep 8, 2023, 10:59 AM

#

Hello, everyone. Does anyone know where I can find sources to create my own dataset? I would like to create a project or dataset that predicts the time it takes lettuce to grow based on temperature, humidity, TDS value, pH level, and nutrient solutions in a controlled environment. Thank you in advance.

steady dune Sep 8, 2023, 1:11 PM

#

Can anyone please guide me on how to decide which algorithm to apply?
And what steps should I take to do EDA?

waxen siren Sep 8, 2023, 2:22 PM

#

Hello there, I have a question guys. I have to work with Knowledge graphs and am completely new to ML. Could someone suggest some tutorial on PyKeen? It would be really helpful. Like a crash course or something

pseudo holly Sep 8, 2023, 2:25 PM

#

thick current Hello, everyone. Do any of know where can I find sources to create my own datase...

Hi Jjay, are you looking for existing data or do you want to collect new data by setting up a physical environment of lettuce growth?

thick current Sep 8, 2023, 2:27 PM

#

pseudo holly Hi Jjay, are you looking for existing data or do you want to collect new data by...

I am trying to set up a physical environment. However, for now, I would like to find sample datasets like the one I mentioned and try to predict the time it takes to grow. Then I will set up a physical environment where those independent variables will be controlled.

pseudo holly Sep 8, 2023, 2:33 PM

#

thick current I am trying to set up a physical environment, however, for now, I would like to ...

Got it 😜 Maybe try searching in Kaggle Datasets and UCI Machine Learning Repository. Otherwise, maybe try to search in academic journals and research papers

thick current Sep 8, 2023, 2:34 PM

#

pseudo holly Got it 😜 Maybe try searching in Kaggle Datasets and UCI Machine Learning Repos...

I already tried searching it on kaggle, and cannot find one, or maybe I search for the wrong keyword. And also, I will try the others you mention. Thank you for your help.

south shard Sep 8, 2023, 3:16 PM

#

Are you going to schedule a new diffusers event? I was looking forward to that.

severe pewter Sep 8, 2023, 5:18 PM

#

How can I improve my DNN solution here:
https://www.kaggle.com/code/touhidurrr/predict-survival-in-titanic-with-deep-learning

Predict Survival in Titanic with Deep Learning

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

fair ingot Sep 8, 2023, 6:33 PM

#

Anyone else having an issue saying there is no CSV file found when submitting?
I know the file is being created, maybe it's not in the correct place?
Outputting it to /kaggle/working/submission.csv

#

When I submit predictions I see an error but when I check the latest version of the notebook under "Output" tab I see the file with data in the correct format

grave solstice Sep 9, 2023, 7:42 AM

#

Guys, please help me find resources for: Analysis of News Articles and videos for regional languages

I want to make a Media News Monitoring and Feedback System that can handle multiple regional languages, categorize news stories, and notify me about negative coverage of news in the media.
Please suggest some good resources related to sentiment analysis from news articles and video transcripts

deft fox Sep 9, 2023, 9:33 AM

#

fair ingot Anyone else having an issue saying there is no CSV file found when submitting? I...

@fair ingot Try saving the file to "submission.csv" rather than "/kaggle/working/submission.csv"

kindred rune Sep 9, 2023, 10:59 AM

#

Help Required: I am trying to detect and remove the outliers from a dataframe.

The dataframe is very extensive and huge so I have selected three key features ['TS', 'Mean_RMS', 'Mean_ToF']. The main idea is to calculate z scores and detect outliers (whose z scores are greater than 3 standard deviations). Then append the indices of those outliers in a separate list. After that use this list of indices to filter out the rows from the main dataframe df.
See my code and error I am encountering:

from scipy import stats
from sklearn.utils import resample
from joblib import Parallel, delayed

# Define the number of samples to take
num_samples = 10000

# Define the number of parallel processes
num_processes = 4

# Define the threshold value
threshold = 3

# Define the outlier detection function
def detect_outliers(data):
    z_scores = stats.zscore(data)
    outlier_indices = np.argwhere(np.abs(z_scores) > threshold)[:, 0]
    return outlier_indices

# Select feature columns to detect outliers from
df_select = df[['TS', 'Mean_RMS', 'Mean_ToF']]

# Perform outlier detection on random samples in parallel
samples = [resample(df_select, n_samples=num_samples) for _ in range(num_processes)]
outlier_indices = Parallel(n_jobs=num_processes)(delayed(detect_outliers)(sample) for sample in samples)

# Flatten the list of outlier indices
outlier_indices = np.concatenate(outlier_indices)

# Remove the outliers from the DataFrame
df.drop(outlier_indices, inplace=True)

# Reset the index of the DataFrame
df.reset_index(drop=True, inplace=True)

Error:
ValueError: Shape of passed values is (2, 350), indices imply (10000, 3)

Please help me resolve this error. Thanks in advance.

deft fox Sep 9, 2023, 11:06 AM

#

kindred rune Help Required: I am trying to detect and remove outliers from a dataframe. The...

I have an old script that removes outliers by modified Z-scores:

#

https://www.kaggle.com/code/tilii7/you-want-outliers-we-got-them-outliers

You want outliers? We got them outliers!

Explore and run machine learning code with Kaggle Notebooks | Using data from Mercedes-Benz Greener Manufacturing
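
For comparison, a minimal sketch of the plain z-score approach from the question. It is an assumption-laden illustration, not the linked script: it computes z-scores with pandas on the whole frame instead of on resampled copies, because positions found inside a resampled sample do not correspond to row labels in df, and calling np.argwhere on a DataFrame is one likely source of the shape error. The helper name and demo data are hypothetical.

```python
import pandas as pd

def drop_zscore_outliers(df, columns, threshold=3.0):
    """Return a copy of df without rows whose |z-score| exceeds threshold in any of columns."""
    values = df[columns]
    # Population z-score (ddof=0), the same default scipy.stats.zscore uses.
    z = (values - values.mean()) / values.std(ddof=0)
    is_outlier = (z.abs() > threshold).any(axis=1)
    # The boolean mask is aligned on df's own index, so no positional indices are needed.
    return df.loc[~is_outlier].reset_index(drop=True)

if __name__ == "__main__":
    demo = pd.DataFrame({
        "TS": list(range(20)) + [1000],          # one extreme value
        "Mean_RMS": [0.5 + 0.01 * i for i in range(21)],
        "Mean_ToF": list(range(21)),
    })
    cleaned = drop_zscore_outliers(demo, ["TS", "Mean_RMS", "Mean_ToF"])
    print(len(demo), "->", len(cleaned))
```

A vectorized mask like this is usually fast enough that the resampling and joblib steps are not needed.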

olive tinsel Sep 9, 2023, 1:02 PM

#

llamaindex, langchain, assembly ai, weaviate, clarifai if we are supposed to make a chatbot with one of these, which would be good and free and can anyone share resources in making ai chatbots😅

tired granite Sep 9, 2023, 2:04 PM

#

Want to try Google Cloud AI Platform Notebooks. But getting the error below and don't see GPUs in any region on Google Cloud. How does one get around this?

nvidia-t4-1x: The zone 'projects/bkowshik-kaggle/zones/us-central1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.: Something went wrong. Sorry about that.

pure burrow Sep 9, 2023, 2:13 PM

#

@vivid owl I have a problem with an exercise notebook: I deleted part of the initial code and I want to restart the file from the beginning. It is the Python course, module 5, Exercise: Loops and List Comprehensions

zinc karma Sep 9, 2023, 2:31 PM

#

heyy folks, so I am relatively new to the field of deep learning. I was working on a project for time series forecasting. A lot of factors affect the GDP of a country, so I was thinking about multivariate analysis, but it isn't working like it should. I tried different libraries and approaches, but the forecast in the graph never seems to be impacted much by the factors. I wasn't able to find any good resources for this either. Can anyone help me with that?

vivid owl Sep 9, 2023, 2:33 PM

#

pure burrow @vivid owl i have problem with an exercise notebook , i delete som...

Step 1: Go to: https://www.kaggle.com/code/colinmorris/exercise-loops-and-list-comprehensions
Step 2: Click "Copy & Edit" that appears at the top right corner (marked in the screen print)

Exercise: Loops and List Comprehensions

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

vivid owl Sep 9, 2023, 2:40 PM

#

zinc karma heyy folks, so I am relatively new to the field of deep learning. I was worki...

Hi - Discord is new and we all are still exploring and experimenting to find the best way to ask questions and get responses, but this worked for me and wanted to share.

Describe the issue in detail so people will know exactly what you are experiencing
Add a link to your Kaggle notebook so that people can take a look and investigate for you (vs. imagine what the error/issue might or could be 🤔 )
People will respond by leaving suggestions in the Comment section of your notebook in the Kaggle platform or here in Discord

Hope you will find this tip helpful. Good luck!

Below is an example of what I described above:

https://discordapp.com/channels/1101210829807956100/1133184287886299237/1148812886026764360

pure burrow Sep 9, 2023, 2:41 PM

#

vivid owl Step 1: Go to: https://www.kaggle.com/code/colinmorris/exercise-loops-and-list-c...

the problem is solved ,thanks

vivid owl Sep 9, 2023, 2:41 PM

#

pure burrow the problem is solved ,thanks

Great! Thank you for letting me know. 👏 🥳 🤩

zinc karma Sep 9, 2023, 2:47 PM

#

vivid owl Hi - Discord is new and we all are still exploring and experimenting to find the...

thanks a lot!

vivid owl Sep 9, 2023, 4:59 PM

#

olive tinsel llamaindex, langchain, assembly ai, weaviate, clarifai if we are supposed to mak...

Hi - I am not knowledgeable in the space, but short courses offered by DeepLearning.AI might assist?!

https://www.deeplearning.ai/short-courses/

Short Courses

Take your generative AI skills to the next level with short courses from DeepLearning.AI. Enroll today to learn directly from industry leaders, and practice generative AI concepts via hands-on exercises. Available free for a limited time.

olive tinsel Sep 9, 2023, 5:00 PM

#

vivid owl Hi - I am not knowledgeable in the space, but short courses offered by DeepLearn...

I will go through them thank you😅😅

vivid owl Sep 9, 2023, 5:01 PM

#

olive tinsel I will go through them thank you😅😅

Please come back to Discord and let us know what you will have developed so we all can learn from you! 🤓

olive tinsel Sep 9, 2023, 5:02 PM

#

vivid owl Please come back to Discord and let us know what you will have developed so we a...

Yeahh sure I will try to build something😅

severe pewter Sep 9, 2023, 10:29 PM

#

Beginner Notebooks on DNN, TFDF and RF: How can I improve the accuracy?

I am a beginner and I have 3 notebooks that use 3 different approaches to predict survival on Titanic. I tried many things but I was not able to get my accuracy above 80%. To break through this wall, I need the advice of knowledgeable people in the Kaggle community. Please share your advice on how to improve my accuracy!

DNN Approach (78% accuracy):
https://www.kaggle.com/code/touhidurrr/predict-survival-in-titanic-with-deep-learning

TFDF Approach (77% accuracy):
https://www.kaggle.com/code/touhidurrr/predict-survival-in-titanic-with-decision-forests

Random Forest Approach (74% accuracy):
https://www.kaggle.com/code/touhidurrr/predict-survival-in-titanic-with-random-forest

Predict Survival in Titanic with Deep Learning

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

Predict Survival in Titanic with Decision Forests

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

Predict Survival in Titanic with Random Forest

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

fringe igloo Sep 10, 2023, 2:39 AM

#

Hey all, sorry for asking, but can anyone point me in the right direction on how to get started learning reinforcement learning on PyTorch?

I need the knowledge to solve one of freecodecamp's ML problems here https://www.freecodecamp.org/learn/machine-learning-with-python/machine-learning-with-python-projects/rock-paper-scissors but the course on FCC mainly used Tensorflow and TF runs very slow on Replit.

freeCodeCamp.org

Learn to Code — For Free

hidden plinth Sep 10, 2023, 4:53 AM

#

I know this is a bit subjective, but do you guys recommend going through all the Learn Lessons first then trying a competition?

hazy spire Sep 10, 2023, 10:03 AM

#

Hello Kaggle Community,

I'm currently working on a project analyzing two decades of Premier League soccer data with the goal of creating a predictive model. However, I'm new to soccer datasets. If anyone has experience or insights to share on soccer data analysis and regression modeling, I'd greatly appreciate your guidance.

Specifically, I'm interested in predicting full time outcomes from half-time data, and predictive modeling based on the historical data. Your tips, resources, or collaboration would be invaluable.

Please reply or reach out if you can help. Thank you!

red hawk Sep 10, 2023, 2:05 PM

#

tired granite Want to try Google Cloud AI Platform Notebooks. But getting the error below and...

you might not have sufficient quota if your project is new (you can request an increase-- instructions here https://cloud.google.com/compute/resource-usage#gpu_quota

Google Cloud

Allocation quotas | Compute Engine Documentation | Google Cloud

#

what exactly do you need help with? Also, that looks to me like a job interview take home question, in which case it is not very appropriate to ask other people to help you solve it...

fading swift Sep 10, 2023, 2:31 PM

#

red hawk what exactly do you need help with? Also, that looks to me like a job interview ...

I needed some help to understand my code better and I am not asking anyone to solve the whole thing for me.

boreal belfry Sep 11, 2023, 3:00 AM

#

Anyone can help me with what I need to do in this competition?
https://www.kaggle.com/competitions/playground-series-s3e21/overview

Improve a Fixed Model the Data-Centric Way!

Playground Series - Season 3, Episode 21

#

I don't know what to do, I want to know what to do 😁

dusk radish Sep 11, 2023, 3:10 AM

#

💀

tacit remnant Sep 11, 2023, 11:56 AM

#

Hi everyone, I am in a mess!!! I am new to data science; I initially worked as a data analyst.
I need some help with a data science task that was assigned to me, where I have to build a time series model in Python. Can anyone briefly share their experience here and guide me a little bit?

split epoch Sep 11, 2023, 4:39 PM

#

Hey guys, I got 0.78229 on my first submission.

Could anyone look over my code and offer some suggestions? This is my first ML project and I want to also make a YouTube video on how I built it out and such, I know I still need to leave a lot more comments/documentation and clean up a few sections

https://www.kaggle.com/ryannolan1/titanic-wip-78-accuracy

Titanic WIP 78% Accuracy

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

raw vortex Sep 11, 2023, 6:45 PM

#

Hey guys plz help me

#

I recently learnt data visualization via Kaggle. I have completed the final project, but Kaggle shows it as only 75% complete, so I am not getting my certificate for data visualization: the course shows 98% done and it requires 100%.

thick current Sep 12, 2023, 1:20 AM

#

Hi, everyone. Could you please take a look at my beginner projects, where I do a prediction of growth days of lettuce in a controlled environment? I think I am missing something I don't understand and I think my dataset also has missing features in order to predict the time it takes lettuce to grow in a controlled environment where temperature, humidity, tds value, ph level are automated and used for predictions.

Here is the link for my Kaggle notebook. If you can comment on what I did wrong, I will gladly take it as a stepping stone to further improve my knowledge. Thank you in advance. https://www.kaggle.com/datasets/jjayfabor/lettuce-growth-days

Lettuce Growth Days

green haven Sep 12, 2023, 2:34 AM

#

Hey, just got a warning for self-promo on Kaggle (100% deserved), but it says that if you keep posting, your account will be banned. Do they mean from now on, or should I go back and delete all the posts from the past?

deft fox Sep 12, 2023, 3:10 AM

#

green haven Hey, just got a warning for self-promo on Kaggle 100% deserved there, but it say...

@green haven It is best you ask those who warned you. I don't think the problem was that you were promoting your work. I think it is that you posted the same notebook announcement in multiple channels, such as in jobs, that had nothing to do with promoting notebooks. If you tamp that down and post in channels that are meant for sharing, I suspect you will be fine.

green haven Sep 12, 2023, 3:10 AM

#

I mean on Kaggle website

#

Not discord

deft fox Sep 12, 2023, 3:12 AM

#

The first sentence of my response still applies.

#

Sure you can, go to their general discussions:

#

https://www.kaggle.com/discussions/general

General | Kaggle

General.

#

https://www.kaggle.com/discussions/getting-started

Getting Started | Kaggle

Getting Started.

verbal crest Sep 12, 2023, 4:54 AM

#

green haven Hey, just got a warning for self-promo on Kaggle 100% deserved there, but it say...

Generally speaking we mean from now on, but if people report your old spammy posts it might lead to future violations, so if you want to be extra safe you should clean up older spam.

green haven Sep 12, 2023, 11:26 AM

#

Ok, thank you

low mural Sep 12, 2023, 11:55 AM

#

Hello everyone!
I'm looking to practice feature engineering. Do you guys have any recommendations for a Get Started competition where this skill would be particularly useful?

hushed juniper Sep 12, 2023, 12:01 PM

#

does anyone know why some people use log softmax activation during training instead of separating the log and the softmax?
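
One commonly given reason is numerical stability: taking softmax first can underflow small probabilities to 0, and the separate log then returns -inf, while the fused form stays finite. A minimal pure-Python sketch of that effect (the function names are illustrative; frameworks such as PyTorch provide a fused log_softmax of their own):

```python
import math

def naive_log_softmax(logits):
    """log(softmax(x)) computed as two separate steps."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # A probability that underflowed to 0.0 turns into -inf here.
    return [math.log(p) if p > 0.0 else float("-inf") for p in probs]

def log_softmax(logits):
    """Fused form: x - max(x) - log(sum(exp(x - max(x))))."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]

if __name__ == "__main__":
    logits = [-1000.0, 0.0]
    print(naive_log_softmax(logits))  # first entry loses all information
    print(log_softmax(logits))        # first entry stays finite
```

A loss such as negative log-likelihood needs finite log-probabilities, which is why the fused version is usually preferred.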

dull hornet Sep 12, 2023, 2:21 PM

#

Hi everyone. It's been some time since I practiced ML. My focus was only on data visualization and analytics so neglected this area. How do I start all over again when it comes to ML?

mystic bolt Sep 12, 2023, 2:57 PM

#

hi guys, is there any good free online statistics book? I'm new to machine learning anyways

vivid owl Sep 12, 2023, 4:25 PM

#

mystic bolt hi guys, is there any good free online statistic book ,I'm new to machine learni...

Here you go: https://openintro-ims.netlify.app/

Welcome to IMS1 | Introduction to Modern Statistics (1st Ed)

This is IMS1!

mystic bolt Sep 12, 2023, 5:51 PM

#

vivid owl Here you go: https://openintro-ims.netlify.app/

@vivid owl thanks msn

signal lance Sep 12, 2023, 6:18 PM

#

Does anyone know how to config the UI? I want to exclude console UI. Kaggle notebook is awesome, but very hard to adjust the UIs

arctic marten Sep 12, 2023, 8:38 PM

#

I am getting this error/warning message on Kaggle. please how do i solve it?

verbal crest Sep 12, 2023, 9:21 PM

#

signal lance Does anyone know how to config the UI? I want to exclude console UI. Kaggle note...

Looks like a weird bug is making the collapsed console space too big on your screen. I'd suggest posting in the product feedback forum so our engineers can take a look (ideally include your browser / operating system too).

red hawk Sep 12, 2023, 9:22 PM

#

arctic marten I am getting this error/warning message on Kaggle. please how do i solve it?

what are you trying to run on your notebook? it looks like you are trying to load too much into memory and it crashed the notebook

arctic marten Sep 12, 2023, 9:27 PM

#

red hawk what are you trying to run on your notebook? it looks like you are trying to load...

I am working on a dataset that contains csv files for 6 years (e.g. 2000.csv, 2001.csv, etc.). I am trying to merge the whole dataset into one. Is there a way I can run the data successfully? The dataset is from Kaggle (2GB memory).

red hawk Sep 12, 2023, 9:28 PM

#

arctic marten I am working on a dataset that contains csv files for 6 years (eg 2000.csv, 2001...

2GB each? or 2GB in total?

#

Also, how are you planning on using the combined file? e.g. it is fairly easy to combine csvs together using a bash command (e.g. https://unix.stackexchange.com/questions/293775/merging-contents-of-multiple-csv-files-into-single-csv-file) without having to load everything in memory, but you do need to remember that you will also have issues trying to, say load the combined csv into pandas on kaggle

Unix & Linux Stack Exchange

Merging contents of multiple .csv files into single .csv file

I want to write a script that merges contents of several .csv files in one .csv file, i.e appends columns of all other files to the columns of first file. I had tried doing so using a "for" loop bu...

arctic marten Sep 12, 2023, 9:41 PM

#

red hawk 2GB each? or 2GB in total?

Total

red hawk Sep 12, 2023, 9:42 PM

#

arctic marten Total

hmm, that is odd, since 2 GB usually should load fine

arctic marten Sep 12, 2023, 9:43 PM

#

red hawk Also, how are you planning on using the combined file? e.g. it is fairly easy to...

I don't have the file in my local machine, it's in Kaggle.

red hawk Sep 12, 2023, 9:43 PM

#

arctic marten I don't have the file in my local machine, it's in Kaggle.

you can run bash commands in kaggle using the ! operator

#

or using %%bash cell magic

arctic marten Sep 12, 2023, 9:45 PM

#

red hawk you can run bash commands in kaggle using the `!` operator

I will try it out. This is the dataset i am talking about https://www.kaggle.com/datasets/yuanyuwendymu/airline-delay-and-cancellation-data-2009-2018

Airline Delay and Cancellation Data, 2009 - 2018

Flight info. of US domestic flights

#

cat *csv > combined.csv
To run this on Kaggle I have to use this !cat *csv > combined.csv right?

red hawk Sep 12, 2023, 9:49 PM

#

yeah

#

although do note that you have to remove the headers first (if you scroll a little through the comments on that answer):

"This answer will duplicate the headers. Use head -n 1 file1.csv > combined.out && tail -n+2 -q *.csv >> combined.out where file1.csv is any of the files you want merged. This will merge all the CSVs into one like this answer, but with only one set of headers at the top. Assumes that all CSVs share headers. It is called combined.out to prevent the statements from conflicting." (comment by hLk, Oct 12, 2019 at 1:00)

is probably what you want

arctic marten Sep 12, 2023, 9:51 PM

#

red hawk although do note that you have to remove the headers first (if you scroll a litt...

How do i achieve this?

red hawk Sep 12, 2023, 9:52 PM

#

arctic marten How do i achieve this?

(see my edited note 🙂 )

arctic marten Sep 12, 2023, 9:53 PM

#

red hawk (see my edited note 🙂 )

Just read that part in the article now
head -n 1 file1.csv > combined.out && tail -n+2 -q *.csv >> combined.out. Let me try it

#

I am getting error

red hawk Sep 12, 2023, 10:03 PM

#

you need to run it on files. e.g. xxx.csv

#

the command in the screenshot is pointing to a directory

arctic marten Sep 12, 2023, 10:08 PM

#

I have to run it in each of the files?

#

Then how is the merging taking place?

red hawk Sep 12, 2023, 10:13 PM

#

no, just run it once

#

the *.csv means running over all the csv files

arctic marten Sep 12, 2023, 10:16 PM

#

Where will the output be saved? combined.out?

#

Error

#

I tried and debugged it, but i am getting something else

red hawk Sep 12, 2023, 10:37 PM

#

arctic marten Where will the output be saved? combined.out?

you need to pass in the directory before the *.csv ie line-delay-xxx/*csv in the

arctic marten Sep 12, 2023, 10:38 PM

#

red hawk you need to pass in the directory before the *.csv ie line-delay-xxx/*csv in the

i did that but there was no output

#

This is it

red hawk Sep 12, 2023, 10:47 PM

#

works for me

#

use

%%bash
head -n 1 /kaggle/input/airline-delay-and-cancellation-data-2009-2018/2011.csv > combined.out \
    && tail -n+2 -q /kaggle/input/airline-delay-and-cancellation-data-2009-2018/*.csv >> combined.out
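
For anyone who prefers to stay in Python, a rough equivalent of the head/tail command above, as a sketch only. It assumes every file shares the same single header line and ends with a newline; the function name is illustrative. Write the output outside the input folder (or with a non-.csv name) so it is not matched by the pattern itself.

```python
import glob
import os
import shutil
import tempfile

def merge_csvs(pattern, out_path):
    """Concatenate CSV files matching pattern, keeping only the first file's header."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "w", newline="") as out:
        for i, path in enumerate(paths):
            with open(path, newline="") as src:
                header = src.readline()       # consume the header line
                if i == 0:
                    out.write(header)         # write it once only
                shutil.copyfileobj(src, out)  # stream the rest in chunks
    return len(paths)

if __name__ == "__main__":
    # Tiny self-contained demo in a temporary folder.
    folder = tempfile.mkdtemp()
    for name, row in [("2009.csv", "a,1"), ("2010.csv", "b,2")]:
        with open(os.path.join(folder, name), "w") as f:
            f.write("key,value\n" + row + "\n")
    target = os.path.join(folder, "combined.out")
    print(merge_csvs(os.path.join(folder, "*.csv"), target), "files merged")
```

Like the shell version, this streams the data instead of loading it, so memory use stays small regardless of file size.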

arctic marten Sep 12, 2023, 10:48 PM

#

You used a different code, i will try this out now

#

The output is combined.out right?

red hawk Sep 12, 2023, 10:49 PM

#

yeah

arctic marten Sep 12, 2023, 10:50 PM

#

But it is not in csv. Mine is still running

#

Your code in the screenshot, cat combined.out | wc -l, what does it mean

#

?

red hawk Sep 12, 2023, 10:52 PM

#

the last line is a line count (wc -l counts the lines in the file)

#

it's a text file, you can just rename it combined.csv

arctic marten Sep 12, 2023, 10:53 PM

#

red hawk it's a text file, you can just rename it combined.csv

Okay. I now changed the directory to path/combined.csv

red hawk Sep 12, 2023, 10:55 PM

#

I have shared my notebook with you https://www.kaggle.com/code/wwymak/notebook018f720f4d/edit

arctic marten Sep 12, 2023, 10:58 PM

#

Still the same issue

deft fox Sep 12, 2023, 11:04 PM

#

@arctic marten I don't know if you realize how lucky you are that Wendy has been troubleshooting this with you line by line, and from what I can tell for the better part of the past hour. If Wendy wants to keep doing it, great for you. Still, at some point I think you have to invest a bit of your own time to figure things out, as these are fairly standard and straightforward operations. I realize that I am butting in without being asked anything, but it is important not to take other people's time for granted. Wendy would most likely not tell you even if that was the case.

arctic marten Sep 12, 2023, 11:05 PM

#

deft fox @arctic marten I don't know if you realize how lucky you are that Wendy h...

Thank You

deft fox Sep 12, 2023, 11:06 PM

#

I appreciate it, but that should be directed to @red hawk

arctic marten Sep 12, 2023, 11:07 PM

#

red hawk I have shared my notebook with you https://www.kaggle.com/code/wwymak/notebook01...

I can't open your notebook. It's saying permission denied.

arctic marten Sep 13, 2023, 12:02 PM

#

Hello @red hawk, I was able to load my data successfully by using nrows to specify the number of rows. Thank you so much for your time yesterday, I truly appreciate it. Thanks for teaching me how to use Unix commands with CSV data (I hadn't heard of that before). Have a lovely day.
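
A small sketch of the nrows approach mentioned above, plus chunked reading for when the whole file is needed but does not fit in memory at once. The data here is an in-memory placeholder, not the airline dataset, and the helper name is illustrative:

```python
import io
import pandas as pd

def count_rows_in_chunks(source, chunksize=100_000):
    """Stream a CSV in chunks and return the total row count."""
    total = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += len(chunk)  # replace with any per-chunk aggregation
    return total

if __name__ == "__main__":
    # Placeholder CSV text standing in for a large file on disk.
    csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
    preview = pd.read_csv(io.StringIO(csv_text), nrows=3)  # only the first 3 rows
    print(preview)
    print(count_rows_in_chunks(io.StringIO(csv_text), chunksize=4))
```

With nrows only a slice of the data is loaded, so any statistics computed on it describe that slice rather than the full dataset.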

velvet bridge Sep 13, 2023, 6:29 PM

#

I need help setting up my GPU for Jupyter Notebook. I followed the steps, but after importing torch it still says my CUDA GPU is not available.

thick glacier Sep 13, 2023, 7:55 PM

#

hello everyone!

I have some questions related to data preprocessing. If you have any knowledge, please share it with me.

link:- https://www.kaggle.com/discussions/getting-started/439141

Questions About Data Preprocessing: Contribute Your Knowledge | Kaggle

Questions About Data Preprocessing: Contribute Your Knowledge.

silent kite Sep 14, 2023, 8:57 AM

#

thick glacier hello everyone! I have some questions related to data preprocessing. If you hav...

I have some notebooks about data cleaning and data preprocessing. You can check them here: https://www.kaggle.com/zxarifi/code

zx arifi | Notebooks Contributor

Kaggle Notebooks profile for zx arifi

glacial python Sep 14, 2023, 3:45 PM

#

Hi everyone, maybe a super dumb question but I am going through learning exercises and just built my own model based on DecisionTreeRegressor from sklearn. I understand X is the feature set and y is the prediction target. But when I get a predicted value for house prices, what does the y value represent? I am unable to interpret the predicted value when there is no concept of time, i.e., when we can expect prices to reach the predicted values.

#

So what exactly does y represent when we get the prediction in the end?

deep flower Sep 14, 2023, 5:13 PM

#

Does anyone have a unique project idea for ML?

hybrid halo Sep 15, 2023, 7:25 AM

#

Hello all,

I am currently trying to decrease the training time by sampling the dataset and then using that trained model to make predictions about the whole dataset.

After training on the sample, we checked the AUC for 10%, 30%, 50% and 100% sample sizes.
If the validation AUC for all of them is very close to each other we can minimize the training time by only training on the 10% of the sample for other datasets and can conclude that the predictions will be the same as that of when trained upon the whole dataset.

The problem is in the case of a very low minority class it is discarded in the sample and the predictions for those are not coming accurately.

The metrics I am using is AUC and the sampling method I am following is stratified sampling.

If you are aware of any better approaches I would like to discuss it.

real isle Sep 16, 2023, 2:02 AM

#

hybrid halo Hello all, I am currently trying to decrease the training time by sampling the ...

It depends on the number of data points that you have. It is difficult to make suggestions because we don't know that. In addition, when taking a certain percentage of the data, did you consider whether the dataset is imbalanced? That is, having more of certain classes than others

regal plank Sep 16, 2023, 8:45 AM

#

I noticed there are different colors for the functions suggested when using the Tab key.

I assumed the blue one is for the imported package, purple is default, and the wrench is also a default related to settings, right? (just want to confirm my understanding)

main mango Sep 16, 2023, 1:44 PM

#

Have you tried downsampling the other class(es) so that the minority class is better represented? - https://imbalanced-learn.org/stable/under_sampling.html
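
Another option along the same lines, sketched under assumptions: keep the stratified sampling but give each class a minimum number of rows, so a rare class is not lost at small sample sizes. The function name and the min_per_class parameter are illustrative, and raising the floor changes the class proportions in the training sample.

```python
import random
from collections import defaultdict

def stratified_sample(labels, fraction, min_per_class=1, seed=0):
    """Return sorted row indices: `fraction` of each class, but at least min_per_class rows."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = max(round(len(indices) * fraction), min_per_class)
        k = min(k, len(indices))  # never ask for more rows than the class has
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

if __name__ == "__main__":
    labels = [0] * 990 + [1] * 10          # 1% minority class
    sample = stratified_sample(labels, fraction=0.1, min_per_class=5)
    minority = sum(labels[i] == 1 for i in sample)
    print(len(sample), "rows sampled,", minority, "from the minority class")
```

The returned indices can be used to select rows from the feature matrix and the label vector together.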

regal plank Sep 16, 2023, 5:11 PM

#

glacial python Hi everyone, maybe a super dumb question but I am going through learning exercis...

y is the prediction target; they are the same thing.

I think y is used in the actual code, while you can say "prediction target" when speaking or writing in general. It is just a convention.

Just like how a model can also be called an architecture, there are many similar examples in DS

(if I am wrong someone correct me plz)

#

```
three_tables_query = """
SELECT u.id AS id, MIN(q.q_creation_date) AS q_creation_date, MIN(a.a_creation_date) AS a_creation_date
FROM `bigquery-public-data.stackoverflow.posts_answers` a
FULL JOIN `bigquery-public-data.stackoverflow.posts_questions` q ON q.owner_user_id = a.owner_user_id
RIGHT JOIN `bigquery-public-data.stackoverflow.users` u ON u.id = q.owner_user_id
WHERE u.creation_date >= '2019-01-01' AND u.creation_date <= '2019-01-31'
GROUP BY u.id
"""

"""
SELECT u.id AS id,
    MIN(q.creation_date) AS q_creation_date,
    MIN(a.creation_date) AS a_creation_date
FROM `bigquery-public-data.stackoverflow.users` AS u
    LEFT JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
        ON u.id = a.owner_user_id
    LEFT JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
        ON q.owner_user_id = u.id
WHERE u.creation_date >= '2019-01-01' AND u.creation_date < '2019-02-01'
GROUP BY id
"""
```

I am solving the first notebook in the Kaggle Advanced SQL lesson. The first query is mine and the second is the answer. When I visualize it in my head, I think the JOIN logic should give the same results in the end, but unfortunately it is wrong according to the Kaggle check. I want to confirm whether it is wrong because I used a different join or because of the results themselves (if that makes sense).

Any help is appreciated ty!

red hawk Sep 16, 2023, 6:11 PM

#

regal plank ``` three_tables_query = """ SELECT u.id as id, MIN(q.q_creatio...

what's the question? in any case 'MIN(q.q_creation_date) ' is wrong syntax, the column is called 'creation_date' (and similar error for 'a_creation_date'). I think the joins should give you the right results but the 2nd way of doing joins is quite a bit more clear than your 1st version (maybe a personal preference but I find thinking about 2 left joins a lot more intuitive than a full join and a right join...)

tip: If you paste your query in the bigquery UI it will highlight your query errors for you-- something that running a query in a notebook doesn't do
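
To see what the two-LEFT-JOIN pattern does on a small scale, here is a sketch using Python's built-in sqlite3 with made-up tables. The table names mirror the BigQuery ones but the rows are invented, and SQLite is only a stand-in for BigQuery here:

```python
import sqlite3

def build_demo_db():
    """Create tiny in-memory tables: user 1 only asks, user 2 only answers, user 3 does neither."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE posts_questions (owner_user_id INTEGER, creation_date TEXT);
        CREATE TABLE posts_answers (owner_user_id INTEGER, creation_date TEXT);
        INSERT INTO users VALUES (1), (2), (3);
        INSERT INTO posts_questions VALUES (1, '2019-01-05'), (1, '2019-01-09');
        INSERT INTO posts_answers VALUES (2, '2019-01-10');
    """)
    return conn

def first_post_dates(conn):
    """Start from users and LEFT JOIN both post tables so every user keeps a row."""
    query = """
        SELECT u.id AS id,
               MIN(q.creation_date) AS q_creation_date,
               MIN(a.creation_date) AS a_creation_date
        FROM users AS u
            LEFT JOIN posts_answers AS a ON u.id = a.owner_user_id
            LEFT JOIN posts_questions AS q ON q.owner_user_id = u.id
        GROUP BY u.id
        ORDER BY u.id
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in first_post_dates(build_demo_db()):
        print(row)
```

Checking a rewritten join against small hand-made cases like these (asks only, answers only, neither) is a quick way to confirm that two queries really agree.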

regal plank Sep 16, 2023, 7:41 PM

#

red hawk what's the question? in any case 'MIN(q.q_creation_date) ' is wrong syntax, the...

Oh thanks, I will check out the bigquery UI. Yeah, it is annoying not to have specific query errors shown in the notebook.

Hmm, I must have had tunnel vision and did not notice the creation date column name, thanks for pointing it out!

*update: Just checked, yes my logic works! *

open flicker Sep 17, 2023, 7:23 AM

#

I ran model.fit on the Kaggle online notebook and it is taking a very long time. (I ran it about 10 minutes ago and it's still 35/152 progress) Does running it online slow it down? Would it be faster to run it on a local PC? I have a gaming PC.

main mango Sep 17, 2023, 9:41 AM

#

open flicker I ran `model.fit` on the Kaggle online notebook and it is taking a very long tim...

Without more details, it's hard to tell what the issue is. One likely guess is that you haven't enabled the GPU for your notebook session. But again, it is possible for some models to take very long if the dataset is huge and/or model parameters are in the range of billions e.g. LLMs.

open flicker Sep 17, 2023, 1:44 PM

#

Kaggle seems to prohibit code sharing outside the forum, does this mean that sharing code on GitHub is also prohibited?

fluid hazel Sep 17, 2023, 1:46 PM

#

https://www.kaggle.com/discussions/getting-started/440767 stuck with this problem. Can't save a NB as dataset. It caused my NB to run longer than anticipated and fail submission. All advice is appreciated.

HOW TO Guide for Dataset from NB. | Kaggle

HOW TO Guide for Dataset from NB..

deft fox Sep 17, 2023, 3:32 PM

#

open flicker I ran `model.fit` on the Kaggle online notebook and it is taking a very long tim...

At any given time there could be thousands of users running notebooks, so the available resources vary throughout the day. If you have a reasonably recent computer and a GPU, it is very likely that it will run faster locally.

deft fox Sep 17, 2023, 3:37 PM

#

open flicker Kaggle seems to prohibit code sharing outside the forum, does this mean that sha...

Kaggle prohibits code sharing with non-group members outside of Kaggle. Anything you share with a broad audience is not a violation. That means if you post your code on GitHub and make a public link on Kaggle that anyone can access, you would be fine. Still, it is more convenient for most people to make a notebook on Kaggle and share it like that.

dim quiver Sep 17, 2023, 9:30 PM

#

hello people, I am new to data science and ML. I know how to do data preprocessing and EDA, and right now I am learning ML models, starting with simple and multiple linear regression.
Can anyone suggest some existing analysis and cleaning notebooks for datasets on Kaggle? I would like to see how people go about data preprocessing in different ways

#

Also, as a follow-up to this question, I would like to know where I can learn how to create pipelines for data science. I already know OOP and Python concepts, I just want to know how they can be applied

dim sundial Sep 18, 2023, 1:05 AM

#

Any good resource to learn shaders (glsl) for ai ?

dapper stratus Sep 18, 2023, 1:37 AM

#

https://colab.research.google.com/drive/1FDgwTayLdaZDcvXHMmyD5dz0DGeXXrKk?usp=sharing

Google Colaboratory

dapper stratus Sep 18, 2023, 1:38 AM

#

dapper stratus https://colab.research.google.com/drive/1FDgwTayLdaZDcvXHMmyD5dz0DGeXXrKk?usp=sh...

guys it would mean a lot if anyone could let me know please why my program is performing poorly. I get 84% on test results

#

i tried changing the model architecture a lot but it always yielded worse results, this is not my first run

#

also somehow before using data augmentation, it had better results

#

if you run the notebook yourself and are trying to check the test results, the first 150 pics should be of horses and the remainder is of humans
i did a lot of research before asking here and i checked diff methods to yield better results but they did not help much

deft fox Sep 18, 2023, 3:16 AM

#

dapper stratus i tried changing the model architecture a lot but it always yielded worse result...

You don't have enough images to train a model from scratch and still have great accuracy. It takes at least tens of thousands, and even better hundreds of thousands of images, to get truly high performance. Instead, I suggest you start with one of the pre-trained image models (VGG, ResNet, SqueezeNet, take your pick) and fine-tune it with your dataset. That should give you better performance.

pseudo harness Sep 18, 2023, 1:42 PM

#

I have a question about notebook-only competitions, what actually prevents me to load a pretrained model or a model I trained myself or even already preprocessed data I created myself and then run the notebook only doing inference? I thought the goal is to test the skills with limited hardware at disposal so I am a bit confused.

keen geode Sep 18, 2023, 2:21 PM

#

know someone why I'm getting this error from "Intro to SQL" course on Kaggle?

deft fox Sep 18, 2023, 3:24 PM

#

pseudo harness I have a question about notebook-only competitions, what actually prevents me to...

It varies among competitions, but in some of them you can do exactly what you proposed: train locally, upload the files, and only do inference on Kaggle. I suggest you consult the competition rules and ask the same question in their discussion forum if unclear.

regal plank Sep 18, 2023, 3:35 PM

#

keen geode know someone why I'm getting this error from "Intro to SQL" course on Kaggle?

Yes, check out #1130785765274685500, it is answered there. You will have to copy and use the different code that is posted in the discussion forum.

dapper stratus Sep 18, 2023, 10:58 PM

#

deft fox You don't have enough images to train a model from scratch and still have great ...

alright noted, thank you so much for taking the time to help

hybrid halo Sep 19, 2023, 2:57 AM

#

real isle It depends on the number of data pints that you have. It is difficult to make su...

Considering the imbalanced nature of the dataset, I would do the same by oversampling the majority class and performing the same operations.
Thanks for your suggestion

primal wedge Sep 19, 2023, 2:11 PM

#

I used Save & Run All on my Kaggle notebook, but when I download the notebook, the output does not appear. How do I solve this? Is it a bug or something else?

vivid ice Sep 19, 2023, 7:32 PM

#

Hey guys,

Is there anyone here who have experience working with sound data, in particular sound as input and sound as output models? Would love to ask some questions regarding where to get started!

deft fox Sep 19, 2023, 10:24 PM

#

vivid ice Hey guys, Is there anyone here who have experience working with sound data, in ...

I do a variant of this response several times each session. The way you are asking a question is indirectly expecting someone to pre-commit to answering your future questions without knowing what they are going to be about. Instead, I suggest you simply go ahead and ask your question. There may be someone to answer it, or not. Yet you need to put in the initial effort.

dapper stratus Sep 20, 2023, 2:03 AM

#

i was learning data visualization and i stumbled across sns.clustermap

#

#

this was on a relatively small dataset

#

this was on a bigger dataset

#

are some people able to make sense of this or is it better suited for some datasets

deft fox Sep 20, 2023, 4:55 AM

#

@dapper stratus What you are showing is a two-way clustering by certain features on top, and some IDs (presumably users) on the left. The intensity of color corresponds to the value for a given feature/user combination. Features close to each other in the top dendrogram are more similar to each other, while features that are far apart are dissimilar. Same for IDs. The plot tells you that the number of weekend nights and the number of week nights are correlated features, while the number of weekend nights and booking status are not. Same for users/IDs, except that it is very difficult to see most of them as the plot is crowded on the left and right sides.

molten wharf Sep 20, 2023, 8:34 AM

#

Hi I have a question regarding the definition of an "old post".

I have a notebook that just reached 50 non-novice upvotes about 2 hours ago.
But it wouldn't update the status of the medal.

Is it because the notebook was initially created about 3 months ago? I actively modified it until last month and recently updated it a bit.

I have googled the term "old post" in the progression section, but no post online or Kaggle discussion clarified it for me.

Is it the matter of the "old post"? or is it my patience?

finite pulsar Sep 20, 2023, 1:10 PM

#

Hi, has anyone here worked on the Multi-label text classification problem?
Some of the features have very little labelled data.
I tried my hand at SetFit but it didn't give me good results.

shrewd scarab Sep 20, 2023, 4:18 PM

#

velvet bridge i need help setting up my gpu to jupyter notebook i followed the steps but it st...

Hey, here is a video on how to set up CUDA for PyTorch on Jupyter Notebooks. Hope this helps! https://youtu.be/d_jBX7OrptI

YouTube

SavorSauce

How to Install CUDA for PyTorch in 2023

/// LINKS BELOW ///

Cuda Install
https://developer.nvidia.com/cuda-downloads

Cuda GPU Compatibility
https://developer.nvidia.com/cuda-gpus

Anaconda Install
https://www.anaconda.com/download/

PyTorch CUDA
https://pytorch.org/get-started/locally/


deft fox Sep 20, 2023, 4:35 PM

#

molten wharf Hi I have a question regarding the definition of an "old post". I have a notebo...

@molten wharf It is rare that a notebook gets a gold medal exactly at 50 non-novice upvotes. It could be because it is older, or because Kaggle has an undisclosed algorithm where they don't count votes from users who have heavily upvoted your posts or notebooks in the past. I think you will need to get at least a few votes over 50.

molten wharf Sep 21, 2023, 12:31 AM

#

@deft fox Ohh...! Thank you so much! Then I may have to be patient for a bit more...
Thank you so much for your insight!🤩

mental cliff Sep 21, 2023, 3:51 AM

#

I am new to stable diffusion. I am still wondering why do we do this step? And why exactly those numbers. This is an example from huggingface: https://huggingface.co/docs/diffusers/quicktour

Quicktour

broken sphinx Sep 21, 2023, 2:10 PM

#

mental cliff I am new to stable diffusion. I am still wondering why do we do this step? And w...

the output from Stable Diffusion is in the -1.0 to 1.0 range as floats. PIL images need to be 8bit int (per channel). in order to convert a float into a valid int (with the correct range) you need to:

add 1.0 to make the range 0.0 to 2.0
multiply by 127.5 to make the range 0.0 to 255.0 (the number range for 8-bit integers)
round and cast everything as uint8 now that the value is in the correct range
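
The steps above can be sketched with NumPy as follows. This is a minimal illustration, not the exact code from the Hugging Face quicktour: `float_to_uint8` is a hypothetical helper name, and the sample values are made up. A factor of 127.5 is used because that is what maps 2.0 to exactly 255, and values are clipped first in case the model output slightly overshoots the -1.0 to 1.0 range.

```python
import numpy as np


def float_to_uint8(image):
    """Map floats in [-1.0, 1.0] to 8-bit integers in [0, 255]."""
    image = np.asarray(image, dtype=np.float32)
    # Shift [-1, 1] to [0, 2], then scale to [0, 255].
    scaled = (image + 1.0) * 127.5
    # Clip out-of-range values, round, and only then cast to uint8.
    return np.clip(scaled, 0.0, 255.0).round().astype(np.uint8)


# Hypothetical pixel values; 1.2 stands for a slight overshoot.
pixels = np.array([-1.0, 1.0, 1.2])
converted = float_to_uint8(pixels)
print(converted, converted.dtype)
```

The resulting array can then be passed to `PIL.Image.fromarray` once it has the height x width x channels layout.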

mental cliff Sep 21, 2023, 5:10 PM

#

broken sphinx the output from Stable Diffusion is in the -1.0 to 1.0 range as floats. PIL imag...

Got it. Thanks for the response!

cerulean basin Sep 21, 2023, 6:52 PM

#

Hello! I asked a question here a few weeks ago about LayoutLM and it never got an answer. #❓┊ask-a-question message
If no one can answer, does anyone have any suggestions on where I could find more info about fine-tuning LayoutLM.

fervent ocean Sep 21, 2023, 7:43 PM

#

Im not getting the medals for the upvotes in discussion tab[yes those votes are neither SELF VOTES nor novice votes] and it says "too many requests" whenever I try to post a new topic. How do I fix it? It's been 8 hours. I have been constantly trying to post a new topic but it throws the same error. @tender trench @verbal crest @wind silo

verbal crest Sep 21, 2023, 7:56 PM

#

fervent ocean Im not getting the medals for the upvotes in discussion tab[yes those votes are ...

We have secret algorithms that don't count all votes to progression (typically if the same person upvotes you multiple times we stop counting them). Be patient, and with more upvotes you'll get medals.

The too many requests error happens when you try to do something too often. Stop trying for a day and you should be able to make topics again.

grim urchin Sep 21, 2023, 8:22 PM

#

vivid ice Hey guys, Is there anyone here who have experience working with sound data, in ...

Hey @vivid ice I have studied how to handle the audio data. And how it works

#

https://www.kaggle.com/code/sujaykapadnis/audio-machine-learning-for-speech-recog-intro
These were my notes, there're 4 notebooks. Link of next notebook is at the end of the current one

Audio Machine Learning for speech Recog Intro🥱

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

fervent ocean Sep 22, 2023, 2:14 AM

#

verbal crest We have secret algorithms that don't count all votes to progression (typically i...

Thanks for clarification 👍🏻

sleek siren Sep 22, 2023, 10:16 AM

#

Guys I need a help
how do I submit my model of titanic?

fervent ocean Sep 22, 2023, 1:02 PM

#

verbal crest We have secret algorithms that don't count all votes to progression (typically i...

I have waited more than 30 hrs , it doesn't seem to go back to normal. Could u please help me?

verbal crest Sep 22, 2023, 9:39 PM

#

@fervent ocean You'll have to contact support (although if you wait longer it will probably fix itself)

molten walrus Sep 22, 2023, 9:56 PM

#

hey, have 99% to finish my certificate and i can t find the probleme in this exercise, can someone help me ??
https://www.kaggle.com/code/otmanesajid/exercise-categorical-variables

Exercise: Categorical Variables

Explore and run machine learning code with Kaggle Notebooks | Using data from Housing Prices Competition for Kaggle Learn Users

fervent ocean Sep 23, 2023, 4:11 AM

#

verbal crest <@660806453464006656> You'll have to contact support (although if you wait longe...

Aight I will wait some more

main mango Sep 23, 2023, 6:46 AM

#

molten walrus hey, have 99% to finish my certificate and i can t find the probleme in this ex...

It appears `step_4.check()` is missing.

molten walrus Sep 23, 2023, 12:01 PM

#

main mango It appears `step_4.check()` is missing.

it gives me :
/opt/conda/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: sparse was renamed to sparse_output in version 1.2 and will be removed in 1.4. sparse_output is ignored unless you leave sparse to its default value.
warnings.warn(

#

when i run this code:

from sklearn.preprocessing import OneHotEncoder

OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse=False)
OH_cols_train = pd.DataFrame(OH_encoder.fit_transform(X_train[low_cardinality_cols]))
OH_cols_valid = pd.DataFrame(OH_encoder.transform(X_valid[low_cardinality_cols]))

# One-hot encoding removed index; put it back
OH_cols_train.index = X_train.index
OH_cols_valid.index = X_valid.index

# Remove categorical columns (will replace with one-hot encoding)
num_X_train = X_train.drop(object_cols, axis=1)
num_X_valid = X_valid.drop(object_cols, axis=1)

# Add one-hot encoded columns to numerical features
OH_X_train = pd.concat([num_X_train, OH_cols_train], axis=1)
OH_X_valid = pd.concat([num_X_valid, OH_cols_valid], axis=1)

# Ensure all columns have string type
OH_X_train.columns = OH_X_train.columns.astype(str)
OH_X_valid.columns = OH_X_valid.columns.astype(str)

#

when i change it to
OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
it removes the warning, but it does not tell me that I finished the certificate
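
For anyone hitting the same `FutureWarning`: the message itself says `sparse` was renamed to `sparse_output` in scikit-learn 1.2. A minimal sketch of a version-tolerant way to build the encoder is below. `make_dense_encoder` and the toy `colors` column are hypothetical names used only for illustration, and this does not address the course checker.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder


def make_dense_encoder():
    """Build a OneHotEncoder that returns dense arrays on old and new scikit-learn."""
    try:
        # scikit-learn 1.2 and later use the new keyword.
        return OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    except TypeError:
        # Older releases only know the original keyword.
        return OneHotEncoder(handle_unknown="ignore", sparse=False)


# Hypothetical toy column with two categories.
colors = np.array([["red"], ["green"], ["red"]])
encoder = make_dense_encoder()
encoded = encoder.fit_transform(colors)
print(encoded)
```

Each row of `encoded` has one column per category, in sorted category order.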

outer apex Sep 23, 2023, 3:10 PM

#

hi guys i got a little problem that generated shoking results . can anyone help me for just 5mins to explain the main problem of the result (no need to help me correct it i just want to know where is the problem)

cross_val_score(xgb, X_test, y_test, cv=5).mean() : 0.9928446688501186
y_test_pre = xgb.predict(X_test)
mean_squared_error(y_test, y_test_pre) : 84941265285.15845
When I saw that shocking result, I wanted to try this: I tried to calculate the mean squared error on the training data to see if it equals 0, but:
y_test_pre7 = xgb.predict(X_train)
mean_squared_error(y_train, y_test_pre7)
btw xgb.score(X_train, y_test_pre7) : 1.0

#

Maybe this can be obvious for you guys but i have like just 2months of experience
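
A note on reading these numbers, assuming `xgb` is a regressor: with default settings `cross_val_score` and `.score` report R² for regressors, which is a unitless value, while the mean squared error is in squared units of the target. The two are therefore not on the same scale, and a high R² can coexist with a huge MSE when the target values are large. The sketch below uses made-up prices and plain-Python helpers (`mse` and `r2` are illustrative names, not library functions). It also shows why scoring a model against its own predictions always gives 1.0.

```python
def mse(y_true, y_pred):
    """Mean squared error, in squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices; every prediction is off by 300,000.
y_true = [1_000_000.0, 3_000_000.0, 5_000_000.0, 7_000_000.0, 9_000_000.0]
y_pred = [1_300_000.0, 2_700_000.0, 5_300_000.0, 6_700_000.0, 9_300_000.0]

print("MSE:", mse(y_true, y_pred))  # large, because errors are squared
print("R^2:", r2(y_true, y_pred))   # still close to 1
# Comparing predictions with themselves is a perfect score by construction.
print("R^2 of predictions vs. themselves:", r2(y_pred, y_pred))
```

Taking the square root of the MSE gives an error in the same units as the target, which is usually easier to judge.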

obsidian pulsar Sep 23, 2023, 5:40 PM

#

deft fox <@735594383797125295> What you are showing is a two-way clustering by certain fe...

Hello sir

obsidian pulsar Sep 23, 2023, 5:49 PM

#

outer apex hi guys i got a little problem that generated shoking results . can anyone help ...

so Do you have a other questions?

outer apex Sep 23, 2023, 6:01 PM

#

obsidian pulsar so Do you have a other questions?

No thanks you already answered

thick glacier Sep 23, 2023, 8:05 PM

#

Hello everyone!
If you have any advice, please share it with me.

https://www.kaggle.com/discussions/questions-and-answers/442690

Improving Classification Accuracy | Kaggle

Improving Classification Accuracy.

deft fox Sep 23, 2023, 8:47 PM

#

thick glacier Hello everyone! If you have any advice, please share it with me. https://www.ka...

I don't mean to be harsh, but you don't seem to be applying any of the previously learned lessons to this new dataset. I left some suggestions for you that will hopefully be helpful.

obsidian pulsar Sep 24, 2023, 12:40 AM

#

deft fox I don't mean to be harsh, but you don't seem to be applying any of the previousl...

Hello sir, Sorry but Would you like to talk with you?

thick glacier Sep 24, 2023, 3:11 AM

#

deft fox I don't mean to be harsh, but you don't seem to be applying any of the previousl...

Yes, I totally forgot that 😛
I have those techniques too.

austere horizon Sep 24, 2023, 7:34 AM

#

How do i upload files

#

Can anyone tell me plz

molten walrus Sep 24, 2023, 12:59 PM

#

main mango It appears `step_4.check()` is missing.

I just notice it now haha thanks a lot

magic latch Sep 24, 2023, 2:50 PM

#

How do I use a model from Hugging Face in a competition that bans internet access? Is there a better way than uploading it to /kaggle/input?

sleek siren Sep 24, 2023, 4:48 PM

#

Can some one help me out with this...not able to understand which all fields I should select

thick glacier Sep 24, 2023, 5:31 PM

#

In Predict Health Outcomes of Horses, when i try to submit the file then they show me error.

#

this is my submission file

thick glacier Sep 24, 2023, 5:35 PM

#

sleek siren Can some one help me out with this...not able to understand which all fields I s...

photo is not clear.

mental cliff Sep 25, 2023, 4:15 AM

#

I've been trying to train latent diffusion model, somehow the loss does not converge. Is there any issue in my training loop?

📎 message.txt

fervent ocean Sep 25, 2023, 6:58 AM

#

How much time does it take for support team to get back to you?

deft fox Sep 25, 2023, 7:34 AM

#

fervent ocean How much time does it take for support team to get back to you?

I don't think there is a prescribed time. Besides, the weekend just ended in North America.

obsidian pulsar Sep 25, 2023, 12:40 PM

#

deft fox I don't think there is a prescribed time. Besides, the weekend just ended in Nor...

hello, sir.
Sorry for the inconvenience. I have been waiting for your response since yesterday.

obsidian pulsar Sep 25, 2023, 12:42 PM

#

mental cliff I've been trying to train latent diffusion model, somehow the loss does not conv...

Are you using a learning rate that is appropriate for your model and dataset?

blazing thicket Sep 25, 2023, 12:43 PM

#

Hey folks,
I've been working on a project involving a large dataset, and I've hit a bit of a roadblock. I was wondering if there's anyone here who might be able to lend a hand or share some insights.
If anyone has experience with data analysis or visualization and would be willing to help, I'd be incredibly grateful. It doesn't have to be a huge commitment – even a few pointers or suggestions would be immensely helpful.

Thanks a bunch! 🚀

obsidian pulsar Sep 25, 2023, 12:44 PM

#

blazing thicket Hey folks, I've been working on a project involving a large dataset, and I've hi...

How can I help you?

blazing thicket Sep 25, 2023, 12:45 PM

#

obsidian pulsar How can I help you?

Can I Dm you?

obsidian pulsar Sep 25, 2023, 12:45 PM

#

blazing thicket Can I Dm you?

Ok

dapper stratus Sep 25, 2023, 5:58 PM

#

is there a general rule to how much you should be testing?

#

for example, i'm training on 300 pics (the sense behind the challenge was trying to get a good accuracy with a low number of pics) so i used data augmentation and transfer learning and so

#

when it comes to testing though, they didn't restrict how much pics we got for testing, so i just went online and got a dataset that had almost 3000 testing pics but i only used around 80 pics to test my model that was trained on 250 pics

#

would this be considered bad practice and i should have used the whole testing dataset that i had?

deft fox Sep 25, 2023, 7:59 PM

#

obsidian pulsar hello, sir. Sorry for the inconvenience. I have been waiting for your response s...

I have no idea who you are or what you want, and you asked me if I would like to talk to myself. I have no response to that, and hopefully you understand that the lack of literal response is often a response. If you have a question, I suggest you ask it. Since all of us engage with others on a voluntary basis, you may or may not get an answer.

steady dune Sep 26, 2023, 3:56 AM

#

Hi!
I am working on a project in which I need to integrate a DL model into flutter app .

Could anybody help me integrate that model into the Flutter app?

obsidian pulsar Sep 26, 2023, 7:07 AM

#

steady dune Hi! I am working on a project in which I need to integrate a DL model into flut...

You can use the TensorFlow Lite

steady dune Sep 26, 2023, 7:20 AM

#

obsidian pulsar You can use the TensorFlow Lite

Could it decrease the accuracy of the original model?

Have you integrated one before? If so, could you provide me your git repo link?

obsidian pulsar Sep 26, 2023, 7:23 AM

#

Um, maybe. It is possible for the accuracy of the original ML model to decrease when integrating it into a Flutter app

#

if it is too large or complex to run on a mobile device

#

or not optimized for the Flutter app

#

I have integrated DL models into Flutterapps before, but I don't have any public repositories that I can share 😩

obsidian pulsar Sep 26, 2023, 1:19 PM

#

dapper stratus is there a general rule to how much you should be testing?

alright, it is considered bad practice to only use a small subset of your testing dataset. The purpose of the testing dataset is to evaluate how well your model generalizes to new data, and using a small subset of the data will not give you an accurate assessment of this.
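
To put a rough number on this, here is a minimal sketch that estimates how uncertain a measured accuracy is for different test set sizes. It uses the normal approximation to the binomial, which is an assumption that gets shaky for very small test sets or accuracies near 0 or 1. The accuracy of 0.84 is only an example value, and `accuracy_margin` is a hypothetical helper name.

```python
import math


def accuracy_margin(accuracy, n_test, z=1.96):
    """Half-width of an approximate 95% confidence interval for a measured accuracy."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)


# Compare a small test subset with a much larger one.
for n_test in (80, 3000):
    margin = accuracy_margin(0.84, n_test)
    print(f"n_test={n_test}: 0.84 +/- {margin:.3f}")
```

With around 80 test images the margin is several percentage points wide, while with around 3000 it shrinks to roughly one point, so the larger test set gives a much more trustworthy estimate.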

sharp plume Sep 26, 2023, 6:07 PM

#

@everyone is it possible to use image processing to find the soil nutrients like Nitrogen, Phosphorus and Potassium for agriculture. If anyone has any idea can you please reply me it would be really useful.

obsidian pulsar Sep 26, 2023, 7:19 PM

#

sharp plume @everyone is it possible to use image processing to find the soil nutrients like...

Um, one approach is to use the texture of the soil to estimate the nutrient levels. For example, soils that are high in clay content tend to have higher levels of nutrients than soils that are high in sand content. By analyzing the texture of the soil, it is possible to get a more accurate estimate of the nutrient levels.😀

sharp plume Sep 26, 2023, 7:20 PM

#

@obsidian pulsar is it possible to find the approximate level of the nutrients in the soil using image processing techniques?

obsidian pulsar Sep 26, 2023, 7:21 PM

#

Yes, it is also possible to use image processing to detect the presence of specific nutrients in the soil

sharp plume Sep 26, 2023, 7:21 PM

#

I am looking into converting images into HSV color space using computer vision

obsidian pulsar Sep 26, 2023, 7:21 PM

#

For example, nitrogen can be detected by looking for the presence of chlorophyll, which is a green pigment that is essential for plant growth.

sharp plume Sep 26, 2023, 7:22 PM

#

So if you have some inputs or sources that I can use to develop my model

#

That would be really helpful

obsidian pulsar Sep 26, 2023, 7:22 PM

#

Researchers at the University of Arizona

#

SoilOptix

#

Farmers Edge

sharp plume Sep 26, 2023, 7:24 PM

#

Thank you

thick glacier Sep 26, 2023, 7:32 PM

#

Hi Dev,

Why is Kaggle not responding?

lone perch Sep 27, 2023, 4:11 AM

#

Hello everyone,
I got an actual score in my Notebook for the model I built and submitted it for a competition but my public score is 0
Does it take some time to load or is there a problem with how I submitted my output (The submission was successful)? Thanks

deft fox Sep 27, 2023, 4:39 AM

#

lone perch Hello everyone, I got an actual score in my Notebook for the model I built and s...

If it shows zero, that is your score. Kaggle doesn't use 0 as a placeholder until it calculates the score. As to why you got zero, impossible to tell without knowing more details. We don't even know what the metric is, or whether this is classification or regression. Log-loss of 0 would be excellent, and so would MAE or RMSE. Accuracy, less so.

fresh spruce Sep 27, 2023, 8:57 AM

#

Should I use knn model for training how a person looks?

obsidian pulsar Sep 27, 2023, 10:13 AM

#

fresh spruce Should I use knn model for training how a person looks?

or SVMs

#

Deep learning models

#

If you are a Messi fan, 😀 ~

fresh spruce Sep 27, 2023, 11:20 AM

#

obsidian pulsar If you are a Messi fan, 😀 ~

lol 😂

fresh spruce Sep 27, 2023, 11:20 AM

#

obsidian pulsar or SVMs

Thanks

lone perch Sep 27, 2023, 11:51 AM

#

deft fox If it shows zero, that is your score. Kaggle doesn't use 0 as a placeholder unti...

thanks for the insight. I realised I was submitting the predicted values as '0' or '1' instead of 'True' or 'False'. It was a classification task
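
A small sketch of why this kind of label mismatch can produce a score of exactly 0. It assumes the metric compares submitted labels with the expected labels as exact strings, which is an assumption for illustration and not a statement about how Kaggle scores any particular competition. The labels and the `accuracy` helper are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the expected labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical expected labels in the format the competition asks for.
expected = ["True", "False", "True", "True"]

# The model's predictions are correct, but encoded as integers.
predicted_ints = [1, 0, 1, 1]

submitted_wrong = [str(p) for p in predicted_ints]        # "1", "0", ...
submitted_right = [str(bool(p)) for p in predicted_ints]  # "True", "False", ...

print("accuracy with '0'/'1' labels:", accuracy(expected, submitted_wrong))
print("accuracy with 'True'/'False' labels:", accuracy(expected, submitted_right))
```

Checking the sample submission file for the exact label format before submitting avoids this.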

steady dune Sep 27, 2023, 6:18 PM

#

Currently I am working on a project in which I need to recognise diseases in plants/crops through image taken by mobile camera.

So, I am training my CNN model accordingly. Is it good to go with CNN or any other neural network that you recommend to increase the accuracy of diseases detection?

blissful spire Sep 27, 2023, 7:07 PM

#

Hello, I am trying to resize the images but the disk space is full. Kindly guide me how can I resolve this issue?

#

deft fox Sep 27, 2023, 7:25 PM

#

blissful spire

One option is to download the files to a local computer and run it there, assuming you have enough disk space. Yet another is to try and delete the original images on Kaggle AFTER you resize them, as that will presumably create more disk space. Not sure this option will work as the original dataset probably doesn't count toward your disk quota, and you may not have access to delete it.

blissful spire Sep 27, 2023, 7:27 PM

#

deft fox One option is to download the files to a local computer and run it there, assumi...

Thanks @deft fox I appreciate your response 🙂

deft fox Sep 27, 2023, 7:28 PM

#

steady dune Currently I am working on a project in which I need to recognise diseases in pla...

CNNs will likely work. However, it is unlikely that you will be able to collect a truly large number of images, say 10,000+, which is what is needed to train CNNs properly from scratch. I suggest you start with pretrained neural networks (VGG16, ResNet, SqueezeNet, EfficientNet, Inception or something along those lines) and fine-tune them on your images. https://www.analyticsvidhya.com/blog/2020/08/top-4-pre-trained-models-for-image-classification-with-python-code/

Analytics Vidhya

Purva Huilgol

Top 4 Pre-Trained Models for Image Classification with Python Code

We cover 4 pre-trained models for Image Classification that are state-of-the-art(SOTA) and are widely used in the industry as well.

lethal raft Sep 28, 2023, 9:10 PM

#

Guys i need a little help...
Actually i am new to ML field... i have started learning some algorithms but cannot understand how to move on the projects.
Can someone pls help me with how can i start my journey in a proper way in this field.
Also i started reading some ML related books....are they beneficial??

#

Currently i am working on a project for yield prediction using NDVI data but i cannot find much of the data on kaggle...can you suggest me any site or anything that can help me with NDVI dataset /_/\

spare musk Sep 29, 2023, 2:15 AM

#

I am looking for transcribed Australian podcasts on humour, sarcasm and everyday conversation, would anyone be able to help point me in the right direction?

steady dune Sep 29, 2023, 2:46 PM

#

deft fox CNNs will likely work. However, it is unlikely that you will be able to collect ...

@deft fox Thank you for your response and valuable suggestions 👍

cold torrent Sep 29, 2023, 11:36 PM

#

Hi I was wondering if anyone here is good with image analysis using python. I am still very lost, trying to learn on my own... I have labeled image data in yolo format using labelImg. I just don't know what to do with my image data set and labels in python

I have to quantify fluorescently labeled cells in images, any advice would be appreciated 🙏😭

obsidian pulsar Sep 30, 2023, 12:54 AM

#

cold torrent Hi I was wondering if anyone here is good with image analysis using python. I am...

You can use the scikit-image library to load and view your images.

from skimage.io import imread

#

Did you preprocess your images before performing cell analysis?

deft fox Sep 30, 2023, 1:13 AM

#

cold torrent Hi I was wondering if anyone here is good with image analysis using python. I am...

A general approach here is to use train images with labels and object masks for fine-tuning the existing model. After that you test on a separate set of data that hasn't been seen during training. Not sure that YOLO has the ability to quantify fluorescence or anything else, as most of these types of models are meant to be qualitative rather than quantitative. It may be a bigger bite than what you can chew if you have no background in image analysis, as this is a decidedly non-trivial task.

cold torrent Sep 30, 2023, 1:14 AM

#

So I have images like this. I have to count "Red" cells [Necrotic cells] , "Green" cells [Live cells], "Green Yellow" cells [Early Apoptosis] and "Yellow Orange" cells [Late Apoptosis], and I used labelImg to box cells of each color type and label them as "Live", "Necrotic", "EA", and "LA", not sure if that is a right approach

deft fox Sep 30, 2023, 1:18 AM

#

cold torrent So I have images like this. I have to count "Red" cells [Necrotic cells] , "Gree...

This is different from how you initially described it, and might work. It is important to clearly delineate cells when drawing ovals/rectangles around them. Are you relying on your eyes to make a distinction between these colors? If so, the classifier will be only as good as your eyes are.

cold torrent Sep 30, 2023, 1:19 AM

#

I also read that U-NET is something that could be used for something like this where I have different instances [colored] of cells, but not sure if that is the right approach or how to approach using such methodology and how to best label my images for such, I guess I was put in a bit of shark tank with no guidance to figure this out on my own since we are doing this for first time ever

deft fox Sep 30, 2023, 1:19 AM

#

Also, it seems that your signal is diffusely green in cytoplasm no matter what color is in nucleus, and that also might complicate things.

#

To train from scratch using U-NET or any other architecture, you would need at least thousands of images with many labeled cells in each (100+). That's why a pre-trained model might still be desirable as you only need to fine-tune it, which can be done with a relatively smaller number of images.

cold torrent Sep 30, 2023, 1:21 AM

#

i have a dataset of close to 400 images

#

how should I approach a pre-trained model for such a task? Also I agree the diffuse cytoplasm may be an issue

deft fox Sep 30, 2023, 1:23 AM

#

cold torrent i have a dataset of close to 400 images

400 images may sound like a lot to you - and I know from a personal experience that labeling that many images is a pain - but that's nothing for training models from scratch. You'd have to set aside at least a quarter of them for testing, and 300 images x 100 cells is not very much when you have 4 different categories.

cold torrent Sep 30, 2023, 1:28 AM

#

Yes; I have been labeling a lot of images, but I am not sure if I took the right approach - I used the labelImg API and the yolo method of output, which is just the format like this:

1 0.699219 0.573423 0.159375 0.172072
0 0.684766 0.521622 0.016406 0.023423
2 0.288672 0.284234 0.205469 0.217117

it gives the label, coordinates and length i believe.

cold torrent Sep 30, 2023, 1:30 AM

#

deft fox To train from scratch using U-NET or any other architecture, you would need at l...

Which pre-trained model may be best suited. Also how to best approach learning and being able to code a model to perform the task I desire?

deft fox Sep 30, 2023, 1:33 AM

#

cold torrent Yes; I have been labeling a lot of images, but I am not sure if I took the right...

Those numbers seem like relative rather than absolute coordinates. What I have seen is something like 588, 417, 661, 479 which is xmin,ymin,xmax,ymax coordinates. Maybe when you multiply your numbers by image width and height they become whole numbers as well. YOLO-supplied models should work as pre-trained models to be tuned.
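
The relative-to-absolute conversion mentioned here can be sketched as follows, assuming the usual YOLO text layout of `class x_center y_center width height`, with all four values given as fractions of the image size. The image size of 1280 x 1110 is hypothetical, and `yolo_to_pixel_box` is an illustrative helper, not part of labelImg or YOLO.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO annotation line to (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    # Scale the normalized values by the image dimensions.
    x_center = float(x_center) * image_width
    y_center = float(y_center) * image_height
    width = float(width) * image_width
    height = float(height) * image_height
    # Move from center/size to corner coordinates and round to whole pixels.
    xmin = round(x_center - width / 2)
    ymin = round(y_center - height / 2)
    xmax = round(x_center + width / 2)
    ymax = round(y_center + height / 2)
    return int(class_id), xmin, ymin, xmax, ymax


# First annotation line from the message above, with a hypothetical image size.
box = yolo_to_pixel_box("1 0.699219 0.573423 0.159375 0.172072", 1280, 1110)
print(box)
```

The first number in each line is the class index, which maps to the label names in the order they were defined in the labeling tool.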

#

You will have to research this on your own or better yet get local help from someone who knows, as I can't guide you through all the steps via keyboard.

#

Kaggle should have some notebooks that cover all these steps if you are patient and go through many search results.

cold torrent Sep 30, 2023, 1:36 AM

#

Thank you so much, also once I have done that to my best abilities, would it be okay for me to reach out to you directly?

deft fox Sep 30, 2023, 1:38 AM

#

There is no guarantee that I or anyone here will respond when contacted, as all communication is done on a voluntary basis and depends on available time. But there is no harm in trying to get in touch.

cold torrent Sep 30, 2023, 1:38 AM

#

Thank you so much for your help again!

mental storm Sep 30, 2023, 1:36 PM

#

Hello, this is a career related question to the data scientists and ml engineers of the industry.

How should an undergrad student navigate his way into internship and FTOs in this domain

runic geyser Oct 1, 2023, 11:23 AM

#

mental storm Hello, this is a career related question to the data scientists and ml engineers...

The very basic yet crucial thing is to be good at mathematics and statistics. You have to learn all the concepts down to the proofs (knowing the applications is a plus).

Secondly, you have to be good at a programming language (I suggest Python), at a moderate to expert level.

Next, learn various machine learning algorithms and practice them with industry-based projects (based on the domain you work in or want to work in in the future).

Finally, craft an acceptable résumé.

Hope this helps..😌 🙂 🤝 🤞

keen forum Oct 1, 2023, 11:25 AM

#

Not really sure why I can't get credit on the last Python exercise, but I've even tried copy pasting the solutions and it still won't fill it up: https://www.kaggle.com/code/vidmaric/exercise-working-with-external-libraries/edit any ideas?

buoyant harness Oct 1, 2023, 12:06 PM

#

Hi I'm Jonathan, Singapore born, California bred. I work at the intersection between Quantum Computing and Web3 at pQCee, a post quantum computing startup based in Singapore. I'm the product owner of QuantumNFT, a platform that lets developers showcase their quantum programming skills. We're addressing the talent gap problem. We're validating in the QIF Quantum Games Hackathon. We want to do for Quantum Science what Kaggle has done for Data Science. For this hackathon, an idea is to build out the competition workflow. Is there anyone from the team that can spare 30 minutes for a discovery call? 🙏

barren phoenix Oct 1, 2023, 12:38 PM

#

Hey there, is this allowed?
Say I have a friend who's NOT competing in a competition.
They decide to lend me their account for GPU hours.

This isn't code sharing, since they aren't competing, and it isn't multiple accounts of the same person, so it shouldn't violate any rules.
Can anyone/@mild geode staff confirm?

deft fox Oct 1, 2023, 5:41 PM

#

barren phoenix Hey there is this allowed Say I have a friend who's NOT competing in a competiti...

Pretty sure this is a violation but you should wait for the official response. It’s like having multiple accounts to work with but only submitting from one of them.

barren phoenix Oct 1, 2023, 7:11 PM

#

deft fox Pretty sure this is a violation but you should wait for the official response. I...

Yeah, waiting for the official response, since the point is I'm borrowing someone's real account for GPU hours (someone who isn't participating),
and I don't have multiple accounts, so I'm not violating that rule.
It's a 'technicality', but I don't wanna get banned over it.

#

cc @twin elbow

deft fox Oct 1, 2023, 7:23 PM

#

barren phoenix Yeah waiting for the official response since the point is I'm borrowing someone'...

You are thinking only about what feels right to you, but moderators have to think globally. What if someone has 10-20 friends who are not participating and all of them are willing to donate their GPU hours? Do we draw the line at 5 friends that can contribute their GPU hours, or is 50 okay as well? If there is no line, soon enough everyone would be making Kaggle friends left and right, which would create inequity in how many GPU hours individuals have at their disposal.

barren phoenix Oct 1, 2023, 7:30 PM

#

deft fox You are thinking only about what feels right to you, but moderators have to thin...

Aaah yeah that makes sense !

But there's a loophole where the "friends" participate in a team with the person and pool GPU hours. The catch is that they didn't actually participate and just lent their account .

Since within a team people can share anything.

Not sure how that's moderated...

If it can't be moderated then it makes no sense not allowing what I proposed .

It could be capped the same as max team size which is 10

But yeah I agree there isn't any one size fits all 'fair' solution and there is a lot of nuance

deft fox Oct 1, 2023, 7:37 PM

#

@barren phoenix Again you are not thinking like a moderator, so this might help. Let's say that someone has multiple accounts (a violation) and is running notebooks on them using the same IP address. Now here comes you without multiple accounts and borrowing your friend's GPU hours, but you are also running notebooks from multiple accounts using the same IP address. How are Kaggle moderators going to distinguish between these two events? Would they even care to do it even if they could?

barren phoenix Oct 1, 2023, 8:14 PM

#

Aaah yup valid point thanks ! Ig I should just stick with my own account

dapper stratus Oct 2, 2023, 1:01 AM

#

i have a question please
if i am making a brain mri tumor classification program and we are given a dataset of 250 images to train and validate on

#

and this is kinda like a challenge

#

i tried to use transfer learning

#

i am testing on an online dataset of 60 images that aren't in my current training data

#

the highest accuracy i am getting with transfer learning is 78%

#

i am testing many models and they are all performing poorer than Xception that got 78%

#

is it even possible to get higher than 78% or am i wasting my time

#

it is worth noting that i am using data augmentation for sure, and i tried tuning hyperparameters like the learning rate

thorny stone Oct 2, 2023, 2:44 AM

#

I am looking for someone to run through the Titanic competition with me so that I can learn. If anyone is up for the challenge or has any insights for me, please share. Thanks

dapper stratus Oct 2, 2023, 3:34 AM

#

dapper stratus i have a question please if i am making a brain mri tumor classification program...

https://colab.research.google.com/drive/1cbxG3VOmLqjt7GqufrQk-8zRdTNAjKBr?usp=sharing

#

here is a link to the notebook with all the models i tried to use, i test on almost 1k images. any help would be appreciated

deft fox Oct 2, 2023, 5:39 AM

#

dapper stratus i have a question please if i am making a brain mri tumor classification program...

It is impossible for anyone who hasn't tried that exact dataset to tell you whether you are doing well or not, and we don't even know what dataset you are using. Generally speaking, it is very difficult to get excellent performance if you train on 250 images and validate on 60, but 78% sounds decent. You may want to try to split your dataset into 5 folds and make 5 models, and then average their predictions. That might give you a small boost.
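The 5-fold idea above can be sketched in plain Python. This is only an illustration under stated assumptions: `fit_centroid_model` is a toy stand-in for the real image model, the one-dimensional data is made up, and all function names are hypothetical rather than taken from any library.

```python
import random

def make_folds(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds held-out folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def fit_centroid_model(xs, ys):
    """Toy stand-in for a real network: stores the mean feature of each class."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)

def predict_proba(model, x):
    """Score in [0, 1]; being closer to the class-1 centroid gives a higher score."""
    c0, c1 = model
    d0, d1 = abs(x - c0), abs(x - c1)
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def train_fold_models(xs, ys, n_folds=5):
    """Train one model per fold, each on the data outside that fold."""
    models = []
    for held_out_idx in make_folds(len(xs), n_folds):
        held_out = set(held_out_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        models.append(fit_centroid_model(train_x, train_y))
    return models

def ensemble_predict(models, x):
    """Average the predictions of all fold models."""
    return sum(predict_proba(m, x) for m in models) / len(models)

# Made-up data: class 0 clusters near 0.5, class 1 clusters near 2.5.
xs = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
ys = [0] * 10 + [1] * 10
fold_models = train_fold_models(xs, ys)
```

With a real model, each fold's held-out part would also serve as that model's validation set.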

obsidian pulsar Oct 2, 2023, 10:05 AM

#

deft fox It is impossible for anyone who hasn't tried that exact dataset to tell you whet...

For brain MRI tumor models, using a single model trained on a larger data set is far superior to using an ensemble of models with cross-validation.

lethal raft Oct 2, 2023, 10:55 AM

#

Currently I am working on a project for yield prediction using NDVI data, but I cannot find much of that data on Kaggle. Can you suggest any site or anything else that could help me find an NDVI dataset?

#

NDVI data is basically data extracted from satellite images

#

Please respond

obsidian pulsar Oct 2, 2023, 12:13 PM

#

lethal raft Currently i am working on a project for yield prediction using NDVI data but i c...

NASA Earthdata and Google Earth Engine

#

or Sentinel Hub

tough hornet Oct 2, 2023, 1:27 PM

#

I am new to data science and looking to get a head start in this domain. I am currently learning Python and its libraries. Should I do something else along with this?

deft fox Oct 2, 2023, 3:35 PM

#

obsidian pulsar For brain MRI tumor models, using a single model trained on a larger data set is...

For almost any type of data and model, using a larger dataset is superior to using a small dataset. It was implied in my statement that it is very difficult to get a high-performance model when training on 250 images. If you are saying that a single model on a large dataset is far superior to an ensemble on the same large dataset, that simply is not the case.

obsidian pulsar Oct 2, 2023, 8:47 PM

#

deft fox For almost any type of data and models, using a larger dataset is superior than ...

Understood, but you have to know that ensemble models are a type of machine learning model that combines the predictions of multiple base models to produce a more accurate prediction. Ensemble models can be trained on datasets of any size. However, they are often more effective when trained on larger datasets.

deft fox Oct 2, 2023, 9:05 PM

#

obsidian pulsar Understand, but You have to know Ensemble models are a type of machine learning ...

Again, you are not writing very precisely so others may get wrong ideas. Ensembles are not combining just base models - they can combine any kind of models. My original contention with your statement is that you seemed to suggest that single models on a large sample would do better than ensembles. Generally speaking, on the same dataset ensembles will do better than any single model. That goes for any dataset, whether big or small.

obsidian pulsar Oct 2, 2023, 9:16 PM

#

deft fox Again, you are not writing very precisely so others may get wrong ideas. Ensembl...

While ensemble modeling can offer excellent performance, it can be a complex process to implement and may not be as effective when handling large datasets. In such cases, it may be more appropriate to rely on a single model that can handle large datasets efficiently. This can help to streamline the overall modeling process and ensure that the final model meets the desired level of accuracy and performance.☺️

deft fox Oct 2, 2023, 10:30 PM

#

obsidian pulsar While ensemble modeling can offer excellent performance, it can be a complex pro...

I am not disputing your last statement. Yes, some people may not care about a complex ensemble to get a 0.01% improvement when a single model may be lighter and easier to implement. Single models may be more appropriate, no doubt about that. Yet "more appropriate" doesn't mean "far superior" which was your original statement. Single models are not "far superior" to ensemble models, even though there could be good reasons to use them.

obsidian pulsar Oct 3, 2023, 12:54 AM

#

deft fox I am not disputing your last statement. Yes, some people may not care about a co...

I said this because I saw cases where a single model could be convenient.🫡
And machine learning is not something you do for fun.

dapper stratus Oct 3, 2023, 12:27 PM

#

deft fox It is impossible for anyone who hasn't tried that exact dataset to tell you whet...

Thank you for answering. My dataset is basically 250 pics split into two halves, with half being pics of brains with a tumor and the other half being pics with no tumor. My issue was that I expected transfer learning to yield a higher accuracy, but I could not get more than 81%. To be fair, I tested on 1k pics when I only trained on 250, which would not be a real-life scenario: if I had 1k pics, I would most probably have used them for training. But the challenge for the task was that I need to train on only 250 images.

obsidian pulsar Oct 3, 2023, 4:00 PM

#

dapper stratus thank you for answering, my dataset is basically 250 pics split into two halves ...

It is definitely challenging to train a model on such a small dataset. 😩
You can use data augmentation techniques to increase the size of your dataset. This can be done by flipping, rotating, cropping, and adding noise to your images.☺️
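A minimal sketch of the four augmentations named above, assuming a grayscale image stored as a nested list of 0-255 pixel values. The function names are hypothetical; a real pipeline would normally use the augmentation utilities of its deep learning framework instead.

```python
import random

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    height, width = len(img), len(img[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in img[top:top + size]]

def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel and clip back to the 0-255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

rng = random.Random(0)
image = [[10 * r + c for c in range(4)] for r in range(4)]  # made-up 4x4 image
augmented = [flip_horizontal(image), rotate_90(image),
             random_crop(image, 3, rng), add_noise(image, 5.0, rng)]
```

Each call returns a new image and leaves the original untouched, so the augmented copies can be added to the training set alongside it.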

obsidian pulsar Oct 3, 2023, 4:33 PM

#

tough hornet I am new to data science and looking to get a headstart in this domain. I am cur...

👍

daring pine Oct 3, 2023, 9:51 PM

#

Are there any PyTorch-based time series feature extraction libs? Most of the implementations I saw are based on dataframe.groupby and apply...

#

I mean if there's not I may have an idea and I can start something. I assume torch even on CPU utilizes maximum resources and can achieve better performance. (Or maybe I'm wrong?)

#

The libraries I'm looking at right now are tsfresh and tsfel.

deft fox Oct 3, 2023, 10:29 PM

#

daring pine The libraries I'm looking at right now are tsfresh and tsfel.

You already picked good libraries for this purpose. Another good one is https://github.com/DataCanvasIO/HyperTS

daring pine Oct 3, 2023, 11:01 PM

#

Good to know. Thanks so much for that!

thick vessel Oct 4, 2023, 12:45 AM

#

Is this the place to ask questions regarding specific Kaggle Courses/Exercises?

verbal crest Oct 4, 2023, 5:47 AM

#

@thick vessel You can ask specific questions here or in the discussion forums for each course on the site.

obsidian pulsar Oct 4, 2023, 2:43 PM

#

deft fox You already picked good libraries for this purpose. Another good one is https://...

🫡

red hawk Oct 4, 2023, 4:53 PM

#

daring pine Is there any pytorch-based time series feature extraction libs? Most of the impl...

unless you can utilise the gpu I don't think torch will give you any extra optimizations on top of numpy tbh. (and tsfresh is really decent, as is sktime )

daring pine Oct 4, 2023, 5:38 PM

#

red hawk unless you can utilise the gpu I don't think torch will give you any extra optim...

I'm working on the Optiver competition dataset. tsfresh takes ~25 secs (of which 15 seconds go to creating the rolling time series data frame) to do the feature extraction for the first stock in the training set, and there are 200 stock_ids in that dataset. If I use a for loop on top of my current code, it'll probably take >1hr on CPU to run on all the training data. This is fine and affordable for the training stage (since I can pre-compute the features), but I may be using a GPU for inference. But I think overall you are right. I realized that if I compute it another way, it will probably spend just as much time creating the rolling data frame.

crisp gust Oct 4, 2023, 5:48 PM

#

I just created a notebook and I wish to share it to the competition's code section. Can anyone teach me how to do that?

regal plank Oct 4, 2023, 5:56 PM

#

crisp gust I just created a notebook and I wish to share it to the competition's code secti...

On the competition page, click the "Code" tab.
Upload your notebook file (I assume by searching and selecting it).
Open the notebook, save a version, then in the version history click the 3 dots; at the bottom you will find the "Submit to competition" button.

Hope this works!

steep pebble Oct 4, 2023, 8:13 PM

#

Hello all! I had a question regarding the implementation of momentum based gradient descent. Should I be zeroing the momentum based gradient at the start of each epoch or keep updating it across epochs?

deft fox Oct 4, 2023, 8:32 PM

#

steep pebble Hello all! I had a question regarding the implementation of momentum based gradi...

The momentum term in SGD is meant to overcome noisy gradients, especially in small datasets. In the Keras implementation its default value is 0, which means that it is not required. If you don't enter any momentum value, I suspect in most SGD implementations it will default to zero. Presumably that means it is safe not to use momentum, but it doesn't necessarily mean that momentum=0 is the best choice. I think you should stick with one value for it rather than try to change it from one epoch to another, as that will only add more complexity to the interpretation.

steep pebble Oct 4, 2023, 8:39 PM

#

Sorry I was not referring to the momentum parameter in the update rule. Rather, the velocity that you maintain during training as (momentum parameter) * velocity + lr * gradient

steep pebble Oct 4, 2023, 8:40 PM

#

deft fox Momentum term in SGD is meant to overcome noisy gradients, especially in small d...

should I reset it after each epoch of training or maintain throughout?
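A small sketch of the velocity update written out above, (momentum parameter) * velocity + lr * gradient, with a switch for resetting the velocity at epoch boundaries. As far as I know, optimizers in common frameworks keep this state for the whole training run rather than clearing it each epoch, so the default here maintains it. The function name and the toy one-parameter problem are hypothetical.

```python
def sgd_momentum(grad_fn, w, lr=0.1, momentum=0.9,
                 epochs=2, steps_per_epoch=1, reset_each_epoch=False):
    """Run SGD with momentum on a single scalar parameter w."""
    velocity = 0.0
    for _ in range(epochs):
        if reset_each_epoch:
            velocity = 0.0  # the variant being asked about
        for _ in range(steps_per_epoch):
            # velocity = (momentum parameter) * velocity + lr * gradient
            velocity = momentum * velocity + lr * grad_fn(w)
            w = w - velocity
    return w, velocity

def grad(w):
    """Gradient of the toy loss f(w) = w**2 / 2."""
    return w

w_kept, v_kept = sgd_momentum(grad, 5.0)
w_reset, v_reset = sgd_momentum(grad, 5.0, reset_each_epoch=True)
```

With one step per epoch, the second step differs between the two runs: the maintained velocity carries the first step's direction forward, while the reset variant starts each epoch as plain gradient descent.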

gritty cloud Oct 4, 2023, 10:37 PM

#

anyone know what the options are for persisting data in output folder across notebook sessions?

#

I'm converting some csv files to parquet, but I dont want to do that every time i boot up a notebook

deft fox Oct 4, 2023, 10:56 PM

#

gritty cloud I'm converting some csv files to parquet, but I dont want to do that every time ...

You create a private dataset with those .parquet files, and simply point future notebook versions to that dataset.

gritty cloud Oct 4, 2023, 10:57 PM

#

deft fox You create a private dataset with those .parquet files, and simply point future ...

I see, thank you. It’s my first time working on a Kaggle comp!

crisp gust Oct 5, 2023, 4:00 AM

#

regal plank In the competition page, Click the "Code" tab. Upload your notebook file (I assu...

thanks!

lapis mango Oct 5, 2023, 4:20 AM

#

i am taking an applied statistics class with R and i am stuck on an error in my code

regal plank Oct 5, 2023, 9:34 AM

#

Is this normal/OK?

I am downloading image files for an image classifier model and noticed the red CPU and RAM bars. Is this OK? Should I ignore it? Is there a way to optimize it?

Appreciate all the help

regal plank Oct 5, 2023, 9:49 AM

#

getting this too, looks like I am also running out of RAM.

regal plank Oct 5, 2023, 2:27 PM

#

Hello everyone!

I am currently working on my second computer vision model, and I am having a hard time reducing the error_rate and improving accuracy, despite intensive data cleaning (it actually got worse). I would appreciate any suggestions!

notebook link:
https://www.kaggle.com/code/raedsherif/green-leaves-classifier/notebook

I also have a few questions:

Is the input data available to you? I could not find it in the notebook. Should I download it to my local device, upload it as a Kaggle dataset, and then add it as input to the notebook?
As you can see, installing libraries generates a lengthy output, which looks bad. Is there a way to clear or avoid displaying this output?
After cleaning the images, should I create a new dl and run fine_tune again? Does it pick up from where it left off and save previous progress?

My first model was 76% accurate at predicting plastic types, and I was able to share it with a basic interface on Gradio. For this one, I'm aiming for a high level of accuracy and eventually want to deploy the model on a website. Any tips or tricks would be highly appreciated.

Thank you

Green leaves Classifier!

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

deft fox Oct 5, 2023, 2:35 PM

#

regal plank Hello everyone! I am currently working on my second computer vision model, and ...

If you use pip install -q, that should make the installation quiet. I suggest you consider using validation loss as your metric rather than accuracy, as the loss is steadily going down in your case. I think you shouldn't stop the training after the first epoch in which the metric doesn't improve. Typical patience values (the number of epochs without improvement that you tolerate before stopping) are 5-20 epochs, but in your case 2-5 may be more appropriate. It seems that you also need to fine-tune for more than 7 epochs, maybe with a decaying learning rate.
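The patience idea can be sketched in a few lines of plain Python. This is an illustration only: the function name, the `min_delta` parameter and the validation losses are made up, and training libraries ship their own early-stopping callbacks that should be preferred in practice.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses with a few noisy epochs.
losses = [1.0, 0.8, 0.85, 0.7, 0.72, 0.71, 0.73, 0.6]
stop_patience_1 = early_stopping_epoch(losses, patience=1)
stop_patience_3 = early_stopping_epoch(losses, patience=3)
stop_patience_5 = early_stopping_epoch(losses, patience=5)
```

On these numbers a patience of 1 stops at the first noisy epoch, while a larger patience lets training continue to the later, lower loss.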

regal plank Oct 5, 2023, 2:53 PM

#

deft fox If you use **pip -q** that should make the installation quiet. I suggest you con...

Thanks, I will apply these methods. What about using a better model than resnet18?

(Of course, I want to make sure I improve my model with resnet18 first, but generally speaking, what is your take on using better models?)

deft fox Oct 5, 2023, 4:05 PM

#

regal plank Thanks, I will apply these methods, what about using a better model compared to ...

Better models will likely give better results, but the difference won't necessarily be very dramatic. I think working on your implementation is more likely to bring a substantial improvement.

patent kiln Oct 5, 2023, 10:07 PM

#

Can anyone tell me what q1.check() means?

rn_image_picker_lib_temp_da79438a-74c3-4bc0-a222-e65602861d94.jpg

glossy edge Oct 5, 2023, 10:09 PM

#

Hey guys, I would like to start my first data analysis project, but I don't know where to start. Do you recommend starting with a big dataset or a small one? Could someone help me, please?

verbal crest Oct 5, 2023, 10:31 PM

#

@patent kiln q1.check() is the function to check your answers, make sure you have run the cells above it to properly import the learntools package.

obsidian pulsar Oct 5, 2023, 10:36 PM

#

verbal crest <@1052187261652971571> q1.check() is the function to check your answers, make su...

hello, @verbal crest

quiet bison Oct 5, 2023, 10:44 PM

#

I want to get into NLP. I am now studying transformers; what is the next step for me?
What are the best books or courses to take?

patent kiln Oct 5, 2023, 11:01 PM

#

verbal crest <@1052187261652971571> q1.check() is the function to check your answers, make su...

oh i see, also it seems that i cannot copy paste the upper code into the blank one

tidal echo Oct 6, 2023, 4:52 AM

#

patent kiln can anyone tell what does q1.check() mean

You can think of it as a function written to check whether ur answers are right or wrong

#

And this error is likely bcoz u have not run the initial code cells to activate this service

patent kiln Oct 6, 2023, 10:38 AM

#

tidal echo You can think of it as a function written to check whether ur answers are right ...

oh okay thanks

split epoch Oct 6, 2023, 12:17 PM

#

@glossy edge start with Titanic. Work on EDA first before trying a new model every day.

#

Have a vid covering titanic and a playlist of all the models if you want to check it out

summer drum Oct 6, 2023, 12:50 PM

#

Hi everyone, I have recently been learning about the preprocessing step in ML, and I have a question regarding standard scaling. Some algorithms require standard scaling to work well with regularization and the like. As I scroll through lectures and some shared notebooks on Kaggle, I notice that they apply the sklearn StandardScaler right after train_test_split, without considering the data type (like nominal features, or features after one-hot encoding). My question is: does this affect the performance of the algorithm?

obsidian pulsar Oct 6, 2023, 12:52 PM

#

Hello @everyone!

summer drum Oct 6, 2023, 12:53 PM

#

hello Mnihj

split epoch Oct 6, 2023, 1:53 PM

#

Depends on the data used, but you can throw it into a pipeline and select certain columns for it

#

I do standard scaling after train test split
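A plain-Python sketch of the two points above: scaling only selected columns, and computing the scaling statistics after the train/test split (on the training rows only). The rows and column names are made up and the helper names are hypothetical. In scikit-learn the same idea is usually expressed with a `ColumnTransformer` wrapping a `StandardScaler` inside a pipeline.

```python
from statistics import mean, pstdev

def fit_scaler(rows, numeric_cols):
    """Compute (mean, std) per selected column, using the given rows only."""
    stats = {}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        std = pstdev(values)
        stats[col] = (mean(values), std if std > 0 else 1.0)
    return stats

def transform(rows, stats):
    """Standardize the fitted columns and pass every other column through untouched."""
    scaled = []
    for row in rows:
        new_row = dict(row)
        for col, (mu, sigma) in stats.items():
            new_row[col] = (row[col] - mu) / sigma
        scaled.append(new_row)
    return scaled

# Made-up rows: "age" is numeric, "is_male" is a one-hot style flag.
train_rows = [{"age": 20.0, "is_male": 1},
              {"age": 30.0, "is_male": 0},
              {"age": 40.0, "is_male": 1}]
test_rows = [{"age": 30.0, "is_male": 0}]

stats = fit_scaler(train_rows, numeric_cols=["age"])  # fit on train only
train_scaled = transform(train_rows, stats)
test_scaled = transform(test_rows, stats)              # reuse the train statistics
```

The test rows are transformed with the training mean and standard deviation, so nothing about the test set leaks into the preprocessing.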

thick glacier Oct 6, 2023, 6:35 PM

#

code:-
t3 = tf.random.Generator.from_seed(12, alg="threefry")
t3.normal(shape=(2, 3))
error:-
InvalidArgumentError: {{function_node _wrapped__RngReadAndSkip_device/job:localhost/replica:0/task:0/device:CPU:0}} Unsupported algorithm id: 2 [Op:RngReadAndSkip]

why i get this error?

reef furnace Oct 7, 2023, 12:26 AM

#

How do I use the Hugging Face model "bert-base-uncased" in Kaggle?
I tried logging in using hf-cli and using a token, but AutoTokenizer still throws a private-repo "use token" error.

summer drum Oct 7, 2023, 4:00 AM

#

split epoch I do standard scaling after train test split

Thank you for the advice.

lapis dirge Oct 7, 2023, 4:36 AM

#

Has anyone tried a decision tree on Titanic... and what is the accuracy?

deft fox Oct 7, 2023, 5:04 AM

#

lapis dirge Have anyone tried decision tree on titanic...nd what is the accuracy?

It is a guarantee that many have tried decision trees on Titanic dataset https://www.kaggle.com/search?q=titanic+decision+tree

thorny sentinel Oct 7, 2023, 8:46 AM

#

Hey! Does anyone know where to find examples of companies that reported a clear benefit to their business as a result of hosting a Kaggle competition or a competition on another platform? In Luca and Konrad's Kaggle book, they list three examples (Netflix, AllState and GE), but I would like to find more.

lime wyvern Oct 7, 2023, 2:35 PM

#

Is there a good explanation anywhere of which GPU (P100 vs T4) you should use? I've struggled to find anything!

velvet bridge Oct 7, 2023, 6:51 PM

#

Can I fine-tune a model with a JSON structure or even JSONL? I know the answer is yes. I just need to know if I always have to format the data this way when fine-tuning:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
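Whether this exact layout is required depends on the fine-tuning service or library being used, so that part is best checked in its documentation. For the chat-style layout shown above (one JSON object per line, each with a "messages" list of role/content pairs), a quick sanity check could look like the sketch below. The function name is hypothetical and the set of allowed roles is an assumption taken from the three example lines.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption based on the examples

def validate_chat_jsonl(text):
    """Return (line_number, problem) pairs for lines that do not match the chat layout."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, 'missing non-empty "messages" list'))
            continue
        for msg in messages:
            ok = (isinstance(msg, dict)
                  and msg.get("role") in ALLOWED_ROLES
                  and isinstance(msg.get("content"), str))
            if not ok:
                problems.append((number, "message needs a known role and string content"))
                break
    return problems

good = ('{"messages": [{"role": "user", "content": "Hi"}, '
        '{"role": "assistant", "content": "Hello"}]}')
bad = '{"prompt": "Hi", "completion": "Hello"}\nnot json at all'
```

Running it over a file's contents before uploading gives the line numbers of any records that would need reshaping.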

rustic salmon Oct 7, 2023, 7:32 PM

#

Does anyone know how to embed OpenAI keys in a project?

#

I am getting a serious error; let me know if you can help.

desert tusk Oct 8, 2023, 8:13 AM

#

Do you know a blog (or something else) that reviews Kaggle competitions (namely, describes the most common approaches, analyzes the best solutions, etc.)?

desert tusk Oct 8, 2023, 8:50 AM

#

If not, I am thinking of starting one.

dapper bay Oct 8, 2023, 9:38 AM

#

HI Everyone, does anyone understand the Time series forecasting (Sales forecasting )in Kaggle? I'm having a hard time understanding it.

velvet bridge Oct 8, 2023, 9:31 PM

#

I passed a JSON doc into a model using LangChain, and when running inference with the RAG model it does not seem to respond based on the JSON data specifically, even for questions that should have a good answer. Do I need to restructure my JSON data away from the nested dict style to something more condensed?

desert tusk Oct 9, 2023, 11:58 AM

#

desert tusk Do you know a blog (or something else) that review the Kaggle competitions (name...

Anyone?

#

I think that it would be meaningful for everyone

glossy edge Oct 9, 2023, 6:01 PM

#

split epoch Have a vid covering titanic and a playlist of all the models if you want to chec...

yeah where can i watch it

split epoch Oct 9, 2023, 6:09 PM

#

https://www.youtube.com/watch?v=SjOfbbfI2qY&list=PLcQVY5V2UY4LNmObS0gqNVyNdVfXnHwu8&ab_channel=RyanNolanData

YouTube

Ryan Nolan Data

Train Test Split with Python Machine Learning (Scikit-Learn)

In this Python Machine Learning Tutorial, we take a look at how you can split a data set through train test split in scikit learn.

This is a great method for prepping your data before you run a model.

Email: ryannolandata@gmail.com
LinkedIn: https://www.linkedin.com/in/ryan-p-nolan/
Twitter: https://twitter.com/RyanNolan_
GitHub: https://githu...

#

@glossy edge 25 vids in here

wheat furnace Oct 10, 2023, 10:13 AM

#

Hello guys,
Please is there a place to see past project presentation slides and recordings?

fair hawk Oct 10, 2023, 4:35 PM

#

https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset/data

Brain Tumor MRI Dataset

A dataset for classify brain tumors

#

Can someone help out with this project

keen mantle Oct 10, 2023, 5:30 PM

#

Hello

crisp gust Oct 10, 2023, 6:39 PM

#

hey guys, does saving a notebook in kaggle use up my gpu quota as well?

#

And when using the T4 GPU, is there any way to use both of them at once, to maximize the usage of the quota?

tidal echo Oct 10, 2023, 9:38 PM

#

why is it that, when compiling the model , say running 80 epochs, i get to see a pattern in the change of values of validation loss and accuracy? and also there is a pattern in reduction and increment in the learning rate?

obsidian pulsar Oct 11, 2023, 8:31 AM

#

Hello @everyone, what can I help you with?

tardy mauve Oct 11, 2023, 8:36 AM

#

Is the scoring stage just comparing my submission with the real data, or will it use some private data to check? It has taken more than an hour and I'm still not getting my score.

olive tinsel Oct 11, 2023, 5:04 PM

#

Heyy folks, any suggestions like tutorials to start working with tensorflow???

manic iris Oct 12, 2023, 5:25 AM

#

Hi everyone, I'm working on a clustering model with k-means, but I have a dilemma. My elbow method says that the best k is 4, but when I look at a 2D PCA plot it looks like the best k should be 2.
When I try k=2 in the model, the clusters don't show the obvious groups. What am I doing wrong? Should I recheck the documentation?

daring robin Oct 12, 2023, 6:03 AM

#

im looking for data for time series forecasting, it should be above 5 GB anything around 10gb+ would be nice to practice!

woven topaz Oct 12, 2023, 6:32 AM

#

I have a question. In Kaggle Getting Started competitions, we are provided with train and test sets separately. Is it okay to merge both of them to make preprocessing easier, or not? According to this Analytics Vidhya blog post (https://www.analyticsvidhya.com/blog/2021/07/data-leakage-and-its-effect-on-the-performance-of-an-ml-model/), we should not do this because it can lead to data leakage. Can anyone tell?

Analytics Vidhya

CHIRAG GOYAL

Data Leakage And Its Effect On The Performance of An ML Model

In this article, we will discuss all the things related to Data Leakage including what it is, how it has happened, how to fix it,

woven topaz Oct 12, 2023, 6:34 AM

#

olive tinsel Heyy folks, any suggestions like tutorials to start working with tensorflow???

U can start with YT . There are some great tutorials available. Also on Coursera , u can find some great courses

dusty skiff Oct 12, 2023, 7:15 AM

#

i have the age column missing in some of the rows in my data, should I do mean imputation or resort to some other method

naive raft Oct 12, 2023, 7:49 AM

#

Hi, I have 50 clusters of numbers, and each cluster corresponds to a plume shape. I know the location of the origin point. I want to translate each plume (cluster) to the same position so that they are superimposed on each other, because I want to evaluate their average values. Does anyone know how to implement this in Python?

#

i am trying this from yesterday, not able to implement it

twin lion Oct 12, 2023, 5:37 PM

#

Hey, I hope you are all doing well. I want to extract information from a resume. How can I do it? Any guidance, please?

hybrid needle Oct 12, 2023, 6:53 PM

#

hello everyone, can somebody help me and answer on my questions?

deft fox Oct 12, 2023, 11:41 PM

#

hybrid needle hello everyone, can somebody help me and answer on my questions?

I don't think anyone can commit to helping you without knowing the actual question. Why don't you ask and see if anyone responds.

frank pivot Oct 13, 2023, 6:22 AM

#

Hi, who should I contact for a Kaggle bug?
I see a weird bug on the Output => Submit page: in the CTF competition the web page starts blinking a lot, I cannot click "Submit", and after a few minutes I get a 429 Too Many Requests error on any kaggle.com web page. It's like I'm banned for a few hours after that.
Some pages return:

#

daring pine Oct 13, 2023, 11:02 AM

#

So... does data regularization (like min-max, normalization, Box-Cox, etc., to keep all data items in a limited range) improve performance for gradient boosting trees like LightGBM (in a regression task)? I assume it doesn't do much for the standard decision trees in a classification task that I may learn about in class, since they are based on entropy.

daring pine Oct 13, 2023, 12:23 PM

#

frank pivot Hi, who should I contact for a Kaggle bug? I see a weird bug on Output => Submit...

I think since it's related to a competition, maybe opening a discussion post would be sufficient. Some hosts/admin have access to what's happening to your code on the testing data and can provide related information(that won't leak anything about the testing data of course) for you to debug.

glossy edge Oct 13, 2023, 3:55 PM

#

@split epoch thanks

shrewd scarab Oct 13, 2023, 5:46 PM

#

daring pine So...does data regularization(like min-max, normalization, box-cox, etc. to keep...

I don't think that they would have much of an effect, but it depends on the data. If some features in the data are very volatile, clipping the data may improve the model performance, but LightGBM tends to be very resilient so it probably wouldn't matter much.

daring pine Oct 14, 2023, 1:03 AM

#

shrewd scarab I don't think that they would have much of an effect, but it depends on the data...

Thanks, I guess I'll take a closer look at how the model works.

shell sierra Oct 14, 2023, 2:47 AM

#

Very stupid question, but I am new to competitions and to coding in general. Where do I find the data to download? I know the competition lists pfr and nflverse, but how do I download either of those so I can get started?

deft fox Oct 14, 2023, 3:17 AM

#

shell sierra Very stupid question, but I am new to the competition in coding in general. Wher...

You must join the competition before downloading the data

deft fox Oct 14, 2023, 3:18 AM

#

daring pine So...does data regularization(like min-max, normalization, box-cox, etc. to keep...

Scaling should not matter to GBMs.
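One way to see why: a tree split only depends on the order of the feature values, and a monotonic rescaling such as min-max keeps that order. The toy stump below (a hypothetical helper on made-up data) picks the same partition before and after scaling. Real libraries such as LightGBM add details like histogram binning that this sketch ignores.

```python
def best_split_partition(xs, ys):
    """Return the set of sample indices sent left by the best squared-error stump split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for cut in range(1, len(order)):
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue  # cannot split between two equal feature values
        left, right = order[:cut], order[cut:]
        sse = 0.0
        for side in (left, right):
            side_mean = sum(ys[i] for i in side) / len(side)
            sse += sum((ys[i] - side_mean) ** 2 for i in side)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Made-up regression data with an obvious jump between x=3 and x=10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]

lo, hi = min(xs), max(xs)
xs_minmax = [(x - lo) / (hi - lo) for x in xs]  # min-max scaling to [0, 1]

split_raw = best_split_partition(xs, ys)
split_scaled = best_split_partition(xs_minmax, ys)
```

The threshold value changes with the scale, but the samples that land on each side of the split do not, so the fitted tree makes the same predictions.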

urban hemlock Oct 14, 2023, 2:18 PM

#

Hello, I'm currently training a Convolutional Neural Network. Is this a good way to train my model to unseen data?

#

The training accuracy jumps real high at the start while the validation accuracy gradually gets better

#

obsidian pulsar Oct 14, 2023, 3:40 PM

#

deft fox Scaling should not matter to GBMs.

?

arctic marten Oct 14, 2023, 9:34 PM

#

Hello, I am building an analytics dashboard on Streamlit; my code is above. I want to add the delta parameter to my metric, i.e. st.metric(label=, value=, delta=). My delta will be the difference in total sales (increase or decrease). The aim is to show whether sales are increasing or decreasing each year. All the code I have written to achieve this has gone wrong. I got it working in my Jupyter notebook, but when I implement the same thing in Streamlit I get an error that my value should be an int, str... Please, I need your assistance.

sullen pawn Oct 15, 2023, 7:08 PM

#

arctic marten Hello, I am building an analytic dashboard on Streamlit, my code is above. I wan...

I don't think auto is a valid option, is it? The argument is more along the lines of: it can be a string as long as the string contains a numerical value (so you can add a unit, e.g. degrees Celsius).

arctic marten Oct 15, 2023, 8:46 PM

#

sullen pawn I don't think `auto` is a valid option, is it? The argument is more in the line ...

It is not; I want to change it to the difference, so that when I filter I get, for example, the change in total sales between the years (to know whether sales are decreasing or increasing with each passing year). So my problem is the Python code I should use to implement this in Streamlit.
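
A minimal sketch of the year-over-year delta described above, assuming the sales can be reduced to (year, amount) pairs; the sample data and the helper names `yearly_totals` and `metric_args` are hypothetical. One possible cause of the "value should be an int, str..." error is passing a NumPy or pandas scalar, so the sketch converts everything to plain Python floats before handing it to `st.metric`. Whether that is the actual cause here is an assumption.

```python
from collections import defaultdict

def yearly_totals(rows):
    """Sum sales per year from (year, amount) pairs."""
    totals = defaultdict(float)
    for year, amount in rows:
        totals[year] += amount
    return dict(totals)

def metric_args(totals, year):
    """Build plain-Python label/value/delta values for one year."""
    value = float(totals[year])
    previous = totals.get(year - 1)
    # With no previous year there is nothing to compare against.
    delta = None if previous is None else float(value - previous)
    return {"label": f"Total sales {year}", "value": value, "delta": delta}

# Hypothetical sample data: (year, sale amount)
sales = [(2021, 100.0), (2021, 50.0), (2022, 120.0), (2022, 80.0), (2023, 90.0)]
totals = yearly_totals(sales)
args_2022 = metric_args(totals, 2022)
args_2023 = metric_args(totals, 2023)
print(args_2022)
print(args_2023)

# In a Streamlit app (assumes `import streamlit as st`):
# st.metric(**args_2023)
```

A positive delta renders as an increase and a negative one as a decrease, which matches the "are sales going up or down each year" goal.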

latent bough Oct 15, 2023, 11:31 PM

#

Broad stroke question- but can someone explain data models to me and how to build them?
Is it more than just combining datasets into relational tables? i.e. Sales and Customer data joined on transaction id?
Or is it more forecasting or financial modeling? I think the job postings I read are throwing this term around very loosely...

deft fox Oct 16, 2023, 2:41 AM

#

urban hemlock Hello, I'm currently training a Convolutional Neural Network. Is this a good way...

Two things: your training curves are way ahead of validation, so I suggest regularizing your network more or adding/increasing dropout; and even after 60 epochs val accuracy is still going up and val loss is still going down, so you should train longer.

celest sphinx Oct 16, 2023, 5:09 AM

#

latent bough Broad stroke question- but can someone explain data models to me and how to buil...

A data model is a structured representation of how data is organized and accessed in a database or information system. It defines the relationships, rules, and structure of the data to ensure accurate storage, retrieval, and management of information.

#

One example of a data model is the Entity-Relationship Model (ERM), which represents entities, their attributes, and the relationships between them. For instance, in a university database, "Student" could be an entity with attributes like name and student ID, connected to other entities like "Course" through relationships indicating enrollment.

urban hemlock Oct 16, 2023, 7:54 AM

#

deft fox Two things: your training curves are way ahead of validation so I suggest regula...

Got it, I'll update you when I do this later. Thanks for the suggestion!

urban hemlock Oct 16, 2023, 7:54 AM

#

deft fox Two things: your training curves are way ahead of validation so I suggest regula...

Also, the graphs I showed you were a sign of overfitting, right?

latent bough Oct 16, 2023, 12:06 PM

#

celest sphinx One example of a data model is the Entity-Relationship Model (ERM), which repres...

Thank you 🙂

uneven radish Oct 16, 2023, 2:49 PM

#

Is 15000 too big for RMSE?

celest sphinx Oct 16, 2023, 3:19 PM

#

uneven radish 15000 is to big for rmse ?

it's relative to your dataset's value xD

#

i'm guessing you didn't normalise it

#

If you estimate number of sales for your local shop and you have a RMSE of 15000 it's huge yeah xD

#

if you estimate number of sales for the whole McDonald's corporation, 15000 RMSE is very good

uneven radish Oct 16, 2023, 3:26 PM

#

@celest sphinx can you check dm?

celest sphinx Oct 16, 2023, 3:26 PM

#

no i dont take DMs but i was just giving you a quick answer to your question

#

usually to evaluate your model performance

#

you start by making a "baseline" model, so a very simple model for example : always predict the average value of the dataset

#

there are more accurate baseline models of course

#

then you can compare your model performances to those statistical baseline models

#

And to answer your question on a business view : Your acceptable error depends on what the client is willing to lose

#

you should never aim to reach 0 RMSE anyway since that means you learnt the noise in the data too. Usually the main way to know if your model is good is to check whether your training RMSE is close to your validation RMSE

#

i'd rather have a model with 0.11 TRAIN & VAL RMSE than a model with 0.0001 train RMSE and 0.10 VAL RMSE
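
The baseline idea above can be sketched as follows: predict the training mean for every validation row, compute its RMSE, and compare the model's RMSE against it. All numbers are made up for illustration, and `model_val_pred` stands in for whatever predictions a real model would produce.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical targets and model predictions.
y_train = [120.0, 150.0, 130.0, 170.0, 160.0]
y_val = [140.0, 155.0, 165.0]
model_val_pred = [138.0, 150.0, 170.0]

# Baseline: always predict the mean of the training targets.
baseline_value = sum(y_train) / len(y_train)
baseline_rmse = rmse(y_val, [baseline_value] * len(y_val))
model_rmse = rmse(y_val, model_val_pred)

print(f"baseline RMSE: {baseline_rmse:.2f}")
print(f"model RMSE:    {model_rmse:.2f}")
print("model beats baseline:", model_rmse < baseline_rmse)
```

An RMSE is only "big" or "small" relative to a reference like this, which is why the same 15000 can be terrible for one dataset and excellent for another.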

uneven radish Oct 16, 2023, 3:42 PM

#

Thank you

deft fox Oct 17, 2023, 5:34 AM

#

urban hemlock Also, the graphs I showed you were a sign of overfitting, right?

Don’t know if there was overfitting as validation scores were still improving.

urban hemlock Oct 17, 2023, 7:56 AM

#

deft fox Don’t know if there was overfitting as validation scores were still improving.

I see, I see. Have you encountered this by any chance, or do you find this unusual when training a model?

dim quiver Oct 17, 2023, 9:40 AM

#

Hi guys, I am studying machine learning and suddenly I got something in my mind,
Recently I learned and worked through some statistics problems, basically hypothesis testing and related topics, but I can't picture how they are applied in industry.
I heard that statistics is the foundation of all AI and ML, but it's hard for me to solve problems using statistics since I only know the theory. Does anyone know of good use cases or books that could help me with this?
Much appreciated

torn jasper Oct 17, 2023, 5:38 PM

#

Hello everyone, I trust this message finds you well. I am seeking advice on whether pursuing an online Master's degree in Data Science from Coursera is a prudent choice. Furthermore, do you have any recommendations concerning funding options for pursuing a Master's degree?

deft fox Oct 17, 2023, 6:58 PM

#

urban hemlock I see, I see. Have you encountered this by any chance, or do you find this unusu...

I have seen this before. Your model is training too fast because there isn’t enough regularization. Adding a dropout layer should slow it down; you should also train for longer. There shouldn’t be a predetermined number of epochs. Rather, go for an arbitrarily large number of epochs and use early stopping with patience at 5-15 epochs.
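
A framework-free sketch of early stopping with patience, assuming the validation loss is available after every epoch. The loss values are simulated and `run_early_stopping` is a hypothetical helper; in a real training loop each loss would come from evaluating the model, and you would also keep a copy of the best weights. Frameworks such as Keras ship a ready-made callback for this (`EarlyStopping` with a `patience` argument).

```python
def run_early_stopping(val_losses, patience=10, max_epochs=1000):
    """Walk through per-epoch validation losses and stop once the best loss
    has not improved for `patience` consecutive epochs.
    Returns (best_epoch, best_loss, epochs_run)."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            # New best: remember it (a real loop would save the weights here).
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch, best_loss, epochs_run

# Simulated validation losses: improving, then drifting upwards.
simulated = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.64, 0.66, 0.68, 0.69, 0.70, 0.71]
best_epoch, best_loss, epochs_run = run_early_stopping(simulated, patience=3)
print(best_epoch, best_loss, epochs_run)
```

With a large `max_epochs` the stopping point is decided by the validation curve rather than by a number picked in advance.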

obsidian pulsar Oct 18, 2023, 1:46 PM

#

deft fox I have seen this before. Your model is training too fast because there isn’t eno...

real_mvp

celest sphinx Oct 19, 2023, 11:51 AM

#

What does Kaggle mean btw?

#

just a random word that sounds nice?

#

chatGPT said

#

"The name is a play on the Japanese word "kaggle" (pronounced "kah-glay"), which means a group of people who come together to learn and collaborate."

chilly bison Oct 19, 2023, 3:50 PM

#

Hello everyone. I have a problem. I have been using one notebook for some time and it worked just fine, until a few days ago when I started to get the following error:

CUDA error: CUDA driver version is insufficient for CUDA runtime version
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Interestingly enough, my friend was using the same notebook, and for two days it still worked fine for him while mine had already stopped working. Yesterday he started to receive this error too. Could anyone please suggest a solution?

worn herald Oct 19, 2023, 5:30 PM

#

celest sphinx "The name is a play on the Japanese word "kaggle" (pronounced "kah-glay"), which...

Pretty believable! But the real answer can be found here: https://www.reddit.com/r/dataisbeautiful/comments/80xl66/hey_reddit_im_anthony_goldbloom_founder_of_kaggle/duyw6gm/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button 🙂

celest sphinx Oct 19, 2023, 5:31 PM

#

haha thanks

blazing salmon Oct 19, 2023, 8:21 PM

#

Plz share the list of all ML algorithms that are affected by correlated features.

green kernel Oct 20, 2023, 4:30 AM

#

Can anyone tell me what I'm doing wrong? I want to access a json endpoint from the chess.com API using python. But I'm getting error code 403 again and again. This is my code:

import requests

api_url = "https://api.chess.com/pub/player/jitesh117"  

try:
    response = requests.get(api_url)

    if response.status_code == 200:
        data = response.json()

        print("API Response:")
        print(data)
    else:
        print(f"Request failed with status code: {response.status_code}")

except requests.exceptions.RequestException as e:
    print(f"Request error: {e}")

urban hemlock Oct 20, 2023, 11:39 AM

#

deft fox I have seen this before. Your model is training too fast because there isn’t eno...

I reduced the batch size and added the dropout layer and it got better. Thanks for the help! On another note, can you give a few ideas on how to handle unknown classes? For example, the classes are dogs and cats and the unknown (well, known unknowns) would be all the other animals.

timid harbor Oct 20, 2023, 6:19 PM

#

Hello, I am new to data science and I would like a little guidance on the binary classification with software defects competition.

graceful axle Oct 20, 2023, 10:57 PM

#

Why do people ask for thumbs up on their projects?

deft fox Oct 21, 2023, 12:17 AM

#

graceful axle Why people ask for thumps up in their projects

Medals are awarded after a certain number of upvotes. Expertise levels increase as the number of medals goes up. In short, more upvotes means greater recognition.

graceful axle Oct 21, 2023, 12:34 AM

#

Are transformers = attention models?

outer apex Oct 21, 2023, 8:01 AM

#

anyone can help?

storm slate Oct 21, 2023, 11:21 AM

#

hi, i am a beginner. how do you tune your model to make it perfect, for example when to add a dense layer or a dropout? is there a course/book for that?

sullen pawn Oct 21, 2023, 3:39 PM

#

green kernel Can anyone tell me what I'm doing wrong? I want to access a json endpoint from t...

HTTP 403 (Forbidden) means the server refused the request, often because of an authorization problem. You should check the API specs for any required authentication tokens (perhaps sent as an 'Authorization: Bearer' header or simply a URL token).
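
Another thing worth trying, sketched below: some public APIs reject requests that carry the default `python-requests` User-Agent, so sending a descriptive one can be enough to clear a 403. Whether that is the cause here is an assumption, and the header text and the helper names `build_headers` and `fetch_player` are made up for illustration. The network call is wrapped in a function and not executed in this sketch.

```python
API_URL = "https://api.chess.com/pub/player/jitesh117"

def build_headers(contact="contact: you@example.com"):
    """Return request headers with an explicit, descriptive User-Agent."""
    return {"User-Agent": f"my-chess-script/0.1 ({contact})"}

def fetch_player(url=API_URL):
    """Fetch the JSON endpoint with the custom headers (not called below)."""
    import requests  # imported lazily so build_headers works without requests
    response = requests.get(url, headers=build_headers(), timeout=10)
    response.raise_for_status()  # raises on 4xx/5xx instead of failing silently
    return response.json()

headers = build_headers()
print(headers)
```

If the status code is still 403 with a custom User-Agent, the API documentation is the place to look for required tokens or headers.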

sullen pawn Oct 21, 2023, 3:40 PM

#

outer apex anyone can help?

Missingpy is quite old; your scikit-learn version needs to match the one missingpy requires for it to keep working (although I recommend looking for an alternative package to missingpy).

sullen pawn Oct 21, 2023, 3:42 PM

#

chilly bison Hello everyone. I've a trouble. I have been using one notebook for some time and...

Maybe an automatic system update changed your CUDA drivers (and maybe your friend received the update later)?

cinder gulch Oct 22, 2023, 12:55 PM

#

hello everyone. I'm from Vietnam and want to find a mentor for data science especially coding. Can someone help me please? thank you so much.

fair hawk Oct 22, 2023, 3:38 PM

#

woven topaz U can start with YT . There are some great tutorials available. Also on Coursera...

Hey could you please help out with this doubt: Not able to access the kaggle tpu in kaggle notebook

graceful axle Oct 22, 2023, 3:44 PM

#

Have there been any competitions that nobody has been able to win?

arctic marten Oct 22, 2023, 9:04 PM

#

Hello everyone, I need help here, please. I am trying to forecast using Prophet. The first screenshot is my dataframe, but after creating my first model and specifying that I am making a prediction for 365 days, the shape of my data shrank (second screenshot). What is going on? I was expecting 3203 + 365 rows.

stoic bear Oct 23, 2023, 10:09 AM

#

Why AI won't replace data scientist/analyst

obsidian pulsar Oct 23, 2023, 12:57 PM

#

deft fox Medals are awarded after a certain number of upvotes. Expertise levels increase ...

sorry but you are wrong

graceful axle Oct 23, 2023, 1:07 PM

#

stoic bear Why AI won't replace data scientist/analyst

That is a biased question. It could replace data scientists. Don't be afraid, be a good scientist and accept the potential consequences of progress.

magic timber Oct 24, 2023, 10:00 AM

#

I was trying to plot a Plotly Express graph with day number on the x axis and a cumulative sum column on the y axis. I added an average line to it in a different color, but it stays in the back. How do I bring it to the top?

woven topaz Oct 25, 2023, 3:06 AM

#

fair hawk Hey could you please help out with this doubt: Not able to access the kaggle tpu...

Have you selected the TPU from the accelerator option?

icy wind Oct 25, 2023, 9:43 AM

#

Hello Everyone, can anyone walk me through the multidimensional data vectorization without creating a change in the data shape . I have been experiencing some challenges in this regard

daring hedge Oct 25, 2023, 2:13 PM

#

storm slate hi i am a beginner how to tune your model to make it perfect for example when to...

coursera deeplearning.ai andrew ng

daring hedge Oct 25, 2023, 2:17 PM

#

graceful axle Have there been any competitions that nobody has been able to win?

I think some of those big competitions are going to require significant compute resources and teamwork, i.e. money. I am just trying to win swag at this point.

graceful axle Oct 26, 2023, 12:10 AM

#

https://www.nltk.org/book/ch01.html#sec-computing-with-language-texts-and-words

#

I was trying that out

#

It says that the exercises are graded

#

But it only shows the questions

#

How do I get the gradings?

#

I want to know if my answers are what they are supposed to be

#

Or do they just refer to the little circle on the side?

copper minnow Oct 26, 2023, 6:13 PM

#

icy wind Hello Everyone, can anyone walk me through the multidimensional data vectoriz...

I had the same issue: when I was imputing the data and handling categorical values, the shape changed dramatically. It was because the old indexes were kept; maybe resetting the indexes can help.

icy wind Oct 26, 2023, 6:19 PM

#

copper minnow I had a same issue, when I was imputing the data and handling categorical values...

I am thinking of flattening the columns (column-wise) so I can keep my original matrix; I will update you if mine eventually works.

agile stag Oct 26, 2023, 6:23 PM

#

i have some questions abt regression and datasets shapes but i don't know if it's the right channel

shrewd scarab Oct 27, 2023, 1:28 AM

#

agile stag i have some questions abt regression and datasets shapes but i don't know if it'...

If it is a question, then this is the right channel!

agile stag Oct 27, 2023, 1:33 AM

#

yeah i managed to talk with some of you guys already ! so basically my problem is the following one:

I have 2 datasets: one contains 3072 animals with 875 columns, which are the bacteria inside them, and the 2nd one is a prediction dataframe with 840 animals and 6 attributes.
Objective: predict the weight (real), another variable (real), and finally both at the same time.

Problem: as you saw, my dataframes have different shapes, so a lot of regression models don't work with them, and I don't know whether I should just take the first 840 animals in my 1st dataframe, because I don't know if they are the labeled ones.

Solution: I first tried to predict the weight, which is a small real number. It was a disaster: negative score, an MSE of about 2 billion, etc. So I transformed the weight into classes 0/1 (is fat or not) and now I'm using classification models, but without this I wouldn't know how to fix my previous problem.

2nd problem: I have a pretty good accuracy (0.90) using KNN, but it doesn't tell me which bacteria cause the animal to be fat or not. I am thinking of trying Naive Bayes or Logistic Regression to compare results.

Do you guys have any ideas on how to see the weight of the different attributes that lead to the classification?

heavy fractalBOT Oct 27, 2023, 1:36 AM

#

a_himitsu has been warned

Reason: Duplicated text

agile stag Oct 27, 2023, 1:06 PM

#

Ok, the size problem is fixed: we managed to get a new dataset with the correct sizes. What remains is the question of how to find the importance of each feature in the classifier's decision.

shrewd scarab Oct 27, 2023, 8:54 PM

#

agile stag Ok problem of size fixed we managed to get a new dataset with correct sizes , no...

Assuming that you are using a KNN model and are trying to get the importance of each attribute (feature), you would probably need to use permutation importance to measure the effect of each feature on your classifier. Hope that helps. If you use sklearn's LogisticRegression, you can get the feature importances (the coefficients) with model.coef_[0], I think.

agile stag Oct 27, 2023, 9:03 PM

#

shrewd scarab Assuming that you are using a KNN model, and are trying to get the importance's ...

Yep, that's exactly what I am doing and what I want to find out! Does scikit-learn provide a function for permutation importance with KNN?

shrewd scarab Oct 27, 2023, 9:05 PM

#

agile stag Yep that's exactly what i am doing and what i want to find out ! Does scikit pro...

Not to my knowledge, you would need to write the permutation importance code yourself. Scikit learn has a permutation importance function that works with tree based models and linear models, but not one that works with KNN models.

agile stag Oct 27, 2023, 9:19 PM

#

shrewd scarab Not to my knowledge, you would need to write the permutation importance code you...

Alr alr, thanks for the answer! I'll take 5 more minutes of your time with a last question: I am currently predicting one attribute, then I'll need to predict a second one, and finally both at the same time. Which models can predict multiple targets at the same time? (I'll probably predict a binary value, and the 2nd one is actually a real number, but I'll probably convert it into classes from 1 to 10.)

shrewd scarab Oct 27, 2023, 9:28 PM

#

agile stag Alr alr thanks for the answer ! I'll take 5 more minutes of your time with a las...

I'm not exactly sure of how you would want to solve the problem. But if you are trying to predict a binary class and a real number, you should go with 2 different models. Otherwise if you are changing the 2nd one to categorical, you could use the MultiOutputClassifier in scikit learn.

agile stag Oct 27, 2023, 9:53 PM

#

shrewd scarab I'm not exactly sure of how you would want to solve the problem. But if you are ...

Alright i'll test it right away ! Thanks for the help ! CatHeartBongo

fast plinth Oct 27, 2023, 10:00 PM

#

Hi, Would anyone like to team up with me for the competition?

topaz vigil Oct 27, 2023, 10:47 PM

#

Hi guys, I am going through the Python tutorial and doing the exercise for the arithmetic and variables section, and I cannot figure out what I'm doing wrong. Any advice?

deft fox Oct 27, 2023, 11:09 PM

#

topaz vigil Hi guys i am going through the python tutorial and doing the excerise for the ar...

It says at the bottom that ‘survived’ is not defined. You have to first define that variable.

lone perch Oct 28, 2023, 3:23 AM

#

I have a dataframe where some of the items are lists. Normally the tables I've seen just have single items like "Age" where you just have one integer. How do I do EDA in this case?

celest sphinx Oct 28, 2023, 9:04 AM

#

lone perch I have a dataframe where some of the items are lists. Normally the tables I've s...

I guess you can one-hot-encode it, but if you have every country in the world that might be a lot of columns
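
A small sketch of that idea for a list-valued column, with one way to keep the column count down: encode only the most frequent values and lump the rest into an "other" flag. The rows, the `countries` field name and the `multi_hot` helper are hypothetical; with pandas, `DataFrame.explode` is a common starting point for the same kind of analysis.

```python
from collections import Counter

# Hypothetical rows where one field holds a list of values.
rows = [
    {"title": "A", "countries": ["US", "UK"]},
    {"title": "B", "countries": ["US"]},
    {"title": "C", "countries": ["FR", "US", "UK"]},
    {"title": "D", "countries": ["JP"]},
]

# Count how often each value occurs and keep only the most frequent ones.
counts = Counter(c for row in rows for c in row["countries"])
top = [name for name, _ in counts.most_common(2)]

def multi_hot(values, keep):
    """One 0/1 column per kept value, plus a flag for anything else."""
    encoded = {f"country_{name}": int(name in values) for name in keep}
    encoded["country_other"] = int(any(v not in keep for v in values))
    return encoded

encoded_rows = [multi_hot(row["countries"], top) for row in rows]
for row, enc in zip(rows, encoded_rows):
    print(row["title"], enc)
```

The counts themselves (how many rows mention each value, how many values each row has) are often already useful for EDA before any encoding.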

idle bobcat Oct 28, 2023, 9:09 AM

#

lone perch I have a dataframe where some of the items are lists. Normally the tables I've s...

For EDA I’d split this data into separate columns. The first production company could be the main one and you could analyze movies with more than 1 company against the others for specific characteristics like movies with more companies have more budget or something like that

hearty oriole Oct 28, 2023, 9:18 AM

#

lone perch I have a dataframe where some of the items are lists. Normally the tables I've s...

If you want to know how EDA is performed, have a look here: https://medium.com/@borhadepiyush/how-to-perform-eda-5ecaf4a3e52a. This is a blog post I published on Medium; I'm confident it will help you in your studies.

lone perch Oct 28, 2023, 8:57 PM

#

celest sphinx Ig you can one-hot-encode it but if you have everycountry in the world might be...

Oh okay so treat it as a normal categorical column

lone perch Oct 28, 2023, 8:58 PM

#

hearty oriole if you want to know how EDA is performed then have a look over here https://medi...

Alright thanks, nice read! I didn't really find the answer to the question there but I'll also have a look at the notebook you attached to see

lone perch Oct 28, 2023, 9:00 PM

#

idle bobcat For EDA I’d split this data into separate columns. The first production company ...

wouldn't that leave a lot of rows with null values? I'd look into that as well though that's interesting, thanks!

zealous creek Oct 29, 2023, 1:31 AM

#

shrewd scarab Not to my knowledge, you would need to write the permutation importance code you...

sklearn's permutation importance can use any estimator. Think about what the permutation importance calculation is. You reshuffle the values in a feature and check how much the performance of the estimator degrades. The worse the degradation, the more important the feature is. There is no reason why the calculation should be limited to certain estimators.
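
The calculation described above can be sketched from scratch to show that it only needs predictions, not model internals. This is an illustrative stdlib version with a tiny hand-written KNN and synthetic data, not sklearn's implementation; all helper names and the data generator are hypothetical. Feature 0 separates the classes and feature 1 is pure noise.

```python
import random

def knn_predict(train_X, train_y, row, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)),
    )[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train_X, train_y, X, y, k=3):
    hits = sum(knn_predict(train_X, train_y, r, k) == label for r, label in zip(X, y))
    return hits / len(y)

def permutation_importance(train_X, train_y, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one column of X is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(train_X, train_y, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(train_X, train_y, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def make_rows(n_per_class, rng):
    """Synthetic data: feature 0 separates the classes, feature 1 is noise."""
    X, y = [], []
    for label, (lo, hi) in enumerate([(0.0, 0.2), (0.8, 1.0)]):
        for _ in range(n_per_class):
            X.append([rng.uniform(lo, hi), rng.uniform(0.0, 0.3)])
            y.append(label)
    return X, y

rng = random.Random(42)
train_X, train_y = make_rows(20, rng)
val_X, val_y = make_rows(10, rng)
importances = permutation_importance(train_X, train_y, val_X, val_y)
print(importances)
```

Shuffling the informative column hurts accuracy, while shuffling the noise column leaves it unchanged, so the first importance is clearly larger than the second.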

shrewd scarab Oct 29, 2023, 2:14 AM

#

zealous creek sklearn's permutation importance can use any estimator. Think about what the per...

My bad, I thought I read somewhere that it doesn't work for tree-based models.

zealous creek Oct 29, 2023, 2:19 AM

#

shrewd scarab My bad, I thought I read somewhere that it doesn't work for tree-based models.

Trust the sklearn manual only. 🙂 https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance

rustic salmon Oct 29, 2023, 8:22 AM

#

Could someone recommend a video tutorial on creating project presentation recordings? I'm looking for guidance on the process. Your assistance is much appreciated.

lean kraken Oct 29, 2023, 3:01 PM

#

anyone who uses rx 6650 xt i need some help?

deft fox Oct 29, 2023, 4:33 PM

#

rustic salmon Could someone recommend a video tutorial on creating project presentation record...

There is a whole channel in KaggleX section devoted to recording tools. I will point you to the first post in that section: #1160980660966674452 message

rustic salmon Oct 29, 2023, 4:36 PM

#

okay thanks

finite crescent Oct 29, 2023, 6:31 PM

#

Good evening everyone, I am Hassan, and I am passionate about learning data science. As a beginner, what resources and sites should I start with for baby steps? Happy to be here to learn, unlearn and relearn.

proper solar Oct 29, 2023, 10:44 PM

#

when people split the training data into training and validation data, do they often do another pass at the end where they use all the possible training data before submitting? I imagine having a validation split is only useful for deciding if what you're doing improves the model or not

main mango Oct 30, 2023, 7:13 AM

#

proper solar when people split the training data into training and validation data, do they o...

After hyperparameter tuning using cross-validation, the model is usually refit with the best parameters on the whole training set to fully utilize the data. I've heard more than once of how exposing the model to more data (train + validation) at the end boosts performance. It's worth a try.
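
A compact sketch of that workflow, using a one-feature ridge regression so it runs without any libraries: cross-validation is used only to choose the penalty, and the final model is then refit on every training row. The data, the candidate penalties and the helper names are made up for illustration. In scikit-learn, `GridSearchCV` performs this final refit by default (`refit=True`).

```python
def fit_ridge_1d(xs, ys, lam):
    """Slope of y ~ w * x with an L2 penalty `lam` (no intercept, for brevity)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_score(xs, ys, lam, n_folds=3):
    """Average validation MSE over simple interleaved folds."""
    scores = []
    for fold in range(n_folds):
        val_idx = set(range(fold, len(xs), n_folds))
        train_idx = [i for i in range(len(xs)) if i not in val_idx]
        w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        scores.append(mse(w, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / n_folds

# Hypothetical training data: y is roughly 2 * x plus small noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.1]
ys = [2.0 * x + n for x, n in zip(xs, noise)]

# Step 1: use cross-validation only to pick the hyperparameter.
candidates = [0.0, 1.0, 100.0]
best_lam = min(candidates, key=lambda lam: cv_score(xs, ys, lam))

# Step 2: refit on ALL training rows with the chosen hyperparameter.
final_w = fit_ridge_1d(xs, ys, best_lam)
print(best_lam, round(final_w, 3))
```

The validation folds are only a measuring device here; once the choice is made, no training row is held back from the final fit.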

wide fiber Oct 30, 2023, 8:29 AM

#

Hi, is it possible to pull a private notebook?

thorny kelp Oct 30, 2023, 3:26 PM

#

I have been trying to fine tune the faster RCNN with the help of this notebook https://www.kaggle.com/code/yerramvarun/fine-tuning-faster-rcnn-using-pytorch/notebook
but after fine-tuning the model I don't know how to get the weight file and use it to run detections on my local system. It would be great if anyone could help me out.

clear compass Oct 30, 2023, 10:17 PM

#

I want to work together with a friend. What is a good place to share data?

jade geyser Oct 31, 2023, 7:52 AM

#

clear compass I want to work together with a friend. What is a good place to share data?

what data do you want to share? i am also looking to work together

clear compass Oct 31, 2023, 9:49 AM

#

i want to do the titanic competition just a place to put ideas and add data

clear compass Oct 31, 2023, 2:48 PM

#

jade geyser what data you want to share? i am also lookinng to work together

^

jade geyser Oct 31, 2023, 2:59 PM

#

clear compass ^

okii....for the titanic the data is available to be downloaded on the competition page right?

clear compass Oct 31, 2023, 3:01 PM

#

jade geyser okii....for the titanic the data is available to be downloaded on the competitio...

yh, however i want something that we can add ideas and share helpful videos and potentially code.

jade geyser Oct 31, 2023, 3:01 PM

#

clear compass yh, however i want something that we can add ideas and share helpful videos and ...

you can dm someone who is interested....i am up for it if you want

daring hedge Oct 31, 2023, 3:44 PM

#

thorny kelp I have been trying to fine tune the faster RCNN with the help of this notebook h...

Let me ask you, did you fine tune on your local or are you using Google Colab or something else?

thorny kelp Oct 31, 2023, 4:13 PM

#

daring hedge Let me ask you, did you fine tune on your local or are you using Google Colab or...

I used Google colab for fine tuning

cunning thunder Oct 31, 2023, 4:18 PM

#

If the data is extremely skewed to one side and the boxplot shows a lot of outliers, are they really outliers? Take this data, for example: it just seems I can't really consider these as outliers. Is the boxplot not a good enough test for outliers?

deft fox Oct 31, 2023, 10:11 PM

#

cunning thunder if the data is extremely skewed to one side and the boxplot showes alot of outli...

It depends on the type of data, but generally speaking these are not necessarily outliers. There are many types of skewed distributions that are legitimate - the tail of your histogram represents rare events. You may want to modify the data using Box-Cox (or log) transformation to bring it to something resembling a normal distribution.
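
A small sketch of the effect of a log transform on a right-skewed sample, using simulated log-normal data (the data and the `skewness` helper are illustrative). The log transform is the special case of Box-Cox with lambda equal to 0; both require strictly positive values.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated right-skewed, strictly positive data.
rng = random.Random(0)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before: {skew_raw:.2f}")
print(f"skewness after:  {skew_logged:.2f}")
```

After the transform, the long tail no longer dominates, so a boxplot on the transformed values flags far fewer points as outliers.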

daring hedge Nov 1, 2023, 2:42 AM

#

thorny kelp I have been trying to fine tune the faster RCNN with the help of this notebook h...

Can you use model.save_weights (for just the weights) or model.save (for the whole architecture)? Definitely use ChatGPT for those types of issues. It can usually point you in the right direction. Unless I am misunderstanding.

cunning thunder Nov 1, 2023, 4:00 AM

#

deft fox It depends on the type of data, but generally speaking these are not necessarily...

Thank you!

heavy fractalBOT Nov 1, 2023, 10:26 PM

#

km2468 has been warned

Reason: Duplicated text

balmy tundra Nov 1, 2023, 10:27 PM

#

I'm not sure how to proceed with my question now, I was wondering if the latest keras-core is supported

verbal crest Nov 2, 2023, 12:38 AM

#

@balmy tundra Apologies, I think you hit a false positive on our auto-mod tool. Try asking your question again, I've updated the settings.

thorny kelp Nov 2, 2023, 5:52 AM

#

daring hedge Can you use model.save_weights (for just the weights) or for the whole architect...

yeah i can use that but not sure how to use that weight file which is saved in my local system to run detections on new images

balmy tundra Nov 2, 2023, 11:21 AM

#

verbal crest <@672066376999108618> Apologies, I think you hit a false positive on our auto-mo...

is the latest keras-core supported in kaggle notebooks? https://keras.io/keras_core/guides/getting_started_with_keras_core/ I ran the first 2 cells in this guide but I got errors that keras wasn't defined

deft fox Nov 2, 2023, 4:07 PM

#

thorny kelp yeah i can use that but not sure how to use that weight file which is saved in m...

The way to do that is to create a Kaggle dataset, upload your file, and then point to it from any Kaggle notebook.

proper solar Nov 3, 2023, 1:47 AM

#

Are all the for-medal competitions the ones that have money prizes right now... there's several pages of them but I assume that's correct?

velvet slate Nov 3, 2023, 4:33 AM

#

Hii
Forgive me for this stupid question but
How do I participate in a kaggle competition 🙂
I haven't participated in any competitions before this, and I want to partake in the ai text detection competition

#

More specifically, I want to know how to register myself, and how to submit my model and such.....

exotic girder Nov 3, 2023, 6:56 AM

#

Would there be a 'Kaggle DS & ML' survey-competition this year?

balmy tundra Nov 3, 2023, 1:51 PM

#

What does optimal training data for an ML trading model look like? Would it consist of just OHLCV values? Wouldn't you have to train the model on data that shows profit, since that is the ultimate goal of traders? How would you express that profit in a training set?

deft fox Nov 3, 2023, 2:01 PM

#

velvet slate Hii Forgive me for this stupid question but How do I participate in a kaggle com...

Start by going to a competition of interest, and click on the button "Join Competition" - it is on the right side. If you are logged on with your Kaggle account, it will take you to a page to read and accept the conditions. Then you go to the data section to download the data. Finally, there are discussions and code sections where you can find and re-use the code others have shared, or ask questions.

sage mason Nov 3, 2023, 4:21 PM

#

hello all, I have recently joined my first kaggle competition, but the rules of that competition say "Internet access disabled". Does that mean I can't import external libraries?

velvet slate Nov 3, 2023, 4:36 PM

#

deft fox Start by going to a competition of interest, and click on the button "Join Compe...

I see
Thanks a lot!

onyx storm Nov 4, 2023, 6:14 AM

#

Hi, everyone

#

I am a fresh graduate who knows Python data structures, and I have started working at a company with SQL and a little bit of PySpark on JupyterHub. I would like a guide to Kaggle: how to start participating in contests and learning.

red hawk Nov 4, 2023, 1:31 PM

#

onyx storm Hi, everyone

an easy way is to read some documentation https://www.kaggle.com/docs/competitions

west galleon Nov 5, 2023, 6:16 AM

#

Hi everyone,
I need some help with getting started. I want to work on the Detect AI-generated Text competition but I'm not sure how to get started, since I've never really worked on a project in Kaggle or participated in a competition.
I'm hoping to participate to learn as I go. I had to start somewhere so I chose this.
Any advice would be appreciated.
Thank you.

rare jetty Nov 5, 2023, 2:37 PM

#

#

There are NaN values, and I need help removing/replacing them.

lone compass Nov 5, 2023, 4:56 PM

#

west galleon Hi everyone, I need some help with getting started. I want to work on the Detect...

There's a couple of tutorials with the basics of Python and such. After that pick any "getting started" competition and look at published notebooks, starting with the short ones that don't have high scores (=> easier to understand).

lone compass Nov 5, 2023, 4:56 PM

#

rare jetty

You'll need to provide more context.

muted talon Nov 5, 2023, 5:47 PM

#

Are there any good resources from past competitions regarding heuristics or rules of thumb for large image resizing in CV, or are things like https://arxiv.org/abs/2103.09950v1 actually used?

arXiv.org

Learning to Resize Images for Computer Vision Tasks

For all the ways convolutional neural nets have revolutionized computer vision in recent years, one important aspect has received surprisingly little attention: the effect of image size on the accuracy of tasks being trained for. Typically, to be efficient, the input images are resized to a relatively small spatial resolution (e.g. 224x224), and...

deft fox Nov 5, 2023, 10:47 PM

#

rare jetty

@rare jetty To drop NaN values simply use df.dropna function in pandas. There are many ways to fill in missing values, as that is a non-trivial subject. The simplest way is to fill in mean or median values per column, but I suggest that you go to Kaggle and search for "missing values." There will be many notebooks showing ways this can be done.
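
To make the two options concrete, here is a minimal pandas sketch. The toy DataFrame and its column names are made up for illustration; which option is appropriate depends on your data, so treat it as a starting point rather than a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the column names and values are made up for illustration.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
})

# Option 1: drop every row that contains at least one NaN.
dropped = df.dropna()

# Option 2: fill each column's NaNs with that column's median.
filled = df.fillna(df.median(numeric_only=True))

print(dropped)
print(filled)
```

Dropping is simple but throws away whole rows; filling keeps every row but replaces missing entries with a guess, which is why the choice of fill value matters.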

rare jetty Nov 6, 2023, 1:27 AM

#

Thanks everyone.
I have filled the NaN values with the help of ChatGPT.

simple night Nov 6, 2023, 12:16 PM

#

Hi!

I've got a question about what would currently be the best deep learning architecture for analyzing features of Raman spectra. Has anyone worked with this type of data?

It is a 1-dimensional (vector) image in which all positions can carry some information. CNNs and ResNets may be a nice option, what do you guys think? What about vision transformers or other architectures?

Thanks!

light socket Nov 6, 2023, 1:39 PM

#

simple night Hi! I've got a question about what would be currently the best deep learning ar...

I would recommend trying a vision transformer only if you have large amounts of data; otherwise CNNs and ResNets usually work well.

shut ginkgo Nov 6, 2023, 3:32 PM

#

hello,
I am working on an AI project and there seem to be many null values in the dataset.
Would you advise me to go with fillna or dropna?
Also, if I use fillna and fill in average values, wouldn't it affect the dataset?

And since the project deals with healthcare, would there be a huge effect if I add in average values?

zealous creek Nov 6, 2023, 4:36 PM

#

shut ginkgo hello, Am working on an ai project and there seem to be many null values in the ...

#💬┊general message

shut ginkgo Nov 6, 2023, 4:47 PM

#

done

zealous creek Nov 6, 2023, 4:48 PM

#

shut ginkgo done

The same question was asked in general. So just scroll up a bit and check out the discussion.

shut ginkgo Nov 6, 2023, 4:49 PM

#

zealous creek The same question was asked in general. So just scroll up a bit and check out th...

oh okay

dusk jungle Nov 6, 2023, 6:45 PM

#

I'm not sure if this is the right place to ask, but I used to be able to edit my published notebooks without having to rerun the code cells. However, when I click on "edit" to do some changes to my markdown cells, I need to rerun all code cells to view their outputs. Does anybody know a way around this?

inner cape Nov 7, 2023, 3:19 AM

#

Hello everyone, I would like to ask where we can buy a dataset of pornographic text, we need to train chatbots

desert tusk Nov 7, 2023, 12:39 PM

#

Does the link at the kaggle playbook (https://packt.link/KaggleDiscord ) refer to this server?

muted talon Nov 7, 2023, 7:13 PM

#

inner cape Hello everyone, I would like to ask where we can buy a dataset of pornographic t...

I don't know about the ethical implications of any of this, but you could probably use Whisper or any other voice-to-text model and just go over a database of videos.

heavy fractalBOT Nov 8, 2023, 4:16 AM

#

tilii7 has been warned

Reason: Bad word usage

deft fox Nov 8, 2023, 4:19 AM

#

@verbal crest This bot may be a bit too sensitive. I got warned (and my message deleted) for using a word p_rn, which happens to be what we are legitimately discussing here.

deft fox Nov 8, 2023, 4:20 AM

#

muted talon I dont know about the ethical implication of any of this, but you could probably...

@muted talon I think you may be assuming incorrectly that @inner cape has an access to a large database of p_rn videos.

verbal crest Nov 8, 2023, 6:16 AM

#

@deft fox Sorry about that, we've got the word on the warn list just because it's one of the most commonly used words by spam bots.

urban hemlock Nov 8, 2023, 6:39 AM

#

How would you train a CNN model to identify if a picture is something and not something? For example, you're training a model to identify if the image is a dog or not a dog. This is different from training the model to identify if it is a dog or a cat. The "Not a Dog" could be anything such as buildings, other animals, colors, etc. Any ideas?

If you have an idea in mind, please reply to this message. Thank you!

muted talon Nov 8, 2023, 8:51 AM

#

deft fox <@200390090139107329> I think you may be assuming incorrectly that <@61228235334...

I'm not assuming that per se, but with a scraper it should not be a hard thing to obtain, considering they want/need the data.

desert tusk Nov 8, 2023, 1:09 PM

#

Can I win a Kaggle competition without my own GPU?

desert tusk Nov 8, 2023, 5:10 PM

#

desert tusk Does the link at the kaggle playbook (https://packt.link/KaggleDiscord ) refer t...

Anyone??

shrewd scarab Nov 9, 2023, 1:53 AM

#

desert tusk Can I win a Kaggle competition without my own GPU?

Kaggle has its own gpu's for the notebooks, you just have to select it under accelerators. You get a certain amount of time per month.

deft fox Nov 9, 2023, 6:44 AM

#

desert tusk Can I win a Kaggle competition without my own GPU?

I am going to play the odds here and say no. Most people don't win Kaggle competitions, with or without GPUs. But you can compete and do well without owning a GPU, because all Kagglers get 30 hours per week of free GPU time. There are also many competitions where GPU is not needed.

desert tusk Nov 9, 2023, 6:48 AM

#

shrewd scarab Kaggle has its own gpu's for the notebooks, you just have to select it under acc...

So, is Kaggle planning to increase the quota?

desert tusk Nov 9, 2023, 9:16 AM

#

One more question: why do boosting algorithms perform better than random forest on Kaggle? (I don't see any winning solutions with random forest, only XGBoost.)

acoustic flame Nov 9, 2023, 2:00 PM

#

desert tusk One more question: why do boosting algorithms perform better than random forest on K...

Umm, gradient boosting algorithms are very efficient because each step tries to follow the gradient of the loss, unlike bagging algorithms, which rely on bootstrapping for better results. That approach is why gradient boosting tends to give better results.

desert tusk Nov 9, 2023, 2:10 PM

#

acoustic flame Umm, gradient boosting algorithms are very efficient because each step tries to follow th...

I don't understand why trying to converge the gradient is better than bagging.
As I see it, we want n estimators that are each better than random and independent given the target. In boosting, these assumptions don't hold.

junior tapir Nov 9, 2023, 3:06 PM

#

Is there any way I can put Kaggle in dark mode?

acoustic flame Nov 9, 2023, 6:28 PM

#

desert tusk I don't understand why trying to converge the gradient is better than bagging. A...

Yes, these assumptions don't hold for boosting, yet boosting still seems to perform decently in these cases. As for why converging along the gradient seems to work better: it usually captures more of the variance in the training data than bagging does. I hope that gives you a little bit more intuition.

#

Also, let me know if you have any more questions.

graceful axle Nov 10, 2023, 9:53 AM

#

I have a question regarding "Progression System",
does a Silver medal, count as Bronze as well?
So if I get 2 silver medals, will I become a Competition Expert? Or do I need to achieve exactly 2 bronze medals ⁉️

deft fox Nov 10, 2023, 11:44 AM

#

graceful axle I have a question regarding "Progression System", does a Silver medal, count as ...

You need exactly 2 bronze medals to become an expert, but if those medals later turn to silver you will still be an expert. Two bronze medals is a minimum, and anything else above it still qualifies you as an expert until you reach the next level.

graceful axle Nov 10, 2023, 11:46 AM

#

deft fox You need exactly 2 bronze medals to become an expert, but if those medals later ...

perfect, thanks for clarification ❤️

lone token Nov 11, 2023, 12:14 AM

#

desert tusk One more question: why do boosting algorithms perform better than random forest on K...

There are many reasons that winning solutions use XGBoost. One reason you might be overlooking is that the XGBoost implementation comes with a lot of optimization that goes beyond "boosting": efficient memory use and computation, flexible objectives and learning-control parameters, robust default parameters, etc. These factors play a more important role than theoretical soundness in a time-constrained competition.

stoic bear Nov 11, 2023, 11:16 AM

#

Does anyone have experience with generating MCQs using NLP? I mean, how should I approach the problem?

#

Multiple-choice questions with more than one correct answer.

pulsar sparrow Nov 11, 2023, 1:48 PM

#

stoic bear does anyone have experience in mcq question genrate using NLP, I mean how should...

Maybe we can look at the probability of each option and compare them. If 2 options are correct, I think they should have similar probabilities predicted by the model.

mystic harness Nov 11, 2023, 4:59 PM

#

is it possible to edit a post to make it my team's solution?

crystal maple Nov 11, 2023, 7:00 PM

#

https://www.kaggle.com/code/ayeshairshadcoder/big-mart-sales-prediction

I don't know why, but the model is underfitting the data ...

big mart sales prediction

Explore and run machine learning code with Kaggle Notebooks | Using data from BigMart Sales Data

#

The training accuracy is damn high but the testing accuracy is low

#

like 80 / 50

deft fox Nov 11, 2023, 7:31 PM

#

crystal maple https://www.kaggle.com/code/ayeshairshadcoder/big-mart-sales-prediction i dont ...

Your model is overfitting the data. There could be many reasons as I didn't go through every single line, but for sure you won't get the best performance by using any regressor with default values such as in this line regressor = XGBRegressor(). Regressor parameters need to be tuned, and doing cross-validation would help with that. Also, those numbers are r2 scores rather than accuracy. Accuracy is a classification metric.
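
To illustrate the difference between the two metrics, here is a small hand-written sketch. The functions `r2` and `accuracy` are plain-Python stand-ins written for this example, not library calls, and the numbers are made up.

```python
def r2(y_true, y_pred):
    """Coefficient of determination for regression: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of exactly matching labels: only meaningful for classification."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2(y_true, y_true)          # predictions equal to the targets
mean_only = r2(y_true, [6.0] * 4)     # always predicting the mean of y_true
label_acc = accuracy(["a", "b", "a"], ["a", "b", "b"])

print(perfect, mean_only, label_acc)
```

A large gap between the training and test values of either metric is the overfitting signal described above; the metric only changes what the numbers mean.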

copper whale Nov 12, 2023, 1:46 AM

#

Why can't I use any AI model such as Mistral 7b? It allocates ram infinitely until it crashes the container.

#

How can I use an AI model in kaggle?

crystal maple Nov 12, 2023, 3:58 AM

#

deft fox Your model is overfitting the data. There could be many reasons as I didn't go t...

Thanks man, let me update my code

desert tusk Nov 12, 2023, 3:25 PM

#

lone token There are many reasons that winning solutions use xgboost. One of the reason you...

I don't think that this is the reason. I don't see any random forest solutions in the top places, which may point to something else.

red hawk Nov 12, 2023, 5:31 PM

#

copper whale Why can't I use any AI model such as Mistral 7b? It allocates ram infinitely unt...

Yes, you should be able to, but you have to do it correctly. It depends a lot on how you are loading the model. E.g. are you loading in the correct precision? (float32 will crash the GPU for sure.) Are you trying to fine-tune? (In which case you have to use BitsAndBytes 4-bit quantisation, otherwise the GPU will run out of memory.)
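
A rough back-of-the-envelope calculation shows why precision matters so much. This sketch counts weights only and assumes a nominal 7 billion parameters and a 16 GiB accelerator; both figures are assumptions for illustration, so check the actual model card and the accelerator you selected.

```python
N_PARAMS = 7e9   # nominal "7B" parameter count (assumption; the exact count differs)
GPU_GIB = 16     # assumed accelerator memory in GiB

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16/float16": 2.0, "4-bit": 0.5}


def weights_gib(n_params, bytes_per_param):
    """Memory for the weights alone, in GiB (no activations, cache, or optimizer state)."""
    return n_params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    gib = weights_gib(N_PARAMS, nbytes)
    verdict = "fits" if gib < GPU_GIB else "does not fit"
    print(f"{name:>17}: {gib:5.1f} GiB of weights -> {verdict} in {GPU_GIB} GiB")
```

Fine-tuning needs extra memory for gradients and optimizer state on top of the weights, which is why quantisation becomes necessary there even when half precision is enough for inference.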

copper whale Nov 13, 2023, 12:01 AM

#

red hawk Yes, you should be able to, but you have to do it correctly. It depends a lot o...

I finally figured it out after analyzing different notebook scripts. it's the bfloat yeah. I set it to auto and I can now finally manage to run some AI models and start building my dataset.

zealous creek Nov 13, 2023, 12:42 AM

#

desert tusk I don't understand why trying to converge the gradient is better than bagging. A...

You are correct that the trees are independent in a Random Forest. But just because the assumption does not hold in Gradient Boosting, it does not mean that Gradient Boosting performs worse. It's quite the opposite: the assumption was relaxed for a reason. If the trees are not independent but instead learn from the mistakes of the previous trees, the same predictive power can be achieved faster and with fewer trees. XGBoost went one step beyond plain Gradient Boosting by adding L1 and L2 regularization to help prevent overfitting. This is roughly how tree-based algorithms evolved: RF => GB => XGB. So it is not a surprise that most winning models are based on XGB.
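
The "learn from the mistakes of the previous trees" idea can be sketched in plain Python. This is a toy version with one-split trees (stumps) on a single feature and squared error; it is meant only to show the residual-fitting loop, not how XGBoost or any real library is implemented.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (a stump) by minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Each new stump is fitted to the residuals left by the stumps before it."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # current mistakes
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))
ys = [x * x for x in xs]
errors = {rounds: mse(boost(xs, ys, rounds), xs, ys) for rounds in (0, 1, 5, 20)}
for rounds, err in errors.items():
    print(rounds, round(err, 2))
```

Training error keeps falling as rounds are added because every stump targets what is still wrong. That same property is why boosting can overfit without regularization or early stopping.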

desert tusk Nov 13, 2023, 6:55 AM

#

zealous creek You are correct that the trees are independent in a Random Forest. But just beca...

Does anyone know of a good explanation of LightGBM that includes a numeric example?

dusky nacelle Nov 13, 2023, 7:18 PM

#

I am trying to get this question answered: if I upgrade to Google Cloud AI Platform Notebooks, can I also submit that notebook to a competition? Basically, bypassing the runtime cap of 11 hours, for instance, or halving training time because I am paying?

verbal crest Nov 13, 2023, 7:47 PM

#

dusky nacelle I am trying to get this question answered: If I upgrade to Google Cloud AI Platf...

Depends on the competition. For code competitions you can't use Google Cloud to surpass the limits. If it's a standard competition where you are submitting a CSV, you can use as much compute as you want (on your local machine or on Google Cloud or wherever).

deft fox Nov 13, 2023, 9:38 PM

#

dusky nacelle I am trying to get this question answered: If I upgrade to Google Cloud AI Platf...

You can do whatever you want for training, including training on a local machine if you have one. When you create a model, save it on Kaggle so it can be accessed from any of your notebooks. The time limit will still come into play for inference, which has to be done via Kaggle notebooks, and that is where the time limit will be enforced without exception.

dusky nacelle Nov 13, 2023, 9:44 PM

#

deft fox You can do whatever you want for training, including training on the local machi...

Thanks for that I was not aware I can only hook inference to the submission notebook, so I have been including training all this time :P. I actually had this thought at a random point, but I was not sure how to upload my model on Kaggle, is it via the datasets?

deft fox Nov 13, 2023, 9:46 PM

#

dusky nacelle Thanks for that I was not aware I can only hook inference to the submission note...

Yes, anything can be uploaded by creating a dataset. You link to it by adding a data source to your notebooks.
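
As a rough sketch of that workflow: save the trained parameters to a file wherever you train, upload the file as a dataset, then load it in the inference notebook. The parameter dict, the file name `model.pkl`, and the helper functions are all made up for illustration; a real framework would use its own save and load routines.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(params, path):
    """Write trained parameters to disk (training side)."""
    with open(path, "wb") as f:
        pickle.dump(params, f)


def load_model(path):
    """Read parameters back (inference side)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(params, xs):
    """Toy linear model standing in for real inference."""
    return [params["slope"] * x + params["intercept"] for x in xs]


# Training side: could be a local machine, the cloud, or another notebook.
trained = {"slope": 2.0, "intercept": 1.0}
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
save_model(trained, model_path)

# Inference side: on Kaggle, the path would point into the attached dataset,
# e.g. /kaggle/input/<your-dataset-name>/model.pkl (hypothetical dataset name).
restored = load_model(model_path)
predictions = predict(restored, [0.0, 1.0, 2.0])
print(predictions)
```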

graceful flax Nov 14, 2023, 4:09 AM

#

Why is the NFL competition not accepting responses?

mystic bolt Nov 14, 2023, 1:26 PM

#

Guys, is there any book for beginners in data science?

sly jolt Nov 14, 2023, 1:53 PM

#

Hi, when awarding medals, does Kaggle also consider how old a notebook is? My notebook is 5 months old with 27 upvotes (22 non-novice), yet it hasn't got a silver medal.

sly jolt Nov 14, 2023, 1:57 PM

#

sly jolt Hi, when awarding medals.. Does Kaggle also consider how old a notebook is? Like...

https://www.kaggle.com/code/akshitsharma1/easy-peasy-detailed-cnn-tutorial-for-beginners This is the notebook I am facing the issue with. Can someone please check?

Easy-Peasy Detailed CNN Tutorial for Beginners

Explore and run machine learning code with Kaggle Notebooks | Using data from Digit Recognizer

atomic tapir Nov 14, 2023, 5:32 PM

#

hi i was wondering if i could use ngrok to open a tunnel in order to remotely collect training data, i am taking part in the UBC Ovarian Cancer Subtype Classification and Outlier Detection competition
im passing this data to another computer with wandb...

sweet jasper Nov 14, 2023, 7:09 PM

#

Hello everyone!!! I am participating in a competition where it states that "Freely & publicly available external data is allowed, including pre-trained models" (so I understand I can use huggingface and other services) but it also states "Internet access disabled" for the notebook, so what do I do? Do I have to download the model?

deft fox Nov 14, 2023, 8:32 PM

#

sly jolt Hi, when awarding medals.. Does Kaggle also consider how old a notebook is? Like...

Yes, the age of the notebooks is a factor. Also, not all non-novice votes count. Kaggle doesn't explain that in detail, but in general votes from non-novices who often upvote your posts or notebooks may also not count.

hollow grail Nov 15, 2023, 11:04 AM

#

for non internet notebook competitions, if I add a model using the sidebar while editing a notebook, it should be available when running on the hidden set right?

deft fox Nov 15, 2023, 4:02 PM

#

hollow grail for non internet notebook competitions, if I add a model using the sidebar while...

Right.

summer crypt Nov 16, 2023, 12:32 AM

#

hey! does anyone know why this training loop might not be updating gradients correctly:

```python
for epoch in range(num_epochs):
    epoch_list.append(epoch+1)

    model.train()
    train_loss = 0.0

    for images, depths in tqdm(train_loader):
        images = images.to(device)
        depths = depths.to(device)

        outputs = model(images)

        loss = depth_loss(outputs, depths)

        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    train_loss /= len(train_loader)

    model.eval()
    val_loss = 0.0

    with torch.no_grad():
        for images, depths in tqdm(val_loader):
            images = images.to(device)
            depths = depths.to(device)

            outputs = model(images)

            loss = depth_loss(outputs, depths)
            val_loss += loss.item()

    val_loss /= len(val_loader)

    print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

    train_losses.append(train_loss)
    val_losses.append(val_loss)
```

fickle perch Nov 16, 2023, 1:40 AM

#

summer crypt hey! does anyone know why this training loop might not be updating gradients cor...

So, in your training loop, it looks like you missed adding optimizer.zero_grad(). This is super important in PyTorch because, without it, your gradients start to pile up across different batches of data. Think of it like this: each time you pass a batch through your model and compute the loss, PyTorch calculates how much it needs to adjust the weights (that's the gradients). But if you don't reset these gradients to zero before the next batch, you're not just adjusting based on the new data, but you're also including adjustments from the previous data. It's like trying to fix a recipe but you're also considering the ingredients from your last cooking session. Not ideal, right?

So, just pop optimizer.zero_grad() right at the start of your training loop, just before you feed the images and depths into your model. This will make sure that each batch's weight adjustments are made cleanly, based only on that batch's data. That should fix the gradient updating issue and get your training back on track! 🚀👍
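
Here is a small sketch of both points: first the accumulation itself on a single tensor, then the same loop shape with the reset added. The linear model, `MSELoss`, and random batches are stand-ins for the real network, `depth_loss`, and `train_loader`, which are not shown in the thread.

```python
import torch
from torch import nn

# 1) Gradients accumulate across backward() calls unless they are reset.
w = torch.tensor(1.0, requires_grad=True)
(2 * w).backward()
first = w.grad.item()
(2 * w).backward()            # no reset: the new gradient is added to the old one
accumulated = w.grad.item()
print(first, accumulated)

# 2) The same loop shape as in the question, with the reset added.
torch.manual_seed(0)
model = nn.Linear(3, 1)                                  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()                                   # stand-in for depth_loss
batches = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(4)]

model.train()
train_loss = 0.0
for inputs, targets in batches:
    optimizer.zero_grad()     # clear gradients left over from the previous batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    train_loss += loss.item()
train_loss /= len(batches)
print(train_loss)
```

Calling `optimizer.zero_grad()` anywhere before `loss.backward()` in each iteration has the same effect; putting it first in the loop body just makes it hard to forget.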

ornate ravine Nov 16, 2023, 2:53 AM

#

summer crypt hey! does anyone know why this training loop might not be updating gradients cor...

Is this part of an active competition? Make sure to not share code that’s currently being used for Kaggle competitions in the spirit of fairness

summer crypt Nov 16, 2023, 4:38 AM

#

fickle perch So, in your training loop, it looks like you missed adding optimizer.zero_grad()...

Thank you so much Derek! That was exactly the issue. This makes a ton of sense, thanks for the explanation

summer crypt Nov 16, 2023, 4:39 AM

#

ornate ravine Is this part of an active competition? Make sure to not share code that’s curren...

I'm so sorry, this is not code from a current competition, just a general question from a different project I'm working on!

void notch Nov 17, 2023, 11:28 AM

#

Hello Everyone, I downloaded the dataset: automotive vehicles engine health dataset. However, I'm facing a lot of issues with the data. I'm not getting an accuracy greater than 64-67% for multiple models. I used RF, DT, and MLP models. I'm focusing on the MLP, honestly.
I've fixed the class imbalance issue, I did some feature engineering, and I fixed the outliers as well.
Is the issue from my side or from the data itself? Any suggestion on what should I do?

haughty oriole Nov 17, 2023, 1:23 PM

#

Hello guys, my model keeps improving in validation scores but keeps decreasing in Kaggle score. Can you guys give me some advice on this?

acoustic flame Nov 17, 2023, 1:54 PM

#

haughty oriole Hello guys my model keeps improving in validation scores and but keeps decreasin...

ooh your model seems to overfit, I guess. Think of it this way:

#

your dataset might have black cats and

#

models learn only black cats are cats

#

which is problematic ofc

#

try using cross validation to get a better estimate of the error

#

and use some regularization methods

barren phoenix Nov 17, 2023, 4:32 PM

#

If I submit before the deadline, but by the time the notebook has run and been scored the deadline has passed, will the submission still be considered?

finite galleon Nov 17, 2023, 6:52 PM

#

My notebook cell freezes with the asterisk sign when I try to run it in Kaggle. Is this normal, or is there some way to solve this problem? I restarted the kernel once already but the same issue happened.

deft fox Nov 17, 2023, 9:27 PM

#

finite galleon My notebook cell freezes with the asterisk sign when I try to run it in Kaggle. ...

There is nothing in this code snippet that would produce a visible output. What you think of as "freezing" is a notebook that imports several packages, sets a couple of parameters, and at that point it is done.

red hawk Nov 18, 2023, 1:24 PM

#

void notch Hello Everyone, I download the dataset: automotive vehicles engine health datase...

try a boosted tree algo like xgb or lightgbm. Tbh for a tabular dataset I won't go near an MLP (they're finicky to train correctly for tabular data), and unless your data is seriously unbalanced (i.e. < ~10% of one class) I won't bother with class balancing either (mostly, except for the really unbalanced case, fixing it makes my model worse, not better). Other than that, you are the one who has explored the data, so you are better placed to answer if there are 'issues' with the data...

deft fox Nov 18, 2023, 4:42 PM

#

red hawk try a boosted tree algo like xgb or lightgbm. Tbh for a tabular dataset I won't ...

I second everything @red hawk said. It may be worth trying this neural network https://github.com/dreamquark-ai/tabnet . It won't necessarily produce better results than GBMs (it might on small datasets), but it typically produces different predictions than other methods. Thus, it ensembles well with other models.

GitHub

GitHub - dreamquark-ai/tabnet: PyTorch implementation of TabNet pap...

PyTorch implementation of TabNet paper: https://arxiv.org/pdf/1908.07442.pdf

slate pulsar Nov 20, 2023, 7:07 AM

#

https://stackoverflow.com/questions/77513962/facing-modulenotfounderror-even-though-module-exists-in-same-parent-directory

can anyone help me solve this?

Stack Overflow

Facing 'ModuleNotFoundError' Even Though Module Exists in Same Pare...

When I run train_pipeline.py:
from zenml import pipeline

from steps.ingest_data import ingest_df
from steps.clean_data import clean_df
from steps.model_train import train_model
from steps.evaluation

graceful axle Nov 20, 2023, 12:56 PM

#

hi everybody,

One of my submission notebooks has now been running for more than 2 hours, which is not normal. Is this a bug in Kaggle, or is something wrong with my notebook? Any idea?

deft fox Nov 20, 2023, 6:13 PM

#

graceful axle hi everybody, one of my submission notebooks are now running for more than 2 h...

Running times vary depending on server load. There are many notebooks running at once, and sometimes the system is slower.

graceful axle Nov 20, 2023, 6:46 PM

#

deft fox Running times vary depending on server load. There are many notebooks running at...

thanks for response, it is now 8 hours, and I double checked to make sure that I don't have endless loop or something like it in my notebook !

graceful axle Nov 20, 2023, 8:31 PM

#

oserror: [e053] could not read config file from c:\users\….
I am facing this error, kindly help me.

deft fox Nov 20, 2023, 9:13 PM

#

graceful axle oserror: [e053] could not read config file from c:\users\…. I am facing this err...

You don't give us enough context. With the limited info we have, an educated guess is that either the file doesn't exist or you have a space in file/directory name.

graceful axle Nov 20, 2023, 9:14 PM

#

oserror: [e053] could not read config file from c:\users\nandini agarwal\appdata\local\programs\python\python311\lib\site-packages\pyresparser\config.cfg

deft fox Nov 20, 2023, 9:14 PM

#

graceful axle thanks for response, it is now 8 hours, and I double checked to make sure that I...

I would suggest that you run inference on a single test file that is available and time it, then multiply that time by 2000. That should give you some idea of how long it is going to take.
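
A minimal sketch of that estimate, assuming a per-file inference function. `fake_predict` is a hypothetical stand-in that just sleeps; replace it with your real inference call. The 2000 is the file count mentioned above.

```python
import time


def estimate_total_seconds(predict_one, sample, n_items, repeats=3):
    """Time predict_one(sample) a few times and scale the best time to n_items."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_one(sample)
        timings.append(time.perf_counter() - start)
    return min(timings) * n_items


def fake_predict(sample):
    """Hypothetical stand-in for real inference on one test file."""
    time.sleep(0.01)
    return 0


total = estimate_total_seconds(fake_predict, sample=None, n_items=2000)
print(f"Estimated run time: {total / 60:.1f} minutes")
```

The estimate ignores one-off costs such as loading the model and installing packages, so treat it as a lower bound.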

deft fox Nov 20, 2023, 9:14 PM

#

graceful axle oserror: [e053] could not read config file from c:\users\nandini agarwal\appdata...

Like I said, you have a space in directory name.

graceful axle Nov 20, 2023, 9:15 PM

#

Yes

deft fox Nov 20, 2023, 9:16 PM

#

graceful axle oserror: [e053] could not read config file from c:\users\nandini agarwal\appdata...

What is the exact python line that generates the error?

slim parrot Nov 21, 2023, 1:19 AM

#

For the mohs-hardness regression data set, where can I find what the acronyms of the features actually are, e.g. what is "el_neg_chi_Average"? Am I just supposed to google this, or is there somewhere I can find this for future competitions as well?
https://www.kaggle.com/competitions/playground-series-s3e25/data?select=train.csv

Regression with a Mohs Hardness Dataset

Playground Series - Season 3, Episode 25

main mango Nov 21, 2023, 6:28 AM

#

graceful axle oserror: [e053] could not read config file from c:\users\nandini agarwal\appdata...

It could be because of backslash escapes e.g. c:\users\nandini\... is read as

c:\users
andini\...

\n is taken as "newline". Try using raw strings r"..." i.e.

r"c:\users\nandini agarwal\appdata\local\programs\python\python311\lib\site-packages\pyresparser\config.cfg"
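
A quick way to see the difference between plain and raw string literals, using a shortened fragment of the path. This only illustrates how Python parses the literals; whether it is the actual cause of the error above is not confirmed in the thread.

```python
plain = "\nandini"    # "\n" is parsed as a single newline character
raw = r"\nandini"     # raw string: the backslash and the "n" stay as two characters
escaped = "\\nandini" # doubling the backslash gives the same result as the raw string

print(len(plain), len(raw))
print(repr(plain), repr(raw))

# In Python 3 a plain literal containing "\users" is rejected outright,
# because "\u" starts a Unicode escape that must be followed by hex digits.
try:
    compile(r'"c:\users"', "<example>", "eval")
    literal_rejected = False
except SyntaxError as exc:
    literal_rejected = True
    print("SyntaxError:", exc.msg)
```

Forward slashes or `pathlib.Path` are other common ways to avoid backslash escapes in Windows paths.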

austere pier Nov 21, 2023, 2:06 PM

#

Hi everyone,
we are doing a project based on anomaly detection through video surveillance. Our project is used mainly in sports stadiums to detect anomalies such as assault, explosion, fighting among fans etc. The surveillance video is captured by autonomously repositioning slave robots through cameras. These robots then check for the anomalies. If an anomaly is found, it sends the video footage to a central server for anomaly classification. We want an unsupervised model which takes videos as inputs. It also learns from the live video it detects during deployment.
Can anyone suggest a model to be used at the slave robot cameras?

cerulean coral Nov 21, 2023, 6:53 PM

#

Hey! I've started the Python course and the first exercise shows me this error. How can I fix this?

#

I got it, I clicked "Run All" and it works.

cunning iron Nov 22, 2023, 1:17 AM

#

Hi everyone, i'm currently participating in the SenNet competition and to send a submission you need to turn off the internet access of the notebook. Since i'm using the segmentation-models-pytorch library i had to upload the output of !pip download segmentation-models-pytorch as a kaggle dataset. But i get the following error... Anyone had a similar issue while trying to install a package without internet ?

acoustic flame Nov 22, 2023, 5:41 AM

#

graceful axle oserror: [e053] could not read config file from c:\users\nandini agarwal\appdata...

Put these types of errors into ChatGPT or Bing AI, they help a lot.

livid bison Nov 22, 2023, 4:12 PM

#

Hey, when there is an error in the submission, is there any way to check what went wrong? If not, how do people usually debug it? I am not sure why my submission is generating an error.

heavy granite Nov 22, 2023, 4:36 PM

#

Is it normal for the icon of the .json file saved in kaggle to be marked as {i}? If not, how to solve it?

hidden dome Nov 22, 2023, 8:20 PM

#

Is there a room for the optiver-trading-at-the-close competition?

atomic quarry Nov 23, 2023, 11:52 AM

#

Hello everyone, I have a college assignment similar to Kaggle competitions, where we're required to submit our predictions for an "Xtest" and then we'll be evaluated based on how our model performs. For the training phase, I'm leaning towards using cross validation and evaluating my model based on its results. However, a friend is splitting the "training" data into training / validation / test, and they say that it gives more correct metrics. My stance is that doing a further split reduces the amount of data we're using to train, and we also risk having false confidence in our model. Am I correct on this matter? And what's the general rule to follow in the scenario of competitions?

slate citrus Nov 23, 2023, 2:18 PM

#

Hello everyone, I am working on a dataset that has a categorical column with 31 unique values. If I perform one-hot encoding on it, I will have 31 extra columns. Is this the correct way to do it, or is there a better way?

dusk jungle Nov 23, 2023, 6:54 PM

#

slate citrus Hello everyone, i am working on a dataset and it has a categorical column with 3...

It depends on the data. OneHotEncoder is used when there isn't any order in your data. If there's an underlying hierarchy in your categories (e.g., High School, Undergraduate, Graduate), you may use OrdinalEncoder instead of OneHotEncoder.
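
To show what the two encodings produce, here is a plain-Python sketch. The helper functions `one_hot` and `ordinal` are hand-written for illustration and are not the scikit-learn classes themselves; the category order passed to `ordinal` is something you have to supply from domain knowledge.

```python
def one_hot(values):
    """One column per category; exactly one 1 per row. No order is implied."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def ordinal(values, order):
    """Map each category to its rank in a user-supplied order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


education = ["Graduate", "High School", "Undergraduate", "High School"]

categories, encoded = one_hot(education)
levels = ordinal(education, ["High School", "Undergraduate", "Graduate"])

print(categories)
print(encoded)
print(levels)
```

With 31 unique values the one-hot version yields 31 columns, as described in the question, while the ordinal version stays a single column but only makes sense when the ranking is real.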

dusk jungle Nov 23, 2023, 6:56 PM

#

atomic quarry Hello everyone, I have a college work similar to kaggle competitions, where we'r...

You can use the training data for cross validation, and also have a hold-out set for testing after CV.

dusk jungle Nov 23, 2023, 7:00 PM

#

atomic quarry Hello everyone, I have a college work similar to kaggle competitions, where we'r...

If you have a very large dataset, a single train-validation set might give you accurate results. Overall, Cross Validation is preferable.

atomic quarry Nov 23, 2023, 10:21 PM

#

I see, thank you.

deft fox Nov 23, 2023, 11:55 PM

#

atomic quarry Hello everyone, I have a college work similar to kaggle competitions, where we'r...

Generally speaking, you are correct. Your friend might have picked a fortuitously good data split that gives better metrics on test data, but it doesn't mean the model is better. If you do a 5-fold validation on train data and determine the CV score, in practice it should match better with test scores unless you got a really poor train/test split. On the same train/test split, where you use your train data for cross-validation and your friend splits it additionally into train and validation, your approach should get better agreement between CV and test scores in a large majority of cases. Not always, though. That's not a deficiency of the method, but rather the luck of the splitting process.
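
For reference, a bare-bones sketch of 5-fold cross-validation in plain Python. The "model" here just predicts the training mean and the data is made up, so only the splitting and averaging logic is the point; in practice you would shuffle first and likely use a library splitter such as scikit-learn's `KFold`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_mse(ys, k=5):
    """CV score of a trivial 'predict the training mean' model (illustration only)."""
    fold_scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        mean = sum(ys[i] for i in train_idx) / len(train_idx)
        fold_scores.append(sum((ys[i] - mean) ** 2 for i in val_idx) / len(val_idx))
    return sum(fold_scores) / k


ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
folds = list(k_fold_indices(len(ys), 5))
cv_score = cross_val_mse(ys, k=5)
print(cv_score)
```

Because each sample is used for validation once and for training in the other folds, no data is permanently set aside, which is the advantage over a single extra train/validation split.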

slate citrus Nov 24, 2023, 6:12 AM

#

Hello everyone, question: a deeper tree can fit the training data better, but why can it also lead to overfitting?

heavy granite Nov 24, 2023, 8:32 AM

#

Is it normal for the icon of the .json file saved in kaggle to be marked as {i}? If not, how to solve it?

vapid dirge Nov 24, 2023, 3:07 PM

#

slate citrus Hello everyone, Question. A deeper tree can fit the training data better, but w...

It’s the quality of the training data that could lead to overfitting. If your model is trained to look for apples, then it may be too generalized and return everything that’s red and round. You may have a lot of training data, but does it describe the characteristics of an apple accurately?
See few-shot training.
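On the tree-depth part of the question: a deeper tree has more leaves, so it can carve out regions that fit individual noisy training points. A rough sketch on synthetic data (the dataset, the 10% label noise and the depths tried are all assumptions for illustration) that compares training and validation accuracy as depth grows:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (flip_y); purely illustrative.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in (2, 5, None):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training accuracy, validation accuracy) for this depth.
    results[depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val))
    print(depth, results[depth])
```

Typically the training score keeps climbing with depth while the validation score stalls or drops; that gap is what overfitting looks like.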

tender stratus Nov 25, 2023, 4:55 AM

#

Sorry to ask it here, but I am unable to link my kaggle account with discord, what do I do?

tender stratus Nov 25, 2023, 5:06 AM

#

tender stratus Sorry to ask it here, but I am unable to link my kaggle account with discord, wh...

I think it is linked now, could someone help

placid hamlet Nov 25, 2023, 2:25 PM

#

Hello everyone, can someone suggest laptop hardware for machine learning? I'm new to it so I don't know what kind of hardware it needs. Please guide me.

austere horizon Nov 26, 2023, 12:54 PM

#

Hi

#

My account has been blocked? Why

craggy zephyr Nov 26, 2023, 2:31 PM

#

I am learning Machine Learning/Deep Learning on Coursera and I also know some basics of Python.
I am currently working on my BS Final Year Project named Data Driven Strategy for Load Forecasting of Power Systems.

I want to join a team or wanna work with some experts to learn
Please count me in.

red hawk Nov 26, 2023, 3:13 PM

#

craggy zephyr I am learning **Machine Learning/Deep Learning** on coursera and I also know som...

this channel is meant for asking questions, so you are more likely to find someone to pair with in #👥┊looking-for-a-team or one of the dedicated competition channels

austere horizon Nov 26, 2023, 5:14 PM

#

austere horizon My account has been blocked? Why

Anyone help?

deft fox Nov 26, 2023, 7:04 PM

#

austere horizon Anyone help?

Not sure anyone can help you other than admins. Pretty sure they don’t work on Sunday.

dusk jungle Nov 27, 2023, 1:37 PM

#

placid hamlet hello everyone can someone suggest laptop hardware for machine learning? I'm new...

It would be interesting to know how much you're willing to spend to give you a more optimal suggestion. But overall, an Intel Core i5 and 8GB RAM is enough for most tasks. For Deep Learning and Neural Networks, you will need a GPU. You don't need to buy a laptop with a GPU though, since you can use cloud computing solutions. Kaggle offers 30 hrs/week of free cloud GPUs so you can train neural nets.

deft fox Nov 27, 2023, 4:23 PM

#

dusk jungle It would be interesting to know how much you're willing to spend to give you a m...

I agree that for most machine learning tasks that don't involve deep learning almost any modern computer with multiple CPUs will do. But still, I think this comment, "But overall, an Intel Core i5 and 8GB RAM is enough for most tasks," is off the mark. Hardly anything these days can be done with 8 GB of memory, as the operating system will take a good chunk of that memory, unless one is planning to use Linux exclusively. I don't think it is worth saving $150-200 on memory and I strongly recommend at least 16 GB RAM. A GPU is a must for deep learning applications, but that will make a laptop expensive. I think the suggested Kaggle GPU solution is a good option.

dusk jungle Nov 27, 2023, 4:33 PM

#

Sure! I was considering the minimum requirements, but indeed at least 16GB RAM would be much better. I had been using a computer with 8GB and recently upgraded to a 16GB one for better performance.

red hawk Nov 27, 2023, 8:33 PM

#

placid hamlet hello everyone can someone suggest laptop hardware for machine learning? I'm new...

to add to the other recommendations, I would say that if you don't have portability requirements, ie you don't plan to carry the computer around much, it is much better value for money to get a desktop. And definitely get at least 16GB. You don't need a GPU but it can be quite convenient to have one that you don't need to be worried about turning off. (and if you are into video games, you might as well accomplish two things at once by getting a decent nvidia gpu )

graceful axle Nov 28, 2023, 10:24 AM

#

Hi! I joined a competition on Kaggle and they shared a customised python package along with the .csv files. I am using Windows and the package file is .SO which is only for Linux. Does anyone know how I can solve this issue? Right now, I cannot run the package since it doesn't recognize the extension.

plain copper Nov 28, 2023, 2:28 PM

#

I have a question, best answered soon if possible

#

Is it legal to obtain someone's health data to build an ML project, without any doctor's or government consent? As we know, many health datasets are contributed by hospitals and medical researchers, but is it legal for such data to be collected by students without any proper knowledge of the field?

cosmic mortar Nov 28, 2023, 3:17 PM

#

Hi everybody... I am looking for someone to team up with for a Kaggle competition, to learn and share knowledge while working on a project. If interested, please reply.

craggy zephyr Nov 29, 2023, 4:53 AM

#

ML Course by Andrew Ng has few assignments, I cant solve the Practice Assignment of Week#2, Can you help me?

tropic copper Nov 29, 2023, 3:38 PM

#

What sort of development environment should I be using as someone relatively new to all of this? Right now I'm just writing Python code in notepad++ and running via cmd line. Would it be more efficient/better in some way for me to use an IDE or some other tool instead?

verbal crest Nov 29, 2023, 8:26 PM

#

@tropic copper Jupyter Notebooks are very popular for data science. Kaggle's notebook editor (or Colab) is an online version of that style of IDE, but you can also set one up to use locally.

tropic copper Nov 29, 2023, 8:27 PM

#

Thanks for the info! I've used those a lot in Coursera courses. What makes them so popular?

verbal crest Nov 29, 2023, 10:35 PM

#

@tropic copper I think the ability to interweave code and output back and forth (and to go back and edit previous steps when needed) is all very handy when doing data science exploration.

placid hamlet Nov 30, 2023, 8:21 AM

#

Is an NVIDIA GTX 1650 sufficient for entry-level deep learning tasks?

hushed crescent Nov 30, 2023, 3:29 PM

#

Hello 👋
I'm participating in my first Kaggle competition and I'm stuck at the submission step.
My submission notebook crashes and I'm trying to figure out exactly how it works to make sure I'm not doing it wrong.

Do you know if the submission notebook runs have access to the internet? My first notebook cell is a pip install and if the notebooks do not have access to the internet it would explain the failure :/

sharp pawn Nov 30, 2023, 6:04 PM

#

Hey all! Im fairly new to CS as a whole and was wondering if there are some pre-reqs i should know before attempting kaggle comps? Projects are the best for learning im told! But I also know I am fairly inexperienced and i will not learn much if it is too hard

deft fox Nov 30, 2023, 7:33 PM

#

placid hamlet Is an NVIDIA GTX 1650 sufficient for entry-level deep learning tasks?

4 GB of memory is borderline even for entry-level deep learning tasks. If you don't need to download an external model, or deal with relatively small classification tasks, it might work. I have two GTX 1080s which are 8 GB each, and I find it insufficient more frequently these days than 5 years ago.

timid forge Dec 1, 2023, 11:41 AM

#

Hi all, I need help with best practices. I'm working on a project where I need to build a couple of tables. If I keep the data at its most granular level, it creates duplicates. Is it better to have 3 different tables, each with a primary key on one column, or one table whose primary key is a combination of multiple columns?

junior cave Dec 2, 2023, 7:52 AM

#

Hey Folks! I'm just starting on my ML adventure and I've got a question. I've created a simple one-layer neural network to solve the Titanic Challenge. I've set it up such that when training is done I export the weights to disk so they can be reused. Training appears to be working pretty well. However, when I start the network with the trained weights and train some more, the network starts in a state with a bit more loss than when training ended on the previous run. I would think that I would start at the point at which training ended. Does anyone know why this would be the case? Here's a link to my (very ROUGH/experimental) project in case it is helpful - https://github.com/chuckfinca/kaggle_titanic_competition

GitHub

GitHub - chuckfinca/kaggle_titanic_competition

Contribute to chuckfinca/kaggle_titanic_competition development by creating an account on GitHub.

craggy zephyr Dec 2, 2023, 8:26 AM

#

ML with Andrew Ng, Course#1, Week#2, Practice Lab Assignment
I am facing an issue regarding the assignment mentioned above.
After submitting, I receive an error. " Comment line with index: UNQ_C1 wasn’t found in code"
Can someone help me with this?

craggy zephyr Dec 2, 2023, 10:46 AM

#

ML with Andrew Ng, Course#1, Week#2, Practice Lab Assignment

I am facing an issue regarding the assignment mentioned above. After submitting, I receive an error. " Comment line with index: UNQ_C1 wasn’t found in code" Can someone help me with this?
Link: https://lnkd.in/daynpUSe

sweet ice Dec 2, 2023, 8:03 PM

#

Hi Kagglers, I hope everyone is doing well. I want some advice or help to get a job in machine learning and/or data science. Feel free to DM me.

copper whale Dec 3, 2023, 1:49 AM

#

Can Kaggle resources support multithreading for Mistral 7B? I want to build a dataset and I need an AI to help me do that. ChatGPT has rate limits. The thing is that going one prompt at a time is very slow, and I am wondering if it is possible to multithread (i.e., send multiple prompts at a time for the model to generate text).

austere horizon Dec 3, 2023, 1:28 PM

#

Can anyone help me

#

my old account has been blocked, please help me

severe rune Dec 3, 2023, 2:11 PM

#

Code:

tuned_model = "codegood/HF_AWS_Mistral_SC"

trainer.model.config.save_pretrained(tuned_model + "config")
trainer.save_model(tuned_model)
torch.save(model.state_dict(), "/kaggle/working/HF_AWS_Mistral_SC/Mistral_torch_model.bin")
trainer.push_to_hub(tuned_model_SC)
print("Model saved to Huggingface")

Error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

I'm trying to load and retrain my fine tuned model. I'm able to load the model, but get the above error during trainer.train(). Not able to figure out the problem.

Also, how to upload the bin model to the Huggingface directly?

little kelp Dec 3, 2023, 4:04 PM

#

Hi everyone, is it possible to delete a submission on the leaderboard, or just simply hide my score?

tame bough Dec 4, 2023, 4:50 AM

#

Hey everyone! I have one project which I would love to share on public platforms like Kaggle and GitHub. However, I cannot share it without anonymizing, any idea on how to anonymize large amounts of data in Excel? So that powerBI would show anonymized names?

echo latch Dec 4, 2023, 1:11 PM

#

Hey, I am new to Kaggle and I have a question.
Why use a Jupyter notebook with cells?
Why not use an IDE and simply type the code (without cells), instead of putting each function or line of code in its own cell?

shrewd scarab Dec 4, 2023, 5:15 PM

#

echo latch Hey i am new to kaggle i have question Why use jupyter notebook and use cells w...

Jupyter notebooks are easier to work with compared to using just a normal Python file. If you are working with really complex code that takes a long time to run, it can be easier to just run it one cell at a time. It is also easier to debug (sometimes) since the code can be debugged one cell at a time. It also may just be a personal preference shared by most of the community.

verbal crest Dec 4, 2023, 10:13 PM

#

@echo latch In the Kaggle editor you can swap from a notebook to a 'script' which is just a single python file if you'd prefer to work that way. It's really just a matter of personal preference.

velvet plume Dec 4, 2023, 10:58 PM

#

Would anyone in the data science field with at least 1-2 years of experience be willing to participate in an interview for a school project (I'm considering pursuing a career in data science) ? The questions are below (Feel free to dm me or respond within the channel), Thanks for your time!

What caused you to gain interest in data science, and how did you enter the field?

Can you describe a typical day or week in your role ?

What types of projects do you typically work on ?

What programming languages, tools, and libs are most essential for you ?

Can you share an example of a challenging problem you've faced in a project and how you went about solving it?

How do you stay updated with the latest developments and trends in the field ?

What are some common misconceptions about the field of data science, and how would you address them?

What advice do you have for someone just starting their career in data science ?

tropic copper Dec 5, 2023, 3:40 PM

#

@echo latch I am pretty new and I just started using a locally run Jupyter notebook. It is SO much more convenient and easy than Notepad++ and running via CMD prompt. It helps me chunk the program into easily digestible sections that I can move around, reposition, etc.

I've also found a lot of time can be saved with Jupyter because I can quickly add or remove "debugging" points, e.g if I want to see what the output looks like after I remove or edit a particular block of code, it seems much easier and quicker to do this in Jupyter.

gilded flame Dec 6, 2023, 6:59 PM

#

Dear community,

this semester I started a course where we learn to code a kind of AI in Python. To me it looks more like machine learning. But nvm.

So the task for the final exam is to code a program that can answer questions about a data set:

_

‒ Two individual / unique research questions per student are required
Procedure:
‒ Students search themselves for large and relevant data sets
‒ Students define two questions that should be answered for the selected data sets
‒ Lecturers check the data sets and the corresponding questions (such that problems are
difficult enough but not too difficult)
‒ Implementation of the solutions by the students on their own systems using the
presented libraries and methods of the lecture

The resulting program code (as *.py) and the corresponding program execution need to
be analyzed regarding the run-time behavior. Which parameters influence the run-time in
which way?
‒ Code analysis and run-time behavior evaluation need to be executed per research
question.
The methods we learn and shall use:

Data and data preparation (Pandas etc.)
Classification I (Support Vector Machines)
Classification II (Decision trees, Random forests)
Clustering (kmeans, DBScan)
Testing and quality assurance (run time analysis)
Dimension reduction, anomaly detection
Neural Networks
Training deep learning networks
Pipelines and MLOps
As im doing all the homework, I dont think that the coding part is my challenge.

My problem is defining two questions that fit the requirements. Can you give me examples? The questions should not be answerable by statistics.

For example I choose this dataset:
https://www.kaggle.com/datasets/nelgiriyewithana/billionaires-statistics-dataset/discussion
=> But I dont know which question could I define for this, which can be solved by the methods above.

I am also open for new datasets.

Thanks in advance!

Billionaires Statistics Dataset (2023)

Exploring the Global Landscape of Success

cursive vigil Dec 6, 2023, 7:29 PM

#

Best AI ML DL DS Roadmap

Hi! What is the best complete roadmap for AI, ML, DL, and Data Science?

Some roadmaps I have found:

Which one should I choose?
I am not a beginner in programming (8y as a hobby and 3y working), but it was not related to AI.

shrewd scarab Dec 7, 2023, 3:48 AM

#

cursive vigil ## Best AI ML DL DS Roadmap Hi! What is the best complete roadmap for AI, ML, D...

The best roadmap for any of those would change at least somewhat, since they are all slightly different. Many people get hung up on finding the best possible path, but the most important thing you can do is to start on a path. If you are looking for a great starting place (since you have some experience), I would recommend doing some analysis on Kaggle datasets using pandas, matplotlib, etc.

void notch Dec 7, 2023, 10:57 AM

#

Hello !!
How can I improve the accuracy? I'm using an MLP model of only Dense Layers.
How is it possible to remove these crazy spikes?

I have tried the following:
1- Early Stopping
2- Reducing model complexity
3-Reduce LR
4- Dropout layers and Batch Normalization
5- Gaussian noise layer
6 - fixed the issues with the dataset.

Your help is much appreciated!

deft fox Dec 7, 2023, 5:29 PM

#

void notch Hello !! How can I improve the accuracy? I'm using an MLP model of only Dense La...

I suggest you reduce the width and the depth of your network, use batch normalization and dropout, smaller learning rate, and try larger batch size. But in the end you still may not get much better accuracy.

void notch Dec 7, 2023, 5:33 PM

#

deft fox I suggest you reduce the width and the depth of your network, use batch normaliz...

I’ve added dropout layers and batch normalization layers as well. Also, the batch size I have is 32; I increased it to 64 and it got even worse. I’m starting the LR at 1e-4.
Is there anything else I need to try? I’ll try to reduce the width and depth.

deft fox Dec 7, 2023, 5:36 PM

#

void notch I’ve added dropout layers and batch normalization layers as well. Also, the batc...

Maybe try smaller batch size than 32. Large dropout (0.4-0.5) may be needed. In the end, without more data (how much do you have?) this could be the best score one can get.

void notch Dec 7, 2023, 5:38 PM

#

deft fox Maybe try smaller batch size than 32. Large dropout (0.4-0.5) may be needed. In ...

In total of 19,000 samples, around 11,000 for training set, 4000 for validation and 4000 for testing

next prairie Dec 7, 2023, 9:15 PM

#

Hey everyone!

Just finished “Machine Learning Specialization” on Coursera, from Andrew Ng. Excited to dive deeper into the field!

Any recommendations on what I should learn next or any valuable resources you could suggest? Your insights would be greatly appreciated!

night gorge Dec 8, 2023, 11:17 AM

#

Ive recently done the tutorial Titanic Competition, and wanted to redo it with an ML model. However, my model is now getting a 0 public score. Idk where I'm going wrong or how to test…

Here is the link to my notebook https://www.kaggle.com/code/abishekjayan/this-is-where-it-starts

This is where it starts

Explore and run machine learning code with Kaggle Notebooks | Using data from Titanic - Machine Learning from Disaster

deft fox Dec 8, 2023, 11:23 AM

#

In most competitions you are supposed to submit probabilities, not binary predictions. I suggest you delete your In: 12 cell and in the following cell use output = pd.DataFrame({'PassengerId': df_test.PassengerId, 'Survived': predictions.flatten()})

night gorge Dec 8, 2023, 11:48 AM

#

ok let me try...btw is there any way to check the score without submitting for competition?

#

right now in order to know the score im just submitting and checking the public score

night gorge Dec 8, 2023, 12:11 PM

#

deft fox In most competitions you are supposed to submit probabilities, not binary predic...

still 0 score

vague folio Dec 9, 2023, 5:33 AM

#

I need help to know more about feature engineering. Please provide me some resources so that I can catch the vital concept.

mossy delta Dec 9, 2023, 12:18 PM

#

night gorge Ive recently done the tutorial Titanic Competition, and wanted to redo it with a...

So for that very same competition, has anyone been able to fill the missing values in the Cabin section?

austere pier Dec 10, 2023, 3:56 PM

#

Hi.
Can anyone suggest a good unsupervised learning method for anomaly detection(like assault, robbery, vandalism etc)?

quick mirage Dec 12, 2023, 11:02 AM

#

Can anyone help me with the following discussion : https://www.kaggle.com/discussions/questions-and-answers/461022

AI Art Generation | Kaggle

AI Art Generation.

tough panther Dec 12, 2023, 12:14 PM

#

Hello guys can someone provide an explanation for this?

quick mirage Dec 12, 2023, 12:39 PM

#

tough panther Hello guys can someone provide an explanation for this?

Hi Alex, from the learning curve displayed, it seems that the algorithm hasn't learned the target function: the training error is high and increasing. Bias measures how well the learning algorithm can approximate the target function, and since (according to the question) the test error is unacceptable, the model isn't approximating the function well, which points to high bias. The model also has little variance (generalization error), since the training error and the test error come closer together as the dataset grows, even though they converge to an unacceptably high error. I recommend this video for understanding the curve better: https://youtu.be/zrEyxfl2-a8?si=k4DdOTt0TM72kagH. It's from a great course that helped me a lot while I was studying machine learning. Feel free to ask for any elaboration.
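If you want to reproduce that kind of curve yourself, here is a minimal sketch with scikit-learn's learning_curve. The data is synthetic and deliberately nonlinear so that a plain linear model underfits; the dataset, model and sizes are assumptions for illustration, not the ones from the screenshot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Synthetic nonlinear target: a straight line cannot fit it well (high bias).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=400)

sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error",
    shuffle=True, random_state=0,
)

# Scores are negated MSE, so flip the sign to get errors.
train_err = -train_scores.mean(axis=1)
val_err = -val_scores.mean(axis=1)
for n, tr, va in zip(sizes, train_err, val_err):
    print(f"n={n:4d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
```

In a high-bias case like this one, both errors should end up close to each other but high, which is the pattern described above.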

YouTube

caltech

Lecture 08 - Bias-Variance Tradeoff

Bias-Variance Tradeoff - Breaking down the learning performance into competing quantities. The learning curves. Lecture 8 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - ht...

tough panther Dec 12, 2023, 12:40 PM

#

quick mirage Hi Alex, from the learning curve displayed, it seems that the algorithm hasn't l...

tyvm

quick mirage Dec 12, 2023, 12:40 PM

#

tough panther tyvm

u r welcome

austere horizon Dec 12, 2023, 12:57 PM

#

#

Please any one help me with this

verbal crest Dec 12, 2023, 6:19 PM

#

austere horizon

We can't help you here, you need to contact support (kaggle.com/contact)

austere sequoia Dec 15, 2023, 8:12 AM

#

does anybody know how can I learn to do ensembles that perform well in competitions? stacking, etc. that improves the metrics? Thank you!

pulsar sky Dec 15, 2023, 9:08 AM

#

Hi guys quick question I had
Can selectolax be used to scrape dynamic content of a webpage?

crimson dragon Dec 17, 2023, 9:07 PM

#

Hello , can someone please explain to me the cross-validation and how can i use it

deft fox Dec 17, 2023, 9:50 PM

#

crimson dragon Hello , can someone please explain to me the cross-validation and how can i use ...

You have to be willing to put in a minimal effort on your own. Your question is easily answered by Googling https://www.google.com/search?q=cross-validation

www.google.com

Cross-validation

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.

fallen mist Dec 18, 2023, 2:39 PM

#

Hey, I'm hosting my own competition and I can't see how I can pin my demo notebook to the top of the code notebooks section? Any ideas? I've seen it done in other competitions. Is this something staff can answer @tardy lodge?

lone hearth Dec 18, 2023, 3:08 PM

#

Does anyone know what could be the issue when submitting a notebook that runs fine in Kaggle? I submitted to the LLM Detection Competition using Keras NLP. I ran the notebook before and it was fine, training, evaluating, and saving the submission.csv. It failed when I submitted though, so I copied this from the Log: Downloading data from https://storage.googleapis.com/keras-nlp/models/distil_bert_base_en_uncased/v1/vocab.txt
272.5s 101 Traceback (most recent call last):
272.5s 102 File "<string>", line 1, in <module>
272.5s 103 File "/opt/conda/lib/python3.10/site-packages/papermill/execute.py", line 128, in execute_notebook
272.5s 104 raise_for_execution_errors(nb, output_path)
272.5s 105 File "/opt/conda/lib/python3.10/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
272.5s 106 raise error
272.5s 107 papermill.exceptions.PapermillExecutionError:
272.5s 108 ---------------------------------------------------------------------------
272.5s 109 Exception encountered at "In [18]":
272.5s 110 ---------------------------------------------------------------------------
272.5s 111 gaierror Traceback (most recent call last)
272.5s 112 File /opt/conda/lib/python3.10/urllib/request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
272.5s 113 1347 try:
272.5s 114 -> 1348 h.request(req.get_method(), req.selector, req.data, headers,
272.5s 115 1349 encode_chunked=req.has_header('Transfer-encoding'))
272.5s 116 1350 except OSError as err: # timeout error

frozen sail Dec 19, 2023, 4:09 PM

#

lone hearth Does anyone know what could be the issue when submitting a notebook that runs fi...

Try these things and if it doesn't work it has to do with the content of the notebook itself.

urban bone Dec 20, 2023, 4:19 AM

#

Hey there. Is anyone working on question generation using LLMs? Please help me.

spiral valve Dec 20, 2023, 10:35 AM

#

Hi there. I posted a question here: https://www.kaggle.com/competitions/titanic/discussion/462500 . Can any1 help me? It's about the titanic challenge (im using pytorch)

Titanic - Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

dull mauve Dec 20, 2023, 10:02 PM

#

has any one tried using huggingface autotrain advanced on kaggle, how was the experience? please share

lethal raft Dec 21, 2023, 10:21 AM

#

Has anyone done any work on Predict energy behavior of prosumers??

cold tundra Dec 22, 2023, 12:54 PM

#

Hi guys,
I built a web app to predict the classification of flowers using Machine Learning. I just need help with the last step. So far I have been able to successfully connect the HTML and the corresponding routes; only the last step is not working.

While the model makes a prediction it returns a number

0 or 1 or 2 depending upon the flower it has predicted, instead I get a None in there in the HTML file, But I checked the logs the output from the predicting function is correct.

Kindly help me debug this.

Code link:
https://github.com/Kaus1kC0des/OIBSIP/tree/main/Data Science/Task 1

GitHub

OIBSIP/Data Science/Task 1 at main · Kaus1kC0des/OIBSIP

This repository contains all the code my projects during my internship with Oasis Infobyte - Kaus1kC0des/OIBSIP

plush cairn Dec 23, 2023, 5:39 PM

#

How many Kaggle notebooks can one run in parallel?
Mine gives error when I try the third.

deft fox Dec 24, 2023, 8:36 PM

#

plush cairn How many kaggle notebooks one can run in parallel? Mine gives error when I try ...

I've run at least 5-6 notebooks in parallel, but it was a while ago. Maybe they changed something recently, or implemented stricter controls during busy times. What is the error message?

plush cairn Dec 24, 2023, 10:24 PM

#

deft fox I've run at least 5-6 notebooks in parallel, but it was a while ago. Maybe they ...

"GPU Session cap reached"

deft fox Dec 25, 2023, 4:39 AM

#

plush cairn "GPU Session cap reached"

There is a 30-hour per week GPU limit on Kaggle. That message most likely means you have reached it and will have to wait until next week.

ember void Dec 25, 2023, 5:45 AM

#

Does submitting a notebook in GPU mode consume our GPU quota?

muted talon Dec 25, 2023, 3:54 PM

#

For content-based recsys, when recommending based on popularity, a damped mean of the target metric is a common and solid way to mitigate large discrepancies in the number of evaluations/ratings per item.
Was wondering, what other alternatives to the damped mean are there?
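For reference, a minimal sketch of the damped mean itself, assuming made-up ratings and an arbitrary damping constant of 5 pseudo-ratings (both hypothetical):

```python
from collections import defaultdict

# Hypothetical (item, rating) pairs; item "B" has a single 5-star rating.
ratings = [("A", 4), ("A", 5), ("A", 4), ("A", 5), ("B", 5), ("C", 2), ("C", 3)]
DAMPING = 5  # assumed strength of the prior, in "pseudo-ratings"

global_mean = sum(r for _, r in ratings) / len(ratings)

sums, counts = defaultdict(float), defaultdict(int)
for item, r in ratings:
    sums[item] += r
    counts[item] += 1

# Damped mean: pull each item's mean toward the global mean;
# the fewer ratings an item has, the stronger the pull.
damped = {
    item: (sums[item] + DAMPING * global_mean) / (counts[item] + DAMPING)
    for item in sums
}

for item in sorted(damped, key=damped.get, reverse=True):
    print(item, round(damped[item], 3), "from", counts[item], "ratings")
```

With these numbers, item "B" has the highest raw mean (5.0 from one rating) but ranks below "A" after damping, which is the effect the damped mean is meant to produce.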

split flume Dec 27, 2023, 12:45 AM

#

how difficult would it be to implement graph-based neural networks (GCN and GNN) in Kaggle? I am struggling to find projects which utilize them

frozen sail Dec 27, 2023, 1:42 PM

#

dont spam

fathom grove Dec 27, 2023, 7:21 PM

#

For the Titanic ML dataset competition, there is a lot of missing data present in the Age column and the cabin column. My current guess is that age has to do a lot in matters of survival (Physical ability etc). I've found that https://en.wikipedia.org/wiki/Passengers_of_the_Titanic#:~:text=The ship's passengers were divided,military personnel%2C industrialists%2C bankers%2C contains a list of passengers with their ages. Is it correct if I can impute the values from this?

Passengers of the Titanic

A total of 2,240 people sailed on the maiden voyage of the Titanic, the second of the White Star Line's Olympic-class ocean liners, from Southampton, England, to New York City. Partway through the voyage, the ship struck an iceberg and sank in the early morning of 15 April 1912, resulting in the deaths of 1,517 passengers.The ship's passengers w...

verbal crest Dec 28, 2023, 7:08 AM

#

fathom grove For the Titanic ML dataset competition, there is a lot of missing data present i...

Check out some of the most-voted notebooks in the Titanic competition. Most talk about dealing with missing values in the data (an important data science skill the competition is trying to teach). You shouldn't look for external data with all the answers; the goal is to find ways to deal with missing data. Also check out the Kaggle course on missing values here: https://www.kaggle.com/code/alexisbcook/handling-missing-values
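As a starting point, here is a small pandas sketch of two common tactics: a missing-value indicator and a group-wise median fill. The frame is made up and only borrows Titanic-style column names; it is one option among many, not the recommended solution for the competition.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with Titanic-style column names; not the real data.
df = pd.DataFrame({
    "Pclass": [1, 3, 3, 1, 2, 3],
    "Age": [38.0, np.nan, 26.0, 54.0, np.nan, 20.0],
    "Cabin": ["C85", None, None, "E46", None, None],
})

# Keep the fact that a value was missing as its own feature.
df["Age_missing"] = df["Age"].isna().astype(int)
df["Has_cabin"] = df["Cabin"].notna().astype(int)

# Fill Age with the median of the passenger's class,
# falling back to the overall median when a class has no known ages.
class_median = df.groupby("Pclass")["Age"].transform("median")
df["Age"] = df["Age"].fillna(class_median).fillna(df["Age"].median())

print(df)
```

On real data, the medians would normally be computed on the training set and then reused for the test set, so that no information leaks from test to train.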

Handling Missing Values

Explore and run machine learning code with Kaggle Notebooks | Using data from Detailed NFL Play-by-Play Data 2009-2018

glossy finch Dec 28, 2023, 11:04 PM

#

Because the function pd.get_dummies() depends on the data it is being fit on, df_train and df_test end up having different columns.
Therefore, if I fit a model on the training data, it cannot fit onto the test data.

#

how to solve this?
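One common workaround, sketched here under assumptions (the tiny frames and the "Embarked"/"Fare" columns are made up for illustration), is to reindex the test columns onto the training columns after encoding:

```python
import pandas as pd

# Made-up frames: train has categories that test lacks, and vice versa.
df_train = pd.DataFrame({"Fare": [7.2, 71.3, 8.0], "Embarked": ["S", "C", "Q"]})
df_test = pd.DataFrame({"Fare": [9.5, 30.0], "Embarked": ["S", "X"]})

X_train = pd.get_dummies(df_train, columns=["Embarked"])
X_test = pd.get_dummies(df_test, columns=["Embarked"])

# Force the test frame onto the training columns:
# categories missing in test become all-zero columns,
# categories never seen in training are dropped.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(list(X_train.columns))
print(X_test)
```

An alternative with the same effect is scikit-learn's OneHotEncoder with handle_unknown="ignore", fitted on the training data and then applied to the test data.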

split flume Dec 29, 2023, 4:29 AM

#

split flume how difficult would it be to implement graph-based neural networks (GCN and GNN)...

nvm i figured out a basic one. randomized adjacency matrix so low accuracy but works as a proof of concept

hasty sand Dec 31, 2023, 12:40 PM

#

Hi kaggle community, I have been working on a project and I am unsure on how to do calculations on tuple data. My dataframe has data in the form (x, y) in every cell and I would like to add numbers of all the y data, depending on what x is, to a row total. I have about 1500 rows. what is the best way of doing this?

hasty sand Dec 31, 2023, 1:00 PM

#

Here is the data I would need to do this on.

📎 personality_scores.csv

graceful axle Jan 1, 2024, 10:24 PM

#

Hi community,

I have a question related to k-fold cross-validation.

I'm currently training a classification model on a relatively small dataset (approximately 500 images across 5 classes) using a 5-fold approach. At the end of this process, I have five models. For my submissions, I utilize all five models to make predictions and take the average of their scores.

Is there any specific approach to replace these 5 models with only 1 model?
I need to do this to be able to use model ensemble method.
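For context, the current setup (five fold models, averaged predictions) can be sketched roughly like this. Synthetic tabular features and a logistic regression stand in for the real images and classifier, so every name below is a placeholder:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic 5-class stand-in for the real features; replace with your own data.
X_all, y_all = make_classification(
    n_samples=510, n_features=20, n_informative=8, n_classes=5, random_state=0
)
X, y, X_new = X_all[:500], y_all[:500], X_all[500:]  # last 10 rows play "test"

# Train one model per fold, each on 4/5 of the training data.
fold_models = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, _ in skf.split(X, y):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    fold_models.append(model)

# Average the class probabilities of the five fold models, then pick the top class.
avg_proba = np.mean([m.predict_proba(X_new) for m in fold_models], axis=0)
pred = avg_proba.argmax(axis=1)
print(pred)
```

One option people often use to end up with a single model is to keep the hyperparameters chosen through cross-validation and retrain once on the full training set, though the averaged fold models can also be treated as one member of a larger ensemble.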

fathom grove Jan 2, 2024, 3:54 PM

#

verbal crest Check out some of most voted notebooks in the titanic competition - most talk ab...

Similarly for the cabin variable, can I refer to external data (Schematics etc) to impute values or is that approach wrong?

elder flower Jan 3, 2024, 11:48 AM

#

Why can't I make my notebook public?

golden nova Jan 3, 2024, 1:51 PM

#

Hello all,
I'm just a beginner in this field. I'm facing a problem, or in simpler words, I'm stuck in a loop.
I'm pretty well aware about the theory and conceptual knowledge required of py, kaggle, maths, ml and all, but I'm not able to put things together to build my FIRST ML MODEL. Can anybody of you help me out with this.

muted cliff Jan 3, 2024, 1:52 PM

#

Hi ! Where are you stuck ?

#

Did you go through the kaggle tutorials @golden nova ?