fallow peak Aug 3, 2023, 9:02 PM

#

What would you do with a Meta Kaggle for Code dataset? 🙂 https://twitter.com/MeganRisdal/status/1687203400860237825?s=20

meg.ehh 🇨🇦 (@MeganRisdal)

We're going to release a public @kaggle dataset of ML code soon. If you'd like to write some notebooks illustrating some creative uses, please message me for early access!

wintry tree Aug 4, 2023, 7:16 AM

#

fallow peak What would you do with a Meta Kaggle for Code dataset? 🙂 https://twitter.com/Me...

Will the licence allow for training LLMs ?

fallow peak Aug 4, 2023, 1:26 PM

#

wintry tree Will the licence allow for training LLMs ?

We're planning to use Apache 2.0 which is the same license as what all of the individual contents themselves are under already, so yes

#

Google Research recently released this dataset under two licenses which I thought was interesting: https://sites.research.google/open-buildings/. A kind of "choose your own adventure" license.

wintry tree Aug 4, 2023, 8:39 PM

#

fallow peak Google Research recently released this dataset under two licenses which I though...

Seems like a legal dep. nightmare lol but why not

fallow peak Aug 7, 2023, 8:20 PM

#

We recently worked with Stack Overflow to get all of their annual developer survey datasets up on Kaggle. Here's the most recent one: https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2023-developers-survey

Stack Overflow 2023 Developer Survey

quiet cypress Aug 8, 2023, 8:48 AM

#

Is it ok to post our datasets here?

fallow peak Aug 8, 2023, 1:15 PM

#

quiet cypress Is it ok to post our datasets here?

If it's a dataset of yours you'd like to promote, I recommend sharing in #🔗┊sharing-projects. If people only promote their datasets in this channel, it won't be useful to many people. On the other hand, if someone in this channel asks for a data source and you feel yours is a good match then it would make sense to post in here.

#

^^ anyone interested in #💾┊data please feel free to chime in with your thoughts on what you'd like to see / not see in this channel

fallow peak Aug 9, 2023, 7:36 PM

#

fallow peak What would you do with a Meta Kaggle for Code dataset? 🙂 https://twitter.com/Me...

Annndd it's here! Announcing the release of the Meta Kaggle for Code dataset: https://www.kaggle.com/discussions/product-feedback/430422

Introducing Meta Kaggle for Code: A new open dataset for ML | Kaggle

Introducing Meta Kaggle for Code: A new open dataset for ML.

fallow peak Aug 15, 2023, 5:26 PM

#

A reminder to share self-promotional projects in #🔗┊sharing-projects 🙂

#

BTW for folks interested in #💾┊data , the Kaggle Datasets product team (inc'l myself, engineers, designer) is thinking of doing an AMA in Discord in the next few weeks or so. Would people be interested? Anything in particular you'd look forward to learning or talking about with us?

hardy bay Aug 16, 2023, 2:27 AM

#

Want to see how different countries move on the Fifa ranking table. The website https://www.fifa.com/fifa-world-ranking has the rankings. Can I build a small script to scrape this website say once a day and make that a daily updated dataset on Kaggle?

Looked around if something like that existed and don't see any datasets on rankings:
https://www.kaggle.com/datasets/?search=fifa

FIFA/Coca-Cola World ranking

Updated ranking table for Men's and Women's teams.

Find Open Datasets and Machine Learning Projects | Kaggle

Download Open Datasets on 1000s of Projects + Share Projects on One Platform. Explore Popular Topics Like Government, Sports, Medicine, Fintech, Food, More. Flexible Data Ingestion.

fallow peak Aug 16, 2023, 2:45 AM

#

hardy bay Want to see how different countries move on the Fifa ranking table. The website ...

Yes! You can definitely do this. You can use notebook scheduler features to create a pipeline that does this: https://www.kaggle.com/discussions/product-feedback/284587

[Notebooks update] Scheduled Notebooks Dataset Pipelines! | Kaggle

[Notebooks update] Scheduled Notebooks Dataset Pipelines!.

#

Here's more examples and documentation: https://www.kaggle.com/discussions/getting-started/293861

How to create a pipeline of scheduled and triggered notebooks | Ka...

How to create a pipeline of scheduled and triggered notebooks .

#

I used these features to do the same thing with Scrabble game data that I scraped from a website https://www.kaggle.com/datasets/mrisdal/scrabble-data-from-wooglesio

Scrabble data from woogles.io

Analyze games played by BasicBot

harsh quartz Aug 20, 2023, 3:52 PM

#

Excited to share my Cat vs Cat Loaf 100x100 RGB Image Classification dataset! 🐱🍞 Perfect for image classification tasks.
I'd greatly appreciate an upvote to support my work.
Check it out here: https://www.kaggle.com/datasets/erogluegemen/cat-catloaf-classification

This dataset contains a collection of Turkish dictionary definitions extracted from the official website of the Turkish Language Association (TDK). It provides comprehensive definitions for a wide range of Turkish words and phrases. I don't know if you are interested but upvotes are highly appreciated!
Dataset Link: https://www.kaggle.com/datasets/erogluegemen/tdk-turkish-words

You can check my both datasets!! I'm glad to hear your feedbacks🥳

Cat vs. Cat Loaf 100x100 RGB Image Classification

Cat vs. Loaf: 100x100 RGB Image Classifier 🐱🍞

TDK Turkish Words

Turkish Dictionary Definitions: Explore the Richness of the Turkish Language!

bold delta Aug 23, 2023, 2:27 PM

#

Did anyone catch the CatchBot Arena Conversations Dataset posted recently on Kaggle by UC Berkeley's LMSYS org? Its a pretty unique dataset which compares the output of two LLMs and includes human votes / preferences. If you're doing RLHF or comparing LLMs, might be worth a look!

Chatbot Arena Conversations

bold delta Aug 23, 2023, 2:31 PM

#

harsh quartz Excited to share my Cat vs Cat Loaf 100x100 RGB Image Classification dataset! 🐱...

Very nice! Turkce ogrenmiyorum ve Turkce veri bilimi projesi yapmayi dusunuyordum.

harsh quartz Aug 23, 2023, 2:39 PM

#

bold delta Very nice! Turkce ogrenmiyorum ve Turkce veri bilimi projesi yapmayi dusunuyordu...

ohhh glad to hear that😂 where are you from tho ?

bold delta Aug 23, 2023, 3:51 PM

#

harsh quartz ohhh glad to hear that😂 where are you from tho ?

Ben Amerikaliyim ama esim Turk.

harsh quartz Aug 23, 2023, 3:52 PM

#

ben türküm ama kız arkadaşım rus/alman 😂🥳

formal surgeBOT Aug 24, 2023, 7:48 PM

#

harsh quartz ben türküm ama kız arkadaşım rus/alman 😂🥳

The detected language is: Turkish (TR)

verbal jay Aug 26, 2023, 11:55 PM

#

Anyone ever used duckdb here?

serene tiger Sep 2, 2023, 7:09 AM

#

Hii!
I was wondering if someone could help me find a dataset that can be used for a project on water footprint calculator

#

I dont know if I phrased that right, since I am really new to ML, but any response is greatly appreciated !

candid python Sep 2, 2023, 9:56 AM

#

https://www.kaggle.com/datasets/yaranathakur/ipl-all-time-best-batsman
Go through my dataset recently published on kaggle

"All-Time-Best-Batsman.csv" provides a condensed compilation of statistical data and performance metrics from the Indian Premier League (IPL). Spanning multiple seasons, the dataset highlights key batting statistics of legendary cricketers, showcasing their contributions to the league's history. Whether you're a cricket aficionado or an analyst, "All-Time-Best-Batsman.csv" offers insights into the runs, strike rates, and many more that have defined the IPL's thrilling cricketing saga.

IPL All Time Best Batsman

Indian Premier League (IPL) All-Time Best Batsmen: A Decade of Dominance

swift bough Sep 2, 2023, 3:43 PM

#

hardy bay Want to see how different countries move on the Fifa ranking table. The website ...

This additional dataset might be a cherry on top : https://www.kaggle.com/datasets/pranav941/historical-fifa-world-cups-10-awards

Historical : FIFA World Cups 10+ Award Winners

Winners of 10+ FIFA World Cup Awards since 1930

ruby viper Sep 3, 2023, 5:27 AM

#

During my research intern, I have been working with a lot of Tabular Wikipedia Infobox Data. Now my work mostly revolves around the temporal aspect of this data, but I thought I could use my work done during this time to create a Dataset consisting of Wikipedia Infobox Data for all cricketer's found on Wikipedia.

So, here it is,
Link to the Cricketer Infobox Dataset: https://www.kaggle.com/datasets/varunnagpalspyz/uncover-cricket-legends-cricketers-wikidata
Link to the Notebook which contains code for clean and efficient extraction of Wikipedia Infoboxes in JSON format: https://www.kaggle.com/code/varunnagpalspyz/uncover-cricket-legends-data-extraction-with-ease/notebook

If anyone is working with such semi-structured data and is interested in taking up projects in this domain or knows of any work opportunities in this domain, do let me know.

Uncover Cricket Legends: Cricketer's WikiData

Extracting Cricketer's Infobox Data from Wikipedia with Ease!

Uncover Cricket Legends: Data Extraction with Ease

Explore and run machine learning code with Kaggle Notebooks | Using data from Uncover Cricket Legends: Cricketer's WikiData

maiden walrus Sep 3, 2023, 10:24 AM

#

Here is my dataset. Through it, you can see who is smoking or drinking. I think it's good for practicing classification methods. Have a good day.

Link : https://www.kaggle.com/datasets/sooyoungher/smoking-drinking-dataset

Smoking and Drinking Dataset with body signal

Predict smokers and drinkers using body signal data.

fallow peak Sep 3, 2023, 8:46 PM

#

Came across @uncut forum's awesome post about his fine tuning dataset for the LLM science exam competition. I'd love to make finetuning datasets and benchmarks for LLMs easier to use and more helpful on Kaggle. 🤔 https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/436383

Kaggle - LLM Science Exam

Use LLMs to answer difficult science questions

shy gust Sep 4, 2023, 5:30 AM

#

Here is the data on Prison Statistics and Information India
https://www.kaggle.com/datasets/anshtanwar/prisoners-dataset

🚨Prison Statistics and Information India

Prison Statistics India (PSI) - 2020

tawdry nexus Sep 6, 2023, 3:35 PM

#

Hello, everyone. Do any of know where can I find sources to create my own dataset? I would like to create a project or dataset, where the it will predict the time a lettuce to grow based on temperature, humidity, tds value, ph level, and nutrient solutions in a controlled environment. Thank you in advance.

swift bough Sep 6, 2023, 10:02 PM

#

Sounds like a Hydroponics project you're working on,
I had a mini-hydroponics project going on a year ago whose data is on my kaggle - I had to pause it since I couldnt control the environment within budding season

tawdry nexus Sep 7, 2023, 12:46 AM

#

swift bough Sounds like a Hydroponics project you're working on, I had a mini-hydroponics pr...

Yes, I am planning to also create a hydroponics system where I can automate the independent variables mentioned above. May I know the name of your project on kaggle?

wanton topaz Sep 7, 2023, 10:24 PM

#

Turkey Earthquakes Data (1994-2023) . https://www.kaggle.com/datasets/ozgecinko/turkey-earthquake-data-1914-2023

This is my first dataset and I've just published it into Kaggle! Since my country is located in an earthquake-prone zone, I have been searching for data on this subject, and I look forward to working with this data very soon. I want to express my gratitude to this Discord server for providing this opportunity to share. 🫶

I would be happy to hear your feedbacks!

Turkey Earthquake Data (1914 - 2023)

Turkey earthquake data in the range from September 1914 to September 2023.

lucid flame Sep 8, 2023, 7:55 PM

#

Hello Kaggle Community!

Exciting news - I've just uploaded a multilabel tweet dataset containing three columns:

Tweet ID (String Format),
Tweet Text: The tweet's actual content ,
Labels: These cover a wide range of concerns, including effectiveness doubts and conspiracy theories.

Ideal for sentiment analysis, NLP, and multilabel classification, this dataset offers insights into diverse vaccine concerns shared on Twitter.
Explore it for your projects and research.

https://www.kaggle.com/datasets/prox37/twitter-multilabel-classification-dataset

Twitter Multilabel Classification Dataset

A Comprehensive Multilabel Twitter Classification Dataset

obsidian idol Sep 9, 2023, 9:02 AM

#

Hello Kaggle Community,

Can someone help me in understanding this dataset.
https://www.kaggle.com/datasets/polomarco/mitbih-with-synthetic?resource=download&select=mitbih_with_syntetic.csv

I am new to machine learning and need to work on this dataset for a project.

mitbih_with_synthetic

cosmic palm Sep 9, 2023, 9:36 AM

#

obsidian idol Hello Kaggle Community, Can someone help me in understanding this dataset. h...

@obsidian idol There are two notebooks that use that dataset. I suggest you go through them to understand the data.

#

https://www.kaggle.com/code/polomarco/ecg-classification-cnn-lstm-attention-mechanism

ECG Classification | CNN LSTM Attention Mechanism

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

#

https://www.kaggle.com/code/kerneler/starter-mitbih-with-synthetic-b3567817-7

Starter: mitbih_with_synthetic b3567817-7

Explore and run machine learning code with Kaggle Notebooks | Using data from mitbih_with_synthetic

obsidian idol Sep 9, 2023, 9:55 AM

#

cosmic palm <@476992345238142977> There are two notebooks that use that dataset. I suggest y...

i understood the dataset with this article https://medium.com/@protobioengineering/how-to-get-heart-data-from-the-mit-bih-arrhythmia-database-e452d4bf7215

Medium

How to Get Heart Data from the MIT-BIH Arrhythmia Database

Use Python to read the most famous heart rhythm (ECG) database in the world.

#

but i don't understand how can i convert an ECG into the same format to make predictions?

shadow river Sep 10, 2023, 6:09 AM

#

Hello there, I am currently working on JPN Comments Senti-Analysis Model for which I tried to find some good JPN comments datasets (ex: Tweets, YT Comments etc). Still, I couldn't come across a proper one so I wanted to ask if there are any datasets available out there based on this?
The only best dataset I could find related to this is: Kojima Hideo Tweets

shell basalt Sep 10, 2023, 10:02 AM

#

Hello Kaggle Community,

I'm currently working on a project analyzing two decades of Premier League soccer data with the goal of creating a predictive model. However, I'm new to soccer datasets. If anyone has experience or insights to share on soccer data analysis and regression modeling, I'd greatly appreciate your guidance.

Specifically, I'm interested in predicting full time outcomes from half-time data, and predictive modeling based on the historical data. Your tips, resources, or collaboration would be invaluable.

Please reply or reach out if you can help. Thank you!

lapis tiger Sep 15, 2023, 12:16 AM

#

shell basalt Hello Kaggle Community, I'm currently working on a project analyzing two decade...

How about analyzing / learning from notebooks created using this dataset?

https://www.kaggle.com/datasets/shilongzhuang/soccer-world-cup-challenge

⚽🏆 Soccer World Cup 2022 Prediction

Predict the Winning Team!!!

sand sage Sep 16, 2023, 9:17 PM

#

shell basalt Hello Kaggle Community, I'm currently working on a project analyzing two decade...

https://soccermatics.readthedocs.io/en/latest/ might give you some ideas

lusty shore Sep 19, 2023, 10:27 AM

#

Anyone has image dataset for tooth disease?

cobalt sonnet Sep 20, 2023, 6:22 AM

#

shell basalt Hello Kaggle Community, I'm currently working on a project analyzing two decade...

Do study the data well, in order to get better predictive results, having a grasp of its features and setting the right features together is key -- think about feature engineering

iron stone Sep 29, 2023, 11:10 AM

#

Hi.
I'm looking for some image dataset about small plastics on the beach.
The idea is identify plastic straw, candy wrap, popsicle packaging, plastic bag, bottle cap, plastic label, etc.
Can someone point me a project about or similar to this?
Thank you.

zinc dew Oct 1, 2023, 3:50 PM

#

How does dataset copyright work? Are datasets protected by copyright Do I have to give credits in my project if I am using someone’s dataset or are they all open?

shy gust Oct 2, 2023, 10:21 AM

#

zinc dew How does dataset copyright work? Are datasets protected by copyright Do I have ...

You may read the License section mentioned by the dataset creator. Most of them are open to use

zinc dew Oct 2, 2023, 12:26 PM

#

shy gust You may read the License section mentioned by the dataset creator. Most of them ...

Thanks

tough sleet Oct 8, 2023, 8:09 AM

#

i want data set for virtually try on clothes can anyone help me out this

idle crescent Oct 8, 2023, 2:49 PM

#

Hi everyone, i am new to kaggle and the world of data science, i have crewted my first dataset based on pictures, please check it out and upvote and give me advises cause i know I'll be needing them, i have just started out.

https://www.Kaggle.com/minhajalii/datasets

Minhaj Ali | Datasets Contributor

Computer Science Enthusiast.
def about me():
who_am_i = "if..elif..else, must not become obsolete!!"
what_i_like = "Love to discover and study new technologies and latest happenings in the field of CS"
skillset =["HTML","CSS","JavaScript","Wordpress","MS SQL Server","python",TensorFlow","Angular","TypeScript","Ionic","PHP"...

old cobalt Nov 1, 2023, 10:36 AM

#

Hey guys
Do try out my first dataset scrapped from a book store website using python and scrapy
Hope you will like it
Link : https://www.kaggle.com/datasets/bishop36/bookstore

bookstore

An online bookstore data scrapped using python

wanton wadi Nov 4, 2023, 11:29 PM

#

Hi everyone, I just created this dataset about Dota2 heroes using Python and BeautifulSoup. This is my first Dataset and planning to add more columns to it. It would be amazing if you could give me feedback, it would be highly appreciated.
https://www.kaggle.com/datasets/naqeebali/dota2-heroes-pick-rate-and-win-rate-by-mmr

Dota2 Heroes (Pick Rate and Win Rate by MMR)

This Dataset has Dota2 Heroes with their Pick rate (%) and Win rate(%) by MMR.

wanton wadi Nov 10, 2023, 3:15 AM

#

I have created a F1 2023 Season Dataset for all the f1 kagglers. I am planning to add more data from different sources to it. Would appreciate some help to find sources to use. https://www.kaggle.com/datasets/naqeebali/f1-2023-season-till-mexico

F1 2023 Season (till Mexico)

This dataset has current Driver's, Constructor Points and Fastest Lap Data

quasi plume Nov 10, 2023, 6:07 AM

#

how does this line of code work

wanton wadi Nov 10, 2023, 4:07 PM

#

quasi plume how does this line of code work

The data column will be in datetime type. You can use the date to index the results like df["2019-01"] without provided it the column name and the full date.

uneven marsh Nov 11, 2023, 8:21 AM

#

Does anyone have Braille english characters training dataset?

opaque birch Nov 13, 2023, 4:39 PM

#

🌍 Introducing "AFRICA: Soil Analysis for iSDAsoil Mapping" Dataset!

📊 Dive into the rich soil data of Africa with my meticulously curated dataset, crafted for iSDAsoil mapping exploration. This comprehensive collection offers a cleaned and structured repository of invaluable soil analysis.

🔗 Dataset URL: https://www.kaggle.com/datasets/agungpambudi/africa-soil-mapping-isdasoil-exploration

🌱 Why Explore This Dataset?

Clean: Ensuring accuracy and reliability
Rich Insights: Uncover a treasure trove of insights crucial for soil mapping and exploration within the African continent.
iSDAsoil Ready: Tailored for iSDAsoil analysis, this dataset simplifies the process for researchers, enthusiasts, and data-driven explorers.

🌟 Key Features:

Diverse soil attributes
Spatial and temporal data for comprehensive analysis
User-friendly, ready for immediate utilization in iSDAsoil tools
Join the journey to unravel the secrets hidden in Africa's soil! Whether you're a researcher, analyst, or enthusiast, this dataset is your gateway to valuable discoveries.

🌐 Don't miss out — Your feedback and contributions are highly welcomed!

AFRICA: Soil Analysis for iSDAsoil Mapping

In-depth soil analysis powering iSDAsoil maps across Africa.

tired bolt Nov 14, 2023, 10:52 PM

#

Hello everyone!

I've created a new dataset that contains school performance of high school students, as well as their demographic, social, parent, and study data.

If you're interested in education and predicting student outcomes I think you'll really enjoy this dataset! I look forward to seeing what you make with it!

https://www.kaggle.com/datasets/dillonmyrick/high-school-student-performance-and-demographics

High School Student Performance & Demographics

High school student perfomance as well as demographic, social, and parent data.

trim halo Nov 16, 2023, 9:18 AM

#

Hey everyone 🌞🤗, I recently wrapped up my final project for KaggleX Cohort. As part of my final project I created two datasets, which I would like to share with the community. The inspiration behind my project was to explore the representation of BIPOC in data science, and different aspects like gender-ratio, unemployment etc.
1. Tech Diversity Dataset: https://www.kaggle.com/datasets/snehilsanyal/tech-diversity-dataset
This is a collection of real diversity datasets collected from big tech companies' diversity reports from 2014-2023 (soon to be updated with other companies).
**2. US Data Scientist Demographics Data:**https://www.kaggle.com/datasets/snehilsanyal/us-data-scientist-demographics-data/
This dataset explores data scientist demographics data in US (race and ethnicity, gender-ratio, unemployment rate) from 2010-2021.

Please feel free to reach out in case of suggestions and feedback. I also plan to extend this dataset and explore features like dropouts in career, layoffs, career transitions and salary.

Tech Diversity Dataset

A collection of real diversity datasets collected from big tech reports (EDI)

US Data Scientist Demographics Data

Dataset on data scientist demographics, gender ratio, pay from United States.

harsh palm Nov 18, 2023, 12:23 PM

#

https://www.kaggle.com/datasets/ibrahimonmars/replit-bounties-dataset

Replit Bounties Dataset

Exploring the Coding Frontier: A Comprehensive Dataset from Replit Bounties

#

Hey, hi guys! The above dataset is extracted from Replit bounties section in order to help people know more about the freelancing market and pricing analysis based on the descriptions and titles of the bounties. The dataset can help normal folks like us to understand freelancing in a much more robust sense. Thank you

old cobalt Nov 19, 2023, 12:11 PM

#

https://www.kaggle.com/datasets/bishop36/jobs-data

Presenting to you my latest dataset on jobs which was scrapped from a website names timesjobs.com using Beautiful Soup

Hope you will like it

Jobs Data

A data scrapped using beautiful soup

limber cipher Nov 21, 2023, 7:09 AM

#

harsh palm Hey, hi guys! The above dataset is extracted from Replit bounties section in ord...

Thank you I will make sure to analyze the data

#

How does one create data

#

Surely data scraping is not the only way

old cobalt Nov 21, 2023, 7:13 AM

#

limber cipher Surely data scraping is not the only way

1 way U can create data using python by creating ur own conditions or analysing any website and then generating data using random function

muted dome Nov 21, 2023, 9:18 PM

#

Looking for facebook comment database for time series and sentiment aanalysis.

sinful loom Nov 22, 2023, 11:28 AM

#

Hello

indigo raven Nov 23, 2023, 12:33 PM

#

Is it normal for the icon of the .json file saved in kaggle to be marked as {i}? If not, how to solve it?

blazing nest Nov 23, 2023, 2:17 PM

#

Hi.
Can someone provide(or at least give a link to) a dataset like UCF crime dataset but with bigger pictures so that I maybe able to do the annotation for the images easily?
Thank you.

old cobalt Nov 26, 2023, 11:45 AM

#

https://www.kaggle.com/datasets/bishop36/hockeydata

Presenting to u my dataset scrapped from hockey website. Do give it a try.

HockeyData

Data scrapped from hockey website using Beautiful Soup

final lark Nov 29, 2023, 8:41 AM

#

https://www.kaggle.com/datasets/muhammadumermujahid/current-instability-in-pakistan

Lets find out the reasons of Instability in Pakistan

Survey about current Instability in Pakistan

Let's find out the reason for economi, political and financial crisis.

opaque birch Dec 1, 2023, 9:55 PM

#

📊 Discover Insights with Kaggle Datasets!

Hi folks, I have uploaded these datasets. If you have time, then check this out and upvote:

☕ Coffee Shop Sales Trends . https://www.kaggle.com/datasets/agungpambudi/trends-product-coffee-shop-sales-revenue-dataset

Explore revenue patterns and product trends to boost your coffee shop business.

🍽️ Global Restaurant Orders Analysis . https://www.kaggle.com/datasets/agungpambudi/analyzing-restaurant-orders-international-dataset

Optimize your menu and operations by delving into international restaurant order data.

✈️ Airline Loyalty Impact . https://www.kaggle.com/datasets/agungpambudi/airline-loyalty-campaign-program-impact-on-flights

Decode the impact of loyalty campaigns on flights and enhance customer experience.

🚗 NZ Vehicle Theft Patterns . https://www.kaggle.com/datasets/agungpambudi/nz-crime-chronicles-motor-vehicle-theft-patterns

Enhance community safety by analyzing motor vehicle theft patterns in New Zealand.

🔍 Unlock Actionable Insights Today!

Maven Roasters: Coffee Shop Sales & Revenue Data

Unveiling Trends: Time Analysis, Transaction & Revenue in Coffee Shop Sales Data

Analyzing International Restaurant Orders Dataset

Restaurant Orders Analysis: Demand Patterns & Popular Items. Refining Strategies

Airline Loyalty Campaign Program Impact on Flights

Analyzing the Impact of Airline Loyalty Campaign on Membership & Flight Bookings

NZ Crime Chronicles: Unveiling Motor Vehicle Theft

Exploring Patterns, Trends & Dynamics of Motor Vehicle Theft for Crime Analysis

half basalt Dec 4, 2023, 5:35 AM

#

Are you a space Enthusiast, coz I have a dataset for ya'll.
Here, check this out
https://www.kaggle.com/datasets/sujaykapadnis/every-known-satellite-orbiting-earth/

Every known satellite orbiting Earth

In-depth details on the 6,718 satellites currently orbiting Earth

quasi plume Dec 16, 2023, 7:08 PM

#

hi
I have a raw vocal of a song and I want to divide the song to map it with the lyrics based on the timestamps to create a dataset for my model. THe lyrics are as [00:28.27] फेरि त्यो दिन सम्झन चाहन्न
[00:33.20] त्यही कथा म दोहोर्याउन चाहन्न
[00:38.07] फेरि त्यो दिन सम्झन चाहन्न
[00:42.80] त्यही कथा म दोहोर्याउन चाहन्न
[00:47.64] माया यो आगो हो, पोल्छ, थाहा छ
[00:52.51] आफैलाई जलाउन चाहन्न
How do i do this?

unreal breach Dec 19, 2023, 11:22 PM

#

are there any data sets with the pharmaceutical drug names and active ingredients?

limber dew Dec 20, 2023, 2:16 PM

#

Hi there. I posted a question here: https://www.kaggle.com/competitions/titanic/discussion/462500 . Can any1 help me? It's about the titanic challenge (im using pytorch)

Titanic - Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

sand sage Dec 21, 2023, 1:28 AM

#

limber dew Hi there. I posted a question here: https://www.kaggle.com/competitions/titanic/...

I would suggest you stay with models like xgboost/random forest for these tabular problems. Also the model is not going to learn anything with only 1 feature… you are better off doing a full exploratory data analysis before doing any modelling

limber dew Dec 21, 2023, 9:54 AM

#

sand sage I would suggest you stay with models like xgboost/random forest for these tabula...

I see. The only reason I chose to use torch is because it's required for a job I'm applying to. Do you think there's a middle ground? Or do I have to either choose a different library or a different competition? Also, I tried to use the same model with more preprocessed data, such as age (I filled the NaN values in a smart way) and still got the same results, does it indicate something?

#

Oh and another question: When I handle a binary problem, and I want to set a threshold to round the outputs of my model into 0's and 1's, is the mode of the results a good threshold?

sand sage Dec 21, 2023, 10:33 AM

#

limber dew I see. The only reason I chose to use torch is because it's required for a job I...

If the company you’re applying to needs PyTorch it is likely that they need it for unstructured data eg computer vision/nlp etc. you should find a competition that’s aligns with what they’re working on . (And if they are using neural nets for tabular problems I would be really skeptical of how good their DS team is)

As for the features - I only saw you using max 2 features in your code (but I might have missed something) the dataset has more fields than that so you should look into using all of them (but which ones to use can be guided by your EDA)

limber dew Dec 21, 2023, 10:38 AM

#

Well they actually using pytorch for untabular data, so your'e right about that :). Although I just looked up online and apperantly some people did reach results with NN and even torch. But I might take a different challenge. They probably recommended titanic to learn ML in general, not necessarily torch..

#

Also dw, I guided you to watch an attempt to use only 1 feature in the post. I just added now that I used also more features but it still didn't work.

#

I think I'll either learn random forests, or just switch to a benchmarked cv problem (I've heard about some). Thanks for the reply!

errant imp Dec 22, 2023, 4:43 AM

#

Hello fellow Kagglers,
This is a well-curated and high-quality dataset about Hate Speech on social medias. This could be used for your next NLP project - Hate Speech Detection 😉
I need your support and sincere feedback.
https://www.kaggle.com/datasets/waalbannyantudre/hate-speech-detection-curated-dataset

Hate Speech Dataset🤬

A curated dataset for hate speech detection on social media text

limber dew Dec 22, 2023, 5:58 AM

#

sand sage I would suggest you stay with models like xgboost/random forest for these tabula...

omg rf just worked with 80%! And it's so much easier to use than torch! Thank you!

jaunty pollen Feb 18, 2024, 8:27 PM

#

Any housing pricing data?

noble fractal Feb 18, 2024, 10:19 PM

#

Hello everyone,

Check out this new dataset I've discovered and published on Kaggle:

https://www.kaggle.com/datasets/cauelias/dam-data-to-risk-analysis

This dataset contains a vast amount of information about Brazilian mineral barriers. With 190 columns of rich data, it can be utilized in multiple applications. You can attempt to predict the risk associated with certain barriers, classify them based on the type of minerals, or even utilize regression techniques to analyze the volume.

Take a look and explore the possibilities!

Dam Data to Risk Analysis

Data of Brazilians Dams

prisma path Feb 21, 2024, 6:01 PM

#

I need a dataset someone send me dataset without categorical values

cosmic palm Feb 21, 2024, 8:45 PM

#

prisma path I need a dataset someone send me dataset without categorical values

There is a machine learning database with all kinds of datasets at https://archive.ics.uci.edu/ The way you are asking for it is not how it should be done.

UCI Machine Learning Repository

Discover datasets around the world!

prisma path Feb 21, 2024, 8:46 PM

#

cosmic palm There is a machine learning database with all kinds of datasets at https://archi...

I'm new to this kindly let me know what i should you know

cosmic palm Feb 21, 2024, 9:03 PM

#

prisma path I'm new to this kindly let me know what i should you know

No mystery about it - just a basic courtesy like in any other human interaction. Would you walk into a party where you don't know anyone and say to a room full of strangers "I need a beer, someone give me a beer?"

prisma path Mar 2, 2024, 7:59 AM

#

cosmic palm No mystery about it - just a basic courtesy like in any other human interaction....

Surely no

old cobalt Mar 5, 2024, 1:38 PM

#

Hi guys try this data scrapped from microsoft weather site

📎 Weather_Data.csv 📎 Weather_Data.json

#

If found useful do comment

zinc dew Mar 6, 2024, 1:44 PM

#

Anyone ever taken a look at what the MMLU datasets look like?

restive thicket Mar 28, 2024, 2:49 PM

#

Noticed stanford dogs has a lot of b&w and color pop images: https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset/discussion/486660
Filtering them out appears to improve the accuracy of the dataset.
I copied another highly rated notebook https://www.kaggle.com/code/devang/transfer-learning-with-keras-and-efficientnets added the b&w and color pop filtering and ran it https://www.kaggle.com/code/cosmicbee/transfer-learning-with-keras-and-efficientnets (perhaps some bug? although the data seems distinct so seems unclear, it does converge very fast to a close point)
This other model I tried had a less noticeable improvement: https://www.kaggle.com/code/cosmicbee/dog-breeds-classifier although it did converge faster to that point.

restive thicket Mar 28, 2024, 3:00 PM

#

restive thicket Noticed stanford dogs has a lot of b&w and color pop images: https://www.kaggle....

Worth giving it a try on your own notebooks if anyone else has a stanford dogs one and seeing if it improves a decent amount. I tried searching the model for sepia but only found sepia shaded (naturally) dogs on white backgrounds so I think this may be the lot of them. Tried to randomly sample some folders as well to see if I could find any other oddities.

restive thicket Mar 28, 2024, 3:49 PM

#

spiral shoal Apr 9, 2024, 9:02 AM

#

old cobalt Hi guys try this data scrapped from microsoft weather site

do u know how to scrape data from x without the api

#

i need a dataset of tweets posted by different people.i am a student and i can't afford the api😭😭😭

sharp plume Apr 10, 2024, 9:24 AM

#

hi i am looking for large medical data with cost

spiral shoal Apr 10, 2024, 6:03 PM

#

hello, does anyone konw how to be a master in dataset making

sharp plume Apr 14, 2024, 5:46 AM

#

i think the idea is simple you need data that is real highly quality and depending on your goal that might be information from official resources or book related to the subject matter will organise and and if conversational the conversation need to be focused on one topic with clear between each conversation

#

but the more specific with your goal you are and the more precise information an ideal set For professional use cover all the possible scenario for the profession and all knowledge of master profession

zinc dew Apr 14, 2024, 7:53 AM

#

Hi guys!
uploaded this new dataset which wasnt previously on kaggle!
https://www.kaggle.com/datasets/harshwardhanfartale/eyes-defy-anemia

please show some support by upvoting

EYES-DEFY-ANEMIA

Dataset that contain eye conjunctiva photos of Indian and Italian patients

neat crypt Apr 15, 2024, 9:54 PM

#

Does anyone know where I could find data sets on air quality by year and state?? if you do please replay to this I would love to know :D

old cobalt Apr 18, 2024, 2:45 PM

#

spiral shoal do u know how to scrape data from x without the api

Use scrapy library

#

https://rapidapi.com/nitishpkv/api/live-weather-data

RapidAPI

Live Weather Data API Documentation (nitishpkv) | RapidAPI

Returns the weather data scrapped from websites

#

Okay sorry maybe a typo

#

Just wanted to say try out my first api

#

Let me know how it is

spiral shoal Apr 19, 2024, 3:25 PM

#

thanks

wild echo Apr 19, 2024, 5:56 PM

#

I'm working on something that would do RAG and entity resolution on a companies internal documents, the issue is that companies don't normally like to make that available. Any datasets that simulate that? Especially lots of documents with partial context? So far I'm looking at the Enron email dataset but are there any others? Maybe documents from collaboration on open source projects?

stark wasp Apr 21, 2024, 5:35 PM

#

Hi everyone! Sharing this dataset I've just created about Barcelona's public bike sharing stations (Bicing). It contains time series data over 6 years of electrical and mechanical bikes availability among other metadata and information. It has 250 million rows (16GB in total):
https://www.kaggle.com/datasets/edomingo/bicing-stations-dataset-bcn-bike-sharing/

🚴🗃️ BCN Bike Sharing Dataset - Bicing Stations

Information from the Bicining bike sharing stations of Barcelona over 6 years

autumn turtle Apr 22, 2024, 7:35 AM

#

stark wasp Hi everyone! Sharing this dataset I've just created about Barcelona's public bik...

16 GB sheesh. My laptop would get cooked training a model on that size dataset 😬

stark wasp Apr 22, 2024, 7:45 AM

#

yeah! it's even challenging to load all csvs together in a Kaggle Notebook ram with pandas without doing some preprocessing or trade-offs

silk summit Apr 25, 2024, 1:40 PM

#

For those interested in Medical Image Segmentation, I'm sharing two preprocessed benchmark datasets for cardiac segmentation.
Additionally, weakly-supervised learning, particularly scribble-supervised learning, has been gaining popularity in recent years. This is due to the high cost and difficulty of traditional labeling, especially in the medical field where data sensitivity is paramount.
Therefore, each image in my datasets also comes with corresponding scribble labels, facilitating superior learning in cardiac segmentation.
Moreover, I've included notebooks to guide you on how to load and visualize the data. To learn more about these datasets and access the code, feel free to visit the links below.
https://www.kaggle.com/datasets/anhoangvo/acdc-dataset
https://www.kaggle.com/datasets/anhoangvo/mscmrseg

ACDC Dataset

Data for Automated Cardiac Diagnosis Challenge (ACDC) [Segmentation Task]

MSCMRSeg

Data for MS-CMRSeg 2019: Multi-sequence Cardiac MR Segmentation Challenge

old cobalt Apr 30, 2024, 12:56 PM

#

Hey any of guys know where we can sell data?

#

Even this also projects where can we sell and all?

opaque birch May 5, 2024, 5:44 AM

#

https://www.kaggle.com/datasets/agungpambudi/crm-sales-predictive-analytics

Predictive Analytics for CRM Sales Performance

Uncover Patterns and Trends for Enhanced Sales Strategies

solar kraken May 5, 2024, 2:49 PM

#

solar kraken May 5, 2024, 5:24 PM

#

Hey all I started my YouTube Journey and have created some content. I could really use some community support. Please check out what I have and leave some feedback.

https://www.youtube.com/@timastras1120

YouTube

Tim Astras

solar kraken May 7, 2024, 10:55 PM

#

https://new.express.adobe.com/id/urn:aaid:sc:US:788c52ad-c068-5231-a885-aaa9edef6eaf?invite=true&promoid=Z2G1FQKR&mv=other

Adobe Express

Make Reels and TikTok videos, flyers, resumes, banners, logos, and more with the all-in-one app for fast and easy content creation.

lone anchorBOT May 12, 2024, 4:00 AM

#

freeman3672 has been warned

Reason: Bad word usage

old cobalt May 12, 2024, 4:01 AM

#

why am I getting this warning?

#

📎 Crypto.json

#

📎 Crypto.csv

lone anchorBOT May 12, 2024, 4:01 AM

#

freeman3672 has been warned

Reason: Bad word usage

#

freeman3672 has been banned

Reason: Too many infractions

silk summit May 16, 2024, 2:25 PM

#

A preprocessed dataset for CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation. The dataset also contain scribble label for weakly-supervised learning. In additional, i also give a notebook to show how to loading and visualization the dataset. Please upvote my dataset, notebook and leave a comment for me if you liked it.

Dataset:
https://www.kaggle.com/datasets/anhoangvo/chaos-t1-and-t2
Notebook for loading and visualization dataset:
https://www.kaggle.com/code/anhoangvo/chaos-dataset-loading-and-visualization

CHAOS_T1&T2

Data for CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation

CHAOS Dataset: Loading and Visualization

Explore and run machine learning code with Kaggle Notebooks | Using data from CHAOS_T1&T2

still veldt May 19, 2024, 7:32 PM

#

Hi all, posted a new dataset, would appreciate if you could go check out and provide suggestions or feedback
https://www.kaggle.com/datasets/jainaru/financial-aid-to-ukraine

Ukraine Financial Aid 2024 (Russia-Ukraine War)

Financial, Humanitarian, and Military commitments made to Ukraine by countries

glad kayak May 27, 2024, 7:32 AM

#

Hello , I’ve just posted a new dataset and would love to receive your feedback
Company Document Dataset :
some Docs Generated from data northwind,

usage : NLP , Text Classification ,Text Generation , Information Retrieval
https://www.kaggle.com/datasets/ayoubcherguelaine/company-documents-dataset

Company Documents Dataset

Company Documents Dataset for Classification and Information Retrieval

modern glen Jun 8, 2024, 11:06 AM

#

Hi Guys
I see these days that a Kaggle Grandmasters title has lots of importance on our resume. So I have formed a group where we can help each other in achieving the grandmaster title.
Please join me here:
https://chat.whatsapp.com/JoYLv3VvZjL8So8WWOoWlM

WhatsApp.com

Kaggle Grandmasters

WhatsApp Group Invite

digital hedge Jun 10, 2024, 12:08 PM

#

🚀 Attention Kagglers! Two new medical datasets are now available for your machine learning projects. Dive into the 🩺📊 Cancer Prediction Dataset to develop models for predicting cancer, and explore the 📊 Predict Liver Disease: 1700 Records Dataset to tackle liver disease prediction. Start experimenting and drive impactful healthcare solutions!

🩺📊 Cancer Prediction Dataset 🌟🔬

Predicting Cancer Risk from Medical and Lifestyle Data

📊 Predict Liver Disease: 1700 Records Dataset

Unveiling Patterns and Predictors in Liver Health

digital hedge Jun 11, 2024, 5:00 PM

#

Attention Kagglers! We are excited to announce the addition of two new datasets for your machine learning projects. These medical datasets are invaluable for health-related data analysis and predictive modeling:

🩸 Diabetes Health Dataset Analysis 🩸
Dive into comprehensive diabetes health data to uncover patterns and insights. This dataset is perfect for those looking to explore the factors influencing diabetes and develop predictive models.
- Explore the Diabetes Health Dataset
🩺 Chronic Kidney Disease Dataset Analysis 🩺
Analyze extensive data related to chronic kidney disease. This dataset offers a rich source of information for understanding the complexities of kidney health and creating impactful machine learning solutions.
- Discover the Chronic Kidney Disease Dataset

Don't miss out on these valuable resources to enhance your data science projects and contribute to the medical field with innovative solutions. Happy analyzing!

🩸Diabetes Health Dataset Analysis🩸

Detailed Health Information for Diabetes Research

🩺 Chronic Kidney Disease Dataset Analysis 🩺

Comprehensive Health Data for Chronic Kidney Disease Analysis

harsh cairn Jun 11, 2024, 9:17 PM

#

Hello guys I have created a Data set on Kaggle "YouTube Dataset of all Data Science Channels" consist of all the Data Science educators YouTube channel details and there content details, check it out and if you like it please upvote and comment https://www.kaggle.com/datasets/abhishek0032/youtube-dataset-all-data-scienceanalyst-channels/data

YouTube Dataset of all Data Science Channels🎓🧾

🎊A comprehensive dataset of YouTube channels focusing on data science🧑‍💻🧑‍💻

digital hedge Jun 12, 2024, 9:27 PM

#

I'm excited to announce that two new datasets are now available for you to explore and use in your projects!

🌍 Air Quality and Health Impact Dataset 🌍
Dive into the intricate relationship between air quality and public health. This dataset provides detailed information on air pollutants and their impact on health outcomes. Perfect for those interested in environmental science, public health, and data analysis.
🔗 Explore the Air Quality and Health Impact Dataset

📚 Students Performance Dataset 📚
Uncover the factors influencing students' academic performance. This dataset includes variables such as socioeconomic status, parental education levels, and more. Ideal for education analysts, data scientists, and anyone passionate about improving educational outcomes.
🔗 Explore the Students Performance Dataset

Feel free to dive in, analyze, and share your insights. Happy Kaggle-ing!

Best regards,
Rabie El Kharoua

🌍 Air Quality and Health Impact Dataset🌍

Understanding the Effects of Air Quality on Public Health

📚 Students Performance Dataset 📚

Academic Success Factors in High School Students

sterile pivot Jul 2, 2024, 2:19 AM

#

https://www.kaggle.com/datasets/rauf111/food-recipes

Food Recipes

autumn kite Jul 11, 2024, 6:21 AM

#

Hey can anyone help me how I would I make my own new dataset
I completed ml and I am very much introduced to the thing's but for making dataset from where we decide.colums and rows and specially feature in it and its data values
I am confused so anyone who makes dataset can HELP me please

#

https://github.com/Utkgitdev-07

GitHub

Utkgitdev-07 - Overview

Utkgitdev-07 has 19 repositories available. Follow their code on GitHub.

#

https://www.kaggle.com/utkarshyadav07

Utkarsh Yadav | Contributor

Kaggle profile for Utkarsh Yadav

#

You can follow me guys so we can work together

boreal vault Jul 11, 2024, 5:37 PM

#

New dataset for analysis and additional metrics:

https://www.kaggle.com/datasets/aaronnorman/global-prosperity-indices/data

Global Prosperity Indices 🌏🌎🌍💎

Beyond GDP: Discover What Truly Defines a Country's Prosperity

rustic mango Jul 15, 2024, 3:24 PM

#

https://www.kaggle.com/datasets/sahityasetu/best-snapchat-discover-stories

Best Snapchat Discover Stories

7,397 US Millennials rank best Snapchat Discover stories. Powered by Whatsgoodly

#

https://www.kaggle.com/datasets/sahityasetu/crime-data-in-los-angeles-2020-to-present

Crime Data in Los Angeles (2020 to Present)

This dataset reflects incidents of crime in the City of Los Angeles dating back

opaque birch Aug 3, 2024, 8:35 PM

#

🌟 Unlock the Power of MNIST: Comprehensive Analysis of Multiple Datasets!

Discover the ultimate resource for machine learning enthusiasts and data scientists! Dive into MNIST Multiple Dataset Comprehensive Analysis on Kaggle. This dataset provides a detailed comparison and in-depth analysis of various MNIST datasets, making it an invaluable tool for your next project.

💡 Helps us continue to provide high-quality, valuable data to the Kaggle community.

Explore the Dataset at https://www.kaggle.com/datasets/agungpambudi/mnist-multiple-dataset-comprehensive-analysis

MNIST Multiple Dataset for Comprehensive Analysis

Exploring the Effectiveness of Various MNIST Datasets

rustic mango Aug 7, 2024, 11:00 AM

#

https://www.kaggle.com/datasets/sahityasetu/u-s-chronic-disease-indicators-cdi

U.S. Chronic Disease Indicators (CDI)

CDC's Division of Population Health provides cross-cutting set of 124 indicators

worn plinth Aug 7, 2024, 11:41 AM

#

https://www.kaggle.com/datasets/syednazmussakib/eeg-eye-state-dataset

EEG Eye State Dataset

Analyzing Brainwave Patterns for Eye State Detection

opaque birch Aug 13, 2024, 2:47 PM

#

Explore the trends that could help drive coffee shop's dataset.

https://www.kaggle.com/datasets/agungpambudi/trends-product-coffee-shop-sales-revenue-dataset

Maven Roasters: Coffee Shop Sales & Revenue Data

Unveiling Trends: Time Analysis, Transaction & Revenue in Coffee Shop Sales Data

indigo cosmos Aug 14, 2024, 2:48 AM

#

rustic mango https://www.kaggle.com/datasets/sahityasetu/crime-data-in-los-angeles-2020-to-pr...

This is very interesting since I live in Los Angeles. Thanks for sharing. 🙂

rustic mango Aug 14, 2024, 1:33 PM

#

Please do also check out my datasets many of them are US datasets

#

https://www.kaggle.com/sahityasetu/datasets

Sahitya Setu | Datasets Master

Hello and Welcome to my profile. 👋

I am deeply passionate about analytics, business management, and leveraging data to solve intricate problems and achieve business objectives. My MBA journey equipped me with the tools and technologies essential for addressing business challenges through data analytics, while my BBA broadened my business domain...

worn dock Aug 15, 2024, 8:45 AM

#

Hi guys working on my first DE project and I need some advice-

Scenario-

I have a pipeline loading data from Postgres to BigQuery using Python on GCP cloud function . It loads into a staging table and merges into a production table for further analysis. I would like to accommodate:

Incremental loading
Changes in the source database, such as (UPDATE, DELETE, INSERT), should replicate in the destination warehouse.
-Incase a column name changes or added it should also replicate)

From your experience what’s the best robust & Scalable way to approach this .

Open to suggestions 🙏🏻🙏🏻🙏🏻

rustic mango Aug 18, 2024, 9:34 PM

#

https://www.kaggle.com/datasets/sahityasetu/best-snapchat-discover-stories

Best Snapchat Discover Stories

7,397 US Millennials rank best Snapchat Discover stories. Powered by Whatsgoodly

rustic mango Aug 23, 2024, 5:16 AM

#

https://www.kaggle.com/datasets/sahityasetu/motor-vehicle-collisions-crashes-usa

Motor Vehicle Collisions - Crashes: USA

Motor Vehicle Collisions crash table

full kayak Sep 11, 2024, 7:31 PM

#

Hey everyone
Looking for a dataset of some educational institution .. wanna analyze marketing trends ..
any suggestions!!!!

whole quiver Sep 28, 2024, 1:22 PM

#

https://www.kaggle.com/datasets/davinascimento/chicago-arrest-records

Chicago Arrest Records

Dataset about arrests executed by the Chicago Police Department (CPD)

pearl drum Oct 6, 2024, 3:33 PM

#

https://www.kaggle.com/code/osmanural/diabetesppredictions/edit/run/199759303

DiabetespPredictions

Explore and run machine learning code with Kaggle Notebooks | Using data from Diabetes

wicked needle Oct 8, 2024, 4:39 PM

#

https://www.kaggle.com/datasets/sandiledesmondmfazi/electric-vehicle-population-dataset

Electric Vehicle Population Dataset

EV Population Dataset by Washington State Department of Licensing (DOL)

slim rapids Oct 12, 2024, 4:36 PM

#

Hello everyone
Does anyone know from where can I find a dataset about type 2 supernova?

elder seal Oct 15, 2024, 4:48 PM

#

Hi am working on a prototype of a motion sensor with an api to extract information already labelled via wifi on real time, a data collector but smaller, so it doesnt biased the movement with the additional weight. Do you guys have any suggestions based on your experience any kind of additional features will be best?

zinc dew Nov 13, 2024, 6:02 PM

#

https://www.kaggle.com/datasets/ironwolf437/laptop-price-dataset

Laptop Price - dataset‏

Uncovering the Correlation Between Features and Pricing

zinc dew Nov 13, 2024, 6:07 PM

#

elder seal Hi am working on a prototype of a motion sensor with an api to extract informati...

What is the application that uses motion sensor?

zinc dew Nov 21, 2024, 3:03 PM

#

https://www.kaggle.com/datasets/ironwolf437/who-covid-19-cases-dataset

WHO COVID-19 cases - dataset

Uncovering the Correlation Between cases and between infection areas

west nimbus Nov 23, 2024, 7:07 PM

#

https://tenor.com/view/date-everywhere-data-digital-marketing-gif-24166770

Tenor

bitter island Nov 29, 2024, 2:51 PM

#

Hello everyone, I'm looking for a disasters tweet datasets within the last 3-5 years. Any suggestion? thanks

zinc dew Dec 1, 2024, 12:08 PM

#

https://www.kaggle.com/datasets/ironwolf437/electric-vehicle-population-in-usa

Electric Vehicle Population in USA

predict the type of electric vehicle based on its various characteristics

rugged cosmos Dec 7, 2024, 5:34 AM

#

Hello everyone, I'm looking for metadata on Alzheimer's disease ( MRI and PET).

brisk raft Dec 14, 2024, 7:16 PM

#

**New dataset Alert! **

Global Ease of Doing Business Dataset (2010-2019)

🔗 Check it out here!

This dataset, sourced from the World Bank, encapsulates key indicators related to the ease of doing business in various countries. Covering metrics such as construction permits, costs, and regulatory compliance from 2010 to 2019, it offers a comprehensive view of how countries have evolved in their business environments.

Global Ease of Doing Business Dataset (2010-2019)

Business Landscape Across Global Economies

zinc dew Dec 17, 2024, 7:18 PM

#

I want a dataset for the genetic disorder classification for my project where the user can type the symptons and AI can use the dataset to get which genetic disorder is

#

so if anyone can help me I be thankful

opal garden Dec 21, 2024, 3:36 AM

#

https://www.kaggle.com/datasets/sweetymahale/job-listing-dataset Check Out the Dataset

Job Listing Dataset

slate wigeon Jan 6, 2025, 5:21 AM

#

Hey everyone! 👋

I've just published a new dataset on Kaggle: Resume Dataset.

This dataset contains a collection of resumes in both PDF and text formats, ideal for projects involving data extraction, natural language processing, and machine learning.

If you're interested in exploring or contributing to this dataset, check it out here: Dataset Link

Looking forward to your feedback and seeing the amazing projects you build with it!

#DataScience #MachineLearning #nlp #Kaggle #ResumeDataset

Resume Dataset

A Versatile Dataset for Resume Parsing, Candidate Profiling, and Job Matching

cerulean matrix Jan 9, 2025, 4:04 AM

#

Hi, everybody.
We're building news analysis models and need to collect news data of 20 years.
Is there anybody who knows news data service well?
Please tell me.

scarlet bobcat Jan 9, 2025, 12:00 PM

#

'ello people CH_PikaWave
I am looking for a dataset or API which has every (or most) fictional characters and their pic in it like characters from

anime (required)
movies & novels (required)
games (optional)
If you can find said dataset or API, then ping please. Thanks!

vocal grove Jan 9, 2025, 4:34 PM

#

Hey everyone, This is my Conversational English-Malayalam Dataset, designed for transformer-based models! Unlike other datasets, this one is error-free and not generated using tools like Google Translate. It’s crafted with real, natural conversations to provide authentic, high-quality data for tasks like machine translation, multilingual NLP, and sentiment analysis. Perfectly tailored for models like BERT, GPT, and others, it’s a game-changer for those seeking context-rich dialogues for training. Check it out on Kaggle and see the difference it can make in your projects. Let’s build smarter models together! 🎉 https://www.kaggle.com/datasets/nihalthomas15/lang-trans-eng-malayalam

Malayalam-English NLP

Malayalam-English dataset for NLP

zinc dew Jan 11, 2025, 1:37 PM

#

Hi everyone 😀 !
I’m currently working on building a hybrid LSTM-XGBoost model to predict the CEDCOS score (an overcrowding score for emergency departments) with an hourly prediction horizon of 10 hours.
To enhance the model's accuracy, I’m looking for a reliable retrograde dataset and an open-source API that provides real-time data for flu, RSV, and COVID-19, specifically for Europe (Belgium)🧐.
Currently, my model integrates internal hospital variables along with external factors like weather, traffic, and events, which have already yielded reliable results. However, I believe incorporating infection data could significantly improve the model's performance.
I’ve explored several sources, including:
The WHO
Respicast Forecaster (respicast.ecdc.europa.eu/forecasts) – they provide some data through their GitHub.
But I’m still on the lookout for other options. Has anyone worked with such data or APIs before? Any recommendations, sources, or suggestions for reliable datasets or live feeds would be greatly appreciated!

opaque birch Jan 29, 2025, 4:28 PM

#

https://www.kaggle.com/datasets/agungpambudi/predict-manufacturing-downtime-performance-dataset

Manufacturing Efficiency in Downtime Operations

Explore efficiency, downtime factors, and operator performance in manufacturing.

small sable Feb 4, 2025, 3:12 PM

#

My first dataset on kaggle.
Nepal Premier League 2024 ball by ball data
Would love any suggestions.
https://www.kaggle.com/datasets/samarpanrai/nepal-premier-league-2024-ball-by-ball-data

Nepal Premier League 2024: Ball-by-Ball Stats

Detailed Ball-by-Ball Cricket Stats

kindred kiln Mar 7, 2025, 2:03 AM

#

I'm looking for credible datasets for my projects cardiac arrhythmia, if you have data or you know any sources lemme know thanks btw I'll use it for research purposes, and data should be latest

languid anchor Apr 6, 2025, 3:36 PM

#

How do people publish their datasets?
I'm curious to understand how people go about publishing datasets. Do they generate the data themselves, or do they collect it from somewhere else? If it's the latter, where do they usually get their data from?

harsh quartz May 3, 2025, 2:21 PM

#

Hi everyone,

I’ve just published a dataset of Turkey’s postal codes, and I wanted to share it here in case it’s useful for your geospatial, NLP, or logistics-related projects.

What’s inside:
• Covers 81 provinces, 973 districts, and 73,000+ rows
• Organized by province, district, sub-region, and neighborhood
• Available in CSV and Excel formats
• UTF-8-sig encoded, ready for use with pandas, geopandas, map visualizations, and more

🔗 Dataset link: https://www.kaggle.com/datasets/erogluegemen/turkey-postal-codes-dataset-2025

📮 Turkey Postal Codes Dataset (2025)

A comprehensive dataset of all postal codes in Turkey

robust ginkgo May 15, 2025, 9:53 PM

#

any social media data sets?

queen steeple May 16, 2025, 3:45 PM

#

Hi guys, I am writing on RAG LLM project and unable to find small dataset.
Tha dataset I am getting is having 2m or 45k rows.
If anyone has Stackoverflow questions data with less than 30k rows, pls share the link.
Thankyou

queen steeple May 16, 2025, 4:23 PM

#

fallow peak We recently worked with Stack Overflow to get all of their annual developer surv...

Hey Meag, thanks for sharing data. Something I am not clear is I have to merge both survey_result_public.csv and survey_results_schema.csv for rag LLM project

sharp root Jul 10, 2025, 10:31 PM

#

Does anyone know where the brand new "Template for Transparency in AI Model Training Data" can be found? It was supposedly published today... but when I look I can only find a different CCIA document from January.

outer stone Jul 12, 2025, 10:42 AM

#

Good day to you, I need help please, I have a Data analysis project where I have to analyze 12 dataset for a year. This is my first time taking a project by myself and I don't know where to get started.. should I group the dataset quarterly or do the analysis for each month then combine everything?.. please help 🙏🏻. The tool I will be using is Excel

keen zodiac Jul 21, 2025, 3:29 PM

#

hey guys
i just like created a small webapp which takes pdf as input and u can prompt it "extract (some data) out of this" and it extracts that data andcreates a dataset downloadable in csv excel andjson
i just created it today, i would love to have people try it out and give their opinions on how it could be better, dm me and ill send u the repository link, completely free just try it out and tell me

keen zodiac Jul 22, 2025, 9:37 AM

#

👋 I just built a free tool that turns any PDF, image, or Word doc into a clean dataset using just a prompt — kinda like ChatGPT but for messy files.

Want to give it a quick try and tell me what’s broken or missing? Takes 2 mins. Would love your feedback 🙏
👉 https://pdf2dataset.streamlit.app

pseudo flax Aug 13, 2025, 11:50 AM

#

keen zodiac 👋 I just built a free tool that turns any PDF, image, or Word doc into a clean ...

good day, I get import error on nougat and cv2

crude obsidian Aug 14, 2025, 9:33 PM

#

Hey guys, I just placed my custom made FRIDAY from Marvel Conversation dataset for LLMs on Kaggle.
Its in ChatML format so mostly all models are fine tunable on the dataset
👉 Kaggle: https://www.kaggle.com/datasets/prakhar231/friday-from-marvel-conversations-for-llms
👉 Hugging Face : https://huggingface.co/datasets/git-prakhar/FRIDAY-from-Marvel-Conversations

FRIDAY from Marvel (Conversations) for LLMs

1K+ ChatML dataset for training FRIDAY-like conversational AI assistants

git-prakhar/FRIDAY-from-Marvel-Conversations · Datasets at Hugging...

silver wyvern Sep 13, 2025, 7:12 PM

#

I am Kishor J K

I just Published a dataset on "Vodafone customer churn data"
This data was provided to us as part of hackathon and i taught it would be good idea to share it.

I have also published my note book where I did EDA, visualization and prediction using this dataset.

Dataset link: https://t.co/1CGAgATDCF
Notebook link: https://t.co/VYBEoX5lUa

slate magnet Sep 15, 2025, 6:57 PM

#

Hi, @everybody
I have one question, I'm training ml models for the prediction, which is classification problem of 3 classes, where the number of samples are similar but the predition is skewed.
First class and second class is predicted with low precision tough, third class is never predicted. What's the reason? I can' t find the reason.
Before, when I applyed reinforcement learning, where the three classes were assigned to three actions and one action is never selected, too.
Actually, that is the preeiction model of forex eur/usd.

distant nest Oct 1, 2025, 5:09 PM

#

slate magnet Hi, @everybody I have one question, I'm training ml models for the prediction, w...

I have had this same issue with forex. If it’s the hold that is never selected, try using a gate for long and another for short, and NOT one for hold. That way you can put a threshold on ur gates and ur good.

If its a buy or sell, thats not happening, I would try a different pair that’s obviously trending the other direction, and make sure your rewards n penalties are the same for buy n sells, so there is no bias.

#

Oh wow that’s an old post, lol my bad

slate magnet Oct 2, 2025, 2:14 AM

#

distant nest I have had this same issue with forex. If it’s the hold that is never selected, ...

Thanks for your response, I will try that.
Did you have a good result of forex trading?

distant nest Oct 2, 2025, 11:21 AM

#

slate magnet Thanks for your response, I will try that. Did you have a good result of forex t...

Not consistently, tbh. But I got past that snag recently. Right now I’m having to revisit feature engineering.

slate magnet Oct 2, 2025, 12:29 PM

#

distant nest Not consistently, tbh. But I got past that snag recently. Right now I’m having t...

Right, I think feature engineering is important I observed improvement according to the features.

stuck gust Oct 10, 2025, 7:06 AM

#

I needed to build a dataset for roblox game player counts what are the best sites?'

glass hamlet Oct 17, 2025, 9:25 PM

#

Hey, what is the best practice to deploy Ml models for free? Should I go with hugging face or Render ?

brisk condor Nov 3, 2025, 10:37 AM

#

📘 New Kaggle Notebook Released!
🔍 Analysis of All India Pincode Directory (2025)
Explore the complete dataset of Indian PIN codes — geolocation, postal divisions, states, and more.
🔗 https://www.kaggle.com/code/shibin007/analysis-of-all-india-pincode-directory-2025

Dive into different states, map the data, and use this for logistics, mapping, or ML projects!

thorn grotto Nov 6, 2025, 9:18 PM

#

Balanced & Highly Usable Social Media Extremism dataset: https://www.kaggle.com/datasets/adityasureshgithub/digital-extremism-detection-curated-dataset

hybrid vortex Nov 7, 2025, 9:08 PM

#

Hello All ! 👋
I just published the Cassandra Employee Dataset — a massive 50,000-row dataset perfect for Regression, Classification, Clustering, and EDA.
Super clean, ML-ready, and has a 10/10 usability score. Great for building real-world ML projects. 🚀
Do hit an upvote on the dataset 😁

https://www.kaggle.com/datasets/rockyt07/cassandra-employee-dataset

tawdry hedge Nov 8, 2025, 2:01 PM

#

Hello, where can i find cebuano text corpus/audio datasets available?

versed temple Nov 9, 2025, 12:42 PM

#

https://media.discordapp.net/attachments/1436719817624256534/1436719913518633010/1.JPG?ex=6910a130&is=690f4fb0&hm=6a48397700e40b701b7defba0bc73ccc590e83e58af09eb7035cae318e9fb319&=&format=webp&width=515&height=687
https://media.discordapp.net/attachments/1436719817624256534/1436719914034659408/2.jpg?ex=6910a130&is=690f4fb0&hm=5d3c01e3db0b2fe7135969c69c22cbf49db07bae5ed8cb9a98ac3e18d3c73ce5&=&format=webp&width=515&height=687
https://media.discordapp.net/attachments/1436719817624256534/1436719914512547951/3.jpg?ex=6910a130&is=690f4fb0&hm=59a326eaa4d74733a406431b5c2eb8ee07f6b78d95094102deb1153d2e261407&=&format=webp&width=515&height=687

snow lion Nov 9, 2025, 1:18 PM

#

Hi everyone! I just published a dataset with real-world data about European airlines routes: https://www.kaggle.com/datasets/lunthu/european-airlines-routes. Feel free to use in your projects.

hybrid vortex Nov 23, 2025, 6:20 AM

#

Hi All, I published this dataset with Formula 1 race data from 1950 to 2025: https://www.kaggle.com/datasets/rockyt07/formula-1-championships-1950-2025. Feel free to tinker with it.

sour arch Nov 28, 2025, 6:05 AM

#

Hello everyone! 👋

I’m excited to share my capstone project:

🛡️ SENTINELS – Multimodal Disaster Intelligence System
An AI-powered system for real-time disaster detection, severity analysis, risk prediction & interactive mapping.

🔗 Kaggle Notebook: https://www.kaggle.com/code/mukthanjalibonala/sentinels-multimodal-disaster-intelligence-agent

Connect with me on LinkedIn 👉 https://www.linkedin.com/in/mukthanjalibonala/

Would love feedback, suggestions, and support 🙏

Thank you! 💙

ebon girder Dec 6, 2025, 8:41 AM

#

Hey everyone! 👋
I’m conducting a short academic survey for my Research Methodology internal assessment on “The Impact of ChatGPT in Education.”
It takes less than 3 minutes to complete and all responses will remain anonymous.
Your input will really help me with my project — please fill it out below 👇

🔗Survey Link

Thanks a lot for your time and support! 🙏

velvet karma Jan 4, 2026, 9:14 PM

#

Hi everyone, I put up a dataset based on the top 10 stock picks of wall street bets!
Feel free to take a look!
Dataset - https://www.kaggle.com/datasets/rraghav5600/2025-wallstreetbets-top-10-stocks-dataset

blissful hemlock Jan 16, 2026, 7:40 AM

#

Can anyone tell me where can I find tumor segmentation mask dataset for 2D image segmentation using UNet

burnt cobalt Jan 27, 2026, 9:14 AM

#

📢 New Kaggle Notebook Out!

Used DBSCAN to kick out outliers and let K-Means cluster in peace 😌
Also included a non-linear example where circles behave and centroids don’t.

Feedback welcome — noise will be ignored 😉
https://www.kaggle.com/code/sharmagayatri/dbscan-the-bouncer-of-clustering

scarlet sparrow Jan 27, 2026, 7:37 PM

#

Hi everyone, I uploaded my second and most biggest dataset! As usual, advices are welcome! 😄
Topic is Apple Financial Dataset from 1980 to 2026

contributor Dataset - https://www.kaggle.com/datasets/adamvakar/apple-comprehensive-financial-dataset-1980-2026

subtle grotto Jan 28, 2026, 2:19 AM

#

Hi @everyone
📘 Python Loops & Strings – Kaggle Notebook 🐍
This notebook explains Python loops (for, while) and strings in a detailed and easy-to-understand way, with clear examples.
It’s especially helpful for beginners 🚀

Please check it out and leave a vote ⭐ and a comment 💬 — your feedback is highly appreciated! 🙌
https://www.kaggle.com/code/dastgeerjutt/3-loops-and-strings-detailed

scarlet sparrow Feb 1, 2026, 11:40 PM

#

👋Hi everyone, I uploaded my third and most interesting dataset! Advices are welcome!
Topic is How Financial Crises Are Born: Warning Indicators

Please check and upvote if you like it! plus_one
https://www.kaggle.com/datasets/adamvakar/how-financial-crises-are-born-warning-indicators

lost crest Feb 4, 2026, 6:05 AM

#

🚗⚡ New Dataset on Kaggle: Electric Vehicle Population (Geospatial Insights)

I’ve just published an Electric Vehicle Population dataset on Kaggle, designed for EDA, machine learning, and geospatial analysis.

📊 What you can explore with this dataset:
• EV adoption patterns across regions
• Urban vs. rural penetration gaps
• Trends over time by vehicle type and location
• Opportunities for clustering, forecasting, and policy analysis

🔗 Explore & upvote the dataset:
https://www.kaggle.com/datasets/hammadansari7/electric-vehicle-population

💬 Your take?
Is EV adoption still driven by urban infrastructure and incentives, or are we approaching broader mainstream adoption?
Is the rural lag a data reality—or just a temporary phase?

I’d love to see notebooks, visualizations, and insights built on top of this dataset. Let’s learn from the data.

#DataScience #KaggleDatasets #MachineLearning #GeospatialAnalysis #ElectricVehicles #Sustainability #Python #EDA

@Kaggle @Tesla @robikscube @TowardsDataScience

lost crest Feb 8, 2026, 3:04 PM

#

@everyone
I uploaded a new dataset on Kaggle, "Border Crossing Entry Data—U.S."
Please request to vote my dataset and perform EDA analysis on this dataset Link: https://www.kaggle.com/datasets/hammadansari7/border-crossing-entry-data-u-s

lost crest Feb 11, 2026, 8:04 AM

#

https://www.kaggle.com/datasets/hammadansari7/global-billionaire-dataset
please vote my dataset and do work on it
@everyone

little fog Feb 11, 2026, 8:31 AM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/developer-stress-simulation-dataset
This dataset simulates the stress levels of software developers under various real-world conditions. It includes a mix of workload 💼, personal habits 🛌☕, project deadlines ⏳, code complexity 💻, and interruptions 📞 that influence stress. The data is intentionally non-linear and realistic 🔄, reflecting how stress does not grow uniformly but depends on interactions between multiple factors.

little fog Feb 11, 2026, 2:16 PM

#

New Dataset Just published!

View: https://www.kaggle.com/datasets/mabubakrsiddiq/clear-bg-ocr-dataset-eng-and-zh-22k-images

🔹 Overview

This dataset contains synthetic OCR images of English and Chinese sentences. Each language is organized in a separate folder with corresponding metadata. The images have clear backgrounds, random fonts and font sizes, and optional blur for variability.

The dataset is designed for OCR research, machine learning, and computer vision tasks. Perfect for training models to recognize text in multiple languages and fonts.

🎨 Features

✅ Two-lingual dataset: English & Chinese
✅ Random fonts: Multiple font options for diversity
✅ Random font sizes: Increases model generalization
✅ Optional Gaussian blur: Simulates real-world imaging
✅ Clear backgrounds: Good for clean OCR training
✅ Metadata included: Easy for preprocessing and analysis

💡 Possible Use Cases

🖋️ OCR Model Training: Train models like Tesseract, PaddleOCR, or deep learning OCR pipelines
🤖 Computer Vision Research: Use metadata for font/style classification
🏫 Language Learning Tools: Visual recognition for English or Chinese sentences
🔧 Augmentation Testing: Benchmark text recognition under blur and font variations
🧠 Multi-Lingual OCR Experiments: Test cross-lingual recognition models

⚡ Notes

The Chinese text is rendered using Microsoft YaHei and NSimSun fonts for proper character display.
The English text uses a variety of fonts for diversity.

Please consider giving an upvote!

little fog Feb 11, 2026, 3:18 PM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/developer-stress-simulation-dataset

little fog Feb 11, 2026, 5:02 PM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/global-conflict-incident-dataset
Please explore this dataset, if you like it, please upvote

little fog Feb 12, 2026, 8:12 AM

#

Hi everyone! checkout this dataset

https://www.kaggle.com/datasets/mabubakrsiddiq/global-conflict-incident-dataset

This dataset contains 5,000 synthetically generated records of social conflicts, disputes, and civil disturbances occurring across major cities in Asia, the Middle East, Africa, Europe, and North America.

lost crest Feb 12, 2026, 11:21 AM

#

@everyone
Assalam o alikum!
I posted new dataset on Kaggle: "Pakistan Air Quality & Weather (10 Cities)."
https://www.kaggle.com/datasets/hammadansari7/pakistan-air-quality-and-weather-10-cities
Overview
This dataset contains 3 months of hourly air quality and weather measurements for 10 major Pakistani cities, covering November 2025 to February 2026. With 21,840 complete records, it provides comprehensive data for pollution analysis and prediction modeling.
Cities Covered
Lahore
Karachi
Islamabad
Rawalpindi
Faisalabad
Multan
Peshawar
Quetta
Rahim Yar Khan
Sialkot
Data Source
Air quality and weather data collected from Open-Meteo API, an open-source weather and environmental data provider.
Dataset Statistics
Total Records: 21,840

little fog Feb 13, 2026, 7:18 AM

#

Review and upvote it...

https://www.kaggle.com/datasets/mabubakrsiddiq/global-conflict-incident-dataset
This dataset contains 5,000 synthetically generated records of social conflicts, disputes, and civil disturbances occurring across major cities in Asia, the Middle East, Africa, Europe, and North America.

little fog Feb 14, 2026, 1:56 PM

#

See the dataset

https://www.kaggle.com/datasets/mabubakrsiddiq/developer-stress-simulation-dataset
This dataset simulates the stress levels of software developers under various real-world conditions. It includes a mix of workload 💼, personal habits 🛌☕, project deadlines ⏳, code complexity 💻, and interruptions 📞 that influence stress. The data is intentionally non-linear and realistic 🔄, reflecting how stress does not grow uniformly but depends on interactions between multiple factors.

lost crest Feb 15, 2026, 2:29 PM

#

Assalam o alikum @everyone
please vote for my dataset!
https://www.kaggle.com/datasets/hammadansari7/employee-sales-2026

slate marlin Feb 15, 2026, 8:51 PM

#

Hey everyone be sure to review this newest dataset from LEO-to-HEO
https://www.kaggle.com/datasets/gastondana/spacedos

#

I've been looking into alot of heart disease R&D as of late and came across this US Cardiovascular Mortality Trends from the CDC data and made the dataset: https://www.kaggle.com/datasets/gastondana/us-cardiovascular-mortality-trends-cdc
Fresh notebooks coming in the future!

scarlet sparrow Feb 16, 2026, 12:08 AM

#

👋Hi everyone, I uploaded one more dataset! Advices are welcome!
Topic is Ericsson Innovation Timeline: Patent Evolution

Please check and upvote if you like it!
https://www.kaggle.com/datasets/adamvakar/ericsson-innovation-timeline-patent-evolution

little fog Feb 16, 2026, 3:44 PM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/students-learning-trajectory
This dataset simulates the learning behavior and performance of students over a semester (16 weeks). Each row represents one student in one week, capturing their study habits 📝, lifestyle factors 🛌☕📱, and academic outcomes 🎯.

lost crest Feb 17, 2026, 12:05 PM

#

please vote for my notebook.
https://www.kaggle.com/code/hammadansari7/pakistan-air-quality-crisis-2025-2026

little fog Feb 17, 2026, 2:24 PM

#

New Dataset published!

https://www.kaggle.com/datasets/mabubakrsiddiq/language-identification-dataset-20-languages/data/data/data/data/data
The Language Identification Dataset is a curated collection of approximately 68978 text samples, each paired with a corresponding language label. The dataset was constructed by gathering multilingual text passages from three major sources: the Multilingual Amazon Reviews Corpus, XNLI, and STSb Multi-MT. These sources provide a diverse mix of domains, writing styles, and sentence structures, making the dataset suitable for research and machine learning tasks involving language detection, multilingual NLP, and text classification.

lost crest Feb 18, 2026, 10:55 AM

#

Assalam o alikum @everyone
Please vote for my dataset and complete the task.
https://www.kaggle.com/datasets/hammadansari7/milks-effect-on-human-health

little fog Feb 18, 2026, 12:32 PM

#

New Dataset Published!

https://www.kaggle.com/datasets/mabubakrsiddiq/competition-math-problems-dataset
Please upvote...
This dataset contains over 12,000 math competition problems covering topics like Algebra and others. Each entry includes the problem statement, its difficulty level (Level 1–5), problem type, and a detailed step-by-step solution. It is ideal for training or evaluating AI models in problem-solving, explanation generation, and mathematical reasoning. The problems range from simple calculations to complex multi-step competition-level questions.

lost crest Feb 20, 2026, 11:02 AM

#

Assalam o alikum
please upvote my dataset.
https://www.kaggle.com/datasets/hammadansari7/ramazan-sehri-and-iftar-timings-2026

prisma narwhal Feb 20, 2026, 4:34 PM

#

Just as a reminder: server rules prohibit asking for upvotes. We will be enforcing that more assertively going forward.

little fog Feb 22, 2026, 8:04 AM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/students-learning-trajectory
This dataset simulates the learning behavior and performance of students over a semester (16 weeks). Each row represents one student in one week, capturing their study habits 📝, lifestyle factors 🛌☕📱, and academic outcomes 🎯.

lost crest Feb 23, 2026, 10:19 AM

#

@everyone

Please take a look at my dataset and vote for it.

https://www.kaggle.com/datasets/hammadansari7/gen-z-mental-wellness-and-digital-lifestyle-patterns

prisma narwhal Feb 23, 2026, 5:46 PM

#

lost crest @everyone Please take a look at my dataset and vote for it. https://www.kaggl...

Requesting upvotes violates server rules, and may lead to you being banned from Discord. (Also, it's rude to try and notify everyone on a server!)

lost crest Feb 24, 2026, 2:24 PM

#

Assalam o Alikum !
I'm grateful for everything, including this wonderful sport.
Please take a look at my dataset and vote for it.
https://www.kaggle.com/datasets/hammadansari7/chicago-residential-parking-permitzones20152026

prisma narwhal Feb 24, 2026, 2:30 PM

#

It is against server rules to request upvotes for your work - and can lead moderator action.

slate marlin Feb 26, 2026, 6:25 AM

#

This dataset provides a high-fidelity integration of NASA OSD-679, unifying intraocular pressure (Tonometry), retinal morphology (OCT), & biological metadata into a single cohort. It is designed to train models on the SANS Paradox, enabling cross-modal analysis of how microgravity-induced fluid shifts impact ocular structure & function.

https://www.kaggle.com/datasets/gastondana/osd-679-sans-multi-modal-space-biology-benchmark

spark hollow Mar 1, 2026, 10:30 AM

#

This is a dataset of google historical stock prices from 1980 to 2026 taken form Yfinance library. Please check it out and tell me what you think:
https://www.kaggle.com/datasets/ibrahimshahrukh/google-alphabet-stock-prices-2016-2026

normal ore Mar 2, 2026, 8:18 AM

#

do explore dataset, good for beginners as well ✅ https://www.kaggle.com/datasets/suhanigupta04/student-placement-prediction-dataset

spark hollow Mar 3, 2026, 12:50 AM

#

Guys check out this dataset and comment your thoughts: https://www.kaggle.com/datasets/ibrahimshahrukh/google-alphabet-stock-prices-2016-2026

spark hollow Mar 3, 2026, 3:54 AM

#

Guys check out this dataset on coca cola historical stock price and comment your thoughts: https://www.kaggle.com/datasets/ibrahimshahrukh/coca-cola-ko-stock-prices-19802026

tidal trench Mar 3, 2026, 5:16 AM

#

I need Urdu or English sentiment analysis Data set

lost crest Mar 3, 2026, 2:18 PM

#

Hammad_zahid — 7:17 PM
Assalam o alikum! @everyone
Please give it an upvote.
https://www.kaggle.com/datasets/hammadansari7/gt-r-auction-and-resale-market-dataset-r32r34

little fog Mar 9, 2026, 12:41 PM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/urdu-ghazal-dataset-32-poets-and-their-ghazals

The dataset contains poetry by 30 greatest urdu poets. Here they are:

'mirza-ghalib','allama-iqbal','faiz-ahmad-faiz','sahir-ludhianvi','meer-taqi-meer', 'dagh-dehlvi','kaifi-azmi','gulzar','bahadur-shah-zafar','parveen-shakir', 'jaan-nisar-akhtar','javed-akhtar','jigar-moradabadi','jaun-eliya', 'ahmad-faraz','meer-anees','mohsin-naqvi','firaq-gorakhpuri','fahmida-riaz','wali-mohammad-wali', 'waseem-barelvi','akbar-allahabadi','altaf-hussain-hali','ameer-khusrau','naji-shakir','naseer-turabi', 'nazm-tabatabai','nida-fazli','noon-meem-rashid', 'habib-jalib'
Every ghazal is given in three writing systems:

Urdu (Arabic Script)
Hindi (Hindi writing system)
English (Latin Script)
Divided into three folders: ur, en and hi.

Potential use cases:

NLP
Meter Detection
Modeling AI to predict the poet given the ghazal or couplet
Have fun with data!

lost crest Mar 9, 2026, 12:47 PM

#

Greetings and salutations! Greetings and salutations!
I hope your life has been enjoyable.
Vote for my dataset, please.
https://www.kaggle.com/datasets/hammadansari7/google-stock-market-dataset-20252026

normal ore Mar 9, 2026, 4:50 PM

#

About This Dataset https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-year-dataset

5 years daily gold futures (GC=F) data from Yahoo Finance with complete OHLCV
Clean, ready-to-use for LSTM/GRU, ARIMA, Prophet time-series forecasting models
11 pre-computed technical indicators: MA7/30/90, RSI, MACD, Bollinger Bands, volatility
No missing values, properly scaled features for immediate ML experimentation

🔗 [Starter Notebook created] — EDA, technical plots, LSTM baseline with RMSE evaluation

prisma narwhal Mar 9, 2026, 6:23 PM

#

lost crest Hammad_zahid — 7:17 PM Assalam o alikum! @everyone Please give it an upvote. htt...

Please: 1) Do not tag in everyone 2) Ask for upvotes. Both of those are against server rules.

little fog Mar 10, 2026, 12:10 AM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/students-learning-trajectory

little fog Mar 11, 2026, 3:51 AM

#

Dataset on student learnings

https://www.kaggle.com/datasets/mabubakrsiddiq/students-learning-trajectory

normal ore Mar 11, 2026, 5:12 AM

#

normal ore About This Dataset https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-...

Do explore the dataset here https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-year-dataset. You can refer to the starter notebook as well for help : https://www.kaggle.com/code/suhanigupta04/gold-futures-lstm-forecasting

lost crest Mar 11, 2026, 11:13 AM

#

Please pot my notebook
https://www.kaggle.com/code/hammadansari7/time-series-forecasting-of-stock-market-data

buoyant stone Mar 12, 2026, 11:16 AM

#

Hello hackers,

I need some help. I’m training a conversation disentanglement model using this repo: https://github.com/jkkummerfeld/irc-disentanglement
. It will be used to prepare a conversation dataset for a project.

I don’t have access to compute resources that can run continuously for five days. I’m using Google Colab, but sessions eventually stop when the tab closes or times out. I also can’t afford a cloud provider right now.

If anyone has a home setup that can run uninterrupted for several days and is willing to help, I would really appreciate it. Thanks!

normal ore Mar 15, 2026, 9:32 AM

#

About This Dataset
🏆 2,000+ downloads and counting — synthetic placement dataset on Kaggle!
https://www.kaggle.com/datasets/suhanigupta04/student-placement-prediction-dataset

100,000 synthetic student records simulating real Indian campus recruitment patterns
Features cover the full placement pipeline — academics (CGPA, backlogs), technical skills (DSA, coding, ML), and activities (internships, projects, hackathons)
Two target variables: placement_status (classification) and salary_package_lpa (regression)
Ideal for placement prediction, salary estimation, feature importance analysis, and fairness auditing across branches and tiers

🔗 Starter Notebook available — EDA, baseline ML models, feature importance. Great starting point for your own experiments!

little fog Mar 15, 2026, 4:44 PM

#

https://www.kaggle.com/datasets/mabubakrsiddiq/chinese-pinyin-english-dataset

normal ore Mar 15, 2026, 6:13 PM

#

🧠 Just published a new dataset on Kaggle!

🔗 Mental Health & Burnout in Tech – https://www.kaggle.com/datasets/suhanigupta04/employee-mental-health-and-burnout-dataset

150,000 synthetic tech employee records across roles, company sizes & work modes
Covers work stress, sleep, lifestyle, therapy access & social support
Three correlated mental health scores: stress, anxiety & depression
Two targets: burnout_level (Low/Moderate/High) + seeks_professional_help (binary)

📓 Starter Notebook available — EDA, correlation heatmaps & Random Forest baseline

hallow warren Mar 20, 2026, 4:10 AM

#

🚀🔥 𝗔𝗣𝗣𝗟𝗘 (𝗔𝗔𝗣𝗟) 𝗖𝗢𝗠𝗣𝗟𝗘𝗧𝗘 𝗗𝗔𝗧𝗔𝗦𝗘𝗧 (1980–2026) 🔥🚀
📊 40+ YEARS OF STOCK + FINANCIAL DATA
💡 READY FOR ML • ANALYSIS • RESEARCH

👉 https://www.kaggle.com/datasets/anadiskt/apple-aapl-full-stock-financial-dataset-1980-2026

⭐ SUPPORT WITH AN UPVOTE 🙏

tulip stone Mar 20, 2026, 8:30 AM

#

hallow warren 🚀🔥 𝗔𝗣𝗣𝗟𝗘 (𝗔𝗔𝗣𝗟) 𝗖𝗢𝗠𝗣𝗟𝗘𝗧𝗘 𝗗𝗔𝗧𝗔𝗦𝗘𝗧 (1980–2026) 🔥🚀 📊 4...

Thanks!

normal ore Mar 26, 2026, 10:16 AM

#

🏏 Just published my IPL Dataset (2008–2024) on Kaggle!
https://www.kaggle.com/datasets/suhanigupta04/ipl-dataset-20082024-with-match-features
17 seasons of IPL data with innings-level features engineered
from official ball-by-ball records.

⚡ Powerplay & death over stats per innings
📊 Run rate, dot ball %, boundary counts
🏆 Match outcomes, toss impact & player of match
🤖 Ready for EDA, win prediction & team analysis

dire olive Mar 28, 2026, 4:25 AM

#

I want news data, where I will get it?

dire olive Mar 28, 2026, 4:44 AM

#

https://www.kaggle.com/datasets/adinishad/acled-dataset

normal ore Mar 28, 2026, 12:34 PM

#

normal ore 🏏 Just published my IPL Dataset (2008–2024) on Kaggle! https://www.kaggle.com/d...

Starter notebook also uploaded! https://www.kaggle.com/datasets/suhanigupta04/ipl-dataset-20082024-with-match-features

cinder oar Apr 1, 2026, 11:59 AM

#

Hey everyone,

I’m not sure if you’ve been following the discussions over the past two weeks, but I recently completed a challenge called "14 Days, 14 Datasets." The challenge is now over, but it resulted in several high-quality datasets covering highly relevant topics.

The final topic is very personal to me: my home country, Sudan. As many of you may not know, Sudan has been experiencing conflict since the '90s, though it was previously concentrated in the Darfur region rather than the capital, Khartoum. Since 2019, Sudan has faced widespread demonstrations and government crackdowns that deeply affected Khartoum. Then, in 2023, a full-scale war broke out in the capital.

This conflict began as an attempt by the Rapid Support Forces (RSF) to seize authority from the National Army. Backed by the UAE which has funded the militia to gain control over Sudan’s gold resources—this war has cost civilians everything: their homes, their cars, their life savings, and their lives.

Because of this, I decided to curate a high-quality dataset to provide information on the reality of what is happening in my country.

Dataset Link: https://www.kaggle.com/datasets/waddahali/sudan-conflict-2023-2026

The dataset is fully documented, and the description provides extensive context. I hope you take a look, and please keep Sudan in your prayers.

Thank you all!

normal ore Apr 3, 2026, 9:18 AM

#

🚀 Global E-Commerce Customer Behavior Dataset 2026 🛒
https://www.kaggle.com/datasets/suhanigupta04/e-commerce-customer-behavior-dataset-75k-orders

75K synthetic orders with customer demographics, pricing, discounts, returns, reviews, and churn signals.
Covers 18K customers and 2.5K products across 2023–2026.
Can be used for RFM segmentation, churn prediction, profitability analysis, and retail dashboards.

prisma anchor Apr 4, 2026, 10:09 AM

#

https://www.kaggle.com/code/aadigupta1601/predicting-retraction-risk-in-scientific-papers
new notebook regarding predicting retraction risk in scientific papers

would love your feedback

celest ether Apr 4, 2026, 8:43 PM

#

Anyone can provide the best dataset download link for deepfake detection videos with good qualities videos and of various diiferent varities ?? It will be great help to me.

kindred kiln Apr 5, 2026, 1:57 AM

#

celest ether Anyone can provide the best dataset download link for deepfake detection videos...

Go for hugging face datasets

normal ore Apr 6, 2026, 11:49 AM

#

🎬 New Dataset Live on Kaggle! 🚀
https://www.kaggle.com/datasets/suhanigupta04/global-movies-dataset-19502026
• 100K synthetic movies (1950–2026) with IMDb-style ratings, genres, budgets & revenue
• Director rankings, decade trends, blockbuster prediction targets included
• Perfect for EDA dashboards, rating prediction & recommendation systems
• ML-ready: top_100_prob, blockbuster_flag, franchise_flag targets

civic lagoon Apr 6, 2026, 9:36 PM

#

Visdrone-DET2019 dataset converted to YOLO format
https://www.kaggle.com/datasets/banuprasadb/visdrone-dataset

untold jackal Apr 10, 2026, 12:09 AM

#

🚀New Dataset on Kaggle! (Liver Patient data)

https://www.kaggle.com/datasets/shauryasrivastava01/liver-patient-dataset
• 583 patient records with real clinical biomarkers
• Binary classification (Liver Disease vs Healthy)
• Fully cleaned + preprocessed (no messy columns)
• Includes enzymes, bilirubin, proteins & demographic data
• Perfect for ML projects, EDA, and healthcare modeling

normal ore Apr 10, 2026, 4:55 PM

#

Explore dataset for time series: About This Dataset https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-year-dataset
5 years daily gold futures (GC=F) data from Yahoo Finance]
Clean, ready-to-use for LSTM/GRU, ARIMA, Prophet time-series forecasting models
11 pre-computed technical indicators
No missing values, properly scaled features for immediate ML experimentation

🔗 [Starter Notebook created] — EDA, technical plots, LSTM baseline with RMSE evaluation

untold jackal Apr 11, 2026, 4:10 PM

#

🚀 **New Dataset On Kaggle ** : Microsoft's all time stock data (latest)

https://www.kaggle.com/datasets/shauryasrivastava01/microsoft-all-time-stock-datalatest
Use Cases:

Time-Series Forecasting
Volatility & Risk Assessment
Algorithmic Trading & Backtesting
Portfolio Optimization
Starter Notebook : https://www.kaggle.com/code/shauryasrivastava01/microsoft-stock-eda-trends-returns-insights

normal ore Apr 25, 2026, 6:16 PM

#

Explore dataset for time series: About This Dataset https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-year-dataset
5 years daily gold futures (GC=F) data from Yahoo Finance]
Clean, ready-to-use for LSTM/GRU, ARIMA, Prophet time-series forecasting models
11 pre-computed technical indicators
No missing values, properly scaled features for immediate ML experimentation

🔗 [Starter Notebook created] — EDA, technical plots, LSTM baseline with RMSE evaluation

spice sun Apr 26, 2026, 12:54 AM

#

https://www.kaggle.com/datasets/izzarsulynashrudin/brugada-huca

Brugada-HUCA: 12-Lead ECG Recordings for the Study of Brugada Syndrome

Summary
Brugada-HUCA is a dataset of 12-lead electrocardiogram (ECG) recordings developed to support the study and classification of Brugada syndrome, a rare but potentially fatal cardiac arrhythmia. The data were collected retrospectively from patients evaluated at the Cardiology Department of the Hospital Universitario Central de Asturias (HUCA) and were reviewed by clinical experts. Diagnostic labels were assigned according to established international criteria.

The dataset includes 363 subjects, comprising 76 patients diagnosed with Brugada syndrome and 287 healthy control subjects. Each recording is accompanied by diagnostic metadata.

native coral May 2, 2026, 12:40 PM

#

Hey @everyone, use this dataset for the new EDA + model build:

https://www.kaggle.com/datasets/vedantbhavsar43/ipl-2007-to-2026-complete-ball-by-ball-dataset

This is a better base than the older IPL datasets because it already has:

latest available IPL 2026 data
full ball-by-ball coverage
cleaner ML-ready structure
better feature engineering scope

The main advantage is that we can skip a lot of cleaning and directly focus on:

EDA
feature engineering
stronger model building

It should be much better for winner prediction, score prediction, and live match modeling.

last kettle May 15, 2026, 4:44 AM

#

New dataset drop: 69k Japanese names with gender, sourced from real Wikipedia people
I believe this is the only large scale dataset with real Japanese name with gender labeled.
Most Japanese name-gender datasets come from dictionaries or frequency surveys — not real individuals. I scraped Japanese Wikipedia's gender-segregated occupational categories to get a dataset of actual public figures (actors, athletes, politicians, musicians, etc.) with inferred gender labels.

69k entries | 87.1% include birth year
Kanji + hiragana for each name
Crawler code included

Kaggle Dataset: https://x.gd/MffYV

I'll release model for gender prediction from name, and a 450k meda dataset of Japanese names with gender soon

polar star May 16, 2026, 5:02 PM

#

Just released: AI Hiring Bias & Fairness Benchmark

A realistic synthetic recruitment dataset with:
• 5,000 candidate profiles
• Embedded hiring bias patterns
• Fairness auditing & SHAP explainability
• XGBoost + XAI analysis notebook
• Enterprise-style hiring simulation

Perfect for:
MachineLearning FairnessAI ExplainableAI XGBoost DataScience EDA Kaggle

Built for bias detection, hiring prediction, and ethical AI research.
https://www.kaggle.com/datasets/sridipbasu/ai-hiring-bias-and-fairness-benchmark

last kettle May 17, 2026, 9:17 AM

#

last kettle **New dataset drop: 69k Japanese names with gender, sourced from real Wikipedia ...

the largest public gender labeled Japanese Name meta-dataset

731k+ rows!
kaggle dataset: https://x.gd/mMZ4gV

last kettle May 17, 2026, 10:46 AM

#

polar star Just released: **AI Hiring Bias & Fairness Benchmark** A realistic synthetic re...

you need a Synthetic tag to your dataset
https://www.kaggle.com/discussions/general/679960

polar star May 17, 2026, 11:00 AM

#

last kettle you need a Synthetic tag to your dataset https://www.kaggle.com/discussions/gene...

Yes correct, thank you ren!
I have updated the tags accordingly

primal raven May 18, 2026, 3:08 PM

#

If you are interested to work on residential solar energy generation patterns and behavior 🌞

Here is a real **solar energy generation dataset **I have for you

https://www.kaggle.com/datasets/christiancanillas/solar-energy-generation-and-weather-data

More data will be added every first week of the month

agile sable May 20, 2026, 3:20 PM

#

Hey everyone! 👋

I just published my latest notebook on Kaggle: "Behind the Screens: Indian Developer Burnout & Layoff Anxiety Analysis".

I focused on feature engineering to create a custom "Vulnerability Matrix" to visualize burnout risks in 2026. I'd love to get some feedback on my visualization choices and the analytical approach.

Check it out here: https://www.kaggle.com/code/abdallahahmed701/behind-the-screen-indian-developer-burnout-eda

Any feedback or upvotes would be greatly appreciated! 🙏✨

craggy owl May 23, 2026, 1:50 AM

#

@agile sable This is really cool, the topic really relevant right now in the tech industry. The idea of building a Vulnerability Matrix through feature engineering is a creative approach. I like the Existential Dread Checklist visual. What stands out immediately is how close the two bars are for almost every role, do these scores shift when you control for years of experience or company size? A junior dev at a startup probably feels this very differently than a senior one at a big tech firm. The grouped bar format works really well here though, makes the comparison clean and easy to read at a glance.

shut quartz May 24, 2026, 9:57 AM

#

The World Has a Data Problem. We Fix It.
Every AI team hits the same wall eventually.
You have the model. You have the architecture. You have the engineers. But you don't have the data, and everything stops.
Maybe your dataset is too small to train on. Maybe it carries sensitive patient records, financial transactions, or personal identifiers that legal won't let you touch. Maybe you've been waiting months for a vendor to deliver labeled data that still isn't ready. Maybe your edge cases are so rare in real life that your model keeps failing exactly where it matters most.
This is not a skill problem. This is a data problem. And it is quietly killing more AI projects than any other single reason.
We generate synthetic data.
Not as a workaround. Not as a compromise. As a legitimate, statistically rigorous alternative that lets your team move again. We produce tabular, text, image, and time-series synthetic datasets that mirror the distributions, correlations, and behavioral patterns of real-world data without exposing a single real record.
We have solved this for teams in healthcare who couldn't share patient data across departments. For fintech companies building fraud detection models with almost no real fraud examples to train on. For startups that needed 10x their dataset size before a funding deadline. For enterprises blocked by GDPR, HIPAA, and compliance teams that said no to everything.
The problem you are sitting with right now, whether it is a privacy blocker, a data scarcity issue, a class imbalance, a regulatory wall, or a timeline that real data collection simply cannot meet, has a solution. We will tell you exactly what it is within 24 hours of hearing from you.
No long sales cycles. No vague proposals. You describe your data problem in plain language, and we come back with a concrete plan.
Send us your situation: [synthox.ai@gmail.com]
The only thing worse than a data problem is spending another month pretending it will resolve itself.

obsidian cloak May 25, 2026, 8:10 PM

#

last kettle # the largest public gender labeled Japanese Name meta-dataset 731k+ rows! kaggl...

Wouldn't it be possible to create something like this out of a wikipedia dump/json without scraping? Like the english wikipedia people dataset supplied by Wikimedia, can you get something of that sort from the japanese wikipedia via an API? https://www.kaggle.com/datasets/wikimedia-foundation/english-wikipedia-people-dataset