#❓┊ask-a-question

west urchin
#

same question

rustic hawk
#

And also, where do I get the coursework in AI Studio?

regal lily
#

Because I am not 18, I cannot use Google AI Studio, so I can't get an API key. What should I do?

earnest cliff
#

I'm currently in my second semester of systems engineering. I consider myself a junior data analyst; I have knowledge in Power BI, Excel, Tableau, and SQL. I work as an administrative analyst at a company that makes slot machines similar to a casino.
I've been studying something, but I feel it's time to invest in education. I'm looking for those courses that cost 1-2 million, that last 10 months, and so on. They're similar to a bootcamp, but I use them more to reinforce knowledge and so on. I feel like YouTube courses and other platforms don't teach me enough; I feel like I've reached a point where I need guidance.
Would you invest in education like that?

buoyant cipher
#

Hello, I am not able to open the file "Foundational Large Language Models & Text Generation", as you can see in the screenshot. Is that normal?

solar pecan
#

How can we check that we have successfully completed our assignments?

solar pecan
#

Plz answer my question

solar pecan
#

But how do I get confirmation that it is completed?

glad ibex
#

I have an issue with uploading a dataset.
I uploaded 40 chunks totaling 2 GB (Parquet, gzip compression), and it has been hanging for a few hours with this:
Is this normal, or should I do something differently? I am now uploading with the Kaggle API.

drifting star
#

#❓┊ask-a-question I am trying to run an LLM. I tried different ways to load the model into memory, like device_map="auto", "balanced", and max_memory, but I still see that 2 of my GPUs are underutilized. Also, my CPU memory is still available (I checked resource utilization). I am getting CUDA out-of-memory errors and it is really frustrating. Can someone please help?

#

I can share more details if someone can help

wispy meadow
#

Hello, I am a beginner with LLMs. For my project, I am trying to train a base model (currently GPT-2) to generate a short story from a user's prompt. I've used a Hugging Face dataset (prompts and stories). I am feeding them into the model and training it in supervised mode (I passed the stories as labels).
My losses aren't converging. If anyone wants details on my training arguments, I can show them if it's allowed.

drifting star
#

L4*4

glossy crown
#

if not you can share

drifting star
#

Qwen2.5-32B. But if VRAM were being exhausted, that should be visible on the resource utilization screen.
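A rough back-of-the-envelope check of the weight memory, assuming roughly 32 billion parameters and ignoring activations, KV cache and framework overhead. The 22.28 GiB usable per GPU is the figure PyTorch reports in the out-of-memory message posted in this thread; treat every number here as an estimate, not a measurement.

```python
import math

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights only (no activations, KV cache or optimizer state)."""
    return num_params * bytes_per_param / 1024**3

def min_gpus_needed(total_gib: float, usable_gib_per_gpu: float) -> int:
    """Smallest number of GPUs that could hold total_gib if the weights were split evenly."""
    return math.ceil(total_gib / usable_gib_per_gpu)

NUM_PARAMS = 32e9        # assumption: ~32B parameters
USABLE_PER_GPU = 22.28   # GiB per L4, as reported by PyTorch in the error message

for dtype, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1), ("4-bit", 0.5)]:
    need = weight_memory_gib(NUM_PARAMS, nbytes)
    print(f"{dtype:>8}: {need:6.1f} GiB, needs at least {min_gpus_needed(need, USABLE_PER_GPU)} GPU(s)")
```

By this estimate the float32 weights alone exceed all four cards combined, while bfloat16 weights need at least three of the four. So an OOM while loading is expected if the weights end up in float32 (whether that happens without an explicit 16-bit dtype depends on the transformers version) or if the split leaves some cards nearly empty.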

glossy crown
#

are you using fp8?

wraith sparrow
#

Nope it's clear

glossy crown
#

can you share your code if it is not a part of any comp?

wispy meadow
#

I'll share

glossy crown
#

I never directly use accelerate

glossy crown
#

I was able to load and use the same model using vllm

patent cobalt
#

Why? Can someone tell me the reason for uninstalling the jupyterlab package in the step "Clean up the environment by removing an unused package"?

glossy crown
#

qwen 2.5 32b it

wraith sparrow
#

Accelerate n transformer ?

drifting star
#

OutOfMemoryError: CUDA out of memory. Tried to allocate 540.00 MiB. GPU 1 has a total capacity of 22.28 GiB of which 511.38 MiB is free. Process 4171 has 21.77 GiB memory in use. Of the allocated memory 21.56 GiB is allocated by PyTorch, and 1.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
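The error message itself points to PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of setting it from inside the notebook; to be safe it should run before PyTorch is imported (restart the kernel first if torch is already loaded). This setting only addresses fragmentation: in the message above just 1.21 MiB is reserved but unallocated, so fragmentation is probably not the cause here, and the allocation is simply larger than what is free on GPU 1.

```python
import os

# Configure PyTorch's CUDA caching allocator. Set this before `import torch`
# so the setting is in place before any CUDA memory is allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # import PyTorch only after the variable is set
```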

drifting star
#

I usually get this kind of error

wispy meadow
#

@glossy crown here

drifting star
#

But the memory on 2 of the GPUs is not used, and I still get this VRAM error.

drifting star
#

For now, inference, but I also want to fine-tune. I get this error not during inference but while loading the model.
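One way to push from_pretrained toward using all four GPUs is to pass an explicit max_memory cap per device together with device_map="auto" (or "balanced"), leaving headroom on each card for activations; whatever does not fit under the GPU caps can be offloaded to the "cpu" entry, which is much slower. A minimal sketch of building that dictionary; the GPU count, per-GPU size, headroom and CPU budget are assumptions to adjust, and the model id in the commented usage is assumed from the "qwen 2.5 32b it" mentioned in this thread.

```python
from typing import Optional

def build_max_memory(num_gpus: int, gpu_gib: float, headroom_gib: float,
                     cpu_gib: Optional[float] = None) -> dict:
    """Build a per-device memory cap such as {0: "18GiB", 1: "18GiB", "cpu": "30GiB"}.

    GPU indices map to caps below each card's capacity, leaving headroom for
    activations; an optional "cpu" entry allows offloading whatever does not fit.
    """
    cap = f"{int(gpu_gib - headroom_gib)}GiB"
    max_memory = {i: cap for i in range(num_gpus)}
    if cpu_gib is not None:
        max_memory["cpu"] = f"{int(cpu_gib)}GiB"
    return max_memory

# Assumed values for 4x L4: ~22 GiB usable each, 4 GiB headroom, 30 GiB CPU RAM for offload
max_memory = build_max_memory(num_gpus=4, gpu_gib=22, headroom_gib=4, cpu_gib=30)
print(max_memory)

# Usage sketch (not run here; needs torch, transformers, accelerate and the model weights):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen2.5-32B-Instruct",
#     device_map="auto",
#     max_memory=max_memory,
#     torch_dtype=torch.bfloat16,
# )
```

With these caps the four GPUs offer 72 GiB in total, which is above the roughly 60 GiB estimated for bfloat16 weights, so nothing should need to spill to the CPU.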

wispy meadow
#

oh, I think it's 512

#

at least that's what I used in the tokenizer

glossy crown
#

it means that many batches can't fit at once

#

use a smaller batch size
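If the batch size has to shrink to fit in memory, gradient accumulation can keep the same effective batch size: gradients from several small batches are summed before each optimizer step. A small sketch of the arithmetic; in Hugging Face TrainingArguments the matching settings are per_device_train_batch_size and gradient_accumulation_steps, and the target of 32 is just an example value.

```python
import math

def accumulation_steps(target_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Gradient accumulation steps needed so that
    per_device_batch * num_devices * steps >= target_batch."""
    return math.ceil(target_batch / (per_device_batch * num_devices))

def effective_batch(per_device_batch: int, num_devices: int, steps: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * num_devices * steps

steps = accumulation_steps(target_batch=32, per_device_batch=2, num_devices=1)
print(steps, effective_batch(2, 1, steps))  # 16 32
```

Memory use then depends on the per-device batch (2 here), while the optimizer sees updates based on 32 examples, at the cost of more forward/backward passes per step.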

glossy crown
#

yeah

wispy meadow
#

Yes, I am, but the training is painfully slow and the losses don't converge. Maybe I should reduce max_tokens?

wraith sparrow
#

Oh right

glossy crown
#

does it not decrease in train as well?

wispy meadow
#

Yeah, setting any of them to 4 causes the CUDA error.

glossy crown
#

how long did you run it for?

wispy meadow
#

I mean, the losses were 6.6+, then came down to 6.4+, then went back up to 6.6+.

glossy crown
#

oh

#

weird

#

are you using lora?

wispy meadow
#

Right now no ig

glossy crown
#

I never really tried without lora so can't really say how much slower it is

#

but it is pretty slow

wispy meadow
#

Okay i'll try Lora then.
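Part of why LoRA is cheaper is that it trains two small low-rank matrices per adapted weight instead of the full weight. A sketch of the parameter arithmetic for one weight matrix of shape d × k with rank r; the 768 × 2304 shape matches GPT-2 small's fused attention projection (c_attn), and r = 8 is only an example value.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when fine-tuning a d x k weight matrix directly."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """LoRA learns the update as B (d x r) @ A (r x k), so only r * (d + k) parameters train."""
    return r * (d + k)

d, k, r = 768, 2304, 8
print(full_params(d, k))                                  # 1769472
print(lora_params(d, k, r))                               # 24576
print(f"{lora_params(d, k, r) / full_params(d, k):.2%}")  # 1.39%
```

The forward pass still runs the full model, so LoRA mostly saves gradient and optimizer-state memory and time rather than making every step dramatically faster.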

glossy crown
#

@drifting star did it fix it?

wraith sparrow
#

Isn't it over ?

wraith sparrow
#

Bro it's obv

#

From name

wraith sparrow
#

And code

#

-psychology

glossy crown
#

change your views bro, it is gonna make you suffer later in life

glossy crown
#

you can never infer gender from someone's coding style

wraith sparrow
#

Like date of joining dc n kaggle too n username

#

In simple words I said behaviour aggregate of all the factors

#

I'm not even learning osint though

glossy crown
#

you are just trying to normalize exclusivity and it is not funny

vestal halo
#

Hey guys, I'm on the Kaggle YouTube channel and I don't see the livestream yet. Any idea when it'll start?

raw pumice
#

Hi, I am new to the AI/ML domain. Can someone provide good resources to start my journey in this domain?

rapid stratus
#

Hi guys, I'm from India. When will it start?

drifting star
# wraith sparrow Bro it's obv

It is not obvious from the name. It is a gender-neutral name. Yes, I am female, but many men in our community share the same name. We belong to the Sikh community, and most names in our community are gender-neutral. And yes, I was trying it for AIMO, which ends tomorrow, but that isn't the point. The point is to learn. And I want to understand why 2 of my GPUs are underutilized.

tough heron
#

Hello everyone, is there a meeting now?

glad nest
#

Hi. Can anyone help me summarize the whitepaper I downloaded in Notebook?

sturdy pelican
#

Is there any assignment for today?

drifting star
#

I linked my Kaggle account, but the Kaggle welcome channel still indicates that I have not linked it. Further, I don't have permission to post in the channel. I unlinked and relinked it once, but the result is the same.

drifting star
#

Also curious: how could you keep your session active for 10 hours? Is it through the API?

wraith sparrow
#

No

#

U save the notebook such way

#

I don't remember now

drifting star
#

If you can recall it and let me know, that would be helpful.

wraith sparrow
#

Ig it's right

#

U js simply save it 😅

#

It automatically runs till max 12 hrs

rugged dune
#

Hey, where will tomorrow's livestream be?

#

On which platform?

glossy crown
#

if you need to keep something running and train for many hours you can save and run, but make sure to pickle the output

#

otherwise you would lose it
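A minimal sketch of that advice: write anything you need to keep to the notebook's output directory so it is still there after the saved run finishes. /kaggle/working is Kaggle's writable output folder; the file name and the results dictionary are hypothetical placeholders.

```python
import os
import pickle

# /kaggle/working is the notebook's output directory on Kaggle; fall back to the
# current directory when running elsewhere.
OUTPUT_DIR = "/kaggle/working" if os.path.isdir("/kaggle/working") else "."

def save_output(obj, filename: str) -> str:
    """Pickle obj into OUTPUT_DIR and return the file path."""
    path = os.path.join(OUTPUT_DIR, filename)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path

def load_output(path: str):
    """Load an object previously saved with save_output."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical results from a long training run
results = {"epochs": 3, "train_loss": [0.9, 0.7, 0.6]}
path = save_output(results, "results.pkl")
print(path)
```

For model weights, the library's own save method (such as save_pretrained in Transformers) is usually a better choice than pickling the model object.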

misty violet
#

Hi. I am getting "ClientError: 429 RESOURCE_EXHAUSTED" when I try to run a cell. This is after execution of some 4/5 cells above it. Anyone has any pointers ?

#

Ok. I retried it after a minute or so and it worked. I guess the server resources really were exhausted.
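429 RESOURCE_EXHAUSTED generally signals a rate or quota limit, so waiting and retrying, as above, is the usual fix. A minimal sketch of automating that with exponential backoff; detecting a rate-limit error by looking for "429" or "RESOURCE_EXHAUSTED" in the exception text is a simplifying assumption, not the SDK's documented way of inspecting errors. In a notebook you would wrap the failing API call in a lambda and pass it as call.

```python
import random
import time

def is_rate_limited(exc: Exception) -> bool:
    # Assumption: rate-limit errors mention 429 / RESOURCE_EXHAUSTED in their message.
    text = str(exc)
    return "429" in text or "RESOURCE_EXHAUSTED" in text

def retry_with_backoff(call, max_retries=5, base_delay=2.0,
                       is_retryable=is_rate_limited, sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**attempt seconds (plus jitter) and try again."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))

# Demo with a fake call that fails twice with a 429-style error, then succeeds.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    return "ok"

result = retry_with_backoff(flaky_call, sleep=lambda seconds: None)  # skip real waiting in the demo
print(result, attempts["count"])  # ok 3
```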

wise holly
#

So, what do we have to do for Day 1? Is there any assignment we have to complete?

thick viper
#

How do I submit the assignment?

elder canopy
#

In the Evaluation and Structured Data codelab, we explored different ways to assess LLM responses using autoraters and structured output formatting. Given the latest improvements in Gemini 2.0, what are the best practices for optimizing prompt structures to achieve higher accuracy and consistency in structured outputs, especially for tasks requiring reasoning, summarization, or extraction from unstructured text?

Additionally, are there any specific evaluation techniques recommended for ensuring that structured outputs remain reliable across diverse datasets?

wise holly
#

How do I know that I completed today's work?

glad nest
#

What requirements file or package name should I tell pip to install?

glad nest
# honest gust I haven't received it yet. Is there something I can do about it?

If you look in Discord, someone posted the assignment instructions.

Today's Assignments

Complete the Intro Unit – “Foundational Large Language Models & Text Generation”:
Listen to the summary podcast episode (https://www.youtube.com/watch?v=Na3O4Pkbp-U&list=PLqFaTIg4myu_yKJpvF8WE2JfaG5kGuvoE&index=1) for this unit.
To complement the podcast, read the “Foundational Large Language Models & Text Generation” whitepaper (https://www.kaggle.com/whitepaper-foundational-llm-and-text-generation).

Complete Unit 1 – “Prompt Engineering”:
Listen to the summary podcast episode (https://www.youtube.com/watch?v=CFtX0ZyLSAY&list=PLqFaTIg4myu_yKJpvF8WE2JfaG5kGuvoE&index=2) for this unit.
To complement the podcast, read the “Prompt Engineering” whitepaper (https://www.kaggle.com/whitepaper-prompt-engineering).
Complete these codelabs on Kaggle:
Prompting fundamentals - https://www.kaggle.com/code/markishere/day-1-prompting
Evaluation and structured data - https://www.kaggle.com/code/markishere/day-1-evaluation-and-structured-output
Make sure you phone verify (https://www.kaggle.com/settings) your account before starting, it's necessary for the codelabs.
Want to have an interactive conversation (https://support.google.com/notebooklm/answer/15731776?hl=en&ref_topic=14272601&sjid=16012842710481496794-EU) ? Try adding the whitepapers to NotebookLM (https://notebooklm.google.com/?original_referer=https:%2F%2Fwww.google.com%23&pli=1)

Read the whitepaper here: https://www.kaggle.com/whitepaper-foundational-llm-and-text-generation
Learn more about the 5-Day Generative AI Intensive: https://rsvp.withgoogle.com/events/google-generative-ai-intensive_2025q1

Introduction:

The advent of Large Language Models (LLMs) represents a seismic shift in the world of artificial intelligen...

Read the whitepaper here: https://www.kaggle.com/whitepaper-prompt-engineering
Learn more about the 5-Day Generative AI Intensive: https://rsvp.withgoogle.com/events/google-generative-ai-intensive_2025q1

Introduction:

When thinking about a large language model input and output, a text prompt (sometimes accompanied by other modalities such as ...

tulip venture
#

I am getting an error for "Install SDK", "Cell In[5], line 1
pip uninstall -qqy jupyterlab # Remove unused packages from Kaggle's base image that conflict
^
SyntaxError: invalid syntax"

glad nest
#

Hi everyone. What requirements file or package name should I tell pip to install? I get an error.

honest gust
#

What is the API key generated by AI Studio used for?

glad nest
#

Why has this error come up? ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyterlab-lsp 3.10.2 requires jupyterlab<4.0.0a0,>=3.1.0, but you have jupyterlab 4.3.6 which is incompatible.

drifting oxide
#

I'm currently doing the Intro to ML course, and in the very first exercise the code in the notebook doesn't work; it keeps loading. What should I do? I tried in both Chrome and Edge.

drifting oxide
#

no luck with that

wispy meadow
#

Is it more efficient to fine-tune GPT-2 with unsupervised learning to generate stories from a context, or is supervised learning, with the prompt as the input and the story as the target, the better choice?

#

I did the latter, but my losses failed to converge. Also, the training times for such stuff, is absurdly high.
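A common way to set up prompt-to-story fine-tuning for a decoder-only model like GPT-2: put prompt and story into one sequence, copy it into labels, and set the prompt and padding positions to -100, which the Hugging Face loss ignores. The model shifts labels internally, so labels stay aligned with input_ids; if the prompt were fed as input and the story as labels at the same positions, the targets would not line up, which could be one reason a loss stalls. A minimal sketch on made-up token ids; 50256 is GPT-2's end-of-text id, reused as padding here because GPT-2 has no pad token, and in practice the ids would come from the tokenizer with truncation to max_length (e.g. 512).

```python
IGNORE_INDEX = -100  # label value ignored by the Hugging Face / PyTorch cross-entropy loss

def build_example(prompt_ids, story_ids, eos_id, max_length, pad_id):
    """Concatenate prompt and story into one causal-LM training example.

    Returns (input_ids, attention_mask, labels), all of length max_length.
    Labels copy input_ids for the story (and EOS) and are IGNORE_INDEX for the
    prompt and padding, so the loss is computed only on the story.
    """
    input_ids = (prompt_ids + story_ids + [eos_id])[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + story_ids + [eos_id])[:max_length]
    attention_mask = [1] * len(input_ids)
    pad = max_length - len(input_ids)
    input_ids += [pad_id] * pad
    attention_mask += [0] * pad
    labels += [IGNORE_INDEX] * pad
    return input_ids, attention_mask, labels

ids, mask, labels = build_example([11, 12, 13], [21, 22], eos_id=50256, max_length=8, pad_id=50256)
print(ids)     # [11, 12, 13, 21, 22, 50256, 50256, 50256]
print(labels)  # [-100, -100, -100, 21, 22, 50256, -100, -100]
```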

glad nest
#

When I downgraded the Jupyter version to <4.0.0, this error came up: An error occurred while committing kernel: ConcurrencyViolation Sequence number must match Draft record: KernelId=79702580, ExpectedSequence=4, ActualSequence=2, AuthorUserId=25935810

#

Also gave me a warning WARNING: Skipping jupyterlab as it is not installed.

#

although it says it was installed successfully

abstract locust
#

Hi, how can we submit tasks we have already done?

broken mason
#

once we run the notebooks on kaggle, are we supposed to submit them ?

misty violet
#

copy from announcements -- Hi all - To make things simple and clear for everyone, we will award the course badge to everyone who participates in the capstone project.

All other work is optional and can be done at any time. If you're short on time, we recommend prioritising the podcasts and livestream and then the codelabs. The capstone project will use knowledge you learn from the codelabs, so we recommend running through them but you don't have to submit the daily labs.

There is no time limit for podcasts, whitepapers or codelabs, but doing them this week will give you a chance to discuss with the community.

jade cave
#

Hi everyone! I am a student from Iran and it seems that Iranians can not verify their phones due to not receiving any messages. I would appreciate any guidance with this.

umbral rune
#

I don't get the verification code from Kaggle. Does anyone have the same issue?

tepid lichen
#

I committed both the prompting and evaluation assignments. How do I know that they're completed?

elder canopy
#

Can anyone tell me what project we have to do at the end?

obtuse storm
#

So what should I do after completing the codelabs again?

vocal bison
#

Is the livestream happening soon, I'm on the YT channel but don't see stream happening?

tepid lichen
#

Yeah. It will stream in 12 mins or so

mortal surge
#

What are the key differences between Generative AI models and Discriminative AI models?

frozen maple
#

Is the youtube livestream from 4 months ago or is it truly happening now?

distant rune
#

Anant did not get a chance to share his views when everyone else shared their views

dim river
#

How does Gemini 2.0’s multimodal capability compare to GPT-4 in image processing?

visual tusk
#

Guys, I know this question is kind of funny, but I couldn’t find the livestream video on the Kaggle page. The only video I found was published four months ago. That’s why I couldn’t participate in the last test.

pastel tide
#

Do you see it prompting the user for different styles/formats of output as for structured output which we see deployed now by GPT and Claude?

dim river
#

What’s the best way to evaluate the coherence of Gemini API outputs when working with this dataset?

proven burrow
#

Hello, why don’t I see the first day?

vagrant quartz
#

Temperature

spare bridge
#

I couldn't watch today's video because of the timing. When will it be uploaded, and can you send the link?

trim relic
#

Should we cover the whole whitepaper on foundational LLMs?

fallow light
#

Has any exercise or quiz been given for Day 1 that we need to complete and submit?

noble escarp
#

How can I know whether my assignments are submitted, and how can I submit them?

wispy timber
#

Will there be another course in a few months? Maybe over the summer, so that students won't have to also balance school and homework?

sturdy pelican
#

U can learn later; all these videos and livestreams are available on the Kaggle YT channel

wispy timber
#

yeahh but I won't be able to do the assignments and capstone project

blissful sentinel
#

How do I set up Day 2?

placid elk
#

How would I know if the assignment is done or not?

tepid shuttle
#

code:

warning/error:

As I ran the code in the cell, I got the warnings in the output, then this kernel-restarting pop-up, and the problem just keeps repeating.

Using TPU VM v3-8, huggingface library used is the latest version

gloomy skiff
#

How will i get to know whether I have completed my day 1 task will i get some notification or will it reflect on my dashboard or home page

vestal sky
#

Is the assignment easy ?

wraith sparrow
#

No nothing's easy

hardy basalt
#

why does my kaggle notebook almost always get stuck on running when i press run all? like my very first cell which is just imports gets stuck on running

#

This only really happens if I press "Run All"; if I run each cell individually, it works and doesn't get stuck.

blissful sentinel
#

How do I complete the assignments?

hard sleet
#

!!! This is not the right place to ask questions about the 5-Day GenAI Course. Please refer to 5dgai-question-forum. This is Kaggle's general Q&A Discord channel.

sullen osprey
#

Dear Kaggle Support Team,
I hope you are well. I recently registered on Kaggle to further my education and enhance my skills in data science. However, I encountered an obstacle during the phone number verification process for accessing codelabs; my verification is being blocked due to sanctions imposed on Iran.
I understand that U.S. sanctions target commercial and political activities, yet educational and research initiatives are generally exempt under U.S. law. Restrictions that prevent access to purely educational resources—like Kaggle codelabs—could be seen as conflicting with these legal exemptions. In fact, U.S. regulations typically allow for academic and humanitarian exchanges, and applying these restrictions in this educational context may run counter to that policy.
Could you please review this matter and consider lifting the phone verification restriction on my account so I can fully participate in Kaggle’s educational opportunities? I believe this change supports the spirit of U.S. law, which aims to foster academic development and innovation.

viscid gate
#

How to get instructions for this 5 Days course? thanks

visual sun
#

Hi, I'm a developer in Korea. I started studying AI this time, and what I want is to become a deep learning engineer in computer vision. So, I'm learning the basics from machine learning. How much should I study machine learning to move on to deep learning? Now I'm taking a look at the book "Introduction to Machine Learning with Python" and practicing it. Please tell me the direction for deep learning in the vision field

charred hull
#

I received Day 2 yesterday, not Day 1. Anyone else in the same situation?

wraith sparrow
#

I'm bit disappointed myself

#

They don't ve that name book yet

#

Also this one's latest edition I'm not sure

wraith sparrow
#

Never underestimate a Manning book

#

Even a decade old one would be better than some latest edition books of Packt publication

peak sleet
#

Are there ways for us to effectively create AI agents for embedding evaluation, or to train agents through reinforcement learning for embedding optimization?

azure trellis
#

Good day, fellow Kagglers. Please, how do I proceed from here:
df_train = create_embeddings(df_train)
df_test = create_embeddings(df_test)

#

I have not been able to run these two lines of code, or the ones after them, successfully. I need your insight.

wraith sparrow
# peak sleet Are there ways for us to effectively create AI agents for embedding evaluation...

Yes, there are ways to create AI agents for embedding evaluation and optimization. Here are some approaches:

Embedding Evaluation:

  1. Learning to Rank: Train an agent to rank embeddings based on their quality, using reinforcement learning or supervised learning.
  2. Embedding Critic: Design an agent that critiques embeddings based on specific metrics, such as similarity or clustering quality.
  3. Adversarial Evaluation: Train an agent to generate adversarial examples that challenge the embedding's robustness.

Embedding Optimization through Reinforcement Learning:

  1. Reward-based Optimization: Define a reward function that encourages the agent to optimize embeddings for a specific task, such as clustering or classification.
  2. Policy Gradient Methods: Use policy gradient methods, like REINFORCE, to optimize the embedding parameters directly.
  3. Actor-Critic Methods: Employ actor-critic methods, like Deep Deterministic Policy Gradients (DDPG), to optimize embeddings using both value and policy functions.

Other Approaches:

  1. Generative Adversarial Networks (GANs): Use GANs to generate embeddings that are optimized for specific tasks.
  2. Meta-Learning: Train an agent to learn how to optimize embeddings for a variety of tasks, using meta-learning techniques like Model-Agnostic Meta-Learning (MAML).
  3. Evolutionary Algorithms: Employ evolutionary algorithms, like genetic algorithms or evolution strategies, to optimize embeddings.

These approaches can be used to create AI agents that effectively evaluate and optimize embeddings for various tasks.
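Whichever approach is used, the critic or reward function needs a concrete score. A minimal sketch of one common choice, retrieval recall@k with cosine similarity over labelled query/document pairs; the vectors are toy values rather than outputs of a real embedding model, and recall@k is only one of many possible metrics.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recall_at_k(query_vecs, doc_vecs, relevant, k=1):
    """Fraction of queries whose relevant document is among the top-k by cosine similarity.

    relevant[i] is the index in doc_vecs of the correct document for query i.
    """
    hits = 0
    for q, gold in zip(query_vecs, relevant):
        ranked = sorted(range(len(doc_vecs)), key=lambda j: cosine(q, doc_vecs[j]), reverse=True)
        hits += gold in ranked[:k]
    return hits / len(query_vecs)

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9]]
print(recall_at_k(queries, docs, relevant=[0, 2], k=1))  # 0.5
print(recall_at_k(queries, docs, relevant=[0, 2], k=2))  # 1.0
```

A score like this could serve as the reward in the reinforcement-learning setups listed above, or simply as a fixed benchmark for comparing embedding models.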

wraith sparrow
# dim river What’s the best way to evaluate the coherence of Gemini API outputs when working...

Evaluating the coherence of Gemini API outputs involves assessing the relevance, consistency, and overall quality of the generated text. Here are some methods:

Evaluation Metrics:

  1. Perplexity: Use the perplexity metric from the Hugging Face evaluate library (a separate package from Transformers) to calculate perplexity.
  2. BLEU Score: Use sacrebleu, which is also available as a metric through the evaluate library, to calculate BLEU scores (this requires reference texts).
  3. ROUGE Score: Use the rouge-score library, also available through the evaluate library, to calculate ROUGE scores (this also requires reference texts).

Coherence Evaluation:

  1. Language Modeling Evaluation: Compute the perplexity of the outputs under a reference language model; lower perplexity suggests more fluent text, though not necessarily more coherent content.
  2. Text Classification Evaluation: Use a Transformers text-classification pipeline with a classifier suited to the task to score the outputs.

Model-specific Evaluation:

  1. Autoencoder-based Models: Evaluate coherence using reconstruction loss or pseudo-perplexity for masked (autoencoding) models like BERT or RoBERTa.
  2. Generative Models: Evaluate coherence using metrics like BLEU or ROUGE for generative models like T5 or BART.

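Example Code:
import evaluate
from datasets import load_dataset

# Load non-empty lines from the WikiText-2 test split
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
texts = [t for t in wikitext["text"] if t.strip()][:100]

# The evaluate perplexity metric expects a causal LM, so GPT-2 is used instead of T5
perplexity = evaluate.load("perplexity", module_type="metric")
results = perplexity.compute(predictions=texts, model_id="gpt2")
print(results["mean_perplexity"])
This code evaluates GPT-2's perplexity on the first 100 non-empty lines of the WikiText-2 test set. To score Gemini API outputs instead, pass the generated texts as predictions.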
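For intuition about the number that code prints: perplexity is the exponential of the average negative log-probability per token, so a mean cross-entropy loss of L corresponds to a perplexity of e^L. A minimal sketch with made-up natural-log token probabilities.

```python
import math

def perplexity_from_logprobs(token_logprobs):
    """exp(-mean(log p)) over the tokens of a sequence (natural-log probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns probability 0.25 to every token has a perplexity of about 4,
# i.e. it is as uncertain as a uniform choice among 4 tokens.
print(perplexity_from_logprobs([math.log(0.25)] * 10))
```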

pine anvil
#

Hello there, I am a bit stuck. Are we supposed to work in the same Kaggle notebook?

runic monolith
#

Hello fellas! I've been trying to verify my kaggle account via phone number but to no avail. I've tried over and over again sending code to my number but still no code shows up. Please, any way around this?

bright wyvern
#

Hello there, should I invest in learning Claris FileMaker? I am transitioning to work with our company's CTO, and FileMaker is part of our internal processes. Working with the CTO over the years has brought me closer to the data world.

whole delta
#

@quick yarrow moving here not to flood the other channel

#

i didn't say yolo is the best model, it does perform well though for your task it will need some additional training

#

do you have a dataset of images that show where existing defects are? you need to show the model what it's trying to locate

#

let me pull up an example

#

the task here is segmenting parts of the spine

#

the image on the right (aka mask or label) is already existing in the dataset

#

thus you can train your model to replicate what is already in your data

#

example of mask vs what a model generated

#

does your data have the masks? if not training a CNN can be difficult

#

also this is how a segmentation model (in this case Unet) segments images

#

yolo doesn't output masks like this, instead it generates bounding boxes
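To make the mask vs. bounding-box difference concrete: a segmentation label marks every pixel of the object, while a detector like YOLO predicts only the enclosing box, which can be derived from a mask. A minimal sketch on a toy binary mask (nested lists of 0/1), plus box IoU, the overlap measure commonly used to score detections; the mask and boxes are made-up examples.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels in a 2-D binary mask, or None if empty."""
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)

def box_area(box):
    """Pixel area of an inclusive box (x_min, y_min, x_max, y_max)."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
gt_box = mask_to_bbox(mask)
print(gt_box, box_iou(gt_box, (1, 1, 2, 2)))  # (1, 1, 3, 2) 0.666...
```

The box (1, 1, 3, 2) covers 6 pixels while the mask marks only 5; that per-pixel detail is exactly what a box-only detector discards.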

#

given the structure of a semiconductor i would assume something like yoloX is more fitting for your task, it's also less computationally intensive to train compared to U-net especially if your images are hi-res

#

i'm not familiar with how to implement yolo though but afaik it's not too hard

#

@quick yarrow please don't send me friend requests, we can just discuss this here

timid herald
#

Thank you.

slate arrow
#

Day 3 (Generative Agents)
Welcome to Day 3. https://www.kaggle.com/learn-guide/5-day-genai

Learn to build sophisticated AI agents by understanding their core components and the iterative development process.
The code labs cover how to connect LLMs to existing systems and to the real world. Learn about function calling by giving SQL tools to a chatbot, and learn how to build a LangGraph agent that takes orders in a café.
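The idea behind giving SQL tools to a chatbot is that the model does not run SQL itself; it asks for one of a few plain functions to be called, and the results are sent back to it. A minimal sketch of such tool functions using Python's built-in sqlite3 with an in-memory toy table; the function names and schema are illustrative assumptions, not necessarily those used in the codelab, and the wiring to the LLM's function-calling API is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
INSERT INTO products (name, price) VALUES ('Espresso', 2.5), ('Latte', 3.5);
""")

def list_tables() -> list[str]:
    """Return the names of all tables in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [r[0] for r in rows]

def describe_table(table_name: str) -> list[tuple[str, str]]:
    """Return (column name, declared type) pairs for a table."""
    # PRAGMA does not accept bound parameters, so validate the name before formatting it in.
    if table_name not in list_tables():
        raise ValueError(f"unknown table: {table_name}")
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [(r[1], r[2]) for r in rows]

def execute_query(sql: str) -> list[tuple]:
    """Run a SQL statement and return all result rows (a real system should restrict this to reads)."""
    return conn.execute(sql).fetchall()

# A function-calling model would choose which of these to call, and with which arguments.
print(list_tables())               # ['products']
print(describe_table("products"))  # [('product_id', 'INTEGER'), ('name', 'TEXT'), ('price', 'REAL')]
print(execute_query("SELECT name FROM products WHERE price < 3"))  # [('Espresso',)]
```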

Day 3 Assignments:

  1. Complete Unit 3: “Generative Agents”, which is:

[Optional] Listen to the summary podcast episode for this unit (created by NotebookLM).
Read the “Generative AI Agents” whitepaper.
[Optional] Read a case study which talks about how a leading technology regulatory reporting solutions provider used an agentic generative AI system to automate ticket-to-code creation in software development, achieving a 2.5x productivity boost.
Complete these code labs on Kaggle:

     1. Talk to a database with function calling

     2. Build an agentic ordering system in LangGraph

  2. Watch the YouTube livestream recording. Paige Bailey will be joined by expert speakers from Google - Steven Johnson, Julia Wiesinger, Alan Blount, Patrick Marlow, Wes Dyer, Anant Nawalgaria to discuss generative AI agents.

bright turret
#

How can I check my progress, i.e. whether I completed Day 1?

robust tinsel
#

Hi everyone, I have a question on embeddings!

I was working on a RAG search project where the source document (a book) was ancient and hence the language used was antiquated. I was getting very poor performance with a standard embedding model and I suspect the language in the book was just very different to what the embedding model was trained on.

Is there a way in which I can salvage this situation and calculate useful embeddings? Thanks in advance. 🙂

neat wyvern
#

Hello everyone, quick question. Is anyone experiencing issues with Notebook? It's not loading any cells for me.

floral carbon
#

Is there a NotebookLM API that would let me leverage its features from within an app?

topaz apex
#

Please, how do I get these codelabs? They are blurred in the live training. And is there a capstone project to be submitted, and when?

whole delta
#

my thought process would be that if you split the doc into smaller, more manageable chunks, you'll have an easier time searching within the vector space during retrieval
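A minimal sketch of that chunking step: split the text into fixed-size word windows with some overlap, so a passage cut at a boundary still appears whole in one chunk. Chunk size and overlap are assumptions to tune, and splitting on words rather than on model tokens is a simplification.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most chunk_size words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window already reaches the end
            break
    return chunks

print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```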

#

The other route would be to test different embedding models though I doubt that will bring much improvement, unless it's anything overly specific

#

Another idea that comes to mind, though not sure how feasible given the increased overhead, would be to have an interpreter LLM convert from the antiquated to a more modern version of your language (assuming this is structured similar to trad/simplified Chinese, ancient/modern Greek etc) and use the modern versions for embeddings and retrieval and the original text in your doc store and subsequently the answering LLM prompt
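A minimal sketch of that dual-text idea: each chunk keeps the original wording for the answering LLM's prompt and a modernized version for retrieval. The modernize function is a hypothetical stand-in for the interpreter-LLM call (here just a tiny word-substitution table), and the word-overlap score is a toy stand-in for embedding similarity.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an LLM that rewrites antiquated text in modern language.
MODERN = {"thou": "you", "art": "are", "hath": "has", "thee": "you"}

def modernize(text: str) -> str:
    return " ".join(MODERN.get(w, w) for w in text.lower().split())

@dataclass
class Chunk:
    original: str  # goes into the answering LLM's prompt
    modern: str    # used for embedding / retrieval

def score(query: str, doc: str) -> float:
    """Toy similarity (Jaccard word overlap); replace with embedding cosine similarity."""
    q, d = set(query.lower().split()), set(doc.split())
    return len(q & d) / len(q | d) if q | d else 0.0

def retrieve(query: str, chunks: list[Chunk]) -> Chunk:
    """Match the (modern-language) query against the modernized text of each chunk."""
    return max(chunks, key=lambda c: score(query, c.modern))

store = [Chunk(t, modernize(t)) for t in ["Thou art wise", "He hath a horse"]]
best = retrieve("you are wise", store)
print(best.original)  # Thou art wise
```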

#

Do keep me updated with your findings if possible that sounds really interesting

neon gale
#

Does the server guide just disappear when you complete it all? (It used to appear near Channels & Roles and rules on the left-hand side of the desktop app.)

obtuse storm
#

Any ideas where can I get EHR data?

round prawn
#

Kaggle looks like it crashed

#

kaggle account crashed

unreal grove
#

It's Naveed from Pakistan.
I have a question:
if I want to become a GenAI developer and get a job as soon as possible,
which fields should I focus on, in your opinion?

jovial spruce
#

Subject: Seeking Guidance on Developing a Generative AI Application for Video SEO Optimization

I am initiating a project to develop a Generative AI application with the goal of optimizing video content for search engine visibility. The intended workflow involves users uploading video files, which the AI agent will then analyze to generate key SEO elements. Specifically, the application should:

Extract relevant information from the video content.
Generate SEO-optimized titles, adhering to a maximum length of 60 characters.
Suggest relevant hashtags (fewer than 10).
Identify pertinent YouTube keywords.
Identify relevant YouTube long-tail keywords.
I am seeking insights and recommendations from the community on the optimal approach to building this AI agent. Specifically, I would appreciate guidance on:

Recommended AI architectures and models suitable for this type of video content analysis and text generation.
Key considerations for data processing and feature extraction from video content to inform the AI model.
Strategies for training the AI model to generate accurate and effective SEO elements within the specified constraints (character limits, hashtag count).
Potential challenges and best practices to consider during the development process.
Thank you for your expertise and assistance in getting this project off the ground.
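Whatever model generates the SEO elements, the hard constraints stated above (a title of at most 60 characters, fewer than 10 hashtags) are easiest to enforce with a plain check after generation, then re-prompting or trimming on failure. A minimal sketch; the field names of the output dictionary are assumptions.

```python
def validate_seo(output: dict, max_title_chars: int = 60, max_hashtags: int = 9) -> list[str]:
    """Return a list of constraint violations for a generated SEO result (empty list = valid).

    Assumed output shape: {"title": str, "hashtags": [str, ...], ...}
    """
    problems = []
    title = output.get("title", "")
    if not title:
        problems.append("missing title")
    elif len(title) > max_title_chars:
        problems.append(f"title has {len(title)} characters (max {max_title_chars})")
    hashtags = output.get("hashtags", [])
    if len(hashtags) > max_hashtags:
        problems.append(f"{len(hashtags)} hashtags (max {max_hashtags})")
    bad = [h for h in hashtags if not h.startswith("#") or " " in h]
    if bad:
        problems.append(f"malformed hashtags: {bad}")
    return problems

ok = {"title": "How to Brew Espresso at Home", "hashtags": ["#coffee", "#espresso"]}
too_long = {"title": "x" * 61, "hashtags": [f"#tag{i}" for i in range(12)]}
print(validate_seo(ok))        # []
print(validate_seo(too_long))  # ['title has 61 characters (max 60)', '12 hashtags (max 9)']
```

The violation messages can be fed back to the model in a follow-up prompt, which tends to be more reliable than expecting the first generation to respect every limit.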

foggy crescent
#

what is this capstone project about?

balmy orbit
#

what PC set-up do you guys usually use and think would be good for training models? Sorry if this is too general, I'm just getting started with kaggle and was thinking of getting a M4 mini

whole delta
#

that and the ~50gb online storage should be enough for you to get started without spending a dime

#

training locally can be very expensive so do try to exhaust Kaggle first and see if it doesn't fit your needs before dropping that much money on a rig

whole delta
#

i'm not familiar w that service or what it costs

#

if they have free compute units/tokens and it's easy enough for you to set up cool

#

but kaggle has the lowest barrier to entry imo

timid herald
#

I'm going to develop a population model using real-world data from a specific region.
The goal is to effectively transfer this model to a similar area while maintaining demographic and behavioral accuracy.
A key aspect of this role involves troubleshooting complex data inconsistencies and resolving modeling challenges to ensure high-quality outputs.
Could you please suggest the best algorithm and how to go about it?

covert rain
#

Hello Everyone,
Where do I mark the assignments done?
Also, where can we get the capstone?

final sand
#

Hello, how do I find groups for the capstone project?

cedar gale
#

my Day 4 1st codelab is still in queue.... Has anyone got through the queue successfully.... Mine is going on forever.... 😦

blissful creek
#

Hi everyone, I am wondering if there is documentation/reference for the client class.

So far, codelabs showed me examples using

  • client.models.generate_content()
  • client.chats.create()
  • client.chats.send_message()
  • client.tunings.tune()
  • etc.

Those examples are great, but where can I find a comprehensive list of the functions that belong to the client class and the arguments of those functions, not just example code snippets?

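Alongside the official google-genai SDK reference documentation, Python itself can list what an object offers. A minimal sketch using the standard inspect module; with the codelab's client you would call it on objects such as client.models or client.chats, and help(...) on a method prints its docstring. DemoModels is a hypothetical stand-in so the snippet runs without the SDK installed.

```python
import inspect

def public_methods(obj) -> dict[str, str]:
    """Map each public callable attribute of obj to its call signature."""
    result = {}
    for name, member in inspect.getmembers(obj, callable):
        if name.startswith("_"):
            continue
        try:
            result[name] = str(inspect.signature(member))
        except (TypeError, ValueError):  # some builtins expose no signature
            result[name] = "(...)"
    return result

class DemoModels:  # hypothetical stand-in for client.models
    def generate_content(self, *, model: str, contents, config=None): ...
    def count_tokens(self, *, model: str, contents): ...

for name, sig in public_methods(DemoModels()).items():
    print(name + sig)
# Real use (assumes the codelab's client): public_methods(client.models)
```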
noble escarp
#

Today I haven't received any email for the Day 5 tasks and assignments.

fast tapir
#

Thanks for the clarification @verbal crest. Could you advise on best practices for sharing the call across channels, if allowed? I appreciate your help, thanks!

wraith sparrow
#

datasets vs polars library which is better

uneven pewter
#

I registered for a competition under a different ID from my main one, and I'm in a team. What should I do?

gloomy nest
#

Hey, is there anyone here who knows anything about efficiency optimization in PyTorch? I need some help.
Thank you!

acoustic leaf
#

Hello. Is there anyone who has expertise in webscraping?

wraith sparrow
#

Does anyone have experience with SingleStore notebooks?

wraith sparrow
#

Does anyone have experience with UpTrain AI?

rustic drum
#

I'm not getting linting in the Kaggle notebook. How can I enable it? I tried looking but couldn't find any option for it.

Is a plugin required?

graceful axle
#

Guys please help me out

solid lynx
graceful axle
#

What happens if we don't submit before the due date?

#

Can I submit the project after the deadline?

visual moat
#

@everyone Hello everyone, I have a question about the codelab notebooks. I am doing all the codelabs today: should we see any output files after running all the notebooks, or is just running them enough?
I know it's late to ask, but does anyone know?

wraith sparrow
#

run would be enough ig

golden sorrel
#

How do people publish their datasets?
I'm curious to understand how people go about publishing datasets. Do they generate the data themselves, or do they collect it from somewhere else? If it's the latter, where do they usually get their data from?

haughty beacon
#

Hello @everyone,

I need to develop an AI model that checks seals and invoices for a business. I need a sample dataset.
Can you help me find an example, please?

Thanks.

distant nymph
#

Can anyone please help? On Kaggle they are asking me to submit the code notebook as well, but "Submit Prediction" only accepts a CSV file. How can I submit the code?

astral charm
#

Is there a deadline for completing the project we are supposed to do on Kaggle?

mystic ocean
#

I'm getting this error when I go to the competitions/gen-ai-intensive-course-capstone-2025q1 page. Is there a way around this?

jovial spruce
vapid nebula
#

I’m trying to add part of the BNF to NotebookLM but I'm not able to. What am I doing wrong?

winter furnace
#

anyone here familiar with a workflow where i can do version control using git on a shared server?

for context, i have access to a shared GPU cluster which i can use for ML training, but ofc it's shared, i can't just have my private credentials on that server

so what's the best way of doing training on there while keeping my files under git?

also curious how to initiate training sessions, leave them running even after i close my laptop/close the SSH connection, periodically check training progress, etc.
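For keeping training alive after the SSH session closes, tmux, screen or `nohup` are the usual tools. If you'd rather launch from Python, here is a minimal POSIX-only sketch: `subprocess.Popen(..., start_new_session=True)` puts the child in its own session without a controlling terminal, so the hangup on logout doesn't reach it, and all output goes to a log file you can check later. The function names and the log path are hypothetical.

```python
import subprocess
from pathlib import Path


def launch_detached(cmd, log_path):
    """Start `cmd` in a new session (POSIX) with stdout/stderr appended to log_path."""
    with open(log_path, "ab") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # setsid(): no controlling terminal, so no SIGHUP on logout
        )
    # The child keeps its own copy of the log file descriptor after ours is closed.
    return proc


def tail(log_path, n=5):
    """Return the last n lines of the log, e.g. to check training progress."""
    return Path(log_path).read_text().splitlines()[-n:]
```

Usage might look like `launch_detached([sys.executable, "-u", "train.py"], "train.log")`, where `train.py` stands in for your own script and `-u` disables output buffering so the log updates live. After reconnecting, `tail("train.log")` shows the latest progress lines.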

ancient plover
radiant lodge
#

Please, how do I join a team, for the capstone project?

hexed zephyr
#

I want to become an ML engineer. Can anyone please guide me?
I've just finished my matric exams and I'm free for 3 months, so I want to start learning during this time.
Where should I start, and how should I continue my studies?

void pumice
#

I have asked for help several times and I am really feeling unsupported in trying to complete my Kaggle project.🥹

quasi flume
#

I don't know how to do the capstone.
Should I just make a new notebook on the website, write my code in cells, and run it, or what?

#

little help please

west star
#

Pls how do I merge teams for my kaggle capstone project
I can't find where to form a team or even merge a team
Pls help 🥲

wraith sparrow
#

When you join the competition, go to the Teams tab and you can see your team. The team merger deadline may have passed, though, so you might not be able to merge anymore.

native wren
whole ore
#

Any support with this error
'404 models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.' @here

wraith jay
#

When I run these lines in a fresh notebook:

!pip uninstall -qqy jupyterlab # Remove unused packages from Kaggle's base image that conflict
!pip install -U -q "google-genai==1.7.0" # Also tried "google-genai==1.10.0" with same conflicts
!pip install -U -q chromadb

The chromadb install throws a whole bunch of library conflict warnings. Does anybody have the same issue?
Any solutions?

The messages seem to refer to BigQuery and TensorFlow, which I will not use directly, but I wonder if they will break anything else...
I tried the solutions Google AI Studio and OpenAI gave me, but they didn't work.

crimson helm
#

Hi, I'm new on Kaggle and I have a working notebook I want to submit to a competition, but I can't find how to import it. The only option I found is to create a new notebook from scratch.

torpid shell
#

Hello, I have written a notebook that executes a chain via LangChain, but the notebook hangs on Kaggle, while the same code runs fine on Codespaces. Am I missing something? Does Kaggle not allow chains? I have this set up with OpenAI, but Gemini does not seem to have the same output parser:

from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

# prompt, llm_with_tools and memory are defined earlier in the notebook
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: memory.load_memory_variables(x)["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)

pseudo kestrel
#

Hey, I am Jenny. I am trying to work with Kaggle, but it is too overwhelming and I don't know where to start. Any help would be highly appreciated. Thanks in advance!

unreal mesa
#

Hi, has anyone encountered any issues with the yfinance API in a Kaggle notebook before? It was working perfectly fine yesterday, but suddenly it returns an empty DataFrame.

tidal breach
#

Hi, I have completed my capstone project and everything runs fine, but when I go to save a version it says it failed... can anyone help, please?

quasi flume
#

?? any help

quasi flume
tidal breach
craggy goblet
#

So I am somewhat new to ML and would like to know more about feature engineering and how does one go about doing it

fiery umbra
# pseudo kestrel Hey i am Jenny, i am trying to work with Kaggle but this is too overwhelming don...

Hi Jenny, I'm new as well but I think a good way to get started is to follow along with some YouTube tutorials. There are some good tutorials that show how to join a beginner competition, get some results using the Kaggle notebook/Jupyter notebook and submit those results for a score. Try searching youtube for "Kaggle titanic" or "Kaggle House Pricing" for help with those beginner competitions. A couple of popular channels I've seen are Ken Jee, and Ryan and Matt Data Science, for example.

pseudo kestrel
glossy crown
#

Are questions asked here answered?

quasi flume
wraith sparrow
tawny flame
#

Where can I upload my project for the 5-day Gen AI course?

ivory robin
#

the Kaggle web community looks like it's all spam, or is it just me?

wraith sparrow
#

-# maybe u r new

ivory robin
#

Indeed I am

coral rivet
#

Do you think I could reach out to you if I need help completing mine?

#

This channel is full of eager people but not enough people lending out a hand

#

It’s suffocating knowing that there’s a “community” here, but it feels like a brick wall

fiery umbra
stuck owl
#

Does anyone have a link to a recommended open-source machine learning project on pulse waveform analysis? I've recently been looking for projects in this area, and I'd like to follow an experienced person's project first to see how it's done.

cursive owl
#

Hello, I am pretty new to ML and was working on an internal project. I wanted to understand the best practices for storing a trained model (.pkl). Do we push it to GitHub with the rest of the code, or are there other storage options for maintaining model versions?
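A common pattern (not the only one) is to keep large binaries out of plain Git and store them in object storage, Git LFS, or a model registry such as MLflow, while committing only code and small metadata. The sketch below shows the versioning idea with the standard library alone: each pickle is named by a hash of its bytes and gets a JSON sidecar with metrics, so versions never overwrite each other. `save_model_version` and its arguments are hypothetical; only unpickle files you trust.

```python
import hashlib
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path


def save_model_version(model, out_dir, name, metrics):
    """Pickle `model` under a content-hash filename and write a JSON sidecar with metadata."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    payload = pickle.dumps(model)
    digest = hashlib.sha256(payload).hexdigest()[:12]  # short content hash as version id
    model_path = out / f"{name}-{digest}.pkl"
    model_path.write_bytes(payload)
    metadata = {
        "file": model_path.name,
        "sha256_prefix": digest,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
    model_path.with_suffix(".json").write_text(json.dumps(metadata, indent=2))
    return model_path


def load_model(path):
    """Unpickle a saved model. Only load files you created or trust."""
    return pickle.loads(Path(path).read_bytes())
```

The small `.json` files are cheap to commit to Git, while the `.pkl` files can live wherever your team keeps artifacts.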

craggy crown
#

Hey there!
After six months of learning to code (and approximately 4 existential crises), I finally trained and uploaded my first-ever deep computer vision model to Hugging Face. 🤗
Look, I’m still very much in the "wait, this actually worked?!" phase of ML, so I’d really appreciate some honest feedback—especially on the documentation and presentation.
Did I explain things okay? Does it make sense, or does it sound like I wrote it at 3 AM after too much coffee? (…Okay, that last part might be true.)
But seriously, any thoughts are welcome—code, structure, weird mistakes. I’m here to learn and you folks know way more than I do. 🚀
And if it’s terrible… well, at least I tried. 😅
https://huggingface.co/spaces/IncreasingLoss/Wildlife_Animal_Classifier

wraith sparrow
#

-# next time try streamlit

wheat mirage
#

I am doing all my project coding from within Kaggle - are there any risks of losing my work? Is my work private?

wraith sparrow
#

-# ofc ofc

hardy basalt
#

what do these green and red numbers mean next to my notebook?

#

my notebook is private btw

wraith sparrow
#

It means the no. of lines added n removed in ur notebook code

hardy basalt
#

ohhhhh

#

gotcha i was so confused

#

thank you

magic tangle
#

I registered on time and attended the course for all 5 days, but I have been unable to join the capstone competition (https://www.kaggle.com/competitions/gen-ai-intensive-course-capstone-2025q1/overview) from the start. It gives me the attached error. All the emails we get are from no-reply-eventsatgoogle@google.com, so who do I need to reach out to?

hoary lagoon
ebon umbra
#

Good morning, how can I obtain a credential at the end of the course?

#

I mean the 5-Day Gen AI Intensive Course with Google Learn Guide

craggy goblet
#

I am working with a dataset for a local contest in which I have to predict car prices. The columns are brand, model, model year, milage, fuel type, ext colour, int colour, transmission, accident, and clean_title. How should I go about creating features for it? Any good tips for feature engineering and processing?
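A few features that often help for used-car prices can be derived directly from these columns: the car's age instead of the raw model year, mileage per year, and binary flags for accident history, clean title and automatic transmission. This is a stdlib-only sketch; the exact column keys (`model_year`, `milage`, ...) and value strings such as "None reported" or "A/T" are assumptions about how your file is formatted, so adjust them after inspecting the data. High-cardinality columns like `model` are often better handled with frequency or target encoding than with one-hot encoding.

```python
def engineer_car_features(row, reference_year):
    """Derive numeric features from one raw listing (a dict of column -> value).

    Column keys and value formats are assumptions; adapt them to the real dataset.
    """
    age = max(reference_year - int(row["model_year"]), 0)
    mileage = float(row["milage"])
    accident = str(row["accident"]).strip().lower()
    transmission = str(row["transmission"]).lower()
    return {
        "brand": str(row["brand"]).strip().lower(),  # normalise casing before encoding
        "car_age": age,
        # use at least 1 year so brand-new cars don't divide by zero
        "mileage_per_year": mileage / max(age, 1),
        "had_accident": int(accident not in ("", "none", "none reported", "nan")),
        "clean_title": int(str(row["clean_title"]).strip().lower() == "yes"),
        "is_automatic": int("a/t" in transmission or "automatic" in transmission),
    }
```

Set `reference_year` to the year the listings were collected (or the latest model year in the data) so that `car_age` means the same thing in train and test.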

wraith sparrow
#

idts

#

submitting the form should be enough

scenic quest
#

I submitted my notebook and added it as a competition notebook. Now it shows like this and I can't preview it;
without editing, I am not able to view the notebook. Is there any way to undo that change?

hallow fern
#

Good day guys!
Happy learning!
Has anyone worked with ransomware using deep learning?
I'm working on a similar project and I'm stuck.

lucid pollen
#

Hello, I am unable to join the competition. It says Permission 'competitions.participate' was denied

pastel tide
#

Anyone get this?

An error occurred while committing kernel: The kernel source must be less than 1 megabytes in size.
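One likely cause (an assumption, since the notebook isn't shown) is large embedded cell outputs, such as plots or long training logs saved inside the .ipynb file, or data pasted directly into cells. A .ipynb file is plain JSON, so this stdlib sketch clears every code cell's outputs into a copy and reports the size before and after; the function name is hypothetical. If the source itself is large because of inlined data, moving that data into a Kaggle dataset is the usual fix.

```python
import json
from pathlib import Path


def strip_outputs(src, dst):
    """Copy notebook src to dst with all code-cell outputs removed.

    Returns (old_size, new_size) in bytes.
    """
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    Path(dst).write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return Path(src).stat().st_size, Path(dst).stat().st_size
```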

meager marten
#

I need to be verified to merge team in a competition but I cannot re-verify again and Idk why...

#

Bro...

meager marten
#

When will this be fixed?

#

Contacted + emailed staff yesterday and still no response

carmine pier
#

I attended the 5 day course, completed all notebooks. I submitted the Capstone project via Google form.

But I didn't receive completion certificate or badge.

Is there anything else I need to do?

verbal crest
modern wing
#

I'm facing this issue in the notebooks I already have, and every time I create a new notebook, even in incognito mode. Does anyone know how to fix this?

vernal frost
#

How important is it to learn LLMs right now? I understand learning the very basics, but I mean like going deep into libraries and learning how different ones work.

The reason I ask is because I know LLMs are big right now, but you still need to know feature engineering, EDA, non-neural network models, etc, so it's important to not just do neural networks.

Also, I think right now to get a job working with LLMs you need either a grad degree or experience in software engineering or very specifically MLE, which would mean if you don't have that experience then it would be harder to get a job using them.

Double also, I'm asking because it seems like more and more jobs specifically titled "Data Scientist" have in the job descriptions more LLM stuff than not.

stoic terrace
#

Hey guys, sorry for the stupid question. I'm doing some research about the best-selling games of all time and I'm having trouble finding the most suitable dataset. I'm sorry if I am in the wrong channel, but could someone please give me some directions? Thanks!

native rover
#

Hey guys. Say I'm training a model in Kaggle and an interactive session is ended. Does my code still run in the background? Or do I have to start again. How do I check?

sullen echo
#

need help in time series modeling

data:

Project  year  Month  MoneyLeft
prj1  2024  1  1000
prj1  2024  2  800
prj1  2024  3  400
prj1  2024  4  100
prj2  2022  3  5000
prj2  2022  4  3493
prj2  2022  5  2000
prj2  2022  6  1000
Fabricate this for 10 to 20 projects; each project can span 12 to 18 months.
For a new project, given MoneyLeft for 2 or 3 months, the model should predict the next 4 months of MoneyLeft.
Models like ARIMA, SARIMA, exponential smoothing, etc. capture the trend or seasonality of only one series, which means we can train them only on a single project.

  1. One solution is to convert this time series problem into a regression problem: create lags or windows of three months and predict the next 4 months. The problem is that the model will train on those lags or windows only; it should also take the project name into account (I don't know how to do that).
  2. The other solution would be to train a model for each project, which is not feasible in this case.

How should I do this?
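Option 1 can be done as one global model trained on windows pooled from all projects. A brand-new project's name will never have been seen during training, so this sketch does not use the ID as a feature; instead it adds the project's starting balance and how many months into the project the window ends, both of which carry over to new projects. The helper name, the feature names and the choice of those two extra features are assumptions; the rows follow the (Project, year, Month, MoneyLeft) layout above.

```python
from collections import defaultdict


def make_windows(rows, n_in=3, n_out=4):
    """Turn (project, year, month, money_left) rows into supervised windows.

    Returns X (list of feature dicts), y (list of n_out future values) and
    groups (the project of each window, for project-wise train/validation splits).
    """
    by_project = defaultdict(list)
    for project, year, month, money_left in rows:
        by_project[project].append((year, month, money_left))

    X, y, groups = [], [], []
    for project, series in by_project.items():
        series.sort()  # chronological order by (year, month)
        values = [money for _, _, money in series]
        for start in range(len(values) - n_in - n_out + 1):
            window = values[start:start + n_in]
            features = {
                "budget_start": values[0],       # project-level scale
                "months_elapsed": start + n_in,  # position in the project's life
            }
            # lag_1 is the most recent month, lag_{n_in} the oldest
            for i, v in enumerate(window):
                features[f"lag_{n_in - i}"] = v
            X.append(features)
            y.append(values[start + n_in:start + n_in + n_out])
            groups.append(project)
    return X, y, groups
```

Each project needs at least `n_in + n_out` = 7 months to produce a window, which the 12 to 18 month projects satisfy. The 4-value targets can go into any multi-output regressor (for example scikit-learn's `MultiOutputRegressor`), and `groups` lets you validate on whole unseen projects. For new projects with only 2 known months, a second model trained with `n_in=2` is one simple option.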
pliant silo
#

Hi everyone,

I am trying to experiment with a Decision Tree (DT) model in scikit-learn. The task is to find the best model by trying all 64 combinations of the following parameters:

  • max_depth (2 values)
  • min_samples_split (2 values)
  • min_samples_leaf (2 values)
  • max_features (2 values)
  • max_leaf_nodes (2 values)
  • criterion (gini and entropy)

For each parameter combination, I need to:

  • Perform 10-fold stratified cross-validation.
  • For each fold, record accuracy, precision, recall, F1-score, and AUC ROC.
  • Calculate the mean and standard deviation for each metric across the 10 folds.
  • Repeat for all 64 parameter combinations.
  • Identify the best combination based on model performance (e.g., best F1 or AUC).
  • Fit the best model on the entire training set and then evaluate it on a test set.

I'm currently searching the internet for how to do this particular task. I would appreciate it if someone could guide me in structuring the loops/code for this efficiently, and maybe point me to resources (links, websites, YouTube videos, or anything else).

Thank you
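The bookkeeping part (64 combinations, mean and standard deviation per metric, picking the best) can be sketched with the standard library alone. The two candidate values for each numeric parameter below are hypothetical placeholders, since the post doesn't list them, and `fold_scores` stands in for the per-fold metrics your cross-validation loop produces.

```python
import itertools
import statistics

# Hypothetical candidate values: the post only says "2 values" per parameter.
PARAM_GRID = {
    "max_depth": [3, 6],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", None],
    "max_leaf_nodes": [None, 20],
    "criterion": ["gini", "entropy"],
}


def all_combinations(grid):
    """Cartesian product of the grid: one dict of parameters per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*(grid[k] for k in keys))]


def summarise(fold_scores):
    """fold_scores: one dict {metric: value} per CV fold -> {metric: (mean, std)}."""
    summary = {}
    for metric in fold_scores[0]:
        values = [fold[metric] for fold in fold_scores]
        summary[metric] = (statistics.mean(values), statistics.stdev(values))
    return summary


def pick_best(results, metric="f1"):
    """results: list of (params, summary) pairs; return the pair with the highest mean metric."""
    return max(results, key=lambda item: item[1][metric][0])
```

In scikit-learn, the per-fold metrics for one combination can come from `cross_validate(DecisionTreeClassifier(**params), X, y, cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0), scoring=["accuracy", "precision", "recall", "f1", "roc_auc"])`, whose `test_<metric>` arrays hold one value per fold. Alternatively, `GridSearchCV` with the same grid, that scoring list and `refit="f1"` runs all 64 combinations, stores `mean_test_<metric>` and `std_test_<metric>` in `cv_results_`, and refits the best model on the full training set, which you then evaluate once on the test set. Note that `statistics.stdev` is the sample standard deviation, while NumPy's `np.std` defaults to the population one.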

spring cobalt
#

How to upload notebook on kaggle from GitHub?

weary gull
slim burrow
#

Hi, I have a large dataset (extracted from an NC file). I'm trying to put it into an LSTM to make a per-day prediction for 30 days for the target variable. The problem is that the dataset is too large to work with inside a Kaggle environment (the free variant). There are 12 million samples in the dataset. That being said, I don't know what I should do here to reduce the size and also train my model effectively. I've used memory mapped arrays for storage optimization (data is still too large) and batch generation for the model. Any help will be appreciated.

shut roost
#

Does anyone here have an access to a paid AI video generator? I have a prompt and I want to make that video but free models are so weak.

worldly osprey
#

I'm trying to extract information from a PDF that contains tables, columns, and hierarchical structures. I'm having trouble preserving the layout and structure during extraction. I'm trying to build a PDF chatbot; can anyone help me with this?

wraith sparrow
glossy crown
wraith sparrow
#

yeah they say they provide better n simpler tool integration too but didnt try yet

wraith sparrow
grim bane
#

Will those who uploaded their project to Kaggle still get a certificate, even if their name is not in the top 10 or the Honorable Mentions?

swift onyx
#

hi folks. I signed up for the recent 5-day Kaggle course via work, but I couldn't do it at the time. I wanted to do it this week, but my company has a block on their Gmail accounts, meaning I cannot access the course content. I can access the materials via my personal Gmail account, but when I try to execute the notebook (specifically cells that use pip, for example) I get a connection error. A bit of reading online says that Kaggle used to have an internet connection setting that is now gone. Is there anything I can do to go through the course now? Thanks

clear vale
#

Hey guys! I am trying to learn about transformers: how they work and how they are implemented from scratch. Can you please recommend some resources where they are explained?

subtle echo
#

How can I learn ML in 20 days as a data engineer?

icy steeple
celest sphinx
#

Hello friends!
Finding datasets that hold the data I am looking for is hell.

Thankfully I found a properly managed website with a databank that had the info I'm looking for.
But is there some kind of dataset register linked to an LLM where you can just enter the frequency/type of data you are looking for (e.g. solar radiation on the atmosphere), the area you need it for (N/W/S/E bounds), and the time period you need, and it finds multiple datasets that match your needs?

digital fulcrum
#

Hey guys, how can I start learning Python coding from scratch?

lime hamlet
#

Hi guys, I'm new to the data science and AI field. I understand ML models (regression, classification, clustering) using sklearn, and then I moved on to deep learning (ANNs and CNNs) using Keras. What do you suggest to be successful in this field? Is there a roadmap?

CV and NLP are two other fields; how do I know which one is better for me? I'm also new on Kaggle, so please guide me.

vital palm
#

Hey folks, my team got an error. Do you know if we can get this file? We think it is missing from the waveform-inversion competition.

fleet kelp
#

Question...in the Data Science community when working with python or another programming language, do you stick to Jupyter notebooks or do you use an IDE along with the Jupyter Notebooks?

pallid pumice
#

Why does it take so long?

wraith sparrow
#

-# just get used to it from now

harsh pier
#

Hello! I have been trying Kaggle for the first time these past few days. Yesterday I was able to autocomplete by pressing Tab, but it's not working for me anymore. Any advice? Thanks in advance!!

cyan pollen
#

Hi, I've just completed a project for my data mining class and I've been thinking of using it as something to show off on my resume and to recruiters. I have a project report documenting my findings along with a Colab/Jupyter notebook. I wanted to ask what you think is a good way to present my project to others. Would it just be a zip file with all of the documents, including the .ipynb and HTML of the notebook? Or would you set everything up on a GitHub account? I'm asking because this is one of the bigger projects I've worked on, and the one I contributed to most, so I want to know the right way to present it. I'm a business administration major at my university with an emphasis in data analytics, and I've never really set anything up on GitHub or used it much. Thanks!

wraith sparrow
# cyan pollen Hi I've just completed a project for my datamining class and I've been thinking ...

To present your data mining project effectively for your resume and recruiters as a business administration major with a data analytics emphasis, use GitHub to showcase your work professionally. It’s the industry standard, accessible, and demonstrates technical skills.

Steps to Present on GitHub

  1. Create a GitHub Account: Sign up at github.com with a professional username.
  2. Set Up a Repository: Create a public repo (e.g., DataMiningProject). Initialize with a README.
  3. Upload Files:
    • Jupyter Notebook (.ipynb): Clean, commented, and error-free.
    • HTML Export: Download notebook as HTML for non-technical viewers.
    • Project Report (PDF): Polished, with clear findings.
    • Requirements.txt: List dependencies (e.g., pip freeze > requirements.txt).
    • Optional: Include dataset (if small) or link to its source.
  4. Write a README (Markdown):
    • Project title and 2-sentence overview.
    • Key findings (e.g., “Improved marketing ROI by 15%”).
    • Technologies (e.g., Python, Pandas).
    • Instructions to run the notebook.
    • Links to HTML, PDF, and dataset.
    • Your contact info (LinkedIn, email).
  5. Clean Notebook: Add markdown explanations, remove sensitive data, ensure it runs.
  6. Upload: Use GitHub’s web interface to drag and drop files.
  7. Share: Add the repo link to your resume, LinkedIn, and applications.

Why Not a Zip File?

A zip file (with .ipynb, HTML, PDF, requirements.txt) is less professional, harder to access, and doesn’t show GitHub skills. Use it only if required.

Tips

  • Emphasize business impact (e.g., cost savings) in your report and README.
  • Test the repo in an incognito browser.
  • Learn basic GitHub via a 30-minute tutorial.
  • Post about the project on LinkedIn with the repo link.

This GitHub setup highlights your project’s value and technical skills.


void maple
#

Hello everyone. I hope you are doing well. Do you know when we will get our certificates for our Kaggle projects, which were submitted last month?

worldly osprey
#

Hi, can anyone help me extract information from a PDF? The key challenge is maintaining the document structure during extraction and chunking, particularly tables, hierarchical text, and tables with nested information.

gleaming osprey
#

Hello everyone.

#

I'm currently working on a legal question-answering chatbot project.

#

And I have some questions.

#

Should I stick with Flask for now, or would starting with FastAPI or Django be better for scaling later?

#

And are there any open-source legal tech projects I could use as a reference?

wraith sparrow
gleaming osprey
#

got it.

lone ruin
#

Can someone help me find a dataset for my project?
It must have 1000 rows after removing nulls and duplicates,
must work well with 2 models from this list (KNN, SVM, Linear Regression),
and must be balanced (without needing to over- or under-sample).

obsidian bone
#

Does anyone here have experience with the Ray framework? I have an issue running tasks on multiple workers, and I always get this error: The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
I'd appreciate any help to fix this issue

native elk
#

how the heck do i light up a dot

#

do i need to do a badge

bronze fable
native elk
maiden badger
#

Hi, I’m learning backprop and I understand that ReLU is not differentiable at x = 0, so in practice people usually set its derivative there to 0 or 1. I’m wondering what the difference between the two options is, and what the pros and cons are.
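ReLU has no derivative at exactly 0, but any value in [0, 1] is a valid subgradient there, so both choices are mathematically legitimate. With real-valued weights and inputs, pre-activations rarely land on exactly 0.0, so the choice usually has little effect on training. 0 is the common convention and keeps inactive units sparse, while 1 lets gradient pass for units sitting exactly at the kink. A minimal sketch with the choice exposed as a parameter (`at_zero` is a hypothetical name):

```python
def relu(x):
    return x if x > 0 else 0.0


def relu_grad(x, at_zero=0.0):
    """Derivative of ReLU for x != 0; `at_zero` is the subgradient chosen at x == 0."""
    if not 0.0 <= at_zero <= 1.0:
        raise ValueError("a valid ReLU subgradient at 0 lies in [0, 1]")
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return at_zero


def backprop_through_relu(xs, upstream, at_zero=0.0):
    """Chain rule: multiply the upstream gradient by ReLU's (sub)gradient element-wise."""
    return [g * relu_grad(x, at_zero) for x, g in zip(xs, upstream)]
```

Only the element whose input is exactly 0 changes between the two conventions; every other gradient is identical.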

alpine token
#

Hi what is an agentic solution to autogenerate queries for a NoSQL DB?

sterile lantern
#

How do you receive the exclusive Kaggle swag?

graceful axle
#

hey, I'm a beginner and trying to submit my first entry.
How do I fix this?

#

my csv looks like this

sterile lantern
next zenith
#

Is there any way to automatically update particular packages in Kaggle every time I start a session? Something like an init file? Or some other way around this, as most of the default packages are very old?

lethal rapids
#

Hi, I am unable to verify my Kaggle account using Persona. It failed and now says to contact support, but I haven't received a response from them.

violet nymph
#

Hi, I’ve created a notebook that I’d like to share, but I don’t want to break any rules about spam or self-promotion. I’ve learned a lot from other people’s notebooks, and I found many of them through Discussions.

What’s the right way — or time or place — to share a notebook? For example, if it’s related to a competition, is it more acceptable to post it in that competition’s Discussions rather than in the general ones?

I’ve already received a warning, and now I’m honestly afraid to even mention it.

buoyant dragon
#

Hi, I am a user from Iran and I am interested in participating in Kaggle competitions. The problem is that almost all competitions have a rule excluding residents of countries like Iran. What is the solution for this? If I have a teammate from another country, am I allowed to participate?

muted gorge
#

Hello, I've created a notebook, but why didn't it show up in the list of notebooks?

proven osprey
#

I have a question about how scoring works on Kaggle: even though I got better accuracy than on my previous submission, my score was lower than that submission's. Why?

austere pollen
#

Hello

#

Guys, there's a problem in my data visualization, but I can't find it.

spiral heart
#

Hi everyone,

I'm working on a product classifier for ecommerce listings, and I'm looking for advice on the best way to extract specific attributes/features from product titles, such as the number of doors in a wardrobe.

For example, I have titles like:

🟢 "BRAND X Engineered Wood 3 Door Wardrobe for Clothes, Cupboard Wooden Almirah for Bedroom, Multi Utility Wardrobe with Hanger Rod Lock and Handles,1 Year Warranty, Columbian Walnut Finish"

🔵 "BRAND X Engineered Wood 5 Door Wardrobe for Clothes, Cupboard Wooden Almirah for Bedroom, Multi Utility Wardrobe with Hanger Rod Lock and Handles,1 Year Warranty, Columbian Walnut Finish"

I need to design a logic or model that can correctly differentiate between these products based on the number of doors (in this case, 3 Door vs 5 Door).

I'm considering approaches like:

Regex-based rule extraction (e.g., extracting (\d+)\s+door)

Using a tokenizer + keyword attention model

Fine-tuning a small transformer model to extract structured attributes

Dependency parsing to associate numerals with the right product feature

Has anyone tackled a similar problem? I'd love to hear:

What worked for you?

Would you recommend a rule-based, ML-based, or hybrid approach?

How do you handle generalization to other attributes like material, color, or dimensions?

Thanks in advance! 🙏

wraith sparrow
# spiral heart Hi everyone, I'm working on a product classifier for ecommerce listings, and I'...

To extract attributes like the number of doors from e-commerce product titles (e.g., "BRAND X Engineered Wood 3 Door Wardrobe"), use a hybrid approach:

  1. Regex Baseline: Use patterns like (\d+)\s+door(s)? to extract numerical attributes (e.g., "3" or "5" doors). Fast and reliable for consistent titles.
  2. Fine-Tuned Transformer: Fine-tune a model like DistilBERT to handle varied formats (e.g., "three door") and extract other attributes (material, color). Requires labeled data but generalizes well.
  3. Hybrid Logic: Apply regex first; use transformer for cases where regex fails or for complex attributes.

Generalization: Extend regex patterns (e.g., (wood|metal) for material, (walnut|black) for color) and fine-tune the transformer on multi-label tasks to extract multiple attributes.

Why Hybrid? Combines regex’s speed and simplicity with transformer’s robustness for diverse titles. Start with regex, then add transformer as data and complexity grow.
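The regex baseline from step 1 can be sketched in a few lines. This version also accepts spelled-out numbers and hyphens ("Two-Door"), which goes beyond the pattern above; the word list and function name are hypothetical. Returning `None` marks titles that would go to the transformer fallback in step 3.

```python
import re

# Spelled-out counts to accept in addition to digits (extend as needed).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

DOOR_RE = re.compile(
    r"\b(\d+|" + "|".join(WORD_NUMBERS) + r")[\s-]*doors?\b",
    re.IGNORECASE,
)


def extract_door_count(title):
    """Return the number of doors mentioned in a product title, or None if no match."""
    match = DOOR_RE.search(title)
    if match is None:
        return None  # hand these titles to the ML fallback
    token = match.group(1).lower()
    return int(token) if token.isdigit() else WORD_NUMBERS[token]
```

The same structure (one compiled pattern plus a normaliser per attribute) extends to material or color; keeping the patterns in a dict makes it easy to log which titles fall through to the model.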

#

-# U can also try scikit-llm btw

spiral heart
#

thanks for the inputs @wraith sparrow

heavy lark
#

Howdy, I'm new to everything: coding with R and Kaggle.

I'm trying to upload my code, but I'm running into errors. This is the code I would run on my computer, since the files are stored there. I'm assuming Kaggle doesn't have access to my computer's files. How would I set up the code to read the datasets I uploaded to Kaggle?

heavy lark
livid bison
#

Hey, I'm getting an error that there isn't a submission.csv file in the DRW crypto market competition. I'm pretty sure my script is producing a submission.csv file, but whenever I try to run it in the notebook it doesn't work because it crashes in the middle saying the kernel died. Is it possible that the error is coming because my script is too CPU intensive and crashes before it creates the file? It doesn't seem to be possible to look at the logs...

vast gull
#

What activation function should I use?

#

I'm thinking GEGLU.
But not sure yet.
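For reference, GEGLU (from Shazeer's "GLU Variants Improve Transformer") gates one linear projection with the GELU of another: GEGLU(x) = GELU(xW + b) ⊙ (xV + c). Below is a dependency-free sketch on plain lists; a real model would use your framework's tensors. Because it needs two input projections, the paper shrank the feed-forward hidden size so the parameter count matched the baseline layer.

```python
import math


def gelu(z):
    """Exact GELU: z * Phi(z), with Phi the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear(x, weights, bias):
    """x @ weights + bias for a single vector; weights is a d_in x d_out nested list."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def geglu(x, W, b, V, c):
    """GEGLU(x) = GELU(xW + b) * (xV + c), element-wise."""
    gate = [gelu(h) for h in linear(x, W, b)]
    value = linear(x, V, c)
    return [g * v for g, v in zip(gate, value)]
```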

woeful basalt
#

Hi, has anyone used the Wav2Lip model before? I have some questions to ask.

fast axle
#

We have more questions than we have answers chat. Back to the drawing board

rotund pond
#

hi does anyone know why when i try to add a notebook to my collection, my collection folder name doesn't appear in the list of options?

ivory robin
#

Hi! Has anyone real experience with ML applied to corporate employee databases? I’m looking for realistic project ideas 🤔

wraith sparrow
valid veldt
#

Hi. I'm working on a project that has to run in rural conditions with connectivity issues and the like. I'd like to talk to someone who has worked on edge AI to help me answer some questions.

ivory robin
#

I’m studying data science and trying to apply my learnings on the company I work for. Although my job title is not about that

fast axle
fleet kelp
bold sedge
#

Are there competitions where I don't need the face verification?

hushed scarab
#

Hey everyone, I'm looking for good books on Generative AI starting from scratch. It would be great if they are available in PDF format and for free. Any recommendations?

rancid pine
#

Hi chat, I'm a new learner on Kaggle and I'm trying to make a notebook submission for the Titanic survival prediction competition. But even though my output file is created and visible, the competition won't accept the notebook when I click "Create Submission".

#

any idea why? I can send screenshots if necessary

harsh berry
#

testing post

trim oxide
#

I am trying to build a simple Streamlit app for question answering on a given PDF. What approach should I use? Which open-source LLM would be suitable? What model should I use for embeddings, and should I implement RAG or another method?

raven pike
#

#❓┊ask-a-question Hello everyone, I wanted to know the difference between gradient descent, maximum likelihood estimation (MLE), and ordinary least squares (OLS) with respect to linear regression. If anyone knows a good article on it, please tell me.
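Short version: OLS is the objective (minimise the sum of squared residuals). MLE is a different justification that, assuming Gaussian noise with constant variance, leads to the same coefficients, because the log-likelihood is a constant minus the sum of squared residuals divided by 2σ². Gradient descent is an iterative optimiser for that same objective, an alternative to the closed-form solution. The stdlib sketch below fits simple linear regression both ways and should land on the same line; the learning rate and step count are arbitrary choices for this toy data.

```python
def ols_closed_form(xs, ys):
    """Least-squares intercept and slope for y = a + b*x via the normal equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Minimise mean squared error iteratively; converges to the OLS solution for small enough lr."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [(a + b * x) - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / n * sum(residuals)
        grad_b = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

Both functions minimise the same loss, so they differ only in how they reach the minimum: the closed form solves it directly, while gradient descent scales to cases (many features, huge datasets, non-linear models) where the closed form is impractical or doesn't exist.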

normal fiber
#

#❓┊ask-a-question Hi, guys. I'm a student studying machine learning by myself. I recently started studying, so please bear with me if my question sounds a bit dumb.

Recently, I've been struggling with feature engineering. I've heard that feature engineering requires domain knowledge and that it is one of the most crucial parts of machine learning. So I was thinking: what if I just create polynomial features (degree 2 or 3, not higher because of overfitting) and then eliminate unnecessary features using VarianceThreshold or Lasso? Do you think that can work well in any situation?

I want some advice for feature engineering. Thank you, senpai! (greeting from Japan)
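That idea is essentially scikit-learn's `PolynomialFeatures` followed by `VarianceThreshold` or an L1 model. The stdlib sketch below shows the mechanics on plain lists (helper names are hypothetical). Two caveats: a variance threshold only removes (near-)constant columns and never looks at the target, and the column count grows quickly, since d features give d + d(d+1)/2 columns at degree 2. Lasso's penalty is also scale-sensitive, so standardise the expanded columns first. It works as a baseline, not as a substitute for domain-driven features.

```python
import statistics
from itertools import combinations_with_replacement


def polynomial_features(rows, degree=2):
    """Expand each row with all products of up to `degree` features (no bias column)."""
    n_features = len(rows[0])
    combos = [
        combo
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(range(n_features), d)
    ]
    names = ["*".join(f"x{i}" for i in combo) for combo in combos]
    expanded = []
    for row in rows:
        out = []
        for combo in combos:
            value = 1.0
            for i in combo:
                value *= row[i]
            out.append(value)
        expanded.append(out)
    return names, expanded


def drop_low_variance(names, rows, threshold=0.0):
    """Keep only columns whose population variance exceeds `threshold`."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if statistics.pvariance(col) > threshold]
    return [names[j] for j in keep], [[row[j] for j in keep] for row in rows]
```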

craggy goblet
#

#❓┊ask-a-question Hello, I am a student learning machine learning and I have 2 questions:
Is there a good roadmap I can follow?
What are some good books for ML and AI?

wraith sparrow
fleet kelp
# fleet kelp
Poll: Do you think using IntelliSense is a form of cheating yourself?

Winning answer (3 of 5 votes): 🙃 No...if you are using the suggestions to help learn

wraith sparrow
wise pewter
#

Hi everyone, I'm taking Intro to Machine Learning and don't understand a line of code in the underfitting and overfitting part.
I was wondering why they fit the DecisionTreeRegressor model on the whole dataset after tuning the best tree size, instead of only on the training split.

alpine locust
#

Hi, I am a data scientist and machine learning engineer. I have worked on many projects on Kaggle and I am now a Kaggle Notebooks Expert. I am looking to work on real-world projects. Can anyone guide me on where to find them?

proven osprey
wise pewter
gilded drift
# craggy goblet <#1129507816697241822> hello I am a student learning machine learning and I hav...

If you are just a beginner, the YouTube videos by StatQuest (Josh Starmer) and by Luis Serrano can give you a head start. In my opinion, their simple visualizations and short videos help you understand the subject in the shortest possible time. If you want a "no-code" visual workflow tool to experiment with data and various models, you can use the open-source Orange 3.8x along with its add-ons. It's very easy to learn, and there are a number of video tutorials. Other open-source tools I have experimented with are Weka and KNIME.

quiet lake
#

Dear friends, does anyone know where I can start as a data scientist intern? I completed the course. Please give me suggestions or any guidance; your little help can be a big help for someone. Please consider it.

ivory robin
#

Do you folks have an online portfolio? What tech do you use to set it up?

ivory robin
#

do I look like an angel for any specific reason? 😛

sharp herald
#

Hello everyone! I want to start reading ML/DS papers. Where can I find them, and which specific papers should I read first, since I am a beginner in the ML field? 😀

cunning hearth
#

Hey, I need some debugging help.

#

I'm trying to access the higgs-boson dataset with this code:

import os
import tensorflow as tf
from kaggle_datasets import KaggleDatasets

data_dir = KaggleDatasets().get_gcs_path('higgs-boson')
train_files = tf.io.gfile.glob(os.path.join(data_dir, "training", "*.tfrecord"))
valid_files = tf.io.gfile.glob(os.path.join(data_dir, "validation", "*.tfrecord"))

print("Found", len(train_files), "train TFRecords")
print("Found", len(valid_files), "valid TFRecords")

However, it can't seem to access this dataset.
Output:

get_gcs_path is not required on TPU VMs which can directly use Kaggle datasets, using path: /kaggle/input/higgs-boson

Found 0 train TFRecords
Found 0 valid TFRecords
#

I fixed the error.
It turns out the new name is higgs-boson-dataset, and there are no separate training/validation files, so you have to make your own split, like this:

import os
import random
import tensorflow as tf
from kaggle_datasets import KaggleDatasets

data_dir = KaggleDatasets().get_gcs_path('higgs-boson-dataset')
print(tf.io.gfile.listdir(data_dir))

all_files = tf.io.gfile.glob(os.path.join(data_dir, "*.tfrecord"))
all_files = sorted(all_files)
random.seed(42)  # fix the seed so the train/validation split is reproducible
random.shuffle(all_files)
split = int(0.8 * len(all_files))  # 80% train, 20% validation
train_files = all_files[:split]
valid_files = all_files[split:]
weak yew
#

I have developed an app using the Gemini API, and now I want to turn it into a web app and host it for free on a server. Can anyone guide me on this?
I designed the project in a notebook, but I want to use web development languages to build an interactive single-page website.

plain verge
#

Hi all, this is my first post here!

I'm working on a regression problem where the target y is a function of four features: X1 and X2 are continuous, and N1 and N2 are integer "family/group" IDs. For each family (N1, N2), y is a function of X1 and X2, and y is continuous. Each unique (N1, N2) defines a unique y = f(X1, X2), but calculating y without ML is very expensive and requires a lot of supercomputer time. So, we want to use machine learning to predict y as accurately as possible, using as few training examples as possible. Ideally, we'd use just a handful of data points from some (N1, N2) families (and their X1, X2 values) and then be able to predict y for new (N1, N2) families reliably.

My current pipeline holds out all data for one (N1, N2) pair to simulate predicting for a new family, trains XGBoost on the rest, and evaluates. With lots of (N1, N2) pairs in training, results are excellent (R^2 about 0.99). But with only a handful, generalization to new pairs is poor.

What feature engineering or model strategies would help generalization to unseen families?

  • Are embeddings, one-hot, or other encodings of N1/N2 better?
  • Would neural nets (Bayesian, feed-forward, etc.) help vs. tree models in this case? (I'm more comfortable with trees.)
  • Any best practices for uncertainty estimation?
  • References to similar few-shot tabular regression problems?

Any advice or references would be greatly appreciated!!!
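
One way to measure the few-family regime directly is to hold out one (N1, N2) family and train on only k of the other families, then vary k. A minimal stdlib sketch, assuming the family id of each row is available as an (N1, N2) tuple; `n_train_families` and `seed` are hypothetical parameters, not part of any library. scikit-learn's `LeaveOneGroupOut` covers the plain hold-one-family-out case.

```python
import random
from collections import defaultdict


def leave_one_family_out(families, n_train_families=None, seed=0):
    """Yield (held_out, train_idx, test_idx), holding out every row of one (N1, N2) family.

    If n_train_families is given (hypothetical knob), the training rows come from only
    that many randomly chosen other families, to mimic the "handful of families" regime.
    """
    rows_by_family = defaultdict(list)
    for idx, fam in enumerate(families):
        rows_by_family[fam].append(idx)

    rng = random.Random(seed)  # local RNG so the chosen subsets are reproducible
    for held_out, test_idx in rows_by_family.items():
        others = sorted(f for f in rows_by_family if f != held_out)
        if n_train_families is not None:
            others = rng.sample(others, min(n_train_families, len(others)))
        train_idx = sorted(i for f in others for i in rows_by_family[f])
        yield held_out, train_idx, test_idx


# Toy per-row family ids (N1, N2); in practice, one entry per training row
families = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1)]
for held_out, train_idx, test_idx in leave_one_family_out(families, n_train_families=1):
    print(held_out, train_idx, test_idx)
```

Running the evaluation for several values of `n_train_families` gives a learning curve in the number of families, which is the quantity you are trying to shrink.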

graceful axle
#

Hey! Is anyone participating in the Meta Kaggle Hackathon competition?

cyan oar
#

I have a question regarding LLMs.

I'm a bit confused about which path to choose for the long term, as both have great potential for work. On one hand, building models from scratch, including pretraining, optimization, and evaluation, offers a lot to learn. On the other hand, working with RAG, vector databases, agents, LangChain, and Hugging Face also seems promising. I want to focus deeply and excel in whichever path I choose.
Given the current job market and opportunities around the world, which path should I go for?

stoic elm
#

Hey @everyone! I recently started learning machine learning. If any expert in this field can guide me, it would be a great help. I want to know how and where to learn it, as there are tons of resources and I can't decide which one to pursue. My end goal is to be a data scientist. (Don't mind my grammar, please.)

cyan oar
stoic elm
#

Currently I am learning from Krish Naik and Stanford's CS229.

cyan oar
fathom lava
#

I tried everything I could to debug it. Is there anything else I can do? Any help would be greatly appreciated.

strong jolt
#

Is it normal to get this kind of speed when adding an input from a dataset (one that was created from a notebook's output)?

#

The dataset is 4 GB T_T

devout pilot
#

lmao

trim atlas
#

I was running a huge notebook that lagged my computer when I opened it. After that I couldn't run it again, and even after deleting it I couldn't create a new notebook, import a data source, run my other notebooks, or do anything else. I thought it was my fault and started to panic at 1 AM; I even linked my Kaggle account with Discord just to find some support here. After 30 minutes of trying everything I could, I saw an official message about site-wide availability issues. So I'm not the only one experiencing this, right?

stone shadow
#

I wrote a retrieval system using uv with a lot of custom models, but it has to run offline for the code competition. I'm wondering whether there are other solutions besides packaging the whole environment. Installing from wheels fails with a build error for one specific package, and there are several other pitfalls.

In the end, I did this (it works, but it costs a lot of time):

  1. Upload all models and files as a dataset
  2. Build a .venv in an Internet-enabled notebook
  3. Extract the .venv and run the test data in the competition submission.
peak skiff
#

Hi guys, I need help. I'm trying to train an ML model for melanoma classification. I used the ISIC dataset with 9 classes, but it is highly imbalanced. I tried data augmentation and several other ways to reduce overfitting, but I can't get accuracy above 60%. I need at least 85% accuracy for my project, and I have little time left.

I'm using free Google Colab right now and can't use a dataset with more than 8,000 samples. My RAM is 8 GB. I want to try binary classification and am working on it now. Do you have any advice? This is really important for me.
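
With 9 imbalanced classes, plain accuracy mostly rewards the majority class. A common first step is to weight the loss by inverse class frequency. This stdlib sketch uses the same "balanced" formula as scikit-learn's `compute_class_weight`, n_samples / (n_classes * count); the label names and counts are hypothetical. Keras's `model.fit` accepts such a mapping through its `class_weight` argument, keyed by integer class index rather than by name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical, heavily skewed label distribution
labels = ["nevus"] * 6 + ["melanoma"] * 2 + ["bcc"] * 2
weights = balanced_class_weights(labels)
print(weights)  # rare classes get weights above 1, the common class below 1
```

With these weights every class contributes the same total weight to the loss; tracking balanced accuracy or per-class recall alongside plain accuracy shows whether the rare classes are actually improving.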

thin raptor
cunning sundial
#

Is it Legal to Share Public Instagram Profile Data as a Dataset on Kaggle?

short haven
#

I am having a problem in Power Query. Will anyone help me?
My order date is in text format, and there are no null or error values, but when I try to convert it from text to date, it creates a lot of error values.
I tried to get help from ChatGPT, but it didn't work.
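
A frequent cause of this is a day/month order mismatch between the text and the locale used for the conversion: day-first text parsed as month-first fails whenever the day is above 12, and silently swaps day and month otherwise. This stdlib Python sketch, with hypothetical dates, reproduces the effect; in Power Query, the Change Type → Using Locale option should let you state the text's actual format.

```python
from datetime import datetime


def parse_dates(texts, fmt):
    """Parse each text date with `fmt`; return (parsed, failures) so bad rows are visible."""
    parsed, failures = [], []
    for text in texts:
        try:
            parsed.append(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            failures.append(text)
    return parsed, failures


order_dates = ["03/04/2024", "13/04/2024", "28/02/2024"]  # hypothetical day/month/year text

# Month-first: "13" and "28" are not valid months, and "03/04" is silently read as March 4
print(parse_dates(order_dates, "%m/%d/%Y"))
# Day-first, matching how the text was written: all three parse
print(parse_dates(order_dates, "%d/%m/%Y"))
```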

craggy goblet
#

I am somewhat of a beginner. I have submitted to 3 or 4 competitions (like Titanic and House Prices) with mediocre rank and accuracy. My questions are:

  1. What sorts of competitions should I participate in?
  2. How do I get better, or learn how to get better?
dapper yoke
# craggy goblet I am somewhat of a beginner I have submitted in like 3 to 4 competitions (like t...

This will entirely depend on which field of AI/ML you want to go into: GenAI, computer vision, etc. I'm interested in generative AI, so I generally look for competitions that focus primarily on generative AI, like the Gemma 2 competition. To get better, just keep learning and stay open to new opportunities to gain more knowledge. That is how I got better, at least.

craggy goblet
#

I mean basic machine learning competitions like Titanic and House Prices: tabular datasets where you do regression or classification on numerical or categorical columns, evaluated with metrics like RMSE, MAE, or MSE. Since I'm a beginner, I'm avoiding time series, NLP, computer vision, neural networks, generative AI, and other deeper topics until my skills are at least intermediate, i.e., until I know what I'm doing in normal ML contests (Titanic and House Prices are examples, not the limit).
Here are a few questions I have:

  1. I have finished the StatQuest machine learning playlist, learned about data preprocessing, know how to use pandas, Matplotlib, seaborn, scikit-learn, and NumPy from various videos, and know about hyperparameter tuning. Is there anything else, theory-wise, I need to know to perform well in competitions?
  2. What differentiates a good score from a bad one? Is it knowing when and where to apply well-known preprocessing steps, which features to engineer, and which model to select, or does it require other knowledge? Basically, do I need to learn more theory to perform better, or do I need to learn when to apply what I already know?
  3. How and when should I decide to specialize in a certain field of AI/ML, and how do I get information about these fields?
uneven marsh
#

Hi there, I am new to Kaggle and to AI in general. I am looking forward to learning how to code AI. I hope to use this knowledge to make my own AI models and apply them to things I do in my free time: robotics, drones, things of that nature. However, when I try to understand information from the internet, I get lost immediately. I really want to get a solid grasp of this area of programming. I am certified in Java, as I recently passed the Certiport IT Specialist certification for Java. I know some Python too, since I took a course on it in school last year. But I'm still confused.

finite bridge
#

hi kagglers,

When you first enter competitions, do you have to do a lot of research to gain domain knowledge? In past competitions, the winners' code all shows that they probably had a really good understanding of the competition's domain. Do they just know this because it's their profession, or do they spend a lot of time researching?

bleak anvil
nova rain
#

Hey y'all, I'm Prana. I'm new to Python; my background is in RStudio. There's a big knowledge gap even though I have a master's. Give me a month or two and I should be engaging more in the rooms. For the intermediate and expert folks: are there any Discord groups or other platforms you'd recommend a beginner like me join? I just need people to bounce ideas back and forth with, on topics like which LLMs they use and what they've learned, in a no-judgment, no-ignoring zone. You can DM me.

bleak anvil
#

Hey @nova rain 👋

I only have a bachelor's in physics and pure math—ended up dropping out of my master’s in pure math due to a mix of financial and personal reasons. Honestly though, degrees and titles only go so far. They're supposed to be keys, but it’s really the network and the collaborative energy that makes things move.

What helped me most was bouncing ideas around with others and building multi-agent LLM constructs together. A lot of gaps in my understanding got patched just through collective exploration—kind of like debugging reality with friends. 😄

What kinds of ideas or frameworks have been catching your attention lately?

fresh forge
#

Hi, I want to select the features that are related to Risk level.
But when I use chi2, I get 10 columns that are related to Risk level, and when I check them with corr,
I find no relationship with Risk level at all.
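
The two checks answer different questions: chi2 tests for any dependence between (non-negative or categorical) features and the class, while Pearson correlation only measures *linear* association and can be misleading when the target is a label-encoded category. A stdlib sketch with toy data: y is completely determined by x, yet the correlation is exactly zero.

```python
def pearson(xs, ys):
    """Pearson correlation: measures only *linear* association between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]   # y depends entirely on x, but not linearly
print(pearson(xs, ys))     # 0.0: no *linear* relationship, despite full dependence
```

For nonlinear or categorical relationships, mutual information (scikit-learn's `mutual_info_classif`) is a check that is more comparable to chi2 than Pearson correlation is.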

rugged juniper
rugged juniper
rugged juniper
hearty roost
#

Hello! Can anybody help me with this?

import glob

import dask.dataframe as dd
from dask import delayed
from fastparquet import ParquetFile

files = glob.glob('/kaggle/input/hms-harmful-brain-activity-classification/train_eegs/*.parquet')

@delayed
def load_chunk(path):
    # Read one parquet file into a pandas DataFrame (runs lazily, only when computed)
    return ParquetFile(path).to_pandas()

df_train = dd.from_delayed([load_chunk(f) for f in files])

# Note: .compute() materializes every file into one in-memory pandas DataFrame
df_train.compute()

gritty nymph
#

How can I change my name?
It would be better if my display name were in English, or at least in the Latin alphabet.

tired venture
#

My password is correct, but I can't log in to the Kaggle API from an Azure notebook. Why?

raven pike
#

Has anyone done MNIST using AlexNet? I need help with it.

cursive owl
rigid compass
#

Hey, I want to learn machine learning. Where can I learn it? Any suggestions?

ivory wasp
#

I've been having problems with my submission to the Prediction Interval Competition II: House Price. Even though my submission file is generated at the end, when I try to save my version it says there is no output. Can anyone help me figure out why?

primal palm
#

Hello everyone,

I’m a fourth-year AI major at Damascus University.

Over the past three months, I’ve been following the AI learning path on DeepLearning.AI. I’ve already completed the first course in the specialization and am currently working through the second.

As the concepts have grown deeper and more varied, I’ve realized that I’m not yet flexible enough with coding: I often struggle to understand existing code and to write my own. This stems from the limited problem-solving experience I gained during my earlier years at university.

I’m now looking for ways to strengthen my coding knowledge and practice while I continue these courses.

Can anyone help with that?

sleek frigate
#

Can someone help me go through a research paper? It's the 2023 paper on BitLinear.

regal rock
#

So I am a high school junior (11th grade) who has just started summer break. I have some free time, and I wanted to ask how I should start learning ML and become competitive in competitions. Since most of you are professionals, I knew your advice would be a great help! What should I do, and how should I start?

wraith sparrow
crystal wing
regal rock
#

I was wondering if there are any specific resources or books.

#

But courses are also fine if they help me reach my goals.

wraith sparrow
frigid shadow
#

🔬 AI Hardware Research - Need Your Input!
Hey everyone! I'm doing research on AI/ML hardware accessibility in India and would love to get insights from this community.
Quick backstory: GPU prices in India are brutal, cloud costs add up fast, and there might be a gap for more affordable AI-focused solutions.
3-min survey about:

  • Your current setup and pain points
  • Budget constraints and preferences
  • What you'd want in an ideal AI GPU
  • Cloud vs. local experiences

Drop your thoughts in the survey: https://forms.gle/UHGh1kzK1p9uiJSb9
I'll share the results here once I have enough responses. Your input could help identify real solutions for our community! 🙏
Feel free to share with other AI folks who might have insight

hallow herald
#

Are we allowed to vibe-code or use ChatGPT to write the code for hackathon projects? Do judges typically care how the code was written?

#

please ping me

haughty helm
#

I am a BSCS student, and I’m starting my Final Year Project (FYP), which I need to complete within the next 1.5 years. I’m working alone on this project and would like to build something related to Artificial Intelligence or Machine Learning. Although I’m a beginner in AI/ML, I’m a fast learner and actively working to improve my skills in this area. I would be very grateful if you could suggest some suitable FYP ideas along with a brief description of each.

rugged juniper
haughty helm
flint gust
# haughty helm I am a BSCS student, and I’m starting my Final Year Project (FYP), which I need ...

I'd just browse the web, searching for anything that interests me, and try to find the information I'd personally want to know. If I were you, I'd never start a project based on a suggestion from some random person on Discord. It's a big deal, and you'd better approach it creatively and proactively. It's OK not to know, but try to work out your own system of inspiration: a documentary, a book. There are definitely tons of questions that you could answer.

haughty helm
pure raft
#

Does anybody know whether, for the new Kaggle Gemma competition, the project needs to be open source, or only available for the judges to review?

glossy crown
digital bobcat
#

Hi all, after learning some ML theory and doing some toy data science projects, I applied for internships. Recently I got a callback, and they gave me a take-home assignment for a predictive modelling task. The thing is, the dataset they provided is too large for me: I haven't worked with a dataset this big before, and I use Google Colab. The shape of the dataset is (167020, 217).
Can anyone help me figure out how to approach and solve this problem? All I'm asking is that you take a look at the dataset and suggest what to do, step by step; I will do the work myself. If you are willing to help, let's connect.
Thank you

dawn ravine
#

How much RAM and VRAM does the GPU T4 x2 option have in total?

#

Does the GPU P100 have more?

#

Or the TPU VM v3-8?

analog hornet
#

I started learning ML, and I've already done the Titanic competition. Any recommendations for what to do next? I'm still a beginner, so I want to do an easy one.

analog hornet
dull tapir
#

hello

#

Can somebody help me?

#

What should I do to resolve this?

naive pike
#

Try it in Jupyter Notebook.

dull tapir
dull tapir
#

OK, I found the problem.

#

The !pip install torch line of my code

#

didn't run properly.

rustic carbon
#

Is 80% accuracy for a random forest classifier good?
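
Whether 80% is good depends on the baseline: on imbalanced data, always predicting the majority class can already score that high. A stdlib sketch, with hypothetical labels, of the trivial baseline worth comparing against:

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


y_true = [0] * 85 + [1] * 15               # hypothetical, imbalanced test labels
print(majority_baseline_accuracy(y_true))  # 0.85: an 80% model would be worse than this
```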

brittle cave
#

Hi guys, probably a stupid question: if I import a model from Hugging Face, can I use it offline?

naive pike
wraith sparrow
timber turtle
#

Hey guys, I'm currently working on OCR, and I have a question.

#

I couldn't extract "Good Morning" or any other text from the first image, even though I can extract text from the second image with good accuracy.

#

I'm currently using Tesseract. Do you know why this happens?

#

Can you suggest some other approaches so as to extract "Good Morning" from the first image?

sonic citrus
#

Hey guys. I know I'm 3 months late, but I wanted to finish the 5-Day Gen AI Intensive course. Is it still possible to get the certificate even though I am late? The last email from Google Event said, 'The badges and certificates will be added to your profile by the end of April 2025.'
It would mean a lot if anyone could answer.

wraith sparrow
short haven
#

Look at the photo I attached in the post. There are two columns of dates and times, and I want to extract the difference between them, like "3 days 16 hours".
How can I do that in Excel Power Query?
I tried to get help from ChatGPT, but it didn't work.
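
The underlying computation is a datetime subtraction. In Power Query, subtracting one datetime column from another should give a duration, whose parts `Duration.Days` and `Duration.Hours` extract. The same logic in Python's standard library, as a sketch with hypothetical timestamps and assuming the end is never earlier than the start:

```python
from datetime import datetime


def format_gap(start, end):
    """Return end - start as 'X days Y hours' (minutes are dropped; assumes end >= start)."""
    delta = end - start                 # a timedelta
    hours = delta.seconds // 3600       # .seconds is the part of the gap below one full day
    return f"{delta.days} days {hours} hours"


order_time = datetime(2024, 5, 1, 8, 0)     # hypothetical first column value
delivery_time = datetime(2024, 5, 5, 0, 0)  # hypothetical second column value
print(format_gap(order_time, delivery_time))  # 3 days 16 hours
```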

mystic solar
#

Hey, I am a student and we have a class in data analytics. I don't really get how I can improve my accuracy with logistic regression, random forest, KNN, etc.
Can someone help me with that, please?

vast yarrow
#

Hello.
I am having some trouble with my local experiment and would like some help.
I want to analyze the transcription of recorded conversation audio data, so could you recommend a model for that?
Considering that this involves recording conversations, I would like to identify the speakers as well.
Can you suggest a model for various experiments?

#

I would like to gather opinions from people who have participated in competitions or analyzed audio data that includes conversations in the past. Also, if there is any recommended external data for training, please let me know.

deep tinsel
#

Are we allowed to share GitHub or Streamlit Cloud URLs, for example to share scripts or apps, or are only Kaggle URLs allowed?

paper veldt
#

Hello World! I have a problem uploading a YouTube link in the Gemma 3n competition. I'm trying to submit a valid URL for my YouTube video, but I get an "Invalid URL" error, even though the URL is correct!
In the site's videoUtils.js script there is an error in this line:
fetch("https://youtube.com/oembed?url=".concat(url, "&format=json}")).then(function(res) {

The "}" after "json" is unnecessary.

Please fix it, because it's impossible to upload a video and take part in the competition.
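
For comparison, a Python sketch of building that oEmbed request URL; the endpoint is taken from the snippet above and the video URL is hypothetical. This is an illustration, not Kaggle's actual code. Encoding the parameters rather than concatenating strings keeps `format` exactly `json`, and it also percent-encodes the nested video URL, whose own `?` and `&` would otherwise leak into the outer query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OEMBED_ENDPOINT = "https://youtube.com/oembed"  # endpoint taken from the snippet above


def oembed_request_url(video_url):
    """Build the oEmbed lookup URL with properly encoded query parameters."""
    # urlencode percent-encodes the nested video URL (its '?', '&', '=' and '/'),
    # so they cannot leak into the outer query string, and 'format' stays exactly 'json'.
    return OEMBED_ENDPOINT + "?" + urlencode({"url": video_url, "format": "json"})


video = "https://www.youtube.com/watch?v=abc123&t=10s"  # hypothetical video URL
request_url = oembed_request_url(video)
print(request_url)
# Decoding the query string recovers both parameters intact
print(parse_qs(urlsplit(request_url).query))
```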

wicked ivy
#

Subscribe to this channel to see the latest updates and learning videos: https://www.youtube.com/@AuraaiX/videos

west palm
#

Hey guys, I am seeking recommendations for an impactful AI/ML project that would strongly appeal to product-based companies when they are hiring. The goal is to maximize my chances of getting a job.
Please suggest something ASAP :)

barren path
#

Can anyone explain what JSON really is???
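
Short version: JSON is a plain-text format for structured data, built from objects (`{...}` of key/value pairs), arrays (`[...]`), strings, numbers, `true`/`false`, and `null`. A stdlib Python sketch with a made-up record, showing the round trip between JSON text and Python objects:

```python
import json

# JSON text (just a string), e.g. the body of an API response or the contents of a .json file
text = '{"name": "example", "rows": 100, "tags": ["csv", "tabular"], "public": true, "owner": null}'

data = json.loads(text)   # object -> dict, array -> list, true -> True, null -> None
print(type(data).__name__, data["tags"][0], data["owner"])  # dict csv None

data["rows"] += 1
out = json.dumps(data, indent=2)  # Python objects -> JSON text again
print(out)
```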

frosty heron
#

Hi, I recently made a notebook with Plotly graphs. They show perfectly in Editor mode but don't show in Kaggle's notebook viewer, even when I use .show(renderer='iframe').
Notebook: https://www.kaggle.com/code/stevensio/kaggle-journeys-cohorts-and-competition-shifts

I tried to look at it on my girlfriend's macbook and it shows, but somehow, the graphs don't show when viewed on my windows laptop. Could this be an OS issue?

Would really appreciate if anyone could find the issue in my code

pseudo musk
dense wolf
#

Hello guys, I'm trying to dynamically update my second dropdown (car models) based on the selected company, using JavaScript and Jinja2 inside a Flask app. I've built the Jinja2 dictionary and the script looks correct, but the dropdown doesn't update when I select a company. What could be causing this? Am I missing something about how Jinja2 and JavaScript work together?

I have also cleaned my data, and my model is ready to predict, but I'm stuck here.
I'm following a tutorial, but it's not helping; it's not actually working.

The script is below:
<script>
// Rebuild the car-model dropdown for the currently selected company.
// Call it from the company <select>'s onchange handler, e.g.
// onchange="load_car_models(this.id, 'car_model')"  (element ids are assumptions)
function load_car_models(company_id, car_models_id)
{
    var company = document.getElementById(company_id);
    var car_model = document.getElementById(car_models_id);
    car_model.value = "";
    car_model.innerHTML = "";

    {% for company in companies %}
    if (company.value == "{{ company }}")
    {
        {% for model in car_models %}
        {% if company in model %}
        // One <option> per model of this company (previously the quotes were
        // broken and each option was added twice)
        var newOption = document.createElement("option");
        newOption.value = "{{ model }}";
        newOption.innerHTML = "{{ model }}";
        car_model.options.add(newOption);
        {% endif %}
        {% endfor %}
    }
    {% endfor %}
}
</script>
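
An alternative that avoids generating one `if` block per company: build the company-to-models mapping once in Python, pass it to `render_template`, and embed it in the page with Jinja's `tojson` filter (for example `const modelsByCompany = {{ mapping | tojson }};`), so the JavaScript only needs to read `modelsByCompany[company.value]`. A minimal sketch of the Python side, assuming the same list inputs as the template; the company and model names are made up. It keeps the template's substring rule, so a company whose name is contained in another's would pick up extra models.

```python
import json


def models_by_company(companies, car_models):
    """Map each company to its models, using the template's own 'company in model' substring rule."""
    return {company: [m for m in car_models if company in m] for company in companies}


# Hypothetical data standing in for the cleaned dataset's columns
companies = ["Hyundai", "Maruti"]
car_models = ["Hyundai Santro", "Hyundai i20", "Maruti Swift"]

mapping = models_by_company(companies, car_models)
# Roughly what {{ mapping | tojson }} would embed (tojson also escapes HTML-sensitive characters)
print(json.dumps(mapping))
```
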
timid jungle
#

Are there any spaces to discuss ML-related questions as a student? Like a community of students in the same field or something?

wraith sparrow
#

Is ByteDance Seed1.5-VL by far the best image-understanding model?

#

And for edge devices, Florence-2?

timid jungle
#

When dealing with time series data, how do we know whether there is serial dependence? Is it a question of domain knowledge, or should we use methods like lag features and time-step features each time to check?
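
A common data-driven check is the autocorrelation function: correlate the series with lagged copies of itself and see which lags stand out from the rough ±1.96/√n band expected for white noise (statsmodels' `plot_acf` and `plot_pacf` draw this for you). A stdlib sketch with toy series:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`: how strongly the series co-moves with itself `lag` steps later."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the rough white-noise band of ±1.96/sqrt(n)."""
    band = 1.96 / len(series) ** 0.5
    return [k for k in range(1, max_lag + 1) if abs(autocorrelation(series, k)) > band]


trend_like = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # toy series with strong serial dependence
alternating = [1, -1] * 5                      # toy series that flips sign every step
print(round(autocorrelation(trend_like, 1), 3), significant_lags(trend_like, 3))    # 0.7 [1]
print(round(autocorrelation(alternating, 1), 3), significant_lags(alternating, 3))  # -0.9 [1, 2, 3]
```

The band is only an approximation, and a trend by itself inflates autocorrelation at every lag, so it usually helps to detrend before reading the plot.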

raven canopy
#

Hello, just wondering if there are any tips for clearing memory from RAM and the GPU? I've been running into out-of-memory issues using TensorFlow. I tried using a generator to reduce RAM usage, and I also tried tracking and deleting every dataset-related variable and model variable, along with calling gc.collect() and K.clear_session(). However, I've noticed that, for whatever reason, the data I pass to model.fit() just sticks in RAM. I tried deleting the generator with del and assigning it to None too, and I can't seem to get rid of the lingering reference.

I'm attempting a k-fold run, but because of this issue it looks like I'd have to restart the kernel every time? I also tried running each fold from a script in a subprocess, but it doesn't seem to change anything. Thanks in advance for any replies!

half widget
#

Hey everyone! Trying to build a recommendation system for products for a deal website.

We have a few heuristics of how good a deal it is and how popular a product is (deal score and pop score)

Our current deal gallery just optimizes for these heuristics. However, we're running into issues with product diversity: certain categories of products are discounted more frequently and thus appear in the results more often.

Our data is stored in Postgres, but we load it into Polars (in Python). We also have Milvus set up as a vector DB, so all our product titles are vectorized as well. Wondering if anyone has any ideas on how to solve this diversity issue.

I'm pretty new to ML/data science, but one idea was k-means clustering based on the titles or vectors? Or maybe that's too naive an approach.
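
A simple, commonly used fix is greedy re-ranking with a per-category penalty: each time an item is picked, the remaining items from the same category are discounted. A stdlib sketch under stated assumptions: each item is an (id, category, score) tuple, where score is whatever blend of deal score and pop score you already rank by, and the category could be an existing category field or a k-means cluster id over the title vectors. `penalty` is a hypothetical knob. Maximal Marginal Relevance (MMR) generalizes the same idea using embedding similarity instead of hard categories.

```python
from collections import Counter


def diversify(items, k, penalty=0.5):
    """Greedy re-rank of (item_id, category, score) tuples for a more varied top-k.

    An item's effective score is score * penalty ** (items already picked from its category),
    so each pick makes the rest of that category less attractive.
    """
    remaining = list(items)
    picked_per_category = Counter()
    result = []
    while remaining and len(result) < k:
        best = max(remaining, key=lambda it: it[2] * penalty ** picked_per_category[it[1]])
        remaining.remove(best)
        picked_per_category[best[1]] += 1
        result.append(best[0])
    return result


# Hypothetical gallery: electronics dominate the raw scores
items = [
    ("tv1", "electronics", 0.95),
    ("tv2", "electronics", 0.93),
    ("tv3", "electronics", 0.90),
    ("mug", "kitchen", 0.70),
    ("tent", "outdoor", 0.60),
]
print(diversify(items, 3))               # ['tv1', 'mug', 'tent']
print(diversify(items, 3, penalty=1.0))  # ['tv1', 'tv2', 'tv3'] (no diversification)
```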

primal trout
#

Hi guys, it's my first time discovering this machine learning world! I have a question though: has anyone made any money using this?

hasty valve
vernal citrus
#

I am new to Kaggle... can someone tell me how to get started?

coral lava
primal trout
wraith sparrow
wraith sparrow
#

Anyone interested in video gen models?

keen abyss
#

Can anyone help me out? I am working on a data science project where I am doing license plate recognition.

I used YOLO and OCR to build the models.
I have datasets containing images of cars and also cropped images that show only the number plate.

The thing is, the number plates are in Urdu, and the OCR model isn't working properly: the numbers are being detected, but not the Arabic or Urdu text.

wraith sparrow
wraith sparrow
wraith sparrow
#

Does anyone know good books on text-to-video models?

wraith sparrow
#

Check both of them yourself; I can't remember.

final blaze
#

Hello everyone.
I am a beginner on Kaggle.
The problem below is from the lesson 2 exercise of "Intro to Machine Learning".
Why does this error occur, and how can I fix it?

Code:
import pandas as pd
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
step_1.check()

Error:
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_36/423311291.py in <cell line: 0>()
      8
      9 # Call line below with no argument to check that you've loaded the data correctly
---> 10 step_1.check()

NameError: name 'step_1' is not defined

final blaze
pseudo musk
final blaze
#

I set up the notebook by running the code cell that appears at the start of the exercise.
The setup code is below:

Set up code checking

from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex2 import *
print("Setup Complete")

#

Is this right?

final blaze
#

I just tried it again your way and moved forward with the practice.

pseudo musk
#

nice good luck

thick axle
#

Hey guys, I'm a full-stack engineer with 6+ years of experience in frontend and backend development, DevOps, databases (SQL + NoSQL), and AWS.

What's the best way to start learning AI and move into roles like AI Engineer, or to become capable of building AI solutions for different industries?

wraith sparrow
#

@noob_contrarian xAI partners with Azure for Grok's enterprise-grade scalability, security, and content safety features, essential for reliable global deployment. While RunPod is cheaper for basic GPU rentals, it lacks Azure's robust infrastructure and integration, making it unsuitable for our

coral lava
rancid terrace
# thick axle Hey guys, as a full stack engineer who have more than 6+ years of experience wit...

Learn linear regression and logistic regression first; they're the basis for any neural network. Then it's up to you whether to learn other machine learning models or jump directly to the perceptron (the basic unit of a neural network).

Then understand data preprocessing.

Understand neural networks.

Later, to understand language processing, study NLP.

Understand RNNs, LSTMs, and GRUs.

After understanding all this learn CNNs for image processing

After this you are ready to understand LLMs

Learn transformer architecture
Learn BERT model architecture

Learn to use LangChain and Hugging Face.

Make RAG applications,
like a PDF summarizer,
and caption generators.

Learn to run LLM inference.

After all this, learn to fine-tune models on custom datasets (requires a GPU).

Learn LoRA/PEFT fine-tuning and model quantization.

After this, learn to use LangGraph and connect LLMs to tools like SQL data retrieval and web search.

You will understand agentic ai

For more advanced topics like DeepSeek-R1, you have to learn reinforcement learning.

Learn prompt engineering for prompting the LLM.

Build an LLM from scratch, or deploy fine-tuned models on Hugging Face for specific tasks.

For learning all this you have two frameworks: TensorFlow and PyTorch.

For LLMs/AI, about 85% of newly published papers use PyTorch, so I recommend going with PyTorch.

For web applications you can also use JavaScript, but the Python ecosystem is more mature, so going with Python gives you more advantages.

Finally, you can also learn about MCP servers to connect AI to the internet.
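
The first roadmap step connects to neural networks because logistic regression can be read as a single neuron: a weighted sum passed through a sigmoid, trained by gradient descent on log loss. A stdlib sketch on a toy, linearly separable dataset (the AND function, chosen only for illustration):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """One 'neuron': weighted sum + sigmoid, trained by full-batch gradient descent on log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log loss with respect to the pre-activation
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / len(xs)
            grad_b += err / len(xs)
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Toy AND data: output is 1 only when both inputs are 1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

Stacking many such units, with other activation functions, is what turns this into a neural network.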

#

I'm not a professional, but I hope this is helpful to you.

thick axle
rancid terrace
# thick axle Wow that looks interesting, However, I was thinking, Is there any free resource ...

Simple steps:

Learn regression, because neural networks build on it.
Skip the other ML models; you don't need them for development.

Learn neural networks, activation functions, loss functions, and optimizers. We need these for LLMs too, when fine-tuning or training an LLM on custom data.

You can skip RNNs and LSTMs, but they give you a starting understanding of how natural language is processed.

And to get language data ready for a machine learning model, we need to understand how NLP works.

So NLP knowledge is important, like knowing what tokenization is.

Learn the transformer model to understand how LLMs work and how they were made, then simply move on to development.
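
To make "tokenization" concrete, a toy word-level tokenizer and vocabulary in stdlib Python. This is only an illustration: real LLM tokenizers use subword schemes such as BPE or WordPiece (for example via the Hugging Face tokenizers), but the idea of turning text into integer ids is the same.

```python
import re


def tokenize(text):
    """Lowercase and split into word and punctuation tokens (a toy word-level tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(texts):
    """Map each distinct token to an integer id; id 0 is reserved for unknown tokens."""
    vocab = {"<unk>": 0}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn text into the list of ids a model would actually receive."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]


vocab = build_vocab(["LLMs read tokens.", "Tokens become ids!"])
print(tokenize("LLMs read tokens."))   # ['llms', 'read', 'tokens', '.']
print(encode("LLMs read ids?", vocab))  # [1, 2, 6, 0]  ('?' was never seen, so it maps to <unk>)
```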

#

Use the LangChain and Hugging Face documentation for help.

#

Resources: YouTube channels, Medium blogs, Hugging Face, LangChain, Udemy courses.

thick axle
#

@rancid terrace Gotcha, Here is the summary I understood:

  • With my current knowledge, I am ready to start learning about regression. So my first step is to learn what regression is, nothing else for now.

  • The second step is to learn neural networks, activation functions, loss functions, and optimizers.

  • Then learning RNNs or LSTMs is optional (I will definitely learn them).

  • Then I need to start learning NLP (natural language processing).

  • Finally, I need to learn the transformer model to understand how LLMs work.

  • Now I am ready to start development. This is Part 2 of my roadmap, where I get my hands dirty.

  • Start with the LangChain and Hugging Face documentation.

Please feel free to correct me If I am wrong. Thank you brother!

rancid terrace
#

Everything is fine

#

Best of luck

#

Focus mostly on development

#

Understand concepts like regression, but don't start building from scratch.

#

LangChain and Hugging Face handle everything,

#

like NLP, neural networks, etc.

wraith sparrow
bronze lotus
#

Does anyone here know how to integrate the results of a feedback survey web app directly into a model's pipelines, so it can reference the survey's data and learn from it without a middleman doing it?

wraith sparrow
#

I don't get how to log in to Hugging Face with the HF CLI in order to confirm the access I was granted to use HF models.

subtle gulch
rancid terrace
# subtle gulch you are a cool dude

Just like the book "How to Win Friends and Influence People" says:

If you want to be a great leader, first learn to understand what others need and what you can provide to help them.

remote granite
#

that is so good.

rancid terrace
#

Please stop spamming, dude. We don't care what MrBeast is doing (he isn't providing us GPUs for LLMs).

rugged anchor
rancid terrace
#

He asked me mostly about development.

rugged anchor
#

I want to become an AI engineer, and applied knowledge of generative AI will help.

#

I asked one of my seniors, and he gave me this list:
Statistics
Linear algebra
Calculus
Pandas, NumPy, Polars
Data handling and visualization
scikit-learn, XGBoost, AdaBoost, linear regression
Support vectors, kernels, hyperplanes
SVM, random forest, ensemble methods
Bagging, boosting
Basic DL
When to use what
NLP preprocessing trade-offs: NLTK vs spaCy
Tokenization
Transformers: why they're different
Prompting
RAG: how it works
Vector DBs and indexing
Azure, AWS, GCP
Agentic AI: LangChain, LangGraph
Pipelines, memory, agent orchestration
Fine-tuning
LLMs: which, what, when, why (any 3 architectures)
Projects: problem and solution

He told me to concentrate on this first

mortal swallow
#

guys, what should you do if your submissions keep getting Kaggle errors?

rancid terrace
rancid terrace
#

They are basically used to make predictions on a given dataset.

#

For AI engineering, learning regression in detail is sufficient. Then learn deep learning with NLP (tokenization, etc.), and later understand the encoder-decoder architecture, the attention mechanism, and the transformer architecture.
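The attention mechanism mentioned here boils down to a short formula from the original Transformer paper, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of single-head scaled dot-product attention, with no masking, batching, or learned projection matrices; random matrices stand in for real queries, keys, and values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention without masking: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (n_queries, n_keys)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights                            # output: (n_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query positions, dimension 4
K = rng.normal(size=(5, 4))   # 5 key positions, same dimension as the queries
V = rng.normal(size=(5, 2))   # one value vector (dimension 2) per key

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)  # (3, 2)
```

Each row of `weights` says how much each query position "attends" to each key position; a transformer runs several of these heads in parallel on learned projections of the same input.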

Learn generative AI, LangChain, LangGraph, agentic AI, MCP servers, RAG applications, chatbots, computer vision, reinforcement learning, etc.

#

For AI engineering, go with PyTorch, because it's currently the most popular framework and most research is done in PyTorch.

#

For production-ready deployment and an easy structure, go with TensorFlow.

#

🙂

rugged anchor
mortal swallow
rancid terrace
#

Try again after some time.

mortal swallow
rancid terrace
#

There are many YouTube channels that also teach something like neuro.. something

#

3Blue1Brown

rugged anchor
#

Thank you Naman

#

It helps me a lot.

upbeat sluice
#

Hi, I am new to Kaggle and had a question about the submission criteria. What does it mean that my submission has to have a runtime of less than 11 hours? Does it mean that all the inference has to be completed in less than that?

And also the no-internet rule: if my code relies on packages like nibabel or, idk, torch, how does the no-internet rule work?

Thank you for your help

wraith sparrow
#

doesn't LMArena come with tools?

fluid needle
#

Hello everyone, I am new to Kaggle. Can anyone explain how to participate in a Kaggle hackathon?

zenith matrix
#

Hi, I was excited to participate in the code-golf-2025 tournament, but I just saw that my country (Venezuela) is blocked. Will I really not be able to compete because of my nationality?

supple crane
#

I am working on the RSNA Intracranial Aneurysm Detection competition and I have a decent graphics card on my home computer. I have been training models for the past 24 hours on the full dataset and getting these values:

[Epoch 1] Avg Loss: 1.3610 | Val Weighted AUC: 0.5800
[Epoch 2] Avg Loss: 1.3071 | Val Weighted AUC: 0.4637
[Epoch 3] Avg Loss: 1.3052 | Val Weighted AUC: 0.5383
[Epoch 4] Avg Loss: 1.3069 | Val Weighted AUC: 0.6006
[Epoch 5] Avg Loss: 1.2967 | Val Weighted AUC: 0.5206
[Epoch 6] Avg Loss: 1.3007 | Val Weighted AUC: 0.4524
[Epoch 7] Avg Loss: 1.3029 | Val Weighted AUC: 0.5068
[Epoch 8] Avg Loss: 1.3083 | Val Weighted AUC: 0.5367

I'm curious to know other ways to make my training faster. My computer isn't working terribly hard; it's using most of its RAM, but those 8 models have taken about 24 hours to complete. Can anyone help me out?

quasi smelt
#

Hi! Our team is participating in the CMI – Detect Behavior with Sensor Data competition.
We’ve enabled “Always use latest version” in our Kaggle notebook, but some teammates still see older versions when editing.

Just to clarify: Is it against the rules to use GitHub for collaborative EDA and model training only, without exposing any test data?
We’re trying to keep things safe while managing our workflow.

Appreciate any clarification — thanks!

hasty valve
#

For hyperparameter tuning, what are the go-to methods? I've been using GridSearchCV, but came across Optuna and it seems a lot better.

iron ingot
#

Hi everyone! I’ve recently started diving deeper into the world of machine learning. Before that, I was mainly into mathematics — particularly convex analysis and tensor optimization.

However, no matter how much I look around, it seems like most jobs/projects only value the ability to use existing libraries and build models like Lego blocks from pre-made components.

Do you know of any areas or projects where deeper mathematical knowledge like this is actually useful or appreciated? I’m starting to feel like it doesn’t really matter, which is a bit disheartening😔

subtle gulch
iron ingot
# subtle gulch why dont you do researchs then?

Because research can take years, and there’s no guarantee it will lead to success. That’s why I’m trying to focus more on finding jobs or joining projects that can bring a more stable or predictable income.

upbeat sluice
white glacier
#

Has anyone got any good resources for linear mixed models?

vestal trail
#

Good evening. I seem to have some difficulty accessing the data for the DFL - Bundesliga Data Shootout. Could someone reading this help me out? Thanks in advance.

subtle gulch
dawn nova
#

Hello everyone, I am a complete beginner. I did a few ML courses and wanted to do my first project, so I chose Kaggle's House Prices - Advanced Regression Techniques dataset, but there is one thing in this project that has been bothering me A LOT.

#

The problem I was having was: what counts as an empty cell? This seems trivial, but some columns use the string NA (not applicable) to represent the absence of something. For example, BsmtQual has the category NA for when the house doesn't have a basement. Others, like MasVnrType, have None as a category for when there isn't a MasVnr (whatever that is). And the empty cells, the ones that aren't filled in, are also represented by the string NA.

#

So what I wanted to do was keep the meaningful NAs and Nones (I say Nones, but there was only one None category, in MasVnrType) and impute the empty cells. This all seems easy enough, but it caused me a whole lot of trouble. Here is what I did:
houses = pd.read_csv("data/train.csv", keep_default_na=False, na_values=[""])
I used read_csv this way so that pandas doesn't "help me" by converting all the NAs and Nones into NaNs, because then how would I differentiate between an actual empty cell and an NA or None?
Then I did the usual stuff: the train/test split and the numerical/categorical split.
After that I did this:
meaningful_na_cols = ["Alley","BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1","BsmtFinType2","FireplaceQu","GarageType","GarageFinish","GarageQual","GarageCond","PoolQC","Fence","MiscFeature"]

for col in meaningful_na_cols:
    X_train_cat[col] = X_train_cat[col].replace("NA", "NoNoneNo")
    X_test_cat[col] = X_test_cat[col].replace("NA", "NoNoneNo")

X_train_cat['MasVnrType'] = X_train_cat['MasVnrType'].replace("None", "NoNoneNo")
X_test_cat['MasVnrType'] = X_test_cat['MasVnrType'].replace("None", "NoNoneNo")
This created a unique category so that the meaningful NAs and Nones don't get imputed.

imp_cat = SimpleImputer(strategy="most_frequent")
X_train_cat = imp_cat.fit_transform(X_train_cat)
X_test_cat = imp_cat.transform(X_test_cat)
I then imputed, converted X_train_cat back to a CSV so I could view it, and this happened:

#

My MasVnrType had 4 categories. The data description says it has 5, but my training data only had 4, so fine. Then I used drop_first, so there should be 3 one-hot columns. Instead there are 4: 3 are meaningful types like Stone, cement, or None, but one is just NA. How did that get in there?

#

I just need help with CSV files and empty cells. Can someone please explain how an empty cell is represented in real-life datasets, and how pandas interprets an empty cell? Is it the NA string, the None string, NaN? Please help me out, I am so confused. I don't know where the problem happens: is it the CSV file that's formatted this way, is it pandas, or is it me?
cheers
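A likely explanation, assuming this is the standard House Prices train.csv: that file also writes genuinely missing values as the literal string NA (MasVnrType has a few such rows). With keep_default_na=False those cells stay the string "NA" instead of becoming NaN, the loop only renamed "NA" in meaningful_na_cols, and SimpleImputer looks for np.nan by default, so MasVnrType's "NA" strings were never treated as missing and ended up as their own one-hot column. The sketch below uses a tiny made-up CSV (not the real data) to show a column-by-column fix: rename the meaningful values first, then turn the leftover "NA" strings into real NaN before imputing. "Absent" is an arbitrary placeholder label.

```python
import io

import numpy as np
import pandas as pd

# Tiny made-up CSV mimicking the House Prices quirk: in BsmtQual "NA" means
# "no basement", but in MasVnrType "NA" means "value not recorded".
csv_text = """Id,BsmtQual,MasVnrType
1,Gd,BrkFace
2,NA,None
3,TA,NA
4,Gd,Stone
5,TA,BrkFace
"""

# Only truly empty cells become NaN; "NA" and "None" stay as plain strings.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

# Columns where "NA" is a real category: rename it to a placeholder label.
meaningful_na_cols = ["BsmtQual"]
for col in meaningful_na_cols:
    df[col] = df[col].replace("NA", "Absent")

# MasVnrType: "None" is a real category, but the leftover "NA" means missing,
# so convert it to an actual NaN that imputers will recognize.
df["MasVnrType"] = df["MasVnrType"].replace({"None": "Absent", "NA": np.nan})

# Most-frequent imputation, the same idea as SimpleImputer(strategy="most_frequent").
df["MasVnrType"] = df["MasVnrType"].fillna(df["MasVnrType"].mode()[0])

print(df)
```

In the real notebook the same idea applies: after protecting the meaningful "NA"/"None" values, convert the remaining "NA" strings in the other categorical columns to np.nan (SimpleImputer also accepts a string for missing_values, e.g. missing_values="NA") so the imputer actually sees them.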

limpid garnet
#

What does "plot_df = dataset_df.Transported.value_counts()" do?
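`value_counts()` counts how many times each distinct value appears in the `Transported` column and returns a Series indexed by those values, sorted from most to least frequent; the name `plot_df` suggests it is then plotted, for example as a bar chart. A minimal sketch with made-up data standing in for the Spaceship Titanic training set:

```python
import pandas as pd

# Made-up stand-in for the Spaceship Titanic training data.
dataset_df = pd.DataFrame({"Transported": [True, False, True, True, False, True]})

# One row per distinct value, holding how often it occurs (largest count first).
plot_df = dataset_df.Transported.value_counts()
print(plot_df)
# plot_df.plot(kind="bar")  # typical next step; requires matplotlib
```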

iron ingot
iron ingot
rancid terrace
#

No jokes while giving advice

hollow flicker
#

@eager thicket when we submit our code for the competition, should we keep our own API key in it, or do the judges use theirs?

#

can someone help me please

dry lynx
#

Hello!

I want to do the Kaggle exercises on my work desktop, but the company internet is sloowww and honestly just takes forever to run.

Is there a way I can run the setup code locally in a Python IDE?

hidden hollow
#

Hi everyone! I have started course on Kaggle about Advanced SQL, but ran into problems when trying to complete the tutorials 🥲 This is my first time working with Kaggle notebooks and taking the course as well.

There was this error when i tried to execute the set up cell
/usr/local/lib/python3.11/dist-packages/google/cloud/bigquery/table.py:1727: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.
warnings.warn(

The rest of the tutorial cells don't seem to work as well. Could someone please help to solve this issue? 🙏 Thank you!

sly hound
#

did refreshing the page solve the issue? what troubleshooting has been done?

deft yarrow
#

Hi, I have a question about the ARC challenge. I keep getting a scoring error on my submission. Am I missing something? I've changed and edited it multiple times, but it's still the same.

hidden hollow
sly hound
#

This is a good chance to earn discussion points on Kaggle. Start a discussion on Kaggle about this issue.

hidden hollow
#

Thanks for advice! I have read existing discussions, yet couldn't see anyone with the same problem. For some reason, I cannot start my own discussion (it says that I do not have a permission), do you know what might be the problem?

wraith sparrow
real dew
urban light
#

what's everyones favourite dataset review/annotation tool? I would like to have a tool where I can import my entire dataset.jsonl containing questions and answers, be able to mark each item as correct/incorrect, and finally be able to update the answer if needed. Also, I would like to be able to share this dataset so other people can do them.

Once everything is complete, I would like to run evals directly on the dataset. And finally, run fine tuning. Does a tool like this exist? 😄

humble lodge
#

I feel that nowadays it is really hard to keep up with all the AI developments. How do you keep yourself informed about all these advancements coming daily? I personally find it very hard to keep track of all the sources (arXiv, news websites, big tech blogs, etc.). Interested to know your thoughts!

summer dagger
#

Hello everyone, can anybody guide me on how to remember the use of brackets, symbols, etc.? I have completed the variables and functions exercises on Kaggle and implemented a VaR model on a small dataset in VS Code following a YouTube video, but I can't write even a 2-line piece of code on my own without mistakes or looking for help. Thanks

marsh lantern
toxic urchin
#

Hello everyone, I recently made a basic spectrometer with a DVD's surface as the diffraction grating. I want to go beyond and make a device that captures the spectrum and makes the electromagnetic spectrum's waves RGB + other colours too in another wave plot (see SpectralWorkbench since somebody made it and it's cool) and also an AI model that would predict the light fixture type. How do I do it? Integrate all models in new hardware or use my laptop? Plus, coding problems. Could anyone please guide me through this?

wraith sparrow
#

Has anyone tried Google's ADK with uv?

civic seal
#

Hi, I'm doing a house price prediction competition and saw in someone's notebook that whatever they did to train.csv, they also did to test.csv. For example, if they dropped a certain feature, they would do the same to test.csv. Is this normal?

steel valley
civic seal
pure garnet
#

hey, does anyone know how to join the BigQuery AI competition server? I'm unable to join.

summer dagger
civic seal
#

So why do people do a train/val split with the train_test_split() function rather than just training on all the training data and then predicting on the test data?
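On Kaggle the test labels are hidden, so a validation split is the standard way to measure how a model does on data it hasn't seen, and to compare models or hyperparameters, before spending submissions. The toy sketch below (made-up data and a deliberately bad "memorizing" model, plain Python) shows why training accuracy alone can mislead: it is perfect on rows it has seen and noticeably worse on held-out rows.

```python
import random

random.seed(42)

# Made-up data: the label is 1 when x > 50, with about 10% of labels flipped as noise.
data = []
for x in range(200):
    y = int(x > 50)
    if random.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# Hold out 20% of the labelled rows as a validation set.
random.shuffle(data)
split = int(0.8 * len(data))
train, val = data[:split], data[split:]

# A deliberately bad "model": it memorizes every training row and
# falls back to the majority training label for anything unseen.
memory = dict(train)
majority = round(sum(y for _, y in train) / len(train))

def predict(x):
    return memory.get(x, majority)

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

print(f"train accuracy: {accuracy(train):.2f}")
print(f"val accuracy:   {accuracy(val):.2f}")
```

After choosing a model with the validation set, it is common to retrain on all the training data before predicting on the test set.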

civic seal
# steel valley yes same

So I was doing another Kaggle competition today and I trained the model, but when it came to predicting on the test set I got an error about the shape being off (the features weren't the same). I realized I had to apply the same adjustments to the test dataset as I did to the train dataset, but I had to go back and add everything manually. For example, if I did train_data.drop('Name', axis=1, inplace=True), I would also have to add test_data.drop('Name', axis=1, inplace=True). Is there a more efficient way of doing this?
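One common pattern is to put every cleaning step in a single function and call it on both frames, learning any statistics (like a fill value) from the training data only, then align the test columns to the training columns. A minimal sketch with made-up Titanic-style columns (`Name`, `Age`, `Sex`); the `preprocess` function is hypothetical. scikit-learn's `Pipeline` and `ColumnTransformer` are a more robust way to do the same thing.

```python
import pandas as pd

def preprocess(df, age_fill):
    """Apply the same cleaning steps to any split (train, validation or test)."""
    out = df.drop(columns=["Name"])              # same columns dropped everywhere
    out["Age"] = out["Age"].fillna(age_fill)     # same fill value everywhere
    return pd.get_dummies(out, columns=["Sex"])  # one-hot encode the categorical column

# Tiny made-up train/test frames.
train_data = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [22.0, None, 40.0],
                           "Sex": ["male", "female", "male"]})
test_data = pd.DataFrame({"Name": ["D", "E"], "Age": [None, 35.0],
                          "Sex": ["male", "male"]})

age_fill = train_data["Age"].median()   # statistic learned from the training data only
X_train = preprocess(train_data, age_fill)
X_test = preprocess(test_data, age_fill)

# The test set has no "female" rows, so get_dummies produced fewer columns there;
# align test to the training columns, filling the missing dummy columns with 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
print(X_test)
```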

wraith sparrow
civic seal
#

What is it exactly? Another Gemini?

wraith sparrow
steel valley
proper wedge
#

Can anybody tell me which 100% free online certifications, offered by top companies or institutions, have high value for AI/ML jobs?

mystic rapids
wraith sparrow
dull sky
#

hi! I'm planning to launch a community competition and I'd like some advice about it. So far I've asked the data providers for permission, and I have a rough idea of the competition itself. Is there a guide I should follow?

#

It'd be a time-series problem; I'd provide a simple data dictionary and use MASE as the metric.
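For reference, MASE (mean absolute scaled error, in Hyndman and Koehler's standard definition) divides the forecast's mean absolute error by the in-sample mean absolute error of a naive forecast that repeats the value from m steps earlier (m = 1 for non-seasonal data, otherwise the seasonal period): MASE = mean(|y_t - ŷ_t|) / [ (1/(n-m)) · Σ_{t=m+1..n} |y_t - y_{t-m}| ]. A minimal plain-Python sketch; the function name and example numbers are illustrative.

```python
def mase(y_train, y_true, y_pred, m=1):
    """Mean absolute scaled error.

    y_train: in-sample history used to compute the naive-forecast scale.
    y_true, y_pred: actual and forecast values over the evaluation horizon.
    m: seasonal period of the naive forecast (1 = repeat the previous value).
    """
    n = len(y_train)
    if n <= m:
        raise ValueError("y_train must be longer than the seasonal period m")
    # In-sample MAE of the (seasonal) naive forecast y_hat_t = y_{t-m}.
    scale = sum(abs(y_train[t] - y_train[t - m]) for t in range(m, n)) / (n - m)
    if scale == 0:
        raise ValueError("naive forecast is perfect in-sample; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


history = [10, 12, 11, 13, 15]
actual = [14, 16]
forecast = [15, 15]
print(round(mase(history, actual, forecast), 4))  # 0.5714
```

Values below 1 mean the forecast beats the in-sample naive forecast on average; a data dictionary for the competition could state which m is used.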

shadow canyon
#

Hey guys, I'm planning to train a 123-million-parameter model themed after J.A.R.V.I.S. (yes, that Jarvis from Marvel). I'm carefully selecting the data it gets trained on to get the best results, but I have a slight problem. The model will be conversational and will have lots of memories (there's a database for that), from which it will need to query the user's responses, put them together, summarize, and reply. I have no idea what the data format should be for that kind of thing. I was thinking JSON in the Alpaca format, but I'm not so sure. Any advice?

fickle snow
#

hi. I am a beginner in AI.
Should I code everything with Python and Math Libraries only to understand everything clearly,
or should I use available AI libraries like PyTorch?
Thanks in advance.

dull sky
fickle snow
dull sky
#

like artificial general intelligence?

fickle snow
#

yes

fickle snow
#

so do you think i should begin with python and math first or should i use pytorch and tensorflow right away?

#

i'm doing the titanic challenge

#

that's my first time building a model

dull sky
#

about AGI I don't want to discourage you, but as you'll learn more and more you'll understand how far we're from that 🙂

fickle snow
#

I don't know if I'm being clear enough, but I'm not asking what to learn first, but what to do first. I've learned the basics and I'm trying to build a model, and I don't know whether I should build it with pure Python and math or with AI libraries.

dull sky
#

That's just my opinion, but I'd first look for a smaller project.

Find a field that benefits from deeplearning, let's say agriculture. (you can check on kaggle, or get vague answers from chat bots)

Then you should narrow down the branches of machine learning needed (shallow/deep, [un]supervised, etc.). You may check what is a hot research topic at https://paperswithcode.com/

Once you have a domain/field, look at what is being (or could be) used there. If it's agriculture, some searching on Kaggle reveals that there are several computer vision problems.

Once you've done all this, set yourself a smaller project, check how people solve that problem, pick a good-looking notebook, and reproduce it, using the documentation of the libraries it uses as pointers. Try to find the framework being followed there.

fickle snow
#

i appreciate your advice, but that's not related to my question man

#

actually i've got to go now, see you later, and add me on dm too

dull sky
#

I'm just a hobby guy, so I'd focus on practical applications after that.

civic seal
#

Does anyone know why my submission file is getting saved with C1 and C2 as the column names, even though head() of the DataFrame doesn't show that? (This is the reason my submission was wrong.) My code for saving it is this:

```
import pandas as pd

submission_df = pd.DataFrame({'PassengerId': test_data['PassengerId'], 'Transported': test_predictions})
submission_df.columns = ['PassengerId', 'Transported']

submission_df.to_csv('submission.csv', index=False)
```
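One way to narrow this down is to read the raw first line of the saved file back, which shows exactly what header is on disk regardless of how any viewer displays it. A minimal sketch, with made-up IDs and predictions standing in for `test_data['PassengerId']` and `test_predictions`:

```python
import csv

import pandas as pd

# Hypothetical stand-ins for the real test IDs and model predictions.
passenger_ids = ["0013_01", "0018_01", "0019_01"]
test_predictions = [True, False, True]

submission_df = pd.DataFrame({"PassengerId": passenger_ids, "Transported": test_predictions})
submission_df.to_csv("submission.csv", index=False)

# Read back the raw header row exactly as it was written to disk.
with open("submission.csv", newline="") as f:
    header = next(csv.reader(f))
print(header)  # ['PassengerId', 'Transported']
```

If this prints the expected names, the file itself has the right header and the problem lies elsewhere.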
feral ore
#

I'm having trouble connecting to localhost in DBeaver.

steel valley
feral ore
#

I cant add a screenshot here for some reason

#

It says:

Connection to 'localhost' cannot be established.

Reason:
Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

#

Tried creating a new connection and then it told me:

Connection refused: getsockopt

flint ocean
#

Guys, can someone tell me where to start studying ML? Like a roadmap and study plan, like how you guys started.

flint ocean
pine pollen
#

In neural-network regression, should model performance be independent of the number of output variables? Specifically, given the same dataset, will one multi-output model that predicts three targets perform the same as training three separate single-output models (one per target)? Why or why not?

brazen grotto
# pine pollen In neural-network regression, should model performance be independent of the num...

Not necessarily. If the three targets are totally independent, then training one model with three outputs or three separate models could give similar results. But in most real cases the targets share some structure. A single multi-output model can pick up on those shared patterns, which sometimes helps. On the flip side, if the tasks are very different or unbalanced, training them together can actually hurt because the model has to compromise. So the answer really depends on how related the targets are and how much capacity the model has.
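One concrete case of the "independent targets" point: if the model has no shared hidden layers at all (plain linear regression), fitting three outputs jointly gives exactly the same weights as three separate fits, because least squares decouples column by column. The NumPy sketch below (synthetic data) checks this; in a neural network, any difference between the two setups comes from the shared hidden layers, which is where the shared-structure benefit or compromise described above happens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples, 3 features plus a bias column, 3 targets.
X = np.hstack([rng.normal(size=(100, 3)), np.ones((100, 1))])
W_true = rng.normal(size=(4, 3))
Y = X @ W_true + 0.1 * rng.normal(size=(100, 3))

# One multi-output least-squares fit for all three targets at once.
W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Three separate single-output fits, one per target column.
W_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)

print(np.allclose(W_joint, W_separate))  # True
```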

ornate pike
#

Does anyone know how long emailing support@kaggle.com takes for a response? They are not responding to my attribution update request even though the kaggle website UI explicitly says to email them.

naive pond
#

Hi, how can I upscale a video on Kaggle?

wraith sparrow
#

Does Kaggle have an official MCP server?

gilded drift
thick widget
#

hello everyone, I'm Hari and I'm new to Kaggle. How can I upskill on this platform?

gloomy nest
#

I see in a lot of job descriptions that one needs to have hands-on experience with LLMs (train, fine-tune etc.)

Does anybody know some relevant links/documentation from which I can learn about LLMs

Also can you suggest a project that would prove the knowledge needed for a junior job?

Thanks

blissful edge
#

Sorry if this is a stupid question but how to use packages that require internet to download for offline submission?

uncut patrol
limber wasp
#

I have been trying to register for a competition for weeks, but PersonaID refuses to recognize/verify my face. I've gone back and forth with support, who just tell me that they reset it and to try again. Now it's the day of the competition, and I've done the work, but I'm still not able to register for the competition and submit my work.

limber fern
#

are spontaneous applications effective? i'm starting to look for an internship. Also i'm thinking about a good prompt to generate a cover letter for each application/company

brittle cave
#

Who here understands Docker (beginner or expert)?
Need help with dockerizing an ai project

crimson loom
#

I completed the ID verification, including face recognition, but I still cannot receive the SMS verification code.

#

plz, I need help

cursive owl
brittle cave
nimble meadow
#

Can anyone explain how to install SVD and import it? I am getting ModuleNotFoundError: No module named 'surprise'

uncut patrol
#

this is on a kaggle notebook right?

#

just add !pip install scikit-surprise to the top, I think (the package is published on PyPI as scikit-surprise but imported as surprise)

#

the ! tells Jupyter to execute a terminal command

nimble meadow
velvet spoke
#

AI blog generation project -

Issues -
At regeneration time, I have issues with overall quality improvement, following user instructions, alignment with the original context, factual accuracy, and true generation (new and true content, not rephrasing). I also need to consider latency, cost, and scalability.

Data flow ---
At regeneration, I pass in the past conversation (the first time, the default data: blog generation, headings H1, H2, H3 and so on, primary and secondary keywords, default prompt), the quoted blog section, and the user instructions.
Note - for a second regeneration, the input is the past conversation (first-time default data + first regeneration data). This happens for every regeneration.

Note -
You can answer without considering latency, cost, and scalability, but ideally your answer covers them too.

Approaches I have -
1. Prompt engineering, passing the past conversation as a reference
2. Multiple-agent approach
3. Single agent only

Questions ----
1. Which method works best to meet the overall expectations, and why?

2. If none of the above methods works, what other approach should I apply to meet my expectations?

3. Which model should I use: GPT-5 nano or GPT-4.1?

wraith sparrow
#

Anyone have any crazy/good idea with (ai) agents?

fresh forge
#

hi guys, I have a question. What should I do if my dataset has a column called review that contains text or descriptions? Can I transform this column into labels or one-hot encoding, or should I drop this column if I want to cluster the dataset?

wraith sparrow
high leaf
#

Hi @everyone
Can somebody help me review my code? I trained a model for the first time.

sudden notch
sudden notch
wraith sparrow
sudden notch
sudden notch
wraith sparrow
sudden notch
#

Oh I see 😂

#

What am I gonna get if I give you an idea?

#

😎

wraith sparrow
sudden notch
#

Mine is ten billion percent better than yours. 👎

#

So why share it when the payoff is uncertain?

#

I want something better

#

How much experience do you have?

wraith sparrow
burnt geyser
#

I'm doing the birdclef competition as part of a school project.
I trained my base model and got bad results even though I took care of class imbalance. I think I may have done something wrong; I would appreciate it if someone could help me.

wraith sparrow
#

can anyone help with finetuning gpt-oss-20b model on ssrl

inland crescent
#

When will cohort 5 of Kaggle X start?

mortal pond
#

Hi, I need some urgent help with an ML model. Anyone up?

round prism
#

Does anyone know a trading channel? Thanks

wraith sparrow
#

Does anyone know a lightweight open-source alternative to Firebase Studio?

stoic ledge
#

I recently started learning ML. For that I learned Python and am now moving on to NumPy, but I'm also learning matrices for the matrix calculations. Does anyone know how much math is required for NumPy?

supple juniper
#

I have joined the Discord and linked my account, so why is the Agent of Discord still locked?

stoic ledge
stoic ledge
twilit hearth
jolly torrent
#

hello everyone! I recently started the SQL course on Kaggle. When I reached the WITH ... AS part, I got a problem in the first cell. This is the problem:

Collecting git+https://github.com/Kaggle/learntools.git
Cloning https://github.com/Kaggle/learntools.git to /tmp/pip-req-build-_uuo8ygc
Running command git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-_uuo8ygc
fatal: unable to access 'https://github.com/Kaggle/learntools.git/': Could not resolve host: github.com
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-_uuo8ygc did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-_uuo8ygc did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

anyone could help me solve this?

supple juniper
stoic ledge
sonic karma
#

Hey everyone, I'll be studying Cloud Computing this semester in my undergrad degree. Since I'm focused on ML, I wanted to know how I should approach cloud computing so that I learn it the ML way.
Would you suggest a course, a book, or any other resource? Please do tell. Thanks

#

For reference I would be following this text book from the curriculum:
Cloud Computing, Theory and Practice by Dan C. Marinescu, THIRD EDITION,
Morgan Kaufmann Publishers, 2022

wraith sparrow
#

ASK UR SENIORS

twilit hearth
ripe blaze
#

Can anyone give me an idea of how deployment works for an AI model? I know tools like Ollama/Unsloth are used to experiment with or fine-tune models, but once a model is ready for deployment, is there a particular tool that's good for that? Assuming you don't want to deploy a model using Unsloth due to speed differences or something.

waxen marsh
#

What are the top kinds of software engineering problems you've seen LLMs consistently fail at, even when given step-by-step instructions that lead to the solution? One problem I've seen consistently: if I give multiple interdependent validation points, the LLM usually fails to generate optimal results, even with multiple tries and feedback. What's yours?

molten cave
#

I'm new here. Should I learn Python or C++ for CUDA? I'm new to coding, btw.

pine mica
#

I am facing challenges completing the python exercise of "Working with External Libraries"
Task No. 3, can anyone help me!?

obsidian elbow
#

May I know how to become a sponsor?

mental galleon
#

hello guys! I was about to start the 5-day intensive course, but the checkboxes don't stay checked after refreshing the site.

am I missing something or is there any other way to keep track of what I am doing?

thanks!!

golden lake
#

hello guys! I am new to data science and am looking for tips, advice, and resources to get started. Also, what math is necessary? I want to mention that I'm looking for fundamentals only; I will be using Python (pandas, NumPy, and Matplotlib for visualization) and maybe SQL. Also, please tell me whether SQL is necessary, and do we use SQL in ML? Thank you all in advance.

olive crest
# golden lake hello guys! I am new to data science and and am looking for tips, advices and r...

Hello!
SQL is definitely important for ML because data and databases form the foundation of any project.

All ML models are trained on datasets, and SQL helps in modifying data and performing feature engineering or scaling.
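As an example of SQL doing feature engineering, the sketch below uses Python's built-in `sqlite3` module with a made-up `transactions` table to turn raw rows into per-customer features (purchase count, average and maximum amount) that could feed a model. The table and column names are hypothetical.

```python
import sqlite3

# In-memory database with a small made-up transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 5.0), (2, 15.0), (2, 10.0)],
)

# Aggregate raw rows into one feature row per customer.
rows = conn.execute(
    """
    SELECT customer_id,
           COUNT(*)    AS n_purchases,
           AVG(amount) AS avg_amount,
           MAX(amount) AS max_amount
    FROM transactions
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
print(rows)  # [(1, 2, 27.5, 35.0), (2, 3, 10.0, 15.0)]
conn.close()
```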

For math, high school–level statistics and probability are enough to get started. It’s important, however, to understand the math behind algorithms like Linear Regression, KNN, etc.

I’d suggest starting with Python. Don’t go too deep at first...focus on building logic, practicing OOP, and optionally some DSA. Then move on to SQL and practice queries regularly.

After that, learn NumPy and Pandas...the key is to practice lots of problems (Kaggle is great for this).

Next, dive into EDA, feature engineering, and data visualization. Begin with Matplotlib and later try Plotly for interactive visuals.

Once comfortable, start applying ML algorithms with NumPy and scikit-learn to learn how to train models.

Finally, work on small end-to-end ML projects. That’s when everything you’ve learned will come together. You can also refer to YouTube for project ideas.

uneven marsh
#

Is there a video, forum, tutorial, or website out there that can make me understand how to code a simple/basic ML model in 25-30 mins? Something like when I'm on my bus ride to school, I can listen to a video or read the website page?

placid root
#

Hi guys, does anyone know how to connect a cloud database to a Jupyter notebook in Python using SQLAlchemy?

old mist
#

Hi, @everybody
I have one question. I'm training ML models for prediction. It's a 3-class classification problem where the number of samples per class is similar, but the predictions are skewed.
The first and second classes are predicted, though with low precision, and the third class is never predicted. What's the reason? I can't find it.
Previously, when I applied reinforcement learning with the three classes assigned to three actions, one action was never selected either.

#

Actually, it's a prediction model for forex EUR/USD.

civic river
#

Can someone help me integrate a TPU with a Kaggle notebook? I'm facing a few issues.

high estuary
green quest
#

Hello, i am new to machine learning, i want to become a self taught ML engineer, so where should i start? what skills should i prefer to master? what tools do i need? help me out please

versed seal
#

Hi! Is there more information about the upcoming Google Agents course in November? Links on kaggle website seem to be redirecting back to the course that was offered last March

fading iron
#

How do you add a dataset to a Notebook in the current Editor? There is no right side bar, and no "Commit and Run" with settings to add it (from the documentation). I've looked in every menu option and dropdown and do not see an Add Data so I can link my Dataset training

#

Heads up if anyone else has this problem, because a lot of online posts and ChatGPT are outdated: it's File -> Add Input.

fading iron
#

Does Kaggle not support inline Matplotlib graphs like Colab? That's a very useful feature for watching loss values during training.

wraith sparrow
#

Idk man use satyrn if on mac

cedar cosmos
#

Hello,
Recently I have started learning ML and for that I have learned python and currently learning Numpy, but I'm a bit confused whether I should learn pandas after Numpy or not.
So can anyone help me with it.

fading iron
#

pandas is amazing highly recommend learning it since Dataframes will come in very handy

wraith sparrow
clever geyser
#

Hi everyone..
So I have been working on Diabetic Retinopathy and have created a custom cnn with SE attention block. My model has reached the plateau and now its performance is not improving. How can I improve my model's performance. Currently it is 80% for training, 78% validation and 77% testing. I have to take it to 91% at least to conclude my research.

#

Any suggestions?

cursive owl
fading iron
#

How can you continue training on a net if you can't write to the input? Only way I see is the download the net and optimizer each time manually then upload to the dataset via the web which is horribly inefficient.

#

Any suggestions are welcome.

stiff girder
#

Hello,

I am currently working on a project where I aim to combine Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNNs) to forecast air pollution levels. Since my approach will require satellite observations, I am particularly interested in using data from the TEMPO (Tropospheric Emissions: Monitoring of Pollution) mission.

I would greatly appreciate any advice

hallow warren
#

Hi everyone, I am new to Kaggle, data science and machine learning. Can someone help me with the Titanic Problem? Like some basic questions about understanding the task.

vagrant pendant
#

guys, has anyone succeeded in getting a job through Kaggle as a fresher?

dim rivet
rugged juniper
young vigil
#

hi, not sure if this is the right place, but I'm trying to save my work and it's not letting me

#

it keeps giving me this error: An error occurred while committing kernel: ConcurrencyViolation Sequence number must match Draft record: KernelId=92809017, ExpectedSequence=3, ActualSequence=2, AuthorUserId=24934154

fading iron
#

It means your code was changed since the last save. It's wonky like that. Your best bet is to copy it to another text editor, reload the notebook, paste, and save a version.

#

love Kaggle so much, but I do miss Google Drive in Colab lol. It was so easy to just read and write to persistent storage. Here you have to set up a dataset and then use the Kaggle CLI to push your data to it when you're done, if you want to keep anything from /kaggle/working.
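The checkpoint workflow described above (attached inputs are read-only, new files go to `/kaggle/working`) can be sketched as: load the newest checkpoint from the attached dataset, train, and write new checkpoints into the working folder. This is a minimal sketch under stated assumptions: the `ckpt_epoch_<n>.pkl` naming, the dataset slug and `train_one_epoch` are hypothetical, and `pickle` stands in for `torch.save`/`torch.load`.

```python
import os
import pickle
import re

# Hypothetical naming scheme: ckpt_epoch_<n>.pkl
CKPT_PATTERN = re.compile(r"ckpt_epoch_(\d+)\.pkl$")

def latest_checkpoint(folder):
    """Return (epoch, path) for the highest-numbered checkpoint in folder, or None."""
    if not os.path.isdir(folder):
        return None
    best = None
    for name in os.listdir(folder):
        match = CKPT_PATTERN.search(name)
        if match:
            epoch = int(match.group(1))  # compare numerically so epoch 10 beats epoch 9
            if best is None or epoch > best[0]:
                best = (epoch, os.path.join(folder, name))
    return best

def resume_state(input_dir):
    """Load the newest checkpoint from a (read-only) input folder, or start fresh."""
    found = latest_checkpoint(input_dir)
    if found is None:
        return {"epoch": 0}  # nothing to resume from
    with open(found[1], "rb") as f:  # reading works fine on a read-only mount
        return pickle.load(f)

def save_state(state, work_dir="/kaggle/working"):
    """Write a checkpoint into the writable working folder and return its path."""
    path = os.path.join(work_dir, f"ckpt_epoch_{state['epoch']}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path

# Sketch of the loop (dataset slug and train_one_epoch are hypothetical):
# state = resume_state("/kaggle/input/my-checkpoints")
# for epoch in range(state["epoch"] + 1, state["epoch"] + 4):
#     state = {"epoch": epoch, "weights": train_one_epoch(state)}
#     save_state(state)
```

Afterwards, `kaggle datasets version -p <folder> -m "<message>"` uploads a folder as a new version of an existing dataset; the folder needs that dataset's `dataset-metadata.json`.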

young vigil
#

i just downloaded and uploaded it instead

lime trout
#

Is there a working professional I can DM? I really need some guidance related to software development and data science.

wraith sparrow
#

anyone here have ChatGPT Pro?

sleek basin
#

[URGENT] Writeup submission cancelled after deadline - need help with technical issue

Hello Kaggle community,

I encountered a technical issue with my writeup submission for the BigQuery AI Hackathon and need your help.

My Situation:

  • I successfully submitted my writeup at 8:20 AM (40 minutes before the 9:00 AM deadline)
  • After the deadline (at 9:00:02 AM), I accidentally clicked the "Edit" button to make minor text corrections
  • The system interpreted this as a submission cancellation, changing my status to "Deadline Missed"

What I Need:

  • Can my original submission be restored to its pre-deadline state?
  • This appears to be a technical issue with the edit button functionality after the deadline
  • I have complete evidence that the submission was made before the deadline

My Evidence:

My Project:

  • Title: "BigQuery AI Hackathon: Multimodal Health Analysis"
  • Complete multimodal health analysis system with BigQuery AI integration
  • Team: MKM Lab

Questions for the Community:

  • Has anyone else experienced this issue with the edit button after the deadline?
  • Is there a way to contact Kaggle support directly for this type of technical issue?
  • Any advice on how to resolve this would be greatly appreciated

Technical Details:

  • The edit button should either be disabled after the deadline or not cancel the submission
  • This seems like a UI/UX issue that could affect other participants

Thank you for any help or advice you can provide.

Best regards,
윤원민 (familyunion)
Kaggle ID: giryun288@gmail.com

verbal crest
tired solstice
#

500 MB dataset .tar file. It takes 10 seconds to upload and then 1.5 hours to process, and it's still going. Is this normal? The webpage keeps saying it will finish in about 10 seconds, which is hilarious. Thanks to anyone who can help with this

fading iron
#

Usually after uploading it takes about 15 minutes to really show up. It sounds like the website stalled after the upload. I'd close it, reload the dataset page to see if it's listed, and then try to load it from your notebook.

tired solstice
shrewd badge
#

Hi, I just got a new PC setup with a 5060 Ti 16GB and started my first NN training. However, I seem to have a problem with GPU utilization. At first it sits nicely at around 93-95%, but then it gradually drops to around 43%, and training slows down drastically as well. It's not an overheating issue, as the temps start at 50C and then gradually go down to 33C. It's also not an OOM issue, as I have plenty of RAM, with more than 40 GB still free during the entire training. The CPU is also not a bottleneck. Any ideas what might be causing it? I'm on Arch Linux, Driver Version: 580.82.09, CUDA Version: 13.0.

patent viper
#

hi, my kaggle notebook keeps hiding my variables. why is that?

ripe blaze
#

Where can I deploy an AI model + RAG? It needs 20 GB of VRAM. It would be nice to have on-demand payment so it doesn't burn through my bank account, i.e. I only pay when someone is actually using the VRAM.

errant cape
#

Hello, if I win a prize competition, how can I withdraw the prizes?

graceful axle
#

I checked the research paper: they only classified 3-4 behaviors and got around 0.6 F1. Here in the MABe mouse challenge we have about 8 labels, so if we use their method, can we get a better score in the mabe-mouse-b-dectection challenge?

hushed stag
#

Hello, I want to ask when the AI course starts.

prime ermine
#

Can my Mac M4 run LLMs, or do I need an NVIDIA GPU?

wraith sparrow
#

u can buy dgx spark 4tb founders edition

gentle bridge
#

Hello Everyone,
I have been facing a login issue since yesterday. Until now I had no issues with my Kaggle account, but yesterday I cleared my history in Chrome and tried to log in with "Sign in with Google". However, I got the message "an account already exists with an alias of this email please login using that". I'm not sure what to do here, because I normally log in to Kaggle with my Google account. Has anyone faced this issue before? I would greatly appreciate any help on what I can do.

weary oxide
gentle bridge
fading iron
#

What's the best way to run a notebook? Right now I can only do it from the editor, but the idle timer seems broken. I can't even finish a single training session without it timing out, despite updating the output constantly via tqdm updates every epoch. My training takes 1h5m per input file, so it's useless if I can't finish. Plus, it says that if you hit Cancel or continue editing it won't disconnect, but it does: it kills the kernel and restarts automatically.

fading iron
#

Found a workaround in case it helps anyone else. Load the notebook, press F12 to open the dev console, and paste this into the console. It simulates a keypress once a minute. Voila, no more timer killing the training.

#

setInterval(() => {
    // dispatch a Shift keydown every 60000 ms (1 minute) so the page registers activity
    document.dispatchEvent(new KeyboardEvent('keydown', {'key': 'Shift'}));
}, 60000);

balmy vapor
#

Hello! I can't seem to get rid of this problem. What do I need to do? I'm using Kaggle's Jupyter notebook.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 2.8.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
gensim 4.3.3 requires numpy<2.0,>=1.18.5, but you have numpy 2.2.6 which is incompatible.
gensim 4.3.3 requires scipy<1.14.0,>=1.7.0, but you have scipy 1.15.3 which is incompatible.
mkl-umath 0.1.1 requires numpy<1.27.0,>=1.26.4, but you have numpy 2.2.6 which is incompatible.
mkl-random 1.2.4 requires numpy<1.27.0,>=1.26.4, but you have numpy 2.2.6 which is incompatible.
mkl-fft 1.3.8 requires numpy<1.27.0,>=1.26.4, but you have numpy 2.2.6 which is incompatible.
numba 0.60.0 requires numpy<2.1,>=1.22, but you have numpy 2.2.6 which is incompatible.
datasets 3.6.0 requires fsspec[http]<=2025.3.0,>=2023.1.0, but you have fsspec 2025.5.1 which is incompatible.
ydata-profiling 4.16.1 requires numpy<2.2,>=1.16.0, but you have numpy 2.2.6 which is incompatible.
onnx 1.18.0 requires protobuf>=4.25.1, but you have protobuf 3.20.3 which is incompatible.
google-colab 1.0.0 requires google-auth==2.38.0, but you have google-auth 2.40.3 which is incompatible.

#

Some require newer versions, some require older versions, how do you even begin to fix this error?
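These lines are pip warning that the packages it just installed (numpy 2.2.6, scipy 1.15.3, fsspec 2025.5.1, protobuf 3.20.3, ...) no longer satisfy the version ranges declared by other packages preinstalled in the image. In practice they tend to matter only if you import one of the listed packages, e.g. gensim, which declares `numpy<2.0`. A first diagnostic step is to check what is actually installed. This sketch uses only the standard library's `importlib.metadata`; the package names are simply taken from the log above.

```python
from importlib import metadata

def installed_versions(names):
    """Map each distribution name to its installed version, or None if it isn't installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Names taken from the pip log above.
for pkg, ver in installed_versions(["numpy", "scipy", "gensim", "numba", "fsspec", "protobuf"]).items():
    print(f"{pkg:10s} {ver}")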

hidden plinth
#

hey

#

is there any way

#

i can use sklearn models like gradient boosting with gpu

#

i really need it for hyperparameter tuning

#

???

toxic junco
#

Guys, any good recommendations for Python DSA?

deft warren
graceful axle
#

Hello everyone

#

I am trying to use an API to fetch data

#

I am using a loop to iterate all the pages present in the json.

#

I am getting either a gaierror or a ConnectionResetError

#

Please help

#

`import time

import pandas as pd
import requests

url = "https://api.themoviedb.org/3/movie/popular?language=en-US&page=1"

headers = {
    "accept": "application/json",
    "Authorization": "Bearer MY_READ_ACCESS_TOKEN"
}

df = pd.DataFrame()

for i in range(1, 52774):
    url = f"https://api.themoviedb.org/3/movie/popular?language=en-US&page={i}"
    response = requests.get(url, headers=headers)
    time.sleep(0.25)
    df = pd.concat([df, pd.DataFrame(response.json()['results'])[['id', 'title', 'overview', 'release_date', 'popularity', 'vote_average', 'vote_count']]], ignore_index=True)

df.head()`

#

This data is from TMDB

#

Here is the error snippet

`gaierror Traceback (most recent call last)
File ~/Desktop/DS_Practice/.venv/lib/python3.12/site-packages/urllib3/connection.py:198, in HTTPConnection._new_conn(self)
197 try:
--> 198 sock = connection.create_connection(
199 (self._dns_host, self.port),
200 self.timeout,
201 source_address=self.source_address,
202 socket_options=self.socket_options,
203 )
204 except socket.gaierror as e:

File ~/Desktop/DS_Practice/.venv/lib/python3.12/site-packages/urllib3/util/connection.py:60, in create_connection(address, timeout, source_address, socket_options)
58 raise LocationParseError(f"'{host}', label empty or too long") from None
---> 60 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
61 af, socktype, proto, canonname, sa = res

File /usr/lib/python3.12/socket.py:963, in getaddrinfo(host, port, family, type, proto, flags)
962 addrlist = []
--> 963 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
964 af, socktype, proto, canonname, sa = res

gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:
...
--> 677 raise ConnectionError(e, request=request)
679 except ClosedPoolError as e:
680 raise ConnectionError(e, request=request)

ConnectionError: HTTPSConnectionPool(host='api.themoviedb.org', port=443): Max retries exceeded with url: /3/movie/popular?language=en-US&page=1 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7825ea3d9430>: Failed to resolve 'api.themoviedb.org' ([Errno -2] Name or service not known)"))`
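`gaierror: [Errno -2] Name or service not known` means the machine could not resolve `api.themoviedb.org` through DNS, so it is a network problem rather than a bug in the loop (in a Kaggle notebook, Internet access also has to be switched on in the notebook settings). When it only happens partway through a long loop, retrying with exponential backoff usually gets past it. This is a sketch using only the standard library: `socket.gaierror`, the built-in `ConnectionResetError` and `requests`' exceptions all subclass `OSError`, so one `except OSError` catches them. `fetch_page` is a hypothetical callable you supply that returns a page's parsed JSON; TMDB's paginated responses include `results` and `total_pages`, so reading `total_pages` avoids hard-coding 52774. Collecting rows in a list and building the DataFrame once at the end also avoids calling `pd.concat` on every iteration.

```python
import functools
import time

def get_with_retries(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a network error wait and try again, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except OSError:  # socket.gaierror, ConnectionResetError and requests' errors all subclass OSError
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the original error surface
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults

def fetch_all_pages(fetch_page, max_pages, pause=0.25, sleep=time.sleep, **retry_kwargs):
    """Collect data["results"] from page 1 up to total_pages (or max_pages, if smaller)."""
    rows = []
    page, total_pages = 1, 1
    while page <= min(total_pages, max_pages):
        data = get_with_retries(functools.partial(fetch_page, page), sleep=sleep, **retry_kwargs)
        rows.extend(data["results"])
        total_pages = data.get("total_pages", total_pages)
        page += 1
        sleep(pause)  # small pause between pages, like the original time.sleep(0.25)
    return rows

# Usage with requests (not run here; needs network access and your TMDB token):
# import pandas as pd, requests
# session = requests.Session()
# def fetch_page(page):
#     r = session.get("https://api.themoviedb.org/3/movie/popular", headers=headers,
#                     params={"language": "en-US", "page": page}, timeout=10)
#     r.raise_for_status()
#     return r.json()
# cols = ["id", "title", "overview", "release_date", "popularity", "vote_average", "vote_count"]
# df = pd.DataFrame(fetch_all_pages(fetch_page, max_pages=500))[cols]  # 500 is an arbitrary cap
```

Note that `raise_for_status()` errors are also `OSError` subclasses, so HTTP errors get retried too before the last one is re-raised.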

leaden crescent
tropic warren
primal oxide
old mist
#

Hi @everyone,
Has anyone joined the Radical AI founders' masterclass?
I didn't get the opportunity to apply for it.
Please share the meeting URLs for it.

severe lance
#

Has Google's Kaggle course started? @everyone

rose harbor
severe lance
#

Oh thanks, I thought I was late.

stone scroll
#

Hello everyone, I started my AI/ML journey this year. I've created a few projects and also participated in a "Getting Started" Kaggle competition, "Natural Language Processing with Disaster Tweets", where I ranked 189th. But I want to explore and learn more, so how should I proceed from here? Any suggestions, help or guidance would be appreciated.

brave vale
#

Hey everybody

#

I have a problem and would like your help:
https://www.kaggle.com/code/nooshinpourtaleby/day-4-fine-tuning-a-custom-model/edit
In this 5-day course I don't know why I get this error for this code:
response = client.models.generate_content(
model="gemini-1.5-flash-001", contents=sample_row)
print(response.text)

ClientError Traceback (most recent call last)
Cell In[37], line 1
----> 1 response = client.models.generate_content(
2 model="gemini-1.5-flash-001", contents=sample_row)
3 print(response.text)

I also tried "gemini-1.5-flash", but it didn't work :(

graceful axle
#

Learning PyTorch is not enough. What else should someone learn to master ML and win on Kaggle?

mortal pond
#

Anyone who can help me in the ML Challenge?

upper schooner
#

Hey, I'm trying to educate myself. Can someone explain to me what these are?

Gradient descent
Loss function
Learning rate

I'm so confused rn. I just know they're used to optimize an algorithm, but how?
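A tiny worked example may help connect the three terms. The loss function measures how wrong the model is (here, the mean squared error of a line y = w*x + b). Gradient descent repeatedly nudges w and b in the direction that lowers the loss, using the loss's derivatives. The learning rate sets how big each nudge is. This is a pure-Python sketch; the data points and hyperparameters are made up for illustration.

```python
def mse(w, b, xs, ys):
    """Loss function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit w and b by repeatedly stepping against the gradient of the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # dLoss/dw
        grad_b = 2 / n * sum(errors)                             # dLoss/db
        w -= learning_rate * grad_w  # the learning rate scales each step
        b -= learning_rate * grad_b
        losses.append(mse(w, b, xs, ys))
    return w, b, losses

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1
w, b, losses = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # should approach 2 and 1
print(losses[0], losses[-1])     # the loss shrinks as training goes on
```

Try `learning_rate=0.2` on the same data: the steps overshoot and the loss grows instead of shrinking, which is why the learning rate has to be tuned.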

fast lichen
scarlet galleon
marsh drift
#

Hello, in the MABe Challenge I get an error for my notebook in the submissions panel, but when I click on it and look at the logs, it shows "successfully ran". There are no errors in the logs, and it even outputs a CSV. What could be the cause? I am a beginner. Thank you!

drowsy berry
#

Hello

I was using Kaggle's GPU-powered notebook for one of the competitions and for work.

But I was running into out-of-memory errors, and the kernel kept crashing.
This was due to the big embeddings I was generating on large data,
and also when I set some hyperparameters high for some models.
I tried solving these errors using
del
and
gc.collect()

But I wanted to know how I can avoid these errors in the first place.

What are some of the best practices and optimization techniques to avoid running into these kinds of issues?
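`del` plus `gc.collect()` only frees memory after the fact. A common way to avoid the crash is to never hold everything at once: process the data in fixed-size batches and write each batch's result to disk before moving on. This is a minimal standard-library sketch. `embed_batch` is a hypothetical stand-in for whatever model call produces the embeddings, the batch size is something you would tune to your GPU memory, and JSON lines only keeps the example dependency-free (with NumPy you would more likely save each batch as an array).

```python
import json
from itertools import islice

def batches(iterable, batch_size):
    """Yield lists of at most batch_size items without materialising the whole input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

def embed_to_file(texts, embed_batch, path, batch_size=64):
    """Embed texts batch by batch, appending each vector to a JSON-lines file.

    Only one batch of inputs and outputs is held in memory at a time.
    """
    count = 0
    with open(path, "w") as f:
        for chunk in batches(texts, batch_size):
            for vector in embed_batch(chunk):  # hypothetical model call
                f.write(json.dumps(vector) + "\n")
                count += 1
    return count
```

With a PyTorch model you would also typically run this inference under `torch.no_grad()` and move each batch's output to the CPU before writing it.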

graceful axle
marsh drift
graceful axle
sudden ledge
#

Hi! I submitted my solution on Kaggle, it finished successfully but the scoring is stuck on "running". Has anyone experienced this?

sudden ledge
terse kiln
#

I'm a beginner trying to save my model to a pickle file in PyCharm. However, when I call pickle.dump, the code runs successfully but model.pkl is not created in the folder. I used the proper file modes, but the result was the same. Any suggestions?

Edit: this problem is now solved with the help of @teal smelt
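The actual fix isn't shown in the thread, but a frequent cause of this symptom is a relative path: `open("model.pkl", "wb")` writes into the current working directory, which in PyCharm is whatever the run configuration sets, not necessarily the folder you're looking at. Returning the absolute path removes the guesswork. This is a sketch; `save_model` is a hypothetical helper and any picklable object can stand in for the model.

```python
import pickle
from pathlib import Path

def save_model(model, filename="model.pkl", folder=None):
    """Pickle model to folder/filename and return the absolute path it was written to.

    With folder=None the file goes to the current working directory, which is
    exactly where a relative open("model.pkl", "wb") would put it.
    In a script, folder=Path(__file__).resolve().parent saves next to the script.
    """
    target = (Path(folder) if folder is not None else Path.cwd()) / filename
    with open(target, "wb") as f:
        pickle.dump(model, f)
    return target.resolve()

if __name__ == "__main__":
    print("current working directory:", Path.cwd())
    # path = save_model(trained_model)  # trained_model: your fitted model
    # print("saved to", path)
```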

glass mulch
#

Hey everyone! 👋

I hope you’re all doing well. I’m currently preparing for a career in FinTech, and I’m looking to connect with professionals or learners who are already working or have experience in this domain.

I’d really appreciate some guidance on how to make myself a strong candidate for FinTech roles — including:

The essential skill sets (technical + financial)

Recommended certifications or learning paths

Important tools or technologies used in the industry

And a few project ideas that can help build a solid FinTech-focused portfolio

If anyone here is from a FinTech company or has relevant experience, I’d be very grateful for your advice or even a quick chat. 🙏

You can also connect with me on LinkedIn: www.linkedin.com/in/abhay-singh-1694b221b

Thank you so much in advance for your time and support!

— Abhay

teal smelt
terse kiln
teal smelt
charred sluice
#

hi

drowsy berry
teal smelt
muted spindle
sudden notch
#

The model I've created performs well, and similarly, on the test and validation sets. But it performs ~2.5x worse on the competition data. What's the deal?

trail umbra
#

..

spring nebula
#

Hii

bitter mason
#

Hlw

umbral scaffold
#

Hello guys! I'm working on a forex trading bot that helps me understand signals and place trades over time depending on the patterns observed. Please suggest great YouTube channels and books that can serve this purpose. @teal smelt

teal smelt
teal smelt
# umbral scaffold Hello guys! I'm working on creating a forex trading bot that helps me out in und...

Okay so shit dump, here goes nothing.

So first you'd need expertise in whatever language you're planning to use (I highly recommend Python or Rust because of their fast and easy-to-code environments, plus these languages are highly appreciated in this specific field of data analysis and AI/ML, so yeah..)
Then, before starting anything, go read Ernest P. Chan. People really praise him on Twitter (X) and Reddit for his books "Quantitative Trading" and "Algorithmic Trading".
(Check these before starting: https://www.reddit.com/r/algotrading/comments/gily37/ernest_p_chan_books_quantitative_trading/
https://www.reddit.com/r/algotrading/comments/15gz5dn/do_ernest_chans_mean_reversion_strategies_for/ )
NOTE: I SUGGEST THESE BOOKS TO LAY DOWN YOUR FOUNDATION IN ALGOS AND TRADING; YOU'LL HAVE TO PRACTICE A LOT BEFORE GOING ALL IN
For more algorithms, check out "16 proven algorithm forex trading" and "forex trading: theory and practice trady"


Now we have quite a bit of bookish knowledge (not enough, you still have to explore articles; I suggest joining Discord servers or subreddits about algorithms). Anyway:


Now you'll need APIs and pre-built AIs to understand how they work. For these I suggest heading to huggingface.co and Kaggle and searching for keywords like "trading ai", "trading algo", "trading blah blah". Research the APIs, decide which one suits you, and study the code or just use it directly.


Analyse the data that's already available on Kaggle.


And there's this guy called "Moon Dev" on YouTube; search for him, he has a playlist. I couldn't find other, better channels :(
And search for more on YouTube, Reddit and Google :(

And yeah, ask ChatGPT anytime if you need more resources. I could only find these; you'll have to do a lot of research though, as this is a not-so-famous topic :(

umbral scaffold
teal smelt
#

Hoi, please don't rely on my stuff fully, research and research like a focused horse...

teal smelt
#

You can use "friend" or "bro" or "🐣"

viral dagger
#

anyone completed the Mathematics for Machine Learning specialization by Deeplearning.AI?

#

because im stuck in a concept i can't seem to understand in the linear algebra course

teal smelt
#

I'll try, if I'm familiar with the topic hehe T_T

viral dagger
#

Like converting the math steps directly into python

teal smelt
# viral dagger Like converting the math steps directly into python

NOTE: I have assumed you're not solving for the variables, so no backtracking (back substitution) is implemented. If you don't understand what I'm talking about, just ignore this shitty NOTE.


Okay, so first let's see what the concept means mathematically, I guess.
So first, ummm, imagine a matrix:
```
a11, a12, ..., a1n
a21, a22, ..., a2n
...
am1, am2, ..., amn
```
To get this row echelon form you basically need 4-5 steps, and you just keep looping them as needed.
Pivot: a non-zero value; we'll eliminate the entries below it.

### 1: Make the code find a "pivot" in each of the available columns.
### 2: Swap rows if needed so the pivot doesn't come out to be zero.
### 3: Normalize the pivot row, of course, to make pivot = 1.
### 4: Eliminate all entries below that pivot, making them zero.
### 5: Run the same algorithm on the next element down and to the right, and so on.
(Basically we're running this along the diagonal: a11, a22, and so on. The diagonal entries become 1, and the row operations adjust the other values so the system keeps the same solutions.)



Like, imagine this matrix:
```py
A = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])
```
Its row echelon form is:


[[ 1.   0.5 -0.5  4. ] # first a11 was made 1 with R1 = R1 / 2 (2 ÷ 2 = 1), dividing every value in the row by 2.
 [ 0.   1.   1.   2. ] # here we first had to apply R2 = R2 + 3R1 to make the value below the pivot (a21) zero, giving [0, 0.5, 0.5, 1], then divide by the new pivot 0.5.
 [ 0.   0.   1.  -1. ]] # guess the transformations to get this one.
errant valve
#

hi, I'm a maths master's student at Cambridge specialising in stats. My career aspiration is to become a quant researcher at a top quant firm. One of the things that looks super interesting, and was recommended to me, is doing Kaggle, since from what I've heard it's full of data to analyse and make predictions from. My stats knowledge is good, but I'm down to learn more relevant stuff to do with machine learning, or whatever experts here think is relevant. I know Python syntax well but don't know data libraries like numpy, pandas and matplotlib at all. Can anyone give any advice on how I could go from this position to a high level on Kaggle? Can anyone recommend any courses, books or approaches to improve as quickly as possible? ty

crystal brook
#

hello i am new here

teal smelt
# teal smelt **NOTE: I have assumed you're not solving "variables" thus no backtracking imple...

Python Implementation:

Now let's move to the Python code, the best part hehehehehe

import numpy as np

def rowEchelon(X):
    X = X.astype(float)  # we won't be able to divide properly without floats 😴 😴
    rows, cols = X.shape
    pivotRow = 0  # the row where our python snake is

    for col in range(cols):
        if pivotRow >= rows:  # last row done, right?
            break  # stop :)

        # Now since we're working on a matrix, this is the start of everything.
        pivot = None
        for r in range(pivotRow, rows):
            if X[r, col] != 0:  # is the element in this row (the pivot row or a row below it) non-zero?
                pivot = r
                break

        if pivot is None:
            continue  # no pivot in this column, so move on to the next column

        # Pivot finding done, let's move to swapping. (Note we always swap the found row into our pivot row)
        if pivot != pivotRow:  # make sure you're not swapping the pivotRow with itself
            X[[pivotRow, pivot]] = X[[pivot, pivotRow]]

        # Normalize the pivot row so the pivot becomes 1
        pivotValue = X[pivotRow, col]
        X[pivotRow] = X[pivotRow] / pivotValue

        # Now our concept tells us to eliminate the stuff below, right?
        for r in range(pivotRow + 1, rows):
            factor = X[r, col]
            X[r] = X[r] - factor * X[pivotRow]  # row transformation: R_r = R_r - factor * R_pivot

        pivotRow += 1
    return X + 0.0  # adding 0.0 turns any -0.0 (e.g. from dividing 0 by a negative pivot) into 0.0

# we can use our example matrix
X = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])

matrix = rowEchelon(X)
print(matrix)

We get this row echelon form:

[[ 1.   0.5 -0.5  4. ]
 [ 0.   1.   1.   2. ]
 [ 0.   0.   1.  -1. ]]
teal smelt
teal smelt
# errant valve hi im a maths masters student at cambridge specialising in stats. My career aspi...

Idk about books; check out the courses from edureka and freeCodeCamp on YouTube about "machine learning, AI/ML". If I remember correctly, they teach the basics of analysing data like the Titanic one, and they also teach the basic algorithms you'd need.

Then you can move further and watch kaggle's own tutorials, just make sure you've completed the above step and understood the "requirements(python, aiml basics, numpy, pandas, matplotlib, kaggle notebook)" first.

Also, it'll take quite a while to get comfortable with these technologies if you've just started, so please don't quit, just keep learning...

And finally read articles, discuss your doubts, and solve other people's doubts and start kaggle 🐣

(And idk if this should be enough for high level on kaggle, you'll need to spend time and brain on kaggle competitions for it and I'm not much familiar with competitions; so no shitty advice about that 😅)

twin plover
#

I have a question. I have only very basic knowledge of AI, but I've been working with Python and its libraries as a data analyst for a few months. Is it possible for me to compete in the CAFA 6 Protein Function Prediction competition? Its last date is January 26, 2026, so I have around 2 months. Is that possible? Kindly guide me.

twin plover
teal smelt
twin plover
# teal smelt Sorry, i don't get what you're trying to say brother??

A = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])
I mean, when we can write it like in the context above, why write the matrix out the long way, as in the following code?
import numpy as np

def rowEchelon(X):
    X = X.astype(float)  # we won't be able to divide properly without floats 😴 😴
    rows, cols = X.shape
    pivotRow = 0  # the row where our python snake is

    for col in range(cols):
        if pivotRow >= rows:  # last row done, right?
            break  # stop 🙂

        # Now since we're working on a matrix, this is the start of everything.
        pivot = None
        for r in range(pivotRow, rows):
            if X[r, col] != 0:  # is the element in this row (the pivot row or a row below it) non-zero?
                pivot = r
                break

        if pivot is None:
            continue  # no pivot in this column, so move on to the next column

        # Pivot finding done, let's move to swapping. (Note we always swap the found row into our pivot row)
        if pivot != pivotRow:  # make sure you're not swapping the pivotRow with itself
            X[[pivotRow, pivot]] = X[[pivot, pivotRow]]

        # Normalize the pivot row so the pivot becomes 1
        pivotValue = X[pivotRow, col]
        X[pivotRow] = X[pivotRow] / pivotValue

        # Now our concept tells us to eliminate the stuff below, right?
        for r in range(pivotRow + 1, rows):
            factor = X[r, col]
            X[r] = X[r] - factor * X[pivotRow]  # row transformation: R_r = R_r - factor * R_pivot

        pivotRow += 1
    return X + 0.0  # adding 0.0 turns any -0.0 into 0.0

# we can use our example matrix
X = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])

matrix = rowEchelon(X)
print(matrix)

#

my English is not so good, I hope you understand

teal smelt
teal smelt
#

Look at the matrix we had, its values are quite big and distinct right?

A = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])

All the code below this (the whole rowEchelon function) converts our matrix A above into this simpler row echelon form:

[[ 1.   0.5 -0.5  4. ]
 [ 0.   1.   1.   2. ]
 [ 0.   0.   1.  -1. ]]

Q. Why do we need a row echelon form ? 🤔

Bro, I just gave a very, very simplified example, but in reality row echelon form is used to solve big systems of linear equations.

Here's an example:

Say we have linear equations:

2x + y - z = 8
-3x - y + 2z = -11
-2x + y + 2z = -3

We can represent this set of equations as a matrix:

[[ 2,  1, -1,   8],
 [-3, -1,  2, -11],
 [-2,  1,  2,  -3]]

And we'll give this ^^ array to our code as A (or X in the Python code) and get this row echelon form:

1, 0.5, -0.5, 4
0, 1, 1, 2 
0, 0, 1, -1

Notice how simplified our matrix is? Imagine how it would look in the form of equations now!!

Here's how it would look:

x + 0.5y - 0.5z = 4
y + z = 2
z = -1

Notice how simplified and easy these equations are to handle now, it is so easy to work with this now instead of what we had in the starting :)
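To finish what the row echelon form sets up, solve from the bottom row upward (back substitution): z comes straight from the last row, then y from the row above it, then x. This is a pure-Python sketch (no NumPy) that assumes a square system already in row echelon form with non-zero pivots on the diagonal, like the one above; `back_substitute` is a hypothetical name.

```python
def back_substitute(ref):
    """Solve an upper-triangular augmented matrix [U | c] with non-zero diagonal pivots.

    Works from the last row up: each row gives one unknown once the
    unknowns to its right are already known.
    """
    n = len(ref)
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        row = ref[i]
        known = sum(row[j] * solution[j] for j in range(i + 1, n))
        solution[i] = (row[n] - known) / row[i]  # row[i] is the pivot (1 after normalizing)
    return solution

ref = [
    [1, 0.5, -0.5, 4],
    [0, 1, 1, 2],
    [0, 0, 1, -1],
]
x, y, z = back_substitute(ref)
print(x, y, z)  # 2.0 3.0 -1.0
```

Plugging x = 2, y = 3, z = -1 back into the three original equations gives 8, -11 and -3, as expected.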

teal smelt
errant valve
#

are the kaggle tutorials acc that good or am i looking at the wrong spot? coz from what i remember seeing they are quite short and not that in depth

south gate
#

Hey guys

teal smelt
sour ingot
#

are jobs in ML still in demand?

teal smelt
sour ingot
#

when I search for ML internships or jobs there are very few results compared to full stack development

teal smelt
#

People are getting into the field because it's a trend; I don't think all of them are interested or mentally invested in it. At least some people I know just took it up for the trend and are struggling now..
Your actual competition is smaller than you imagine, but yeah, you have to stand out :)

sour ingot
#

I started learning ML; I did regression and classification and understood them, but when I see job listings I get demotivated

#

plus, should i rush towards deep learning?

teal smelt
#

And ML pays more than full stack, so the jobs will be a bit rarer; it must be hard, hence the big paycheck

sour ingot
#

how much did you complete?

teal smelt
teal smelt
sour ingot
#

you're 16 and already way ahead of other people, damn

#

when i was 16 i was playing pubg

teal smelt
wraith sparrow