#❓┊ask-a-question
And also, where do I get the coursework in AI Studio?
So because I am not 18 I cannot use Google Ai Studio leading to me not getting an API so what should I do?
I'm currently in my second semester of systems engineering. I consider myself a junior data analyst; I have knowledge in Power BI, Excel, Tableau, and SQL. I work as an administrative analyst at a company that makes slot machines similar to a casino.
I've been studying something, but I feel it's time to invest in education. I'm looking for those courses that cost 1-2 million, that last 10 months, and so on. They're similar to a bootcamp, but I use them more to reinforce knowledge and so on. I feel like YouTube courses and other platforms don't teach me enough; I feel like I've reached a point where I need guidance.
Would you invest in education like that?
Hello, I am not able to open the file "Foundational Large Language Models & Text Generation", as you can see in the screenshot. Is that normal?
How do we check that we have successfully completed our assignments?
Mind asking here https://discord.com/channels/1101210829807956100/1303438695143178251
Plz answer my question
They will be tracked automatically
But how should I get confirmed that it is completed
I have an issue with uploading a dataset.
I uploaded 40 chunks totaling 2 GB (parquet, gzip compression) and it's been hanging for a few hours with this:
Is it normal, or should I do something differently? Now I am uploading with Kaggle API
2GB should not take few hours. I have never used API to upload the data. Also try restarting once and check your internet speed. I have 5g and 2gb data takes about 15-20 mins
He's master btw 💀😅
#❓┊ask-a-question I am trying to run an LLM, and I tried different ways to load the model into memory, like device_map="auto", device_map="balanced", and max_memory, but I still see that 2 of my GPUs are underutilized. Besides, my CPU memory is still available (I checked resource utilization). I am getting CUDA out-of-memory errors and it is really frustrating. Can someone please help?
I can share more details if someone can help
Hello, I am a beginner with LLMs. For my project, I am trying to train a base model (currently GPT-2) to generate a short story from a user's prompt. I've used a Hugging Face dataset (prompts and stories). I am feeding them into the model and training it in supervised mode (I passed the stories as labels).
My losses aren't converging. If anyone wants details on my Training arguments I can show them if its allowed.
which machine are you using and which model(and corresponding quant) are you using?
Maybe it's exceeding available vram
L4*4
I may be wrong, but since I was getting this error as well throughout the whole night yesterday, I guess you're using a batch size that requires more VRAM than you have available.
model?
are you in any competition?
if not you can share
Qwen2.5-32B, but if VRAM were being exhausted, that should be visible on the resource utilization screen.
are you using fp8?
Nope it's clear
are you using fp8 quant?
can you share your code if it is not a part of any comp?
No no I am not im just building a project on my own.
I'll share
Fr anyways is there much diff in perf or smth ??
I was able to load and use the same model using vllm
Why? Tell me the reason for uninstalling the jupyterlab package.
Clean up the environment by removing an unused package ?
qwen 2.5 32b it
Accelerate n transformer ?
OutOfMemoryError: CUDA out of memory. Tried to allocate 540.00 MiB. GPU 1 has a total capacity of 22.28 GiB of which 511.38 MiB is free. Process 4171 has 21.77 GiB memory in use. Of the allocated memory 21.56 GiB is allocated by PyTorch, and 1.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
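A quick back-of-the-envelope check can help with this kind of load-time OOM. The sketch below only estimates memory for the weights, ignoring activations, KV cache, and framework overhead. It assumes the model has roughly 32 billion parameters (guessed from the model name), uses the 4 x L4 setup and the 22.28 GiB per-GPU capacity from the error message, and treats 90% of VRAM as usable (an arbitrary assumption).

```python
GIB = 1024 ** 3  # bytes per GiB


def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


def fits(n_params: float, bytes_per_param: int, n_gpus: int,
         gib_per_gpu: float, usable_fraction: float = 0.9) -> bool:
    """True if the weights fit in the usable share of the total VRAM."""
    budget = n_gpus * gib_per_gpu * usable_fraction
    return weights_gib(n_params, bytes_per_param) <= budget


N_PARAMS = 32e9      # assumption: roughly 32 billion parameters
N_GPUS = 4           # "L4*4" from the thread
GIB_PER_GPU = 22.28  # capacity reported in the error message

for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    need = weights_gib(N_PARAMS, nbytes)
    ok = fits(N_PARAMS, nbytes, N_GPUS, GIB_PER_GPU)
    print(f"{name:>9}: {need:6.1f} GiB of weights, fits on {N_GPUS} GPUs: {ok}")
```

If the weights end up in 32-bit floats, they would not fit on four such GPUs under these assumptions, so checking which dtype the model is actually loaded in is a reasonable first step.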
accelerate is used to speed up transformer
I usually get this kind of error
btw do u use vllms in/for comps too ?
VRAM issue, I got this same error a lot of times while tweaking batch sizes
@glossy crown here
do you wanna fine tune or do you want for inference?
But the 2 of the GPU memory is not used. I get this VRAM issue
how many tokens are your stories?
How do I check, I mean the stories vary in sizes
max token limit
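One way to check story lengths against a token limit is to count tokens per story and look at the distribution. This is a minimal sketch: `token_lengths` and `summarize` are hypothetical helpers, and the default whitespace split is only a stand-in. With a Hugging Face tokenizer you would pass something like `tokenizer.encode` as the `tokenize` argument instead, and use the model's real limit (GPT-2's context window is 1024 tokens) rather than the toy value here.

```python
from statistics import mean


def token_lengths(texts, tokenize=str.split):
    """Number of tokens per text, using the given tokenize function."""
    return [len(tokenize(text)) for text in texts]


def summarize(lengths, max_tokens):
    """Basic length stats plus how many texts exceed max_tokens."""
    return {
        "min": min(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
        "over_limit": sum(n > max_tokens for n in lengths),
    }


# Toy data standing in for the real stories.
stories = [
    "Once upon a time there was a very small dragon.",
    "The end.",
    "A long story about a ship that sailed far away and never came back home.",
]

lengths = token_lengths(stories)
print(summarize(lengths, max_tokens=12))
```

If many stories exceed the limit, they are either truncated or need a smaller maximum length plus chunking, which also affects memory use per batch.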
for now inference but I want for fine tuning also. This I get not during inference but when I am loading the model
maybe try dtype="half"?
can you share the line where you get oom?
@glossy crown
vllm is better than transformers for inference
And for training
Im not getting OOM right now, but I do get it when I change the batchsize per device in the training arguments to higher values and then try to train.
vllm is not for training
Oh right 💀😅
that is fine
it means that many batches can't fit at once
use a smaller batch size
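A smaller per-device batch does not have to mean a smaller effective batch. Gradient accumulation, which was not mentioned in the thread and is suggested here as an assumption, sums gradients over several small batches before each optimizer step. The sketch below only does the arithmetic; `accumulation_steps` is a hypothetical helper and the numbers are illustrative. In the Hugging Face `TrainingArguments` the corresponding fields are `per_device_train_batch_size` and `gradient_accumulation_steps`.

```python
def accumulation_steps(target_batch: int, per_device_batch: int, n_devices: int = 1) -> int:
    """Accumulation steps needed to reach at least target_batch samples per optimizer step."""
    per_step = per_device_batch * n_devices
    return -(-target_batch // per_step)  # ceiling division


def effective_batch(per_device_batch: int, steps: int, n_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device_batch * steps * n_devices


# Illustrative: a per-device batch of 2 fits in memory, and we want about 16 per update.
steps = accumulation_steps(target_batch=16, per_device_batch=2, n_devices=1)
print(steps, effective_batch(2, steps, 1))
```

The trade-off is wall-clock time: memory stays at the small-batch level, but each optimizer step takes proportionally longer.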
Accelerate is ?
yeah
Yes I am, but the training is painfully slow and Losses don't converge. Maybe I should reduce max_tokens?
it is used in transformers training module
Oh right
what is the min batch size where error occurs?
4
in my experience convergence is slow at beginning, then it speeds up and then loss stagnates/increases slightly
check with your train as well as test data
does it not decrease in train as well?
Yeah, tweaking any of them to 4, causes the CUDA error
It doesn't
how long did you run it for?
I mean the losses were 6.6+ then came down to 6.4+ then went to 6.6+
for test?
2 epochs, 5000 steps in each.
6.6+ to 6.3+
Right now no ig
without lora convergence is really slow
I never really tried without lora so can't really say how much slower it is
but it is pretty slow
Okay i'll try Lora then.
@drifting star did it fix it?
Bro she was trying for aimo 💀
Isn't it over ?
not yet, also use they/them unless you know their gender
harpreet is a gender neutral name
How many days left ?
change your views bro, it is gonna make you suffer later in life
Change ur views 😄
you can never infer gender from someone's coding style
Lol there r too many factors
Like date of joining dc n kaggle too n username
In simple words I said behaviour aggregate of all the factors
I'm not even learning osint though
you are just trying to normalize exclusivity and it is not funny
Hey guys, I'm in Kaggle youtube channel, and I don't see live stream yet, any idea when it'll start ?
10 pm IST
Oh cool Thanks
Hi, I am new to the AI/ML domain. Can someone provide good resources to start my journey in this domain?
Ok i js guessed bro
Hi guys, I'm from India, When it will start?
Hi there, the same confusion here, I have signed for 5 days course and today is the 1st day, I am expecting live video.but what time will it start?
It is not obvious from the name. It is a gender-neutral name. Yes, I am female, but many men in our community share the same name. We belong to the Sikh community, and most names are gender neutral in our community. And yes, I was trying for AIMO and it is ending tomorrow, but that isn't the point. The point is to learn. And I want to understand why 2 of my GPUs are underutilized.
Hello everyone, is there a meeting now?
On the registration page there is count down timing showing, and right now it is [Starting in 03 hour and 49 minutes]
It's 10+ hours already and no movement, I tried to create a second dataset with the same data, but it's doing exactly the same.
😦
@verbal crest
Sorry for bothering, but it seems I found a bug with dataset creation with an API
Dataset link is https://www.kaggle.com/datasets/dremovd/pump-fun-graduation-02-2025
HI. Can everyone help me how to summarize the white paper I downloaded in Notebook?
Use NotebookLM for that
Is there any assignment for today?
I linked my kaggle but the welcome channel of kaggle still indicates that I have not linked my account. Further, I don't have any permissions to post on the channel. I tried to delink it once and again linked it but the result is same
Dataset link is not accessible
Also curious how could you keep your session active for 10 hours is it through API
If you can recall and let me know that will be helpful
Ig it's right
U js simply save it 😅
It automatically runs till max 12 hrs
Thank you I will try 😀
Thank you 🙂
it will not shut down unless you keep the session idle for 40 mins
if you need to keep something running and train for many hours you can save and run, but make sure to pickle the output
otherwise you would lose it
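A minimal sketch of pickling results so they survive a long "save and run" session. `save_output` and `load_output` are hypothetical helpers, and the example writes to a temporary directory so it runs anywhere. On Kaggle you would normally write under `/kaggle/working`, which is an assumption about where your notebook keeps its output files.

```python
import pickle
import tempfile
from pathlib import Path


def save_output(obj, path):
    """Serialize obj to path with pickle, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)


def load_output(path):
    """Load a previously pickled object back from path."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Stand-in for real training results; the values are made up.
out_dir = Path(tempfile.mkdtemp())
results = {"epochs": 2, "val_loss": 6.3}

save_output(results, out_dir / "results.pkl")
print(load_output(out_dir / "results.pkl"))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.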
Yes, but it should be for Kaggle stuff
- It's private
- Dataset is not "finished", the status would prevent anybody to see it
New info:
I reloaded in CSV and it worked instantly
Hi. I am getting "ClientError: 429 RESOURCE_EXHAUSTED" when I try to run a cell. This is after execution of some 4/5 cells above it. Anyone has any pointers ?
OK, I retried it after a minute or so and it worked. I guess the server resources were really exhausted.
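Since waiting and retrying worked, the same thing can be automated with exponential backoff. This is a generic sketch, not the Gemini SDK's own retry mechanism: `call_with_backoff` and `is_rate_limit` are hypothetical helpers, and detecting the error by looking for "429" or "RESOURCE_EXHAUSTED" in the message is an assumption based on the error text quoted above.

```python
import random
import time


def call_with_backoff(fn, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying retryable errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_try = attempt == max_attempts - 1
            if last_try or not is_retryable(exc):
                raise
            # Double the wait on each attempt and add up to 25% jitter.
            delay = base_delay * (2 ** attempt) * (1 + random.uniform(0, 0.25))
            sleep(delay)


def is_rate_limit(exc):
    """Assumption: rate-limit errors mention 429 or RESOURCE_EXHAUSTED in their text."""
    text = str(exc)
    return "429" in text or "RESOURCE_EXHAUSTED" in text


# Demo with a fake request that fails twice before succeeding.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    return "ok"


# The sleep is stubbed out so the demo finishes instantly.
print(call_with_backoff(flaky_request, is_rate_limit, sleep=lambda seconds: None))
```

In a real notebook you would wrap the API call in a small function, pass it as `fn`, and keep the default `time.sleep`.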
So, for day 1 what we have to do ? Is there any assignment we have to complete
you should have the instruction and the assignments in your email
how do i submit assignment
In the Evaluation and Structured Data codelab, we explored different ways to assess LLM responses using autoraters and structured output formatting. Given the latest improvements in Gemini 2.0, what are the best practices for optimizing prompt structures to achieve higher accuracy and consistency in structured outputs, especially for tasks requiring reasoning, summarization, or extraction from unstructured text?
Additionally, are there any specific evaluation techniques recommended for ensuring that structured outputs remain reliable across diverse datasets?
How do know that I completed today wrk
I haven't received it yet. Is there something I can do about it?
what requirements file or package name I should command pip to install?
If you look in discord someone put the assignment instruction. Today’s Assignments
Complete the Intro Unit – “Foundational Large Language Models & Text Generation”:
Listen to the summary podcast episode (https://www.youtube.com/watch?v=Na3O4Pkbp-U&list=PLqFaTIg4myu_yKJpvF8WE2JfaG5kGuvoE&index=1) for this unit.
To complement the podcast, read the “Foundational Large Language Models & Text Generation” whitepaper (https://www.kaggle.com/whitepaper-foundational-llm-and-text-generation).
Complete Unit 1 – “Prompt Engineering”:
Listen to the summary podcast episode (https://www.youtube.com/watch?v=CFtX0ZyLSAY&list=PLqFaTIg4myu_yKJpvF8WE2JfaG5kGuvoE&index=2) for this unit.
To complement the podcast, read the “Prompt Engineering” whitepaper (https://www.kaggle.com/whitepaper-prompt-engineering).
Complete these codelabs on Kaggle:
Prompting fundamentals - https://www.kaggle.com/code/markishere/day-1-prompting
Evaluation and structured data - https://www.kaggle.com/code/markishere/day-1-evaluation-and-structured-output
Make sure you phone verify (https://www.kaggle.com/settings) your account before starting, it's necessary for the codelabs.
Want to have an interactive conversation (https://support.google.com/notebooklm/answer/15731776?hl=en&ref_topic=14272601&sjid=16012842710481496794-EU) ? Try adding the whitepapers to NotebookLM (https://notebooklm.google.com/?original_referer=https:%2F%2Fwww.google.com%23&pli=1)
I am getting an error for "Install SDK", "Cell In[5], line 1
pip uninstall -qqy jupyterlab # Remove unused packages from Kaggle's base image that conflict
^
SyntaxError: invalid syntax"
Hi everyone. what requirements file or package name I should command pip to install? as i get error
thankyou so much for letting me know
what is the use for api key that is generated by ai studio ?
why this error has come up? ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyterlab-lsp 3.10.2 requires jupyterlab<4.0.0a0,>=3.1.0, but you have jupyterlab 4.3.6 which is incompatible.
im currently doing the intro to ml course and in the first exercise itself, the codes on the notebook don't work? it keeps on loading what should i do? i tried in both chrome and edge
Your JupyterLab version 4.3.6 is not compatible with jupyterlab-lsp 3.10.2, so you can either downgrade JupyterLab with pip install "jupyterlab<4.0.0" to get a compatible version, or create a virtual env to work in.
try restarting the kernel
no luck with that
is it, more efficient to run an unsupervised Learning on GPT2 to finetune for generating story based on a context, or is using prompt as the input and story as the target to do supervised learning the better choice?
I did the latter, but my losses failed to converge. Also, the training times for such stuff, is absurdly high.
When I downgraded the Jupyter version to <4.0.0, this error came up: An error occurred while committing kernel: ConcurrencyViolation Sequence number must match Draft record: KernelId=79702580, ExpectedSequence=4, ActualSequence=2, AuthorUserId=25935810
Also gave me a warning WARNING: Skipping jupyterlab as it is not installed.
although it mentions successfully installed
show ss ?
hi, how we can submit tasks we already done?
once we run the notebooks on kaggle, are we supposed to submit them ?
You have to choose option on top right side that edit a copy. then you can run the codes.
No we are not supposed to. The submission is for the project at the end of 5 days. See the announcements ... everything else is optional based on how much time you have.
copy from announcements -- Hi all - To make things simple and clear for everyone, we will award the course badge to everyone who participates in the capstone project.
All other work is optional and can be done at any time. If you're short on time, we recommend prioritising the podcasts and livestream and then the codelabs. The capstone project will use knowledge you learn from the codelabs, so we recommend running through them but you don't have to submit the daily labs.
There is no time limit for podcasts, whitepapers or codelabs, but doing them this week will give you a chance to discuss with the community.
Hi everyone! I am a student from Iran and it seems that Iranians can not verify their phones due to not receiving any messages. I would appreciate any guidance with this.
I don't get the code for the Kaggle verification. Does anyone have the same issue?
I committed both the prompting and evaluation assignments. How do I know that they're completed?
Can any one tell me that what project we have to do at last .
So what should I do after completing the codelabs again?
Is the livestream happening soon, I'm on the YT channel but don't see stream happening?
Yeah. It will stream in 12 mins or so
What are the key differences between Generative AI models and Discriminative AI models?
Is the youtube livestream from 4 months ago or is it truly happening now?
Anant did not get a chance to share his views when everyone else shared their views
How does Gemini 2.0’s multimodal capability compare to GPT-4 in image processing?
Guys, I know this question is kind of funny, but I couldn’t find the livestream video on the Kaggle page. The only video I found was published four months ago. That’s why I couldn’t participate in the last test.
its not new
Do you see it prompting the user for different styles/formats of output as for structured output which we see deployed now by GPT and Claude?
What’s the best way to evaluate the coherence of Gemini API outputs when working with this dataset?
Hello , Why don’t I see the first day ?
Temperature
I could not watch today's video because of the timing. When will it be uploaded, and can you send the link?
should we cover the whole white paper on foundational LLM?
Add an exclamation mark in front of pip (!pip uninstall -qqy jupyterlab) and try uninstalling again.
Has an exercise or quiz been given for the first day that we have to complete and submit?
How can I know my assignments are submitted, and how can I submit them?
Will there be another course in a few months? Maybe over the summer, so that students won't have to also balance school and homework?
U can learn after some time, these all videos and livestream are available on the kaggle yt channel
yeahh but I won't be able to do the assignments and capstone project
How to set up day 2?
How would I know if the assignment is done or not?
code:
warning/error:
As I ran the code in the cell, I got the warnings in the output and then this kernel-restarting pop-up, and this problem just goes on.
Using TPU VM v3-8, huggingface library used is the latest version
How will i get to know whether I have completed my day 1 task will i get some notification or will it reflect on my dashboard or home page
ask these questions on #5dgai-q-and-a .
Is the assignment easy ?
No nothing's easy
why does my kaggle notebook almost always get stuck on running when i press run all? like my very first cell which is just imports gets stuck on running
this only really happens if I press "runall" though if I run each cell individually it works and doesnt get stuck
how to Complete assignments
!!! For the 5-Day GenAI Course, this is not the right place to ask questions. Please refer to 5dgai-question-forum. This is the general Q&A Discord channel of Kaggle.
Dear Kaggle Support Team,
I hope you are well. I recently registered on Kaggle to further my education and enhance my skills in data science. However, I encountered an obstacle during the phone number verification process for accessing codelabs; my verification is being blocked due to sanctions imposed on Iran.
I understand that U.S. sanctions target commercial and political activities, yet educational and research initiatives are generally exempt under U.S. law. Restrictions that prevent access to purely educational resources—like Kaggle codelabs—could be seen as conflicting with these legal exemptions. In fact, U.S. regulations typically allow for academic and humanitarian exchanges, and applying these restrictions in this educational context may run counter to that policy.
Could you please review this matter and consider lifting the phone verification restriction on my account so I can fully participate in Kaggle’s educational opportunities? I believe this change supports the spirit of U.S. law, which aims to foster academic development and innovation.
How to get instructions for this 5 Days course? thanks
#5dgai-announcements or emails, they will tell you what to learn each day
Hi, I'm a developer in Korea. I started studying AI this time, and what I want is to become a deep learning engineer in computer vision. So, I'm learning the basics from machine learning. How much should I study machine learning to move on to deep learning? Now I'm taking a look at the book "Introduction to Machine Learning with Python" and practicing it. Please tell me the direction for deep learning in the vision field
I received day 2 yesterday, not day 1. Anyone else like me?
omg i got a 404 error
I'm bit disappointed myself
They don't ve that name book yet
Also this one's latest edition I'm not sure
I checked the revised link. I was thinking about looking at the book, but thank you for the recommendation.
Lol book is really great
Never underestimate a Manning book
Even a decade old one would be better than some latest edition books of Packt publication
Actually, i'm bad at eng yet. So I'm looking for a translated version, but you recommended me before, but I don't have a translated version of this book yet. I'll look at the former book first, and then I'll try this book in English! Thx
Could you tell me how you completed the prompting and evaluation assignments? Just by running them in Kaggle, right?
Are there ways for us to effectively create AI agents for embedding evaluation or train the agents through reinforcement learning for embedding optimization?
Good day, fellow Kagglers. Please, how do I go from here:
df_train = create_embeddings(df_train)
df_test = create_embeddings(df_test)
I have not been able to run these two lines of code or the subsequent ones successfully. I need your insight.
Yes, there are ways to create AI agents for embedding evaluation and optimization. Here are some approaches:
Embedding Evaluation:
- Learning to Rank: Train an agent to rank embeddings based on their quality, using reinforcement learning or supervised learning.
- Embedding Critic: Design an agent that critiques embeddings based on specific metrics, such as similarity or clustering quality.
- Adversarial Evaluation: Train an agent to generate adversarial examples that challenge the embedding's robustness.
Embedding Optimization through Reinforcement Learning:
- Reward-based Optimization: Define a reward function that encourages the agent to optimize embeddings for a specific task, such as clustering or classification.
- Policy Gradient Methods: Use policy gradient methods, like REINFORCE, to optimize the embedding parameters directly.
- Actor-Critic Methods: Employ actor-critic methods, like Deep Deterministic Policy Gradients (DDPG), to optimize embeddings using both value and policy functions.
Other Approaches:
- Generative Adversarial Networks (GANs): Use GANs to generate embeddings that are optimized for specific tasks.
- Meta-Learning: Train an agent to learn how to optimize embeddings for a variety of tasks, using meta-learning techniques like Model-Agnostic Meta-Learning (MAML).
- Evolutionary Algorithms: Employ evolutionary algorithms, like genetic algorithms or evolution strategies, to optimize embeddings.
These approaches can be used to create AI agents that effectively evaluate and optimize embeddings for various tasks.
Evaluating the coherence of Gemini API outputs involves assessing the relevance, consistency, and overall quality of the generated text. Here are some methods:
Evaluation Metrics:
- Perplexity: Use the perplexity metric from the Hugging Face evaluate library.
- BLEU Score: Use the sacrebleu metric, which is available through the evaluate library.
- ROUGE Score: Use the rouge metric (backed by the rouge-score package), which is also available through the evaluate library.
Coherence Evaluation:
- Language Modeling Evaluation: Use the language-modeling-evaluation pipeline from HF Transformers to evaluate coherence.
- Text Classification Evaluation: Utilize the text-classification-evaluation pipeline to evaluate coherence in text classification tasks.
Model-specific Evaluation:
- Autoencoder-based Models: Evaluate coherence using reconstruction loss or perplexity for autoencoder-based models like BERT or RoBERTa.
- Generative Models: Evaluate coherence using metrics like BLEU or ROUGE for generative models like T5 or Bart.
Example Code:
import evaluate  # Hugging Face "evaluate" library (pip install evaluate); transformers has no evaluate function
# Load the perplexity metric; it scores text with a causal LM, so GPT-2 is used here instead of T5
perplexity = evaluate.load("perplexity", module_type="metric")
# Texts to score (replace with your own model outputs)
texts = ["The advent of large language models changed how we build software."]
# Compute per-text and mean perplexity under GPT-2
results = perplexity.compute(model_id="gpt2", predictions=texts)
print(results["mean_perplexity"])
This code computes the mean perplexity of the sample texts under GPT-2.
Hello there, I am a bit stuck. Are we supposed to work in the same Kaggle notebook?
Hello fellas! I've been trying to verify my kaggle account via phone number but to no avail. I've tried over and over again sending code to my number but still no code shows up. Please, any way around this?
Hello there, should i invest myself in Claris Filemaker? i am being transition to work with our company CTO and Filemaker is part of our internal process. Working with CTO over the years led me closer to data world.
@quick yarrow moving here not to flood the other channel
i didn't say yolo is the best model, it does perform well though for your task it will need some additional training
do you have a dataset of images that show where existing defects are? you need to show the model what it's trying to locate
let me pull up an example
the task here is segmenting parts of the spine
the image on the right (aka mask or label) is already existing in the dataset
thus you can train your model to replicate what is already in your data
example of mask vs what a model generated
does your data have the masks? if not training a CNN can be difficult
also this is how a segmentation model (in this case Unet) segments images
yolo doesn't output masks like this, instead it generates bounding boxes
given the structure of a semiconductor i would assume something like yoloX is more fitting for your task, it's also less computationally intensive to train compared to U-net especially if your images are hi-res
i'm not familiar with how to implement yolo though but afaik it's not too hard
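To make the mask vs. bounding box distinction concrete, here is a small sketch that converts a binary segmentation mask into the tightest box around it. `mask_to_bbox` is a hypothetical helper, and the 4x4 mask is toy data. A mask labels every pixel, while a box keeps only four numbers, which is part of why box-based models tend to be cheaper to train on hi-res images.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) around the nonzero pixels, or None if empty."""
    if not mask:
        return None
    # Indices of rows and columns that contain at least one nonzero pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Toy mask: 1 marks defect pixels, 0 marks background.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

print(mask_to_bbox(mask))
```

If your dataset only has masks, a conversion like this is one way to derive box labels for a detector. Going the other way, from boxes to masks, is not possible without extra annotation.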
@quick yarrow please don't send me friend requests, we can just discuss this here
Thank you.
well. I just learned kaggle learn course. could you tell me about the best study material?
Thank you.
Day 3 (Generative Agents)
Welcome to Day 3. https://www.kaggle.com/learn-guide/5-day-genai
Learn to build sophisticated AI agents by understanding their core components and the iterative development process.
The code labs cover how to connect LLMs to existing systems and to the real world. Learn about function calling by giving SQL tools to a chatbot, and learn how to build a LangGraph agent that takes orders in a café.
Day 3 Assignments:
- Complete Unit 3: “Generative Agents”, which is:
[Optional] Listen to the summary podcast episode for this unit (created by NotebookLM).
Read the “Generative AI Agents” whitepaper.
[Optional] Read a case study which talks about how a leading technology regulatory reporting solutions provider used an agentic generative AI system to automate ticket-to-code creation in software development, achieving a 2.5x productivity boost.
Complete these code labs on Kaggle:
- Talk to a database with function calling
- Build an agentic ordering system in LangGraph
- Watch the YouTube livestream recording. Paige Bailey will be joined by expert speakers from Google - Steven Johnson, Julia Wiesinger, Alan Blount, Patrick Marlow, Wes Dyer, Anant Nawalgaria to discuss generative AI agents.
How to check the Progress whether I completed the Day 1???
Hi everyone, I have a question on embeddings!
I was working on a RAG search project where the source document (a book) was ancient and hence the language used was antiquated. I was getting very poor performance with a standard embedding model and I suspect the language in the book was just very different to what the embedding model was trained on.
Is there a way in which I can salvage this situation and calculate useful embeddings? Thanks in advance. 🙂
Hello everyone, quick question. Is anyone experiencing issues with Notebook? Its not loading any cells for me.
Is there a NotebookLM API that would let me leverage its features from within an app?
Please, how do I get these codelabs? The link is blurred in the live training. And when is the capstone project to be submitted?
That's an interesting case really. Are you splitting parts of the doc into chunks for use within the vector space? Not sure if semantic chunking would work here if your LLM struggles to parse the language, but other options exist
my thought process would be that if you split the doc into smaller, more manageable chunks, you'll have an easier time searching within the vector space during retrieval
The other route would be to test different embedding models though I doubt that will bring much improvement, unless it's anything overly specific
Another idea that comes to mind, though not sure how feasible given the increased overhead, would be to have an interpreter LLM convert from the antiquated to a more modern version of your language (assuming this is structured similar to trad/simplified Chinese, ancient/modern Greek etc) and use the modern versions for embeddings and retrieval and the original text in your doc store and subsequently the answering LLM prompt
Do keep me updated with your findings if possible that sounds really interesting
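The "modernize for retrieval, keep the original for answering" idea could be sketched roughly like this. Everything here is a toy stand-in: `modernize` would be an LLM call in practice, `word_overlap` would be embedding similarity, and the `Chunk`, `build_index`, and `retrieve` names are hypothetical. The point is only the data flow: search over the modern text, return the original passage.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    original: str  # untouched source text, passed to the answering LLM
    modern: str    # modernized text, used only for embedding and search


def split_into_chunks(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(text, modernize, max_words=50):
    """Pair each original chunk with its modernized version."""
    return [Chunk(c, modernize(c)) for c in split_into_chunks(text, max_words)]


def retrieve(index, query, score, k=1):
    """Rank chunks by score(query, modern text) and return the original passages."""
    ranked = sorted(index, key=lambda c: score(query, c.modern), reverse=True)
    return [c.original for c in ranked[:k]]


# Toy stand-ins (assumptions): a word map instead of an LLM, overlap instead of embeddings.
OLD_TO_NEW = {"thou": "you", "art": "are", "hast": "have"}


def modernize(text):
    return " ".join(OLD_TO_NEW.get(w, w) for w in text.lower().split())


def word_overlap(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))


book = "thou art kind and true the ship hast sailed away"
index = build_index(book, modernize, max_words=5)
print(retrieve(index, "are you kind", word_overlap))
```

Keeping both versions in one record means the retrieval step and the answering step can never drift apart, at the cost of storing the text twice.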
Does the server guide just disappear when you complete it all? (It used to appear near Channels & Roles and the rules on the left-hand side of the desktop app.)
Any ideas where can I get EHR data?
It's Naveed from Pakistan
I had a question
That, if I wanna become a Gen Ai developer and get a job as soon as possible
So, in your opinion which fields should I focus
Subject: Seeking Guidance on Developing a Generative AI Application for Video SEO Optimization
I am initiating a project to develop a Generative AI application with the goal of optimizing video content for search engine visibility. The intended workflow involves users uploading video files, which the AI agent will then analyze to generate key SEO elements. Specifically, the application should:
Extract relevant information from the video content.
Generate SEO-optimized titles, adhering to a maximum length of 60 characters.
Suggest relevant hashtags (less than 10).
Identify pertinent YouTube keywords.
Identify relevant YouTube long-tail keywords.
I am seeking insights and recommendations from the community on the optimal approach to building this AI agent. Specifically, I would appreciate guidance on:
Recommended AI architectures and models suitable for this type of video content analysis and text generation.
Key considerations for data processing and feature extraction from video content to inform the AI model.
Strategies for training the AI model to generate accurate and effective SEO elements within the specified constraints (character limits, hashtag count).
Potential challenges and best practices to consider during the development process.
Thank you for your expertise and assistance in getting this project off the ground.
This is a really good idea. Either I could look for a way to translate it to a more modern version or just download a contemporary translation of that book. Thanks Kostas!
I’ll give it a shot this weekend 🙂
No probs, one note I would make would be to use the direct, untranslated version of the passage in the context window of the answering LLM, since translations almost always miss something from the original
what is this capstone project about?
what PC set-up do you guys usually use and think would be good for training models? Sorry if this is too general, I'm just getting started with kaggle and was thinking of getting a M4 mini
Maybe better to buy a good/decent GPU if you are considering offline training @Qafig
Kaggle offers you limited GPU runtime (30 hours a week, plenty if you're not training anything too intense) online
that and the ~50gb online storage should be enough for you to get started without spending a dime
training locally can be very expensive so do try to exhaust Kaggle first and see if it doesn't fit your needs before dropping that much money on a rig
what about that nvidia's blackwell ai computer for training models ?
i'm not familiar w that service or what it costs
if they have free compute units/tokens and it's easy enough for you to set up cool
but kaggle has the lowest barrier to entry imo
I'm going to develop population model using real-world data from a specific region.
The goal is to effectively transfer this model to a similar area while maintaining demographic and behavioral accuracy.
A key aspect of this role involves troubleshooting complex data inconsistencies and resolving modeling challenges to ensure high-quality outputs.
could you please suggest the best algorithm and how to do it?
Hello Everyone,
Where do I mark the assignments done?
Also, where can we get the capstone?
@everyone
Hello, how do I find groups for the capstone project?
my Day 4 1st codelab is still in queue.... Has anyone got through the queue successfully.... Mine is going on forever.... 😦
Hi everyone, I am wondering if there is documentation/reference for the client class.
So far, codelabs showed me examples using
- client.models.generate_content()
- client.chats.create()
- client.chats.send_message()
- client.tunings.tune()
etc. Those examples are great, but where can I find a comprehensive list of functions that belong to the client class and arguments of those functions, not just example code snippets?
Today I can't get any kind of mail for day 5 tasks and assignments
Thanks for the clarification @verbal crest . Could you advise on best practices for sharing the call across channels, if allowed? I appreciate your help, thanks!
A single post in the general channel with no discord link is the appropriate amount of promotion.
datasets vs polars library which is better
I registered for a competition under a different ID from my main ID, and I'm in a team. What should I do?
Hey, is here someone that knows anything about efficiency optimization in pytorch? I would need some help.
Thank you!
Hello. Is there anyone who has expertise in webscraping?
Anyone have experience with SingleStore notebooks?
Anyone have experience with UpTrain AI?
I am not getting this (linting) in my Kaggle notebook. How can I enable it? (I tried looking but haven't found any option for it.)
is there any plugin required?
Yes, I would advise making your submission on 19 April 2025
What happens if we don't submit before the due date.
Can I submit the project after the deadline
@everyone Hello Everyone, I have a question regarding the codelab notebooks. I am doing all the codelabs today; do we need to see any output files after running all the notebooks, or is just running them enough?
I know this is late to ask, does anyone know?, can anyone answer?
run would be enough ig
How do people publish their datasets?
I'm curious to understand how people go about publishing datasets. Do they generate the data themselves, or do they collect it from somewhere else? If it's the latter, where do they usually get their data from?
Hello @everyone,
I need to develop an AI model that checks seals and invoices for a business. I need a sample dataset.
Could you help me get an example, please?
Thanks.
Can anyone please tell me: on Kaggle they are asking me to submit the code notebook as well, but "Submit Prediction" accepts only a CSV file. How can I submit the code?
Is there a deadline for completing the project we are supposed to do in KAGGLE??
I'm getting this error when I go to the competitions/gen-ai-intensive-course-capstone-2025q1 page. Is there a way around this?
April 20th is the deadline for the project
I’m trying to download a part of BNF onto NotebookLM but am not able to. What am I doing wrong?
anyone here familiar with a workflow where i can do version control using git on a shared server?
for context, i have access to a shared GPU cluster which i can use for ML training, but ofc it's shared, i can't just have my private credentials on that server
so what's the best way of doing training on there while keeping my files under git?
also curious how to initiate training sessions, leave them running even after i close my laptop/close the SSH connection, periodically check training progress, etc.
can you share this link
got any answers?
Please, how do I join a team, for the capstone project?
I want to become a ML engineer can anyone please guide me
right now i just done my matric exams and i am free for 3 months i want to start learning in this time period
so from where should i start and how i should continue my study
I have asked for help several times and I am really feeling unsupported in trying to complete my Kaggle project.🥹
i dont know how to do the capstone
like should I just make a new notebook on the website and write my code in cells and run it, or what?
little help please
Pls how do I merge teams for my kaggle capstone project
I can't find where to form a team or even merge a team
Pls help 🥲
When u join the competition go to the Teams tab u can see ur team tho team merger deadline has passed maybe so u can't merge anymore
NO. 20th April is the deadline for submitting the project. After that you can't submit, and the project won't be evaluated for the 5-day course
Any support with this error
'404 models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.' @here
When I run these lines in a fresh notebook:
!pip uninstall -qqy jupyterlab # Remove unused packages from Kaggle's base image that conflict
!pip install -U -q "google-genai==1.7.0" # Also tried "google-genai==1.10.0" with same conflicts
!pip install -U -q chromadb
The chromadb install throws a whole bunch of library conflict warnings. Does anybody have the same issue?
Any solutions?
The messages seem to refer to big-query and tensorflow, which I will not use directly, but I wonder if they will break anything else...
I tried the solutions Google AI Studio and OpenAI gave me, but they didn't work.
hi, i'm new on kaggle and i have a working notebook i want to submit in a competition. but i can't find how to import it ? the only option i found is to create a new notebook from scratch.
oh finally found the option "link to a competition" ! 🙂
but it creates a version and delete everything 😢
save a version from Version2 solve my issue. sorry for the noise.
Hello, I have written a notebook that executes a chain via LangChain, but the notebook hangs in Kaggle, while the same code runs fine on Codespaces. Am I missing something? Does Kaggle not allow chains? I have this setup with OpenAI, but Gemini does not seem to have the same output parser:
agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
"chat_history": lambda x: memory.load_memory_variables(x)["chat_history"],
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
Hey i am Jenny, i am trying to work with Kaggle but this is too overwhelming dont know where to start, any help would be highly appreciated.Thanks in advance
Hi, has anyone encountered any issue with the yfinance API in a Kaggle notebook before? It was working perfectly fine yesterday, but suddenly it returns an empty dataframe
Hi, I have completed my capstone project and everything runs fine, but when I go to save a version it says it failed... can anyone help please
hey can i ask how to do it
like where do you write the code how to submit it
?? any help
can you please tell me how you did it
how i did what... just build a notebook like we did in the labs
So I am somewhat new to ML and would like to know more about feature engineering and how does one go about doing it
Hi Jenny, I'm new as well but I think a good way to get started is to follow along with some YouTube tutorials. There are some good tutorials that show how to join a beginner competition, get some results using the Kaggle notebook/Jupyter notebook and submit those results for a score. Try searching youtube for "Kaggle titanic" or "Kaggle House Pricing" for help with those beginner competitions. A couple of popular channels I've seen are Ken Jee, and Ryan and Matt Data Science, for example.
Thank you so much for the help, i really appreciate it
Are questions asked here answered?
thank
i am really sorry if that bothered you
I get kinda lost with instructions
except the ones asked by u 😉
Where I can upload my project of 5days genai
the kaggle web community looks like all spam or is just me?
-# maybe u r new
Indeed I am
Thanks James, it feels like you are the only one answering these questions instead of being dismissive
Do you think I could reach out to you if I need help completing mine?
This channel is full of eager people but not enough people lending out a hand
It’s suffocating knowing that there’s a “community” here, but it feels like a brick wall
Hi Kobe, I'm only a beginner too, but yes if you have beginner questions feel free to ask and I'll do my best!
Does anyone have the address of a recommended open-source machine learning project on pulse waveform graph analysis? I've recently been looking for related projects in this area; I want to follow an expert's work first and see how it's done.
Hello, I am pretty new to ML and was working on internal project . I wanted to get some understanding about what are best practices to store trained model (.pkl). Do we push them into github with other code or there are other storages which can be used for maintaining model versions ?
Hey there!
After six months of learning to code (and approximately 4 existential crises), I finally trained and uploaded my first-ever deep computer vision model to Hugging Face. 🤗
Look, I’m still very much in the "wait, this actually worked?!" phase of ML, so I’d really appreciate some honest feedback—especially on the documentation and presentation.
Did I explain things okay? Does it make sense, or does it sound like I wrote it at 3 AM after too much coffee? (…Okay, that last part might be true.)
But seriously, any thoughts are welcome—code, structure, weird mistakes. I’m here to learn and you folks know way more than I do. 🚀
And if it’s terrible… well, at least I tried. 😅
https://huggingface.co/spaces/IncreasingLoss/Wildlife_Animal_Classifier
-# next time try streamlit
I am doing all my project coding from within Kaggle - are there any risks of losing my work? Is my work private?
-# ofc ofc
what do these green and red numbers mean next to my notebook?
my notebook is private btw
It means the no. of lines added n removed in ur notebook code
I registered on time, and attended the course for 5 days during the course, but i am unable to join the capstone competition (https://www.kaggle.com/competitions/gen-ai-intensive-course-capstone-2025q1/overview) from the start. It gives me attached error. All the emails we get are from no-reply-eventsatgoogle@google.com, who do I need to reach out to?
Do you use Google form provided by the link you gave in ur message to submit the project?
Good morning, how can I obtain a credential at the end of the course?
I mean Day Gen AI Intensive Course with Google Learn Guide
I am working with a dataset for a local contest in which I have to predict car prices. The columns are brand, model, model year, mileage, fuel type, ext colour, int colour, transmission, accident, clean_title. How should I go about creating features for it? Any good tips for feature engineering and processing?
I submitted my notebook and added it as competitions notebook, now it shows like this and I cant preview it
without editing I am not able to view the notebook. Anyway to undo that change ?
Good day guys!
Happy learning!
Anyone that has worked with ransomware using deep learning.
I'm working on a similar project and I'm stuck.
Hello, I am unable to join the competition. It says Permission 'competitions.participate' was denied
Anyone get this?
An error occurred while committing kernel: The kernel source must be less than 1 megabytes in size.
Why can't I merge team?
I need to be verified to merge team in a competition but I cannot re-verify again and Idk why...
Bro...
When will this be fixed?
Contacted + emailed staff yesterday and still no response
I attended the 5 day course, completed all notebooks. I submitted the Capstone project via Google form.
But I didn't receive completion certificate or badge.
Is there anything else I need to do.?
Badges took a little while to roll out, it's been added to your profile now
facing this issue in notebooks which I already have and each time I create a new notebook in incognito mode as well. Anyone knows how to fix this?
How important is it to learn LLMs right now? I understand learning the very basics, but I mean like going deep into libraries and learning how different ones work.
The reason I ask is because I know LLMs are big right now, but you still need to know feature engineering, EDA, non-neural network models, etc, so it's important to not just do neural networks.
Also, I think right now to get a job working with LLMs you need either a grad degree or experience in software engineering or very specifically MLE, which would mean if you don't have that experience then it would be harder to get a job using them.
Double also, I'm asking because it seems like more and more jobs specifically titled "Data Scientist" have in the job descriptions more LLM stuff than not.
Hey guys, sorry for the stupid question, I'm doing some research about the best selling games of all times and facing some trouble finding the most adequate dataset. I'm sorry if i am in the wrong channel, but could someone please give me some directions? Thanks!
Hey guys. Say I'm training a model in Kaggle and an interactive session is ended. Does my code still run in the background? Or do I have to start again. How do I check?
need help in time series modeling
data:
Project year Month MoneyLeft
prj1 2024 1 1000
prj1 2024 2 800
prj1 2024 3 400
prj1 2024 4 100
prj2 2022 3 5000
prj2 2022 4 3493
prj2 2022 5 2000
prj2 2022 6 1000
Fabricate this for 10 to 20 projects; each project can run from 12 to 18 months.
For a new project, given MoneyLeft for the first 2 or 3 months, the model should predict MoneyLeft for the next 4 months.
Models like ARIMA, SARIMA, exponential smoothing, etc. handle only one season or trend, which means each of these models can be trained on only a single project.
1. One solution I have is to convert this time series problem into a regression problem: create lags or windows of three months and predict the next 4 months. The problem is that the model trains on those lags or windows only; it should also give importance to the project name (I do not know how to do that).
2. The other solution would be to train a model for each project, which is not feasible in this case.
how to do this
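A rough sketch of the lag/window idea described above, in plain Python: windows from every project are pooled into one regression table, so a single model can learn from all projects at once. The helper name `make_windows` and the toy rows are made up for illustration. Project identity could be added as an extra column (for example an encoded ID), though for a brand-new project only its first months are known, so the lag values would probably carry most of the signal.

```python
from collections import defaultdict

# Toy rows in the same shape as the data above: (project, year, month, money_left).
rows = [
    ("prj1", 2024, 1, 1000), ("prj1", 2024, 2, 800),
    ("prj1", 2024, 3, 400), ("prj1", 2024, 4, 100),
    ("prj2", 2022, 3, 5000), ("prj2", 2022, 4, 3493),
    ("prj2", 2022, 5, 2000), ("prj2", 2022, 6, 1000),
]


def make_windows(rows, n_lags=3, horizon=1):
    """Turn per-project series into (features, targets, owner) samples.

    Each sample uses `n_lags` consecutive months as inputs and the
    following `horizon` months as targets. Windows never cross project
    boundaries. This is a hypothetical helper, not a library function.
    """
    series = defaultdict(list)
    # Sort by project, then year, then month so each series is in time order.
    for project, _year, _month, money in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        series[project].append(money)

    X, y, owners = [], [], []
    for project, values in series.items():
        for start in range(len(values) - n_lags - horizon + 1):
            X.append(values[start:start + n_lags])
            y.append(values[start + n_lags:start + n_lags + horizon])
            owners.append(project)
    return X, y, owners


X, y, owners = make_windows(rows, n_lags=3, horizon=1)
for feats, target, project in zip(X, y, owners):
    print(project, feats, "->", target)
```

With 12 to 18 months per project, calling `make_windows(rows, n_lags=3, horizon=4)` would give several samples per project, and `X`, `y` could then be fed to any regressor that supports multiple outputs. The `owners` list is kept so validation folds can be split by project rather than by row.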
Hi everyone,
I am trying to experiment with a Decision Tree (DT) model in scikit-learn. The task is to find the best model by trying all 64 combinations of the following parameters:
- max_depth (2 values)
- min_samples_split (2 values)
- min_samples_leaf (2 values)
- max_features (2 values)
- max_leaf_nodes (2 values)
- criterion (gini and entropy)
For each parameter combination, I need to:
- Perform 10-fold stratified cross-validation.
- For each fold, record accuracy, precision, recall, F1-score, and AUC ROC.
- Calculate the mean and standard deviation for each metric across the 10 folds.
- Repeat for all 64 parameter combinations.
- Identify the best combination based on model performance (e.g., best F1 or AUC).
- Fit the best model on the entire training set and then evaluate it on a test set.
I'm currently searching the internet for how to do this particular task. I would appreciate it if someone could help me structure the loops/code efficiently, and maybe point me to resources (links, websites, YouTube videos, or anything).
Thank you
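One possible way to structure the loops described above with scikit-learn is sketched here. The dataset is synthetic and the parameter values are placeholders (assumptions, not recommendations); only the shape of the loop is the point. `cross_validate` returns the per-fold scores for each metric, so the mean and standard deviation can be computed per combination.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import (
    ParameterGrid, StratifiedKFold, cross_validate, train_test_split,
)
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data stands in for the real dataset (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder values: 2 options for each of 6 parameters = 64 combinations.
grid = ParameterGrid({
    "max_depth": [3, 6],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", None],
    "max_leaf_nodes": [10, 30],
    "criterion": ["gini", "entropy"],
})
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = []
for params in grid:
    model = DecisionTreeClassifier(random_state=0, **params)
    # scores["test_<metric>"] holds one value per fold (10 values here).
    scores = cross_validate(model, X_train, y_train, cv=cv, scoring=metrics)
    row = {"params": params}
    for m in metrics:
        fold_scores = scores[f"test_{m}"]
        row[f"{m}_mean"] = float(np.mean(fold_scores))
        row[f"{m}_std"] = float(np.std(fold_scores))
    results.append(row)

# Pick the best combination by mean F1, refit on the full training set,
# then evaluate once on the held-out test set.
best = max(results, key=lambda r: r["f1_mean"])
final_model = DecisionTreeClassifier(random_state=0, **best["params"])
final_model.fit(X_train, y_train)
test_f1 = f1_score(y_test, final_model.predict(X_test))
print(len(results), best["params"], round(best["f1_mean"], 3), round(test_f1, 3))
```

`GridSearchCV` with a list passed to `scoring` and `refit="f1"` should give the same means and standard deviations in its `cv_results_` with less code; the explicit loop is shown because the question asks about structuring it by hand.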
How to upload notebook on kaggle from GitHub?
Sounds like an NP Hard problem. Perhaps consider a Genetic Algorithm approach to navigate your Solution Space and optimize your Model.
Hi, I have a large dataset (extracted from an NC file). I'm trying to put it into an LSTM to make a per-day prediction for 30 days for the target variable. The problem is that the dataset is too large to work with inside a Kaggle environment (the free variant). There are 12 million samples in the dataset. That being said, I don't know what I should do here to reduce the size and also train my model effectively. I've used memory mapped arrays for storage optimization (data is still too large) and batch generation for the model. Any help will be appreciated.
The public notebook link:
https://www.kaggle.com/code/kekistan0100/lstm-better
Does anyone here have an access to a paid AI video generator? I have a prompt and I want to make that video but free models are so weak.
I'm trying to extract information from a PDF that contains tables, columns, and hierarchical structures. I'm having trouble preserving the layout and structure during extraction. I'm trying to build a PDF chatbot; can anyone help me with this?
Is conversational modelling still a thing in 2025 @glossy crown ?
you pinged me how did I not get pinged??
judging by the first few lines, it is a thing in 2025 💀
fr bro ? i thought its mostly limited to conversations
yeah they say they provide better n simpler tool integration too but didnt try yet
Lemme js delete it then I don't wanna do comp++ in 2025 anymore
Those who have uploaded their project on Kaggle, even if their name is not in the top 10 competitions and in the Honorable Mention, will they not get a certificate?
hi folks. I signed up for the recent 5 day kaggle course via work, but I couldn't do it at the time. I wanted to do it this week, but my company has a block on their gmail accounts, meaning I cannot access the course content. I can access the materials via my personal gmail account but when I try execute the notebook - specifically cells that need to use pip for example, I get a connection error. A bit of reading online says that kaggle used to have an internet connection setting that is now gone. is there anything I can do to go through the course now? Thanks
Hey guys! So I am trying to learn about transformers: how they work and how they are implemented from scratch. Can you please recommend some resources where they are explained?
Looking forward to know How can I learn ML in 20 days being a Data Engineer
I cannot use the Neural Pre-Processing Python (NPPY) https://github.com/Novestars/Neural_Pre_Processing/tree/master
Need help guys
Getting this 'Failed to build surfa' error
Hello friends!
Finding datasets that hold the data i am looking for is hell
thankfully i found a properly managed website with a databank that had the info im looking for.
But is there some kind of dataset register linked to an LLM where you can just enter the frequency / type of data you are looking for (e.g. Solar radiation on the atmosphere) / NWSE area you need it for / what time period you need available, and it finds multiple datasets that match your needs ?
Hey guys how can i start python from scratch coding
Hi guys, I'm new to the data science and AI field. I've just learned ML models (regression, classification, clustering) using sklearn, then moved on to deep learning (ANN and CNN) using Keras. What do you suggest for succeeding in this field? Any roadmap?
CV and NLP are other fields too; how do I know which one is better for me? I'm also new on Kaggle, please guide me.
Hey folks, my team got an error, do you know if the we can get this file? We think this file is missing for the waveform-inversion competition?
Question...in the Data Science community when working with python or another programming language, do you stick to Jupyter notebooks or do you use an IDE along with the Jupyter Notebooks?
why it takes so long?
-# just get used to it from now
Hello! I have been trying Kaggle for the first time these past few days. Yesterday I was able to autocomplete by pressing Tab, and it's not working for me anymore. Any advice? Thanks in advance!!
Hi, I've just completed a project for my data mining class and I've been thinking of using it as something to show off on my resume and to recruiters. I have a project report documenting my findings along with a Colab/Jupyter notebook. I wanted to ask what you guys think is a good way to present my project to others. Would it just be a zip file with all of the documents, including the ipynb and HTML of the notebook? Or would you guys just set everything up on a GitHub account? I'm just curious because I think this is one of the bigger projects that I participated the most in, and I wanted to know what would be the right way to present it. I'm a business administration major at my uni, my emphasis is in data analytics, and I've never really set anything up on GitHub or used it much. Thanks!
To present your data mining project effectively for your resume and recruiters as a business administration major with a data analytics emphasis, use GitHub to showcase your work professionally. It’s the industry standard, accessible, and demonstrates technical skills.
Steps to Present on GitHub
- Create a GitHub Account: Sign up at github.com with a professional username.
- Set Up a Repository: Create a public repo (e.g., DataMiningProject). Initialize with a README.
- Upload Files:
- Jupyter Notebook (.ipynb): Clean, commented, and error-free.
- HTML Export: Download notebook as HTML for non-technical viewers.
- Project Report (PDF): Polished, with clear findings.
- Requirements.txt: List dependencies (e.g., pip freeze > requirements.txt).
- Optional: Include dataset (if small) or link to its source.
- Write a README (Markdown):
- Project title and 2-sentence overview.
- Key findings (e.g., “Improved marketing ROI by 15%”).
- Technologies (e.g., Python, Pandas).
- Instructions to run the notebook.
- Links to HTML, PDF, and dataset.
- Your contact info (LinkedIn, email).
- Clean Notebook: Add markdown explanations, remove sensitive data, ensure it runs.
- Upload: Use GitHub’s web interface to drag and drop files.
- Share: Add the repo link to your resume, LinkedIn, and applications.
Why Not a Zip File?
A zip file (with .ipynb, HTML, PDF, requirements.txt) is less professional, harder to access, and doesn’t show GitHub skills. Use it only if required.
Tips
- Emphasize business impact (e.g., cost savings) in your report and README.
- Test the repo in an incognito browser.
- Learn basic GitHub via a 30-minute tutorial.
- Post about the project on LinkedIn with the repo link.
This GitHub setup highlights your project’s value and technical skills.
Hello everyone. I hope you are doing well. Do you know when we will get our certificates for our Kaggle projects, which were submitted last month?
Hi, can anyone help me extract information from a PDF? The key challenge is maintaining document structure (particularly tables, hierarchical text, and tables with nested information) during extraction and chunking.
Hello everyone.
Now I am working on legal question-answer chat bot project.
And I have some questions.
Should I stick with Flask for now, or would starting with FastAPI or Django be better for scaling later?
And are there any open-source legal Tech projects I could use it as a reference?
No need for django u can shift to fastapi for scaling deff
got it.
Can someone help me find a dataset for my project?
must have 1000 rows after removing nulls and duplicates
must work great with 2 models from this list (KNN, SVM, Linear Regression)
and be balanced(without a need to over or under sample)
Does anyone here have experience with the Ray framework? I have an issue with running tasks on multiple workers, and I always get this error: The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
I'd appreciate any help to fix this issue
Comment, create post or committment
thx
Hi, I’m learning backprop and I do understand that when using ReLU, it is undifferentiable at x = 0 so in practice people usually set it to 0 or 1. I’m wondering what’s the difference between the two options, what’s the pros and cons?
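A small sketch of the two conventions from the question above, in plain Python. Any value in [0, 1] is a valid subgradient of ReLU at 0, so both choices are mathematically defensible. The function name `relu_grad` and its `at_zero` parameter are invented here for illustration and do not come from any library.

```python
def relu(x: float) -> float:
    """ReLU activation: max(0, x)."""
    return x if x > 0 else 0.0


def relu_grad(x: float, at_zero: float = 0.0) -> float:
    """Derivative of ReLU, with the value at x == 0 chosen by convention.

    `at_zero` may be any value in [0, 1]; 0 and 1 are the usual picks.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return at_zero


for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), relu_grad(x, at_zero=0.0), relu_grad(x, at_zero=1.0))
```

The two conventions differ only when the pre-activation is exactly 0. With `at_zero=0.0` the unit passes no gradient at that point (consistent with treating 0 as "off"), while `at_zero=1.0` lets the gradient through. In floating-point training an exact zero tends to be uncommon, so in practice the choice rarely changes results much.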
Hi what is an agentic solution to autogenerate queries for a NoSQL DB?
How do you receive the exclusive Kaggle swag?
hey , i m a beginner and trying to sumbit my frist entry
how do i fix this?
my csv looks like this
do you change it to TRUE? boolean. idk
Is there any way to automatically update particular packages in kaggle everytime i start a session? Something like an init file? or some other way around this, as most of the default packages are very old
Hi, I am unable to verify my kaggle account using persona. It failed and now says to contact support but i haven't received a response from them.
Hi, I’ve created a notebook that I’d like to share, but I don’t want to break any rules about spam or self-promotion. I’ve learned a lot from other people’s notebooks, and I found many of them through Discussions.
What’s the right way — or time or place — to share a notebook? For example, if it’s related to a competition, is it more acceptable to post it in that competition’s Discussions rather than in the general ones?
I’ve already received a warning, and now I’m honestly afraid to even mention it.
Hi, I am a user from Iran and I am interested to participate in competitions of Kaggle, the problem is that almost all competitions has the rule not being a resident of countries like Iran. What is the solution for this? If I have a teammate from other countries, is it allowable to participate in competitions?
Hello, I've created a notebook but why it didn't show up at the list of notebook?
i am finding the same issue.
I have a doubt: how does scoring work on Kaggle? I got better accuracy than in my previous submission, but my score was lower than the previous submission's.
Hi everyone,
I'm working on a product classifier for ecommerce listings, and I'm looking for advice on the best way to extract specific attributes/features from product titles, such as the number of doors in a wardrobe.
For example, I have titles like:
🟢 "BRAND X Engineered Wood 3 Door Wardrobe for Clothes, Cupboard Wooden Almirah for Bedroom, Multi Utility Wardrobe with Hanger Rod Lock and Handles,1 Year Warranty, Columbian Walnut Finish"
🔵 "BRAND X Engineered Wood 5 Door Wardrobe for Clothes, Cupboard Wooden Almirah for Bedroom, Multi Utility Wardrobe with Hanger Rod Lock and Handles,1 Year Warranty, Columbian Walnut Finish"
I need to design a logic or model that can correctly differentiate between these products based on the number of doors (in this case, 3 Door vs 5 Door).
I'm considering approaches like:
Regex-based rule extraction (e.g., extracting (\d+)\s+door)
Using a tokenizer + keyword attention model
Fine-tuning a small transformer model to extract structured attributes
Dependency parsing to associate numerals with the right product feature
Has anyone tackled a similar problem? I'd love to hear:
What worked for you?
Would you recommend a rule-based, ML-based, or hybrid approach?
How do you handle generalization to other attributes like material, color, or dimensions?
Thanks in advance! 🙏
To extract attributes like the number of doors from e-commerce product titles (e.g., "BRAND X Engineered Wood 3 Door Wardrobe"), use a hybrid approach:
- Regex Baseline: Use patterns like (\d+)\s+door(s)? to extract numerical attributes (e.g., "3" or "5" doors). Fast and reliable for consistent titles.
- Fine-Tuned Transformer: Fine-tune a model like DistilBERT to handle varied formats (e.g., "three door") and extract other attributes (material, color). Requires labeled data but generalizes well.
- Hybrid Logic: Apply regex first; use transformer for cases where regex fails or for complex attributes.
Generalization: Extend regex patterns (e.g., (wood|metal) for material, (walnut|black) for color) and fine-tune the transformer on multi-label tasks to extract multiple attributes.
Why Hybrid? Combines regex’s speed and simplicity with transformer’s robustness for diverse titles. Start with regex, then add transformer as data and complexity grow.
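The regex-first part of the hybrid approach above could be sketched like this. The function name `extract_door_count` and the small word-number table are assumptions made for the example; returning `None` is meant as the signal to hand the title to an ML model.

```python
import re
from typing import Optional

# Matches "3 Door", "5 doors", "2-door" (case-insensitive).
DOOR_PATTERN = re.compile(r"(\d+)[\s-]*doors?\b", re.IGNORECASE)

# Small fallback table for spelled-out counts such as "three door".
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(WORD_NUMBERS) + r")[\s-]+doors?\b", re.IGNORECASE
)


def extract_door_count(title: str) -> Optional[int]:
    """Return the number of doors found in a product title, or None."""
    match = DOOR_PATTERN.search(title)
    if match:
        return int(match.group(1))
    match = WORD_PATTERN.search(title)
    if match:
        return WORD_NUMBERS[match.group(1).lower()]
    return None  # no rule matched: hand off to a model-based extractor


titles = [
    "BRAND X Engineered Wood 3 Door Wardrobe for Clothes, 1 Year Warranty",
    "BRAND X Engineered Wood 5 Door Wardrobe for Clothes, 1 Year Warranty",
    "BRAND X Three Door Wardrobe, Columbian Walnut Finish",
    "BRAND X Wooden Almirah for Bedroom",
]
for t in titles:
    print(extract_door_count(t), "|", t)
```

Anchoring the number to the word "door" is what keeps "1 Year Warranty" from being picked up as a door count. The same pattern (digit or number word followed by an attribute keyword) could be reused for other countable attributes such as drawers or shelves.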
-# U can also try scikit-llm btw
thanks for the inputs @wraith sparrow
Howdy, im new to everything, coding with R and Kaggle.
I'm trying to upload my code, but im running into errors. This is the code I would run on my computer since the files are stored here. I'm assuming kaggle doesnt have access to my computer files. How would i set up code for it to read the datasets i uploaded to kaggle?
Update: Chat GPT helped me
Answer:
divvy_trips_2019_q1 <- read.csv("/kaggle/input/cyclistic-bike-share-case-study-datasets/divvy_trips_2019_q1.csv")
divvy_trips_2020_q1 <- read.csv("/kaggle/input/cyclistic-bike-share-case-study-datasets/divvy_trips_2020_q1.csv")
Hey, I'm getting an error that there isn't a submission.csv file in the DRW crypto market competition. I'm pretty sure my script is producing a submission.csv file, but whenever I try to run it in the notebook it doesn't work because it crashes in the middle saying the kernel died. Is it possible that the error is coming because my script is too CPU intensive and crashes before it creates the file? It doesn't seem to be possible to look at the logs...
Hi does anyone used Wav2Lip model before? I got some questions to ask
We have more questions than we have answers chat. Back to the drawing board
hi does anyone know why when i try to add a notebook to my collection, my collection folder name doesn't appear in the list of options?
Hi! Has anyone real experience with ML applied to corporate employee databases? I’m looking for realistic project ideas 🤔
every idea u could possibly think of its already deployed in action
Hi. I'm working on a project which is to work in rural conditions where there are connectivity issues and stuffs. I was wondering to talk to someone who has worked on edge ai to help me answer some questions
Not trying to invent anything. Just seeking business problems that can be solved realistically given it's a dataset of 500 employees.
I’m studying data science and trying to apply my learnings on the company I work for. Although my job title is not about that
Not yet. I still have some internal review with the aliens before I ship to TOI-700
Are there competitions where I don't need the face verification?
Hey everyone, I'm looking for good books on Generative AI starting from scratch. It would be great if they are available in PDF format and for free. Any recommendations?
Hi chat, I'm a new learner on Kaggle and I'm trying to make a notebook submission for the Titanic survivor prediction competition. But even though my output file is created and visible, the competition won't accept the notebook when I click "Create Submission"
any idea why? I can send screenshots if necessary
testing post
I am trying to build a simple Streamlit app for question answering on a given PDF. What approach should I use? Which open-source LLM would be suitable? What model should I use for embeddings, and should I implement RAG or another method?
#❓┊ask-a-question hello everyone, I wanted to know the difference between Gradient Descent, Maximum Likelihood Estimation (MLE), and Ordinary Least Squares (OLS) with respect to linear regression. If anyone knows of a good article on it, please tell me
#❓┊ask-a-question Hi, guys. I'm a student who is studying machine learning by myself. I recently started studying, so please bear with me if my question sounds a bit dumb.
Recently, I've been struggling with feature engineering. I've heard that feature engineering requires domain knowledge and is one of the most crucial parts of machine learning. So I was thinking: what if I just generate polynomial features (degree 2 or 3, not more, because of overfitting) and then eliminate unnecessary features using VarianceThreshold or Lasso? Do you guys think that can work well in any situation?
I want some advice for feature engineering. Thank you, senpai! (greeting from Japan)
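The idea in the question above (polynomial features, then pruning with VarianceThreshold and Lasso) could be tried out with a scikit-learn pipeline along these lines. The data is synthetic and the settings are placeholder assumptions, so this only shows the mechanics, not whether the approach works on a given problem.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic regression data with only 2 informative columns out of 5 (assumption).
X, y = make_regression(
    n_samples=200, n_features=5, n_informative=2, noise=5.0, random_state=0
)

pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # 5 -> 20 features
    VarianceThreshold(threshold=0.0),                  # drop constant columns
    StandardScaler(),                                  # Lasso is scale-sensitive
    LassoCV(cv=5, random_state=0),                     # picks alpha by cross-validation
)
pipeline.fit(X, y)

coefs = pipeline.named_steps["lassocv"].coef_
kept = int((coefs != 0).sum())
print(f"{kept} of {coefs.size} polynomial features have non-zero Lasso weights")
```

This kind of pipeline can pick up squares and pairwise interactions automatically, but it cannot create features that need domain knowledge (ratios, date parts, grouped aggregates, and so on), so it is probably better seen as a baseline than as something that works in any situation.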
#❓┊ask-a-question hello I am a student learning machine learning and I have 2 questions
Is there any good roadmap I can follow
What are some good books for ML and AI
Do you think using IntelliSense is a form of cheating yourself?
No...if you are using the suggestions to help learn
🙃
Guys, my friend needs help with a survey of students in the States:
https://buildpad.io/research/63YT5Zt
(Kindly share with your friends studying in the States.)
Hi everyone, I'm taking Intro to Machine Learning and don't understand a line of code in the underfitting and overfitting part.
I was wondering why they use the whole dataset to fit the DecisionTreeRegressor model after it's been tuned with the best tree size, instead of just the training split.
Hi I am a data scientist and machine learning engineer. I have worked on many projects on Kaggle and I am now a Kaggle Notebooks Expert. I am looking to work on real world projects. can anyone guide me on where to find them?
Because once you've chosen the best tree size, there's no need to hold out part of the data anymore. You've already decided on the model's best parameters, so now you want to give the model as much data as possible to learn from.
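To make the two steps concrete, here is a minimal sketch of "tune on a hold-out split, then refit on everything". It is not the course's code: it swaps in a toy 1-D nearest-neighbour regressor for DecisionTreeRegressor and the neighbour count `k` for tree size, and the data, seed, and candidate values are made up for illustration.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mae(train, valid, k):
    """Mean absolute error on `valid` for a model built from `train`."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)

# Made-up noisy data: y is roughly x squared.
rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(100)]
rng.shuffle(data)
train, valid = data[:75], data[75:]

# Step 1: choose the hyperparameter using only the hold-out split.
candidates = [1, 3, 5, 10, 25]
best_k = min(candidates, key=lambda k: mae(train, valid, k))

# Step 2: the choice is made, so the final model uses every labelled row.
final_train = train + valid
prediction = knn_predict(final_train, 5.05, best_k)
```

If you still need an honest score for the final model, that has to come from a separate test set that was never used for tuning or for the final fit.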
Thank you 🙏. I did some searching, and everything I found said to avoid using the whole dataset to train the model in any case.
If you are just a beginner, the YouTube videos by StatQuest (Josh Starmer) and by Luis Serrano can give you a head start. In my opinion they offer simple visualizations and short videos for understanding the subject in the shortest possible time. If you need a no-code visual workflow tool to experiment with data and various models, you can use the open-source Orange (3.8x version) along with its add-ons. It is very easy to learn, with a number of video tutorials. Other open-source tools I have experimented with are Weka and KNIME.
Dear friends, does anyone know where I can start as a data scientist intern? I have completed the course. Please give me suggestions or any guidance; your little help can be a big help for someone. Please consider it.
Do you folks have an online portfolio? What tech do you use to set it up?
do I look like an angel for any specific reason? 😛
Hello everyone! I want to start reading ML/DS papers. Where can I do that, and what kind of papers (or which specific ones) should I read first, given that I am a beginner in the ML field? 😀
hey i need some debugging help
trying to access the higgs-boson dataset with this code
data_dir = KaggleDatasets().get_gcs_path('higgs-boson')
train_files = tf.io.gfile.glob(os.path.join(data_dir, "training", "*.tfrecord"))
valid_files = tf.io.gfile.glob(os.path.join(data_dir, "validation", "*.tfrecord"))
print("Found", len(train_files), "train TFRecords")
print("Found", len(valid_files), "valid TFRecords")
however, it can't seem to access this db
output:
get_gcs_path is not required on TPU VMs which can directly use Kaggle datasets, using path: /kaggle/input/higgs-boson
Found 0 train TFRecords
Found 0 valid TFRecords
i fixed the error.
turns out the new name is higgs-boson-dataset, and there are no validation/training folders. you have to make your own split, like this:
import os
import random
import tensorflow as tf
from kaggle_datasets import KaggleDatasets
data_dir = KaggleDatasets().get_gcs_path('higgs-boson-dataset')
print(tf.io.gfile.listdir(data_dir))  # check what the dataset actually contains
all_files = sorted(tf.io.gfile.glob(os.path.join(data_dir, "*.tfrecord")))
random.shuffle(all_files)  # shuffle so the split is not tied to file order
split = int(0.8 * len(all_files))  # 80% train, 20% validation
train_files = all_files[:split]
valid_files = all_files[split:]
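One thing to watch with the split above: an unseeded shuffle gives a different train/validation split on every run. A minimal sketch of a reproducible variant follows; the helper name `split_files`, the seed, and the shard names are made up for illustration and are not part of the dataset.

```python
import random

def split_files(paths, train_fraction=0.8, seed=42):
    """Shuffle a sorted copy of `paths` with a fixed seed, then cut it in two."""
    ordered = sorted(paths)                 # fixed starting order
    random.Random(seed).shuffle(ordered)    # seeded, so repeatable
    cut = int(train_fraction * len(ordered))
    return ordered[:cut], ordered[cut:]

# Hypothetical shard names standing in for the real TFRecord paths.
files = [f"shard-{i:02d}.tfrecord" for i in range(10)]
train_files, valid_files = split_files(files)
```

Calling `split_files(files)` again returns the same two lists, which keeps validation scores comparable between runs.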
I have developed an app using Gemini API, now I wish to make it a web app and host for free on a server. Can anyone guide me on this?
I have designed the project in notebook but I want to make use of other web developing languages to build an interactive single-site website.
Hi all, this is my first post here!
I'm working on a regression problem where the target y is a function of four features: X1 and X2 are continuous, and N1 and N2 are integer "family/group" IDs. For each family (N1, N2), y is a function of X1 and X2, and y is continuous. Each unique (N1, N2) defines a unique y = f(X1, X2), but calculating y without ML is very expensive and requires a lot of supercomputer time. So, we want to use machine learning to predict y as accurately as possible, using as few training examples as possible. Ideally, we'd use just a handful of data points from some (N1, N2) families (and their X1, X2 values) and then be able to predict y for new (N1, N2) families reliably.
My current pipeline holds out all data for one (N1, N2) pair to simulate predicting for a new family, trains XGBoost on the rest, and evaluates. With lots of (N1, N2) pairs in training, results are excellent (R^2 about 0.99). But with only a handful, generalization to new pairs is poor.
What feature engineering or model strategies would help generalization to unseen families?
- Are embeddings, one-hot, or other encodings of N1/N2 better?
- Would neural nets (Bayesian, feed-forward, etc.) help vs. tree models in this case? (I'm more comfortable with trees.)
- Any best practices for uncertainty estimation?
- References to similar few-shot tabular regression problems?
Any advice or references would be greatly appreciated!!!
Hey! Is anyone participating in the Meta Kaggle Hackathon competition?
I have a question regarding llms.
I'm a bit confused about which path to choose for the long term, as both have great potential for work. On one hand, building models from scratch, including pretraining, optimization, and evaluation, offers a lot to learn. On the other hand, working with RAG, vector databases, agents, LangChain, and Hugging Face also seems promising. I want to focus deeply and excel in whichever path I choose.
Given the current job market and the opportunities all over the world, which way should I go?
Hey! @everyone I have started learning machine learning recently. If any expert in this field can guide me, it would be a great help. I want to know how and where to learn it, as there are tons of resources and I am unable to decide which one I should pursue. My end goal is to be a data scientist. (Don't mind my grammar, please.)
You should watch some videos about data science roadmaps. If you have the proper prerequisites, you can start the '100 Days of ML' playlist by CampusX and continue if you like it. Then you can move on to deep learning and so on.
I have covered all the required prerequisites... Do I have to learn the scikit-learn library or any other library beforehand, or can I learn them within the course you suggested?
Currently I am learning from Krish Naik and CS229 by Stanford.
Scikit-learn itself falls under machine learning. If you already have the other prerequisites, then you should start as soon as possible.
I tried everything I could to debug it. Is there anything else I can do? Any help would be greatly appreciated.
Is it common to get this kind of speed while adding input from a dataset (one that was created from the output of a notebook)?
The dataset is 4GB T_t
lmao
I was running a notebook so huge that it lagged my computer when I opened it. After that I couldn't run it again, and even after deleting it I couldn't create a new notebook, import a data source, run my other notebooks, or do anything else. I thought it was my fault and started to panic at 1AM; hell, I even linked my Kaggle with Discord just to find some support here. After 30 minutes of trying everything I could, I saw an official message about site-wide availability issues, so I'm not the only one experiencing this, right?
I wrote a retrieval system using uv with a lot of custom models, but it has to run offline for the code competition. I am wondering whether there are other solutions besides packaging the environment. Installing from wheels runs into a build failure for one specific package, and there are several other pitfalls.
In the end, I did this (it works, but it costs a lot of time):
- Upload all models and the file as a dataset
- Build a .venv in an Internet-enabled notebook
- Extract the .venv and run the test data in the competition submission.
Hi guys, I need help. I'm trying to train an ML model for melanoma classification. I used the ISIC dataset with 9 classes, but it is highly imbalanced. I tried data augmentation and several different ways to reduce overfitting, but I can't get an accuracy above 60%. I need at least 85% accuracy for my project, and I have little time left.
I'm using free Google Colab right now and can't use a dataset with more than 8000 samples. My RAM is 8GB. I want to try binary classification and am working on it right now. Do you have any advice? This is really important for me.
It could be a case of underrepresented classes in the training data?
In which case you might want to partition the larger classes then pair them with the underrepresented classes, or modify the loss function to add weights to particular classes
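As a rough illustration of the loss-weighting idea, here is a sketch that turns label counts into per-class weights with the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The class names and counts are invented for the example and are not taken from the ISIC dataset.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Invented, heavily imbalanced label list.
labels = ["nevus"] * 80 + ["melanoma"] * 15 + ["other"] * 5
weights = balanced_class_weights(labels)
# Rare classes get the largest weights, so their errors count more in the loss.
```

A dictionary like this, keyed by integer class index instead of name, is the kind of thing Keras accepts through the `class_weight` argument of `model.fit`.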
Is it Legal to Share Public Instagram Profile Data as a Dataset on Kaggle?
I am having a problem in Power Query.
Will anyone help me?
The problem is that my order date is in text format. There are no null or error values, but when I try to convert the column from text to date, it creates a lot of error values.
I tried to get help from ChatGPT, but it didn't work.
I am somewhat of a beginner. I have submitted to 3 or 4 competitions (like Titanic and House Prices) with a mediocre rank and accuracy. My questions are:
- What sorts of competitions should I participate in?
- How do I get better, or learn how to get better?
This will entirely depend on what field of AI/ML you want to go into. Genai, computer vision, etc. For me, I am interested in generative AI, so I generally look for competitions that had their primary focus on generative AI, like the Gemma 2 competition. To get better, just keep on learning, and stay very open to new opportunities or chances to gain more knowledge. That is how I got better, at least.
I mean basic machine learning competitions like Titanic and House Prices: ones scored with metrics such as RMSE, MAE, or MSE, on datasets where you do regression or classification based on numerical or categorical columns. I am a beginner, so I am trying to avoid time series, NLP, computer vision, neural networks, generative AI, and anything along those lines that goes deeper, until my skills can be considered at least intermediate in ordinary machine learning contests, in the sense that I know what I am doing. (Titanic and House Prices are just examples of the type of competition I am talking about; it's not limited to those.)
Here are a few questions I have
- I have finished the StatQuest machine learning playlist, learned about data preprocessing steps, know how to use pandas, matplotlib, seaborn, scikit-learn, and NumPy from various videos, and know about hyperparameter tuning. Is there anything else, theory-wise, that I would need to know to perform well in competitions?
- What differentiates a good score from a bad one? Is it knowing when and where to apply well-known preprocessing steps, which features to engineer, and which model to select, or does it require some other knowledge? Basically, do I need to learn more theory to perform better, or do I need to learn how to apply what I already know?
- How and when do I decide to specialize in a certain field of AI/ML, and how do I get information about these fields?
Hi there, I am new to Kaggle, and to AI in general. I am looking forward to learning how to code AI. I hope to use this knowledge to make my own AI models and apply them to things I do in my free time: robotics, drones, things of that nature. However, when I try to understand information from the internet, I get lost immediately. I really want to get a solid grasp of this aspect of programming. I am certified in Java, as I recently passed the Certiport IT Specialist certification for Java. I also know some Python, as I took a course on it last year in school. But I am still confused.
hi kagglers,
When you guys first enter competitions, do you have to do a lot of research to gain domain knowledge? In past competitions, the way the winners code shows that they probably had a really good understanding of the comp's domain. Do they just know this because it's their profession, or do they spend a lot of time researching?
In my particular case, I was pulled in order to do the backend for a framework. So, in this case... I would say most people have a framework in place and they create modules that apply to that particular problem.
That's my observation being here for less than 2 months.
Hey y'all, I'm Prana. New to Python. I have a background in RStudio. There is a big knowledge gap even though I have a master's. I'll give it a month or two and I should be engaging more in the rooms. For you intermediates and experts: are there any Discord groups or other platforms you'd recommend a beginner like me join? I just need people to bounce ideas back and forth with on topics such as the LLMs they've used and things they've learned, in a no-judgement zone. You can DM me.
Hey @nova rain 👋
I only have a bachelor's in physics and pure math—ended up dropping out of my master’s in pure math due to a mix of financial and personal reasons. Honestly though, degrees and titles only go so far. They're supposed to be keys, but it’s really the network and the collaborative energy that makes things move.
What helped me most was bouncing ideas around with others and building multi-agent LLM constructs together. A lot of gaps in my understanding got patched just through collective exploration—kind of like debugging reality with friends. 😄
What kinds of ideas or frameworks have been catching your attention lately?
Hi, I want to select the features that are related to risk level.
When I use chi2, the result says 10 columns are related to risk level, but when I check them with corr,
I find there is no relationship with risk level at all.
activation function? let me know detail
Hi, This generally means the worker process crashed.
Check for out of Memory
You can use Jupyter Notebooks for experimentation and communication,
but use an IDE for writing real code, testing, and maintaining larger projects.
Hello! Can anybody help me with this question?
import glob
import dask.dataframe as dd
from dask import delayed
from fastparquet import ParquetFile
files = glob.glob('/kaggle/input/hms-harmful-brain-activity-classification/train_eegs/*.parquet')
@delayed
def load_chunk(path):
    # Read one parquet file into a pandas DataFrame (runs lazily under dask)
    return ParquetFile(path).to_pandas()
df_train = dd.from_delayed([load_chunk(f) for f in files])
df_train.compute()
I was working on this code, but encountered an error:
how could i change my name?
it would be better if my displayed name is in english or at least alphabet
Has anyone done MNIST using AlexNet? I need help with it.
Hi, I am working on intent classification along with topic modelling and wanted some direction on how we can use the RSiCS dataset. Also, does anyone have experience with annotating datasets? https://www.kaggle.com/datasets/veeralakrishna/relational-strategies-in-customer-servicersics/data
Hey, I want to learn machine learning. Where can I learn it? Any suggestions?
I've been having some problems with my submission to the Prediction Interval Competition II: House Price. Even though my submission file is generated at the end, when I try to save my version it shows that there is no output. Can anyone help me figure out why?
Hello everyone,
I’m a fourth-year AI major at Damascus University.
Over the past three months, I’ve been following the AI learning path on DeepLearning.AI. I’ve already completed the first course in the specialization and am currently working through the second.
As the concepts have grown deeper and more varied, I’ve realized that I’m not yet flexible enough with coding: I often struggle to understand existing code and to write my own. This stems from the limited problem-solving experience I gained during my earlier years at university.
I’m now looking for ways to strengthen my coding knowledge and practice while I continue these courses.
Can anyone help with that?
Can someone help me go through a research paper? It's the 2023 paper on BitLinear
So I am a high school junior (11th grade) who has just started summer break. I happen to have some free time and wanted to ask how I should go about learning ML and start actually being competitive in competitions. Since most of you are professionals, I know your advice will be of great help! So what should I do, and how should I start?
You already missed INAIO this year
-# @glossy crown runner up boy dm him the groups link
Has anyone used the Microsoft Azure ML stack for building and running models? We will be adopting it and I'm curious about pitfalls and limitations we should be aware of or avoid. https://azure.microsoft.com/en-us/products/machine-learning/
So what are these sessions like?
I was thinking if there are any specific resources or books
But courses are also fine if they help me reach my goals
Contact @glossy crown
🔬 AI Hardware Research - Need Your Input!
Hey everyone! I'm doing research on AI/ML hardware accessibility in India and would love to get insights from this community.
Quick backstory: GPU prices in India are brutal, cloud costs add up fast, and there might be a gap for more affordable AI-focused solutions.
3-min survey about:
Your current setup and pain points
Budget constraints and preferences
What you'd want in an ideal AI GPU
Cloud vs local experiences
Drop your thoughts in the survey: https://forms.gle/UHGh1kzK1p9uiJSb9
I'll share the results here once I have enough responses. Your input could help identify real solutions for our community! 🙏
Feel free to share with other AI folks who might have insight
Are we allowed to vibe code /chatgpt the code for hackathon projects? Do judges typically care how the code was written?
please ping me
I am a BSCS student, and I’m starting my Final Year Project (FYP), which I need to complete within the next 1.5 years. I’m working alone on this project and would like to build something related to Artificial Intelligence or Machine Learning. Although I’m a beginner in AI/ML, I’m a fast learner and actively working to improve my skills in this area. I would be very grateful if you could suggest some suitable FYP ideas along with a brief description of each.
let me know anytime if you need my help
Ideas for my fyp
I'd just go around the web, searching for anything that interests me. I'd try to find the info that I'd personally want to know. If I were you, I'd never start a project based on a suggestion from some random person on Discord. It's a big deal, and you'd better approach it creatively and proactively. It's OK not to know, but try to work out your own system of inspiration: a documentary, a book. There are definitely tons of questions that you could answer.
That actually makes a lot of sense. I agree—starting something just because someone else suggests it can feel empty. I think I need to spend more time exploring what genuinely sparks my curiosity and build from there. Thanks for the insight!
does anybody know if for the new kaggle gemma competition I need to have the project open source? or it should only be available for the judge to review?
hellooo!!, which country are you from 😄
Hi all, after completing some ML theory and doing some toy data science projects, I applied for internships. Recently I got a callback, and they have given me a take-home assignment for a predictive modelling task. The thing is, the dataset they provided is large, too large for me: I haven't worked with a dataset this big before, and I use Google Colab. The shape of the dataset is (167020, 217).
Can anyone help me with how to approach and solve this problem? All I am asking is that you take a look at the dataset and give me step-by-step suggestions on what to do; I will do the work myself. If you are willing to help, let's connect.
Thank You
How much ram & vram does GPU T4 x2 have in total?
does GPU p100 have more?
Or tpu vm v3-8
I started learning ML, and I've already done the Titanic competition. Any recommendations for what to do next? I'm still a beginner, so I want to do an easy one.
try house price prediction
Thanks
I think it's a network issue or an environment issue
try it on jupyter notebook
is there a solution if i want to use kaggle
Is 80% accuracy for a random forest classifier good?
Hi guys, probably a stupid question, if i import a model from huggingface then can I use it offline ?
Yes you can
if it beats baseline then yes
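One simple way to read "beats baseline" for a classifier is to compare against always predicting the most common class. A small sketch, with an invented label distribution (75% of one class) and the 80% figure from the question:

```python
from collections import Counter

def majority_baseline_accuracy(y_true):
    """Accuracy of a 'model' that always predicts the most frequent label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)

# Invented labels: 75% of the rows belong to class 0.
y_true = [0] * 75 + [1] * 25
baseline = majority_baseline_accuracy(y_true)

model_accuracy = 0.80  # the accuracy being asked about
beats_baseline = model_accuracy > baseline
```

With these made-up labels the baseline is already 75%, so 80% is only a modest gain; on a balanced two-class problem the same 80% would be a much bigger improvement over 50%.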
The country where u spent past 18 years
Hey guys, I'm currently working on OCR, and I have a question.
I couldn't extract "Good Morning" or any text from the 1st image, even though I can extract text from the second image with good accuracy.
I'm currently using Tesseract. Do you know why this happens?
Can you suggest some other approaches to extract "Good Morning" from the first image?
Hey guys. I know it's 3 months late, but I wanted to finish the 5-day Gen AI Intensive course. My question is whether it is possible to gain the certificate even though I am late? because the last email from Google Event said, 'The badges and certificates will be added to your profile by the end of April 2025.'
It would mean a lot if anyone could answer.
Certificates not that important really
Look at the photo I attached to the post. There are two date-and-time columns, and I want to extract the difference between them, like 3 days 16 hours.
How can I do that in Excel Power Query?
I tried to get help from ChatGPT, but it didn't work.
Hey, I am a student and we have a class in data analytics. I don't really get how I can improve my accuracy with logistic regression, random forest, KNN, etc.
Can someone help me with that, please?
Hello.
I am having some trouble with my local experiment and would like some help.
I want to analyze the transcription of recorded conversation audio data, so could you recommend a model for that?
Considering that this involves recording conversations, I would like to identify the speakers as well.
Can you suggest a model for various experiments?
I would like to gather opinions from people who have participated in competitions or analyzed audio data that includes conversations in the past. Also, if there is any recommended external data for training, please let me know.
Are we allowed to share github or Streamlit Cloud url for example to share scripts or apps or only kaggle url are allowed ?
Hello World! I have a problem uploading a YouTube link in the gemma3n competition. I'm entering a valid URL for my YouTube video but get an "Invalid URL" error, even though the URL is correct!
In the videoUtils.js script on the website there is an error in this line:
fetch("https://youtube.com/oembed?url=".concat(url, "&format=json}")).then(function(res) {
The "}" after "json" should not be there.
Please fix it, because right now it's impossible to upload a video and take part in the competition.
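To see what the stray brace does to the request, here is a small Python sketch (not the site's JavaScript) that builds the same oEmbed URL both ways and parses the query string back. The video URL is a placeholder, and whether this is what actually triggers the "Invalid URL" message is the poster's diagnosis, not something the sketch verifies.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

OEMBED_ENDPOINT = "https://youtube.com/oembed"

def build_oembed_url(video_url):
    """Build the oEmbed request URL, letting urlencode escape the values."""
    query = urlencode({"url": video_url, "format": "json"})
    return f"{OEMBED_ENDPOINT}?{query}"

video = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder, not a real video

# Mirrors the string concatenation quoted above, including the extra "}".
buggy = OEMBED_ENDPOINT + "?url=" + video + "&format=json}"
fixed = build_oembed_url(video)

buggy_format = parse_qs(urlsplit(buggy).query)["format"]
fixed_format = parse_qs(urlsplit(fixed).query)["format"]
# The buggy URL asks for the format "json}" instead of "json".
```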
Subscribe to this channel to see the latest updates and learning videos: https://www.youtube.com/@AuraaiX/videos
Hey guys, I am seeking recommendations for an impactful AI/ML project that would strongly appeal to product-based companies when they are hiring. The goal is to maximize my chances of securing a job.
Please suggest something ASAP : )
Can Anyone explain what a JSON really is????????????
Hi, I have recently made a notebook with plotly graphs, it shows perfectly in Editor mode but it is not showing on Kaggle's notebook viewer, even when I used .show(renderer='iframe').
Notebook: https://www.kaggle.com/code/stevensio/kaggle-journeys-cohorts-and-competition-shifts
I tried to look at it on my girlfriend's macbook and it shows, but somehow, the graphs don't show when viewed on my windows laptop. Could this be an OS issue?
Would really appreciate if anyone could find the issue in my code
It's a plain-text format for structured data, similar to a dictionary or a nested list: lightweight and convenient for transporting data between programs.
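A tiny round trip with Python's built-in `json` module shows the idea: a dictionary goes out as plain text and comes back as an equal dictionary. The record contents are made up.

```python
import json

# A made-up record using the basic JSON building blocks:
# objects (dicts), arrays (lists), strings, booleans, and null (None).
record = {"name": "Ada", "skills": ["python", "sql"], "active": True, "score": None}

text = json.dumps(record)    # Python object -> JSON text (a plain str)
restored = json.loads(text)  # JSON text -> Python object again
```

Here `text` is just a string, so it can be saved to a file or sent over a network, and `restored` compares equal to the original `record`.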
Hello guys, I'm trying to
dynamically update my second dropdown (car models) based on the selected company using JavaScript and Jinja2 inside a Flask app. I've built the Jinja2 dictionary and the script looks correct, but the dropdown doesn't update when I select a company. What could be causing this issue? Am I missing something in how Jinja2 and JavaScript work together?
I have also cleaned my data and my model is ready to predict, but I'm stuck here.
I'm following a tutorial, but it's not helping; it's not actually working.
The script is given below:
<script>
function load_car_models(company_id, car_models_id)
{
    var company = document.getElementById(company_id);
    var car_model = document.getElementById(car_models_id);
    // Clear the current selection and any existing options
    car_model.value = "";
    car_model.innerHTML = "";
    {% for company in companies %}
    if (company.value == "{{ company }}")
    {
        {% for model in car_models %}
        {% if company in model %}
        var newOption = document.createElement("option");
        newOption.value = "{{ model }}";
        newOption.innerHTML = "{{ model }}";
        car_model.options.add(newOption);
        {% endif %}
        {% endfor %}
    }
    {% endfor %}
}
</script>
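An alternative to generating JavaScript branches with Jinja2 loops is to build the company-to-models mapping once on the Python side and hand it to the page as JSON. Below is a minimal sketch of the Python half only; the helper name and the sample companies and models are invented, and it reuses the same "company name appears in the model name" test as the template above.

```python
import json

def models_by_company(companies, car_models):
    """Group model names under each company whose name appears in the model name."""
    return {c: [m for m in car_models if c in m] for c in companies}

# Invented sample data standing in for the cleaned dataset's columns.
companies = ["Honda", "Maruti"]
car_models = ["Honda City", "Honda Amaze", "Maruti Swift"]

mapping = models_by_company(companies, car_models)
payload = json.dumps(mapping)  # JSON text a script could read as an object
```

In a Flask template the same mapping can be embedded with the `tojson` filter, and the dropdown's change handler then only needs to look up the selected company in that object.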
are there any spaces to discuss ml related queries as a student? like a community of students in the same field or smth?
Is bytedance seed 1.5 vl by far the best img understanding model
And for edge devices Florence 2?
when dealing with time series data, how do we know if there is serial dependence in the data or not? is it a question of using domain knowledge or should we use methods like lagging and time step each time to check this thing?
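One quick empirical check, alongside domain knowledge, is the autocorrelation of the series at a few lags: values far from zero suggest serial dependence. A minimal pure-Python sketch with two invented series:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (lag >= 1)."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance_sum / variance_sum

# Invented examples: a steady upward trend and a strictly alternating signal.
trend = [float(t) for t in range(50)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(50)]

trend_lag1 = autocorrelation(trend, 1)        # strongly positive
alt_lag1 = autocorrelation(alternating, 1)    # strongly negative
alt_lag2 = autocorrelation(alternating, 2)    # strongly positive again
```

Lag plots or a correlogram show the same thing visually; in practice people usually look at several lags rather than just one.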
Hello just wondering if there are any tips for clearing memory from RAM and the GPU? I've been running into out of memory issues using Tensorflow. I tried using a generator in an attempt to reduce RAM for one, while also trying to track and delete every dataset variable related + model variable while also doing gc.collect() and K.clear_session() however I've noticed that for whatever reason, the data I load into the model for model.fit() just sticks in the RAM. Tried deleting that generator via del, assigning it to None too, and I can't seem to get rid of the sticky reference.
I'm attempting a k-fold run, but because of this issue it looks like I'd have to restart the kernel every time instead? I also tried running each fold from a script in a subprocess, but it doesn't seem to change anything. Thanks in advance for any replies!
Hey everyone! Trying to build a recommendation system for products for a deal website.
We have a few heuristics of how good a deal it is and how popular a product is (deal score and pop score)
Our current deal gallery just optimizes for these heuristics. However, we're running into some issues with product diversity. There are certain categories of products that are discounted more frequently and thus appear in the results more frequently.
Our data is stored in postgres, but we load it into polars (in python). We also have milvus set up as a vector db so all our product titles are vectorized as well. Wondering if anyone has any ideas on how to solve this diversity issue
I'm pretty new to ML/data science but one idea was k means clustering based on the titles or vectors? Or maybe this is too naive of an approach
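Before reaching for clustering, one simple thing to try is a greedy re-rank with a per-category cap: walk the products from best score to worst and skip any whose category is already full. The sketch below is only an illustration of that idea; the field names, scores, and cap are invented, and the "category" could just as well be a k-means cluster id computed from the title vectors.

```python
from collections import defaultdict

def diversify(products, top_n, max_per_category):
    """Pick the best-scoring products while capping how many share a category."""
    picked = []
    per_category = defaultdict(int)
    for product in sorted(products, key=lambda p: p["score"], reverse=True):
        if per_category[product["category"]] < max_per_category:
            picked.append(product)
            per_category[product["category"]] += 1
        if len(picked) == top_n:
            break
    return picked

# Invented products; "score" stands in for a combined deal/popularity score.
products = [
    {"title": "TV A", "category": "electronics", "score": 0.95},
    {"title": "TV B", "category": "electronics", "score": 0.93},
    {"title": "TV C", "category": "electronics", "score": 0.90},
    {"title": "Sofa", "category": "home", "score": 0.80},
    {"title": "Boots", "category": "shoes", "score": 0.70},
]
gallery = diversify(products, top_n=4, max_per_category=2)
titles = [p["title"] for p in gallery]
```

With a cap of 2, the third electronics item is skipped and lower-scored items from other categories fill the remaining slots.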
Hi guys, it's my first time discovering this machine learning world! I have a question, though: has anyone made any money using this?
Some have made a lot. But this is no get-rich-quick scheme. It's like any profession: you can make money in football, but you have to be in the top 0.1%.
i am new to kaggle....can someone tell me ...How to get started????
hi everyone ,https://github.com/campusx-official/ML-Roadmap-for-2022/blob/main/README.md what do you think about that roadmap in 2025?
I don't necessarily aim to get rich quick; that was never the goal. I do need to work online, though, as a starter.
For real, either be in the top 0.1% in AI or have those people
Anyone interested in video gen models?
Can anyone help me out? I am working on a data science project where I am doing licence plate prediction.
I used YOLO and OCR to build the models.
I have datasets that contain cars, and also cropped images that contain only the number plate.
The thing is, the number plates are in Urdu and the OCR model isn't working properly: the numbers are being detected, but not the Arabic or Urdu text.
Try this ocr model https://huggingface.co/microsoft/Florence-2-base-ft
I want to use this model, but memory keeps crashing in both Colab and Kaggle notebooks: https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video?usage=memory
anyone can help me fix the issue here https://colab.research.google.com/drive/1wEF2RmuNUyRNInr8DrgbqFj4ni7pmrVq?usp=sharing
Anyone knows good books on text2video models
what issue?
There's probably some Torch version mismatch happening.
Check both yourself; I can't remember.
Hello everyone.
I am a beginner on Kaggle.
The problem below is from the lesson 2 exercise of "Intro to Machine Learning".
Why did this error occur, and how can I fix it?
- problem
import pandas as pd
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
step_1.check()
- error
NameError Traceback (most recent call last)
/tmp/ipykernel_36/423311291.py in <cell line: 0>()
8
9 # Call line below with no argument to check that you've loaded the data correctly
---> 10 step_1.check()
NameError: name 'step_1' is not defined
I am looking for someone to help me with how to do the exercises in the [Intro to Machine Learning] course on Kaggle.
[https://www.kaggle.com/code/wonderfulexcellent/exercise-your-first-machine-learning-model/edit]
did you setup the notebook before doing all this?
thank you for your help.
I set up the notebook by running the code that appears first at the start of the exercise.
The setup code is below:
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex2 import *
print("Setup Complete")
============
Is this right?
Thank you.
I just tried it again your way and moved forward with the practice.
nice good luck
It's good
Hey guys, I'm a full stack engineer with 6+ years of experience in frontend and backend, DevOps, databases (SQL + NoSQL), and AWS.
What's the best way to start learning AI and go for roles like AI Engineer, or to make myself capable of building AI solutions for different industries?
anyone can pls ans this https://x.com/grok/status/1948736131455222227
@noob_contrarian xAI partners with Azure for Grok's enterprise-grade scalability, security, and content safety features, essential for reliable global deployment. While RunPod is cheaper for basic GPU rentals, it lacks Azure's robust infrastructure and integration, making it unsuitable for our
thanks for your review
Learn linear regression and logistic regression first; they're the basis for any neural network. Then it's up to you whether to learn other machine learning models or jump directly to the perceptron (the basic unit of a neural network).
Then understand data preprocessing
Understand neural networks
Later, for understanding language processing, study NLP
Understand RNNs, LSTM, and GRU
After understanding all this, learn CNNs for image processing
After this you are ready to understand LLMs
Learn the transformer architecture
Learn the BERT model architecture
Learn to use LangChain and Hugging Face
Make RAG applications
Like a PDF summarizer
Caption generators
Learn to use LLM inference
After all this, learn to fine-tune models on custom datasets (requires a GPU)
Learn LoRA and PEFT fine-tuning, and quantizing models
After this, learn to use LangGraph and connect the LLM with tools like SQL data retrieval and web search
You will understand agentic AI
For more advanced topics like DeepSeek R1, you have to learn reinforcement learning
Prompt engineering for prompting the LLM
Building an LLM from scratch, or deploying fine-tuned models on Hugging Face for specific tasks
For learning all this you have 2 frameworks: one is TensorFlow, the other is PyTorch.
For LLMs, about 85% of newly published papers are written in PyTorch, so I recommend you go with PyTorch.
For web applications you can also use JavaScript, but the Python ecosystem is more mature, so going with Python gives you more advantages
Lastly, you can also learn about MCP servers to plug AI into the internet
I'm not a professional, but I hope this is helpful to you
Wow, that looks interesting. However, I was wondering: is there any free resource or video I can follow to learn these things step by step? Or could you share any tips and tricks for learning them step by step?
Simple steps
Learn regression because neural networks are related to it
Skip other ML models; in development you don't need them
Learn neural networks, activation functions, loss functions, and optimizers. These are the things we need in LLMs too, for fine-tuning or training our LLM on custom data
You can just skip RNNs or LSTMs, but they give you a starting understanding of processing natural language
And to prepare our language data for fitting a model, we need to understand how NLP works.
So NLP knowledge is important, like knowing what tokenization is, etc.
Learn the transformer model to understand how LLMs work and how they were made, then simply go on to development
Use the LangChain and Hugging Face documentation for help
Resources are: YouTube channels, Medium blogs, Hugging Face, LangChain, Udemy courses
@rancid terrace Gotcha, Here is the summary I understood:
-
With the current knowledge I have, I am ready to start learning about regressions. So basically my first step will be to learn what regression is. Nothing else for now. Just learn about regression
-
Then the second step will be to learn neural networks, activation functions, loss functions, and optimizers.
-
Then learning rnn or lstm is optional (I will definitely learn it)
-
Then I need to start learning NLP for natural language processing
-
Finally I need to learn transformer model to understand how llm works
-
Now I am ready to start development. So this is Part 2 of my roadmap, where I will get my hands dirty by working
-
Start with Langchain and hugging face documentation
Please feel free to correct me If I am wrong. Thank you brother!
Everything is fine
Best of luck
Focus mostly on development
Understand the concepts like regression; don't start building from scratch
LangChain and Hugging Face handle everything
Like NLP, neural networks, etc.
Does anyone here know how to integrate the results of a feedback survey web app directly into a model's pipelines, so it can reference the survey's data and learn from it without a middleman doing it?
Can anyone please help fix it? https://www.kaggle.com/code/ashwinbarnwal/superweights
I don't get how to log in to HF using the HF CLI in order to confirm the access granted to use HF models
you are a cool dude
Just like the book "How to Win Friends and Influence People" says
If you want to be a great leader, first learn to understand what others need and what you can provide to help them.
that is so good.
Pls stop spamming dude we don't care what Mr beast is doing ( he ain't providing us gpu for llms )
Hey would this same be applicable for a fresh college graduate
Depends , if you are going for ml or data science or genai engineering
He asked me for mostly development
I want to learn to be a ai engineer and applied knowledge for generative ai will help
I asked one of my seniors - Statistics
Linear algebra
Calculus
Pandas, NumPy, Polars
Data handling and visualization
scikit-learn, XGBoost, AdaBoost, linear regression
Support vectors, kernels, hyperplanes
SVM, random forest, ensemble methods
Bagging, boosting
Basic DL
When to use what
NLP preprocessing, trade-offs between NLTK and spaCy
Tokenization
Transformers: why they are different
Prompting
RAG: how it works
Vector DBs and indexing
Azure, AWS, GCP
Agentic AI: LangChain, LangGraph
Pipelines, memory, agent orchestration
Fine-tuning
LLMs: which, what, when, why - any 3 architectures
Projects - problem and solution
He told me to concentrate on this first
Check whether the OS path of the file is correct, and check your network connectivity
Why do you need to learn SVM, XGBoost, etc.? You will never use them in AI
They are basically used to make predictions on a given dataset
For AI engineering, learning regression in detail is sufficient; then learn deep learning with NLP (tokenization etc.), and later understand the encoder-decoder architecture, the attention mechanism, and the transformer architecture.
Learn generative AI, LangChain, LangGraph, agentic AI, MCP servers, RAG applications, chatbots, computer vision, reinforcement learning, etc.
For AI engineering go with PyTorch because it's currently the most popular framework; most of the research is in PyTorch
For production-ready, easily structured projects go with TensorFlow
🙂
Thank you for the advice , can you recommend some of the resources to learn this.
what if both have no problems?
I see thank you.
Ye why not
YouTube: Krish Naik, CampusX, Hugging Face, LangChain, OpenAI courses
Blogs / articles: GeeksforGeeks, medium.com, etc.
There are many YouTube channels that also teach something like neuro.. something
3blue1brown
Hi, I am new to Kaggle and had a question regarding the submission criteria. What does it mean that my submission has to have a runtime of less than 11 hours? Does it mean that the entire inference should be completed in less than that?
And also the no-internet rule: if my code relies on packages like nibabel or, I don't know, torch, how does the no-internet rule work?
Thank you for your help
Doesn't LMArena come with tools?
Hello everyone, I am new to Kaggle. Can anyone explain how to participate in a Kaggle hackathon?
Hi, I was excited to participate in the code-golf-2025 tournament, but I just saw that my country (Venezuela) is blocked. Will I really not be able to compete because of my nationality?
I am working on the RSNA Intracranial Aneurysm Detection competition and I have a decent graphics card on my home computer. I have been training models for the past 24 hours on the full dataset, getting these values:
[Epoch 1] Avg Loss: 1.3610 | Val Weighted AUC: 0.5800
[Epoch 2] Avg Loss: 1.3071 | Val Weighted AUC: 0.4637
[Epoch 3] Avg Loss: 1.3052 | Val Weighted AUC: 0.5383
[Epoch 4] Avg Loss: 1.3069 | Val Weighted AUC: 0.6006
[Epoch 5] Avg Loss: 1.2967 | Val Weighted AUC: 0.5206
[Epoch 6] Avg Loss: 1.3007 | Val Weighted AUC: 0.4524
[Epoch 7] Avg Loss: 1.3029 | Val Weighted AUC: 0.5068
[Epoch 8] Avg Loss: 1.3083 | Val Weighted AUC: 0.5367
I am curious to know of other ways to make my training faster. My computer isn't trying terribly hard; it's using most of its RAM, but those 8 models have taken about 24 hours to complete. Can anyone help me out?
Hi! Our team is participating in the CMI – Detect Behavior with Sensor Data competition.
We’ve enabled “Always use latest version” in our Kaggle notebook, but some teammates still see older versions when editing.
Just to clarify: Is it against the rules to use GitHub for collaborative EDA and model training only, without exposing any test data?
We’re trying to keep things safe while managing our workflow.
Appreciate any clarification — thanks!
For hyperparameter tuning, what are the go-to methods? I've been using GridSearchCV, but came across Optuna and it seems a lot better.
Hi everyone! I’ve recently started diving deeper into the world of machine learning. Before that, I was mainly into mathematics — particularly convex analysis and tensor optimization.
However, no matter how much I look around, it seems like most jobs/projects only value the ability to use existing libraries and build models like Lego blocks from pre-made components.
Do you know of any areas or projects where deeper mathematical knowledge like this is actually useful or appreciated? I’m starting to feel like it doesn’t really matter, which is a bit disheartening😔
Why don't you do research then?
Because research can take years, and there’s no guarantee it will lead to success. That’s why I’m trying to focus more on finding jobs or joining projects that can bring a more stable or predictable income.
One way is to limit the data size. Look into MIP (maximum intensity projection) and how to use it: it converts 3D volumes to 2D, which makes your data lighter. Also maybe resize the images.
That's a good plan .
Has anyone got any good resources for linear mixed models?
Good evening. I seem to have some difficulty accessing the data for the DFL - Bundesliga Data Shootout. Could I be so lucky that someone reading this could help me out? Thanks in advance.
You can get a research job; there are many ML jobs that are research positions, you know that, right? It doesn't need to be something that will take years.
Hello everyone, I am a complete beginner. I did a few ML courses and wanted to do my first project, so I chose Kaggle's House Prices - Advanced Regression Techniques dataset, but one thing in this project has been bothering me A LOT.
The problem I was having was: what is an empty cell? This seems trivial, but some columns use NA ("not applicable") as a string to represent the non-existence of something. For example, BsmtQual has a category NA for when the house doesn't have a basement. Others, like MasVnrType, have None as a category for when there isn't a MasVnr (whatever that is). And then the empty cells, the ones that aren't filled in, also use the string NA.
So what I wanted to do was keep the meaningful NAs and Nones (I say Nones, but None appears only in MasVnrType) and impute the empty cells. This all seems easy enough but caused me a whole lot of trouble. Here is what I did:
houses = pd.read_csv("data/train.csv", keep_default_na=False, na_values=[""])
I used read_csv in such a way that pandas doesn't "help me" by converting all the NAs and Nones into NaNs, because then how would I differentiate between an actual empty cell and an NA or None?
Then I split train/test and numeric/categorical, the usual stuff,
after which I did this:
meaningful_na_cols = ["Alley","BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1","BsmtFinType2","FireplaceQu","GarageType","GarageFinish","GarageQual","GarageCond","PoolQC","Fence","MiscFeature"]
for col in meaningful_na_cols:
    X_train_cat[col] = X_train_cat[col].replace("NA", "NoNoneNo")
    X_test_cat[col] = X_test_cat[col].replace("NA", "NoNoneNo")
X_train_cat['MasVnrType'] = X_train_cat['MasVnrType'].replace("None", "NoNoneNo")
X_test_cat['MasVnrType'] = X_test_cat['MasVnrType'].replace("None", "NoNoneNo")
I created a unique category so that the meaningful NAs and Nones don't get imputed.
imp_cat = SimpleImputer(strategy="most_frequent")
X_train_cat = imp_cat.fit_transform(X_train_cat)
X_test_cat = imp_cat.transform(X_test_cat)
I then imputed and converted X_train_cat back to a CSV to view it, and this happened:
My MasVnrType had 4 categories (the data description says 5, but my training data only had 4, so fine). Then I used drop-first, so there should be 3 columns. But I ended up with 4 MasVnrType columns: 3 are meaningful types like Stone, cement, or None, and one is just NA. How did that get in there?
I just need help with CSV files and empty cells. Can someone please explain how an empty cell is represented in real-life datasets and how pandas interprets it? Is it an NA string, a None string, NaN? I am so confused, and I don't know where the problem happens: is it the CSV file that's formatted this way, is it pandas, or is it me?
cheers
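A minimal sketch of how pandas treats these cases, assuming (as the post suggests) that truly missing MasVnrType cells are written in the file as the literal text NA rather than left empty. In that case `na_values=[""]` never catches them, so "NA" survives as an ordinary category. The tiny CSV below is made up for illustration; `keep_default_na` and a per-column `na_values` dict are existing `read_csv` parameters.

```python
import io
import pandas as pd

# Toy stand-in for train.csv (made-up rows, not the real data).
# BsmtQual: "NA" means "no basement" and should stay a category.
# MasVnrType: "None" is a real category, but "NA" here marks a missing value.
csv_text = "BsmtQual,MasVnrType\nNA,None\nGd,NA\nTA,Stone\n"

houses = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,                 # do not auto-convert "NA", "None", ...
    na_values={"MasVnrType": ["NA", ""]},  # only in this column does "NA" mean missing
)

print(houses["BsmtQual"].tolist())           # "NA" kept as a plain string
print(houses["MasVnrType"].isna().tolist())  # only the "NA" row is flagged missing
```

With the missing values marked per column like this, an imputer only sees NaN where a value is really absent, and the meaningful NA categories need no placeholder string.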
What does "plot_df = dataset_df.Transported.value_counts()" do?
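A small sketch of what that line produces, using a made-up five-row `dataset_df` in place of the real competition data: `value_counts()` counts how often each distinct value occurs in the column and returns the counts as a Series, most frequent first.

```python
import pandas as pd

# Made-up stand-in for dataset_df; only the Transported column matters here.
dataset_df = pd.DataFrame({"Transported": [True, False, True, True, False]})

# value_counts() returns a Series: the index holds each distinct value,
# the values hold how many rows have it, sorted by count (largest first).
plot_df = dataset_df.Transported.value_counts()
print(plot_df)
```

The name `plot_df` suggests the counts are then passed to a plotting call, for example a bar chart of how many passengers were transported versus not.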
Thanks for the support — I hope that wasn’t sarcasm 🙂
That's actually a good idea — I hadn't thought about options like that (probably because I haven't come across them 😅 ), but I think this kind of direction really does feel closer to me.
Nope it's not sarcasm
No jokes while giving advice
@eager thicket when we are submitting our code for the competition, should we keep our API, or will the judges use their own API?
Can someone help me please?
Hello!
I want to do the Kaggle exercises on my work desktop, but the company internet is sloowww and honestly just takes forever to run.
Is there a way I can run the setup code locally in a Python IDE?
Hi everyone! I have started course on Kaggle about Advanced SQL, but ran into problems when trying to complete the tutorials 🥲 This is my first time working with Kaggle notebooks and taking the course as well.
There was this error when i tried to execute the set up cell
/usr/local/lib/python3.11/dist-packages/google/cloud/bigquery/table.py:1727: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.
warnings.warn(
The rest of the tutorial cells don't seem to work as well. Could someone please help to solve this issue? 🙏 Thank you!
did refreshing the page solve the issue? what troubleshooting has been done?
Hi, I have a question about the ARC challenge: I keep getting a "submission scoring error". Am I missing something? I have changed and edited my submission multiple times but it's still the same.
No, unfortunately refreshing didn't help, and I don't know what else can I do to solve the issue
Should there be some set up done / something downloaded for code in kaggle to work?
This is a good time to earn your discussion points on Kaggle. Start a discussion on Kaggle for this issue.
Thanks for advice! I have read existing discussions, yet couldn't see anyone with the same problem. For some reason, I cannot start my own discussion (it says that I do not have a permission), do you know what might be the problem?
anyone pls help https://github.com/VideoVerses/VideoTuna getting error while inferencing
You could try contacting Kaggle support
what's everyones favourite dataset review/annotation tool? I would like to have a tool where I can import my entire dataset.jsonl containing questions and answers, be able to mark each item as correct/incorrect, and finally be able to update the answer if needed. Also, I would like to be able to share this dataset so other people can do them.
Once everything is complete, I would like to run evals directly on the dataset. And finally, run fine tuning. Does a tool like this exist? 😄
I feel that nowadays it is really hard to keep up with all the AI developments. How do you keep yourself informed with all these advancements coming daily? I personally find it very hard to keep track of all the sources (arxiv, new website, big tech blog etc). Interested to know your thoughts!
Hello everyone, can anybody guide me on how to remember the use of brackets, symbols, etc.? I have completed the variables and functions exercises on Kaggle and implemented a VaR model on a small dataset in VS Code, following a YouTube video, but I can't even write 2 lines of code on my own without mistakes or looking for help. Thanks
Totally normal! A few tips that help:
-
Use official documentation often — don’t try to memorize everything, just know where to find it.
-
Type code yourself instead of copy-pasting, even in tutorials.
-
Build small scripts like a calculator or to-do list to apply what you learn.
-
Read other people’s code to see how they structure things.
With time, the syntax will stick naturally!
Hello everyone, I recently made a basic spectrometer with a DVD's surface as the diffraction grating. I want to go beyond and make a device that captures the spectrum and makes the electromagnetic spectrum's waves RGB + other colours too in another wave plot (see SpectralWorkbench since somebody made it and it's cool) and also an AI model that would predict the light fixture type. How do I do it? Integrate all models in new hardware or use my laptop? Plus, coding problems. Could anyone please guide me through this?
anyone tried goolge's adk using uv
Hi, I'm doing a competition for house price prediction and saw in someone's notebook that whatever they did to train.csv they also did to test.csv. Like if they dropped a certain feature, they would do the same to test.csv. Is this normal?
Hello @civic seal yes that's perfectly normal. Compare it to learning for an exam yourself, if the teacher says: don't learn chapter 3, it would be unfair to put chapter 3 in the exam.
Are you also supposed to drop features on the test set? Like if you dropped 5 features on train because they're 90% missing, you would also drop those features on test? I also saw someone's notebook do this.
Hey, does anyone know how to join the BigQuery AI competition server? I'm unable to join.
yes same
Do you mean this https://www.kaggle.com/competitions/bigquery-ai-hackathon ?
What's going wrong?
Thanks for the suggestions, i will follow.🫡
So why do people do a train/val split with the "train_test_split()" function rather than just training on all the training data and then predicting on the test data?
So I was doing another Kaggle competition today and I trained the model, but when it came to predicting on the test set I got an error about how the shape was off (the features weren't the same). I realized I had to apply the same adjustments to the test dataset as I did to the train dataset, but I had to go back and manually add everything. Like if I did train_data.drop('Name', axis=1, inplace=True) I would also have to add test_data.drop('Name', axis=1, inplace=True). Is there a more efficient way of doing this?
if u dont mind give it a try once
What is it exactly? Another Gemini?
no way give it try urself its way different
yeah you could drop the features before splitting into train and test
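When train and test arrive as separate files, another common option is to put every cleaning step in one function and call it on both frames. A minimal sketch, assuming made-up columns (`Name`, `Age`, `Target`) and a placeholder fill value; the function name `preprocess` is hypothetical:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply every cleaning step in one place (illustrative steps only)."""
    df = df.copy()                    # leave the caller's frame untouched
    df = df.drop(columns=["Name"])    # the same drop now applies to train and test
    df["Age"] = df["Age"].fillna(0)   # placeholder fill; real statistics should come from train only
    return df

# Made-up stand-ins for the competition files.
train_data = pd.DataFrame({"Name": ["a", "b"], "Age": [30.0, float("nan")], "Target": [1, 0]})
test_data = pd.DataFrame({"Name": ["c"], "Age": [float("nan")]})

X_train = preprocess(train_data).drop(columns=["Target"])
X_test = preprocess(test_data)

# Both frames now have identical feature columns, so predict() sees the expected shape.
print(list(X_train.columns) == list(X_test.columns))
```

Any later change (dropping another column, a new encoding) is made once inside the function, so the two datasets cannot drift apart.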
Can anybody tell me which 100% free online certifications offered by top companies or institutions have high value for AI/ML jobs?
You should try
Elements of AI – University of Helsinki & MinnaLearn Or IBM SkillsBuild
-# anyways anyone would like to fix the broken docs page https://github.com/Ash-Blanc/paper2sw.git
Hi! I'm planning to launch a community competition and I'd like some advice on it. So far I have asked permission from the data providers, and I have a rough idea of the competition itself. Is there a guide I should follow?
it'd be a timeseries problem and I'd provide a simple data dictionary and I'd use MASE as metric
Hey guys, I'm planning to train a 123-million-parameter model themed on J.A.R.V.I.S (yes, that Jarvis from Marvel). I'm carefully selecting the data it gets trained on to get the best results, but I have a slight problem. The model will be a conversational one and it will have lots of memories (there's a database for that), from which it will need to query the user responses, put them together, summarize, and reply. I have no idea what the data format should be for that kind of thing. I was thinking JSON with Alpaca but I'm not so sure. Any advice?
hi. I am a beginner in AI.
Should I code everything with Python and Math Libraries only to understand everything clearly,
or should I use available AI libraries like PyTorch?
Thanks in advance.
It depends on your goals. To be honest people put way too big emphasis on the models. Somewhere I read that professional projects are about 10% planning, 80% data preparation, 10% model training.
Anyway, if you're interested in the math and implementation part https://www.youtube.com/watch?v=w_2vCijLiiM&list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI&index=16
my goal is to be good enough in ML to create an AGI
i'm not doing this just for fun. ML is my life.
like artificial general intelligence?
yes
yeah i've taken that course on coursera
so do you think i should begin with python and math first or should i use pytorch and tensorflow right away?
i'm doing the titanic challenge
that's my first time building a model
about AGI I don't want to discourage you, but as you'll learn more and more you'll understand how far we're from that 🙂
yeah, but just set that aside and lets get back to my question
I don't know if I'm being clear enough, but I'm not asking what to learn first, but what to do first. I've learned the basics and I'm trying to build a model, and I don't know whether I should build it with pure Python and math or with AI libraries.
That's just my opinion, but I'd first look for a smaller project.
Find a field that benefits from deeplearning, let's say agriculture. (you can check on kaggle, or get vague answers from chat bots)
Then you should narrow down the branches of machine learning needed (shallow/deep, [un]supervised, etc.). You may check what the hot research topics are at https://paperswithcode.com/
Now you have a domain/field and know what is being or can be used there. If it is agriculture, a quick search on Kaggle reveals that there are several computer vision problems.
Once you've done all this, set yourself a smaller project and check how people solve that problem: just pick a good-looking notebook, reproduce it using the documentation of the libraries it uses as pointers, and try to find the framework being followed there.
i appreciate your advice, but that's not related to my question man
actually i've got to go now, see you later, and add me on dm too
I'd probably go through a simple multi-layer perceptron model in pure Python, maybe some common layers (CNN), but I wouldn't go much deeper than that. Once you have that covered, go to GitHub and check how professionals solved the same problem.
I'm just a hobby guy, so I'd focus on practical applications after that.
thanks for the advice
Does anyone know why my submission file is getting saved with C1 and C2 as column names, even though head() on the df doesn't show that? (This is the reason my submission was wrong.) My code for saving it is this:
```
submission_df = pd.DataFrame({'PassengerId': test_data['PassengerId'], 'Transported': test_predictions})
submission_df.columns = ['PassengerId', 'Transported']
submission_df.to_csv('submission.csv', index=False)
```
I'm having trouble connecting to localhost in DBeaver.
Which error?
I can't add a screenshot here for some reason
It says:
Connection to 'localhost' cannot be established.
Reason:
Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
Tried creating a new connection and then it told me:
Connection refused: getsockopt
Guys, can someone say where to start with studying ML? Like a ROADMAP AND STUDY PLAN, like how you guys started.
@qafig ask him
What does that mean?...is that a username?
In neural-network regression, should model performance be independent of the number of output variables? Specifically, given the same dataset, will one multi-output model that predicts three targets perform the same as training three separate single-output models (one per target)? Why or why not?
Not necessarily. If the three targets are totally independent, then training one model with three outputs or three separate models could give similar results. But in most real cases the targets share some structure. A single multi-output model can pick up on those shared patterns, which sometimes helps. On the flip side, if the tasks are very different or unbalanced, training them together can actually hurt because the model has to compromise. So the answer really depends on how related the targets are and how much capacity the model has.
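One way to see the "no shared structure" end of that spectrum: for plain linear least squares there are no shared hidden parameters, so one three-output fit and three single-output fits give exactly the same weights. A small sketch on synthetic data (the shapes and random values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # 200 samples, 5 features (synthetic)
Y = rng.normal(size=(200, 3))   # 3 regression targets (synthetic)

# One "multi-output" linear model: all three targets solved together.
W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Three separate single-output models, one per target column.
W_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(3)]
)

# With no shared parameters, the two approaches coincide.
print(np.allclose(W_joint, W_separate))
```

In a neural network the hidden layers are shared between outputs, which is where the two setups start to differ: related targets can benefit from the shared features, while unrelated ones compete for them.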
Does anyone know how long emailing support@kaggle.com takes for a response? They are not responding to my attribution update request even though the kaggle website UI explicitly says to email them.
Hi ,how can I upscale video on kaggle?
do kaggle have any official mcp server
It would be good if you could share your background: stats? Maths? CS? Unless you say where you stand, 'where to start' will get less meaningful responses.
Hello everyone, I'm Hari and I'm new to Kaggle. How can I upskill on this platform?
I see in a lot of job descriptions that one needs to have hands-on experience with LLMs (train, fine-tune etc.)
Does anybody know some relevant links/documentation from which I can learn about LLMs
Also can you suggest a project that would prove the knowledge needed for a junior job?
Thanks
Sorry if this is a stupid question but how to use packages that require internet to download for offline submission?
download the wheel files and put them in a dataset and then link that dataset to your notebook, and then you can install the packages from the wheel files using pip without internet! 🙂
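The two pip steps behind that advice can be sketched as below. `pip download -d` and `pip install --no-index --find-links` are existing pip options; the helper name `offline_install_commands`, the package `nibabel`, and the dataset path `/kaggle/input/my-wheels` are only illustrative assumptions. The wheels should be downloaded on a machine whose Python version and platform match the notebook's, otherwise pip may refuse them.

```python
import sys

def offline_install_commands(package: str, wheel_dir: str) -> tuple[list[str], list[str]]:
    """Build the two pip commands (hypothetical helper, nothing is executed here)."""
    # Step 1, run WITH internet: fetch the package and its dependencies as files.
    download = [sys.executable, "-m", "pip", "download", package, "-d", wheel_dir]
    # Step 2, run WITHOUT internet: install only from the local folder.
    install = [
        sys.executable, "-m", "pip", "install",
        "--no-index", "--find-links", wheel_dir, package,
    ]
    return download, install

# Assumed example: wheels uploaded as a dataset and attached to the notebook.
download_cmd, install_cmd = offline_install_commands("nibabel", "/kaggle/input/my-wheels")
print(" ".join(download_cmd[1:]))
print(" ".join(install_cmd[1:]))
```

Each list can be passed to `subprocess.run(cmd, check=True)`, or the printed form can be typed after a `!` in a notebook cell.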
I have been trying to register for a competition for weeks, but PersonaID refuses to recognize/verify my face. I've gone back and forth with support, who just tell me that they reset it and to try again. Now it's the day of the competition and I've done the work, but I'm still not able to register for the competition and submit my work.
are spontaneous applications effective? i'm starting to look for an internship. Also i'm thinking about a good prompt to generate a cover letter for each application/company
Who here understands Docker (beginner or expert)?
I need help with dockerizing an AI project
I completed the ID verification, including face recognition, but I still cannot receive the SMS verification code.
plz, I need help
I can help, if you can tell me where you need help
I have a project that uses Ollama models
I need someone to help with making a Dockerfile that
- installs Ollama
- pulls the appropriate models
- installs Python libraries
- adds some arguments to Docker
Can anyone explain how to install and import SVD? I am getting ModuleNotFoundError: No module named 'surprise'
this is on a kaggle notebook right?
just add !pip install scikit-surprise to the top i think
the ! tells Jupyter to execute a terminal command
To install surprise (SVD) we needed to downgrade the numpy version, but there is still the same issue
AI blog generation project -
Issues -
At regeneration time, I have issues with overall quality improvement, following user instructions, alignment with the original context, factual accuracy, and true generation (new and true content, not rephrasing). Latency, cost, and scalability also need to be considered.
Data flow ---
At regeneration, I pass in the past conversation (the first time, the default data: blog generation, headings h1 h2 h2 h3 and so on, primary and secondary keywords, default prompt), the quoted blog section, and the user instructions.
Note - for the second regeneration, the past conversation in the input is (first-time default data + first regeneration data). It works like this for every regeneration.
Note -
You can answer while leaving aside latency, cost, and scalability. If you can, also give answers that take latency, cost, and scalability into account.
Approaches I have -
1. Prompt engineering approach, passing the past conversation as a reference
2. Multiple-agent approach
3. Single agent only
Questions ----
1. Which method works best to meet the overall expectations, and why?
2. If none of the above methods work, what other approach should I apply to meet my expectations?
3. Which model should I use: GPT-5 nano or GPT-4.1?
Anyone have any crazy/good idea with (ai) agents?
Claude
Hi guys, I have a question: what should I do if my dataset has a column called review that contains text or descriptions? Can I transform this column into labels or one-hot encoding, or should I drop it if I want to cluster the dataset?
-# is there any significant diff between https://github.com/Tencent/Youtu-agent and crewai?
Hi @everyone
Can somebody help me review my code? I trained a model for the first time
Sorry but your English is a bit confusing. ONLY if it is categorical, use one-hot.
Why should I tell you? What am I going to get?
Wdym
Turn on airplane mode, wait for 10s, turn it off, try again. If it still doesn't work, there's not anything you can do.
What do you mean?
Yes what do you mean
Ur gonna get one idea in return (for now)
Mine is ten billion percent better than yours. 👎
So why to share for an uncertain idea?
I want something better
How much experience do you have?
In trading ideas?
I'm doing the birdclef competition as part of a school project.
I trained my base model and got bad results, even though I took care of class imbalance I think I may have done something wrong, I would appreciate it if someone could help me
When will cohort 5 of Kaggle X start?
Hi, I need some urgent help with an ML model, anyone up?
Does anyone know a trading channel? Thanks
Does anyone know any lightweight open-source alternative to Firebase Studio?
Recently I started learning ML. For that I learned Python and am now moving on to NumPy, and I am learning matrices for the matrix calculations. Does anyone know how much maths is required for NumPy?
I have joined the Discord and linked my account. Why is the Agent of Discord still locked?
It can be because you have not linked your Discord account with Kaggle
You can link it by going to the official Kaggle website
NumPy will do matrix calculations for you. However if you want to understand how it is working then learn the mathematics. Now how much maths required depends on how much advanced operations you want to perform.
For a basic beginner level, high school level maths for matrices is enough.
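To make that concrete, here is a small sketch comparing a hand-written matrix product with NumPy's built-in `@` operator on two made-up 2x2 matrices. The loop is just the high school definition (row times column); NumPy does the same arithmetic far faster internally. The function name `matmul_by_hand` is made up for this example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

def matmul_by_hand(a, b):
    """Textbook matrix product: out[i, j] = sum over k of a[i, k] * b[k, j]."""
    rows, inner, cols = a.shape[0], a.shape[1], b.shape[1]
    out = np.zeros((rows, cols), dtype=a.dtype)
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += a[i, k] * b[k, j]
    return out

print(matmul_by_hand(A, B))
print(A @ B)  # NumPy's built-in matrix product gives the same result
```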
Hello everyone! I recently started doing the SQL course on Kaggle. When I reached the WITH ... AS part, I got a problem in the first cell, which is this:
Collecting git+https://github.com/Kaggle/learntools.git
Cloning https://github.com/Kaggle/learntools.git to /tmp/pip-req-build-_uuo8ygc
Running command git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-_uuo8ygc
fatal: unable to access 'https://github.com/Kaggle/learntools.git/': Could not resolve host: github.com
error: subprocess-exited-with-error
× git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-_uuo8ygc did not run successfully.
│ exit code: 128
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-_uuo8ygc did not run successfully.
│ exit code: 128
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
anyone could help me solve this?
thanks,i have got this Badge
Thanks for answering
But does it make sense to learn how the matrix calculations are done in NumPy? I think that would help me later to refine data and train ML models.
Hey everyone, I will be studying Cloud Computing this semester in my undergrad degree. As I am focused on ML, I wanted to know what my approach to cloud computing should be, since I want to learn it the ML way.
Would you suggest a course, a book, or any other resource? Please do tell. Thanks
For reference I would be following this text book from the curriculum:
Cloud Computing, Theory and Practice by Dan C. Marinescu, THIRD EDITION,
Morgan Kaufmann Publishers, 2022
ASK UR SENIORS
Yes it is good to know that because that will help in optimizations when required
But that is not mandatory.
Thanks for answering 🙂
Can anyone give me an idea of how deployment would work for an AI model? I know tools like Ollama/Unsloth are used to experiment with/fine-tune models, but once it's ready for deployment, is there a certain tool that's good for that? Assuming you don't want to deploy a model using Unsloth due to speed differences or something.
What are the top kinds of software engineering problems y'all have seen LLMs consistently fail at, even when given step-by-step instructions that can be followed to reach the solution? One problem I have seen consistently: if I give multiple interdependent validation points, the LLM usually fails to generate optimal results, even with multiple tries and rounds of feedback. What is yours?
I'm new here. Should I learn Python or C++ for CUDA? New to coding btw
I am facing challenges completing the python exercise of "Working with External Libraries"
Task No. 3, can anyone help me!?
May I know how to be a sponsor ?
hello guys! I was about to start the 5-day intensive course but the checkboxes don't stay checked after refreshing the site.
am I missing something or is there any other way to keep track of what I am doing?
thanks!!
hello guys! I am new to data science and and am looking for tips, advices and resources to start it out. also what is the necessary math, I also want to mention that I am looking for fundamentals only I will be using python (pandas, numpy and matplotlib for visualization) and maybe SQL. also please guys tell me if SQL is necessary or not, and do we use SQL in ML? thank you all in advance.
Hello!
SQL is definitely important for ML because data and databases form the foundation of any project.
All ML models are trained on datasets, and SQL helps in modifying data and performing feature engineering or scaling.
For math, high school–level statistics and probability are enough to get started. It’s important, however, to understand the math behind algorithms like Linear Regression, KNN, etc.
I’d suggest starting with Python. Don’t go too deep at first...focus on building logic, practicing OOP, and optionally some DSA. Then move on to SQL and practice queries regularly.
After that, learn NumPy and Pandas...the key is to practice lots of problems (Kaggle is great for this).
Next, dive into EDA, feature engineering, and data visualization. Begin with Matplotlib and later try Plotly for interactive visuals.
Once comfortable, start applying ML algorithms with NumPy and scikit-learn to learn how to train models.
Finally, work on small end-to-end ML projects. That’s when everything you’ve learned will come together. You can also refer to YouTube for project ideas.
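To make the SQL point above a bit more concrete, here's a minimal sketch of feature engineering done in SQL before the data ever reaches pandas. It uses Python's built-in sqlite3 with an in-memory database; the `orders` table and its columns are made up purely for illustration.

```python
import sqlite3

# Hypothetical toy table (not from any real dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0)],
)

# One row of features per customer: order count, total spend, average spend.
features = conn.execute(
    """
    SELECT customer_id,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total_spent,
           AVG(amount) AS avg_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
conn.close()

print(features)
```

The same GROUP BY pattern carries over to a real database; only the connection line would change.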
thank you so much!
Is there a video, forum, tutorial, or website out there that can make me understand how to code a simple/basic ML model in 25-30 mins? Something like when I'm on my bus ride to school, I can listen to a video or read the website page?
Hlo guys anyone know how to connect cloud database with jupyter notebook python using sqlalchemy
Hi, @everybody
I have one question. I'm training ML models for a 3-class classification problem where the classes have similar numbers of samples, but the predictions are skewed.
The first and second classes are predicted with low precision, though, and the third class is never predicted. What's the reason? I can't find it.
Before, when I applied reinforcement learning with the three classes assigned to three actions, one action was never selected either.
Actually, this is a prediction model for forex EUR/USD.
Can sm1 help me in integrating tpu with the kaggle notebook am facing a bit of issues
if you only use pandas numpy and matplotlib, you're not a data scientist. not even data analyst, need to have much more for over saturated entry level tech jobs
Hello, i am new to machine learning, i want to become a self taught ML engineer, so where should i start? what skills should i prefer to master? what tools do i need? help me out please
Hi! Is there more information about the upcoming Google Agents course in November? Links on kaggle website seem to be redirecting back to the course that was offered last March
How do you add a dataset to a Notebook in the current Editor? There is no right side bar, and no "Commit and Run" with settings to add it (from the documentation). I've looked in every menu option and dropdown and do not see an Add Data so I can link my Dataset training
Heads up if anyone else has this problem because a lot of online posts and ChatGPT is outdated. It's File->Add Input
No we waiting
Does Kaggle not support matplot lib graphs inline like Colab? that's a very useful feature when running to see loss values during training.
Idk man use satyrn if on mac
Hello,
Recently I have started learning ML and for that I have learned python and currently learning Numpy, but I'm a bit confused whether I should learn pandas after Numpy or not.
So can anyone help me with it.
pandas is amazing highly recommend learning it since Dataframes will come in very handy
Ok Thanks 👍
Also polars
Hi everyone..
So I have been working on Diabetic Retinopathy and have created a custom cnn with SE attention block. My model has reached the plateau and now its performance is not improving. How can I improve my model's performance. Currently it is 80% for training, 78% validation and 77% testing. I have to take it to 91% at least to conclude my research.
Any suggestions?
77% to 91% seems like too long a leap, but you can try some data augmentation techniques such as rotations, flips, etc. It would also be worth checking for class imbalance.
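For the class imbalance check, a quick sketch: count the labels and, if they're lopsided, compute inverse-frequency class weights to pass to the loss. The label list below is invented for illustration, and the weighting formula (n_samples / (n_classes * count)) is just one common heuristic, not the only option.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical severity labels: class 0 is common, class 2 is rare.
labels = [0] * 6 + [1] * 3 + [2] * 1

counts = Counter(labels)
weights = class_weights(labels)
print(counts)   # how many samples each class has
print(weights)  # rarer classes get larger weights
```

If the counts come out roughly equal, imbalance probably isn't the bottleneck and augmentation or model capacity is the better place to look.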
How can you continue training on a net if you can't write to the input? Only way I see is the download the net and optimizer each time manually then upload to the dataset via the web which is horribly inefficient.
Any suggestions are welcome.
Hello,
I am currently working on a project where I aim to combine Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNNs) to forecast air pollution levels. Since my approach will require satellite observations, I am particularly interested in using data from the TEMPO (Tropospheric Emissions: Monitoring of Pollution) mission.
I would greatly appreciate any advice
Hi everyone, I am new to Kaggle, data science and machine learning. Can someone help me with the Titanic Problem? Like some basic questions about understanding the task.
guys did any one succeed in getting a job using kaggle, as a fresher?
Did my master's project on a similar problem. Sounds like it's completely underfitting if it's only getting 80% on the data it was trained on. Consider making your model more complex, increasing learning rate, using a form of early stopping if you aren't already, etc.
Hi, the Titanic task is to predict Survived (0/1) from passenger features.
Hi, in my experience: use TEMPO L2/L3 products (NO₂, O₃, HCHO; filter by QA/clouds), regrid them to a fixed lat–lon grid, and align them with ground PM2.5/PM10 and meteorology (wind, temp, RH, PBL height).
Hi, you can’t write to /kaggle/input
Save checkpoints to /kaggle/working during training
With Keras, for example: ModelCheckpoint('/kaggle/working/ckpt.h5')
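The same save-then-resume idea as a framework-agnostic sketch, in case you're not on Keras. The helper names (`save_checkpoint`, `load_latest_checkpoint`) and the dict-shaped "state" are assumptions for illustration; on Kaggle the directory would be /kaggle/working, and a temp dir is used here only so the snippet runs anywhere.

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(state, directory, epoch):
    """Write the training state to <directory>/ckpt_<epoch>.pkl."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"ckpt_{epoch:04d}.pkl"
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load_latest_checkpoint(directory):
    """Return the newest saved state, or None if nothing was saved yet."""
    paths = sorted(Path(directory).glob("ckpt_*.pkl"))
    if not paths:
        return None
    with open(paths[-1], "rb") as f:
        return pickle.load(f)


# Stand-in for /kaggle/working so the sketch is runnable off Kaggle too.
workdir = tempfile.mkdtemp()

# Resume if a checkpoint exists, otherwise start fresh.
state = load_latest_checkpoint(workdir) or {"epoch": 0, "weights": [0.0]}
for epoch in range(state["epoch"] + 1, state["epoch"] + 4):
    # Fake "training step": bump every weight by 1.
    state = {"epoch": epoch, "weights": [w + 1.0 for w in state["weights"]]}
    save_checkpoint(state, workdir, epoch)

resumed = load_latest_checkpoint(workdir)
print(resumed)
```

With a real model you'd store the network and optimizer state instead of the toy dict, using whatever save/load calls your framework provides.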
hi not sure if this is the right place but i'm trying to save my work but it's not letting me
it keeps giving me this error: An error occurred while committing kernel: ConcurrencyViolation Sequence number must match Draft record: KernelId=92809017, ExpectedSequence=3, ActualSequence=2, AuthorUserId=24934154
means your code was changed since the last save. It's wonky like that. Best bet copy it to another text editor. Reload the notebook, paste, save version
love kaggle so much but I do miss google drive in colab lol, it was so easy to just read and write to persistent storage. Here you have to have a dataset set up, then use the kaggle cli to push your data to it if you want to keep anything from /kaggle/working.
hope this helped
thanks so much!!!
i just downloaded and uploaded it instead
Is there a working professional I can DM? I really need some guidance related to software development and data science.
anyone havng chatgpt pro here
[URGENT] Writeup submission cancelled after deadline - need help with technical issue
Hello Kaggle community,
I encountered a technical issue with my writeup submission for the BigQuery AI Hackathon and need your help.
My Situation:
- I successfully submitted my writeup at 8:20 AM (40 minutes before the 9:00 AM deadline)
- After the deadline (at 9:00:02 AM), I accidentally clicked the "Edit" button to make minor text corrections
- The system interpreted this as a submission cancellation, changing my status to "Deadline Missed"
What I Need:
- Can my original submission be restored to its pre-deadline state?
- This appears to be a technical issue with the edit button functionality after deadline
- I have complete evidence that the submission was made before the deadline
My Evidence:
- Complete GitHub repository: https://github.com/mkmlab-hq/bigquery-ai-hackathon-submission
- All project files were completed and uploaded before the deadline
- Browser history shows successful submission at 8:20 AM
- Local files with timestamps showing completion before deadline
My Project:
- Title: "BigQuery AI Hackathon: Multimodal Health Analysis"
- Complete multimodal health analysis system with BigQuery AI integration
- Team: MKM Lab
Questions for the Community:
- Has anyone else experienced this issue with the edit button after deadline?
- Is there a way to contact Kaggle support directly for this type of technical issue?
- Any advice on how to resolve this would be greatly appreciated
Technical Details:
- The edit button should either be disabled after deadline or not cancel the submission
- This seems like a UI/UX issue that could affect other participants
Thank you for any help or advice you can provide.
Best regards,
윤원민 (familyunion)
Kaggle ID: giryun288@gmail.com
Please contact kaggle support via email, there are no support staff who can help you on discord.
500m dataset .tar file. Takes 10secs to upload. And then 1.5hour to process and still going on . Is it normal? On the webpage it’s always saying estimately finish in 10secs. Which is hilarious. Thank you anyone who care about this
usually after uploading it takes 15 minutes to really hit. sounds like the website stalled after the upload. I'd close it out and reload the dataset page and see if it's listed and then try to load it from your notebook.
Thank you. I think it’s just dead or somewhat. I uploaded it as a model rather than dataset and fixed the problem
Hi, I just got a new PC setup with a 5060ti 16GB and started working my first nn training. However, I seem to have a problem with GPU utilization. At first, it nicely is at around 93-95%, but then gradually goes down drastically to around 43%, and training is slowing also drastically. It is not an overheating issue, as the temps start at 50C and then gradually go down to 33c. It is also not an OOM issue, as I have plenty of ram, more than 40gb still free during the entire training. CPU is also not a bottleneck. Any ideas what might be causing it? I am on arch linux, Driver Version: 580.82.09 , CUDA Version: 13.0.
hi, my kaggle notebook keeps hiding my variables. why is that?
Where to deploy a ai model + rag? 20gb vram needed. be nice to have on demand payment. so it doesnt burn through my bank. like I only pay when someones actually using the vram.
Try lightning ai studio
Hello, if I win a prize competition, how can I withdraw the prizes?
I checked the research paper, they only classified 3–4 behaviors and got around 0.6 F1. Here in mabe-mouse-challenge we have about 8 labels, so if we use their method, can we get a better score in mabe-mouse-b-dectection challenge?
Hello, I want to ask when the course starts. Ai
can my mac m4 run llms ? or i need a nivida gpu
u can buy dgx spark 4tb founders edition
Hello Everyone,
I have been facing a login issue since yesterday. So far I had no issues with my Kaggle account. But yesterday I cleared history in chrome and tried to login through signing in with google. However I got the message "an account already exists with an alias of this email please login using that". I'm not sure what to do here because normally I login through my google account for using Kaggle. Has anyone faced this issue before. I would greatly appreciate it if I can get any help on what I can do.
Try logging in via incognito if that doesn't work search ur inbox and check if multiple accounts are associated with the same mail if that's not issue raise ur ticket in kaggle help that might help also try resetting password
Thanks I will try that. I never had any duplicate accounts
What is the best way to run a notebook? Right now I can only do it from the editor, but the idle timer seems broken. I can't even finish a single training session without it timing out, despite constantly updating the output via tqdm updates per epoch. My training takes 1h5m per input file. Not sure what to do; it's useless if I can't even finish. Plus it says to hit cancel or continue editing and it won't disconnect, but it does: it kills the kernel and restarts automatically.
Found a workaround if it helps anyone else. Load the notebook then press F12 to get to the dev console, paste this in the console. It mimics a keypress once a minute. Voila no more timer killing off the training.
setInterval(() => {
document.dispatchEvent(new KeyboardEvent('keydown', {'key':'Shift'}));
}, 60000);
Hello! I can't seem to get rid of this problem. What do I need to do? I'm using kaggle's jupyter notebook
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 2.8.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
gensim 4.3.3 requires numpy<2.0,>=1.18.5, but you have numpy 2.2.6 which is incompatible.
gensim 4.3.3 requires scipy<1.14.0,>=1.7.0, but you have scipy 1.15.3 which is incompatible.
mkl-umath 0.1.1 requires numpy<1.27.0,>=1.26.4, but you have numpy 2.2.6 which is incompatible.
mkl-random 1.2.4 requires numpy<1.27.0,>=1.26.4, but you have numpy 2.2.6 which is incompatible.
mkl-fft 1.3.8 requires numpy<1.27.0,>=1.26.4, but you have numpy 2.2.6 which is incompatible.
numba 0.60.0 requires numpy<2.1,>=1.22, but you have numpy 2.2.6 which is incompatible.
datasets 3.6.0 requires fsspec[http]<=2025.3.0,>=2023.1.0, but you have fsspec 2025.5.1 which is incompatible.
ydata-profiling 4.16.1 requires numpy<2.2,>=1.16.0, but you have numpy 2.2.6 which is incompatible.
onnx 1.18.0 requires protobuf>=4.25.1, but you have protobuf 3.20.3 which is incompatible.
google-colab 1.0.0 requires google-auth==2.38.0, but you have google-auth 2.40.3 which is incompatible.
Some require newer versions, some require older versions, how do you even begin to fix this error?
hey
is there any way
i can use sklearn models like gradient boosting with gpu
i really need it for hyparameter tuning
???
Guys any good reccomendations for python DSA
Hi, try this !pip install --force-reinstall numpy==1.26.4 scipy==1.13.1 "fsspec<=2025.3.0" "protobuf>=4.25.1" "google-auth==2.38.0" "google-cloud-bigquery-storage>=2.30.0,<3.0.0" --break-system-packages
Hello everyone
I am trying to use an API to fetch data
I am using a loop to iterate all the pages present in the json.
I am getting either a gai error or a ConnectionReset Error
Please help
`import time

import pandas as pd
import requests

headers = {
    "accept": "application/json",
    "Authorization": "Bearer MY_READ_ACCESS_TOKEN",
}
df = pd.DataFrame()
for i in range(1, 52774):
    # Build the URL for page i of the popular-movies endpoint
    url = f"https://api.themoviedb.org/3/movie/popular?language=en-US&page={i}"
    response = requests.get(url, headers=headers)
    time.sleep(0.25)  # short pause between requests
    # Keep only the columns of interest and append them to the running frame
    df = pd.concat([df, pd.DataFrame(response.json()['results'])[['id','title','overview','release_date','popularity','vote_average','vote_count']]], ignore_index=True)
df.head()`
This data is of TMDB
Here is the error snippet
`gaierror Traceback (most recent call last)
File ~/Desktop/DS_Practice/.venv/lib/python3.12/site-packages/urllib3/connection.py:198, in HTTPConnection._new_conn(self)
197 try:
--> 198 sock = connection.create_connection(
199 (self._dns_host, self.port),
200 self.timeout,
201 source_address=self.source_address,
202 socket_options=self.socket_options,
203 )
204 except socket.gaierror as e:
File ~/Desktop/DS_Practice/.venv/lib/python3.12/site-packages/urllib3/util/connection.py:60, in create_connection(address, timeout, source_address, socket_options)
58 raise LocationParseError(f"'{host}', label empty or too long") from None
---> 60 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
61 af, socktype, proto, canonname, sa = res
File /usr/lib/python3.12/socket.py:963, in getaddrinfo(host, port, family, type, proto, flags)
962 addrlist = []
--> 963 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
964 af, socktype, proto, canonname, sa = res
gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
...
--> 677 raise ConnectionError(e, request=request)
679 except ClosedPoolError as e:
680 raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='api.themoviedb.org', port=443): Max retries exceeded with url: /3/movie/popular?language=en-US&page=1 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7825ea3d9430>: Failed to resolve 'api.themoviedb.org' ([Errno -2] Name or service not known)"))`
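Not a confirmed fix, but a sketch of one thing worth trying: the gaierror means the hostname lookup itself failed, which over tens of thousands of requests can happen transiently, so retrying with a growing delay may get past it. `call_with_retries` is a hypothetical helper, and the flaky function below only simulates a dropped connection so the snippet runs offline. It catches OSError on the assumption that both the built-in connection errors and requests' exceptions derive from it.

```python
import time


def call_with_retries(fn, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a network-style error wait (1s, 2s, 4s, ...) and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except OSError:
            if attempt == retries - 1:
                raise  # out of retries, surface the real error
            sleep(base_delay * 2 ** attempt)


# Simulated flaky endpoint: fails twice, then returns a page of results.
attempts = {"n": 0}


def flaky_fetch():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionResetError("simulated network drop")
    return {"results": [{"id": 1, "title": "example"}]}


delays = []  # record the waits instead of actually sleeping
page = call_with_retries(flaky_fetch, sleep=delays.append)

rows = []
rows.extend(page["results"])  # collect rows; build the DataFrame once at the end
print(page, delays)
```

In the real loop, `fn` would be something like `lambda: requests.get(url, headers=headers, timeout=10)`. Collecting rows in a list and calling `pd.DataFrame(rows)` once at the end also avoids re-copying the frame on every page.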
I use Google colab there you can change runtime to a GPU
Hi everyone! I'm completely new to data science, and this is my first time uploading a dataset. Could you please let me know what you think about it? This dataset will be used for an upcoming analysis
https://www.kaggle.com/datasets/meiliaa/france-business-insolvencies-19902025
Hi, @everyone
Is there anyone who joins radical ai founders' masterclass?
I didn't have an opportunity to apply for that.
Please give me the meeting urls for them.
Has Google's Kaggle course started? @everyone
The 5 days Ai Agents course is actually set to start from Nov 10, if that's the course you are asking about.
Oh thanks, I thought I was late.
Hello everyone, I started my AI/ML journey this year. Created few projects too and also participated in a "getting started" kaggle competition named "Natural Language Processing with Disaster Tweets" in which i got 189th rank. But i want to explore and learn more so how should i proceed further any suggestions/help/guidance will be appreciated
Hey everybody
I have a problem and want your help,
https://www.kaggle.com/code/nooshinpourtaleby/day-4-fine-tuning-a-custom-model/edit
in this 5 day course I don't know why i get this error for this code:
response = client.models.generate_content(
model="gemini-1.5-flash-001", contents=sample_row)
print(response.text)
ClientError Traceback (most recent call last)
Cell In[37], line 1
----> 1 response = client.models.generate_content(
2 model="gemini-1.5-flash-001", contents=sample_row)
3 print(response.text)
also I tried: "gemini-1.5-flash" but it didnt work:(
learning pytorch is not enough what else should someone learn to get master on ML and win Kaggle?
Anyone who can help me in the ML Challenge?
Hey I am trying to educate myself. Can someone explain to me what are
Gradient descent
loss function
learning rate
I am so confused rn. I just know they are used to optimize an algorithm but how
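A tiny sketch that ties the three terms together, using a made-up one-parameter problem (the best value is w = 3). The loss function measures how wrong the current w is, the gradient is the slope of that loss, gradient descent repeatedly steps against the slope, and the learning rate is the step size.

```python
def loss(w):
    """Loss function: how wrong the current guess w is (minimum at w = 3)."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss with respect to w: the slope at w."""
    return 2.0 * (w - 3.0)


learning_rate = 0.1  # step size: too large overshoots, too small crawls
w = 0.0              # initial guess

for step in range(50):
    # Gradient descent update: move w a small step downhill.
    w = w - learning_rate * gradient(w)

print(w, loss(w))  # w ends up very close to 3 and the loss very close to 0
```

Training a real model is the same loop, just with many parameters and a loss computed from the data.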
https://www.kaggle.com/datasets/dasgroup/rba-dataset/data?select=rba-dataset.csv
hello I kna need a help on smth about the dataset
im making association mining rule machine learning then I found this dataset but its too huge so my idea is I want to drop some of the rows and columns anyone can help me no idea what im doin
are you asking how to do this, or whether this is a good idea?
Hello, on the MABe Challenge, I get an error for my notebook on the submissions panel, but when I click on it and look at the logs, it shows "successfully ran". There are no errors in the logs, and it even outputs a csv. What could be the cause? I am a beginner. Thank you!
Hello
I was using Kaggle's GPU powered notebook for one of the competitions and work
But I was running in to out of memory errors and Kernel Crashes were occurring
This was due to the big embeddings I was generating on large data
And also if I set high hyperparameters for some models
I tried solving those kind of errors using
del
And
gc.collect()
But I wanted to know how I can avoid these errors
What are some of the best practices and optimization techniques to not run into these kind of issues
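One general pattern that often helps, sketched under assumptions: process the data in fixed-size batches and keep (or write to disk) only what you need from each batch, instead of holding every embedding in memory at once. `fake_embed` is a stand-in for a real embedding model, and the batch size is arbitrary.

```python
import gc


def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed(batch):
    """Stand-in for a real embedding model (assumption for illustration)."""
    return [float(len(text)) for text in batch]


texts = [f"document {i}" for i in range(10)]

totals = []
for batch in batched(texts, batch_size=4):
    embeddings = fake_embed(batch)   # only one batch of embeddings is alive
    totals.append(sum(embeddings))   # keep a small summary (or write to disk)
    del embeddings                   # drop the big object before the next batch

gc.collect()
print(totals)
```

Other usual suspects, depending on the setup: a smaller batch size, lower-precision dtypes, and saving intermediate results to /kaggle/working rather than keeping them in variables.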
have u tried running the version 2 of that notebook
yeah I did, but version 1 is the one that should be working apparently. But I thought the notebook explanation was supposed to work too
even when I tried running version 1, it's showing similar errors
Hi! I submitted my solution on Kaggle, it finished successfully but the scoring is stuck on "running". Has anyone experienced this?
for me it took 1day
ok thx 🙂
I'm a beginner trying to save my model to a pickle file in PyCharm. However, when I call pickle.dump, the code runs successfully but model.pkl is not created in the folder. I used the proper file modes, yet the result was the same. Any suggestions?
Edit : this problem is now solved with assistance of @teal smelt
Hey everyone! 👋
I hope you’re all doing well. I’m currently preparing for a career in FinTech, and I’m looking to connect with professionals or learners who are already working or have experience in this domain.
I’d really appreciate some guidance on how to make myself a strong candidate for FinTech roles — including:
The essential skill sets (technical + financial)
Recommended certifications or learning paths
Important tools or technologies used in the industry
And a few project ideas that can help build a solid FinTech-focused portfolio
If anyone here is from a FinTech company or has relevant experience, I’d be very grateful for your advice or even a quick chat. 🙏
You can also connect with me on LinkedIn: www.linkedin.com/in/abhay-singh-1694b221b
Thank you so much in advance for your time and support!
— Abhay
Hoi, share the code please.
Something might be wrong with the way you're saving your file or specifying the correct file path.
Need the code for the full hunting :)
(By code I mean the specific pickle.dump() call and the function where you're opening/closing your file and specifying the file path: open(), f.close().)
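For reference, a minimal sketch of the save/load pattern being asked about. The dict is only a stand-in for a trained model and the temp directory is there so it runs anywhere. Printing the resolved path is a cheap way to see where the file actually landed, since a relative path is resolved against the process's working directory, which may not be the folder you're looking at.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model (assumption for illustration).
model = {"coef": [0.5, -1.2], "intercept": 0.1}

# Explicit output path; swap in your own project folder.
out_path = Path(tempfile.mkdtemp()) / "model.pkl"

# "wb" = write binary; the with-block closes the file even if dump fails.
with open(out_path, "wb") as f:
    pickle.dump(model, f)

print(out_path.resolve(), out_path.exists())

# Read it back to confirm the file is complete.
with open(out_path, "rb") as f:
    restored = pickle.load(f)
```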
Still need assistance??
Hello thanks for replying back , can I give u the GitHub link of that project in dm?
Yup, why not :)
(Problem fixed)
hi
Yes
I'll share the code if required
But I am talking about a general case
Since this has occurred to me many different times
Idk, as far as I know something might be wrong with your code... If it's general I don't know what's causing it maybe dm me the code let's see if we can find the solution...
Hello, Do you know How much time it takes for a competition to publish private leaderboard after the competition close ?
https://www.kaggle.com/competitions/grand-xray-slam-division-b/leaderboard
A day minimum
Thx!
The model I've created has performed well and in a similar manner on test and validation sets. But it performs ~2.5x poorly on competition data. What's the deal?
..
Hii
Hlw
Hello guys! I'm working on creating a forex trading bot that helps me out in understanding signals and placing trades per time depending on the patterns observed. Please suggest great YouTube channels and books that can serve this purpose. @teal smelt
Not my expertise 😅,
Gimme an hour or two for some research but yeah i might not be able to help you fully, but I'll try 🐣
Okay so shit dump, here goes nothing.
So you would first need expertise in whatever language you're planning to use (I highly recommend Python, or Rust. Due to their fast and easy to code environment. And plus these languages are highly appreciated for this specific field of data analysis and ai ml so yeah.. )
Then first before starting anything go read "Ernest P. Chan", people really praise that guy over twitter(X) and reddit for his books titled "Quantitative Trading" and "Algorithmic Trading".
(Check these before starting : https://www.reddit.com/r/algotrading/comments/gily37/ernest_p_chan_books_quantitative_trading/
https://www.reddit.com/r/algotrading/comments/15gz5dn/do_ernest_chans_mean_reversion_strategies_for/ )
NOTE THAT THESE BOOKS I SUGGEST TO LAY DOWN YOUR FOUNDATION WITH ALGOS AND TRADING, YOU'LL HAVE TO PRACTICE A LOT BEFORE STARTING FULL
For more algorithms check out "16 proven algorithm forex trading", "forex trading: theory and practice trady"
Now, we have quite the bookish knowledge (not enough, you still gotta explore articles. I suggest joining discord or reddits for algorithms), anyways:
Now you would need APIs, and pre build ais to understand how they work, for these i suggest you to pull up to huggingface.co and kaggle and start searching for keywords relating to "trading ai", "trading algo", "trading blah blah"; research the APIs and decide which one suits you and study the code or just use it directly.
Analyse pre available data from kaggle.
And there's this guy called "Moon Dev" on YouTube, search for him has a playlist, and couldn't find other better channels :(
And search for more on youtube and reddit and google :(
Amd yeah ask chatgpt anytime you want if you need more resources i could only find these, you gotta do a lot of research tho as this is a not-so-famous topic :(
This is comprehensive enough. Thank you so much.
Hoi, please don't rely on my stuff fully, research and research like a focused horse...
Well said, my chief
I'm not your chief bro please 😭
You can use "friend" or "bro" or "🐣"
anyone completed the Mathematics for Machine Learning specialization by Deeplearning.AI?
because im stuck in a concept i can't seem to understand in the linear algebra course
Bro just share the concept name or whatever problem you're facing in a paragraph, someone will read it and help you right away :) (it's not good to ask for help like this, just elaborate the issue)
I'll try if i relate with the topic hehe T_T
O ty I had an issue mainly with the row echelon implementation in python in assignment 2 like the backtracking section mainly but i think I semi get it but not that much lol
Like I solved it but I got a bit confused on how some values were defined etc
But in summary it's basically the row echelon form
I understand it concept wise but I'm having trouble implementing it in python
Like converting the math steps directly into python
NOTE: I have assumed you're not solving for the "variables", thus no backtracking is implemented. If you don't understand what I'm talking about.. just ignore this shitty NOTE.
Okay so like first let's see what the concept means mathematically i guess,
So first let's say ummm imagine a matrix?
```
a11, a12, ..., a1n
a21, a22, ..., a2n
...
am1, am2, ..., amn
```
So this row echelon form thing basically needs 4-5 steps, and you just keep looping them as per your needs.
Pivot: a non-zero value; we'll eliminate the stuff below it.
### 1: We make the code find a "pivot" in each of the available columns.
### 2: Swap rows if needed so the pivot doesn't come out to be zero.
### 3: Normalize the pivot row (divide it by the pivot), to of course make pivot = 1.
### 4: Kill/eliminate all entries below that pivot, making them zero.
### 5: Run the same algorithm on the next element down and to the right. And so on.
(Basically we're running this along the diagonal okay, a11, a22.... So on... The diagonal entries become 1 and the other values are adjusted so the overall system stays the same)
Like imagine a matrix
```py
A = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])
```
Its echelon form is:
[[ 1.   0.5 -0.5  4. ]   # here first a11 was made to be 1, i.e. [2÷2=1], and then all other values in the row were also divided by 2.
 [ 0.   1.   1.   2. ]   # note here we first had to transform the row with (R2 = R2 + 3R1) to make the value below the pivot (a21) zero, then divide by the new pivot 0.5.
 [ 0.   0.   1.  -1. ]]  # guess the transformations to get this.
hi im a maths masters student at cambridge specialising in stats. My career aspiration is to become a quant researcher at a top quant firm. One of the things that looks super interesting and was recommended to me was to do kaggle since it is from what ive heard full of data to analyse and make predictions from. My stats knowledge is good but i am down to learn more relevant stuff to do with machine learning or whatever experts here think is relevant. I know python syntax well but dont know data libraries like numpy, pandas, matplotlib at all. Can anyone give any advice on how I could go from this position to high level in kaggle? Like can anyone recommend me any courses, books, approaches to improve as quickly as possible? ty
hello i am new here
Python Implementation:
Now let's move to the Python Code best paart hehehehehe
import numpy as np

def rowEchelon(X):
    X = X.astype(float)  # we won't be able to divide properly without floating in the air 😴 😴
    rows, cols = X.shape
    pivotRow = 0  # the row where our python snake is.
    for col in range(cols):
        if pivotRow >= rows:  # last row done right?
            break  # stop :)
        # Now since we're working on matrix, this is the start of everything.
        pivot = None
        for r in range(pivotRow, rows):
            if X[r, col] != 0:  # finds if the element in current row is our pivot, or in the row below it.
                pivot = r
                break
        if pivot is None:
            continue  # no pivot in this column, so move on to the next column
        # Pivot finding done, let's move to swapping. (Note we always swap rows with our pivot row)
        if pivot != pivotRow:  # make sure you're not swapping the exact same row with itself (the pivotRow with itself)
            X[[pivotRow, pivot]] = X[[pivot, pivotRow]]
        # Normalize the pivot row by dividing it by the pivot value
        pivotValue = X[pivotRow, col]
        X[pivotRow] = X[pivotRow] / pivotValue
        # Now our concept tells us to eliminate the below stuff right?
        for r in range(pivotRow + 1, rows):
            factor = X[r, col]
            X[r] = X[r] - factor * X[pivotRow]  # transformation of rows basically
        pivotRow += 1
    return X

# we can use our example matrix
X = np.array([
    [2, 1, -1, 8],
    [-3, -1, 2, -11],
    [-2, 1, 2, -3]
])
matrix = rowEchelon(X)
print(matrix)
We'll get the echelon form of the example array from above (NumPy may print the zeros in the last row as -0., which is the same value):
[[ 1.   0.5 -0.5  4. ]
 [ 0.   1.   1.   2. ]
 [ 0.   0.   1.  -1. ]]
See the explanation and code I've sent. I've kept the steps/order of theory and code the same so that you don't get lost while understanding, and also please ping me if you have doubts about any variables or the function; I've tried my best to explain through comments. And also I wrote the code in discord chat, so yeah, the indentation might be wrong in some places, just fix that... That's it
Idk about books, check out the courses from edureka and freecodecamp on YouTube about "machine learning, ai/ml".. they'll teach the basics of how to analyse data like the titanic one if I remember correctly, and they also teach the basic algorithms you would need..
Then you can move further and watch kaggle's own tutorials, just make sure you've completed the above step and understood the "requirements(python, aiml basics, numpy, pandas, matplotlib, kaggle notebook)" first.
Also it'll take quite a while to be comfortable with this technologies if you've just started this so please don't quit just keep learning...
And finally read articles, discuss your doubts, and solve other people's doubts and start kaggle 🐣
(And idk if this should be enough for high level on kaggle, you'll need to spend time and brain on kaggle competitions for it and I'm not much familiar with competitions; so no shitty advice about that 😅)
I have a question. I have only very basic knowledge of AI, but I've been working with Python and its libraries as a data analyst for a few months. Is it possible for me to compete in the CAFA 6 Protein Function Prediction competition? Its deadline is January 26, 2026, so I have around 2 months. Is it possible? Kindly guide me
I have a question, why we use this program of code when we write it easily with just an array
Sorry, i don't get what you're trying to say brother??
A = np.array([
[2, 1, -1, 8],
[-3, -1, 2, -11],
[-2, 1, 2, -3]
])
i mean when we write it as in upper context then why write the matrix in long as in following code
import numpy as np
def rowEchelon(X):
    X = X.astype(float)  # we won't be able to divide properly without floating in the air 😴 😴
    rows, cols = X.shape
    pivotRow = 0  # the row where our python snake is.
    for col in range(cols):
        if pivotRow >= rows:  # last row done right?
            break  # stop 🙂
        # Now since we're working on matrix, this is the start of everything.
        pivot = None
        for r in range(pivotRow, rows):
            if X[r, col] != 0:  # finds if the element in current row is our pivot, or in the row below it.
                pivot = r
                break
        if pivot is None:
            continue  # basically no pivot was found in this column, so move on to the next column
        # Pivot finding done, let's move to swapping. (Note we always swap rows with our pivot row)
        if pivot != pivotRow:  # make sure you're not swapping the exact same row with itself (the pivotRow with itself)
            X[[pivotRow, pivot]] = X[[pivot, pivotRow]]
        # Normalize current row by dividing it by the pivot value
        pivotValue = X[pivotRow, col]
        X[pivotRow] = X[pivotRow] / pivotValue
        # Now our concept tells us to eliminate the below stuff right?
        for r in range(pivotRow + 1, rows):
            factor = X[r, col]
            X[r] = X[r] - factor * X[pivotRow]  # transformation of rows basically
        pivotRow += 1
    return X
we can use our example matrix
X = np.array([
[2, 1, -1, 8],
[-3, -1, 2, -11],
[-2, 1, 2, -3]
])
matrix = rowEchelon(X)
print(matrix)
my english is not so good i hope you understand
It's fine, i understand you
Okay i get it.
Look at the matrix we had, its values are quite big and distinct right?
A = np.array([
[2, 1, -1, 8],
[-3, -1, 2, -11],
[-2, 1, 2, -3]
])
All that code (the whole rowEchelon function) is there to convert the A matrix above into a simpler row echelon form:
[[ 1.   0.5 -0.5  4. ]
 [ 0.   1.   1.   2. ]
 [ 0.   0.   1.  -1. ]]
Q. Why do we need a row echelon form ? 🤔
Bro, i just gave a very very simplified example; in reality, row echelon form is used to solve big systems of linear equations
Here's an example:
Say we have linear equations:
2x + y - z = 8
-3x - y + 2z = -11
-2x + y + 2z = -3
We can represent this set of equations as a matrix:
[[2, 1, -1, 8],
[-3, -1, 2, -11],
[-2, 1, 2, -3]]
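If the coefficients and the right-hand sides are stored separately, the same matrix can be put together like this. Just a small sketch; the names `coeffs`, `rhs` and `augmented` are made up for illustration and are not from the code above:

```python
import numpy as np

# Hypothetical names: coeffs holds the x, y, z coefficients of each equation,
# rhs holds the numbers on the right of the "=" signs.
coeffs = np.array([
    [2, 1, -1],
    [-3, -1, 2],
    [-2, 1, 2],
])
rhs = np.array([8, -11, -3])

# Append rhs as the last column to get the augmented matrix [coeffs | rhs].
augmented = np.column_stack([coeffs, rhs])
print(augmented)
```

This gives the same 3x4 array as the one written out by hand.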
And we give this array to our code as A (or X in the python code) and get this array back as the row echelon form:
1, 0.5, -0.5, 4
0, 1, 1, 2
0, 0, 1, -1
Notice how simplified our matrix is? Imagine how it would look in the form of equations now!!
Here's how it would look:
x + 0.5y - 0.5z = 4
y + z = 2
z = -1
Notice how simple these equations are to handle now: z is known right away, then y + z = 2 gives y = 3, and the first equation gives x = 2. So much easier to work with than what we had at the start :)
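Reading the values off from the bottom equation upward is usually called back substitution. Here's a rough sketch of it, assuming the matrix is already in the row echelon form shown above (1s on the diagonal, last column is the right-hand side). `back_substitute` and `R` are names I made up for this example:

```python
import numpy as np

def back_substitute(R):
    """Solve a row echelon augmented matrix that has 1s on its diagonal."""
    n = R.shape[0]
    solution = np.zeros(n)
    # Start from the last row (one unknown) and work upward.
    for i in range(n - 1, -1, -1):
        # Right-hand side minus the contribution of the unknowns already solved.
        solution[i] = R[i, -1] - R[i, i + 1:n] @ solution[i + 1:n]
    return solution

# The row echelon form of the example system.
R = np.array([
    [1, 0.5, -0.5, 4],
    [0, 1, 1, 2],
    [0, 0, 1, -1],
], dtype=float)

x, y, z = back_substitute(R)
print(x, y, z)
```

For this system it works out to x = 2, y = 3, z = -1, which you can plug back into the three original equations to check.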
@twin plover i tried my best to explain please see this...
thanks
okay tysm
are the kaggle tutorials acc that good or am i looking at the wrong spot? coz from what i remember seeing they are quite short and not that in depth
Hey guys
They're great but Not beginners friendly ig, for me at least.
I had to go through freecodecamp and edureka on YouTube first.
They just lay out an outline of the topic; you're the explorer, and they expect you to have some prior knowledge..
are jobs in ML still in demand?
Of course, they always will be
when i search for internships or jobs in ML there are very few results compared to full stack development
People are taking up the field because it's a trend, i don't think all of them are interested or mentally invested in it.. at least some people I know just took it for the trend and are struggling now..
Your actual competition is less than you imagine but yeah you gotta stand out :)
i started learning ML, i did regression and classification and understood them, but when i see job listings i get demotivated
plus, should i rush towards deep learning?
Nope, ML has many fields right? Like building AI Agents, data scientists, data analysis, chatbot, AI and more..
It's just so diverse that you look only at a small part of it, i believe there are more jobs just under different tags
And ML tends to pay more than full stack, so the openings are a bit rarer; it's harder to get into, hence the big paycheck
how much did you complete?
Job listings shouldn't be your motivation brother... Your interest in the field is what you need..
I'm somewhat a beginner only as of now for AIML :(, I'm mainly into full stack -
ohk
you're 16 and already way ahead of other people damn
when i was 16 i was playing pubg
My phone is too trash to run it
when i was 16 i was playing sololearn