#❓┊ask-a-question
so i need to upload my model file into the model_dir folder in the w-okada files
as u see theres 3-2-1-0
I'm not on my laptop now, but can you try uploading the model on GitHub and then cloning repo in kaggle notebook
Or can you try downloading it from Google Drive, using the wget or curl command?
well it would be nice
but i dont know how to use
wget and curl commands
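For anyone who would rather not use wget or curl, here is a minimal Python sketch that does the same job from a notebook cell. The function name `download_file` and the example URL and destination path are hypothetical. It assumes the link is a direct-download URL; Google Drive share links usually are not, so a helper tool such as gdown may be needed for those.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str) -> Path:
    """Stream the response body of `url` into the file `dest`."""
    dest_path = Path(dest)
    # Create the target folder if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Hypothetical usage (URL and path are placeholders, not real locations):
# download_file("https://example.com/my_model.pth",
#               "/kaggle/working/model_dir/my_model.pth")
```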
yea i cant upload on app
i got some error

4 hour wasted for nothing
trying to fix these things :D
nah bro kaggle is idk
well i think i'll pay for Colab
Haha , yes that will be a better option
More storage as well
no need storage tbh
i wanna just use voice changer via ngrok
i need gpu for that
im using paperspace for training models
but Paperspace doesn't support ngrok
So you just want to host your model then ?
if it it would be perfect
i have a voice model i trained for w-okada voice changer. i need a gpu notebook that i can use w-okada through ngrok to use it.
You want it run 24x7
To use this model, I need to load it into the w-okada voice changer files that I have installed on the notebook. You can normally load it directly through w-okada, but when using the voice changer through ngrok that is difficult and it gives a warning. So I need to put my model directly into the w-okada files in the notebook, which means moving my .pth model file into that folder.
You can check huggingface service as well
didnt try it before
when i try, same things happen i guess
cuz i dont know how to do on huggingface
like in the kaggle
whats this even i dont get it
on Colab i didn't get these kinds of errors
but Colab costs so much money so
Yes , its more user friendly
Let me check huggingface services if they support what you want
Kaggle? How much do they cost
30 hours per week is free
so
idk they have paid service
30 hours per week is 5000x better than collab
cuz Colab gives
1 hour free per account ::D
No I believe they give 3 hours daily
But it can disconnect anytime,
i just need to move this in model_dir file here
I will check, how we can do it,
On my way home
You can upload your model on huggingface
@storm moon did you try this
yea but after
you can download model from there
you have it on kaggle notebook ? or locally ?
ok , so you need to move your pth file from one folder to another , right ?
YEA
but
this one output
move from input to output ?
!mv /kaggle/working/source_folder/model.pth /kaggle/working/destination_folder/
lets try
change the model input path and the output folder path there
the first path should be the model's .pth file and the second path should be the folder you want to move the .pth file to
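The same move can be done from Python, which also takes care of creating the destination folder first. This is only a sketch: `move_model` is a made-up helper name and the paths in the usage comment are placeholders to replace with the real model file and the real model_dir slot folder.

```python
import shutil
from pathlib import Path


def move_model(src_file: str, dest_dir: str) -> Path:
    """Move `src_file` into `dest_dir`, creating the folder if needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # like `mkdir -p dest_dir`
    target = dest / Path(src_file).name
    shutil.move(src_file, str(target))       # like `mv src_file dest_dir/`
    return target


# Hypothetical usage inside a Kaggle notebook:
# move_model("/kaggle/working/source_folder/model.pth",
#            "/kaggle/working/destination_folder/new_folder_name")
```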
is it possible to create a file
at output
yes
folder
okey can u give me code line creating folder in model_dir
but
i closed already
mkdir destination_folder/new_folder_name
closed what
kaggle
then you will lose model , won't you ?
or did you download model locally or saved it on kaggle
model is already exist in my pc
i trained 1 month ago
tbh i dont want cuz i dont want to public these my models
as private model , so only you can access it
It doesn't mean much cuz i dont need -*-
it will just ease out your process , but if you don't want then its ok : )
i already uploaded private
as u see
anyway its not working w-okada on kaggle
idk why
laggy or something
i think ill continue on collab
what is the problem ?
ohh, sorry i didn't know
I'm done dealing with this :D but thanks for asking
it been 6-7 hours
still trying to do on kaggle
waste of time
just tell me the problem so i will try to see in my free time
don't need to waste your time
just run this
and use w-okada program
normally
if u do this will be good
ok
if someone can the you should also do it
but i ran this notebook before
when you first asked help
and it did work for me, as in the screenshot i sent you above
you mean you have to specify the model path here? which one of the below is used to specify the model path?
--content_vec_500 pretrain/checkpoint_best_legacy_500.pt \
--content_vec_500_onnx pretrain/content_vec_500.onnx \
--content_vec_500_onnx_on true \
--hubert_base pretrain/hubert_base.pt \
--hubert_base_jp pretrain/rinna_hubert_base_jp.pt \
--hubert_soft pretrain/hubert/hubert-soft-0d54a1f4.pt \
--nsf_hifigan pretrain/nsf_hifigan/model \
--crepe_onnx_full pretrain/crepe_onnx_full.onnx \
--crepe_onnx_tiny pretrain/crepe_onnx_tiny.onnx \
--rmvpe pretrain/rmvpe.pt \
you said you had model on kaggle here ,
then what is the problem ? could you be more clear
click this blue text
open w okadar
okada
up there ull see edit button
click there
also theres edit button too choose blank one edit button
choose ur pth file
upload if u can
ok , so i'm getting this error as well
you won't get this error on google colab then ?
i can direct upload my model to in model_dir folder
so
cuz these files in drive google i can access easily
Hi, I am trying to pass speech signals to my model by extracting features using Non negative matrix factorization. but unable to find the correct results. Can anyone guide me?
show me your colab file structure , i will try to do the same in kaggle.
it should be done in both
i don't want notebook
i want to know where and how did you locate model in that directory?
pth model ,
!mv source_path dest_path should be able to move the model to a specific directory
can anyone tell me how to modify LLM layers (hugging face text generation models) and how to add custom heads to them?
Hey guys. I want to rank up on Kaggle. In order to become "Kaggle Expert", do I have to become "expert" in every category (competitions, datasets, notebooks, discussions)? Or just becoming "expert" in one of those categories is enough?
what do you want to do ?
you can create new layer and pass the input from base model to it ,
that's how LM head is added to base model , for generation purpose
so far this is what I've done
- I get the model using AutoModel.from_pretrained
- then I passed in tokenized text
- I get the outputs of the model using .last_hidden_state
but the last hidden state has a shape of [batch size, Sequence length, Embedding size]
which is not constant... it changes from one sentence to another
I saw some people get the output using out_ids.last_hidden_state[:, 0, :] but that only takes the embedding of the first token...
I want to take outputs from the LLM model and feed it into a custom pytorch model
but I'm having trouble dealing with the last hidden state... i don't know how to work with it....
so thought I might modify it or change it...
ok so which model are you using
yes model output will change according to your tokens
so if I want to pass it to a fully connected network
am I supposed to change input dimension of fully connected network every time i pass a new sentence?
that's not logical, it's like instantiating a new model every sentence passed
example:
first sentence passed -> last_hidden_state.shape = [1, 59, 3072]
which means first layer of fc_model= torch.nn.Linear(59*3072, 256)
second sentence passed -> last_hidden_state.shape = [1, 92, 3072]
which means first layer of fc_model= torch.nn.Linear(92*3072, 256)
you get what I'm saying?
you can add paddings to get constant length outputs
that's what everyone does when training with batches
ok for example padding=128 and it will be last_hidden_state.shape = [1, 128, 3072]
input to torch.nn.Linear(128*3072, 256)
is that how people create custom heads for llms?
I found this,
class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config
        self.bert = BertModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.init_weights()
as far as I know bert hidden size is 768, so in self.classifier it will be nn.Linear(768, num_labels=2 for example)
the output will then be [Batch size, sequence, num_labels]
tell me what exactly are you trying to do
if padding size is constant
lm head for llama
yes
for example say padding_size=128,
then the last_hidden_state.shape = [1, 128, 3072]
this is 3d tensor
torch.nn.Linear accepts [batch size, in_dim]
if i do 128*3072, that's a huge number
I just want to know how people connect the head efficiently
in this case for bert, they just passed hidden_size without multiplying it with sequence length...
so where did sequence length go...
for bert in case of classification we use first token [:,0,:]
yes ok good
now the question is
[:, 0, :] only takes embbedding of first token
what about the rest of tokens
as CLS tokens contains the whole information of sentence
their information won't be passed
wait what?
wdym contains the whole information of the sentence?
we use whole embedding if we want to do token classification
thats what standard method is
it is what it is , CLS tokens has all the info we need for classification
so CLS token having the whole sentence's information is only applicable for bert? or is it applicable for all LLMs?
no
are you running it locally or on collab
kaggle notebook
its different architecture
wait a little i will show you the flow
wait lemme share my code
tokenized_sentence = tokenizer(sentence, return_tensors='pt', padding="max_length", max_length=128)
out_ids = model(**tokenized_sentence)
class Decision_Model(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Decision_Model, self).__init__()
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(in_dim, out_dim, dtype=torch.bfloat16),
            torch.nn.Softmax(dim=1)
        )
    def forward(self, x):
        return self.fc(x)
basic_des = Decision_Model(in_dim, n_labels)  # what dimension to put for in_dim?
# Problem is here: how do I pass out_ids.last_hidden_state with shape [1, 128, 3072] to basic_des?
outputs = basic_des(out_ids.last_hidden_state)  # ???
so you are using base gemma model not GemmaForCausalLM
what dimension output you get from gemma model
Yes base model only
and what do you want to do
I showed you in the above code
this
just want to pass out_ids.last_hidden_state to torch.nn.Linear
i mean end goal ,
that's all, that's the whole problem
what will you achieve after doing it
ok i will reach out to you soon , i will head home now
but they already have GemmaForSequenceClassification
[inside] def __init__(self, config):
    self.num_labels = config.num_labels
    self.model = GemmaModel(config)
    self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
[inside] def forward(self,...):
    transformer_outputs = self.model(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    hidden_states = transformer_outputs[0]
    logits = self.score(hidden_states)
their passing of hidden_states to self.score is strange...
I will look more into this
No it's the first element of output so its last states
i know that
hidden_states shape is still [batch_size, token_length, hidden_size]
wait we can pass 3d tensor to torch.nn.Linear?
Yes
The only thing that matters is last dim should match to input dim
If you want to know more try adding print statements in transformers lib code
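A quick way to see this without digging into the transformers source is a toy tensor. The sketch below uses made-up sizes (2 sentences, 5 tokens, hidden size 16, 3 labels) rather than Gemma's real dimensions. It shows that torch.nn.Linear is applied to the last dimension only, so every token position gets its own row of logits and the sequence length never has to be baked into the layer.

```python
import torch

# Toy sizes chosen for illustration only.
batch_size, seq_len, hidden_size, n_labels = 2, 5, 16, 3

hidden_states = torch.randn(batch_size, seq_len, hidden_size)
head = torch.nn.Linear(hidden_size, n_labels)

# The layer only looks at the last dimension, so a 3D input works.
per_token_logits = head(hidden_states)
print(per_token_logits.shape)  # torch.Size([2, 5, 3])

# Applying the head to one token vector gives the same row of logits.
single = head(hidden_states[0, 0])
print(torch.allclose(per_token_logits[0, 0], single, atol=1e-6))
```

Because the weight matrix is [n_labels, hidden_size], the same head works for a 59-token sentence and a 92-token one. Reducing the per-token logits to one prediction per sentence is a separate pooling step.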
Hey guys I was looking for some help with this problem. I was hoping to figure out a different solution to the one posted (which calls the function made in the previous problem). The issue I think I am having is the elif in the second for loop at the bottom which checks if the key is not in the match list.
you are reassigning values every time
Right, I was wondering how I can get the elif statement to only activate if the key isnt in the match list
If I remove the elif statement, it works, but the issue is if there is a key thats in keywords that isnt in the match list it just ignores it instead of adding an empty value.
then initialize x with {key: []} and just append the index, or i in this case
Oh gotcha, ill go back and try that in a sec, but the key is a local variable in the second for loop, so I dont think I can initialize it with that right?
send me that code i will refactor it
```
    x = {}
    for i, doc in enumerate(doc_list):
        infos = doc.split()
        match = [info.rstrip('.,').lower() for info in infos]
        for key in keywords:
            if key.lower() in match:
                x[key] = [i]
    return x
```
could you send the whole function
```
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    x = {}
    for i, doc in enumerate(doc_list):
        infos = doc.split()
        match = [info.rstrip('.,').lower() for info in infos]
        for key in keywords:
            if key.lower() in match:
                x[key] = [i]
    return x

# Check your answer
q3.check()
```
you can delete your code blocks now, just to make it more cleaner
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword
    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    # Start every keyword with an empty list so unmatched keywords still appear.
    x = {k: [] for k in keywords}
    for i, doc in enumerate(doc_list):
        infos = doc.split()
        match = [info.rstrip('.,').lower() for info in infos]
        for key in keywords:
            if key.lower() in match:
                x[key].append(i)
    return x
# Check your answer
Wow that was so simple, thanks
I think I need to practice using list comprehension and for loops a bit more
@obsidian bone did you get it to work , the way you wanted ?
I was doing laundry, just finished
Will look into it later and tell you
By looking at raw data from a dataset, how can one decide whether it is necessary to add features (say, mean, variance, etc.)?
Is there any standard approach, or should I just do trial and error?
If there is an approach, how do I know which features I should include?
Thank you
@sly inlet
tokenized_problem = tokenizer(sentence, return_tensors='pt')
out_ids = model.forward(tokenized_problem['input_ids'].to(DEVICE))
debug_layer = torch.nn.Linear(hidden_size, n_labels, dtype=torch.bfloat16).to(DEVICE)
debug_out = debug_layer(out_ids[0])  # [batch_size, seq_len, n_labels]
debug_out = debug_out.mean(1)  # average over the sequence dimension
# debug_out.shape is now [batch_size, n_labels]
I checked Gemma's sequence classification, they used last token which is [:, -1, :] to get their logits,
but i wanted to accommodate all the tokens, so taking the mean seems like the simplest option here
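For comparison, here is a small sketch of the two pooling choices discussed above, written so that padding tokens are ignored. The helper names `mean_pool` and `last_token_pool` are made up for this example, the tensors are toy values rather than real Gemma outputs, and `last_token_pool` assumes right padding (real tokens first, padding at the end).

```python
import torch


def mean_pool(hidden_states, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # [B, T, 1]
    summed = (hidden_states * mask).sum(dim=1)                   # [B, H]
    counts = mask.sum(dim=1).clamp(min=1.0)                      # [B, 1]
    return summed / counts


def last_token_pool(hidden_states, attention_mask):
    """Pick the vector of the last real token (assumes right padding)."""
    last_idx = attention_mask.sum(dim=1) - 1                     # [B]
    batch_idx = torch.arange(hidden_states.size(0))
    return hidden_states[batch_idx, last_idx]                    # [B, H]


# Toy batch: 2 sentences, 3 positions, hidden size 2.
hidden = torch.arange(12, dtype=torch.float32).view(2, 3, 2)
mask = torch.tensor([[1, 1, 0],   # first sentence has one padding token
                     [1, 1, 1]])

print(mean_pool(hidden, mask))        # rows: [1., 2.] and [8., 9.]
print(last_token_pool(hidden, mask))  # rows: [2., 3.] and [10., 11.]
```

Either result has shape [batch_size, hidden_size], so it can go straight into a Linear(hidden_size, n_labels) head. A plain `.mean(1)` over a padded batch would also average in the padding positions, which is one thing worth ruling out when every sentence gets the same label.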
thanks for your help
i believe that is standard way for it then
i don't think it will work better with this approach , but can try
I'll try both ways, with mean and with taking the last token and see what happens
last token method will pretty much work as Transformers have implemented it.
but i'm curious to see how mean of embeddings will behave
oohhh...
so transformers only depend on last token to predict next word?
like last hidden states
I am really having trouble understanding transformers at all.... in theory they are something, and on practice they are totally different thing
it behaves terrible...💀
it keeps outputting the same label on every sentence
will try last token now
it's the same with last token
What ? Try it for the first token then
k wait
last try with their default implementation of GemmaForSequenceClassification
i would love to test this at myside as well
but i'm busy with my office work now , i will check out this variations later
good luck with your office work
its hard to understand the code without playing with , i will later show you how much i understood after trying things with it for a week
but input_ids , attention_ids, position_ids and attention_mask are important things to consider
you can try generating a answer with padded prompt , without attention_ids it fails but with it it works fine.
its all about rotary embeddings
see you later
Greetings folks, I am looking for a dataset that has artworks with their descriptions (I only need the name of the artwork with a semantic description, like what it is and what it means)
hope someone knows a dataset that suits this case
Can i get a job here after becoming a newbie ML and DL engineer?
define "here" 😅
Hello. When we are using two GPUs in kaggle notebook, how do include both of them when running a LLM with huggingface pipeline?
GPUs available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
But currently, i am doing something like:
pipe = pipeline("text2text-generation", model="grantslewis/spelling-correction-english-base-finetuned-places", device = 0)
Which i think only makes use of one of GPUs
hello guys, i found a dataset that someone has worked on that i would like to use as well and visualize just like he did on Tableau, but despite trying to follow the instructions i'm so lost on how to download his Jupyter notebook. can someone help me? thank you very much
Hello, I want to learn more about Machine Learning and want to know whether anybody has any suggestions as to which resources(research papers, online courses, books/textbooks,etc.) one could use to cover a lot of topics in a reasonable depth such that you could ascertain which fields of AI/ML you like or don't like. Coming from a complete beginner standpoint at ML here, but have the basic prerequisites of statistics, multivariable calculus, and linear algebra covered.
I did the machine learning basic and advanced courses of Kaggle so far. They cover some basic concepts like train/test etc. which I'm sure you will need no matter which direction you head later. Each of them is estimated 4 hours long (text + training).
hello, i am new to ML and was wondering how much an apple M1 with 16Gb will be able to handle for training? I found a dataset with 5 million data that i would like to work on but i'm sure that this is way too much for a laptop. what is a general range a laptop could handle? should i slice it down to 100-500k data? Also, should i just use google colab instead of jupyter?
have you seen this
https://youtu.be/l8pRSuU81PU?si=xOnR0-fXZDXhEyIz
We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusi...
yeah I did, i added to watch later list lol
yeah , kind of webseries on GPT-2
i am excited to watch
but i like whatever this man posts
but got some essays to finish this week... writing is so boring
i'm glad i finally over that crap
yeah, he explains it well too, I watched his transformers from scratch video
he has a discord server as well
man I already miss doing AI... it's been 3 days I didn't touch kaggle, feel like something important missing from my life
I hate university
aah i see
CS major ?
ELectrical and electronics engineering 🥲
hey, i'm also from electronics 😁
for realll?
u finished or still studying?
graduated last year
oooh congrats
which year you are in ?
atleast you are saved from it
my last year, lmfao. Finals are on 24th june, and if I pass this semester I'll graduate
ngl, CS things are so simple , we got pure madness of maths and theorems
i know right? unlike Electronics
u got motor and voltage source, good luck connecting driver to them
and make sure you don't end up exploding ICs
I hate this major...
wish I went software engineering instead
it's better in electronics, at least you will get hurt by max 24v, not like electrical, playing with live AC
actually we played with 220v AC
ended up blowing a light bulb
i had fun though , connecting and building stuff, even though i didn't understand a thing
glad we didn't get injured
eehh idk, for me I didn't have that much fun
arduino has most of its code ready, u just connect components, copy paste the code and run the circuit Tadaaa!
i didn't like that my teacher always force me to actively participate in electrical labs , like i like my life as everyone else do
auto transformers , delta , star
yeah transformers lmfao
we know 2 transformers now
one in electrical and one in AI
that are completely unrelated
bet , AI has long way to go
search now and you will get electrical one
or the movie one
too much competition
remove s
but HF needs to do more publicity
tbh, i hate this part ,
i wish it won't come to me at work place
honestly I hate it cause it doesn't feel like real engineering
but it does bring results
change few letters and you get something new, how am i supposed to get all answer with single prompt i don't know
chain of thoughts and self consistency for example, they do improve LLMs reasoning
yeah... LLMs are weird
upto certain point
still better than zero shot prompting tho
as your prompt gets bigger it start hallucinating
yes
check HF servers general chat
how a space can turn things over
Was it Mamba architecture the one that didn't have issues with tokenization?
cause looking at transformers architecture based, they are really unstable.
yes they were training model on byte level
here's recent try on it
https://arxiv.org/pdf/2402.19155
long way to go for byte level models to compete with transformers , its hard to find pattern with bytes on text data mostly
interesting will look into it, thanks
https://www.kaggle.com/code/abhishek0032/marathon-analysis-indian-atheletes hello everyone i have created a project check it out and if you have any suggestions let me know if you like it please upvote
Can someone explain why my code is running out of RAM? (please ignore the commented parts, I'm doing a test run to see if the submission is working correctly without training the models first)
in the code there is datasetsTrain variable. Can I know what's the len(datasetsTrain)?
I added the length of 1000, and it took 2GB of rams when appending the model to models
Umm, okay so here's the original problem https://www.kaggle.com/competitions/facial-keypoints-detection
Detect the location of keypoints on face images
your neural network model has 288685 parameters to train, which is also something to consider.
So I'm initialising a different dataloader and model for each feature to be predicted
why not initialize only 1 model
but what im confused about is that it crashes only during prediction and not training
I could do that and store weights for each feature in a different file
most probably the len(datasetsTrain) is so large
and you're initializing a model that many times
I was just experimenting with whatever I could think of at the time, and it working for training and validation so i didn't optimise it
but there is no error during training or validation, only during submission
and that's what im confused about
hmmm lemme check
I also copied portions of the code and didn't remove the comments so ignore the double comments, im sorry
looks pretty ugly
i had commented out the training loop for checking if the submission works correctly
while training
alright I'm trying the code on dataset, I'll return to you in couple of minutes
wait you using this in your local machine or kaggle notebook?
colab
oh
yeah seems like your submission function is doing something weird
17.4 GB right
now
bruh what did u do lol
wait lemme check the submission function
ok notebook crashed
For each row, I was using the image id to access the image from the test dataset and using the encoded feature name to access the model to be called (which could be optimised by saving weights)
ok fixed the problem
def submission(models, lookupDataset, imageDataset, organEncodings):
    for model in models:
        model.eval()
    for i in tqdm(range(len(lookupDataset))):
        # print(lookupDataset[i])
        # print(imageDataset[int(lookupDataset[i][0][1])-1].unsqueeze(0))
        # print(imageDataset[int(lookupDataset[i][0][1])-1].unsqueeze(0).shape)
        #print(f"Row Number {i}")
        #print(f"Before:\n{lookupDataset[i]}\nAfter:")
        with torch.no_grad():
            lookupDataset[i][0][3] = models[int(lookupDataset[i][0][2])](imageDataset[int(lookupDataset[i][0][1])-1].unsqueeze(0))
        #print(lookupDataset[i])
    return lookupDataset
you should add with torch.no_grad() before you feed the data into models
because without it, it calculates the backprops of the models and accumulates it
which takes space in ram
Even when I don't call loss.backward()?
yeah
What's loss.backward() for then? Doesn't it store the gradients?
no, it computes gradients of parameters in neural network layers
Um
somehow I used a dataloader and now it works fine
but when you do forward pass
the model creates a computation graph for differentiation
if it's gradient enabled
so what's it calculating here?
oh sorry i meant it's accumulating the gradients for parameters
no backprops
I'm still confused
doesn't zero-grad simply set them to 0?
and how are they accumulating it if loss.backward() is never called?
zero grad resets the accumulated gradient
by forward pass
wait lemme show u sth
wdym by accumulates gradients
on the second one with torch.no_grad(), it returns the output without recording the graph
how large are the gradients?
what gradients?
i mean, those grad fn objects
I have 12k entries, so is each object like 1 GB?
nvm 30k entries actually
Thanks, this makes sense
basically when you do forward pass, the tensors record history of the computation graph
which takes space in memory
but with torch.no_grad
u tell the model not to track the computation graph
I gtg now, can we discuss more of this later? (Ill read it)
but just do calculation and output number
Thanks for your help!
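To make the point above concrete, here is a tiny sketch with a toy Linear layer standing in for the real keypoint models. It shows that a normal forward pass returns a tensor that still points at its computation graph through `grad_fn`, while the same call under torch.no_grad() returns a plain tensor with nothing attached. Parameter gradients (`.grad`) are only filled in by `backward()`, so they stay empty in both cases.

```python
import torch

model = torch.nn.Linear(4, 2)  # toy stand-in for one of the models
model.eval()                   # eval() alone does not turn off graph tracking
x = torch.randn(1, 4)

# Normal forward pass: the output keeps a reference to its graph.
tracked = model(x)
print(tracked.requires_grad, tracked.grad_fn is not None)    # True True

# Forward pass under no_grad: only the numbers are computed.
with torch.no_grad():
    untracked = model(x)
print(untracked.requires_grad, untracked.grad_fn is None)    # False True

# No backward() was called, so no parameter gradients exist yet.
print(model.weight.grad is None)                             # True
```

Storing `tracked` outputs somewhere long-lived, for example writing them back into a dataset row for each of the 30k entries, keeps every one of those graphs alive, which fits the RAM growth seen during submission. Storing `untracked` outputs (or calling `.detach()`) avoids that.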
Hello everyone, I'm gonna try out one Kaggle competition, but I don't understand what "internet access disabled" means. Can you detail what it means???
when you are on notebook, on the right side panel there is "session options", under that there is a toggle for "internet on".
and as far as I remember you need to verify your account to have that thing.
Internet will be disabled and only the things on Kaggle can be used
I already verified my account. Thank you anyway!
Yeah I misread your question, I thought you were asking on where the button is lol
Sarvesh answered it
@stable dragon Does that mean I can't upload any packages or data to my kernel? Can't I download the datasets offered by Kaggle? If I run the code in my local and upload it to my kernel, is it a violation to the rule?
You can use anything that's available on Kaggle. You can also upload stuffs on Kaggle and use those, it's just that anything outside it can't be used
@stable dragon Thanks for your quick reply. I'm getting closer.... Can you give me specific examples that are not allowed to enlighten this newbie 😔 ?
Would recommend checking with a public notebook, that would help you more.
Alternatively to experiment with things, turn off the internet for the notebook that you are working on, and try executing the code
@stable dragon Thank you so much!!!
Hi! Can I run stable diffusion and kobold ai on kaggle without be banned?
I'm also recording a course about stable diffusion, can I use Kaggle to teach, and can the students use it without being banned? How does it work?
Can I train stable diffusion checkpoints on Kaggle without being banned?
I think there are notebooks that load dynamically stuff from GitHub, this would be not allowed
@plucky vector Thank you! Could that be the only restriction?
Can anyone tell me how to import a python notebook with all the inputs to the local via API.
Right now I need to individually download all the input files
Hey guys I have a question , can we build sequential model ?
Like is it possible to train a model on X1_i Inputs Y1_i Output and then the second one is running on X1_i + Y1_i to give output Y2_i ??
Context : (I am trying to build this for a product where we are predicting what the user is likely to select. I have learned about Supervised Learning Algorithims including ensemble techniques)
yeah it's possible
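As a rough illustration of that idea, the sketch below chains two models: the first learns Y1 from X, and the second learns Y2 from X with Y1 appended as an extra feature. Everything here is a toy assumption. The `NearestNeighbor` class is a hand-written 1-nearest-neighbour classifier used only to keep the example dependency-free, and the data is made up. With scikit-learn, any estimator could take its place, and `sklearn.multioutput.ClassifierChain` packages the same pattern.

```python
class NearestNeighbor:
    """Tiny 1-nearest-neighbour classifier (illustration only)."""

    def fit(self, rows, labels):
        self.rows = [list(r) for r in rows]
        self.labels = list(labels)
        return self

    def _predict_one(self, row):
        def sq_dist(other):
            return sum((a - b) ** 2 for a, b in zip(row, other))
        best = min(range(len(self.rows)), key=lambda i: sq_dist(self.rows[i]))
        return self.labels[best]

    def predict(self, rows):
        return [self._predict_one(r) for r in rows]


# Made-up training data: two input features and two targets per row.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y1 = [0, 0, 1, 1]
y2 = [0, 1, 1, 0]

# Stage 1 learns X -> Y1. Stage 2 learns (X + Y1) -> Y2.
stage1 = NearestNeighbor().fit(X, y1)
stage2 = NearestNeighbor().fit([x + [t] for x, t in zip(X, y1)], y2)


def predict_chain(rows):
    """Predict Y1 first, then feed it to stage 2 as an extra feature."""
    y1_hat = stage1.predict(rows)
    y2_hat = stage2.predict([list(r) + [p] for r, p in zip(rows, y1_hat)])
    return list(zip(y1_hat, y2_hat))


print(predict_chain([[0.9, 0.1], [0.1, 0.9]]))  # [(1, 1), (0, 1)]
```

One design point to keep in mind: at prediction time stage 2 only sees stage 1's predicted Y1, not the true value, so stage 1's mistakes carry over into Y2.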
So I am new to the whole data science thing and just started working with kaggle. Is there anyone that might be able to give me some pointers?
Do the Courses on Kaggle to learn the basics. They are for free and with many hands-on exercises
I started the courses and I think there is a link broken in one of them cause it wont load a dataset that is in the input unless I'm doing something very wrong
There is a discussion area for every training in the courses, and also an area where you get hints how to get the right solution. Did you look there?
At least in the courses Machine Learning I + II and Neural Nets that I did
I was doing the Getting staRted tutorial. And its done in a notebook. I'm doing a full career pivot here don't know too much about data science besides what I've learn in a coursera cert so my knowledge in all this is very limited.
Ah, I didn't do that one, so I probably can't help specifically. Did the data load in the other training notebooks, in the other lessons?
the dataset loaded in the previous training notebook and I tried a trick that I did in my own notebook. But even if the dataset shows in the input it won't load with R.
That's strange. Did you select R as the language instead of Python for your cell?
I must admit, I didn't work with R so far on Kaggle
Yea R is the default on it. So it's really confusing me.
On that subject the only reason I'm working with R right now is that's the programing language that the Google cert I took taught us. But I haven't really come across anyone that uses it for data analytics. Does it matter if I stick with R or should I switch to learning python?
Well, as far as I understand with R you can do only data statistics, and with python you can do almost everything you can do with any other programming language too -- program scripts, for example, and especially on Kaggle also do machine learning stuff.
I don't know what your background is, but in the natural sciences python is very widespread -- because it's easy to learn and most scientists aren't programmers. They want to put some code together to do something for them without diving into the concepts of stuff like object-oriented programming
I'm getting ready to retire from the Marine Corps and I want to pivot into data analytics. I was just in a class where they told me Python is a more preferred skill, so I think I'll go back and do some classes on that and start working some more case studies again.
You can learn both. The languages are a bit different, but the concepts of the analytics themselves are the same. Like math books in different languages will still contain the same math
Yea I know. But I'm trying to find a job so I want to start with a programming language that is more widely used... I think.
There is news that the US government will forbid any IT services for Russia after 1st September. Is there any information about Kaggle, will it stop working for Russian users?
I don't know
anyone truly knowledgeable about big data & business intelligence here? i got a uni assignment and im cooked
please dm me
I have posted a question on how to become an ML expert with my current knowledge, if anyone wants to help I would appreciate it. Thank you in advance 🙂 https://www.kaggle.com/discussions/general/512308
How to become ML expert.
Has anyone run into the problem where the kaggle python package cannot find kaggle.json? I have double and triple checked that the kaggle.json file is in the "~/.kaggle" directory and I have also run the chmod 600 command on the file.
EDIT:
I figured it out. I had to set the KAGGLE_CONFIG_DIR to be the full path to the .kaggle directory. For some reason the relative paths ("~/") do not work. The permissions were set to 600.
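A small sketch of that fix in Python, in case it helps someone else. It expands "~" to the full home path before setting the variable, since environment variables are plain strings and nothing expands "~" automatically. As far as I know the kaggle package reads its configuration when it is imported, so the variable should be set before `import kaggle`; treat that ordering as an assumption to verify.

```python
import os
from pathlib import Path

# Expand "~" into an absolute path, e.g. /home/<user>/.kaggle
config_dir = Path("~/.kaggle").expanduser()

# Set this before importing the kaggle package (assumed ordering).
os.environ["KAGGLE_CONFIG_DIR"] = str(config_dir)

print(os.environ["KAGGLE_CONFIG_DIR"])
```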
hello, does anyone know any good resources of understanding the "dataset"? I'm not so good at concepts such as feature engineering and understanding the visualization of the data. So I guess like data preprocessing?
i was training a model and the epoch progress stopped, so i refreshed the page. now the session is stuck on "booting kernel", but when i look at the console it's still running. how do i solve this?
Hello, I'm having a problem with something really simple but I can't seem to make it work.
When I try running the code below, it doesn't recognize min_frequency as a parameter despite it being in the documentation. So I'm trying to upgrade scikit-learn to a newer version but it doesn't seem to be working. I run my code on kaggle. Help would be much appreciated 🙇♂️
```python
!pip install --upgrade scikit-learn --use-deprecated=legacy-resolver
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Define the columns to be encoded
categorical_cols = ["Gender"]  # replace with your column names
gender_categories = ['M', 'F']

# Create the encoder (min_frequency needs a newer scikit-learn than the preinstalled one)
encoder = OrdinalEncoder(min_frequency=10,
                         categories=[gender_categories])
```
For more context, I got this error the first time I tried to upgrade:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spopt 0.6.0 requires shapely>=2.0.1, but you have shapely 1.8.5.post1 which is incompatible.
```
The version also hasn't seemed to change yet
I am having a problem while predicting using YOLOv8 with 2 GPUs. I have set a batch size of 16 but I am still having the same problem
yes R is also used for data science, if you are comfortable you can use it
Hi this may sound like a bit of a silly question but what channel is for the discussions on the Titanic competition?
#🚢┊titanic I would guess
If you go to id:customize you can see many more channels than in the standard list
Hi I've cloned from a git repository, and i tried to open the files under the output/RadioUNet directory but to no avail. Is the behaviour as expected? I tried googling about it and chatgpt did say it is possible to open the files by clicking on it
I can't seem to find the channel for the kagglex competition even though I've verified myself on discord, any help?
```python
import pandas as pd  # needed for pd.read_csv below
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

medical_df = pd.read_csv('/kaggle/input/test12ssda/medical-charges.csv')

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

fig = px.histogram(medical_df,
                   x='age',
                   marginal='box',
                   nbins=47,
                   title='Distribution of Age')
fig.update_layout(bargap=0.1)
fig.show()
```
Someone please help
it's been an hour since I started trying to fix it
it doesn't show the graph but doesn't give any errors
and yes I printed medical_df and it was fine
so nothing wrong with the dataframe
@ruby pewter
Use copilot or LLMs to fix these errors, would be faster and easier
Does anyone know good ways to approach fitting classifier models on high dimensional vector embedding data?
As part of my research at my university we are testing the accuracy of ML classifier models in network anomaly detection after the data has been converted to a textual format and embedded using LLM sentence embeddings. The goal is to measure the change in performance of the models with and without LLM embeddings used in preprocessing.
A problem I’m encountering is the datasets I work with are typically 1.5 million instances and when running the data through a sentence transformer model the dimensionality of the data skyrockets (about 350+ embedding features for the current model I’m using) and fitting the models we’re testing (RFC, ET, SVM, ADA, XGB, GB) takes a significant amount of time which is not good as my code gets run in Google Colab and constantly having to re up on compute tokens is not cost efficient at all.
I’ve thought about running the embeddings through PCA but I think the dimension would have to be reduced so much that it would be significantly diminishing the strength of the embeddings. Before I try this route, are there any suggestions on better ways to approach this problem that I don’t know of? Thank you!
It says it's correct and it locally runs fine
for numerical variables, use px.histogram with the histfunc parameter set to count or sum
😫 Well as I see it, the problem you are facing is caused by the fact that OrdinalEncoder in scikit-learn does not have a min_frequency parameter
It only has a category and handle_unknown parameter.
Why not preprocess the data using the counters in the ingest module before encoding the data?
Are you sure that parameter exists?
Hi, Thanks for the reply! It's true that Scikit learn version 1.2.2 doesn't have min_frequency and I can just make it myself, but some newer features like TargetEncoder take a lot more effort to make from scratch. For now I just moved to running the code locally on Scikit learn 1.5.0 and I'm no longer getting errors, but I wish there was a way to run it straight on Kaggle notebooks
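For the "I can just make it myself" route, here is a rough sketch of what grouping rare categories could look like without the newer encoder. It only imitates the idea of `min_frequency` (categories seen fewer than N times share one code); it is not scikit-learn's implementation, and the function names are made up for illustration.

```python
from collections import Counter


def fit_rare_category_encoder(values, min_frequency):
    """Give each frequent category its own integer code.

    Categories seen fewer than min_frequency times are left out of the
    mapping and later share a single 'infrequent' code.
    """
    counts = Counter(values)
    frequent = sorted(c for c, n in counts.items() if n >= min_frequency)
    mapping = {category: code for code, category in enumerate(frequent)}
    infrequent_code = len(frequent)
    return mapping, infrequent_code


def transform(values, mapping, infrequent_code):
    """Encode values; rare and unseen categories both get infrequent_code."""
    return [mapping.get(v, infrequent_code) for v in values]


# Made-up data: 'X' appears only twice, so it falls below the threshold.
genders = ["M"] * 12 + ["F"] * 11 + ["X"] * 2
mapping, rare_code = fit_rare_category_encoder(genders, min_frequency=10)
codes = transform(genders, mapping, rare_code)
print(mapping, rare_code)
```

Note that this sketch also sends categories never seen during fitting to the shared code, which is a design choice and may not be what you want.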
You're welcome!
they may not always have the latest versions of popular libraries.
but you can try
You can create a custom environment on Kaggle by installing the required packages in a new environment to use the latest versions of scikit-learn and others.
so they have to update the libraries for the platform themselves?
I thought it was something similar to a VM
How can I create the custom environment?
So I'm relatively new to the data science world (experience in Java, but that's pretty much it). I've been trying to learn Python, but from what I have seen, most data science work involves using Python libraries more than anything; coding jargon isn't too necessary. Correct me if I'm wrong, as I am still learning Python
And while we're at it, can someone introduce me to the basic functions (with examples, don't need to be too in-depth) of each python library?
I had one question, in the leaderboards it says:
This leaderboard is calculated with approximately 20% of the test data. The final results will be based on the other 80%, so the final standings may be different.
how will you calculate the remaining 80% of the data given you dont know which model I used
I suppose, you have to submit a model or kaggle notebook on this competition
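One way to picture the 20% / 80% note: a submission already holds a prediction for every test row, so the host can score two disjoint subsets of the same predictions without knowing anything about the model. The sketch below uses made-up labels and a made-up random split; the real split is hidden and chosen by the host.

```python
import random


def accuracy(pred, truth, ids):
    """Fraction of rows in ids where the prediction matches the label."""
    return sum(pred[i] == truth[i] for i in ids) / len(ids)


random.seed(0)
test_ids = list(range(100))

# Hidden labels (known only to the host) and one prediction per test row.
truth = {i: random.randint(0, 1) for i in test_ids}
submission = {i: random.randint(0, 1) for i in test_ids}

# Hypothetical split: 20 rows for the public board, the other 80 for the final one.
shuffled = test_ids[:]
random.shuffle(shuffled)
public_ids, private_ids = shuffled[:20], shuffled[20:]

public_score = accuracy(submission, truth, public_ids)
private_score = accuracy(submission, truth, private_ids)
print(public_score, private_score)
```

Because the two scores come from different rows, they can differ, which is why final standings may move.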
Is there a good way to propagate image augmentation transforms to a label when loading training data (torchvision)? I am trying to train some model with pixel coordinates on the image as labels, which means I need to figure what transforms are being applied to the image to transform the label as well
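One common pattern for this is to draw the random parameters once and apply them to both the image and the coordinates. Below is a minimal sketch of the coordinate side only, in plain Python; all function names are made up, and the rotation sign convention is an assumption that has to be matched to whatever routine rotates the image.

```python
import math
import random


def hflip_point(x, y, width):
    """Mirror a pixel coordinate the same way a horizontal image flip does."""
    return width - 1 - x, y


def rotate_point(x, y, angle_deg, cx, cy):
    """Rotate a point about (cx, cy); the sign convention is an assumption."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


def sample_params(rng):
    """Draw the random decisions once so image and label share them."""
    return {"flip": rng.random() < 0.5, "angle": rng.uniform(-15, 15)}


def transform_label(point, params, width, height):
    """Apply the sampled flip and rotation to one (x, y) label."""
    x, y = point
    if params["flip"]:
        x, y = hflip_point(x, y, width)
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return rotate_point(x, y, params["angle"], cx, cy)


params = sample_params(random.Random(0))
print(transform_label((10, 20), params, width=64, height=64))
```

The same `params` would then drive the image side, for example through the functional calls `torchvision.transforms.functional.hflip` and `rotate`, instead of the random transform classes that hide their parameters.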
Hello everyone, I need some advice on where to dedicate my time. Let's say I wanted to start a company down the road in the automotive/consulting business, if I'm working as a automotive mechanical design engineer, during my free time would you guys suggest learning more about the automotive industry/business or machine learning? I enjoy machine learning a decent bit but idk how transferable it will be for a business
I know that that is where the industry is going tho, EVs, auto driving, etc
👍
Have you tried using OOB estimation?
Yes, I've had that experience too.
I used the torchvision library from pytorch.
anyone was able to download LLM files from huggingface using git clone? mine one stuck
never heard about it, will give it a read thanks!
Hi there, I am working on a competition which requires me to submit the notebook and runs it in offline mode, I want to import SentenceTransformer module, how can I import that
?
You need to include the wheel in the notebook input. See this link that describes it: https://www.kaggle.com/code/narnaoot/installing-packages-without-internet-for-kaggle
Is there a way to download the dataset of this competition https://www.kaggle.com/competitions/prediction-of-the-cefr-level-of-english-texts/overview ? Im aware that the competition is over, does that mean that the dataset is not downloadable anymore?
Challenge Data - IMT Nord Europe (29/01 - 02/02/2022)
Hm, if I go to the "data" tab I see all the data with a download button
same, but when i click it, it redirects to the rule tab, and it ends there
no way to approve the rules, bcuz it says its not required
ive tried using Kaggle API also might i add
i got 403 permission denied
anyone here ? I'm new.... I had a question, maybe in general too... so if I let's say had embeddings for my train set, would I have to run that over every time when I submit or can I save the embeddings on my kaggle directory and just look up the numpy array when I submit? I'm doing the essay prediction learning one...
Yes, when working with embeddings, it's a good idea to save them separately from your model and recall them when you need them
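A minimal sketch of that save-and-reuse idea, using only the standard library. `load_or_compute` and `fake_embed` are made-up names, and the temp directory stands in for wherever the notebook keeps its files; for NumPy arrays the same pattern works with `np.save` and `np.load`.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute_fn):
    """Return cached data if the file exists, else compute and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def fake_embed():
    """Stand-in for the slow embedding step; records each call."""
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]


cache_file = os.path.join(tempfile.mkdtemp(), "train_embeddings.pkl")
first = load_or_compute(cache_file, fake_embed)   # computes and writes the file
second = load_or_compute(cache_file, fake_embed)  # reads the file instead
print(len(calls))
```

This only helps for the train set; embeddings for a hidden test set still have to be computed when the submission runs.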
how are people in the essay learning scoring one using the LB for their own models? seems like that won't fly in final submission?
can the entire model based on training just be saved as a model so that it can run in a few minutes instead of using embeddings to find the features directly first in the test.csv? wondering if there's a way to just make a model and quickly look up scores in the test.csv without the intermediate steps?
Hey guys, I have one nube question.
I'm trying to make my first submission to a code competition. I did my work on Jupyter Colab by going through some baseline examples.
Pretty much every example uses deberta-v3-base as its model and tokenizer, and they create these by doing something like:
class PATHS:
    model_path = '/kaggle/input/huggingfacedebertav3variants/deberta-v3-base'
self.tokenizer = AutoTokenizer.from_pretrained(PATHS.model_path)
model = AutoModelForSequenceClassification.from_pretrained(PATHS.model_path, num_labels=CFG.num_labels)
I am not sure from where and how you can add the models/tokenizers to the kaggle/input directory. I tried this tutorial (https://www.kaggle.com/code/shravankumar147/save-huggingface-model-to-local-for-no-internet/notebook), but I still don't see deberta_v3_small_pretrained_model_pytorch created in my kaggle/input directory when browsing from the Jupyter notebook created through competition's submission page.
Would love to get help on this. Thanks!
Yes, model serialization.
if your framework has one (Keras does, for example), you can use the save() method to save your model
model.save('your_model_path')
can you submit a notebook via Google Colab or is that only for files like submission.csv? I ran out of GPU and they won't let me submit with GPU enabled now
You have to join the competition before downloading the data - even after the competition is finished.
i see, so now that the competition is over, there is no way to download the data?
Join the competition in “Data” tab.
Im not sure how to do that, Ive clicked the license, it redirected me to the rules tab, but there is nothing to accept, when i click download, the same things happen (i.e rule tab). Ive tried the cli script, i got 403 permission denied. Can you guide me on how to join the competition via the data tab? thanks
It seems that particular competition is not set up for late submissions, so maybe there’s no way to join and access the data.
roger that, thanks for the confirmation
sorry to double post, but very urgent given two days before competition deadline on aimo: my team requires a dataset upload to submit our notebooks and it seems this is broken. can anyone help? submission deadline in 40 minutes
Hi Everyone, need help. I am trying to submit my notebook for AIMO competition but every time I am getting "Notebook threw exception" for the submission. I have put most of the code in try, except blocks to catch any exception. I just want to know what exception where it is throwing, please help I have spent days and nights to come this far. Here is the link to my notebook. https://www.kaggle.com/code/kiritidesarkar/aimo-deepseekmath-db-solved-trainingquestions. There are only 2-3 days left for the competition and I am not able to get even a score because of this exception.
I have checked the logs but no exceptions I could see there
I am a beginner who just started learning and I had a doubt "Should we do imputation even when more than 50% of the values are missing ?"
if so what will be the best method?
impute when you feel that the variable is extremely critical and imputation won't logically hamper your output in any way
but if a variable has 50% of its values missing i'd rather just drop it if it is insignificant anyways
try imputing and check the correlation with the dependent variable using
sns.heatmap(df.corr())
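The drop-or-impute rule of thumb above could look roughly like this. It is a plain-Python sketch on made-up rows with a made-up 50% threshold; with pandas the same steps are usually written with `df.isna().mean()` and `df.fillna(...)`.

```python
from statistics import median


def missing_fraction(rows, column):
    """Share of rows where the column is missing (None)."""
    return sum(r.get(column) is None for r in rows) / len(rows)


def drop_sparse_columns(rows, threshold=0.5):
    """Keep only columns whose missing share is at most the threshold."""
    columns = {c for r in rows for c in r}
    keep = {c for c in columns if missing_fraction(rows, c) <= threshold}
    return [{c: r.get(c) for c in keep} for r in rows]


def impute_median(rows, column):
    """Fill missing values in one numeric column with that column's median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Made-up rows: 'cabin' is 75% missing, 'age' is 25% missing.
rows = [
    {"age": 22, "cabin": None},
    {"age": None, "cabin": None},
    {"age": 30, "cabin": "C85"},
    {"age": 40, "cabin": None},
]
cleaned = impute_median(drop_sparse_columns(rows), "age")
print(cleaned)
```

Median imputation is just one option here; whether a half-empty column is worth keeping still depends on how important the variable is.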
hey, I had a doubt related to the Titanic Dataset I have recently started Learning ML can anyone tell me what columns I should target to give the required output?
I am not able to understand what output they want.
who will be the winner?
the output it wants is a prediction, for each passenger in your test data, of whether they survived (the tutorial starts by looking at what percentage of women and men survived)
okay thanks
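For reference, the Titanic submission file has one row per test passenger with two columns, `PassengerId` and a 0/1 `Survived` prediction. A minimal sketch of writing that format, using the simple "women survived, men did not" baseline; the two test rows here are made up, and in a notebook they would come from test.csv.

```python
import csv
import io


def gender_baseline(test_rows):
    """Predict 1 (survived) for women and 0 for men."""
    return [{"PassengerId": r["PassengerId"],
             "Survived": 1 if r["Sex"] == "female" else 0}
            for r in test_rows]


# Made-up stand-in for rows read from test.csv.
test_rows = [
    {"PassengerId": 892, "Sex": "male"},
    {"PassengerId": 893, "Sex": "female"},
]

# Write to an in-memory buffer; a real notebook would open submission.csv.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["PassengerId", "Survived"])
writer.writeheader()
writer.writerows(gender_baseline(test_rows))
submission_csv = buffer.getvalue()
print(submission_csv)
```

Any model can replace `gender_baseline` as long as the output keeps the same two columns.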
Hello, I have little experience in the world of data science, but I want to participate in a competition to test my skills and acquire new ones. Could you recommend one, please?
How to get room direction in indoor photo/image.
You can find beginner level problems on kaggle
Try building models for those problems
It will surely help you gain a better understanding of model
Hi, Is there any structured course that gives proper directions to Kaggle beginners on building profile? or any mentor in the group? Kindly guide. TIA!
Yes there is, check the beginners project , it gives guidance on how to build the model (ex: titanic)
You should be okay once you do this guided projects
not the single project, but the structured course
cuz a lot of people are like me, new to kaggle but don't know how to build the profile. To start with, it's the titanic project, but then what's next, as in a series of tasks
Hi everyone, I’m new here and need some help. Can someone please explain what I need to do for this Skin Cancer competition in a simple way?
- What do I need to do? - Develop a program to identify dangerous skin spots from images.
- What language and tools should I use? - Use Python and tools like TensorFlow or PyTorch.
- Which data should I use? - Use the Training and Test data provided in the competition files or data from outside.
Thanks a lot!
pheeww.. another newbie question: When I submit in a Notebook competition, does Kaggle simply replace test.csv with the hidden dataset? 🤔 My submission is taking too much time, so if that’s correct, I need to re-engineer my approach.
Hi everyone, do you know if there exists a dataset for fake (LLM-generated) online reviews in the German language? I am looking for something similar to https://osf.io/tyue9/ but in German.
Thanks!
When I try to get the code, I get the too many requests message
I'm looking for anyone interested in collaborating with me on an ML model for turning raster curves into vector (Bezier) curves. Is there a section on this Discord most appropriate for asking?
I'm calculating pAUC_80 using the implementation found on the ISIC skin cancer competition, but it is not matching up with the outputted score, any ideas on how to fix this? I want to test my model's true value without wasting a submission
Hello everyone, i have a question and i would really appreciate your assistance.
I have 2 networking and ip addresses data files with .RR format (ex: myipv6add.RR, myipv6add2.RR) and i want to extract into MySQL file .. how can i write a script in python to do that ? 
The kaggle quota of GPU resets every week right?
I'm sorry if my question is too basic in Kaggle, but I really can't find an answer. Any answer would be appreciated. >> When I submit in a Notebook competition, does Kaggle simply replace test.csv with the hidden dataset? 🤔 My submission is taking too much time, so if that’s correct, I need to re-engineer my approach.
when will participants for kaggleX fellowship program be notified if they got accepted or not?
Yep, it resets each week.
Are you training your model in your submission?
Thanks for your comment. Yes. I'm working on the metric competition (USPTO-explainable-AI).
If you can export your model you might want to do that
and import it into another notebook
so you arent training it everytime you submit
hello guys, are there any source that specifically collects data on traffic?
hello i am working on projects across the web and participating in competitions as well as starting as a freelancer to enhance my approach in the industry and work on real world data. DMs are open.
Hey guys, how do I access my GPU and TPU in my kaggle notebooks?
Do i need to verify my phone number or smth?
yes
alright, thank you very much! and after that, you can modify the settings in the notebook, right?
yes , you will get option for accelerator
thanks man
hi, I recently got banned from my other account due to using prohibited code. How to get my account unbanned, I really learned my lesson now 😭
Hi, i need some support on 1 dataset i've created, because it's very biased
what were you exactly doing ? and what is prohibited code ?
how do I team up with a friend?
@wispy eagle There is a "team" tab that allows you to add friends to your team to work together. You might need to refresh after accepting the rules.
hi, I git cloned facefusion from github, which is not allowed. Now I really want to get my account back (kaggle: imduckman), it has my other valuable code :(, how to contact support, I'm crying rn 😭
I have been working on an image segmentation task, but most of the notebooks I referred to use MSE as the loss function to train the model, comparing the generated mask and the ground truth with the MSE loss. Generally IoU should be used for image segmentation, but when I tried that, my IoU loss stopped decreasing and stagnated at 90, while the MSE loss decreased quite well, down to 0.02. Can anyone suggest why this is happening? Any help or suggestion would be appreciated, thanks in advance
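One thing worth checking, offered as a possibility and not a diagnosis: on masks that are mostly background, MSE can look small while the overlap is still poor. The sketch below compares MSE with a soft IoU (Jaccard) loss on a made-up flattened mask; both functions are illustrative, not taken from any particular notebook.

```python
def soft_iou_loss(pred, target, eps=1e-6):
    """1 - IoU on flattened masks; pred holds probabilities in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


def mse_loss(pred, target):
    """Mean squared error over all pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Made-up mask: 2 foreground pixels out of 100.
target = [0.0] * 98 + [1.0] * 2
all_background = [0.0] * 100  # a prediction that misses the object entirely

print(mse_loss(all_background, target))       # small, because background dominates
print(soft_iou_loss(all_background, target))  # close to 1, because nothing overlaps
print(soft_iou_loss(target, target))          # close to 0 for a perfect mask
```

A loss defined this way stays between 0 and 1, so a value like 90 may mean the two losses being compared are on different scales.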
could anyone help me on getting my Kaggle account back 😔, I've been so depressed and stressed... My kaggle account is imduckman, I got banned after using prohibited code. My account has lots of my valuable code that now I cannot access. I really realized my wrongdoing and learned my hard lesson now... I really apologize for my violation, I promise I would never violate the rules again. Thanks a lot
hii everyone 👋, how do I upgrade to kaggle pro ? and I recently ran out of memory when deploying gamma model in my kaggle notebook, can someone tell how to get more memory from kaggle?
Thanks in advance!!
Hi, please when is the next application to register as a mentee opening ?
how do i start with kaggle? I have done a machine learning course
@agile veldt Could you elaborate your explanation? I'm wondering how my notebook submission is scored. The host has hidden dataset. The hidden dataset should follow my algorithm on my notebook to get the results, and the results will be scored by the host's metrics. Then, the hidden dataset should replace test.csv in my notebook. Am I missing something in my logic?
did you get any info?
@wind silo Hello and sry for the ping :)
but, any updates on the results of KaggleX
no updates regarding KaggleX fellowship program acceptance or not.
Hello, everyone
I want to create new similar style music from 100 ambient music.
Please help me.
hello guys, I have been doing ML for about 1 month and I have no problem understanding the models and maths
but whenever i start implementing it myself
on any datset
i just go blank
and dont know what to do
can someone please help ??
Hello everyone, I am new to kaggle. I want to participate in competitions and have started learning ml, but as a beginner I don't know how to participate in competitions etc. Please, can anyone guide me?
use the free courses provided by kaggle
Hi @high sigil, thanks for your message following up on the application status. We are in the process of reviewing applications. We will be notifying applicants at the end of July. Please visit kaggle.com/kagglex or our discussion board for updates.
Link to KaggleX discussion: https://www.kaggle.com/discussions/general/357233
thanks @dapper yoke
hello everyone, i'm looking for anyone interested in collaborating with me on an ML
How can you implement the YOLOv8 or YOLOv5 architecture in tensorflow for image classification?
where is it written that it's not allowed?
cause i've seen some notebooks git cloning the repo
Hey guys, does anyone know if there is a data api that I can use to access my user data (ex: competitions, leaderboard rankings, notebooks)? I want to programmatically access my information so that I can display it to others without having to redirect them to kaggle.
i had the same problem. do a few guided projects, start to solve them, and if u can't, read through the solution entirely once, then implement it while looking at the solution.
I get this error while downloading the data via API, any suggestions?
403 - Forbidden - You must accept this competition's rules before you'll be able to download files.
PS: I have already accepted the rules for the competition
hello guys
my add-ons is not showing up in my kaggle notebooks
i have tried to sign out and sign in multiple times and created multiple notebooks, yet I am unable to add my secret keys
guys how do I use the attention mask like this when we have batches of sentences with different lengths?
like do I multiply the attention mask with the tokenized input, or do I input it separately to the transformer model?
context: I am not using Hugging face library, I built custom transformers based model
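A common approach in custom transformer code is to pass the mask alongside the token ids and use it inside attention, where padded key positions get a very large negative score before the softmax so their weight becomes zero. A minimal sketch for a single query row in plain Python; `masked_softmax` is an illustrative name, and the -1e9 constant is a conventional choice, not a requirement.

```python
import math


def masked_softmax(scores, mask, neg=-1e9):
    """Softmax over key positions; mask[j] == 0 marks a padding position."""
    masked = [s if m else neg for s, m in zip(scores, mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# One query attending over 4 key positions; the last two are padding.
scores = [2.0, 1.0, 0.5, 0.5]
mask = [1, 1, 0, 0]
weights = masked_softmax(scores, mask)
print(weights)
```

Multiplying the mask into the token ids would only turn padded ids into id 0, which the model still attends to, so the mask is usually kept as a separate input.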
How would i gain followers on kaggle and upvoted by people?
Hi, i'm relatively new to ML. And i always get this far when i make models (then i make the plot of predicted values vs actual values) but that's it. How does this apply to the real world? What tools do you use? Do you have documentation of that or recommendations?
Because if I go to an interview and show them my script, i'll just sit there and not know how i could implement it into something "real."
hey guys any attempt to use get_gcs_path() function just results in an error, does anyone know how to fix?
Ohk...use flask in which you create model exactly like this but working should be in pipeline format means you have to code logging with this code which u done right now that just one step closer
And make projects that is for ml .. nothing more it have
That's my GitHub follow me and see my diamond price prediction project
Also follow me on kaggle please
Hey can anyone help me how I would I make my own new dataset
I completed ml and I am very much introduced to the things, but when making a dataset, how do we decide the columns and rows, and especially the features in it and their data values?
I am confused so anyone who makes dataset can HELP me please
Does any one help me to find some end to end data science industry project?
Mainly I looking for finding fraudster/bill defaulters using their credit history
Or finding out customer key insight of a business store using their all their customer transaction history
Mainly who is their target customer and who can be in future whom should they focus on
I am not able to download the model file from the working directory
any fix for that ?
is there a way to early stop flaml automl? I did find hyperband and early stop, but not a standalone function
Whenever I want to tune hyperparameter with keras tuner it raises exception saying "RuntimeError: Number of consecutive failures exceeded the limit of 3". Can anyone help me how to solve this problem?
Hi, i was wondering if anyone knew any libraries that have a bot in a maze, and the bot has to try and find its way to a goal known that may or may not move, and is unknown, using a neural network?
Hey guys, i am relatively new, i was thinking if it is possible to
https://www.kaggle.com/datasets/abcsds/pokemon/data
predict pokemon type 1 from its attributes with classical machine learning , i got 25.5 % (better than random guessing )
https://www.kaggle.com/code/anikeetgarg1/pokemon-type-prediction
is it possible? how? what did i do wrong?, can someone also know how to use mutual info?
thank you so much please tag me if there are any answer to the question
Can anyone help me how to make dataset..means I have idea about it and column name...but how can I find data for that column
is there an IDE that has conda and jupyter integration and works reasonably well? I tried pycharm community, but the notebook is a paid feature...
Please Why is this not working please
i have successfully imported all the other documents for train and test
you might not have run the cell above it
how do you guys find unclean datasets on kaggle? I am doing data cleaning on structured data and i am looking for dirty datasets like housing prices or loans
or price predictions
it can be either classification or regression
Hey, could you advise me on your favorite python libraries to quickly try out different model types (SVM, RandomForest, etc) on a dataset? Thank you!
Sklearn
hi, i am new to kaggle. Can someone tell me how to make a comment in discussion?
Whenever i open any discussion page, i didn't have the permission to make a comment.
can i give kaggle competition on my own? no team.
i did
The issue cited is that train_data is not defined
Either variables name is wrong or
You might not have run all the cells after you edited them
I don't know if there is any other reason that could happen, try rerunning all the cells
Where can I better understand how a notebook submission should look like for a competition? Thanks ❤️
Hi everyone, I'm stuck on a part of my code, I need to predict the price of gold in India.
I chose to use the ARIMA model, but for some reason the value is repeated after the ninth index.
I already tried:
- Put the ''Date'' column in the index.
- Keep the date column in the dataframe.
- Turned the ''Price'' column into a list, and it didn't work.
Someone could help me?
PS: Sorry for my bad English, I'm Brazilian, and I don't currently have time to become more fluent
look at other notebook how they submit it, i do that
#❓┊ask-a-question guys I do not have a master degree is it good for me to continue pursuing ML (I am currently 2nd year b.tech) ??
are there any jobs I can do without a masters degree?
Thanks for the answer. I've been looking through notebooks under the "Code" section of the competition, but I don't get how I could know if this is a submission notebook or just some general exploration or something else. Is there any way to know what an actual submission would look like?
most notebooks have a score attached to them, even if that is not the case I have observed most notebooks to be submission notebooks
Thanks for answering. I think my question is really this:
What is the notebook submission supposed to have?
Does it have to load data, train a model, and write results?
Many thanks
in getting started competitions you will observe a sample submission.csv already loaded, that's all you have to submit, nothing else (in the same format)
you don't submit the notebooks, just that csv file having the predictions
But now I'm trying to do the ISIC competition with friends, so I guess I need to submit a notebook so that I can generate predictions on the hidden test set
hey. i uploaded my prediction csv file. how can i know if it is ok or not? how to check?
you predict on the test data in your notebook itself, no need to submit a notebook, as I said
hey everyone, I'm having trouble verifying my account using a phone number, I actually tried two of them but in both I get that "too many requests" orange message, and I've tried more than once in different times of the day but still to no avail, any help would be highly appreciated 🙏
Thanks for the constant answers. I'm still dumbfounded on how it's possible to predict on the test data and generate responses if that data is hidden from me.
Is there a function that loads the test data?
Thanks so much
Hi guys I just submit the first prediction titanic with the tutorial, but I am wondering how it goes from now, I do not think it is over. Right?!? tks
The titanic competition runs forever since it's a tutorial. After making your first submission you should try to improve your score by trying different techniques, then you can move on to other competitions.
The best way to figure this out is to look at what other public notebooks do to make submissions: https://www.kaggle.com/competitions/isic-2024-challenge/code
Identify cancers among skin lesions cropped from 3D total body photographs
Hey hey hey
Ive been trying to apply quantile loss function to my xgboost regression but something weird is happening
The 0.05 quantile seems to be working fine, but the 0.5 is wayyy off mark and the 0.95 is not even trying
(On the picture i made the xgboosts overfit the data just to confirm each quantile was doing its job properly)
And getting a flatline when trying to overfit tells me something is weird
Ok, i think i figured out the problem: the gradient of the quantile loss is always the same value, so if you're doing badly there is no incentive to go up or down, and the xgboost is just stuck at 0
So ill try to modify the quantile loss function so it knows it is getting closer
Now it works well ☺️
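To illustrate the observation above: the pinball (quantile) loss has a gradient that depends only on which side of the target the prediction is on, and its second derivative is zero, which is awkward for a booster that uses both. The sketch below contrasts it with one possible smoothed variant that is quadratic near the target. The smoothing and the `delta` width are assumptions for illustration, not necessarily the change made in this thread.

```python
def pinball_grad_hess(y_true, y_pred, q):
    """Gradient and second derivative of the pinball loss w.r.t. the prediction."""
    grad = -q if y_true > y_pred else 1.0 - q
    return grad, 0.0  # constant gradient, zero curvature


def smoothed_grad_hess(y_true, y_pred, q, delta=1.0):
    """Assumed smoothing: quadratic when |residual| < delta, linear outside."""
    r = y_true - y_pred
    weight = q if r > 0 else 1.0 - q
    if abs(r) < delta:
        # Gradient shrinks as the prediction approaches the target.
        return -weight * r / delta, weight / delta
    return (-weight if r > 0 else weight), 0.0


# Same gradient whether the prediction is off by 0.1 or by 10.
print(pinball_grad_hess(10.0, 0.0, 0.95), pinball_grad_hess(0.1, 0.0, 0.95))

# With smoothing, a closer prediction gets a smaller gradient.
print(smoothed_grad_hess(0.5, 0.0, 0.95), smoothed_grad_hess(0.1, 0.0, 0.95))
```

Functions like these would be wrapped into a custom objective that returns per-sample gradient and hessian arrays; outside the smoothing band the hessian here is still zero, so a small positive floor is often added in practice.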
still waiting for help
Idk looks something like you're accessing the kernel multiple times or something but i think i recognize kaggle interface so it has to do with your connection to their website
Chatgpt works very well with copy paste-ing error messages
Give it a shot
hello everyone, I am planning to go to university, but I can’t decide on the program, can you tell me how important mathematics is in data science and how many thousands of hours it is best to spend on it in order to study it to the required level to work as a data scientist. Thank you very much in advance
Data Science is all about Mathematics
In order to work in data science, is there a minimum of what skills you must have?
Also is freelance data science a thing?
Hi, guys! Do I have to mention on the Kaggle forum that I'm using external data? I remember there was an External Data Thread in every competition but not anymore.
Hi, how can I have more space on Kaggle?
You should check in each competition specifically for the rules that apply to it.
in a project I need to get the following api keys.
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
DEEPGRAM_API_KEY=...
OPENAI_API_KEY=... I have difficulty getting the LIVEKIT_APi keys. how can I create one?
i believe you know they are not free
Aside from openai, the other two are free
Not related to kaggle
https://www.youtube.com/watch?v=gDF_qGzYEYQ&list=PLqXS1b2lRpYTUHPp2MYkgXS7v6_qA-JsF&index=5
guys i was thinking of replicating this project in my laptop, will my 3050 6gb will be enough?
In this simple coding tutorial, you will learn how to make your own Generative AI application with Stable Diffusion, Docker, and Flask. The app will take in a user-provided text prompt and convert it into high-resolution images. (Total size of 2048 x 2048 pixels! 😱)
We will also dive into Docker Init, Diffusers, CUDA, FreeU, EDSR, OpenCV, and D...
can I share my kaggle profile updates in #💬┊general ? if not here .. then where should I post them..? thanks..!
bruh did u do this project
I created an ml model using imagenet v2 on a 12 class trash dataset and since it has achieved a high accuracy, I want to publish it as a notebook. What is, in general, the best format for a kaggle notebook?
i would look at the top notebooks for that dataset on kaggle or of a similar type, but it really depends. some people go heavy on the EDA and others go heavy on modeling with little commentary but great metrics
hey! I am doing this little project to help detect cardiac arrest and cant find agonal breath datasets anywhere... any tips?
Hey Kagglers, I'm currently doing the Guided Tutorial for the Petal to the Metal competition. I understand what the error image is saying but I don't understand the long string of letters and numbers that comes after the "object at".
The image without the error message is where I'm assuming the error is pointing me to, but here's the problem: The error wants me to use the DefaultDistributionStrategy but I need to use the TPUStrategyV2 in order for this guided tutorial to actually be meaningful. I have no idea how to make the distribution strategies the same but all I know is that I need to keep using the TPUStrategyV2. Help would be appreciated 🙏
🙂Does Kaggle offer an internship or pretraining program like Revature for job placement?🙂
You're correct that you need to use TPUStrategyV2 instead of DefaultDistributionStrategy. To use TPUStrategyV2, you can create an instance of it and pass it to the distribute method of your dataset.
PhysioBank or CAD
thank you so much asao. although, i'm not sure how i would implement this. do you have any example code i could use or how exactly i would do this?
hi! thanks for the quick reply. i checked physiobank but i'm kinda confused about what is on the website… it showed one result, but when downloaded it was a text file of just random text… also not sure what CAD is
Guys, your views on this book? For a beginner? What should be the prerequisites?
the only drawback i felt for this book is it uses tensorflow which is pain in the ass
try this free pdf version of the book, and see if it fits your needs
https://udlbook.github.io/udlbook/
Thanks buddy.
"Notebook Threw Exception" Error
Hi all, I’m seeing a "Notebook threw exception" error after submitting my notebook for the competition. It runs fine locally. Has anyone encountered and resolved this issue? Any tips would be appreciated!
Thanks,
anybody knows the reason for this ? I'm doing a simple project from a book I'm reading and this keeps happening which makes running any cell take way more time than it should
2 things:
Do you recommend using Kaggle learn
Is it better do the notebook first and use the learn page for reference or just read the page and then do the notebook
Hey, I learnt machine learning via Kaggle's free courses, then went for this book. It goes into a lot of depth on each topic.
Prerequisites: Pandas, NumPy, Matplotlib. You will struggle with ML as a whole if you do not master these 3 topics.
Just master Pandas and Matplotlib, then go for it.
Hey, I have a few questions regarding hyperparameter tuning for kaggle contest models:
- Where do you tune your hyperparameters? Locally or on kaggle notebooks (or maybe google colab?)?
- How long does it generally take? How to decrease tuning time?
- What's the best library for hyperparameter tuning? Optuna?
Hello everyone, hope you are well.
Is there a way to import a public github repository to my Kaggle notebook environment?
Hi, does anyone know how to load in a pretrained image classification model in kaggle ?
Hello everyone,
I need help with this project if anyone can help it will be very helpful.
Here is the Link - https://www.kaggle.com/discussions/questions-and-answers/522860
Help Needed: Improving Time Series Prediction Accuracy with Ensemble Models.
Hey all, this is a long shot, but is anyone willing to have a discussion on ethics in data science? This can range anywhere from privacy to bias. And just full disclosure, I'm genuinely interested in ethics, but this also happens to be part of a data ethics course I'm taking as part of my grad program. Any help or direction is appreciated! I've reached out on many outlets and am not getting many hits, unfortunately.
Can you tell me more about data ethics?
This is a new topic to me
I'm not the best person as I'm still taking the class hehe 😅 but from what I know the name pretty much explains: any ethical issues that may arise within the field of data science would count. The examples I mentioned seem to be pretty wide-ranging. The first one has to do with not infringing upon people's privacy, with some examples being health data (HIPAA), Amazon Echo potentially listening to private conversations, Apple accessing biometric data such as face scans, and etc.
In case you didn't know, Kaggle also has a short course with some reading and practical exercises on AI Ethics: https://www.kaggle.com/learn/intro-to-ai-ethics
Explore practical tools to guide the moral design of AI systems.
(This is just a subset of the broader topic of ethics in data science)
Thank you Myles. I should have mentioned that I'm looking to interview a data science professional about ethical dilemmas they've faced in their work as part of a homework assignment. I know this is a bit of shameless promotion on my part, but I've been asking around and I'm really unsure of where else to find interview candidates 😓
Hi everyone! I want to make a facial emotion expression detection model with deep learning or machine learning. I have made a machine learning model that reaches 60% accuracy. The model I tried was XGBoost. If you know a better deep learning model, can you suggest one? And my main question is: how can I extract features from faces? For example, when you are angry your eyebrows get closer together or your mouth moves up a bit.
Has anyone managed to run Co-DETR (https://github.com/Sense-X/Co-DETR) or DiffusionVID (https://github.com/sdroh1027/DiffusionVID) on colab? They are object detection models. I haven't been able to run them and can't find any way to do so. I wanted to see if anyone in the world has managed it lol
Need help answering this question please: https://www.quora.com/unanswered/I-need-help-evaluating-my-options-of-a-masters-degree-in-data-science-artificial-intelligence
Is there a way for 2 people to edit a Kaggle notebook live? Something similar to Google Colab
I invited someone to edit, but he doesn't see my changes unless i hit save version, which isn't 100% live
You can use a Jupyter notebook in PyCharm and invite the other person, you will see everything live and both can execute code
I finally used LiveShare in Vscode. But what you told me still works for me, ty
I've learned we use pre-trained models when making convolution networks in Tensorflow, is this the same in PyTorch
And how long does it take to train a CNN in Torch
Hello. I am working on a problem where I need to change the labels (of the training data) in my .png files from the original 0 and 255 to 0 and 1. I am trying to use an ML model that expects labels in 0/1 form. I have tried many ways, such as thresholding with numpy, but I couldn't figure it out. Is there a specific way to do this?
I am not aware of any specific function but you can divide all the labels by 255 and get the new values which will be in the range 0 to 1
Thanks for the reply. I have tried this before and it makes my image completely black, so my labels (which are white) are not visible. I have noticed that if I change the labels to numbers like 50, 100 etc. the white labels get a bit less visible but I can still see them. But when I reach 10 and below it becomes completely dark.
I'm working on a retinal vessel segmentation problem and this is one of my eye images with its labels
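A note on the 0/255 to 0/1 question above: thresholding (or dividing by 255) does produce valid 0/1 labels. The mask only looks black because an image viewer renders the value 1 as almost black on a 0 to 255 scale. A minimal sketch with NumPy, assuming the mask is already loaded as a uint8 array; the small array below is a made-up stand-in for a real label image:

```python
import numpy as np

# Made-up 3x3 stand-in for a label mask loaded from a .png
# (0 = background, 255 = vessel).
mask_255 = np.array([[0, 255, 0],
                     [255, 255, 0],
                     [0, 0, 255]], dtype=np.uint8)

# Threshold rather than divide, so the result stays an integer array of 0s and 1s.
mask_01 = (mask_255 > 127).astype(np.uint8)

# For viewing only: scale back up, otherwise the value 1 is drawn as near-black.
mask_view = mask_01 * 255

print(np.unique(mask_01))    # label values passed to the model
print(np.unique(mask_view))  # pixel values used for display
```

The idea is to keep two versions: `mask_01` for training and `mask_view` only for plotting.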
Hey, I have an issue here. I tried to use a dictionary, but it says there is a KeyError even though I actually defined " 6" in my dictionary. Can anyone help?
How do machine learning models handle NaN values in target prediction in the case of decision trees? I don't understand. Imagine the root of a tree in a forest doesn't have a value.
Hey, I just installed CUDA. I think CUDA makes your neural network use the GPU's power while training. My GPU is being utilized (20-30%), but it also shows that my CPU is fully used at 100%. Is it supposed to be like that, or have I done something wrong while installing CUDA?
I am running in VS Code btw
Hey Kagglers, I was looking at the list of competitions on Kaggle and I was wondering how to bridge the gap between only taking Kaggle's micro courses and actually entering real competitions with cash prizes and knowing how to solve the problem of competitions.
A good progression might look like:
- Courses
- Titanic Competition + Other Beginner Competitions
- Playground series competition
- Full prize competition
Each competition will require learning new skills and trying new things. Many people jump right to prize competitions and learn that way, there is no one answer, it depends on your skills, learning style, and comfort levels.
Guys, I'm trying to fit my model but my CPU is always at 100%. How can I fix it?
You can try changing your accelerator to a P100 or T4 GPU, depending on your work.
Go to settings --> accelerator --> choose accelerator
I think it'll work
Tks
Is anyone willing to take a few newbies under their wing and teach us and work with us in a group of 4?
can anyone guide me on how to find questions or prompts that can help with analysis in kaggle datasets?
hey! how do I access the output files after a commit?
navigate to where your csv file is saved
i use the everything app
Q: can i use more submissions?
@thick terrace at kaggle? okay, thanks
i was talking about your pc, but if you run the notebook on kaggle, sure, the directory will have them
Right, I found it. It turns out my previous attempt failed before it saved the model results.
The current one worked.
If I commit 2 (or more) notebooks at the same time, do they share the same cpu?
With regards to Kaggle competitions, I am a little confused about the given test set. Would it be bad to train the models on the training set, then test each model on the test set, submit it, and choose the submission with the lowest test error? I understand we don't want to overfit the test data, but the given test data isn't the "true" test data set anyway - the true one is hidden. So are we free to use it?
hey, i wanted this dataset, DFL - Bundesliga Data Shootout, but it says that it was for the competition only. is there some way to access it? i want to make a computer vision project
Class labels are not available for test data. One can try pseudo-labeling but that increases the likelihood of overfitting.
But doesn't kaggle score the submission (as the scores are shown on the leaderboard). So aren't the class labels used for the scoring?
Hi everyone. I just completed the Titanic competition and am ready to make the submission. However, the help tutorial doesn't seem to match the current UI? Is there a way to actually submit directly from my notebook anymore?
Hi everyone, I have a dumb question.
The isic-2024 challenge requires internet to be off, right? Then if I want to install some libraries, like 'pip install <library>', that won't be possible. What could be the alternative?
What is the accuracy of your model?
Hey there! I'm looking for EDA books, so far I found one decent, albeit old one. Could you recommend newer books in the topic?
Exploratory data analysis by Tukey, John W. (John Wilder), 1977
there are some medium articles, but I'm looking for indepth why, how, what kind of books
Guys how do you use hugging face's models in a local environment
Class labels are available to Kaggle, but not to us. We only see a score on a fraction of test data.
hey guys i have a question about a dnn implementation
I have the following backward function:
`
import numpy as np

def backward(layer, dA_prev):
    # Calculate dZ by calling the activation_backwards() method from your Layer class
    # and pass to it dA_prev
    dZ = layer.activation_backwards(dA_prev)
    m = dA_prev.shape[1]
    # Calculate dW and db using the input to this layer (ie the activation of the previous layer).
    layer.dW = 1/m * np.dot(dZ, layer.input.T)
    layer.db = 1/m * np.sum(dZ, axis=1, keepdims=True)
    layer.db = np.squeeze(layer.db)
    # Compute gradient for the previous layer
    dA_prev = np.dot(layer.weights.T, dZ)
    return dA_prev
`
but I cant seem to pass the doctests
im wondering if anyone knows whats wrong. if u need extra information in order to get a gauge of the issue, pls ask.
Thanks!
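Without the doctests it is hard to say what fails, but array shapes are one thing worth checking in a function like this. The sketch below is only an illustration: `DenseLayer` is a hypothetical stand-in for the poster's Layer class (with an identity activation), not their actual code. It prints the shapes of `dW`, `db` and the returned gradient, and shows how `np.squeeze` turns an `(n, 1)` bias gradient into `(n,)`.

```python
import numpy as np

class DenseLayer:
    """Hypothetical stand-in for the Layer class, with an identity activation."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(n_out, n_in))
        self.input = None

    def forward(self, a_prev):
        self.input = a_prev              # cache the input for the backward pass
        return self.weights @ a_prev

    def activation_backwards(self, dA):
        return dA                        # identity activation: dZ equals dA

def backward(layer, dA):
    dZ = layer.activation_backwards(dA)
    m = dA.shape[1]                      # number of examples (columns)
    layer.dW = np.dot(dZ, layer.input.T) / m
    layer.db = np.sum(dZ, axis=1, keepdims=True) / m
    return np.dot(layer.weights.T, dZ)

layer = DenseLayer(n_in=3, n_out=2)
layer.forward(np.ones((3, 5)))           # 3 features, 5 examples
dA_prev = backward(layer, np.ones((2, 5)))

print(layer.dW.shape, layer.db.shape, dA_prev.shape)
print(np.squeeze(layer.db).shape)        # squeeze drops the trailing axis
```

If the doctests compare `db` against a column vector, the squeezed shape could be one source of a mismatch, but that is a guess rather than a diagnosis.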
Hi, everybody.
I have a question
I need to extract the abstract of papers not using GPT4, I have to rely on local resource.
from py_pdf_parser.loaders import load_file
from py_pdf_parser.components import ElementOrdering

file_path = 'JPM-2022-Harvey-25-46.pdf'
document = load_file(
    file_path, element_ordering=ElementOrdering.RIGHT_TO_LEFT_TOP_TO_BOTTOM
)
So I parsed the PDF using py_pdf_parser, and I'm going to merge the pieces until I obtain the complete abstract.
Now I'm trying to use embedding models for this, but that doesn't work well.
If somebody has a solution for this, please help me.
Thanks!
In the case that I have to use the LLM models, the size should be under 2GB.
Hi all,
Does anyone have information about whether DEFCON will launch a CTF competition on Kaggle this year?
Hi, everybody.
In the implementation of RAG, could you tell me the challenges and solutions to that?
And if you provide the references, I'm very thankful for that.
I also read papers that use a BERT model to retrieve the necessary data.
I want to know whether that is still useful for RAG now. I think ChatGPT4 is perfect for this task.
So I want to know about the usage of LLMs and ML models in the retrieval process.
Is there any benefit to multiple submissions to a competition? Isn't there a danger of overfitting if you are going to choose your best performing submission on the leader board? Or do people do it as somehow an approximation of generalization error?
hi guys, does anyone have any experience with roboflow models? I'm doing an ML project for the first time and I was instructed to make my dataset and train my model there, but I'm lost on how to proceed further. if anyone has any experience, please lmk so I can consult you. thank you in advance
Hi Everyone, what would be the best data science learning path to get a job as Junior Data Scientist Role ?
There are always benefits to multiple submissions if one knows how to use the information properly. There is no solution that fits all scenarios, but in many cases it is possible to figure out how well public leaderboard (LB) scores correlate with hidden scores. If they do, then one can trust the LB. If not, it is necessary to develop a rigorous cross-validation (CV) scheme and stick to that. Come to think of it, it never hurts to have a good CV scheme, but sometimes we can trust the outside information as well.
If there was a simple answer to that question we would have millions of JDS people out there. There is no single path, nor a best path, to anything in life. For some people learning DS will come as part of their regular work. Others will get a CS degree or take online courses. Yet others will jump into Kaggle competitions and learn DS by copying what other people are doing.
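To make the cross-validation idea above concrete, here is a minimal sketch of a k-fold split written with the standard library only. The function name `k_fold_indices` and its parameters are made up for illustration; in practice most people would probably reach for a library splitter instead.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # fixed seed keeps folds reproducible
    folds = [indices[i::k] for i in range(k)]    # k roughly equal folds
    for i in range(k):
        valid_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, valid_idx

splits = list(k_fold_indices(10, k=5))
for fold_number, (train_idx, valid_idx) in enumerate(splits):
    print(fold_number, sorted(valid_idx))
```

Averaging a metric over the validation folds gives a local score that can then be compared with the public LB score across several submissions.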
hi, i've been working on playground series churn dataset classification
i have done basic data cleaning and OrdinalEncoding
using XGBclassifier with 150 estimators gives me an accuracy of around 75
how can i increase the accuracy of the model
i have also tried using PyTorch
it shows a train accuracy of 78%
but when i see the actual output its either 0 or 1 for all the predicted values
how do i fix this
Anyone familiar with NoBackendError in librosa ? The person who used code seemed to be fine but when I used exactly the same code I got this
Not enough detail to know for certain, but most likely you need to use .predict_proba instead of .predict
Radhe Radhe buddies, can anyone suggest some projects, sources, or anything else to upskill myself further in data science, analytics and machine learning, plus some content to add to my resume? Thank you
I have a question regarding EDA. Should we split the data into training and test sets and only do EDA on the training set? I've seen some articles say that this can prevent overfitting and data leakage.
Just a thought. Does anybody feel that AI (i.e. ChatGPT, etc.) is completely overvalued... pure smoke?
It feels like there is FOMO, but when you try to do some stuff it's not so great
Some functionality is still way behind humans, but some things AI can do faster and better. So no pure smoke, but definitely some overhyped a bit. It is still early days.
This is a good article that might answer your question https://nicholas.carlini.com/writing/2024/how-i-use-ai.html#intro
I don't think that AI models (by which I mean: large language models) are over-hyped. In this post I will list 50 ways I've used them.
Hi !
I was wondering if you could win medals even if the competition has already ended and you beat the bronze score ?
Most competitions with a close date won't reconsider if someone gets a higher score later (imagine how it would feel to have your bronze medal taken away months later!) but we have some competitions that are evergreen. Of course, we're always adding new competitions, and you get the satisfaction of seeing your solution as one of the best!
How can I get badges from Kaggle?
Can I ask here some feedback about PC hardware specifications?
hi, a kaggle grandmaster, nischay dhankar, an alumnus of my college, gave a session on kaggle. i don't have any background in coding. I am very fascinated by medical imaging. what could be the path?
Hi, I'm new to machine learning in general and I would like to ask where do I start? Like which specific math should I study first?
I want to be able to understand what is actually happening behind the scenes every time I train a model. Maybe with this I can make it perform better
Has anyone here got 100% accuracy on the Titanic unseen data?
I always wonder how people on Kaggle leaderboards get 100% accuracy when I am struggling with 86%
Is there a valuable course or tutorial for someone just starting out?
I am trying Coursera
Hi, everybody. I have a question.
I want to develop a method for designing the neural network architecture for a given real problem.
Is this possible?
I mean, can we derive a particular network architecture based on neuroscience?
Please give me an overview of this and of the methods.
Where I can find the proper references?
Is Kaggle course available in Coursera?
Oh mb i thought you were talking about Data Science
ok.Thanks
Hi Everybody,
If you are interested in conducting academic research in the fields of generative AI, NLP, and XAI, Please do not hesitate to contact me. 🙏
Hi, I've been trying for a couple of days to submit a notebook to a competition and I cannot do it because it says my notebook is using a non-versioned dataset. I have tried multiple times to pin the dataset to a version, but after I open that window it simply doesn't show anything, it just says "Loading…", although I've left it some time to load. My teammate has the same problem. I have also tried to download the dataset through the Kaggle API but it fails.
Any suggestions that I could try?
Hey Malina, I passed this to an engineer on our team to look at.
Hi Malina, sorry for the confusing UX. We need to do a better job of surfacing the reason for the failure. In this case, the problem is that the dataset you're trying to pin a specific version for (https://www.kaggle.com/datasets/gordonyip/binned-dataset-v3) is an "Unversioned" dataset, meaning that the creator of that dataset has determined that they only want the most recent version to be accessible to others. This may also be why the API is failing (though I'm not sure which command you're running/is failing). So the options at this point would be to:
- Create a discussion on the dataset requesting that the creator update the settings for the dataset to allow access to all versions.
- If the current version of the dataset is what you want to pin, download the source and re-upload as a fixed dataset under your control. You could do this via the UI or if you prefer the API, I can help you figure out why the API isn't working--I just need to know what command you're running.
Guys, I am a beginner on Kaggle and this might be a stupid question, but I pressed File, went to Upload input, uploaded a dataset and named it taxis-dataset1. Then when I enter it into the code it says file not found. Why is this?
add the file extension, .csv or .xlsx
I recommend using this Copy file path feature in the sidebar. Click that, and then paste the value into the read_csv call (as a string). It should be something like /kaggle/input/taxis-dataset1/path/to/your_file.csv.
thanks guys
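For "file not found" problems like the one above, it can help to print every file the notebook can actually see and copy the exact path from that list. A minimal sketch, assuming the usual `/kaggle/input` mount point mentioned above; the helper name `list_input_files` is made up:

```python
import os

def list_input_files(root="/kaggle/input"):
    """Return the full path of every file under root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)

# Prints nothing when run outside Kaggle, because the folder does not exist there.
for path in list_input_files():
    print(path)
```

Whatever path is printed, including the file extension, is the string to pass to `read_csv`.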
Hello Everyone , I want to Analyse Log file with ML
I have some timestamped log files from controllers, drivers, etc. of a device. I have seen some error codes in the log files whose causes are not known. I would like to analyze these log files and identify patterns, such as whether errors occur sequentially or if one error depends on another. Errors could also be triggered by other errors, potentially from a day ago or just a second ago. What would be the best and simplest approach to this? If there are solutions to similar problems, please let me know.
Hello
This is a very basic question which in my defence will probably be easy to answer, hehe.
I'm a soon to be second year AI and Datasci student engaged in the RSNA 2024 Lumbar Spine Degenerative Classification purely for the learning curves.
https://www.kaggle.com/competitions/rsna-2024-lumbar-spine-degenerative-classification
A peer of mine, perhaps correctly, says that we have to split the images into training, test and validation classifications. He wants to do this using code that randomly selects images and puts them into any one of the 3 categories.
However, the competition already provides testing and training datasets and, if I remember correctly (I couldn't find the documentation that details it), a final unseen set of images that the classification is run on to determine the effectiveness of the model.
Also nowhere in the Efficientnet sample can I see anything that does that classification.
I think I am right that, in terms of testing and validation, the images are already classified, and it's only through a dictionary that some of the images need the conditions and planes added to them.
Thanks for any and all help, any clarification will help a great deal.
Classify lumbar spine degenerative conditions
Does anyone work in or have experience with fraud detection? I'm super interested in the topic and have a lot of questions
Guys where is the auto complete settings in kaggle notebook? I need to complete step by step code but if I press tab all codes written at the same time
Yes, I've had that happen to me. how can I help u?
Hi - how to create a Team for challenges and add team members? It seems the Teams link for challenges (titanic and house price) have expired: https://www.kaggle.com/c/titanic/team https://www.kaggle.com/c/house-prices-advanced-regression-techniques/team
Start here! Predict survival on the Titanic and get familiar with ML basics
Predict sales prices and practice feature engineering, RFs, and gradient boosting
Hello everyone, I would like suggestions on how to solve this type of dataset:https://www.kaggle.com/datasets/ashisparida/amazon-ml-challenge-2023
This is previous year's amazon ml challenge dataset. I am interested to know how product_length is predicted using text data
this is the cleaned and merged version of the columns of title bullet_points and description
I am unable to understand what preprocessing or feature engineering to do
Can anyone suggest something to start with?
I want to ask if I need to reinstall the dependencies every time I restart?
I'm trying to do the Intermediate ML course, and I'm dealing with this error in the missing values exercise. Any ideas?
Restart what? I found that for course exercises at least, it seems to be you have to restart setup, yes.
hi, this might be a stupid question but do you need to become a data analyst first before becoming a data scientist?
https://www.data-mania.com/blog/data-science-vs-data-analytics-which-to-learn-first/#:~:text=By starting with data analytics,in studying data science first.
https://www.reddit.com/r/datascience/comments/11abupo/were_you_a_data_analyst_before_becoming_a_data/
https://www.youtube.com/watch?v=kr59DGtWDTs
You could read into it yourself. There's a couple of things online about that.
Is data analyst a pre-requisite to become a data scientist? In this video, we are discussing common skillset between the two roles and what makes them different. What do you think? Share in comments if you think data scientist should first become data scientist.
My Self-Taught Data Science Journey: https://youtu.be/34r9OwjysDM
How I Would Lear...
You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metric would give you the most confidence in your model?
I'm torn between Recall and F1 Score. Which is right in this scenario? I lean towards recall: F1 gives recall and precision equal weight, and since my positive class is only 4% of the data and catching the positives is the priority, recall makes more sense here. I would love any help, thanks!
Experienced data scientists, do you guys always write all your Python code from memory?
Or do you use online tools or AI like we newbies do?
F1 Score is the harmonic mean of Recall and Precision. Recall measures the share of actual positives that the model catches, which matters when positives are rare, as seems to be your case. So, from that perspective, recall seems the sensible option. However, I suggest reporting both metrics, since together they describe your model's strengths and weaknesses and give you more to diagnose with.
You're right. As long as catching the positives is your priority, you should definitely use recall.
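The trade-off discussed above can be checked with a few lines of arithmetic. The sketch below uses made-up counts for 1000 documents, 40 of which (4%) contain the logo, and compares two degenerate classifiers. The function name and the numbers are illustrative assumptions, not results from any real model.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Classifier A never predicts "logo": high accuracy, but it catches nothing.
always_negative = classification_metrics(tp=0, fp=0, fn=40, tn=960)

# Classifier B flags every document: perfect recall, but almost all flags are wrong.
flag_everything = classification_metrics(tp=40, fp=960, fn=0, tn=0)

print(always_negative)
print(flag_everything)
```

Accuracy rewards the first classifier and recall alone rewards the second, which is one argument for looking at F1 or at precision and recall side by side.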
How do teams typically collaborate when working towards a Kaggle competition? GitHub, shared Kaggle notebooks, or something else?
I am taking a machine learning course that will be about 70% theory for exams. Does Kaggle have anything that I can practice regarding theory problems, or is there any resource that would make sense?
Is anybody working on a project like a comprehensive "Social Listening" tool, or does anyone have experience with such technology?
Hello Everyone,
I've used the MLC-LLM chat APK from the link below:
https://blog.mlc.ai/2024/06/07/universal-LLM-deployment-engine-with-ML-compilation
-
I've used the MLCChat app, and it shows only the models in the screenshot. How can I access my own models, which are on my device?
-
Is it possible to use my local models in this MLCChat app?
-
How can I do this with Flutter?
Please help me out with this!!!
No module named 'kaggle_evaluation'
This is my first Kaggle competition. How can I import files into the notebook to be able to use such import statements?
Hello everyone, I am new to Kaggle. How can I download csv files on kaggle ?
for any kaggle competition we need to have internet disable, now i need to pip install some dependencies or updated library, how do i permanently store them in notebook, so can access those libraries in internet access disabled mode?
Lucky for you we just launched a new feature that solves this problem very neatly: https://www.kaggle.com/discussions/product-feedback/532336
[Feature Launch] Introducing Package Manager.
Hey I'm working on a dataset of Forest fires, model has to predict the probability of fire in the forest after getting some inputs from users. I've done most of the part but I'm getting a high MAE. Let me know if anyone can help!
can anyone help me? whenever i submit, it shows an inference error and gets rejected
Hello everyone
Please, I have a problem.
I fine-tuned Gemma 2 in a Kaggle notebook, and now I would like to save the fine-tuned model and share it on Kaggle Models under my account. I don't know how to do that.
Please, can somebody help me 🤲
Hi
Can someone explain how I know my model found the global minimum when tuning the weights? I just learned about gradient descent, and it's looking for the global minimum. But there are also local minima... How do I know gradient descent found the global minimum rather than a local one? Is there a way to visualise it, or to prove that my model is using the best weights possible?
did kaggle change its UI? the notebook is now confusing with big words, can someone pls help?
lol try ctrl + scroll down?
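One way to see the local versus global minimum issue is a small one-dimensional experiment. The sketch below runs plain gradient descent on a made-up function with two valleys, f(x) = x^4 - 3x^2 + x, from two starting points. The function, learning rate and step count are arbitrary choices for illustration.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Same algorithm, two starting points, two different answers.
results = {x0: gradient_descent(x0) for x0 in (-2.0, 2.0)}
for x0, x_final in results.items():
    print(f"start {x0:+.1f} -> x = {x_final:.4f}, f(x) = {f(x_final):.4f}")
```

The run that starts on the right settles in the shallower valley, and nothing in the update rule signals that a deeper one exists. For real networks there is generally no proof of global optimality, so a common workaround is to compare several random restarts on a validation set.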
Hi. I am an absolute beginner on Kaggle (just started). I have a 6th-gen Core i3 laptop with 4GB RAM running Linux. Would that be enough to continue with Kaggle, or do I need a graphics card and more RAM to operate minimally? I am on a budget so I need some suggestions
Hello everyone, I’m looking for suggestions on how to become job-ready for a Machine Learning Engineer position. I’ve completed my certification and worked on various projects, but my resume hasn’t been getting selected. If anyone could share a strong resume example, I would greatly appreciate it.
Although I have over four years of experience in Operations, I am eager to transition into Data Science. I have been searching for a job since December last year, and I would welcome any ideas on how to land a position and kickstart my career.
Additionally, I’m interested in projects I can build to enhance my portfolio and skills I should focus on to become job-ready as soon as possible. So far, my projects have been fairly generic, and I want to stand out.
Those stats are not really great for data science on your local machine. If you want to do serious experiments on your local machine, a GPU + more RAM + a decent CPU (depends on how much data preprocessing you need) is a must. I would say, your current laptop will not suffice for more than basic exploratory data analysis.
But luckily, Kaggle does provide everyone with 30h/week of GPU and TPU compute and there are more cloud providers with free/affordable GPU quota (e.g. GoogleColab). Thus, I would not rush with buying a new setup, but instead try what you can achieve with the provided free cloud resources.
hi all, im pretty new to using kaggle, but i have a few notebooks working fine but my current notebook is giving me an issue when trying to !git clone:
Cloning into 'testing'...
Username for 'https://github.com':
and it just hangs here
ok solved this - mistyped the url
#help I'm a beginner in ML. So how can I learn ML? I know I should start with preprocessing of data, but I don't have resources available for that. Can you guys share a resource link?
Hello 👋 I'm training an autoencoder on signal data using LSTMs for anomaly detection.
For normalization I'm using sklearn's StandardScaler. For .fit(), should I only pass in the cleaned data without any deviating signals, or the entire data?
Why did you clean your data?
I guess to train the model only on clean data, without any noise. So in my opinion, normalize only the cleaned data (the data you will use in training & testing)
Yes that's what I've been doing. But on deployment after scaling I'm getting large values which are ruining the model performance
Hmm, I have never worked on models in deployment, so maybe I am not the right one to answer. However, if I understood you correctly, the predictions of your model should follow the scale of your labels. Therefore, the outputs/predictions should not be in the range [-3, 3] (the scale of the standard scaler)
Regarding the performance, sorry man, I don't have any ideas and don't know how to help. In fact, it happened to me once that I got low performance after training and testing compared with my test scores. I assumed this was because my model does not generalize well (I still haven't fixed the issue)
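A minimal sketch of the scaler question above, assuming scikit-learn is available: fit the scaler once on the cleaned training signals, then reuse that same fitted scaler at deployment instead of refitting. The synthetic arrays and variable names are made up for illustration and are not the poster's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: clean signals for training, plus deployment samples
# that include one deviating (anomalous) row at the end.
clean_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
deploy = np.vstack([rng.normal(5.0, 2.0, size=(5, 3)), [[50.0, 50.0, 50.0]]])

# Fit only on the clean training data ...
scaler = StandardScaler().fit(clean_train)
train_scaled = scaler.transform(clean_train)

# ... and reuse the same fitted scaler at deployment (do not refit).
deploy_scaled = scaler.transform(deploy)

print(train_scaled.mean(axis=0).round(3))          # per-feature mean of the scaled training data
print(np.abs(deploy_scaled).max(axis=1).round(1))  # largest scaled value in each deployment row
```

With this setup, large scaled values at deployment are what you would expect for samples that deviate from the clean training distribution, so they may be a signal rather than a scaling bug.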
In my humble opinion, pre-processing is not an ML skill. It is in fact required to prepare data for ML models. Therefore, you don't need resources/courses on pre-processing. You just need to check the state of your raw data, spot the "noise", and work out the pre-processing you need to do. For instance, maybe you need to convert "15 years old" to 15 for the age feature.
This kind of pre-processing requires only basic programming skills.
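As a small illustration of the "15 years old" example, here is one possible way to do it with pandas. The DataFrame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical raw column where age is stored as free text.
raw = pd.DataFrame({"age": ["15 years old", "42 years old", "7 years old", None]})

# Pull out the digits, convert them to numbers, and keep missing values
# as <NA> by using pandas' nullable integer dtype.
digits = raw["age"].str.extract(r"(\d+)", expand=False)
raw["age_years"] = pd.to_numeric(digits).astype("Int64")

print(raw)
```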
hello, can anyone please help me with an idea for an AI-related final-year bachelor's IT project, or even a walk-through?
Hi, I wonder how much time I should spend on tuning the model, like finding the best parameters? I have spent some time doing feature engineering, but I'm not sure whether I should stick with my current model or keep trying new models / ensemble methods. Many thanks
who is good with SQL? I have a little problem, i have a table called companies but it is not connecting to my database
Hey,I want to learn machine learning using python. I have completed the beginner stage.Is there anyone who can guide me?
Hey everyone! I have a competition coming up, and the first round is focused on data structures. I'm honestly starting from scratch and not sure where to begin. I'd really appreciate any suggestions for lectures or resources to help me get started. By the way, I prefer Python, but if you think C++ or another language would be better for this, please let me know!
Hi, Who can help with the task of learning how to determine the value of a diamond?
Hello Kaggle community, I have a question: when I save a version of my notebook that has output files, and then cancel it to save some hours,
if I go back to this version and press Edit, how do I start this notebook with those output files 🤔
Info i'm seeing :
Clicking Output, the message says:
Notebook canceled
View the status under the logs tab
On Logs tab Environment
i can click Latest Container Image and there is the output size 😮
how i start this notebook with that data ?
Thanks !
Hi, I have a question, who has worked in ‘Diamonds Price Prediction’, I need to get MAE < 200, but when I derived the correlation matrix, I saw that Price has many args with minuses, please help
Here's a fun one: can one learn to use and build ML models without having linear algebra and calculus under their belt?
Hello, @tardy lodge the requirement for those seeking for kaggle mentorship is high sort of
hi @real patio i'm not someone who works on our mentorship program, but @wind silo is the right person to talk to
Thanks Ma @tardy lodge
Does anyone know of a data source for the US CIA World Factbook? https://www.cia.gov/the-world-factbook/
I'm in the same situation. The answer is yes. However, be aware that the learning curve is even steeper without it because you will have to learn it at some point to understand how you got your results.
Thanks @tardy lodge.
Hi @real patio Happy to answer any questions you may have. I've sent you a direct message as well if you would like to chat more.
I'm getting submission file not found error even tho everything seems alright could somebody pls help
kindly help me with this somebody
Hello! I’m an undergraduate student researching image reconstruction using a diffusion model. I know this is not Kaggle-related, but I'm encountering an issue with my diffusion model and wanted to seek some advice.
In my research, I’m trying to reconstruct one type of brain image from another using a diffusion model. Before using the diffusion model (when I directly used a U-Net for reconstruction), the training worked well. However, when I switched to predicting noise instead of the image itself using the diffusion model and then performing the denoising process, the training doesn't seem to work. The training loss decreases, but metrics like PSNR and SSIM (which evaluate how well the image is reconstructed) do not improve at all or even degrade.
Is this about my dataset being too small (I have 800 images in the training set)? I set the noising steps to 1000.
Has anyone experienced something similar when working with diffusion models? Any advice would be greatly appreciated. Please help..
Below is the code I use for training.
t = diffusion.sample_timesteps(n).to(config.device)
x_t, noise = diffusion.noise_images(latent_target_image,t)
predicted_noise = Unet(x_t, conditioning_3d_image, t, diag)
loss = L2(predicted_noise, noise)
Hello everyone, can anyone tell me how exactly I can start and move forward with learning AI and ML?
what interests you about AI and ML? what educational background do you have? There's lots of different ways to go about it and there are lots of guides already written out there. If you're here it means you're on your way so check out Kaggle Learn and Kaggle competitions
I am a full stack developer and I am exploring AI and ML, currently learning Python libraries such as numpy, pandas, matlab, scikit etc. I have a basic theoretical understanding of how ML models work, such as supervised, unsupervised and reinforcement learning.
Kaggle is somewhere we can compete by building or improving models with provided datasets.
Correct me if I am wrong, because I am new to Kaggle
But the main question is: what are the correct prerequisites I should learn and understand so that I can participate?
tech like chatgpt, githubcopilot and text to video generation really amaze me
Go look at Kaggle Learn "Intermediate Machine Learning" you will understand how to submit to competitions in two quick lessons
You will be able to understand the code very quickly and the general format for a basic ML model. Then iterate
If you're curious about chatGPT and other LLMs, then look up videos/tutorials on how to build it from scratch
It's all software dev: sometimes you need to look something up to understand it, sometimes you need to actually go study and read up a lot more to understand it, but in the end it's doing projects and finding out what you need to learn to progress with your projects
What advice would you give someone new to software dev on how to begin? 😄
For software dev I would ask them to build some learning projects like a blog app, chat app, or video streaming app. In software dev the most important things are reliability, scalability, availability and security. And for senior devs, go through system design.
And try to avoid tutorial hell and build apps, no matter how small or whether they have any purpose.
All the AI and ML tech is used in the form of software, so software dev is a must-have skill to build AI/ML apps
Same goes for ai and ml: build some learning projects for ai and ml. Avoid tutorial hell. Build projects no matter how small. You got this 😄
yes sir thank you and can you suggest some very basic ai ml projects I can build and learn on the go.
Look up learning projects online to get a list, pick one that sounds the most interesting. Or just do kaggle competitions, as they are also considered learning projects. Projects in "Getting Started" and "Playground" categories are good place to start
Hello guys,
Do you know of any interesting datasets to refresh my pandas skills? I am mostly a beginner who has done two ML projects.
Hello, I am working on log data analysis. Most of the work on log parsing that I have found so far uses more or less similar methods, with very heavy ML/DL algorithms or heuristics.
I wanted to know more about log analysis using primitive methods like a PCFG parser or some unsupervised parser which takes the whole data into account.
My ultimate goal is to generate good quality templates. Please point me to any resources if you know.
Before pivoting, one should always consider - why you want to ?
Do you know your WHYs?
I am not pivoting I am just learning new tech
say it as exploring
cool then, you can start with some problem statement and then traverse back from end goal to data
Example -
step1 - you want to identify if a given pic is cat or dog
step2 - you get to know that it is done by some model ( which is an artifact )
step3 - how was that model built
step4 - what does training mean and what is the role of data here
step5 - get the data
after this go forward again with the help of some already existing guided work- like kaggle notebooks
learning from scratch helps - but since you are exploring - this approach is more practical and help you even more in hands-on
thank you for guiding steps
for building models
i should understand ml learning models right like k means, decision tree, etc
not necessarily - there are packages like scikitlearn, xgboost, LGBM for classical models
you just need to read the documentation and start implementing - having algorithmic knowledge is definitely a plus - but knowing how to use them is an even bigger plus to start with
so you can start using them from documentations/guided code books etc
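To make the "start from the documentation" advice concrete, here is a minimal sketch using scikit-learn's built-in iris dataset and a decision tree. The dataset and hyperparameters are arbitrary choices for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset and hold out a quarter of it for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a classical model without implementing the algorithm by hand.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping `DecisionTreeClassifier` for another scikit-learn estimator keeps the same `fit` / `predict` pattern, which is why knowing how to use the library gets you started quickly.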
Hello everyone 🤗
Instead of showing "Copy path" it is showing this in my kaggle notebook, can someone help ?
Hello !
Hi, what is the complexity of the project you are expecting?
Hi. These days almost all ML/DL/AI profiles need some prior experience. But this can be bypassed by showing a real AI project. You can build a project portfolio that shows how you solved a real-world problem. You should be able to demonstrate the problems you faced and how you overcame them.
You can ping me if you want a personalized solution.
Thanks
WHHHHHHHHHHhY
why private competitions, can't be set public later?
hello i was working on the housing regression competition and i am fairly new. I was wondering for a column such as Street, then it has a range of string values such as gravel, pavement... how should i encode this
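One common option for a low-cardinality string column like Street is one-hot encoding. A minimal sketch with pandas follows. The tiny frames and the category values are stand-ins for the competition files, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the competition's train and test tables.
train = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"], "LotArea": [8450, 9600, 11250]})
test = pd.DataFrame({"Street": ["Pave", "Pave"], "LotArea": [11622, 14267]})

# One-hot encode the string column into 0/1 indicator columns.
train_enc = pd.get_dummies(train, columns=["Street"], dtype=int)
test_enc = pd.get_dummies(test, columns=["Street"], dtype=int)

# Categories missing from the test set would otherwise produce fewer columns,
# so align the test frame to the training columns and fill the gaps with 0.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(train_enc)
print(test_enc)
```

For columns with a natural order, an ordinal mapping may fit better. That choice depends on the feature.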
someone has an automatic1111 notebook working for kaggle?
#❓┊ask-a-question Hello, I wanted to know how I can increase my storage capacity? Thank you
Hello @wind silo i have DM you the questions still waiting your response thanks
Hello, I am French, I am training as a developer, and I am a beginner at using AI through APIs. I want to go further than the simple chat of ChatGPT or other AI-assisted chats. I am looking for a partner or a team to join so I can learn and spend sleepless nights racking my brain (I don't speak English, so I'll get by in writing, thanks)
I wrote a classification program for Logistic Regression, but why does the cost function (j(w,x,y,b)) become larger after gradient descent
import numpy as np

# h (hypothesis) and j (cost) are defined elsewhere in the program.

def gradient_w(w, x, y, b):
    # Gradient of the cost with respect to the weights.
    m = len(x)
    h_wb = h(w, x, b)
    grad_w = (1/m) * np.dot(x.T, (h_wb - y))
    return grad_w

def gradient_b(w, x, y, b):
    # Gradient of the cost with respect to the bias.
    m = len(x)
    h_wb = h(w, x, b)
    grad_b = (1/m) * np.sum(h_wb - y)
    return grad_b

# Update weights and bias
while T:
    grad_w = gradient_w(w, x, y, b)
    grad_b = gradient_b(w, x, y, b)
    w = w - a1 * grad_w
    b = b - a1 * grad_b
    if j(w, x, y, b) < 1e-15:
        break
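For comparison, here is a self-contained sketch of batch gradient descent for logistic regression that records the cost at every step. It is not the poster's program: the synthetic data, the learning rate `lr`, and the helper names are assumptions. A cost that grows is often caused by a learning rate that is too large or by unscaled features, so the sketch standardizes the inputs and uses a small step. It also stops after a fixed number of iterations, since a cross-entropy cost rarely drops as low as 1e-15.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, x, y):
    # Mean cross-entropy; probabilities are clipped to avoid log(0).
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit(x, y, lr=0.1, n_iters=200):
    m, n = x.shape
    w, b = np.zeros(n), 0.0
    history = []
    for _ in range(n_iters):
        err = sigmoid(x @ w + b) - y   # prediction error, shape (m,)
        w -= lr * (x.T @ err) / m      # same gradient form as gradient_w
        b -= lr * err.mean()           # same gradient form as gradient_b
        history.append(cost(w, b, x, y))
    return w, b, history

# Hypothetical toy data: two features, label depends on their sum.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(float)
x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize features

w, b, history = fit(x, y)
print(f"first cost: {history[0]:.4f}, last cost: {history[-1]:.4f}")
```

Printing or plotting `history` for a few values of `lr` is a quick way to see whether the step size is the problem.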
Thank you, brother. I'll understand it.
Hello everyone,
Can anyone help me figure out if it is possible to link a local codebase and use it in any hosted competition?
I am working on this competition, and it appears the file upload button is disabled. Is this normal? https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use I downloaded the data and have been working in a local notebook
do i have to use the kaggle notebook ?
Hi all,
I recently saw a blog post about fine-tuning PaliGemma for a receipt scanner
I have a few questions:
- How can I create my dataset? Can I use a tool to define the box and the text?
- I ran the model with Gradio locally but it's so slow. Why? Do I need to convert the model to a specific format to use it locally?
Where can I find a tutorial, or which resources can I read to learn it easily?
Many thanks
Hello everyone, I am an engineering student (mechanical) and I am interested in Math, AI and coding. I have been learning ML algorithms and EDA little by little for some time and I want to learn more. I don't have a particular goal in mind like I don't know if I want to work as an ML engineer for a company or maybe do research work, I am just learning and exploring this field. What would be your advice for someone like me in order to learn and level up in this field?
hey, I don't know if this is the right place to ask this question but anyway, I have learned a little ML and I want an ML job, but most entry-level jobs at popular companies require at least 2 years of experience. The problem is obviously that I want the job in order to gain experience. I am willing to take a part-time remote job for as little as 10k dollars a year just so I really get into the field, or I'll even work for free. I don't know if "work" is the right term for it, but I want to learn from someone who knows more than me.
hi, im new to ml and ive been trying to make a prediction model to identify handwritten numbers. ive just been kind of stuck since my models accuracy is always 10-13% and all the tips ive seen online have been exhausted for days. its kind of my first project in this stuff, so sorry if i seem clueless. Ive had great results using pretrained models which is why i wanted to go a little further and make one but im cooked.
im just noticing the beginner digit section, and it seems easy to just look at it for the answer. but i dont really get what the others did, mainly looks like they added more filters and used other tactics
if anyone can answer, what models are generally recommended and why, ive only touched adam.
and the filters and why u use them, i used 32,64,and 128. with some dropout. but i noticed more filters, as well as followups for them. also heavy dropouts. idk it seems crazy, maybe im a bad teacher
final question: why dont i see best model used alot, is it not good longrun? sorry
someone can help me?
Hello everyone,
I'm an ML engineer apprentice, currently working as an intern for a startup on a GenAI project. I've done some academic projects before and now I want to gain real world experience by working on real projects. Just like @wooden folio , I'm looking for a part time remote job to work even for free. PS: I'm motivated and willing to get my hands on stunning projects.
If you are interested or need additional information, feel free to DM me.
Kind regards,
Stephane
I am curious about how you guys manage computing requirements?
Whats your set up?
Hi guys 🙂 I am working on a prototype of a motion sensor with an API to extract already-labelled information over wifi in real time, like a data collector but smaller, so it doesn't bias the movement. Do you guys have any suggestions, based on your experience, for any additional features?
Hi all. I have a kaggle.com question: The first version of this notebook, https://www.kaggle.com/code/michaelstelly/cia-world-factbook-analysis?scriptVersionId=201169542
has an incorrect import. It's been stuck in processing for 21 hours. Selecting "stop the session" does nothing. How do I kill the session?
hey, when is the SGD book going to get fixed? there's a problem with seaborn 😩
Hi everyone!
I'm super new to all of this stuff, and was attempting to make a simple model that can detect the position of a basketball in an image! I was attempting to use this set of data to train: https://www.kaggle.com/datasets/trainingdatapro/basketball-tracking-dataset/data. While setting everything up (just plotting the bounding box for the images I already have), I noticed a small issue when attempting to rescale the points to the images given in the files.
If you look at the xml file for the data, it says the original size of the image was 1280 by 720, yet on some of the images the values for the boundingbox seemed incorrect, as they were extremely small.
(these are x1, y1, x2, y2)
For example, the x values for the second image were 966, 408, 987, 429. These just seemed off for an image that looked like this(can't send here can send in dms)
Also for like the first image for example, the basketball is just outside of the image, how does that work?
please help either here or in dms!!
🙏
HI Everyone!, hope everyone is good, I'm new to Kaggle
Just started with the Titanic tutorial and am a bit stuck
For "Part 3: Your first Submission"
i managed to get the correct output for % of woman who survived
i also managed get the correct output for % of men who survived
But when i add the last part in order to get an output of "Submission was successfully saved!"
i get this error message:
NameError Traceback (most recent call last)
Cell In[16], line 7
5 features = ["Pclass", "Sex", "SibSp", "Parch"]
6 X = pd.get_dummies(train_data[features])
----> 7 X_test = pd.get_dummies(test_data[features])
9 model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
10 model.fit(X, y)
NameError: name 'test_data' is not defined
Any advice?
Thanks!
To resolve this error, make sure you rerun the cell that defines test_data (where you loaded the file), and also check wherever you use it for any syntax errors. For better help, I suggest you just send me the code.
thanks for the response, after running it again it submitted!
Values for the bounding box should be extremely small, as the bounding box is for a tiny basketball on a large image. The gif on the home page for that dataset shows the full size image (1280 x 720) and a tiny blue bounding box around the basketball. So the x1, y1, x2, y2 values you provided should be right, since the basketball is only 20 x 20 pixels.
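A quick check of the numbers quoted above, as a small sketch: the box size follows directly from the corner coordinates, and rescaling to another image size is a per-axis multiplication. The helper names and the 640 x 360 target size are made up for illustration.

```python
def box_size(x1, y1, x2, y2):
    """Width and height of a box given as (x1, y1, x2, y2) corners."""
    return x2 - x1, y2 - y1

def rescale_box(box, src_size, dst_size):
    """Scale a box from an image of src_size (w, h) to dst_size (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

box = (966, 408, 987, 429)     # values quoted in the question
print(box_size(*box))          # (21, 21): roughly 20 pixels per side

# Hypothetical example: the same box on a half-size 640 x 360 copy.
half = rescale_box(box, (1280, 720), (640, 360))
print(half)                    # (483.0, 204.0, 493.5, 214.5)
```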
Oh I see that's interesting,
I'm a little confused about something then. In the files most of the images are just not like as wide and high as the big image, so are those cutouts of the big image? and if they are, how do I know like what "big image" the cutout refers to
I'm a bit confused all around 
You might be able to answer your own question by studying the "Dataset Structure" and "Data Format" sections in the description. It also looks like a pretty small dataset as it's only a sample of the actual dataset which apparently is for sale on another website...
Hello. I'm starting to use kaggle for the first time.
Even though my PC is connected to the Internet, Kaggle notebook cannot connect to the Internet. Below is what I tried. The result is that the internet is not available. How can I connect to the internet?
import requests
try:
    response = requests.get('https://www.google.com')
    if response.status_code == 200:
        print("OK")
    else:
        print(f"OK_But: {response.status_code}")
except requests.ConnectionError:
    print("No")
Hi. I think the last time I submitted anything on Kaggle was probably the first ever Jane Street competition, about 6 or 8 years ago. This is a pretty naïve question, but for the submissions, does our submissions notebook have to also train the models we use, or is it possible to train the models separately, load them into the notebook and just do the inference/predictions?
Actually I'm specifically asking for the Jane Street competition so this might not apply to other competitions?
Why do you want notebook to connect the Internet. Maybe this is unsafe.
Since I want to create content that supports Japanese, I would like to use "rinna/llama-3-youko-8b". First, you need to install transformer, and you need an internet connection. How to do it without using the Internet?
I understand. I tried connecting to the internet in a Kaggle notebook and it works. Can you take a screenshot of the whole page so I can see the exception stack trace?
https://www.kaggle.com/datasets/jakubkhalponiak/phones-2024
https://www.kaggle.com/code/jakubkhalponiak/a-study-of-smartphones-available-in-2024
I have web-scraped phones from gsmarena.com and published a notebook and the dataset. I would appreciate any feedback on this, as it's my first time posting anything on Kaggle
change the website address, for example youtube, x.com.
Hi, I am currently going through the book Deep Work by Cal Newport.
Now I am interested: what does your learning/working process look like? What are your habits? 🙂
For example, you participate in Kaggle competitions or watch/read ML tutorials.
I can't connect.
check this out, I have reproduced your problem
There is no Internet mark on my screen. Where do I set it to display the Internet mark?
I can't even save.
https://www.kaggle.com/code
create a new notebook
Kaggle Notebooks are a computational environment that enables reproducible and collaborative analysis.
I don't know. Perhaps the website displays differently in different regions. You can google for help. I think once you find the correct setting, you can connect to the internet
Thank you!!!, I was able to connect!
When I tried to run the program I wanted to run, it stopped immediately. I'm trying Kaggle because Colaboratory doesn't work due to lack of GPU.
I am trying again.
The GPU is not working, only the CPU is working.
And finally this happens. How should I solve it?
Hi Guys,
I’m a CS student currently doing my final project for the course. The problem I’m addressing is predicting whether a stock is going to go up or down, so that I can either sell or buy.
I was wondering if anyone has ever come across a dataset that fits this description?
Thanks in Advance!
Why do people use MSE rather than MAE in most cases? Isn't MSE only better than MAE when we want fewer big errors, even at the cost of having more error on average? Is it usual to care about big errors more than about errors in general? To me it looks like MSE is a bit niche and MAE should be used in most cases instead
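The trade-off described in this question can be seen on a tiny made-up example: one set of predictions is off by 2 everywhere, the other is exact except for a single error of 8. All numbers here are invented for illustration.

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])

# Hypothetical predictions: many small errors vs. one large error.
pred_small_errors = y_true + np.array([2.0, -2.0, 2.0, -2.0, 2.0])
pred_one_big_error = y_true + np.array([0.0, 0.0, 0.0, 0.0, 8.0])

def mae(y, p):
    return float(np.mean(np.abs(y - p)))

def mse(y, p):
    return float(np.mean((y - p) ** 2))

for name, pred in [("small errors", pred_small_errors),
                   ("one big error", pred_one_big_error)]:
    print(f"{name}: MAE={mae(y_true, pred):.2f}  MSE={mse(y_true, pred):.2f}")
```

Here MAE prefers the predictions with one big miss (1.6 vs 2.0), while MSE prefers the ones with many small misses (4.0 vs 12.8), which is the difference the question is pointing at.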
hello, i am an absolute beginner in machine learning. i just learned about linear regression and error metrics and wanted to get my hands on a small project using the techniques i learnt. So i started with the famous boston-housing-prices dataset on kaggle and would appreciate if you could take a look at my code: https://www.kaggle.com/code/khalidhelmy55/boston-housing-prices and guide me on what is missing or what could be better done..
according to the metrics i calculated, the model is not performing well.
Question 1:
I’m running my error metrics locally (e.g., RMSE, MAE) on my validation set while participating in a Kaggle competition. Since the test set lacks target values, can anyone help clarify which error metrics I can use to assess my model locally, and if possible, could you list some commonly used ones?
Question 2:
Also, I’m noticing that the error metric scores I compute locally are different from the Kaggle leaderboard score. How are these related? Are the scores directly or inversely proportional, or is there another relationship I should be aware of? Any insights would be greatly appreciated!
You assess your model using validation data, not testing data. Your local scores are using the testing data, while I believe the scores on the leaderboard are using new data that is not in the testing dataset.
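A minimal sketch of assessing a model locally on a validation split, assuming scikit-learn. The synthetic data and the linear model are placeholders for a real competition setup. The leaderboard is scored with the competition's own metric on labels you cannot see, so a local score and a leaderboard score will usually differ somewhat even when the metric is the same.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical labelled training data standing in for train.csv.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Hold out part of the labelled data as a validation set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_valid)

# Two commonly used regression metrics, computed on the validation split only.
mae = mean_absolute_error(y_valid, pred)
rmse = float(np.sqrt(mean_squared_error(y_valid, pred)))
print(f"validation MAE:  {mae:.2f}")
print(f"validation RMSE: {rmse:.2f}")
```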
GPT answer
Hi I'm looking to improve in the field of Marketing, is there any type of dataset or competition to make a recommendation system? thanks
i am an 3rd year CS undergrad. i have Intermediate knowledge about all data structures. i study hard and take effort. look as I need DSA to achieve any high package placement. i am practicing my DSA Skills in Leetcode. but it is going difficult for me to figure out the logic by myself (specially medium & hard level problems). and even if i saw the solution i understand it but it does not fit in my mind perfectly. so why this is happening ? and how to tackle this issue ? will i ever get better ? you can also guide on what kind of approach/steps should i follow while solving a particular Question ?'
U re at wrong place its kaggle not a dsa server
Anyone using allennlp ? Is it not maintained anymore by allenai ?
hello. I am currently working on Pandas section on Kaggle and I got a question. Can I ask it here?
nvm thank you i figured it out
I'm trying to learn more about Bayesian optimization techniques for machine/deep learning. Any good YouTube series recommendations?
hey guys, I want to become a data analyst and perhaps a transition to data scientist later on, Would it do by just focusing in python and R? the programming for this field seems very different compared to other fields like web development, etc.
for starting in data analytics, focus on Python and SQL, but it depends on how deep you want to go.
I'm aiming to go pro on this stuff.
start small, learn the basics, take on challenging tasks that make you get our of your comfort zone, learn more, take on more projects, learn more, rinse and repeat until you're a pro 😄
Where can I find conferences or talks about data and all that world?
is there a guide about the math I have to learn that you guys recommend? I've been learning on my own, but it won't hurt to learn from well known free resources