#❓┊ask-a-question


storm moon
#

while using ngrok on kaggle

#

so i need to upload my model file into the model_dir folder in the w-okada files

#

as u see theres 3-2-1-0

sly inlet
#

Or can you try downloading from gdrive, using the wget or curl command

storm moon
#

but i dont know how to use

#

wget and curl commands
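
For anyone unsure how to do the download step from a notebook, here is a minimal sketch in Python instead of wget or curl. It assumes the file is shared publicly on Google Drive and is small enough to skip Drive's virus-scan confirmation page; `FILE_ID` and the destination path are placeholders, and both helper names are made up for this example. For large files the third-party `gdown` package is commonly used instead.

```python
import urllib.request
from urllib.parse import urlencode


def gdrive_direct_url(file_id: str) -> str:
    """Build a direct-download URL for a publicly shared Google Drive file."""
    query = urlencode({"export": "download", "id": file_id})
    return f"https://drive.google.com/uc?{query}"


def download(url: str, dest_path: str) -> str:
    """Download url to dest_path and return the path (needs internet enabled)."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


# Hypothetical usage inside a notebook cell; FILE_ID is a placeholder.
# download(gdrive_direct_url("FILE_ID"), "/kaggle/working/model.pth")
```

If the saved file turns out to be a small HTML page rather than the model, the confirmation page was hit and the plain request is not enough.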

#

yea i cant upload on app

#

i got some error

#

4 hour wasted for nothing

#

trying to fix these things :D

#

nah bro kaggle is idk

#

well i think ill pay collab

sly inlet
#

More storage as well

storm moon
#

i wanna just use voice changer via ngrok

#

i need gpu for that

#

im using paperspace for training models

#

but paperspace does not support ngrok

sly inlet
storm moon
#

if it did it would be perfect

storm moon
storm moon
#

in order to use this model, I need to load this model into the w-okada voice changer files that I have installed on the notebook. you can load it directly through w-okada, but when using the voice changer through ngrok, it is difficult to do this, it gives a warning. so I need to load my model directly into the w-okada files in the notebook, so I have to move my model with the pth extension into that file

sly inlet
#

You can check huggingface service as well

storm moon
#

8-9 hours per day maybe

#

max

storm moon
#

when i try, same things happen i guess

#

cuz i dont know how to do on huggingface

#

like in the kaggle

#

whats this even i dont get it

#

on collab didnt i get kind of errors

#

but collab getting so much money so

sly inlet
sly inlet
sly inlet
storm moon
#

so

#

idk they have paid service

#

30 hours per week is 5000x better than collab

#

cuz collab gaves

#

1hours free per account ::D

sly inlet
#

But it can disconnect anytime,

storm moon
sly inlet
#

On my way home

sly inlet
sly inlet
#

@storm moon did you try this

storm moon
#

@sly inlet

storm moon
sly inlet
sly inlet
sly inlet
storm moon
#

input

sly inlet
#

move from input to output ?

storm moon
#

i dont know how to -*-

sly inlet
#
 
!mv /kaggle/working/source_folder/model.pth /kaggle/working/destination_folder/
storm moon
#

lets try

sly inlet
#

change the model input path and output folder path there

#

the first path should be the model's pth file and the second path should be the folder you want to move the pth file to

storm moon
#

at output

sly inlet
#

like what file ?

#

folder

#

or you mean copy model weights from one folder to another

storm moon
#

folder

#

is it possible to create folder

#

in model_dir

sly inlet
#

yes

storm moon
#

folder

#

okey can u give me code line creating folder in model_dir

#

but

#

i closed already

sly inlet
#

mkdir destination_folder/new_folder_name
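
The same two steps (create the folder, then move the .pth file into it) can also be done from a Python cell. This is a small sketch using only the standard library; the helper name `move_model` and the paths in the usage comment are placeholders, not the actual w-okada layout.

```python
import os
import shutil


def move_model(src_file: str, dest_dir: str) -> str:
    """Create dest_dir if needed, then move src_file into it."""
    os.makedirs(dest_dir, exist_ok=True)  # same effect as `mkdir -p`
    target = os.path.join(dest_dir, os.path.basename(src_file))
    return shutil.move(src_file, target)


# Hypothetical usage; adjust both paths to your notebook:
# move_model("/kaggle/working/source_folder/model.pth",
#            "/kaggle/working/destination_folder/new_folder_name")
```

`exist_ok=True` means the call is safe to repeat if the folder is already there.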

sly inlet
storm moon
#

kaggle

sly inlet
#

then you will lose model , won't you ?

#

or did you download model locally or saved it on kaggle

storm moon
#

i trained 1 month ago

sly inlet
#

nice

#

you can upload it on kaggle as well

storm moon
#

tbh i dont want cuz i dont want to public these my models

sly inlet
#

as private model , so only you can access it

storm moon
sly inlet
storm moon
#

as u see

#

anyway its not working w-okada on kaggle

#

idk why

#

laggy or something

#

i think ill continue on collab

sly inlet
sly inlet
storm moon
#

I'm done dealing with this :D but thanks for asking

#

it been 6-7 hours

#

still trying to do on kaggle

#

waste of time

sly inlet
#

just tell me the problem so i will try to see in my free time

#

don't need to waste your time

storm moon
#

just run this

#

and use w-okada program

#

normally

#

if u do this will be good

sly inlet
#

ok

storm moon
#

someones can do

#

but i cant so

#

ill keep to use colalb

#

collab*

sly inlet
sly inlet
#

when you first asked help

#

and it did work for me, as in the screenshot i sent you above

storm moon
#

its worked for me but i cant upload a mode

#

model and i cant choose mode

#

model

sly inlet
#

like how

#

that notebook don't have any code related to model

sly inlet
# storm moon model and i cant choose mode

you mean, you have to specify the model path here? which one of the below is used to specify the model path?

--content_vec_500 pretrain/checkpoint_best_legacy_500.pt \
  --content_vec_500_onnx pretrain/content_vec_500.onnx \
  --content_vec_500_onnx_on true \
  --hubert_base pretrain/hubert_base.pt \
  --hubert_base_jp pretrain/rinna_hubert_base_jp.pt \
  --hubert_soft pretrain/hubert/hubert-soft-0d54a1f4.pt \
  --nsf_hifigan pretrain/nsf_hifigan/model \
  --crepe_onnx_full pretrain/crepe_onnx_full.onnx \
  --crepe_onnx_tiny pretrain/crepe_onnx_tiny.onnx \
  --rmvpe pretrain/rmvpe.pt \
sly inlet
#

then what is the problem ? could you be more clear

storm moon
#

open w okadar

#

okada

#

up there ull see edit button

#

click there

#

also theres edit button too choose blank one edit button

#

choose ur pth file

#

upload if u can

sly inlet
#

ohhk , i never open that link , 😅

#

stupid of me

#

let me re run the notebook

sly inlet
#

you won't get this error on google colab then ?

storm moon
#

so

#

cuz these files in drive google i can access easily

pulsar merlin
#

Hi, I am trying to pass speech signals to my model by extracting features using Non negative matrix factorization. but unable to find the correct results. Can anyone guide me?

sly inlet
#

it should be done in both

sly inlet
#

i don't want notebook

#

i want to know where and how did you locate model in that directory?

#

pth model ,
!mv source_path dest_path should be able to move the model to a specific directory

obsidian bone
#

can anyone tell me how to modify LLM layers (hugging face text generation models) and how to add custom heads to them?

misty roost
#

Hey guys. I want to rank up on Kaggle. In order to become "Kaggle Expert", do I have to become "expert" in every category (competitions, datasets, notebooks, discussions)? Or just becoming "expert" in one of those categories is enough?

sly inlet
#

you can create new layer and pass the input from base model to it ,

#

that's how LM head is added to base model , for generation purpose

obsidian bone
# sly inlet you can create new layer and pass the input from base model to it ,

so far this is what I've done

  1. I get the model using AutoModel.from_pretrained
  2. then I passed in tokenized text
  3. I get the outputs of the model using .last_hidden_state
    but the last hidden state has a shape of [batch size, Sequence length, Embedding size]
    which is not constant... it changes from one sentence to another
    I saw some people get the output using out_ids.last_hidden_state[:, 0, :] but that only takes the embedding of the first token...
#

I want to take outputs from the LLM model and feed it into a custom pytorch model

#

but I've having trouble with dealing the last hidden state... i don't know how to work with it....

#

so thought I might modify it or change it...

sly inlet
#

ok so which model are you using

sly inlet
obsidian bone
#

am I supposed to change input dimension of fully connected network every time i pass a new sentence?

#

that's not logical, it's like instantiating a new model every sentence passed

#

example:

first sentence passed -> last_hidden_state.shape = [1, 59, 3072]
which means first layer of fc_model= torch.nn.Linear(59*3072, 256)

second sentence passed -> last_hidden_state.shape = [1, 92, 3072]
which means first layer of fc_model= torch.nn.Linear(92*3072, 256)

you get what I'm saying?

sly inlet
#

that's what everyone does when training with batches

obsidian bone
#

ok for example padding=128 and it will be last_hidden_state.shape = [1, 128, 3072]
input to torch.nn.Linear(128*3072, 256)

#

is that how people create custom heads for llms?

#

I found this,

class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

          
        self.bert = BertModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

as far as I know bert hidden size is 768, so in self.classifier it will be nn.Linear(768, num_labels=2 for example)

#

the output will then be [Batch size, sequence, num_labels]

sly inlet
#

that's why i'm asking you for model

#

you want to train bert for classifier

obsidian bone
#

no....

#

this is just example

#

what I mean is

sly inlet
#

tell me what exactly are you trying to do

obsidian bone
#

if padding size is constant

sly inlet
#

lm head for llama

sly inlet
obsidian bone
#

for example say padding_size=128,
then the last_hidden_state.shape = [1, 128, 3072]
this is 3d tensor
torch.nn.Linear accepts [batch size, in_dim]

#

if i do 128*3072, that's a huge number
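
To put a number on "huge", here is the arithmetic for the shapes mentioned in the thread (128 tokens, hidden size 3072, 256 output units). The variable names are only for illustration; the comparison is between flattening every token into one vector and first reducing the sequence to a single vector per sentence.

```python
seq_len, hidden_size, fc_out = 128, 3072, 256

# Flattening every token: Linear(seq_len * hidden_size, fc_out)
flat_in = seq_len * hidden_size
flat_params = flat_in * fc_out + fc_out  # weights + biases

# Pooling to one vector per sentence first: Linear(hidden_size, fc_out)
pooled_params = hidden_size * fc_out + fc_out

print(flat_in)        # 393216
print(flat_params)    # 100663552
print(pooled_params)  # 786688
```

The flattened layer needs about 100 million parameters and only works for exactly 128 tokens, while the pooled one needs under a million and does not depend on sequence length.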

#

I just want to know how people connect the head efficiently

obsidian bone
#

so where did sequence length go...

sly inlet
obsidian bone
#

now the question is

#

[:, 0, :] only takes embbedding of first token

#

what about the rest of tokens

sly inlet
#

as the CLS token contains the whole information of the sentence

obsidian bone
#

their information won't be passed

obsidian bone
#

wdym contains the whole information of the sentence?

sly inlet
sly inlet
#

it is what it is, the CLS token has all the info we need for classification

obsidian bone
#

so CLS token having the whole sentence's information is only applicable for bert? or is it applicable for all LLMs?

obsidian bone
#

I am using Gemma

#

does it have CLS token?

sly inlet
#

are you running it locally or on collab

obsidian bone
sly inlet
sly inlet
obsidian bone
#

wait lemme share my code

#

tokenized_sentence = tokenizer(sentence, return_tensors='pt', padding="max_length", max_length=128)

out_ids = model(**tokenized_sentence)

class Decision_Model(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Decision_Model, self).__init__()
        
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(in_dim, out_dim, dtype=torch.bfloat16),
            torch.nn.Softmax(dim=1)
        )
    
    def forward(self, x):
        return self.fc(x)

basic_des = Decision_Model(in_dim, n_labels)  # what dimension to put here for in_dim?

# Problem is here: how do I pass out_ids.last_hidden_state with shape [1, 128, 3072] to basic_des?

outputs = basic_des(out_ids.last_hidden_state)  # ???


sly inlet
#

what dimension output you get from gemma model

obsidian bone
sly inlet
#

and what do you want to do

obsidian bone
obsidian bone
#

just want to pass out_ids.last_hidden_state to torch.nn.Linear

sly inlet
#

i mean end goal ,

obsidian bone
#

that's all, that's the whole problem

sly inlet
#

what will you achieve after doing it

obsidian bone
#

classification

#

sentence

sly inlet
obsidian bone
#

ok

#

thanks for your time tho

sly inlet
#

but they already have GemmaForSequenceClassification

obsidian bone
# sly inlet https://github.com/huggingface/transformers/blob/821b772ab915e53870aabba6cb765e8...
[inside] def __init__(self, config):
    self.num_labels = config.num_labels
    self.model = GemmaModel(config)
    self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)

[inside] def forward(self,...):
    transformer_outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
    hidden_states = transformer_outputs[0]
    logits = self.score(hidden_states)

their passing of hidden_states to self.score is strange...

#

I will look more into this

sly inlet
obsidian bone
#

i know that

#

hidden_states shape is still [batch_size, token_length, hidden_size]

#

wait we can pass 3d tensor to torch.nn.Linear?

sly inlet
#

Yes

#

The only thing that matters is last dim should match to input dim
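
A quick way to see this is to run a small layer on a 3D tensor. The sizes below are toy values chosen for the example (not Gemma's), and `score` just mirrors the name used in the snippet quoted earlier.

```python
import torch

batch_size, seq_len, hidden_size, n_labels = 2, 5, 16, 3

hidden_states = torch.randn(batch_size, seq_len, hidden_size)
score = torch.nn.Linear(hidden_size, n_labels, bias=False)

# Linear is applied along the last dimension; leading dimensions are kept.
logits = score(hidden_states)
print(logits.shape)  # torch.Size([2, 5, 3])

# Applying the layer to a single token position gives the same numbers.
per_token = score(hidden_states[:, 0, :])
print(torch.allclose(logits[:, 0, :], per_token, atol=1e-6))
```

So the result still has one row of logits per token, and the sequence dimension has to be reduced somewhere (first token, last token, or a mean) to get one prediction per sentence.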

#

If you want to know more try adding print statements in transformers lib code

tiny anvil
#

Hey guys I was looking for some help with this problem. I was hoping to figure out a different solution to the one posted (which calls the function made in the previous problem). The issue I think I am having is the elif in the second for loop at the bottom which checks if the key is not in the match list.

sly inlet
tiny anvil
#

Right, I was wondering how I can get the elif statement to only activate if the key isnt in the match list

#

If I remove the elif statement, it works, but the issue is if there is a key thats in keywords that isnt in the match list it just ignores it instead of adding an empty value.

sly inlet
tiny anvil
sly inlet
tiny anvil
sly inlet
#

could you send the whole function

tiny anvil
#
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    
    x={}
    for i, doc in enumerate(doc_list):
        infos = doc.split()
        match = [info.rstrip('.,').lower() for info in infos]
            
        for key in keywords:
            if key.lower() in match:
                x[key] = [i]
    
    return x

# Check your answer
q3.check()
sly inlet
#
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    
    x= {k:[] for k in keywords}
    for i, doc in enumerate(doc_list):
        infos = doc.split()
        match = [info.rstrip('.,').lower() for info in infos]

        for key in keywords:
            if key.lower() in match:
                x[key].append(i)
    
    return x

# Check your answer

tiny anvil
#

I think I need to practice using list comprehension and for loops a bit more

sly inlet
#

@obsidian bone did you get it to work , the way you wanted ?

obsidian bone
#

Will look into it later and tell you

haughty mulch
#

By looking at raw data from a dataset, how can one decide whether it is necessary to add features (say, mean, variance, etc.)?
Is there any standard approach, or should I just do a trial and check?
If there is an approach, how do I know which features I should include?
Thank you

obsidian bone
#

@sly inlet

tokenized_problem = tokenizer(sentence, return_tensors='pt')

out_ids = model.forward(tokenized_problem['input_ids'].to(DEVICE))

debug_layer = torch.nn.Linear(hidden_size, n_labels, dtype=torch.bfloat16).to(DEVICE)

debug_out = debug_layer(out_ids[0])

pooled = debug_out.mean(1)

# pooled.shape == [batch_size, n_labels]

#

I checked Gemma's sequence classification, they used last token which is [:, -1, :] to get their logits,

#

but i wanted to accommodate all the tokens, so seems like taking the mean is the only option here
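
Both options can be written as small pooling functions. This is a sketch under two assumptions that are not from the thread: the batch is right-padded, and an `attention_mask` is available so that padding tokens are left out of the mean. The function names and the toy tensor are made up for the example.

```python
import torch


def last_token_pool(hidden, attention_mask):
    """Pick the hidden state of the last non-padding token (right padding assumed)."""
    last_idx = attention_mask.sum(dim=1) - 1              # [batch]
    batch_idx = torch.arange(hidden.size(0))
    return hidden[batch_idx, last_idx]                    # [batch, hidden]


def masked_mean_pool(hidden, attention_mask):
    """Average hidden states over real tokens only, ignoring padding."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # [batch, seq, 1]
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts


hidden = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

head = torch.nn.Linear(3, 2)
logits = head(masked_mean_pool(hidden, attention_mask))
print(logits.shape)  # torch.Size([2, 2])
```

Because the head is a single linear layer, taking the mean before or after it gives the same result; pooling first is just cheaper.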

#

thanks for your help

sly inlet
sly inlet
obsidian bone
sly inlet
#

last token method will pretty much work as Transformers have implemented it.

#

but i'm curious to see how mean of embeddings will behave

obsidian bone
#

so transformers only depend on last token to predict next word?

sly inlet
obsidian bone
# sly inlet like last hidden states

I am really having trouble understanding transformers at all.... in theory they are one thing, and in practice they are a totally different thing

obsidian bone
#

it keeps outputting the same label on every sentence

#

will try last token now

#

it's the same with last token

sly inlet
obsidian bone
obsidian bone
sly inlet
# obsidian bone

last try with their default implementation of GemmaForSequenceClassification

#

i would love to test this at myside as well

#

but i'm busy with my office work now , i will check out this variations later

obsidian bone
sly inlet
#

but input_ids , attention_ids, position_ids and attention_mask are important things to consider

#

you can try generating an answer with a padded prompt, without attention_ids it fails but with it it works fine.
its all about rotary embeddings

#

see you later

abstract ridge
#

Greetings folks, I am looking for a dataset that has artworks with their descriptions (I only need the name of the artwork with the semantic description, like what it is and what it means)
hope someone knows a dataset that suits this case

ruby oar
#

Can i get a job here after becoming a newbie ML and DL engineer?

plucky vector
#

define "here" 😅

weak compass
#

Hello. When we are using two GPUs in a kaggle notebook, how do we include both of them when running an LLM with a huggingface pipeline?
GPUs available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
But currently, i am doing something like:
pipe = pipeline("text2text-generation", model="grantslewis/spelling-correction-english-base-finetuned-places", device = 0)
Which i think only makes use of one of GPUs
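
One possible direction, as a sketch rather than a confirmed answer: `transformers.pipeline` accepts `device_map="auto"`, which (with the `accelerate` package installed) spreads the model's layers across the visible GPUs. That helps when a model does not fit on one GPU; for a small model it will not make inference faster, and running one pipeline per GPU on separate halves of the data may be the better fit. The helper names below are made up for the example.

```python
def pipeline_kwargs(n_gpus: int) -> dict:
    """Choose placement arguments for transformers.pipeline (illustrative helper)."""
    if n_gpus > 1:
        # device_map="auto" lets accelerate place layers across the visible GPUs
        return {"device_map": "auto"}
    if n_gpus == 1:
        return {"device": 0}
    return {"device": -1}  # CPU


def build_pipeline(model_name: str, n_gpus: int):
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline
    return pipeline("text2text-generation", model=model_name, **pipeline_kwargs(n_gpus))


print(pipeline_kwargs(2))  # {'device_map': 'auto'}
```

`device` and `device_map` should not be passed together, which is why the helper returns only one of them.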

vapid mantle
#

hello guys, i found a dataset that someone has worked on that i would like to use as well and visualize just like he did on tableau, but despite trying to follow the instructions im so lost on how to download his jupyter notebook. can someone help me? thank you very much

undone nexus
#

Hello, I want to learn more about Machine Learning and want to know whether anybody has any suggestions as to which resources(research papers, online courses, books/textbooks,etc.) one could use to cover a lot of topics in a reasonable depth such that you could ascertain which fields of AI/ML you like or don't like. Coming from a complete beginner standpoint at ML here, but have the basic prerequisites of statistics, multivariable calculus, and linear algebra covered.

plucky vector
midnight oriole
#

hello, i am new to ML and was wondering how much an apple M1 with 16Gb will be able to handle for training? I found a dataset with 5 million data that i would like to work on but i'm sure that this is way too much for a laptop. what is a general range a laptop could handle? should i slice it down to 100-500k data? Also, should i just use google colab instead of jupyter?

sly inlet
# obsidian bone I am really having trouble understanding transformers at all.... in theory they ...

We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusi...

obsidian bone
sly inlet
#

yeah , kind of webseries on GPT-2

obsidian bone
#

i am excited to watch

sly inlet
#

but i like whatever this man posts

obsidian bone
#

but got some essays to finish this week... writing is so boring

sly inlet
#

i'm glad i finally over that crap

obsidian bone
#

he has a discord server as well

sly inlet
#

yes

#

i was there but didn't feel that good to me , so i left : )

obsidian bone
#

man I already miss doing AI... it's been 3 days I didn't touch kaggle, feel like something important missing from my life

#

I hate university

obsidian bone
sly inlet
obsidian bone
sly inlet
obsidian bone
#

u finished or still studying?

sly inlet
#

graduated last year

obsidian bone
#

oooh congrats

sly inlet
#

which year you are in ?

obsidian bone
#

atleast you are saved from it

obsidian bone
sly inlet
#

ngl, CS things are so simple, we got pure madness of maths and theorems

obsidian bone
#

u got motor and voltage source, good luck connecting driver to them

#

and make sure you don't end up exploding ICs

#

I hate this major...

#

wish I went software engineering instead

sly inlet
#

its better in electronics, at least you will get hurt by max 24v, not like electrical, playing with live AC

obsidian bone
#

ended up blowing a light bulb

sly inlet
obsidian bone
#

glad we didn't get injured

obsidian bone
#

arduino has most of its code ready, u just connect components, copy paste the code and run the circuit Tadaaa!

sly inlet
#

auto transformers, delta, star

obsidian bone
#

yeah transformers lmfao

#

we know 2 transformers now

#

one in electrical and one in AI

#

that are completely unrelated

sly inlet
#

search now and you will get electrical one

obsidian bone
sly inlet
#

too much competition

obsidian bone
sly inlet
#

remove s

obsidian bone
sly inlet
#

but HF needs to do more publicity

obsidian bone
#

wow. you are master at prompting

#

just by removing S we got different result

sly inlet
#

i wish it won't come to me at work place

obsidian bone
#

honestly I hate it cause it doesn't feel like real engineering

#

but it does bring results

sly inlet
#

change few letters and you get something new, how am i supposed to get all answer with single prompt i don't know

obsidian bone
#

chain of thoughts and self consistency for example, they do improve LLMs reasoning

obsidian bone
sly inlet
#

as your prompt gets bigger it starts hallucinating

sly inlet
#

how a space can turn things over

obsidian bone
#

cause looking at the transformer architecture, they are really unstable.

sly inlet
#

yes they were training model on byte level

sly inlet
#

long way to go for byte level models to compete with transformers , its hard to find pattern with bytes on text data mostly

obsidian bone
azure fractal
tropic yarrow
#

Can someone explain why my code is running out of RAM? (please ignore the commented parts, I'm doing a test run to see if the submission is working correctly without training the models first)

obsidian bone
#

I set the length to 1000, and it took 2 GB of RAM when appending the model to models

tropic yarrow
obsidian bone
#

your neural network model has 288685 parameters to train, which is also something to consider.

tropic yarrow
#

So I'm initialising a different dataloader and model for each feature to be predicted

obsidian bone
tropic yarrow
tropic yarrow
obsidian bone
#

and you are instantiating a model that many times

tropic yarrow
#

I was just experimenting with whatever I could think of at the time, and it was working for training and validation so i didn't optimise it

tropic yarrow
tropic yarrow
#

I also copied portions of the code and didn't remove the comments so ignore the double comments, im sorry

#

looks pretty ugly

#

i had commented out the training loop for checking if the submission works correctly

#

while training

obsidian bone
tropic yarrow
#

oh nvm its going up

#

i see, thank you

obsidian bone
#

wait you using this in your local machine or kaggle notebook?

tropic yarrow
#

colab

obsidian bone
#

oh

obsidian bone
#

17.4 GB right

#

now

#

bruh what did u do lol

#

wait lemme check the submission function

#

ok notebook crashed

tropic yarrow
#

For each row, I was using the image id to access the image from the test dataset and using the encoded feature name to access the model to be called (which could be optimised by saving weights)

obsidian bone
#
def submission(models, lookupDataset, imageDataset, organEncodings):
    for model in models:
        model.eval()
    for i in tqdm(range(len(lookupDataset))):
        # print(lookupDataset[i])
        # print(imageDataset[int(lookupDataset[i][0][1])-1].unsqueeze(0))
        # print(imageDataset[int(lookupDataset[i][0][1])-1].unsqueeze(0).shape)
        #print(f"Row Number {i}")
        #print(f"Before:\n{lookupDataset[i]}\nAfter:")
        with torch.no_grad():
            lookupDataset[i][0][3] = models[int(lookupDataset[i][0][2])](imageDataset[int(lookupDataset[i][0][1])-1].unsqueeze(0))
        #print(lookupDataset[i])
    return lookupDataset
#

you should add with torch.no_grad() before you feed the data into models

#

because without it, it calculates the backprops of the models and accumulates it

#

which takes space in ram

tropic yarrow
obsidian bone
tropic yarrow
#

What's loss.backward() for then? Doesn't it store the gradients?

obsidian bone
tropic yarrow
#

Um
somehow I used a dataloader and now it works fine

obsidian bone
#

but when you do forward pass

#

the model creates a computation graph

#

if it's gradient enabled

tropic yarrow
obsidian bone
#

no backprops

tropic yarrow
#

doesn't zero-grad simply set them to 0?

#

and how are they accumulating it if loss.backward() is never called?

obsidian bone
obsidian bone
#

wait lemme show u sth

tropic yarrow
#

wdym by accumulates gradients

obsidian bone
#

on the second one with torch.no_grad(), it returns the output without recording the graph

tropic yarrow
#

how large are the gradients?

obsidian bone
tropic yarrow
#

i mean, those grad fn objects

#

I have 12k entries, so is each object like 1 GB?

#

nvm 30k entries actually

obsidian bone
tropic yarrow
#

Thanks, this makes sense

obsidian bone
#

basically when you do forward pass, the tensors record history of the computation graph

#

which takes space in memory

#

but with torch.no_grad

#

u tell the model not to track the computation graph

tropic yarrow
#

I gtg now, can we discuss more of this later? (Ill read it)

obsidian bone
#

but just do calculation and output number
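
The difference is easy to check on a tiny model. This sketch uses a toy `Linear` layer (not the network from the thread) and looks at whether the output carries a `grad_fn`, which is the handle to the recorded graph.

```python
import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# Normal forward pass: the output remembers how it was computed.
tracked = model(x)
print(tracked.requires_grad, tracked.grad_fn is not None)  # True True

# Forward pass under no_grad: just the numbers, no graph attached.
with torch.no_grad():
    untracked = model(x)
print(untracked.requires_grad, untracked.grad_fn is None)  # False True
```

Storing thousands of tracked outputs (for example by writing them back into a dataset) keeps each of their graphs alive, which is where the memory goes even though `loss.backward()` is never called.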

tropic yarrow
#

Thanks for your help!

fervent jolt
#

Hello everyone, I'm gonna try out one Kaggle competition, but I don't understand what "internet access disabled" means. Can you detail what it means???

obsidian bone
stable dragon
fervent jolt
obsidian bone
#

Sarvesh answered it

fervent jolt
stable dragon
fervent jolt
#

@stable dragon Thanks for your quick reply. I'm getting closer.... Can you give me specific examples that are not allowed to enlighten this newbie 😔 ?

stable dragon
#

Would recommend checking with a public notebook, that would help you more.

Alternatively to experiment with things, turn off the internet for the notebook that you are working on, and try executing the code

fervent jolt
fierce canopy
#

Hi! Can I run stable diffusion and kobold ai on kaggle without be banned?
I'm also recording a course about stable diffusion, can I use kaggle to teach, and can the students use it without being banned? How does it work?
Can I train checkpoints for stable diffusion on kaggle without being banned?

plucky vector
fervent jolt
stable dragon
#

Can anyone tell me how to import a python notebook with all the inputs to the local via API.

Right now I need to individually download all the input files

light nest
#

Hey guys I have a question , can we build sequential model ?
Like is it possible to train a model on X1_i Inputs Y1_i Output and then the second one is running on X1_i + Y1_i to give output Y2_i ??
Context : (I am trying to build this for a product where we are predicting what the user is likely to select. I have learned about Supervised Learning Algorithms including ensemble techniques)

vernal mist
#

So I am new to the whole data science thing and just started working with kaggle. Is there anyone that might be able to give me some pointers?

plucky vector
vernal mist
plucky vector
#

At least in the courses Machine Learning I + II and Neural Nets that I did

vernal mist
plucky vector
vernal mist
plucky vector
#

That's strange. Did you select R as the language instead of Python for your cell?

#

I must admit, I didn't work with R so far on Kaggle

vernal mist
vernal mist
# plucky vector I must admit, I didn't work with R so far on Kaggle

On that subject the only reason I'm working with R right now is that's the programing language that the Google cert I took taught us. But I haven't really come across anyone that uses it for data analytics. Does it matter if I stick with R or should I switch to learning python?

plucky vector
#

Well, as far as I understand with R you can do only data statistics, and with python you can do almost everything you can do with any other programming language too -- program scripts, for example, and especially on Kaggle also do machine learning stuff.

#

I don't know what your background is, but in the natural sciences python is very widespread -- because it's easy to learn and most scientists aren't programmers. They want to put some code together to do something for them without diving into the concepts of stuff like object-oriented programming

vernal mist
plucky vector
#

You can learn both. The languages are a bit different, but the concepts of the analytics themselves are the same. Like math books in different languages will still contain the same math

vernal mist
elder flower
#

There are news that US government forbid any IT services for Russia after 1st September. Is there any information about kaggle, will it stop working for russian users?

vapid mantle
#

anyone truly knowledgeable about big data & business intelligence here? i got a uni assignment and im cooked

#

please dm me

muted plover
next widget
#

Has anyone run into the problem where the kaggle python package cannot find kaggle.json? I have double and triple checked that the kaggle.json file is in the "~/.kaggle" directory and I have also run the chmod 600 command on the file.

EDIT:
I figured it out. I had to set the KAGGLE_CONFIG_DIR to be the full path to the .kaggle directory. For some reason the relative paths ("~/") do not work. The permissions were set to 600.
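
For reference, the fix described in the edit can be applied from Python before the package is imported. This is a sketch assuming the default `~/.kaggle` location; `os.path.expanduser` does the "~" expansion that a literal string would not get.

```python
import os

# Expand "~" to an absolute path; a literal "~/.kaggle" string is not expanded.
config_dir = os.path.expanduser("~/.kaggle")
os.environ["KAGGLE_CONFIG_DIR"] = config_dir

print(config_dir)

# Import the kaggle package only after the variable is set:
# import kaggle
```

Setting the variable in the shell profile with the full path should have the same effect.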

midnight oriole
#

hello, does anyone know any good resources of understanding the "dataset"? I'm not so good at concepts such as feature engineering and understanding the visualization of the data. So I guess like data preprocessing?

serene pier
#

i was training a model and the epoch progress stopped, so i refreshed the page and now the session is stuck on booting the kernel, but when i look at the console its still running. how to solve this ?

jaunty tangle
#

Hello, I'm having a problem with something really simple but I can't seem to make it work.
When I try running the code below, it doesn't recognize min_frequency as a parameter despite it being in the documentation. So I'm trying to upgrade scikit-learn to a newer version but it doesn't seem to be working. I run my code on kaggle. Help would be much appreciated 🙇‍♂️

!pip install --upgrade scikit-learn --use-deprecated=legacy-resolver
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Define the columns to be encoded
categorical_cols = ["Gender"]  # replace with your column names
gender_categories=['M','F']
# Create the encoder
encoder = OrdinalEncoder(min_frequency=10,
                         categories=[gender_categories])
#

For more context, I got this error the first time I tried to upgrade:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spopt 0.6.0 requires shapely>=2.0.1, but you have shapely 1.8.5.post1 which is incompatible.

The version also hasn't seemed to change yet
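
A way to check what is actually installed, as a sketch: it reads the installed version from package metadata and compares it with 1.3, which (as an assumption worth verifying against the changelog) is the scikit-learn release where `OrdinalEncoder` gained `min_frequency`. The helper names are made up for the example.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Turn '1.2.2' into (1, 2); enough for a major.minor comparison."""
    match = re.match(r"(\d+)\.(\d+)", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return int(match.group(1)), int(match.group(2))


def supports_min_frequency(v: str) -> bool:
    # Assumption: OrdinalEncoder accepts min_frequency from scikit-learn 1.3 on.
    return version_tuple(v) >= (1, 3)


try:
    installed = version("scikit-learn")
    print(installed, supports_min_frequency(installed))
except PackageNotFoundError:
    print("scikit-learn is not installed in this environment")
```

If sklearn was already imported before the `pip install`, the running kernel keeps the old version loaded until it is restarted. The pip message about `spopt` and `shapely` is a conflict report about other packages and does not by itself mean the upgrade failed.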
candid canyon
#

I am having a problem while predicting using YOLOv8 with 2 GPUs. I have set a batch size of 16 but I am still having the same problem

high sigil
compact temple
#

Hi this may sound like a bit of a silly question but what channel is for the discussions on the Titanic competition?

plucky vector
#

If you go to id:customize you can see many more channels than in the standard list

rocky bolt
#

Hi I've cloned from a git repository, and i tried to open the files under the output/RadioUNet directory but to no avail. Is the behaviour as expected? I tried googling about it and chatgpt did say it is possible to open the files by clicking on it

marble raptor
#

I can't seem to find the channel for the kagglex competition even though I've verified myself on discord, any help?

glacial moth
#
```
import pandas as pd
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

medical_df = pd.read_csv('/kaggle/input/test12ssda/medical-charges.csv')

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

fig = px.histogram(medical_df,
                   x='age',
                   marginal='box',
                   nbins=47,
                   title='Distribution of Age')
fig.update_layout(bargap=0.1)
fig.show()
```
#

Someone please help

#

it's been an hour since I started trying to fix it

#

it doesn't show the graph but doesn't give any errors

#

and yes I printed medical_df and it was fine

#

so nothing wrong with the dataframe

#

@ruby pewter

stable dragon
#

Use copilot or LLMs to fix these errors, would be faster and easier

undone fulcrum
#

Does anyone know good ways to approach fitting classifier models on high dimensional vector embedding data?

#

As a part of my research at my university we are testing the accuracy of ML classifier models in network anomaly detection after the data has been converted to a textual format and embedded using LLM sentence embeddings. The goal is to measure the change in performance of the models with and without LLM embeddings used in preprocessing.

A problem I’m encountering is the datasets I work with are typically 1.5 million instances and when running the data through a sentence transformer model the dimensionality of the data skyrockets (about 350+ embedding features for the current model I’m using) and fitting the models we’re testing (RFC, ET, SVM, ADA, XGB, GB) takes a significant amount of time which is not good as my code gets run in Google Colab and constantly having to re up on compute tokens is not cost efficient at all.

I’ve thought about running the embeddings through PCA but I think the dimension would have to be reduced so much that it would be significantly diminishing the strength of the embeddings. Before I try this route, are there any suggestions on better ways to approach this problem that I don’t know of? Thank you!
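One way to check the PCA worry before committing to it is to look at how much variance the leading components keep. A rough sketch on synthetic data (the sizes and the 95% threshold are made up for illustration, and real sentence embeddings may behave quite differently):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for embeddings: 1000 rows, 384 dims, built from ~20 latent directions.
latent = rng.normal(size=(1000, 20))
X = latent @ rng.normal(size=(20, 384)) + 0.01 * rng.normal(size=(1000, 384))

# PCA via SVD of the centred data: squared singular values ~ variance per component.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained)

# Smallest number of components that keeps 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(k)
```

If k turns out to be close to the full dimension on a sample of the real embeddings, PCA probably is too lossy; if it is much smaller, the reduction may cost less than expected.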

glacial moth
obsidian pulsar
obsidian pulsar
#

Why not preprocess the data using the counters in the ingest module before encoding the data?

glacial moth
jaunty tangle
obsidian pulsar
glacial moth
#

so they have to update the libraries for the platform themselves?

#

I thought it was something similar to a VM

glacial moth
red gust
#

So I'm relatively new to the data science world (experience in JAVA, but thats pretty much it). I've been trying to learn Python, but from what I have seen, it seems that most data science work involves using python libraries more than anything–– coding jargon isn't too necessary. Correct me if im wrong, as I am still learning python

#

And while we're at it, can someone introduce me to the basic functions (with examples, don't need to be too in-depth) of each python library?

marble raptor
#

I had one question, in the leaderboards it says:

This leaderboard is calculated with approximately 20% of the test data. The final results will be based on the other 80%, so the final standings may be different.

how will you calculate the remaining 80% of the data given you dont know which model I used

plucky vector
#

I suppose, you have to submit a model or kaggle notebook on this competition

sleek relic
#

Is there a good way to propagate image augmentation transforms to a label when loading training data (torchvision)? I am trying to train some model with pixel coordinates on the image as labels, which means I need to figure what transforms are being applied to the image to transform the label as well
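One option is to apply each geometric transform to the image and the coordinates in the same function, so they cannot drift apart. A minimal sketch for a horizontal flip in plain NumPy (not the torchvision API; the function name is hypothetical):

```python
import numpy as np

def hflip_with_keypoints(image, keypoints):
    """Flip an (H, W) or (H, W, C) image left-right and move (x, y) points with it."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    moved = keypoints.astype(float).copy()
    moved[:, 0] = (width - 1) - moved[:, 0]   # x mirrors, y is unchanged
    return flipped, moved

image = np.zeros((4, 6), dtype=np.uint8)
image[1, 2] = 255                         # mark the labelled pixel
points = np.array([[2, 1]])               # (x, y) of that pixel
flipped, moved = hflip_with_keypoints(image, points)
x, y = moved[0].astype(int)
print(moved, flipped[y, x])               # the point still lands on the marked pixel
```

The same pattern extends to crops (subtract the crop offset) and resizes (scale x and y), as long as the random parameters are sampled once and used for both image and label.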

thorny marsh
#

Hello everyone, I need some advice on where to dedicate my time. Let's say I wanted to start a company down the road in the automotive/consulting business, if I'm working as a automotive mechanical design engineer, during my free time would you guys suggest learning more about the automotive industry/business or machine learning? I enjoy machine learning a decent bit but idk how transferable it will be for a business
I know that that is where the industry is going tho, EVs, auto driving, etc

obsidian pulsar
obsidian pulsar
obsidian bone
#

was anyone able to download LLM files from huggingface using git clone? mine got stuck

marble raptor
stoic cove
#

Hi there, I am working on a competition which requires me to submit the notebook and runs it in offline mode, I want to import SentenceTransformer module, how can I import that

#

?

sleek relic
arctic glade
plucky vector
arctic glade
#

no way to approve the rules, bcuz it says its not required

#

ive tried using Kaggle API also might i add

#

i got 403 permission denied

lapis pendant
#

anyone here ? I'm new.... I had a question, maybe in general too... so if I let's say had embeddings for my train set, would I have to run that over every time when I submit or can I save the embeddings on my kaggle directory and just look up the numpy array when I submit? I'm doing the essay prediction learning one...
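For the train set, saving the array once and loading it later is the usual pattern. A small sketch with made-up shapes and a temp directory standing in for the Kaggle working folder (in a code competition the hidden test rows would presumably still need to be embedded at submission time):

```python
import os
import tempfile
import numpy as np

# Stand-in for real embeddings: 5 essays, 8 dimensions (made-up sizes).
train_embeddings = np.random.default_rng(0).normal(size=(5, 8)).astype(np.float32)

# On Kaggle, files written under /kaggle/working become notebook output;
# a temp dir is used here so the sketch runs anywhere.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "train_embeddings.npy")
np.save(path, train_embeddings)

# Later (e.g. in the submission notebook): load instead of recomputing.
loaded = np.load(path)
print(loaded.shape, np.array_equal(loaded, train_embeddings))
```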

obsidian pulsar
main mica
#

Hello guys, the data viz course on kaggle

#

How to complete their final project huh

lapis pendant
#

how are people in the essay learning scoring one using the LB for their own models? seems like that won't fly in final submission?

lapis pendant
quiet crystal
#

Hey guys, I have one noob question.

I'm trying to make my first submission to a code competition. I did my work on Jupyter Colab by going through some baseline examples.

Pretty much every example uses deberta-v3-base as its model and tokenizer, and they create these by doing something like:

class PATHS:
    model_path = '/kaggle/input/huggingfacedebertav3variants/deberta-v3-base'

self.tokenizer = AutoTokenizer.from_pretrained(PATHS.model_path)
model = AutoModelForSequenceClassification.from_pretrained(PATHS.model_path, num_labels=CFG.num_labels)

I am not sure from where and how you can add the models/tokenizers to the kaggle/input directory. I tried this tutorial (https://www.kaggle.com/code/shravankumar147/save-huggingface-model-to-local-for-no-internet/notebook), but I still don't see deberta_v3_small_pretrained_model_pytorch created in my kaggle/input directory when browsing from the Jupyter notebook created through competition's submission page.

Would love to get help on this. Thanks!

obsidian pulsar
lapis pendant
#

can you submit a notebook via google colab or is that only for files like submission.csv? I ran out of GPU and they won't let me submit with GPU enabled now

deft fox
arctic glade
deft fox
arctic glade
# deft fox Join the competition in “Data” tab.

Im not sure how to do that, Ive clicked the license, it redirected me to the rules tab, but there is nothing to accept, when i click download, the same things happen (i.e rule tab). Ive tried the cli script, i got 403 permission denied. Can you guide me on how to join the competition via the data tab? thanks

deft fox
arctic glade
stoic lion
#

sorry to double post, but very urgent given two days before competition deadline on aimo: my team requires a dataset upload to submit our notebooks and it seems this is broken. can anyone help? submission deadline in 40 minutes

stray ledge
#

Hi Everyone, need help. I am trying to submit my notebook for AIMO competition but every time I am getting "Notebook threw exception" for the submission. I have put most of the code in try, except blocks to catch any exception. I just want to know what exception where it is throwing, please help I have spent days and nights to come this far. Here is the link to my notebook. https://www.kaggle.com/code/kiritidesarkar/aimo-deepseekmath-db-solved-trainingquestions. There are only 2-3 days left for the competition and I am not able to get even a score because of this exception.

stray ledge
lost dove
#

I am a beginner who just started learning and I had a doubt "Should we do imputation even when more than 50% of the values are missing ?"
if so what will be the best method?

tame aurora
#

but if a variable has 50% of its values missing i'd rather just drop it if it is insignificant anyways

#

try imputing and check the correlation with the dependent variable using
sns.heatmap(df.corr())
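A small sketch of that check on toy data (column names and values are invented; median imputation is just one possible choice):

```python
import numpy as np
import pandas as pd

# Toy data: 'feature' has 50% missing values, 'target' is the dependent variable.
df = pd.DataFrame({
    "feature": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan, 7.0, np.nan],
    "other":   [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
    "target":  [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1],
})

missing_share = df["feature"].isna().mean()          # fraction of missing rows
imputed = df.fillna({"feature": df["feature"].median()})

# Correlation of every column with the target after imputation.
corr_with_target = imputed.corr()["target"].drop("target")
print(missing_share)
print(corr_with_target)
```

Passing imputed.corr() to sns.heatmap draws the same matrix as a plot.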

flint copper
#

hey, I had a doubt related to the Titanic Dataset I have recently started Learning ML can anyone tell me what columns I should target to give the required output?

#

I am not able to understand what output they want?

graceful axle
#

who will be the winner?

stray mango
balmy yoke
#

Hello, I have little experience in the world of data science, but I want to participate in a competition to test my skills and acquire new ones. Could you recommend one, please?

graceful axle
#

How to get room direction in indoor photo/image.

stray mango
#

Try to build those problems model

#

It will surely help you gain a better understanding of model

broken nimbus
#

Hi, Is there any structured course that gives proper directions to Kaggle beginners on building profile? or any mentor in the group? Kindly guide. TIA!

stray mango
#

You should be okay once you do these guided projects

broken nimbus
#

not the single project, but the structured course

#

cz a lot of people are like me, new to kaggle but dont know how to build the profile. To start with, it's the titanic project, but then what's next, as in a series of tasks

sour crystal
#

Hi everyone, I’m new here and need some help. Can someone please explain what I need to do for this Skin Cancer competition in a simple way?

  1. What do I need to do? - Develop a program to identify dangerous skin spots from images.
  2. What language and tools should I use? - Use Python and tools like TensorFlow or PyTorch.
  3. Which data should I use? - Use the Training and Test data provided in the competition files or data from outside.

Thanks a lot!

fervent jolt
#

pheeww.. another newbie question: When I submit in a Notebook competition, does Kaggle simply replace test.csv with the hidden dataset? 🤔 My submission is taking too much time, so if that’s correct, I need to re-engineer my approach.

wild perch
#

Hi everyone, do you know if there exists a dataset for fake (LLM-generated) online reviews in the German language? I am looking for something similar to https://osf.io/tyue9/ but in German.
Thanks!

small musk
#

When I try to get the code, I get the too many requests message

rocky lintel
#

I'm looking for anyone interested in collaborating with me on an ML model for turning raster curves into vector (Bezier) curves. Is there a section on this Discord most appropriate for asking?

agile veldt
#

I'm calculating pAUC_80 using the implementation found on the ISIC skin cancer competition, but it is not matching up with the outputted score, any ideas on how to fix this? I want to test my model's true value without wasting a submission

wintry shell
#

Hello everyone, i have a question and i would really appreciate your assistance.
I have 2 networking and ip addresses data files with .RR format (ex: myipv6add.RR, myipv6add2.RR) and i want to extract into MySQL file .. how can i write a script in python to do that ?

wanton orchid
#

The kaggle quota of GPU resets every week right?

fervent jolt
uncut prism
#

when will participants for kaggleX fellowship program be notified if they got accepted or not?

verbal crest
agile veldt
fervent jolt
agile veldt
#

If you can export your model you might want to do that

#

and import it into another notebook

#

so you arent training it everytime you submit
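A rough sketch of that split, with a toy parameter dict standing in for a real trained model and a temp file standing in for a Kaggle dataset or notebook output (real frameworks have their own save and load calls):

```python
import os
import pickle
import tempfile

def predict(params, x):
    """Linear model: weight * x + bias."""
    return params["weight"] * x + params["bias"]

# Training notebook: pretend these were learned, then save them once.
params = {"weight": 2.0, "bias": 0.5}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(params, f)

# Submission notebook: load the saved parameters and only run inference.
with open(path, "rb") as f:
    restored = pickle.load(f)
print(predict(restored, 3.0))
```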

vale steeple
#

hello guys, are there any source that specifically collects data on traffic?

quick crag
#

hello i am working on projects across the web and participating in competitions as well as starting as a freelancer to enhance approach in the industry and work on real world data. DMs are open.

dapper yoke
#

Hey guys, how do I access my GPU and TPU in my kaggle notebooks?

#

Do i need to verify my phone number or smth?

dapper yoke
# sly inlet yes

alright, thank you very much! and after that, you can modify the settings in the notebook, right?

sly inlet
#

yes , you will get option for accelerator

dapper yoke
#

thanks man

quick spruce
#

hi, I recently got banned from my other account due to using prohibited code. How to get my account unbanned, I really learned my lesson now 😭

verbal hornet
#

Hi, i need some support on 1 dataset i've created, because it's very biased

sly inlet
wispy eagle
#

how do I team up with a friend?

verbal crest
#

@wispy eagle There is a "team" tab that allows you to add friends to your team to work together. You might need to refresh after accepting the rules.

quick spruce
sour adder
#

I have been working on the Image Segmentation task, but most of the notebooks I referred to have used MSE as the loss function to train the model; they compare the generated mask and the ground truth using the MSE loss. Generally for image segmentation IoU should be used, but when I tried that, my IoU loss stopped decreasing and stagnated at 90, while the MSE loss decreased quite well down to 0.02. Can anyone suggest why this is happening? Any help or suggestion will be quite helpful, thanks in advance

quick spruce
#

could anyone help me on getting my Kaggle account back 😔, I've been so depressed and stressed... My kaggle account is imduckman, I got banned after using prohibited code. My account has lots of my valuable code that now I cannot access. I really realized my wrongdoing and learned my hard lesson now... I really apologize for my violation, I promise I would never violate the rules again. Thanks a lot

lapis bane
#

hii everyone 👋, how do I upgrade to kaggle pro ? and I recently ran out of memory when deploying gamma model in my kaggle notebook, can someone tell how to get more memory from kaggle?

Thanks in advance!!

wooden halo
#

Hi, please when is the next application to register as a mentee opening ?

hallow tundra
#

how do i start with kaggle? I have done a machine learning course

fervent jolt
# agile veldt If you can export your model you might want to do that

@agile veldt Could you elaborate your explanation? I'm wondering how my notebook submission is scored. The host has hidden dataset. The hidden dataset should follow my algorithm on my notebook to get the results, and the results will be scored by the host's metrics. Then, the hidden dataset should replace test.csv in my notebook. Am I missing something in my logic?

high sigil
#

@wind silo Hello and sry for the ping :)
but, any updates on the results of KaggleX

uncut prism
graceful axle
#

Hello, everyone
I want to create new similar style music from 100 ambient music.

#

Please help me.

stuck monolith
#

hello guys, I have been doing ML for about 1 month and I have no problem understanding the models and maths

#

but whenever i start implementing it myself

#

on any dataset

#

i just go blank

#

and dont know what to do

#

can someone please help ??

eager fossil
#

Hello everyone, I am new to kaggle. I want to participate in competitions and started learning ml but don't know as a beginner how to participate in competitions etc., please anyone guide me.

dapper yoke
wind silo
eager fossil
#

thanks @dapper yoke

real shore
#

hello everyone, i'm looking for anyone interested in collaborating with me on an ML

dapper yoke
#

How can you implement the YOLOv8 or YOLOv5 architecture in tensorflow for image classification?

obsidian bone
#

cause i've seen some notebooks git cloning the repo

shrewd scarab
#

Hey guys, does anyone know if there is a data api that I can use to access my user data (ex: competitions, leaderboard rankings, notebooks)? I want to programmatically access my information so that I can display it to others without having to redirect them to kaggle.

quasi holly
stable dragon
#

I get this error while downloading the data via API, any suggestions?

403 - Forbidden - You must accept this competition's rules before you'll be able to download files.

PS: I have already accepted the rules for the competition

halcyon island
#

hello guys
my add-ons is not showing up in my kaggle notebooks
i have tried to sign out and sign in multiple times and created multiple notebooks yet I am unable to add my secret keys

obsidian bone
#

guys how do I use the attention mask like this when we have batches of sentences with different lengths?

#

like do I multiply the attention mask with the tokenized input, or do I input it separately to the transformer model?

#

context: I am not using Hugging face library, I built custom transformers based model
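A common approach is to pass the mask as a separate input and apply it to the attention scores before the softmax, rather than multiplying it with the token ids. A minimal NumPy sketch of that idea (the function name is hypothetical, and a custom model may use a different mask convention):

```python
import numpy as np

def masked_attention_weights(scores, attention_mask):
    """scores: (batch, seq, seq) raw QK^T / sqrt(d) values.
    attention_mask: (batch, seq), 1 for real tokens, 0 for padding."""
    # Broadcast the key-side mask over the query dimension.
    key_mask = attention_mask[:, None, :].astype(bool)
    # Padded keys get a large negative score so softmax gives them ~0 weight.
    masked = np.where(key_mask, scores, -1e9)
    masked = masked - masked.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.zeros((2, 3, 3))                 # two sentences padded to length 3
attention_mask = np.array([[1, 1, 1],
                           [1, 1, 0]])       # second sentence has one pad token
weights = masked_attention_weights(scores, attention_mask)
print(weights[1, 0])                         # pad position receives zero weight
```

Outputs at the padded query positions are usually ignored later, for example by masking them out of the loss or the pooling step.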

tacit ibex
#

How would i gain followers on kaggle and upvoted by people?

jovial leaf
#

Hi, i'm relatively new to ML. And i always get this far when i make models (then i make the plot of predicted values vs actual values) but that's it. How does this apply to the real world? What tools do you use? Do you have documentation of that or recommendations?
Because if I go to an interview and show them my script, i'll just sit there and not know how i could implement it into something "real."

candid tinsel
#

hey guys any attempt to use get_gcs_path() function just results in an error, does anyone know how to fix?

tacit ibex
tacit ibex
#

That's my GitHub follow me and see my diamond price prediction project

#

Also follow me on kaggle please

#

Hey can anyone help me with how I would make my own new dataset
I completed ml and I am very much introduced to the things, but for making a dataset, from where do we decide the columns and rows and specially the features in it and their data values
I am confused so anyone who makes datasets can HELP me please

prime tusk
#

Does any one help me to find some end to end data science industry project?
Mainly I looking for finding fraudster/bill defaulters using their credit history

Or finding out customer key insight of a business store using their all their customer transaction history
Mainly who is their target customer and who can be in future whom should they focus on

idle vortex
#

I am not able to download the model file from the working directory

#

any fix for that ?

dull sky
#

is there a way to early stop flaml automl? I did find hyperband and early stop, but not a standalone function

simple orchid
#

Whenever I want to tune hyperparameter with keras tuner it raises exception saying "RuntimeError: Number of consecutive failures exceeded the limit of 3". Can anyone help me how to solve this problem?

next glade
#

Hi, i was wondering if anyone knew any libraries that have a bot in a maze, where the bot has to find its way to a goal that may or may not move and whose location is unknown, using a neural network?

quasi holly
#

thank you so much please tag me if there are any answer to the question

tacit ibex
#

Can anyone help me how to make dataset..means I have idea about it and column name...but how can I find data for that column

dull sky
#

is there an IDE that has conda and jupyter integration and works reasonably well? I tried pycharm community, but the notebook is a paid feature...

golden lotus
#

Please Why is this not working please
i have successfully imported all the other documents for train and test

quasi holly
warm stream
#

how do you guys find unclean datasets on kaggle? I am doing data cleaning on structured data and i am looking for dirty datasets like housing prices or loans

#

or price predictions

#

it can be either classification or regression

subtle nexus
#

Hey, could you advise me on your favorite python libraries to quickly try out different model types (SVM, RandomForest, etc) on a dataset? Thank you!

plush oyster
#

Sklearn

lone roost
#

hi, i am new to kaggle. Can someone tell me how to make a comment in discussion?
Whenever i open any discussion page, i didn't have the permission to make a comment.

maiden imp
#

can i give kaggle competition on my own? no team.

golden lotus
quasi holly
# golden lotus i did

The issue cited is that train_data is not defined
Either variables name is wrong or
You might not have run all the cells after you edited them

I don't know if there is any other reason that could happen, try rerunning all the cells

real ravine
#

Where can I better understand how a notebook submission should look like for a competition? Thanks ❤️

earnest shell
#

Hi everyone, I'm stuck on a part of my code, I need to predict the price of gold in India.
I chose to use the ARIMA model, but for some reason the value is repeated after the ninth index.

I already tried:

  • Put the ''Date'' column in the index.
  • Keep the date column in the dataframe.
  • Turned the ''Price'' column into a list, and it didn't work.

Could someone help me?
PS: Sorry for my bad english, I'm brazilian, and actually I don't have time to become more fluent

quasi holly
stuck monolith
#

#❓┊ask-a-question guys I do not have a master degree is it good for me to continue pursuing ML (I am currently 2nd year b.tech) ??

#

are there any jobs I can do without a masters degree?

real ravine
# quasi holly look at other notebook how they submit it, i do that

Thanks for the answer. I've been looking through notebooks under the "Code" section of the competition, but I don't get how I could know if this is a submission notebook or just some general exploration or something else. Is there any way to know what an actual submission would look like?

marble raptor
real ravine
#

Thanks for answering. I think my question is really this:
What is the notebook submission supposed to have?
Does it have to load data, train a model, and write results?
Many thanks

marble raptor
#

you dont submit the notebooks, just that csv file having the predictions

real ravine
#

But now I'm trying to do the ISIC competition with friends, so I guess I need to submit a notebook so that I can generate predictions on the hidden test set

clear crescent
#

hey. i uploaded my prediction csv file. how can i know if it is ok or not? how to check?

marble raptor
pulsar dagger
#

hey everyone, I'm having trouble verifying my account using a phone number, I actually tried two of them but in both I get that "too many requests" orange message, and I've tried more than once in different times of the day but still to no avail, any help would be highly appreciated 🙏

real ravine
golden lotus
#

Please why am i getting this error

upper panther
#

Hi guys I just submit the first prediction titanic with the tutorial, but I am wondering how it goes from now, I do not think it is over. Right?!? tks

verbal crest
verbal crest
celest sphinx
#

Hey hey hey

#

Ive been trying to apply quantile loss function to my xgboost regression but something weird is happening

#

The 0.05 quantile seems to be working fine, but the 0.5 is wayyy off mark and the 0.95 is not even trying

#

(On the picture i made the xgboosts overfit the data just to confirm each quantile was doing its job properly)

#

And getting a flatline when trying to overfit tells me something is weird

celest sphinx
#

Ok i think i figured out the problem: the gradient of the quantile loss is always the same value, so if you're doing badly there is no incentive to go up or down and the xgboost is just stuck at 0
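A quick sketch of that observation (not the code from this thread, just the standard pinball loss written out in NumPy): the gradient with respect to the prediction only depends on which side of the target you are on, not on how far away you are, and the second derivative is zero almost everywhere.

```python
import numpy as np

def pinball_grad(y_true, y_pred, q):
    """Gradient of the quantile (pinball) loss with respect to y_pred."""
    # Under-prediction (y_pred < y_true) gives -q, otherwise 1 - q.
    return np.where(y_pred < y_true, -q, 1.0 - q)

y_true = np.array([10.0, 10.0, 10.0])
y_pred = np.array([0.0, 9.0, 9.99])     # very far, close, very close
grad = pinball_grad(y_true, y_pred, q=0.95)
hess = np.zeros_like(y_pred)            # second derivative is 0 almost everywhere
print(grad)                             # same value for all three predictions
```

Because a second-order booster relies on both the gradient and the Hessian, a custom objective like this gives it very little to work with, which would fit the flat line described above.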

#

So ill try to modify the quantile loss function so it knows it is getting closer

celest sphinx
#

Now it works well ☺️

golden lotus
celest sphinx
# golden lotus still waiting for help

Idk looks something like you're accessing the kernel multiple times or something but i think i recognize kaggle interface so it has to do with your connection to their website

#

Chatgpt works very well with copy paste-ing error messages

#

Give it a shot

gusty hound
#

hello everyone, I am planning to go to university, but I can’t decide on the program, can you tell me how important mathematics is in data science and how many thousands of hours it is best to spend on it in order to study it to the required level to work as a data scientist. Thank you very much in advance

stable dragon
#

Data Science is all about Mathematics

clever bay
#

In order to work in data science, is there a minimum of what skills you must have?

#

Also is freelance data science a thing?

winter jay
#

Hi, guys! Do I have to mention on the Kaggle forum that I'm using external data? I remember there was an External Data Thread in every competition but not anymore.

fierce canopy
#

Hi, how can I have more space on Kaggle?

verbal crest
stray lichen
#

in a project I need to get the following api keys.
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
DEEPGRAM_API_KEY=...
OPENAI_API_KEY=... I have difficulty getting the LIVEKIT_APi keys. how can I create one?

sly inlet
#

i believe you know they are not free

stray lichen
maiden imp
#

Not related to kaggle

https://www.youtube.com/watch?v=gDF_qGzYEYQ&list=PLqXS1b2lRpYTUHPp2MYkgXS7v6_qA-JsF&index=5

guys i was thinking of replicating this project in my laptop, will my 3050 6gb will be enough?

In this simple coding tutorial, you will learn how to make your own Generative AI application with Stable Diffusion, Docker, and Flask. The app will take in a user-provided text prompt and convert it into high-resolution images. (Total size of 2048 x 2048 pixels! 😱)

We will also dive into Docker Init, Diffusers, CUDA, FreeU, EDSR, OpenCV, and D...

compact snow
#

can I share my kaggle profile updates in #💬┊general ? if not here .. then where should I post them..? thanks..!

tulip surge
#

bruh did u do this project

shy flume
#

I created an ml model using imagenet v2 on a 12 class trash dataset and since it has achieved a high accuracy, I want to publish it as a notebook. What is, in general, the best format for a kaggle notebook?

lucid summit
dusty siren
#

hey! I am doing this little project to help detect cardiac arrest and cant find agonal breath datasets anywhere... any tips?

finite bridge
#

Hey Kagglers, I'm currently doing the Guided Tutorial for the Petal to the Metal competition. I understand what the error image is saying but I don't understand the long string of letters and numbers that comes after the "object at".

The image without the error message is where I'm assuming the error is pointing me to, but here's the problem: The error wants me to use the DefaultDistributionStrategy but I need to use the TPUStrategyV2 in order for this guided tutorial to actually be meaningful. I have no idea how to make the distribution strategies the same but all I know is that I need to keep using the TPUStrategyV2. Help would be appreciated 🙏

whole gazelle
#

🙂Does Kaggle offer an internship or pretraining program like Revature for job placement?🙂

obsidian pulsar
finite bridge
dusty siren
# obsidian pulsar PhysioBank or CAD

hi! thanks for the quick reply. i checked physiobank but kinda confused about what is on the website… it showed i. one result but when downloaded it was a text file of just random text… also not sure what CAD is

maiden imp
#

Guys your views on this book? For a beginner? What should be the prerequisites?

sly inlet
#

the only drawback i felt for this book is it uses tensorflow which is pain in the ass

brisk cargo
#

"Notebook Threw Exception" Error

Hi all, I’m seeing a "Notebook threw exception" error after submitting my notebook for the competition. It runs fine locally. Has anyone encountered and resolved this issue? Any tips would be appreciated!

Thanks,

tender galleon
#

anybody knows the reason for this ? I'm doing a simple project from a book I'm reading and this keeps happening which makes running any cell take way more time than it should

clever bay
#

2 things:
Do you recommend using Kaggle learn
Is it better do the notebook first and use the learn page for reference or just read the page and then do the notebook

quasi holly
modern trellis
#

Hey, I have a few questions regarding hyperparameter tuning for kaggle contest models:

  1. Where do you tune your hyperparameters? Locally or on kaggle notebooks (or maybe google colab?)?
  2. How long does it generally take? How to decrease tuning time?
  3. What's the best library for hyperparameter tuning? Optuna?
plush cairn
#

Hello everyone, hope you are well.
Is there a way to import a public github repository to my Kaggle notebook environment?

dusty mauve
#

Hi, does anyone know how to load in a pretrained image classification model in kaggle ?

heady karma
vivid sand
#

Hey all, this is a long shot, but anyone willing to have a discussion on ethics in data science? This can range anywhere from privacy to bias. And just full disclosure, I'm genuinenly interested in ethics, but this also happens to be part of a data ethics course i'm taking as part of my grad program. Any help or direction is appreciated! I've reached out on many outlets and not getting many hits unfortunately

fickle trail
#

This is a new topic to me

vivid sand
#

I'm not the best person as I'm still taking the class hehe 😅 but from what I know the name pretty much explains: any ethical issues that may arise within the field of data science would count. The examples I mentioned seem to be pretty wide-ranging. The first one has to do with not infringing upon people's privacy, with some examples being health data (HIPAA), Amazon Echo potentially listening to private conversations, Apple accessing biometric data such as face scans, and etc.

verbal crest
#

(This is just a subset of the broader topic of ethics in data science)

vivid sand
#

Thank you Myles. I should have mentioned that I'm looking to interview a data science professional about ethical dilemmas they've faced in their work as part of a homework assignment. I know this is a bit of shameless promotion on my part, but I've been asking around and I'm really unsure of where else to find interview candidates 😓

finite bridge
#

Hey Kagglers, I'm currently doing the Guided Tutorial for the Petal to the Metal competition. I understand what the error image is saying but I don't understand the long string of letters and numbers that comes after the "object at".

The image without the error message is where I'm assuming the error is pointing me to, but here's the problem: The error wants me to use the DefaultDistributionStrategy but I need to use the TPUStrategyV2 in order for this guided tutorial to actually be meaningful. I have no idea how to make the distribution strategies the same but all I know is that I need to keep using the TPUStrategyV2. Help would be appreciated 🙏

heady patio
#

Hi everyone! I want to make a facial emotion expression detection model with deep learning or machine learning. I have made a model which gives 60 accuracy rate with machine learning. The model I have tried was XGBoost. If you have a better model with deep learning can you suggest any? And the main question I needed answers is that how can I make Feature Extractions from faces. Like when you are angry your eyebrows gets closer or your mouth relocates a bit upper.

magic gate
#

Has anyone managed to run Co-DETR (https://github.com/Sense-X/Co-DETR) or DiffusionVID (https://github.com/sdroh1027/DiffusionVID) on colab? They are object detection models. I haven't been able to run them and can't find any way to do so. I wanted to see if anyone in the world has managed it lol

GitHub

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training - Sense-X/Co-DETR

GitHub

Official Repository of the paper "DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection" - sdroh1027/DiffusionVID

viral grove
jovial leaf
#

Is there a way for 2 people to edit a Kaggle notebook live? Something similar to Google Colab
I invited someone to edit, but he doesn't see my changes unless i hit save version, which isn't 100% live

plucky spire
jovial leaf
clever bay
#

I've learned we use pre-trained models when making convolutional networks in TensorFlow. Is it the same in PyTorch?

#

And how long does it take to train a CNN in PyTorch?

honest vortex
#

Hello. I am working on a problem where I need to change the labels (of the training data) in my .png files from the original 0 to 255 range to 0 and 1. I am trying to use an ML model that expects labels in 0/1 form. I have tried several approaches, such as thresholding and using NumPy, but I couldn't figure it out. Is there a specific way to do this?
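
One common way to do this, sketched with NumPy: treat every pixel above a threshold as foreground (1) and everything else as background (0). The small array below is a made-up stand-in for a loaded label image, and the threshold of 127 is an assumption; if the mask only ever contains 0 and 255, any threshold in between gives the same result.

```python
import numpy as np

def binarize_mask(mask, threshold=127):
    """Map a 0-255 label mask to a 0/1 mask of dtype uint8."""
    mask = np.asarray(mask)
    # The comparison yields booleans; casting to uint8 turns them into 0/1.
    return (mask > threshold).astype(np.uint8)

# Hypothetical 2x3 mask standing in for a label image loaded from a .png file.
mask = np.array([[0, 255, 255],
                 [0, 0, 255]], dtype=np.uint8)

binary = binarize_mask(mask)
print(binary.tolist())  # [[0, 1, 1], [0, 0, 1]]
```

If the label PNG has three channels, it would need to be reduced to a single channel before thresholding.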

fossil vapor
honest vortex
#

I'm working on a retinal vessel segmentation problem, and this is one of my eye images with its labels

pale lark
#

Hey, I have an issue here. I tried to use a dictionary, but it raises a KeyError even though I actually defined " 6" in my dictionary. Can anyone help?

thick terrace
#

How do machine learning models such as decision trees handle NaN values in the prediction target? I don't understand: imagine the root of a tree in a forest doesn't have a value.

tribal grail
#

Hey, I just installed CUDA, and as I understand it, CUDA lets your neural network use the GPU's power while training. My GPU is being utilized (20-30%), but my CPU is also shown at a full 100%. Is that normal, or have I done something wrong while installing CUDA?
I am running on VS Code btw

finite bridge
#

Hey Kagglers, I was looking at the list of competitions on Kaggle and I was wondering how to bridge the gap between only taking Kaggle's micro courses and actually entering real competitions with cash prizes and knowing how to solve the problem of competitions.

verbal crest
haughty wyvern
#

Guys, I'm trying to fit my model, but my CPU is always at 100%. How can I fix it?

empty mural
haughty wyvern
#

Tks

mint bough
#

Is anyone willing to take a few newbies under their wing and teach us and work with us in a group of 4?

delicate hill
#

can anyone guide me on how to find questions or prompts that can help with analysis in kaggle datasets?

dull sky
#

hey! how do I access the output files after a commit?

thick terrace
#

navigate in your file explorer to where your csv file is saved

#

i use the everything app

#

Q: can i use more submissions?

dull sky
#

@thick terrace at kaggle? okay, thanks

thick terrace
#

i was talking about your pc, but if you run the notebook on kaggle, sure, the directory has them

dull sky
#

Right, I found it. It turns out my previous attempt failed before it saved the model results.

#

The current one worked.

#

If I commit 2 (or more) notebooks at the same time, do they share the same cpu?

brittle bison
#

With regards to Kaggle competitions, I am a little confused about the given test set. Would it be bad to train the models on the training set, then test each model on the test set, submit it, and choose the submission with the lowest test error? I understand we don't want to overfit the test data, but the given test data isn't the "true" test data set anyway - the true one is hidden. So are we free to use it?

placid galleon
#

Hey, I wanted this dataset, DFL - Bundesliga Data Shootout, but it says that it was for the competition only. Is there some way to access it? I want to make a computer vision project.

deft fox
brittle bison
reef meteor
#

Hi everyone. I just completed the Titanic competition and am ready to make the submission. However, the help tutorial doesn't seem to match the current UI? Is there a way to actually submit directly from my notebook anymore?

radiant breach
#

Hi everyone, I have a dumb question:
the isic-2024 challenge requires internet to be off, right? So if I want to install some libraries, like 'pip install <library>', that won't be possible. What could be the alternative?

sullen lintel
dull sky
#

Hey there! I'm looking for EDA books, so far I found one decent, albeit old one. Could you recommend newer books in the topic?
Exploratory data analysis by Tukey, John W. (John Wilder), 1977

#

there are some Medium articles, but I'm looking for in-depth why, how, what kind of books

sacred sonnet
#

Guys how do you use hugging face's models in a local environment

deft fox
shy flume
#

hey guys i have a question about a dnn implementation

I have the following backward function:
`
import numpy as np

def backward(layer, dA_prev):
    # Calculate dZ by calling the activation_backwards() method from your Layer class
    # and pass to it dA_prev
    dZ = layer.activation_backwards(dA_prev)

    m = dA_prev.shape[1]

    # Calculate dW and db using the input to this layer (ie the activation of the previous layer).
    layer.dW = 1/m * np.dot(dZ, layer.input.T)
    layer.db = 1/m * np.sum(dZ, axis=1, keepdims=True)
    layer.db = np.squeeze(layer.db)

    # Compute gradient for the previous layer
    dA_prev = np.dot(layer.weights.T, dZ)

    return dA_prev

`
but I can't seem to pass the doctests
I'm wondering if anyone knows what's wrong. If you need extra information to get a gauge of the issue, pls ask.
Thanks!

toxic hollow
#

Hi, everybody.
I have a question
I need to extract the abstracts of papers without using GPT-4; I have to rely on local resources.

from py_pdf_parser.loaders import load_file
from py_pdf_parser.components import ElementOrdering

file_path = "JPM-2022-Harvey-25-46.pdf"

# Load the PDF once, passing the element ordering as a keyword argument
document = load_file(
    file_path, element_ordering=ElementOrdering.RIGHT_TO_LEFT_TOP_TO_BOTTOM
)

So I parsed the PDF using py_pdf_parser, and I'm going to merge the pieces until I obtain the complete abstract.
Now I'm trying to use embedding models for this, but that doesn't work well.
If somebody has a solution for this, please help me.
Thanks!

#

If I have to use LLMs, the model size should be under 2GB.

jade jetty
#

Hi all,

Does anyone have information about whether DEFCON will launch a CTF competition on Kaggle this year?

toxic hollow
#

Hi, everybody.
What are the challenges in implementing RAG, and what are the solutions to them?
If you can provide references, I'd be very thankful.
I've also read papers that use a BERT model to retrieve the necessary data.
I want to know whether that is still useful for RAG now. I think ChatGPT-4 is perfect for this task.
So I want to know about the use of LLMs and ML models in the retrieval process.

brittle bison
#

Is there any benefit to multiple submissions to a competition? Isn't there a danger of overfitting if you are going to choose your best performing submission on the leader board? Or do people do it as somehow an approximation of generalization error?

feral coral
#

hi guys, does anyone have any experience with roboflow models? I'm doing an ML project for the first time and I was instructed to make my dataset and train my model there, but I'm lost on how to proceed further. if anyone has any experience, please lmk so I can consult you. thank you in advance

hushed dirge
#

Hi Everyone, what would be the best data science learning path to get a job as Junior Data Scientist Role ?

deft fox
# brittle bison Is there any benefit to multiple submissions to a competition? Isn't there a dan...

There are always benefits to multiple submissions if one knows how to use the information properly. There is no solution that fits all scenarios, but in many cases it is possible to figure out how well public leaderboard (LB) scores correlate with hidden scores. If they do, then one can trust the LB. If not, it is necessary to develop a rigorous cross-validation (CV) scheme and stick to that. Come to think of it, it never hurts to have a good CV scheme, but sometimes we can trust the outside information as well.
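
A rough sketch of one way to check this during a competition, assuming you log your local CV score and the public LB score for each submission. The hidden scores are not visible until the end, so CV against public LB is the observable proxy here. All numbers below are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up scores for five submissions: local CV score and public LB score.
cv_scores = [0.781, 0.792, 0.799, 0.804, 0.810]
lb_scores = [0.776, 0.790, 0.795, 0.803, 0.806]

r = pearson(cv_scores, lb_scores)
print(f"CV vs. public LB correlation: {r:.3f}")
```

A correlation close to 1 suggests that CV and the public LB move together; a weak or negative one is a hint to rely on the CV scheme instead. With only a handful of submissions the estimate is noisy, so treat it as a sanity check rather than proof.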

deft fox
shadow arrow
#

hi, i've been working on playground series churn dataset classification
i have done basic data cleaning and OrdinalEncoding
using XGBClassifier with 150 estimators gives me an accuracy of around 75%
how can i increase the accuracy of the model

#

i have also tried using PyTorch
it shows a train accuracy of 78%
but when i see the actual output its either 0 or 1 for all the predicted values
how do i fix this

wanton orchid
#

Anyone familiar with NoBackendError in librosa ? The person who used code seemed to be fine but when I used exactly the same code I got this

deft fox
quick crag
#

Radhe Radhe buddies, can anyone suggest some projects, sources, or anything else to further upskill myself in data science, analytics, and machine learning, and some content to add to my resume? Thank you

dark raft
#

I have a question regarding EDA. Should we split the data into training and test sets and only do EDA on the training set? I've seen some articles say that this can prevent overfitting and data leakage.

clear stag
#

Just a thought. Does anybody feel that AI (i.e. ChatGPT, etc.) is completely overvalued... pure smoke?
It feels like there is FOMO, but when you try to do some stuff with it, it's not so great

deft fox
dark raft
slow narwhal
#

Hi !
I was wondering if you could win medals even if the competition has already ended and you beat the bronze score ?

sinful root
spring junco
#

How can I get badges from Kaggle?

empty belfry
#

Can I ask here some feedback about PC hardware specifications?

raven thunder
#

Hi, a Kaggle grandmaster, nischay dhankar, an alumnus of my college, gave a session on Kaggle. I don't have any background in coding. I am very fascinated by medical imaging. What could be the path?

wraith sun
#

Hi, I'm new to machine learning in general and I would like to ask where do I start? Like which specific math should I study first?

#

I want to be able to understand what is actually happening behind the scenes every time I train a model. Maybe with this I can make it perform better

torn vector
#

Has anyone here got 100% accuracy on the Titanic unseen data?
I always wonder how people on Kaggle leaderboards get 100% accuracy while I am struggling with 86%

drowsy solar
#

Are there any valuable courses or tutorials for someone just starting out?

meager bone
toxic hollow
#

Hi, everybody. I have a question.
I want to develop a method for designing a neural network architecture for a given real-world problem.
Is this possible?
I mean, can we derive a particular network architecture based on neuroscience?
Please help me with an overview of this and of the methods.
Where can I find proper references?

drowsy solar
meager bone
drowsy solar
silver bear
#

Hi Everybody,
If you are interested in conducting academic research in the fields of generative AI, NLP, and XAI, Please do not hesitate to contact me. 🙏

shut prairie
#

Hi, I've been trying for a couple of days to submit a notebook to a competition and I cannot do it because it says my notebook is using a non-versioned dataset. I have tried multiple times to pin the dataset to a version, but after I open that window it simply doesn't show anything, it just says "Loading…", although I've left it some time to load. My teammate has the same problem. I have also tried to download the dataset through the Kaggle API but it fails.
Any suggestions that I could try?

verbal crest
worn herald
# shut prairie Hi, I've been trying for a couple of days to submit a notebook to a competition ...

Hi Malina, sorry for the confusing UX. We need to do a better job of surfacing the reason for the failure. In this case, the problem is that the dataset you're trying to pin a specific version for (https://www.kaggle.com/datasets/gordonyip/binned-dataset-v3) is an "Unversioned" dataset, meaning that the creator of that dataset has determined that they only want the most recent version to be accessible to others. This may also be why the API is failing (though I'm not sure which command you're running/is failing). So the options at this point would be to:

  • Create a discussion on the dataset requesting that the creator update the settings for the dataset to allow access to all versions.
  • If the current version of the dataset is what you want to pin, download the source and re-upload as a fixed dataset under your control. You could do this via the UI or if you prefer the API, I can help you figure out why the API isn't working--I just need to know what command you're running.
merry thunder
#

Guys, I am a beginner in Kaggle and this might be a stupid question, but I pressed File, went to Upload Input, uploaded a dataset, and named it taxis-dataset1. Then when I enter it into the code, it says file not found. Why is this?

azure dove
worn herald
merry thunder
#

thanks guys

warm umbra
#

Hello everyone, I want to analyse log files with ML

I have some timestamped log files from the controllers, drivers, etc. of a device. I have seen some error codes in the log files whose causes are not known. I would like to analyze these log files and identify patterns, such as whether errors occur sequentially or whether one error depends on another. Errors could also be triggered by other errors, potentially from a day earlier or just a second earlier. What would be the best and simplest approach to this? If there are solutions to similar problems, please let me know

jagged spear
#

Hello

This is a very basic question which in my defence will probably be easy to answer, hehe.

I'm a soon to be second year AI and Datasci student engaged in the RSNA 2024 Lumbar Spine Degenerative Classification purely for the learning curves.

https://www.kaggle.com/competitions/rsna-2024-lumbar-spine-degenerative-classification

A peer of mine, perhaps correctly, says that we have to split the images into training, test and validation classifications. He wants to do this using code that randomly selects images and puts them into any one of the 3 categories.

However the competition already presents testing and training datasets with, I'm sure I remember correctly but couldn't find the documentation that details it, a final unseen set of images that it performs the classification on so as to determine the effectiveness of the model.
Also nowhere in the Efficientnet sample can I see anything that does that classification.

I think I am right here in that in terms of testing and validation the images are already classified and it's only through a dictionary that some of the images need the conditions and plains added to them.

Thanks for any and all help, any clarification will help a great deal.

zenith lodge
#

Does anyone work in or have experience with fraud detection? I'm super interested in the topic and have a lot of questions

torpid dock
#

Guys, where are the autocomplete settings in the Kaggle notebook? I need to complete the code step by step, but if I press Tab, all of the code is written at once

obsidian pulsar
hard gust
#

Hi - how do I create a team for challenges and add team members? It seems the Teams links for the challenges (Titanic and house prices) have expired: https://www.kaggle.com/c/titanic/team https://www.kaggle.com/c/house-prices-advanced-regression-techniques/team

terse badger
#

this is the cleaned and merged version of the columns of title bullet_points and description

#

I am unable to understand what preprocessing or feature engineering to do

#

Can anyone suggest something to start with?

toxic mantle
#

I want to ask if I need to reinstall the dependencies every time I restart?

fervent glen
#

I'm trying to do the Intermediate ML course, and I'm dealing with this error in the missing values exercise. Any ideas?

fervent glen
wild garden
#

hi, this might be a stupid question but do you need to become a data analyst first before becoming a data scientist?

fervent glen
# wild garden hi, this might be a stupid question but do you need to become a data analyst fir...
Reddit

Explore this post and more from the datascience community

Is data analyst a pre-requisite to become a data scientist? In this video, we are discussing common skillset between the two roles and what makes them different. What do you think? Share in comments if you think data scientist should first become data scientist.

My Self-Taught Data Science Journey: https://youtu.be/34r9OwjysDM
How I Would Lear...

On the fence between choosing to learn data science vs. data analytics? Learn why analytics acts as a prerequisite to being a data scientist.

cunning thunder
#

You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metric would give you the most confidence in your model?

I'm torn between Recall and F1 Score. Which is right in this scenario? I lean towards recall, and my reason is that F1 gives recall and precision equal weights; since my positive class is only 4% of the data and catching the positives is the priority, recall makes more sense here. I would love any help, thanks!

worn gale
#

Experienced data scientists, do you guys always write all your Python code from memory?
Or do you use online tools or AI, as we newbies do?

rich vortex
# cunning thunder You are working on a binary classification ML algorithm that detects whether an ...

F1 Score is the harmonic mean of Recall and Precision. Recall measures the fraction of actual positives that the model detects, and it matters most when positives are rare and missing them is costly, which seems to be your case. So, from that perspective, recall seems the sensible option. However, I suggest applying both metrics, as they provide quantitative assessments that describe your model's strengths and weaknesses, thus giving you more diagnostic options.
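
To make the difference concrete, here is a small plain-Python sketch with made-up data that mirrors the 96%/4% split from the question. Both "models" are hypothetical: one flags every document as containing the logo, the other finds 3 of the 4 logos with one false alarm.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up dataset: 100 documents, only 4 contain the logo.
y_true = [1] * 4 + [0] * 96

# Hypothetical model A: predicts "logo" for every document.
flag_all = [1] * 100
# Hypothetical model B: finds 3 of the 4 logos and raises 1 false alarm.
selective = [1, 1, 1, 0] + [1] + [0] * 95

p_all, r_all, f1_all = precision_recall_f1(y_true, flag_all)
p_sel, r_sel, f1_sel = precision_recall_f1(y_true, selective)
print(f"flag everything: precision={p_all:.2f} recall={r_all:.2f} f1={f1_all:.3f}")
print(f"selective:       precision={p_sel:.2f} recall={r_sel:.2f} f1={f1_sel:.3f}")
```

In this toy case recall alone ranks the "flag everything" model highest, while F1 drops sharply for it because its precision is only 4%. That is one reason to look at both numbers rather than recall on its own.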

real ravine
#

How do teams typically collaborate when working towards a Kaggle competition? GitHub, shared Kaggle notebooks, or something else?

dire cloud
#

I am taking a machine learning course that will be about 70% theory for exams. Does Kaggle have anything that I can practice regarding theory problems, or is there any resource that would make sense?

fast crown
#

Is anybody working on a comprehensive tool like "Social Listening", or does anyone have experience with such technology?

normal pier
#

Hello Everyone,

I've used the MLC-LLM chat APK from the link below:
https://blog.mlc.ai/2024/06/07/universal-LLM-deployment-engine-with-ML-compilation

  1. I've used the MLCChat app, and it shows only the models in the screenshot. How can I access my own models, which are on my device?

  2. Is it possible to use my local models in the MLCChat app?

  3. How can I do this with flutter?

Please help me out with this doubt!

plain mural
#

No module named 'kaggle_evaluation'
This is my first Kaggle competition. How can I import files into the notebook to be able to use such import statements?

uncut burrow
#

Hello everyone, I am new to Kaggle. How can I download csv files on kaggle ?

tepid shuttle
#

For any Kaggle competition we need to have internet disabled. Now I need to pip install some dependencies or updated libraries. How do I store them permanently in the notebook so that I can access those libraries with internet access disabled?

verbal crest
gleaming cradle
#

Hey I'm working on a dataset of Forest fires, model has to predict the probability of fire in the forest after getting some inputs from users. I've done most of the part but I'm getting a high MAE. Let me know if anyone can help!

spring cobalt
#

Can anyone help me? Whenever I submit, it shows an inference error and gets rejected

empty mural
#

Hello everyone

Please I have a problem.

I fine-tuned Gemma 2 in a Kaggle notebook, and now I would like to save the fine-tuned model and share it on Kaggle Models under my account. I don't know how to do that.

Please, can somebody help me 🤲

uneven flint
#

Hi

ripe blaze
#

Can someone explain how I know my model found the global minimum when tuning the weights? I just learned about gradient descent, and it's looking for the global minimum. But there are also local minima... How do I know gradient descent found the global minimum rather than a local minimum? Is there a way to visualise it, as proof that my model is using the best weights possible?

daring kiln
#

Did Kaggle change its UI? The notebook is now confusing with big words, can someone pls help?

ripe blaze
#

lol try ctrl + scroll down?

open dock
#

Hi . I am absolutely beginner in kaggle (just started) . I have a core i3 6th laptop with 4gb ram in linux system would that be enough to continue with kaggle or do I need a graphics card and more rams to operate minimally ? I am on a budget so I need some suggestions

eager fossil
#

Hey I tried kaggle course from start to learn ml

#

But I get stuck at some points

fading swift
#

Hello everyone, I’m looking for suggestions on how to become job-ready for a Machine Learning Engineer position. I’ve completed my certification and worked on various projects, but my resume hasn’t been getting selected. If anyone could share a strong resume example, I would greatly appreciate it.
Although I have over four years of experience in Operations, I am eager to transition into Data Science. I have been searching for a job since December last year, and I would welcome any ideas on how to land a position and kickstart my career.
Additionally, I’m interested in projects I can build to enhance my portfolio and skills I should focus on to become job-ready as soon as possible. So far, my projects have been fairly generic, and I want to stand out.

analog shale
# open dock Hi . I am absolutely beginner in kaggle (just started) . I have a core i3 6th la...

Those stats are not really great for data science on your local machine. If you want to do serious experiments on your local machine, a GPU + more RAM + a decent CPU (depends on how much data preprocessing you need) is a must. I would say, your current laptop will not suffice for more than basic exploratory data analysis.
But luckily, Kaggle does provide everyone with 30h/week of GPU and TPU compute, and there are more cloud providers with free/affordable GPU quota (e.g. Google Colab). Thus, I would not rush into buying a new setup, but instead see what you can achieve with the free cloud resources provided.

frank granite
#

hi all, im pretty new to using kaggle, but i have a few notebooks working fine but my current notebook is giving me an issue when trying to !git clone:

Cloning into 'testing'...
Username for 'https://github.com':

and it just hangs here

#

ok solved this - mistyped the url

stray lily
#

#help Im a beginner in ml . So how can i learn ml? I know i should start from preprocessing of data. But i don't have resources availabe for that . Can you guys share me the resource link?

dull canyon
#

Hello 👋 I'm training an autoencoder on signal data using LSTMs for anomaly detection.
For normalization I'm using sklearn's StandardScaler. For .fit(), should I pass in only the cleaned data without any deviating signals, or the entire data?

final galleon
dull canyon
final galleon
# dull canyon Yes that's what I've been doing. But on deployment after scaling I'm getting lar...

Hmm, I have never worked on deployed models, so maybe I am not the right one to give the right answer. However, if I understood you correctly, the predictions of your model should follow the scale of your label. Therefore, the outputs/predictions should not be on the [-3, 3] scale (the scale of the standard scaler).

Regarding the performance, sorry man, I don't have any ideas and don't know how to help. In fact, it happened to me once that I got low performance after training and testing compared with my test scores. I assumed this was because my model does not generalize well (I still haven't fixed the issue).
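
A small sketch of the scaling point, assuming the usual pattern: fit the scaler once on training data, reuse the same statistics at deployment, and map model outputs back to the original scale. The class below is a plain-Python stand-in for a standard scaler, not sklearn's implementation, and the signal values are made up.

```python
from statistics import mean, pstdev

class SimpleStandardScaler:
    """Plain-Python stand-in for a standard scaler (zero mean, unit variance)."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Population standard deviation; fall back to 1.0 for constant data.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, values):
        return [v * self.std_ + self.mean_ for v in values]

# Made-up training signal: the scaler statistics come from this data only.
train_signal = [10.0, 12.0, 14.0, 16.0, 18.0]
scaler = SimpleStandardScaler().fit(train_signal)

# Made-up deployment data: one typical value and one strongly deviating value.
live_signal = [14.0, 30.0]
scaled_live = scaler.transform(live_signal)          # same statistics as in training
restored = scaler.inverse_transform(scaled_live)     # back on the original scale

print([round(v, 3) for v in scaled_live])
print([round(v, 3) for v in restored])
```

Values that deviate strongly from the training data land far outside the typical scaled range, which is expected rather than a scaler bug. Refitting the scaler on deployment data would hide exactly the deviations an anomaly detector is supposed to flag.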

final galleon
# stray lily #help Im a beginner in ml . So how can i learn ml? I know i should start from pr...

In my humble opinion, pre-processing is not an ML skill in itself; rather, it is required to prepare data for ML models. Therefore, you don't need resources/courses on pre-processing. You just need to check the state of your raw data, spot the "noise", and work out the pre-processing you need to do. For instance, maybe you need to convert "15 years old" to 15 for the age feature.

This kind of pre-processing requires only basic programming skills.
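
For instance, the age conversion could look something like this minimal sketch. The `parse_age` helper and the sample strings are made up for illustration, and it simply takes the first integer it finds in each value.

```python
import re

def parse_age(raw):
    """Return the first integer found in a raw age value, or None if there is none."""
    match = re.search(r"\d+", str(raw))
    return int(match.group()) if match else None

# Made-up raw values as they might appear in an "age" column.
raw_ages = ["15 years old", "42", "age: 7", "unknown"]
ages = [parse_age(value) for value in raw_ages]
print(ages)  # [15, 42, 7, None]
```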

thin pelican
#

Hello, can anyone please help me with an idea for an AI-related final-year bachelor's IT project, or even a walk-through?

young fern
#

Hi, I wonder how much time I should spend on tuning the model, like finding the best parameters? I have spent some time doing feature engineering, but I'm not sure whether I should stick with my current model or keep trying new models / ensemble methods. Many thanks

cerulean sage
#

who is good with SQL? I have a little problem, i have a table called companies but it is not connecting to my database

vital venture
#

Hey, I want to learn machine learning using Python. I have completed the beginner stage. Is there anyone who can guide me?

steady vessel
#

Hey everyone! I have a competition coming up, and the first round is focused on data structures. I'm honestly starting from scratch and not sure where to begin. I'd really appreciate any suggestions for lectures or resources to help me get started. By the way, I prefer Python, but if you think C++ or another language would be better for this, please let me know!

solid pulsar
#

Hi, Who can help with the task of learning how to determine the value of a diamond?

misty hearth
#

Hello Kaggle Comm, I have a question. When I save a version of my notebook that has output files and then cancel it to save some hours,
and I later go back to this version and press Edit, how do I start the notebook with those output files? 🤔
Info I'm seeing:
Clicking Output, the message says:
Notebook canceled
View the status under the logs tab
On the Logs tab, under Environment,
I can click Latest Container Image and the output size is there 😮
How do I start this notebook with that data?

Thanks !

solid pulsar
#

Hi, I have a question for anyone who has worked on 'Diamonds Price Prediction'. I need to get MAE < 200, but when I computed the correlation matrix, I saw that Price has negative correlations with many features. Please help

winged slate
#

Here's a fun one: can one learn to use and build ML models without having linear algebra and calculus under their belt?

real patio
#

Hello @tardy lodge, the requirements for those seeking Kaggle mentorship are sort of high

tardy lodge
real patio
#

Thanks Ma @tardy lodge

empty bluff
empty bluff
wind silo
graceful axle
#

I'm getting a "submission file not found" error even though everything seems alright. Could somebody please help?
Kindly help me with this, somebody

uneven fractal
#

Hello! I’m an undergraduate student researching image reconstruction using a diffusion model. I know this is not Kaggle-related, but I'm encountering an issue with my diffusion model and wanted to seek some advice.

In my research, I’m trying to reconstruct one type of brain image from another using a diffusion model. Before using the diffusion model (when I directly used a U-Net for reconstruction), the training worked well. However, when I switched to predicting noise instead of the image itself using the diffusion model and then performing the denoising process, the training doesn't seem to work. The training loss decreases, but metrics like PSNR and SSIM (which evaluate how well the image is reconstructed) do not improve at all or even degrade.

Is this about my dataset being too small (I have 800 images in the training set)? I set the noising steps to 1000.

Has anyone experienced something similar when working with diffusion models? Any advice would be greatly appreciated. Please help..

Below is the code I use for training.

t = diffusion.sample_timesteps(n).to(config.device)
x_t, noise = diffusion.noise_images(latent_target_image,t)
predicted_noise = Unet(x_t, conditioning_3d_image, t, diag)
loss = L2(predicted_noise, noise)

sharp sun
#

Hello everyone, can anyone tell me exactly how I can start and move forward with learning AI and ML?

rain oar
sharp sun
#

Correct me if I am wrong, because I am new to Kaggle

#

But the main question is: what are the right prerequisites I should learn and understand so that I can participate?

#

Tech like ChatGPT, GitHub Copilot, and text-to-video generation really amazes me

rain oar
#

Go look at Kaggle Learn "Intermediate Machine Learning" you will understand how to submit to competitions in two quick lessons

#

You will be able to understand the code very quickly and the general format for a basic ML model. Then iterate

#

If you're curious about chatGPT and other LLMs, then look up videos/tutorials on how to build it from scratch

#

It's all software dev: sometimes you need to look something up to understand it, sometimes you need to actually go study and read up a lot more to understand it, but in the end it's doing projects and finding out what you need to learn to progress with your projects

#

What advice would you give someone new to software dev on how to begin? 😄

sharp sun
# rain oar What advice would you give someone new to software dev on how to begin? 😄

For software dev I would ask them to build some learning projects like a blog app, chat app, or video streaming app. In software dev the most important things are reliability, scalability, availability, and security. And for senior devs, go through system design.

And try to avoid tutorial hell and build apps, no matter how small or whether they have any purpose.

All the AI and ML tech is used in the form of software, so software dev is a must-have skill for building AI/ML apps

rain oar
sharp sun
#

Yes sir, thank you. Can you suggest some very basic AI/ML projects I can build and learn from on the go?

rain oar
#

Look up learning projects online to get a list, pick one that sounds the most interesting. Or just do kaggle competitions, as they are also considered learning projects. Projects in "Getting Started" and "Playground" categories are good place to start

wintry birch
#

Hello guys,
Do you know of any interesting datasets to refresh my pandas skills? I am mostly a beginner who has done two ML projects.

finite galleon
#

Hello, I am working on log data analysis. Most of the work on log parsing that I have found so far uses more or less similar methods, with very heavy ML/DL algorithms or heuristics.
I wanted to know more about log analysis using primitive methods like a PCFG parser or some unsupervised parser which takes the whole data into account.
My ultimate goal is to generate good quality templates. Please point me to any resources if you know of some.

split orchid
sharp sun
#

say it as exploring

split orchid
# sharp sun I am not pivoting I am just learning new tech

cool then, you can start with some problem statement and then traverse back from end goal to data
Example -
step1 - you want to identify if a given pic is cat or dog
step2 - you get to know that it is done by some model ( which is an artifact )
step3 - how was that model built
step4 - what does training mean and what is the role of data here
step5 - get the data

after this go forward again with the help of some already existing guided work- like kaggle notebooks

#

learning from scratch helps - but since you are exploring - this approach is more practical and will help you even more with hands-on work

sharp sun
split orchid
near plank
#

Hello everyone 🤗

slim storm
#

Instead of showing "Copy path" it is showing this in my kaggle notebook, can someone help ?

split orchid
split orchid
split orchid
slim basalt
#

WHHHHHHHHHHhY

why private competitions, can't be set public later?

lilac venture
#

Hello, I am working on the housing regression competition and I am fairly new. For a column such as Street, which has a range of string values such as gravel, pavement..., how should I encode it?

halcyon spindle
#

Does anyone have an AUTOMATIC1111 notebook working on Kaggle?

lucid wren
real patio
#

Hello @wind silo, I have DMed you the questions and am still waiting for your response. Thanks.

fierce gulch
#

Hello, I'm French and I'm training as a developer. I'm a beginner at using AI through APIs, and I want to go further than simple chatting with ChatGPT or other AI-assisted chats. I'm looking for a partner or a team to join, to learn and to spend sleepless nights racking my brain. (I don't speak English, so I'll get by in writing. Thanks.)

random narwhal
#

I wrote a classification program for logistic regression, but why does the cost function (j(w,x,y,b)) become larger after gradient descent?
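
One frequent cause (not necessarily the one here) is a learning rate that is too large for unscaled features; a sign error in the update is another. A minimal sketch with made-up data and hypothetical helper names, showing the cost rising after one step with a large rate and falling with a small one:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost(w, b, xs, ys):
    # Mean cross-entropy, written as log(1 + e^z) - y*z to avoid overflow.
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x + b
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    return total / len(xs)

def train(xs, ys, lr, steps):
    # Plain batch gradient descent; returns the cost before and after each step.
    w, b = 0.0, 0.0
    m = len(xs)
    history = [cost(w, b, xs, ys)]
    for _ in range(steps):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        dw = sum(e * x for e, x in zip(errs, xs)) / m
        db = sum(errs) / m
        w -= lr * dw  # subtract the gradient; adding it would increase the cost
        b -= lr * db
        history.append(cost(w, b, xs, ys))
    return history

# Made-up, deliberately unscaled feature values.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [0, 0, 1, 1]

small = train(xs, ys, lr=0.001, steps=100)
large = train(xs, ys, lr=1.0, steps=1)
print("small lr:", small[0], "->", small[-1])
print("large lr:", large[0], "->", large[1])
```

If the cost rises like the large-rate run, scaling the features or lowering the learning rate is worth trying before anything else.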

fierce gulch
random narwhal
orchid spoke
#

Hello everyone,

Can anyone help me figure out if it is possible to link a local codebase and use it in any hosted competition?

glass bronze
#

Do I have to use the Kaggle notebook?

spiral mesa
#

Hi all,

I recently saw a blog post about fine-tuning PaliGemma for a receipt scanner.

I have a few questions:

  • How can I create my dataset? Is there a tool I can use to define the boxes and the text?
  • I ran the model with Gradio locally, but it's very slow. Why? Do I need to convert the model to a specific format to use it locally?

Where can I find a tutorial, or which resources can I read, to learn this easily?

Many thanks

zinc cloud
#

Hello everyone, I am an engineering student (mechanical) and I am interested in Math, AI and coding. I have been learning ML algorithms and EDA little by little for some time and I want to learn more. I don't have a particular goal in mind like I don't know if I want to work as an ML engineer for a company or maybe do research work, I am just learning and exploring this field. What would be your advice for someone like me in order to learn and level up in this field?

wooden folio
#

Hey, I don't know if this is the right place to ask, but anyway: I have learned a little ML and I want an ML job, but most entry-level jobs at popular companies require at least 2 years of experience. The problem, obviously, is that I want the job in order to gain experience. I am willing to take a part-time remote job for as little as 10k dollars a year just to really get into the field, or even work for free. I don't know if "work" is the right term for it, but I want to learn from someone who knows more than me.

wet fjord
#

Hi, I'm new to ML and I've been trying to build a prediction model to identify handwritten digits. I've been stuck because my model's accuracy is always 10-13%, and I've spent days exhausting every tip I could find online. It's my first project in this area, so sorry if I seem clueless. I've had great results using pretrained models, which is why I wanted to go a step further and build one myself, but I'm cooked.

I just noticed the beginner digit section, and it would be easy to just look there for the answer. But I don't really get what the others did; it mainly looks like they added more filters and used other tactics.

If anyone can answer: what models are generally recommended and why? I've only touched Adam.
And how do you choose the filters, and why? I used 32, 64, and 128 with some dropout, but I noticed others use more filters, as well as follow-ups for them, and heavy dropout. It seems crazy; maybe I'm a bad teacher.

Final question: why don't I see the best model used a lot? Is it not good in the long run? Sorry.

flat swan
#

Can someone help me?

willow relic
#

Hello everyone,

I'm an ML engineer apprentice, currently working as an intern for a startup on a GenAI project. I've done some academic projects before, and now I want to gain real-world experience by working on real projects. Just like @wooden folio , I'm looking for a part-time remote job and would even work for free. PS: I'm motivated and eager to get my hands on exciting projects.

If you are interested or need additional information, feel free to DM me.

Kind regards,

Stephane

timber ocean
#

I am curious about how you guys manage your computing requirements.

What's your setup?

potent niche
#

Hi guys 🙂 I am working on a prototype of a motion sensor with an API that sends already-labelled data over Wi-Fi in real time. It works like a data collector but smaller, so it doesn't bias the movement. Based on your experience, do you have any suggestions for additional features?

empty bluff
hoary warren
#

Hey, when is the SGD book going to get fixed? There's a problem with seaborn 😩

tender light
#

Hi everyone!

I'm super new to all of this and am trying to make a simple model that can detect the position of a basketball in an image! I am using this dataset for training: https://www.kaggle.com/datasets/trainingdatapro/basketball-tracking-dataset/data. While setting everything up (just plotting the bounding boxes for the images I already have), I noticed a small issue when rescaling the points to the image size given in the files.

If you look at the XML file for the data, it says the original size of the image was 1280 by 720, yet on some of the images the bounding box values seemed incorrect, as they were extremely small.

(these are x1, y1, x2, y2)

For example, the values for the second image were 966, 408, 987, 429. These just seemed off for an image that looked like this (I can't send it here, but I can send it in DMs).

Also, in the first image, for example, the basketball is just outside of the image. How does that work?

Please help, either here or in DMs!!

tender light
#

🙏

subtle willow
#

Hi everyone! Hope everyone is good. I'm new to Kaggle.
I just started the Titanic tutorial and am a bit stuck
on "Part 3: Your first Submission".

I managed to get the correct output for the % of women who survived.
I also managed to get the correct output for the % of men who survived.

But when I add the last part in order to get the output "Submission was successfully saved!",
I get this error message:

NameError Traceback (most recent call last)
Cell In[16], line 7
5 features = ["Pclass", "Sex", "SibSp", "Parch"]
6 X = pd.get_dummies(train_data[features])
----> 7 X_test = pd.get_dummies(test_data[features])
9 model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
10 model.fit(X, y)

NameError: name 'test_data' is not defined

Any advice?
Thanks!
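
The traceback says `test_data` does not exist in the current session, which usually means the cell that loads the test file was never run (or the notebook restarted since). A minimal sketch of the order of operations, with tiny made-up frames standing in for the competition's CSV files:

```python
import pandas as pd

# Made-up stand-ins for the tutorial data. In the Kaggle notebook these two
# frames would come from pd.read_csv on the competition's train and test files.
train_data = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Survived": [0, 1, 1],
})
test_data = pd.DataFrame({
    "Pclass": [3, 2],
    "Sex": ["male", "female"],
    "SibSp": [0, 1],
    "Parch": [0, 0],
})

# test_data must be defined before this point, otherwise Python raises NameError.
features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
print(X_test.columns.tolist())
```

Re-running the notebook's cells from the top in order is usually enough to bring the missing variable back.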

thorn summit
subtle willow
rain oar
tender light
rain oar
smoky scarab
#

Hello. I'm using Kaggle for the first time.
Even though my PC is connected to the internet, my Kaggle notebook cannot connect to it. Below is what I tried. The result says the internet is not available. How can I connect to the internet?

#

import requests
try:
    response = requests.get('https://www.google.com')
    if response.status_code == 200:
        print("OK")
    else:
        print(f"OK_But: {response.status_code}")
except requests.ConnectionError:
    print("No")

livid kestrel
#

Hi. I think the last time I submitted anything on Kaggle was probably the first ever Jane Street competition, about 6 or 8 years ago. This is a pretty naïve question, but for the submissions, does our submissions notebook have to also train the models we use, or is it possible to train the models separately, load them into the notebook and just do the inference/predictions?

#

Actually I'm specifically asking for the Jane Street competition so this might not apply to other competitions?

last fern
smoky scarab
# last fern

Since I want to create content that supports Japanese, I would like to use "rinna/llama-3-youko-8b". First, I need to install transformers, which requires an internet connection. How can I do this without using the internet?

last fern
smoky scarab
sterile barn
#

https://www.kaggle.com/datasets/jakubkhalponiak/phones-2024
https://www.kaggle.com/code/jakubkhalponiak/a-study-of-smartphones-available-in-2024

I have web-scraped phones from gsmarena.com and published a notebook and the dataset. I would appreciate any feedback, as it's my first time posting anything on Kaggle.

last fern
languid fern
#

Hi, I am currently going through the book Deep Work by Cal Newport.

Now I am curious: what does your learning/working process look like? What are your habits? 🙂

For example, you participate in Kaggle competitions or watch/read ML tutorials.

smoky scarab
last fern
#

check this out, I have reproduced your problem

smoky scarab
smoky scarab
last fern
last fern
smoky scarab
smoky scarab
#

I am trying again.
The GPU is not working, only the CPU is working.

#

And finally this happens. How should I solve it?

tranquil gull
#

Hi Guys,

I’m a CS student currently doing the final project for my course. The problem I’m addressing is predicting whether a stock is going to go up or down, so that I can either sell or buy.

I was wondering if anyone has come across a dataset that fits this description?

Thanks in Advance!

pseudo storm
#

Why do people use MSE rather than MAE in most cases? Isn't MSE only better than MAE when we want fewer big errors, even at the cost of a larger error on average? Is it usual to care more about big errors than about errors in general? To me, MSE looks a bit niche, and MAE should be used in most cases instead.
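
The trade-off described above can be checked with a few lines of arithmetic. The two error lists below are made up: both have the same average absolute error, but one concentrates it in a single large miss.

```python
def mae(errors):
    # Mean absolute error: every unit of error counts the same.
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    # Mean squared error: large errors are weighted quadratically.
    return sum(e * e for e in errors) / len(errors)

# Made-up residuals (prediction minus target) for two hypothetical models.
even = [2, 2, 2, 2]    # consistently a little off
spiky = [0, 0, 0, 8]   # usually exact, occasionally far off

print("MAE:", mae(even), mae(spiky))
print("MSE:", mse(even), mse(spiky))
```

MAE cannot tell these two models apart, while MSE ranks the spiky one four times worse. Squared error is also differentiable everywhere, which is one practical reason it shows up as a default training loss.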

tender dagger
#

Hello, I am an absolute beginner in machine learning. I just learned about linear regression and error metrics and wanted to try a small project using the techniques I learnt. So I started with the famous boston-housing-prices dataset on Kaggle and would appreciate it if you could take a look at my code: https://www.kaggle.com/code/khalidhelmy55/boston-housing-prices and tell me what is missing or what could be done better.
According to the metrics I calculated, the model is not performing well.

compact snow
#

Question 1:
I’m running my error metrics locally (e.g., RMSE, MAE) on my validation set while participating in a Kaggle competition. Since the test set lacks target values, can anyone help clarify which error metrics I can use to assess my model locally, and if possible, could you list some commonly used ones?

Question 2:
Also, I’m noticing that the error metric scores I compute locally are different from the Kaggle leaderboard score. How are these related? Are the scores directly or inversely proportional, or is there another relationship I should be aware of? Any insights would be greatly appreciated!

rain oar
drowsy plank
#

Hi I'm looking to improve in the field of Marketing, is there any type of dataset or competition to make a recommendation system? thanks

velvet tinsel
#

I am a 3rd-year CS undergrad with intermediate knowledge of all the data structures. I study hard and put in the effort. I need DSA to land a high-package placement, so I am practicing my DSA skills on LeetCode. But I find it difficult to figure out the logic by myself (especially for medium and hard problems), and even when I read a solution and understand it, it doesn't stick in my mind. Why is this happening, and how do I tackle it? Will I ever get better? Could you also suggest what approach or steps I should follow when solving a particular question?

wraith sparrow
#

Is anyone using AllenNLP? Is it no longer maintained by AllenAI?

hollow nova
#

Hello. I am currently working on the Pandas section on Kaggle and I have a question. Can I ask it here?

#

nvm thank you i figured it out

stone anchor
#

I'm trying to learn more about Bayesian optimization techniques for machine/deep learning. Any good YouTube series recommendations?

lost crag
#

Hey guys, I want to become a data analyst and perhaps transition to data scientist later on. Would it be enough to focus just on Python and R? The programming for this field seems very different from other fields like web development.

rain oar
lost crag
#

I'm aiming to go pro at this stuff.

rain oar
#

Start small, learn the basics, take on challenging tasks that get you out of your comfort zone, learn more, take on more projects, learn more, rinse and repeat until you're a pro 😄

earnest cliff
#

Where can I find conferences or talks about data and all that world?

lost crag
#

Is there a guide to the math I have to learn that you guys recommend? I've been learning on my own, but it won't hurt to learn from well-known free resources.

errant glen
#

Lin alg

#

And calc 3

#

Obviously

#

More complex versions of that combination exist

#

But

#

Most ML requires those two