#❓┊ask-a-question

muted cliff
#

They should help you with that

golden nova
#

Yea, completed a few of them.

muted cliff
#

Including those that talk about different models and cross validation ?

#

For instance did you try and start the last playground challenge ?

golden nova
#

Nah.. just the introduction to ML once..

#

Okay, so if I specify my problem,

muted cliff
#

Yeah it's better if you tell us where you are stuck 🙂

golden nova
#

I have a dataset and I've decided to build a model from it, but I don't know where to start.
Like what to do with the data, which labels to choose, etc.

#

Hope you are trying to decode what I'm trying to say!?

muted cliff
#

Yes I'm here no worries

#

So the first step is to look at your data, determine what kind of data is here

#

Are there continuous features ? Categorical features ?

#

(categorical means it takes a small number of values that don't really make numerical sense. For instance, gender is a categorical feature, while the balance in your bank account is numerical because it can take many values)

golden nova
#

Okayy...

muted cliff
#

Then you need to determine what you want your model to do

#

is it a regression or a classification problem ?

#

So do you need to predict the price of something, or classify your data into classes

golden nova
#

It's for classification if I'm not wrong.

muted cliff
#

Ok cool

#

Then you need to look if there are any missing values

#

If you want a first model up and running quickly :

  1. Remove all rows with missing values (data.dropna())
  2. Keep only continuous features (those that already have number data types)
  3. Use a model such as XGBClassifier
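
A minimal sketch of steps 1 and 2, assuming your data is already in a pandas DataFrame. The toy table and the `baseline_frame` helper are made up for illustration; step 3 is left as a comment because it needs the xgboost package installed.

```python
import pandas as pd


def baseline_frame(data: pd.DataFrame, target: str):
    """Steps 1 and 2 of the quick baseline (hypothetical helper)."""
    clean = data.dropna()  # 1. remove every row that has a missing value
    y = clean[target]
    # 2. keep only the columns that already have a numeric dtype
    X = clean.drop(columns=[target]).select_dtypes(include="number")
    return X, y


# Toy data, invented purely for this example.
data = pd.DataFrame({
    "age": [25, 32, None, 41, 58],
    "balance": [1200.0, 560.5, 90.0, 7800.0, 150.0],
    "gender": ["F", "M", "M", "F", None],
    "churned": [0, 1, 0, 1, 0],
})

X, y = baseline_frame(data, target="churned")
print(X.shape, list(X.columns))

# 3. Fit a first model (requires `pip install xgboost`):
# from xgboost import XGBClassifier
# model = XGBClassifier().fit(X, y)
```

Note that `dropna()` here also throws away rows whose only gap is in a column you discard in step 2, which is fine for a first quick run but worth revisiting later.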
#

Then you can look at all three of those parts and make your model better

#
  1. How can you keep the rows with missing values ? Find ways to impute values in those rows. There's a class about that on Kaggle
#
  2. Try encoding the categorical features into continuous features. There are various ways to do that and there's also a class about that on Kaggle
#
  3. Try making your model better : maybe another classifier is better, or you can tune the hyperparameters, or you can do advanced things such as combining different models to get better results
#

those are the 3 basic things you should always do after your first model is up and running

#

anything unclear ?

golden nova
#

Okay.. not understood it fully.. but got the building blocks to it.

#

Thanks for the advice.. I guess will be more clear when I implement it once.

muted cliff
#

yes

#

I strongly suggest following the "Intermediate Machine Learning" class on kaggle

golden nova
#

Okay buddy.!

#

On it!

lapis totem
#

Might be the wrong channel, so apologies if that's the case, but what does **random_state** do in this line?
smote = SMOTE(sampling_strategy={1: desired_fraud_cases}, random_state=42)

If I use 0, it will always split the test and train data the same, but with random it will always be different? Is that correct?
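
For reference, a small sketch of what a seed like this does. As far as I understand it, `random_state` in SMOTE seeds the random choices made while generating synthetic samples (it does not split train and test), and any fixed integer, 0 or 42, makes the result repeatable, while leaving it as `None` gives a different result on each run. The sketch uses NumPy directly rather than SMOTE; `shuffled_indices` is a made-up helper.

```python
import numpy as np


def shuffled_indices(n, random_state=None):
    """Return a random ordering of 0..n-1 (hypothetical helper)."""
    # An integer seed makes the generator deterministic;
    # None seeds it from the operating system, so every run differs.
    rng = np.random.RandomState(random_state)
    return rng.permutation(n)


a = shuffled_indices(10, random_state=42)
b = shuffled_indices(10, random_state=42)
c = shuffled_indices(10, random_state=0)

print(a, b)  # same seed: identical order
print(c)     # different seed: another order, but also repeatable
```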

desert tusk
#

Hi all, I am looking for a multilingual model with long sequence length for classification task. Any ideas?

wicked aspen
#

I understand that we can use the command line API to submit a submission.csv. But what about code competitions? Can we submit a notebook via the command line?

muted cliff
frozen sail
# wicked aspen I understand that we can use the command line API to submit a submission.csv. Bu...

Yes, you can submit a notebook to a Kaggle competition via the command line. Here are the steps to do so:

  1. First, you need to install the Kaggle API by running the following command in your terminal:
!pip install kaggle --upgrade
  2. Next, you need to provide your Kaggle credentials using the file kaggle.json or setting some environment variables with your Kaggle credentials. You can get your Kaggle credentials from kaggle.com > 'Account' > "Create new API token". Here is an example of how to set your credentials as environment variables:
%env KAGGLE_USERNAME=abc
%env KAGGLE_KEY=12341341
  3. Finally, you can submit your notebook to a Kaggle competition by running the following command in your terminal:
!kaggle competitions submit -c <competition-name> -f <notebook-name>.ipynb -m "<submission-message>"

Here, <competition-name> is the name of the competition you want to submit to, <notebook-name>.ipynb is the name of your notebook file, and <submission-message> is a message describing your submission.

Please note that the kernel must be of type "Script" and not "Notebook" for this method to work. I hope this helps!

Source: Conversation with Bing, 05/01/2024

delicate lichen
#

Do papers ever come out of these competitions or is it mostly just large ensemble models that have had the entirety of AWS thrown at them?
Can authors even make papers or are the solutions always fully closed?
I've never competed, so I don't really know what the environment's like.

wise forge
#

What does it mean for a competition to be a code competition?

Hey kagglers, I'm so confused now about the rules of the so-called "code competitions". I came across the code competition Bengali.AI Speech Recognition, where the first-place solution clearly stated that the author used 8x 48GB RTX A6000 for training. Does that mean the rules of the competition have changed, or did I miss something?

I will state my understanding of code competitions to check whether it is correct: in a code competition you are not allowed to use any compute power other than the one provided in Kaggle notebooks. Furthermore, your code and inference should not exceed 9 hours of running time to produce its output. This is to make the game fair for the GPU-poor like myself.

delicate lichen
#

At least, that's my reading of the link.

frozen sail
#

yeah what vibe says is right

frozen sail
#

You can find many solutions in notebooks in Kaggle too, people share a lot
Especially in playground-series competitions or when the competition is finished

muted cliff
#

Can you beat auto ml in data science ?

#

I'm talking specifically about the modeling part : which is parameter tuning and ensembling

#

It feels like modern auto ml tools use enough models and smart ensembling that it makes it difficult to do something better by hand

#

For instance during the current challenge, I spent quite a bit of time finding many parameters for my models, both myself with optuna and by stealing them from other notebooks. Then I just trained an auto ml suite for ~ 12 hours on my laptop and it beat what I did by quite a big margin in cross validation (similar on public leaderboard but it doesn't mean much I think)

coral tartan
#

Hi, I'm still pretty new to the world of data science and machine learning. I have been working on random forest model for regression/classification problems. I have now started learning neural networks.

My question is which model is better to choose?

Thanks in advance!

muted cliff
#

For tabular data (which is the case for example for playground series contest and a lot of data science) it's hard to beat gradient boosting algorithms such as xgboost, lightgbm or catboost with neural networks

#

I think it's good to use when you do ensembling

#

basically you train many models, your best models will usually be xgboost or lightgbm, and then you add some less good models such as neural network, random forest and so on. Then you apply algorithms to ensemble predictions of all those models to get better results
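
A minimal sketch of one simple way to combine predictions, a weighted average of class probabilities. The probabilities and weights below are invented for illustration, and real ensembling methods (stacking, for example) can be more involved than this.

```python
import numpy as np

# Made-up class probabilities from three already-trained models.
# Rows are samples, columns are classes.
proba_xgb = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
proba_lgbm = np.array([[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]])
proba_nn = np.array([[0.3, 0.5, 0.2], [0.1, 0.2, 0.7]])

# Assumed weights: the stronger models count more than the weaker one.
weights = np.array([0.4, 0.4, 0.2])

stacked = np.stack([proba_xgb, proba_lgbm, proba_nn])  # (models, samples, classes)
blend = np.tensordot(weights, stacked, axes=1)         # (samples, classes)
pred = blend.argmax(axis=1)                            # predicted class per sample

print(blend)
print(pred)
```

In practice the weights would be chosen on validation data, not by hand.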

#

but if you're starting I do not recommend trying neural networks

#

If you want to read more about it, I suggest reading the writeup of rank 2 in the last playground series which was a multi-class prediction problem :

https://www.kaggle.com/competitions/playground-series-s3e26/discussion/464887

And if you want to deep dive, you can read about things mentioned here
https://sebastianraschka.com/blog/2022/deep-learning-for-tabular-data.html

#

To summarize : He says neural networks as baseline models marginally improved results in ensembling
but neural networks were very efficient at predicting classes from the predictions of other models

#

if you have questions feel free to ask @coral tartan

#

there's a lot of concepts here 🙂

coral tartan
#

Thank you for the detailed answer! I will read it through

muted cliff
#

it's normal if you don't understand everything

lapis totem
#

I’m using a RF algorithm to detect credit card fraud, before tuning I have about 95.5% accuracy, after tuning (using a grid search) I still get 95.5% (all my other metrics are the same too) is it normal for it not to increase after tuning? Should I look into other options to increase results?

muted cliff
#

which parameters are you tuning + could we get your code ?

delicate lichen
muted cliff
#

ok thank you vibe !

#

I will try to get the best model as I can without auto ml to learn then

#

My future company won't be happy otherwise 🙂

#

actually

#

how is tuning with optuna any better than using auto ml ?

#

in terms of efficiency

delicate lichen
#

I haven't used either but my guess is hyperparameter tuning vs neural architecture search.

muted cliff
#

When I use optuna to optimize it regularly uses 6 hours total as well if I do many trials

#

Ok maybe for neural architecture search the cost is massive 🙂

delicate lichen
#

NAS is a much harder problem

muted cliff
#

I was more thinking about data science without nn

#

just on tabular data

#

(I know deep learning models can improve ensembles but usually it's quite small)

delicate lichen
#

DL models can be parts of ensembles, be whole ensembles themselves, and can be used to merge classical ML methods. I haven't done much on tabular data, so out of my expertise there.

muted cliff
#

I think DL to do ensemble instead of being part of an ensemble is very powerful

#

people have won playground contests doing that

delicate lichen
#

I straight up don't use classical ML methods in practice because inference speed is critical at work.

#

But they can definitely be useful as they can have better priors than NNs

muted cliff
#

(I'm very new to Kaggle)

#

Is it like xgboost, lgbm, catboost ?

frozen sail
# muted cliff wdym by classical ML models ?

Classical Machine Learning (ML) Methods:
Classical ML methods refer to traditional or conventional approaches to solving machine learning problems that were widely used before the rise of deep learning and neural networks. These methods include:

  1. Linear Regression: Used for predicting a continuous outcome based on one or more predictor variables.
  2. Logistic Regression: Applied when the outcome is binary (two classes).
  3. Decision Trees: Tree-like models that make decisions based on features.
  4. Support Vector Machines (SVM): Used for classification and regression tasks.
  5. Naive Bayes: Based on Bayes' theorem and often used for classification tasks.
  6. K-Nearest Neighbors (KNN): Classifies objects based on the majority class of their k nearest neighbors.
  7. Random Forests: Ensembles of decision trees for improved performance.
  8. Gradient Boosting Machines: Sequentially builds weak learners to improve predictive performance.
muted cliff
#

thanks

#

I would have expected some of those to be faster than neural network huh

frozen sail
#

they are

frozen sail
muted cliff
#

KNN is very slow

delicate lichen
#

If the relationships between data is complex, or if it's """big""" then NNs win in speed easily.

frozen sail
#

fair enough

delicate lichen
#

I'm talking about a production algorithm vs a competition, something with access to PB of data.

frozen sail
lapis totem
lapis totem
muted cliff
#

this doesn't do what you want it to do

#

randint(10, 200) will just pick a random integer once and use it all the time

#

so you're only tuning max_features and bootstrap

#

so it makes sense you don't get anything

lapis totem
#

Oh

muted cliff
#

you should do something like range(10, 210, 10)

#

well this would take ages if you do it for all of them

#

try maybe [50, 100, 150, 200] to start off
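
A quick sketch of the difference, assuming the `randint` in the earlier snippet was Python's `random.randint` (the original code isn't visible here, so that is a guess).

```python
import random

random.seed(0)  # only so this sketch is repeatable

# random.randint returns ONE integer, decided at the moment the dict is built.
single_value = random.randint(10, 200)
print(type(single_value).__name__, single_value)

# A grid search needs a collection of candidate values to iterate over.
candidates = list(range(10, 210, 10))  # 10, 20, ..., 200
starter = [50, 100, 150, 200]          # a smaller list to begin with
print(len(candidates), candidates[:3], candidates[-1])
```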

lapis totem
#

Okay, so updated code like this?

param_dist = {
    'n_estimators': [50, 100, 150, 200],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': range(1, 20),
    'min_samples_split': range(2, 20),
    'min_samples_leaf': range(1, 20),
    'bootstrap': [True, False]
}
muted cliff
#

this should work I think

#

BUT

#

I think grid search tries ALL possible values

#

so this will do cross validation on
4 * 3 * 19 * 18 * 19 * 2 = 155,952
sets of parameters

#

this is way too much

#

I would do what I did with n_estimators to the other ranges
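
To make the size of the grid concrete, here is a small sketch that counts the combinations. The `coarse` grid and the fold count are assumptions picked for illustration, not recommended values.

```python
import math

# The search space as posted above.
param_dist = {
    "n_estimators": [50, 100, 150, 200],
    "max_features": ["auto", "sqrt", "log2"],
    "max_depth": range(1, 20),
    "min_samples_split": range(2, 20),
    "min_samples_leaf": range(1, 20),
    "bootstrap": [True, False],
}

# A coarser, hypothetical grid: a few spread-out values per parameter.
# 'auto' is left out on the assumption that newer scikit-learn versions reject it.
coarse = {
    "n_estimators": [50, 100, 150, 200],
    "max_features": ["sqrt", "log2"],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 5, 10],
    "bootstrap": [True, False],
}


def grid_size(space):
    """Number of parameter combinations an exhaustive grid search would try."""
    return math.prod(len(values) for values in space.values())


cv_folds = 5  # assumed number of cross-validation folds
n_full = grid_size(param_dist)
n_coarse = grid_size(coarse)
print(n_full, n_full * cv_folds)      # combinations, model fits
print(n_coarse, n_coarse * cv_folds)
```

If even the coarse grid is too slow, a randomized search that samples a fixed number of combinations is the usual alternative.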

lapis totem
#

Amazing, I'll give it a try, thank you!

slender raft
#

Hello everyone!!

#

I aspire to become a data scientist, i would like if all of you help me!

#

I wanted suggestions on how should i start?

fallow star
#

Hi All! I hope this is the right place to post this. I am facing an issue using Pytorch Lightning with the ddp_notebook strategy (in the trainer) when using two GPUs. Namely, when I call trainer.fit(model, dataset) the program gets stuck computing nothing, and the GPUs sit idle. Thank you in advance for any help on this matter!

hearty lark
fast parcel
#

Any good lightweight object detection model, that detects humans?

raven arch
#

I am a beginner and I am struggling with data filtering. For a null row or value, is it recommended to drop it? Or to replace it with the mean value or 0?

delicate lichen
# raven arch I am a beginner, i am struggling with data filtering . However, for null row or ...

The answer is data-dependent, the solution varies. If you have a large portion of data as null you may very well want to drop it, as keeping it could bias the model. However, life does not always generate clean data and data with tons of nulls may be all you have - it is best to experiment with all of the above filtering methods. Feature engineering and data pipelining like this makes up the vast majority of a data scientist's job.
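
A minimal sketch of the three options side by side, using a tiny made-up table, so you can compare what each one leaves you with before deciding.

```python
import pandas as pd

# Toy data with gaps, invented for illustration.
df = pd.DataFrame({
    "age": [22.0, None, 35.0, None, 58.0],
    "fare": [7.25, 71.28, None, 8.05, 13.0],
})

dropped = df.dropna()                                 # option 1: drop rows with nulls
mean_filled = df.fillna(df.mean(numeric_only=True))   # option 2: fill with column means
zero_filled = df.fillna(0)                            # option 3: fill with 0

print(len(df), len(dropped))   # how many rows survive dropping
print(mean_filled)
print(zero_filled)
```

Filling with 0 only makes sense when 0 is a plausible value for that column. For something like age it usually is not.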

elder flower
#

Does the private dataset appear immediately after the competition closes?

deft fox
lapis dirge
#

My notebook is getting stuck during compilation and the CPU usage shows 100%
Then after some time the page becomes unresponsive

#

Pls help!

raven arch
#

reviews_per_region = reviews.region_1.fillna("Unknown")
count = reviews_per_region.value_counts.sort_values(ascending=False)

My question is why this code is not correct

#

And also count =reviews.region.sort_values(ascending=False) / reviews.region.value_counts.sort_values(ascending=False)

#

Is that because I didn't modify the original dataframe? Should I use apply on it?
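
A likely culprit, sketched on a tiny made-up `reviews` table: `value_counts` is a method, so it needs parentheses before anything can be chained onto it. Not modifying the original dataframe is fine here, because `fillna` returns a new Series.

```python
import pandas as pd

# Stand-in for the course dataset, invented for illustration.
reviews = pd.DataFrame({"region_1": ["Napa", None, "Etna", "Napa", None, None]})

reviews_per_region = reviews.region_1.fillna("Unknown")  # new Series, original untouched

# value_counts() is called with parentheses; its result is already
# sorted from most to least frequent, so no extra sort_values is needed.
count = reviews_per_region.value_counts()
print(count)
```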

patent kiln
quaint hollow
cunning thunder
#

are these layers correct, input output wise?

molten sky
#

May I know the reason for this error? I'm unable to figure it out, even the solution code throws the same error...

velvet fox
#

Hi, quick question

in a Kaggle competition, if they provide multiple datasets (for example multiple CSVs)

abc.csv, xyz.csv but there is another csv named train.csv, so does that mean we should only use train.csv for training the model?

lapis dirge
#

That's not the case: you can concat all of them together according to the datasets and train your model on that

#

Then you can split them in 70 30 and test the model on the 30% of the data

craggy cove
#

Is it possible to NVLink 2 RTX 3090 cards together with memory pooling enabled? If so, what motherboard model should be used? Will I be able to use the full 48GB of VRAM to train large models?

deft fox
frozen sail
#

Hello guys I have a question. So far I've been dealing with only traditional machine learning models where I just imported a library and executed the code

But, in the #🛍┊store-sales-time-series-forecasting I tried training a Facebook Prophet model for forecasting and it took me a lot of time. I guess that if I want to be competitive in most competitions I might have to try more complex models or even models that require deep learning.

Do I need a good computer in order to be competitive when trying these algorithms? Thank you, I'm quite new to all of this

sleek urchin
delicate lichen
#

And cloud costs are NOT incredibly cheap, at all idk what the above user is referring to or what their perspective is.

#

I suppose compared to buying one outright for a single training instance, absolutely cloud is cheaper. But over the long run, cloud is very costly

#

Never ask a data scientist their cloud bill 🤫

delicate lichen
#

Also knowledge of cloud resources, how and when to use them, and their pros & cons are absolutely things employers are looking for.

#

Just be extremely careful, you can easily get like a $40k cloud bill in a month if you make a mistake.

frozen sail
#

Alright, thanks for the help 👍

placid hamlet
#

Please suggest the best DBMS course available on YouTube to learn SQL that is sufficient for data science

wispy hornet
#

I am looking for help on a small private kaggle classification project. Pls dm

coral fractal
#

having this issue for getting secrets...

#

error connecting to service... tried it on new notebooks too... will not use secrets for now but.. something to be fixed?

pulsar merlin
#

Hi, I am extracting MFCC values of audio and want to send them for training my model. I saved these MFCC values in a CSV, where they are stored as strings. Now when I want to map these values to my labels, the code does not run and gives errors. Please help

#

This is how my data looks and I am not able to pass it into my model

desert tusk
#

I try to load a private hf dataset (gated dataset) in kaggle and got this error:
FileNotFoundError: Couldn't find a dataset script at /kaggle/working/xxx/yyy/yyy.py or any data file in the same directory. Couldn't find '

Any help? 🙂

desert tusk
desert tusk
# pulsar merlin

You have non numerical features, use 1-hot or something similar instead

pulsar merlin
agile vale
#

Does somebody know how to approach this problem? I need some help!

pulsar merlin
#

I want to verify some sentences in Arabic. Do I need to train my model on separate sentences, or do I have to train on each and every word? Please, someone guide me

craggy zephyr
#

Hi everyone, I'm looking for data for Load Forecasting of power systems. It's my final year project. Can someone help me in finding data?

hollow grail
#

For kaggle notebooks is there a way to completely clear memory like restarting the session would, but without losing stuff written to disk?

finite galleon
#

Hello, I had one question, I am working with torch VGG model and I came across this method called ADAPTIVEAVGPOOL2D, which essentially take any input size and converts it to target output size by adapting the kernel size. My question is if this is a good idea or not? Like will it affect the model's performance?

zealous creek
karmic geyser
#

Doing the Titanic tutorial and everything's going great until I save and run, then I get a failure. I now have 4 versions. How can I get rid of 3 versions, keep one, and redo it until it runs and saves correctly?

delicate lichen
#

Also "is it a good idea" is a bad way to phrase a question. Even if it did drastically impact the model (probably doesn't) the ability to take variably-sized images is really important for a model as it allows you to train and inference* on variably sized data. Other options for accomplishing this are the spatial pyramid pooling techniques.

#

Being able to resize your image as a form of data augmentation also helps a ton, as many features in an image are scale-dependent.

#

Pooling is already an important part of most cv models, you can take any pooling method and make it adaptive.
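
To show what "adaptive" means here, a NumPy sketch of adaptive average pooling on a single-channel 2D array. The window boundaries (floor of `i * in / out` to ceil of `(i + 1) * in / out`) are my understanding of how PyTorch's AdaptiveAvgPool2d picks its bins, so treat that as an assumption and not a reference implementation.

```python
import numpy as np


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool a 2D array of any size down to (out_h, out_w)."""
    in_h, in_w = x.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        h0, h1 = (i * in_h) // out_h, ceil_div((i + 1) * in_h, out_h)
        for j in range(out_w):
            w0, w1 = (j * in_w) // out_w, ceil_div((j + 1) * in_w, out_w)
            out[i, j] = x[h0:h1, w0:w1].mean()  # window size adapts to the input
    return out


small = np.arange(16, dtype=float).reshape(4, 4)
odd = np.arange(35, dtype=float).reshape(5, 7)

print(adaptive_avg_pool2d(small, 2, 2))     # behaves like a plain 2x2 average pool
print(adaptive_avg_pool2d(odd, 2, 2).shape) # different input size, same output shape
```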

#

Back in the day, they just took any image and resized it to be the shape of a model's input. If you had for example a photo that was taken in portrait the features would get super compressed if you resized it to be your model's 'landscape' input size.

calm oar
#

So I'd like to keep my private GitHub repo and my private Kaggle notebook synced on the data side. Is it possible to clone from a GitHub repo into the Input Data section?

#

I did !git clone https://access_tok@github.com/me/repo and it put it in output instead

#

In addition, does the input and output data section persist across resets?

calm oar
finite galleon
still mural
#

anyone knows how do i start to learn GAN?

wheat tangle
#

hey, I am trying to train gpt to make it predict next set of tokens (not textual data in this case) , I am pretty new to the gpt, how can I fit it into A40 gpu without getting out of memory errors

desert tusk
#

Can someone help me?

#

kaggle stuff?

orchid mortar
#

a lot of people use Jupyter notebooks, but Jupyter notebooks don't have Copilot.
I am too used to AI helping me code. Do I need to buy my own GPU if I want to use Copilot?

#

also can any mod help me with my name? I think it is bugged

#

oh I need to update it on Kaggle, cool

gleaming oyster
#

guys, I have a question regarding batches in TensorFlow. I'm a bit stuck on that part and having trouble understanding how to convert data into batches. Any good tutorials?

stable dragon
fickle bobcat
#

Anyone knows the difference between keras and tensorflow.keras? What shall we use, and why? (I know the difference between keras and tensorflow, I'm asking about the difference of the two keras)

deft fox
deft fox
# gleaming oyster guys i had a doubt reagrding batches in Tensorflow? im a bit stuck ont hat part ...

There is no conversion to batches. It is a matter of selecting a subset of data at a time. Let's say that you have 10,000 training images. If there was infinite GPU memory the computer would load them all at once, train on all of them, and perform a single gradient update based on all images. In reality, GPUs will have enough memory only for 50-100 images at a time. So the dataset will be divided into non-overlapping batches that contain 50-100 images. The first batch of images will be used for training, the gradients updated, and then the next batch will go through the same two steps. That will be repeated until all batches are used, and then a new training epoch will begin. As to how it is done, nothing special is needed other than specifying the batch size.
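
A framework-free sketch of that slicing, with a list of integers standing in for the 10,000 images. The `iter_batches` helper and the batch size of 64 are illustrative choices. In Keras you would normally just pass `batch_size` to `model.fit` and let it do this for you.

```python
def iter_batches(items, batch_size):
    """Yield consecutive, non-overlapping slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


dataset = list(range(10_000))  # stand-in for 10,000 training images
batches = list(iter_batches(dataset, batch_size=64))

# One pass over all batches is one epoch; the last batch may be smaller.
print(len(batches), len(batches[0]), len(batches[-1]))
```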

dusty silo
#

Hi. How might I find a tflite model for the Kaggle ASL Fingerspelling competition? I'd like to experiment with an app rather than build the model myself. Is there a good place to post such requests? Thanks.

real patio
#

Please can someone help with a link of any UK Covid-19 dataset from NHS because the one i have is clean already or any health related for a prediction task!.. Thanks

boreal robin
#

Hi, I'm following along this guide https://github.com/FurkanGozukara/Stable-Diffusion/blob/main/Tutorials/How-To-Use-Automatic1111-Web-UI-On-A-Free-Kaggle-Notebook-Like-Google-Colab.md whenever I try to use !wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors -O /kaggle/temp/models/sd_xl_base_1.0.safetensors it reads "saved to X folder" but there's actually nothing in there, the file weighs in at 6.46GB could it be related to disk size on kaggle or something?

urban latch
#

Hello all,
This is my first post. are there any projects in DS using SAS ?

candid fiber
foggy monolith
#

I'm trying to import a 102688 x 61 rows into mysql but it seems to take a very long time to import where I leave it overnight and it still doesn't finish. Is there a maximum amount of rows that mysql can handle or do I really just need to wait?

#

I'm thinking of breaking up the dataset and then importing each table and connecting with id columns but is it necessary?

narrow edge
#

hey. I completed Python on Kaggle. I was wondering if I should make a project. Something simple. To implement all that new information. Or maybe I should learn a couple of libraries first.What do you think?

lofty totem
desert tusk
#

I want to improve my skills in pandas / numpy / seaborn / matplotlib, any recommendation how to do it?

lime wyvern
#

folks, couple of Qs - has kaggle had any competitions on LLMs where the answer wasn't categorical? Eg, I know there has been the science exam one, but that was reliant on scoring multiple choice... has anybody seen a scoring criteria based off "nearest answer" or similar?

#

also, does anybody know if there has been a "text retrieval" competition before? Eg, pick out most relevant bits of text for question Y

molten dock
#

why doesn't kaggle have excel courses? and when would you realistically be using excel in data analysis and machine learning vs python or sql?

uneven quail
#

hi there @everyone, I'm curious if there are volunteer work here as ML engineer or at least to be an associate

deft fox
# molten dock why doesn't kaggle have excel courses? and when would you realistically be using...

Everyone is free to use whatever tool they like, but I doubt many people would pick Excel as their first choice for machine learning applications. Pandas has most of Excel functionality and then some, yet it is completely free. It even can open Excel files! A combination of 5-10 well-chosen python packages will be vastly superior to Excel for machine learning. But maybe you know something I don't. For example, is there a good (and free) neural network implementation in Excel? Or a gradient boosting machine implementation? Can Excel create highly stylized and interactive graphs? Can Excel even open a matrix that has dimensions 1,000,000 x 500, let alone do something useful with it?

tranquil forge
#

Please for the free cloud credits what's the elapse time?

My billing account just stopped about two days ago and even setting up my card has been an issue.

Would appreciate if anyone can help explain this behavior.

crisp vale
#

Hello, I'm looking for a dataset with information about people creating a profile to find a job, do you know of any?

Info like skills, resume, desired job etc.

olive tinsel
#

does anyone know this paper Knowledge Graph-Enhanced Knowledge Integration Learning for Natural Language Processing ? I am not being able to find it anywhere

sharp maple
#

oh people are still using this. I need help and guidance!!

#

Any assistance would be appreciated

quasi junco
#

I finished a course on Kaggle Learn, the challenge is i am confused and don't know where to continue from. I need A Mentor!!! Please. I want to expand my work and be productive.

ruby crest
#

Hey everyone,

I hope you're all doing well. I'm currently facing a challenge in Object detection dataset specially related to class imbalance. My dataset is in yolov5 format. I'm exploring image augmentation techniques to address it. Although I can generate augmented images, the missing piece is the corresponding annotation, specifically creating annotation files like label.txt.

I'm a bit unsure about the best practices for generating these annotations for augmented images. If anyone has insights or guidance on this matter, I'd really appreciate your help!

Thanks a ton!

Latifur Rahman Zihad
Undergrad student

indigo bridge
#

hello @everyone, I'd like to know whether a subject entitled 'AI-Powered Appointment Scheduler for patients' is a good subject for my final year project. I'm really confused about choosing a specific subject

muted talon
#

Does anyone know of a recent-ish (less than 4 years ago) image classification challenge? I'm looking into studying multimodal approaches for CV problems

sharp plume
#

I have a project that focuses on performing descriptive analysis and statistics. But I am new to this. Can anyone suggest me some resources to get me started.

#

@everyone please help me out here

strong jolt
#

Can someone help me with configuring SVM in the convolutional neural network?

#

github: https://github.com/krishnaik06/Complete-Deep-Learning/blob/master/Image Classification Using SVM.ipynb
#

I am working on FER2013 dataset. And I built this model.

#

`model = keras.Sequential([
    layers.Reshape((48, 48, 1), input_shape=(2304,)),

    layers.BatchNormalization(),
    layers.Conv2D(filters=64, kernel_size=3, activation='relu'),
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Dropout(0.5),

    layers.BatchNormalization(),
    layers.Conv2D(filters=128, kernel_size=3, activation='relu'),
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Dropout(0.5),

    layers.BatchNormalization(),
    layers.Conv2D(filters=128, kernel_size=3, activation='relu'),
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Dropout(0.5),

    layers.BatchNormalization(),
    layers.Conv2D(filters=512, kernel_size=3, activation='relu'),
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Dropout(0.5),

    layers.Flatten(),

    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),

    layers.BatchNormalization(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),

    layers.Dense(7, kernel_regularizer=tf.keras.regularizers.l2(0.01), activation='softmax'),
])

model.compile(
    optimizer='adam',
    loss='squared_hinge',
    metrics=['accuracy'],
)`

#

But I am getting a weird result.

#

Epoch 48/50
202/202 [==============================] - 4s 19ms/step - loss: 0.3469 - accuracy: 0.1326 - val_loss: 0.3508 - val_accuracy: 0.1378
Epoch 49/50
202/202 [==============================] - 4s 19ms/step - loss: 0.3469 - accuracy: 0.1617 - val_loss: 0.3508 - val_accuracy: 0.2426
Epoch 50/50
202/202 [==============================] - 4s 19ms/step - loss: 0.3469 - accuracy: 0.1594 - val_loss: 0.3508 - val_accuracy: 0.1291

#

Am I doing something wrong?

#

I would be really grateful if someone could help me.

craggy cove
verbal crest
#

Alternatively, directly within the notebook editor there is a panel with a submit button (you need to have the competition added as a data source so it's linked).

echo oracle
#

Does anyone know good books to learn machine learning about the more advanced concepts?

zealous creek
austere igloo
#

I need help with some basics machine learning. I am trying to solve the Titanic prediction problem from Kaggle but after imputation, my train data gets more rows somehow and then it doesn't match with the y_train

#
X = train_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']]
y = train_data['Survived']

X_train, X_val, y_train, y_val = train_test_split(X, y)

# Encoding

oh_enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

oh_X_train = pd.DataFrame(oh_enc.fit_transform(X_train[['Sex']]))
oh_X_val = pd.DataFrame(oh_enc.transform(X_val[['Sex']]))

X_train_encoded = pd.concat([X_train.drop('Sex', axis=1), oh_X_train], axis=1)
X_val_encoded = pd.concat([X_val.drop('Sex', axis=1), oh_X_val], axis=1)

X_train_encoded.columns = X_train_encoded.columns.astype(str)
X_val_encoded.columns = X_val_encoded.columns.astype(str)

# Imputation

imputer = SimpleImputer()

imputed_train_data = pd.DataFrame(imputer.fit_transform(X_train_encoded))
imputed_test_data = pd.DataFrame(imputer.transform(X_val_encoded))

imputed_train_data.index = X_train_encoded.index
imputed_test_data.index = X_val_encoded.index

imputed_train_data.columns = X_train_encoded.columns
imputed_test_data.columns = X_val_encoded.columns
#

and when I check the number of rows using .describe() after the encoding, it says the dataframe has 668 rows at that point, which is exactly what it should have.

But when I do this after the imputation, for some reason it shows the df with a varying number of rows around 830, though this number varies a little bit every time I restart the kernel. At the end of the program, I get this error "ValueError: Found input variables with inconsistent numbers of samples: [838, 668]" when trying to fit a model

#

Do you guys have any idea what this could be?

zealous creek
austere igloo
#

It seems as it should for me

#

668 rows each dataframe

#

the right columns

#

ooooh

#

the indexes

#

is that it?

#

IT WORKED

#

lol

#

it was it

#

I solved it by adding this

#

Thank you @zealous creek

zealous creek
#

well done! 🙂

#

You correctly reindexed imputed_train/val_data but forgot to do the same for oh_X_train/val.
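
A minimal sketch of that fix, using a tiny made-up frame instead of the Titanic data (the values and the shuffled index below are assumptions for illustration). Wrapping the encoder output with the original index keeps pd.concat from aligning on mismatched labels and padding the result with NaN rows:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for X_train after train_test_split: the index is shuffled, not 0..n-1.
X_train = pd.DataFrame(
    {"Sex": ["male", "female", "female"], "Age": [22.0, 38.0, 26.0]},
    index=[5, 0, 3],
)

oh_enc = OneHotEncoder(handle_unknown="ignore")
encoded = oh_enc.fit_transform(X_train[["Sex"]]).toarray()

# Without index=..., the new frame gets a fresh 0..n-1 index, and concat
# aligns on index labels, which creates extra rows filled with NaN.
oh_bad = pd.DataFrame(encoded)
bad = pd.concat([X_train.drop("Sex", axis=1), oh_bad], axis=1)

# Reusing the original index keeps the rows lined up.
oh_good = pd.DataFrame(encoded, index=X_train.index)
good = pd.concat([X_train.drop("Sex", axis=1), oh_good], axis=1)

print(bad.shape, good.shape)
```

The same index=... argument would apply to the validation frame.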

austere igloo
#

using it, the encoding part seemed to be fine

#

but then when I printed the whole df to check why describe was misleading, I saw this:

#

there were a lot of rows added, but they didn't affect the column counts because they were all NaN

#

So with the describe method, I could only see the extra rows after the imputation

#

Now I know I shall always use the shape attribute instead
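
A small sketch of why the two disagree, on a made-up single-column frame: the count row of describe only counts non-NaN values, while shape counts every row.

```python
import numpy as np
import pandas as pd

# Two real values plus two all-NaN rows, similar to what a misaligned concat produces.
df = pd.DataFrame({"Age": [22.0, 38.0, np.nan, np.nan]})

count_from_describe = df.describe().loc["count", "Age"]  # non-NaN values only
n_rows = df.shape[0]                                      # every row, NaN or not

print(count_from_describe, n_rows)
```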

zealous creek
#

I'd also print the whole data frame just to be safe.

austere igloo
austere igloo
calm garnet
#

yo lads

#

does anyone have a roadmap from beginner to advanced in DS?

tardy garnet
#

Hi, everybody.
I have a quick question.
I uploaded an Excel file (.xlsx) to Kaggle to use as a part of a notebook.
I have set it as private. How do I find the path or call that excel file in my notebook?

deft fox
deft fox
#

I think you go to the competition discussion section and start a new thread by tagging @paultimothymooney, who seems to be responsible for that competition. If you explain your case just like you did here, they should be able to get you in touch with the competition sponsor.

deft fox
#

You can tag @near basalt here as well.

wet cairn
#

Thank you so much!

#

@near basalt Sorry to disturb you, but if you could get me in touch with the competition sponsor of the Google Gemma Kaggle competition, that would be amazing. Thank you.

verbal crest
#

@wet cairn Typically Kaggle staff can't help you through discord. Definitely create a thread in the forums to get help.

shrewd scarab
#

Greetings, I have a college assignment which requires me to interview a DBA/Data Scientist or someone in a similar profession. I am looking for anyone who might be interested in participating. This assignment isn't due for a while but I felt that I should reach out beforehand to see if anyone is interested. Feel free to let me know!

twilit prairie
#

Hi guys, I am trying to do House Prices - Advanced Regression Techniques, but I'm hitting the same problem I see a lot of people hit: could not convert string to float: 'RH'. Do you have any idea how I can resolve it? I've been trying to make it work for a few days now. I am using RandomForest, and it worked on the train data, but when I try to predict on the test data it does not. Thank you in advance.

zealous creek
twilit prairie
#

Sorry about that. When I run predict on the test data I get this error. You can see the full code here.

zealous creek
twilit prairie
#

Same format

zealous creek
#

Is it? 😄

#

What are the values for example in the LotShape column in X_train and X_test?

#

vs the values of the same column in test?

twilit prairie
#

Ok, so it seems that LotShape in the test set was not converted into categories, right? But why is that, when test.info() says it is a category? I wrote this for loop to do so, same as for the training data.

zealous creek
#

Input samples to sklearn .fit and .predict methods need to be numbers. I assume you preprocessed X_train and X_test to convert the strings to numbers so you need to apply the exact same transformation on test if you want the same model to work on it.
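
One way to sketch "the exact same transformation", assuming scikit-learn's OrdinalEncoder and made-up LotShape values (the original poster used pandas category dtypes instead): fit the encoder on the training column only, then reuse the fitted object on test.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up values for illustration; IR3 never appears in train.
train = pd.DataFrame({"LotShape": ["Reg", "IR1", "Reg", "IR2"]})
test = pd.DataFrame({"LotShape": ["IR1", "Reg", "IR3"]})

# Unseen categories in test are mapped to -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

train_enc = enc.fit_transform(train[["LotShape"]])  # learn the mapping on train
test_enc = enc.transform(test[["LotShape"]])        # reuse it on test, no refit

print(test_enc.ravel())
```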

twilit prairie
#

i have found my problem. when i converted the test to category, I used df instead of test. Freaking copy paste lol. Thank you very much for your time and for opening my eyes 😄

zealous creek
charred copper
#

Hello, I hope you're all doing well. I have a favour to ask: I searched on Kaggle but couldn't find the multi-label antenna selection dataset. It's for research I'm trying to do on massive MIMO. If anyone knows where I can get it without generating it myself, could tell me how to generate it, or has one ready, please let me know.

summer raptor
#

Project Summary:
Objective:
The primary goal of this project is to develop a comprehensive tool that can automatically process and analyze various types of screenshots related to financial transactions, communication, and potential fraudulent activities. The tool aims to assist in detecting and documenting scams by extracting relevant data from screenshots of chats, transactions, transfer receipts, and UPI payments.
This is my college semester project. Can anyone help me?
How can I build this project?

sonic dock
#

Struggling with this:
Want to use a kaggle kernels output:

kaggle kernels output <username here>/text-summarization-using-lstm -p /path/to/dest

How do I use the above API command? Also, can I access files stored by the person while running the API command?

heavy granite
#

Can I exchange money for an extension of the maximum running time?

#

In kaggle

twilit prairie
zealous creek
amber willow
#

Hello everyone. I want to build a model that predicts the Oscar winner for the upcoming year, for example best actor. Even if we gather all Oscar nominees from the beginning until today, it will be roughly 400 rows. That seems like a very small dataset. Is it okay to build a model with such a small dataset? If not, what can I do? Thanks in advance

eager brook
#

Hi! How can I link my Kaggle profile with LinkedIn?

craggy cove
hearty knoll
#

can someone help me with submission?

foggy pilot
#

kaggle beginner | notebook beginner question here:

trying to run text-gen webui on a kaggle notebook because it is faster (mistral), but after i run the main cell (last cell) kaggle terminates the session

is there any way to make it run continuously for as long as i'm using text-gen webui?

obsidian bone
#

Hello,
A company asked me to develop a motion detection program that won't use deep learning. As far as they told me, they will run the program on GPUs and want to detect motion from CCTV cameras that are connected to it.

I wrote 3 different programs with opencv and python using following approaches:

  1. Frame difference
  2. Optical flow
  3. Background subtraction

But none of these are using GPU, is there a way to implement them on GPUs? or better yet how can I run opencv on GPUs? Thanks

deft fox
obsidian bone
#

That's what i was wondering as well... The guy said "we want something that works fast and on real time"

deft fox
obsidian bone
#

There are some people in higher positions who think they know everything, and if i don't meet their requests, they will think i'm insulting them and showing off.

#

So i can run the normal algorithms just fine on their GPUs?

woeful tundra
#

Message 1of2-Good day to all!,
Question: I have encountered a bug

Description of issue: Course: Getting Started With SQL and BigQuery Course step 1 of 6 - Introduction Exercise: Getting Started With SQL and BigQuery => I wrongly deleted the cell following this first one:

# Set up feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.sql.ex1 import *
print("Setup Complete")

I got mixed up trying to revert the mistake and could not manage it in the end. Now I want to restart this exercise from scratch to fix everything and retrieve the deleted cell, but I do not know how. I would appreciate your help soon. Thx a lot in advance!

#

Message 2of2-ReproSteps:

Introduction
The first test of your new data exploration skills uses data describing crime in the city of Chicago. Before you get started, run the following cell. It sets up the automated feedback system to review your answers.

# Set up feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.sql.ex1 import *
print("Setup Complete")

Using Kaggle's public dataset BigQuery integration.
Setup Complete

Use the next code cell to fetch the dataset. => THIS IS WHERE I WRONGLY DELETED THE CELL CONTENT. I tried to amend it but could not, and now it raises an error that I do not know how to debug... 😦

# Create a "Client" object
client = bigquery.Client()
# Construct a reference to the "crime" dataset
dataset_ref = client.dataset("crime", project="bigquery-public-data")
# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 2
      1 # Create a "Client" object
----> 2 client = bigquery.Client()
      4 # Construct a reference to the "crime" dataset
      5 dataset_ref = client.dataset("crime", project="bigquery-public-data")

NameError: name 'bigquery' is not defined

Thx a lot in advance!.

analog comet
#

I am searching for a partner to study machine learning engineering together. I am currently at quite an advanced level, meaning I study models such as StackGAN, ProGAN, AttnGAN, StyleGAN, etc. If you want to study with me, just DM me. Timezone: UTC+1. Framework: PyTorch. This is important, because it is the only one I study.

molten wharf
ripe surge
#

Hello everybody! This might seem like a very dumb question, but I am just getting started in data science and I'm on my first programming course. I am trying to get a grasp of the Titanic problem, but one thing I can't understand is that in the gender_submission.csv document, where only females were supposed to survive, there are 411 entries, while in the main data document there are only 314 females. Thank you in advance for your time

barren phoenix
#

hello everyone, I've just embarked on a journey into the world of data science! 📊🔍 As a newbie, I'm eager to soak up as much knowledge as possible and become proficient in this fascinating field.

I'd love to hear from experienced data scientists or anyone who's passionate about the subject. If you have any advice, recommended resources, or valuable insights to share, I'm all ears! 🧠💡

Feel free to drop your favorite learning materials, tips, or even your own experiences in the data science realm. Thank you in advance

abstract sequoia
#

which of these would be more robust?

model 1
Seed: 0, Data size: 100, Noise: 0, MSE: 0.00
Seed: 0, Data size: 100, Noise: 0.1, MSE: 0.71
Seed: 0, Data size: 100, Noise: 0.2, MSE: 0.73
Seed: 42, Data size: 100, Noise: 0, MSE: 0.00
Seed: 42, Data size: 100, Noise: 0.1, MSE: 0.33
Seed: 42, Data size: 100, Noise: 0.2, MSE: 2.44
Seed: 99, Data size: 100, Noise: 0, MSE: 0.00
Seed: 99, Data size: 100, Noise: 0.1, MSE: 0.70
Seed: 99, Data size: 100, Noise: 0.2, MSE: 0.86

model 2
Seed: 0, Data size: 100, Noise: 0, MSE: 0.00
Seed: 0, Data size: 100, Noise: 0.1, MSE: 0.61
Seed: 0, Data size: 100, Noise: 0.2, MSE: 1.40
Seed: 42, Data size: 100, Noise: 0, MSE: 0.00
Seed: 42, Data size: 100, Noise: 0.1, MSE: 0.6
Seed: 42, Data size: 100, Noise: 0.2, MSE: 1.44
Seed: 99, Data size: 100, Noise: 0, MSE: 0.00
Seed: 99, Data size: 100, Noise: 0.1, MSE: 0.63
Seed: 99, Data size: 100, Noise: 0.2, MSE: 1.46

#

model 1 has lower loss

#

model 2 has lower variance in loss with noise

#

so what is more important

zealous creek
abstract sequoia
#

Testing at 100 states, the noise has slightly lower mean (about 10%) in model 1 but far lower variance on model 2 (90% lower)

#

With 1000 data points

zealous creek
abstract sequoia
#

0.1 means the input is multiplied by random.normal(loc=1, scale=0.1, size=1000)
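
As a sketch of that noise scheme, assuming NumPy's Generator API and a placeholder input of ones (the real inputs are not shown in the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.ones(1000)  # placeholder input; the actual data is not shown here

# Multiplicative noise centred on 1, so scale=0.1 perturbs each value by roughly +/-10%.
noise_level = 0.1
X_noisy = X * rng.normal(loc=1.0, scale=noise_level, size=1000)

print(X_noisy.mean(), X_noisy.std())
```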

muted talon
#

In the arcface loss, is the embedding size the shape of the last layer?

deft fox
# abstract sequoia which of these would be more robust? model 1 Seed: 0, Data size: 100, Noise: 0,...

I think 10-20% noise is too much. Wouldn't go beyond 5%. I think a better way of assessing this is to do an N-fold cross-validation (NFCV) rather than adding noise. N could be 3, 5 and 10 and see what that gets you. Also, for such a small dataset doing a leave-one-out cross-validation (LOOCV) should be in play as well, because one can quickly build and test 100 models for a dataset of this size. Whether you do a NFCV or LOOCV, it should give a more unbiased MSE estimate than noise injection.
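
A minimal sketch of both options with scikit-learn, assuming a synthetic 100-point regression problem and a plain LinearRegression as stand-ins for the two models in question:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression()

# N-fold CV for N = 3, 5, 10; sklearn reports negated MSE, so flip the sign.
for n in (3, 5, 10):
    cv = KFold(n_splits=n, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{n}-fold MSE: {-scores.mean():.4f} +/- {scores.std():.4f}")

# LOOCV: one model per data point, each tested on the single held-out point.
loo_scores = cross_val_score(
    model, X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)
loo_mse = -loo_scores.mean()
print(f"LOOCV MSE: {loo_mse:.4f}")
```

Running the same loop for each candidate model gives per-fold scores that can be compared on both mean and spread.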

dull shell
verbal crest
placid tide
#

Hi guys, I'm very new to this field and would really appreciate some help or direction. Can someone please tell me some pre reqs for this competition?

acoustic moth
#

Hi guys, I have linked my Kaggle account to Discord, but I still cannot send messages in other channels... Can someone help?

zealous creek
zealous creek
verbal crest
obsidian pulsar
obsidian pulsar
#

Shall we compete together with our respective teams in a competition? @deft fox

#

I want to challenge you.

young apex
#

Any data scientists who used to be researchers in physics? How was the transitioning for you?

zealous creek
young apex
zealous creek
young apex
zealous creek
severe relic
#

Hi!
I have a machine learning question:
Let's say I've spotted some features through visual analysis or empirical studies that really seem to line up with the outcome. How can I give those features more weight in my model setup?
I'm having trouble understanding how to improve my model architecture to take steps beyond just engineering new features from the base data.
Thanks!

zealous creek
zealous creek
severe relic
#

Yeah, I found out that I had to do it with support vector machines

#

There's some SVM+ architecture that does what I wanted but it was outcompeted by WSVM.

However, there still could be a case for SVM+ being computationally less intensive? Albeit I bet no one will pursue this course.

There is, however, a paper on domain adaptive learning technology that has promising results. Written in 2023 too. Gonna take a look at it

alpine lotus
#

Hi!
I want to learn feature engineering. Can someone recommend any good resources, whether they're books, blogs, or papers?

hearty meadow
#

Does anyone know where i can find datasets for SeamlessM4T model?

modern remnant
#

Hey everyone,
I am planning to create a dataset of Human Speech commands to Robotic Arm motion.
My current plan:

  1. Have a participant come in and prompt them to move the Robot Arm, based on the given prompt (e.g. wipe the dirt using the tissue, peel the potato, etc.)
  2. Record the trajectory of the Robot Arm, collecting the joint pose, state and force torque data
  3. Replay the trajectory and ask the participant to describe the trajectory. This will be the natural language speech command. (The reason I am asking them to describe the trajectory is I want to include adjectives in the command, which could help parameterise the motion characteristics like speed, force, etc.)

After the Data collection is done, I plan on using this dataset to train a model, that give me the Arm trajectory based on the Natural command.

My question is:

  1. What other kinds of data should I include here? Should I record video as well?
  2. What kind of ML/DL technique would work here? I was thinking Reinforcement learning. Any other learning I should be aware of?
  3. My advisor suggested using contrastive learning, by pairing the good and bad examples, but what would contrastive learning look like here? How should I pair the examples and any other such techniques?
brittle shard
#

I want to study EDA, but I need datasets with some missing values to learn how to handle missing data better. The datasets that I want to use don't have missing values. Is there any efficient way to generate missing values, let's say 12%? But it should be random. I tried bruteforcing but the dataset has around 40k entries. Pls h
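
One vectorized way to do this, sketched with pandas and NumPy on a made-up 40k-row numeric frame (the column names and the per-cell 12% rate are assumptions): draw a random boolean mask and let DataFrame.mask blank out the selected cells.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Stand-in dataset with roughly 40k entries and no missing values.
df = pd.DataFrame(rng.normal(size=(40_000, 5)), columns=list("abcde"))

# Each cell is independently selected with probability 0.12.
missing_mask = rng.random(df.shape) < 0.12

# mask() replaces the cells where the condition is True with NaN.
df_missing = df.mask(missing_mask)

frac_missing = df_missing.isna().to_numpy().mean()
print(f"{frac_missing:.3f}")
```

This gives values missing completely at random; the original df is left untouched for comparing imputation results.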

deft fox
zealous creek
deft fox
brittle shard
#

Thanks

lunar ridge
#

after a restart my notebooks are showing "Draft Session Waiting for previous session to upload results..."

it's been stuck like this 10+ mins

muted talon
#

In the data i am currently working on i have noticed that normalization doesn't really do anything: the metrics are just marginally worse, and the loss curves are similar.
I have tried imagenet norm, channel-wise norm, min-max norm.
Is there any study on the actual effects of normalization? Or any rationale for why it would not be helpful?

zealous creek
# muted talon In the data i am currently working on i have noticed that normalization doesnt r...

Normalization is really important for ML models that use gradient-based techniques as the optimizer. If the features in your dataset have different orders of magnitudes (i.e., one feature is age with values between 0-100, and another feature is salary on the order of 10k-100k), gradient descent could become numerically unstable. It overshoots along one axis and converges really slowly along the other. Normalization is not important at all for tree-based techniques because the best split is determined by one feature at a time. So it really depends on the ML model and the optimizer.
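
To make the age/salary example concrete, here is a short sketch with scikit-learn's StandardScaler on made-up data; after scaling, both features have zero mean and unit variance, so neither axis dominates a gradient step.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two features on very different scales (values are made up).
age = rng.uniform(0, 100, size=500)
salary = rng.uniform(10_000, 100_000, size=500)
X = np.column_stack([age, salary])

print("raw std per feature:   ", X.std(axis=0))

# Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

print("scaled std per feature:", X_scaled.std(axis=0))
```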

muted talon
#

I am aware of that, especially for tabular data and time series.
This was related to image-based models, thus the imagenet and channel-wise normalization

#

at least from empirical tests, normalizing the pixels values is not producing any sort of statistically significant difference, just marginally worsening the results

zealous creek
#

Neural networks are also optimized by gradient-based techniques, right?

muted talon
#

that is correct yes

plush sierra
#

hi guys, can someone tell me about the spaceship titanic competition?

shrewd crystal
#

i am actually stuck, can someone guide me on where exactly to start?

jovial harness
#

If I want to set up a datathon for my university, could I get funding from Kaggle itself? Also, how should I structure a datathon? Any and all tips are very much appreciated virtual_hug

olive tinsel
#

Does anyone Know any search engines that can be used with APIs which are SEO Free; which do not follow SEO ranking?? I want to use them for a project of mine.

molten walrus
#

hello everyone, i have an internship in AI with smart grids. has anyone done a similar project and can help me?

graceful axle
#

Hey Team Kaggle,
I am a Kaggle Notebook Expert and some of my notebooks are eligible for silver medals but are not getting the silver medal. Can you please help me with this problem?

deft fox
graceful axle
glass cave
#

When I want to submit to the competition I get ERROR: Unexpected Column: '' (Line 1, Column 1). Can anyone help me?
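
A common cause of an empty first column name, though only a guess without seeing the file, is writing the DataFrame index into the CSV. A sketch with a made-up submission frame (the id and target column names are assumptions):

```python
import io
import pandas as pd

# Hypothetical submission; real column names come from the sample submission file.
submission = pd.DataFrame({"id": [1, 2], "target": [0.1, 0.9]})

with_index = io.StringIO()
submission.to_csv(with_index)                   # default index=True adds an unnamed column

without_index = io.StringIO()
submission.to_csv(without_index, index=False)   # header matches the frame's columns

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(repr(header_with), repr(header_without))
```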

coral surge
analog comet
#

Guys, what do you recommend studying after these models: infogan, cgan, dgan, lsgan, wassersteingan, biggan, progan, cyclegan, stylegan, stackgan, pix2pix, vqvae, vae? Maybe consider some diffusion models?

serene merlin
#
pip install git+https://github.com/huggingface/accelerate.git git+https://github.com/huggingface/transformers.git bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
import torch
from transformers import BitsAndBytesConfig
from datasets import Dataset
quantization_config = BitsAndBytesConfig(load_in_8bit=True,
                                         llm_int8_threshold=200.0)
# Load the tokenizer and model
model_name = "/kaggle/input/mixtral/pytorch/8x7b-instruct-v0.1-hf/1"  # Local model path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto", #"balanced",
                                             torch_dtype=torch.float16,
                                            quantization_config=quantization_config)
#
-error 
ImportError                               Traceback (most recent call last)
Cell In[25], line 10
      8 model_name = "/kaggle/input/mixtral/pytorch/8x7b-instruct-v0.1-hf/1"  # Yerel model yolu
      9 tokenizer = AutoTokenizer.from_pretrained(model_name)
---> 10 model = AutoModelForCausalLM.from_pretrained(model_name,
     11                                              device_map="auto", #"balanced",
     12                                              torch_dtype=torch.float16,
     13                                             quantization_config=quantization_config)

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:561, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    559 elif type(config) in cls._model_mapping.keys():
    560     model_class = _get_model_class(config, cls._model_mapping)
--> 561     return model_class.from_pretrained(
    562         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    563     )
    564 raise ValueError(
    565     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    566     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    567 )

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3024, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3021     hf_quantizer = None
   3023 if hf_quantizer is not None:
-> 3024     hf_quantizer.validate_environment(
   3025         torch_dtype=torch_dtype, from_tf=from_tf, from_flax=from_flax, device_map=device_map
   3026     )
   3027     torch_dtype = hf_quantizer.update_torch_dtype(torch_dtype)
   3028     device_map = hf_quantizer.update_device_map(device_map)

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_8bit.py:62, in Bnb8BitHfQuantizer.validate_environment(self, *args, **kwargs)
     60 def validate_environment(self, *args, **kwargs):
     61     if not (is_accelerate_available() and is_bitsandbytes_available()):
---> 62         raise ImportError(
     63             "Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` "
     64             "and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`"
     65         )
     67     if kwargs.get("from_tf", False) or kwargs.get("from_flax", False):
     68         raise ValueError(
     69             "Converting into 4-bit or 8-bit weights from tf/flax weights is currently not supported, please make"
     70             " sure the weights are in PyTorch format."
     71         )

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
#

how can i fix this problem

zealous creek
graceful axle
#

How can I measure the accuracy of text extracted from pdf using pyresparser?

glass cave
#

I want to participate in Steel Plate Defect Prediction, but I'm not quite sure what they mean by: your objective is to predict the "probability" of each of the 7 binary targets.
Do they mean the model needs to predict the probability of each defect for each steel plate, rather than categorize them?

deft fox
pastel fossil
#

Possible dumb question for you y'all, I wanted to work on a beginner project like the housing pricing regression with a team to learn from, or should that be a solo project?

#

(this might be for the getting started channel)

verbal crest
#

@pastel fossil You can do it either way. If you have friends to team up with then why not. But it's perfectly fine to do solo too.

gray copper
#

Hi everyone, I wanted to know if it is compulsory to form a team to join competitions (since I am a beginner on Kaggle and I don't know anyone)

deft fox
quasi gyro
#

Hi I am trying to run: https://www.kaggle.com/code/abhimanyuaryan/fine-tune-gemma-7b-it-for-sentiment-analysis/edit

But I see warning on top. Also when I execute the cell

model_name = "/kaggle/input/gemma/transformers/7b-it/1"

compute_dtype = getattr(torch, "float16")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=bnb_config, 
)

model.config.use_cache = False
model.config.pretraining_tp = 1

max_seq_length = 2048
tokenizer = AutoTokenizer.from_pretrained(model_name, max_seq_length=max_seq_length)
EOS_TOKEN = tokenizer.eos_token

I get this error

OSError: Incorrect path_or_model_id: '/kaggle/input/gemma/transformers/7b-it/1'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

would appreciate any help. Por Favor

worldly panther
#

is seaborn library as recognized in ML as matplotlib? or is it strictly matplotlib? Like would you hire an engineer that uses seaborn instead of matplotlib? Is it important? Thanks.

pastel fossil
#

Does anybody use ML for their job in precision agriculture? Would love to chat more about it for someone in the industry and get some insights. 😎👍

deft fox
# worldly panther is seaborn library as recognized in ML as matplotlib? or is it strictly matplotl...

Seaborn is built on top of Matplotlib, kind of like Keras on top of TensorFlow. I am not in industry, so what follows is only a personal opinion. I don't "know" either Seaborn or Matplotlib in great depth, but any time I need to plot something it always gets done. This is either by consulting my old scripts, or I find a particular function by a simple Google search. In my experience both packages are well-documented, and I don't think either one would be strongly preferred in any setting unless a company is bound by their older software versions to use one of them.

glass cave
#

In a competition, can I copy the approach of another participant?
Not the same code, just the way they handled the problem

tough hornet
#

i mean u need to understand and imitate approaches in one way or the other

#

NLP? I want to learn NLP. Where should I Start

distant lance
#

Hi, how are you all today? Does anyone know if the current limit of the maximum number of CPU notebooks has changed from 10 to 5? I am getting 'Maximum batch CPU session count of 5 reached.' messages. Thank you in advance.

glass cave
# tough hornet NLP? I want to learn NLP. Where should I Start

Natural Language Processing is the discipline of building machines that can manipulate language in the way that it is written, spoken, and organized

glass cave
empty belfry
#

Does anyone here know of any good resources for preprocessing, and data analysis of hyperspectral images? Thank you in advance.

obsidian bone
#

hello, i wanna ask quick question, does CrossEntropyLoss apply softmax to target values as well? or only on predictions? and should I normalize target values before passing it in?

deft fox
obsidian bone
limber belfry
#

hey! i want to submit for a non internet notebook competition and i want to use some pretrained LLMs from kaggle. i just cannot find the "add model" button on the right inside the notebook environment. i want to load e.g. mistral from a local path like it is described here https://www.kaggle.com/models/mistral-ai/mistral but it says that the model path is wrong. how can i integrate mistral into that competition notebook?

limber belfry
#

ok, i just found it. one must first click "add input" and then you can search for your desired model

glass cave
#

If there is a correlation between two columns, like if the value of column1 = 0 then the value of column2 is 1, can I delete one of the columns?

#

I mean that deleting one of them will not negatively affect the training of the model

zealous creek
# glass cave I mean that deleting one of them will not negatively affect the training of the ...

Run an experiment and try it. Train a model with both features and note the test score. Then, train another model by removing one of the features while keeping everything else exactly the same, and note the test score again. If you want to be precise, you should try several random states in both scenarios and record the mean and standard deviation of the test scores. If you observe a significant change in the test score and you made only one modification between the two scenarios, chances are the change in the test score is caused by the modification.
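
That experiment could be sketched like this, assuming a synthetic dataset where col2 is always 1 - col1 and a RandomForestClassifier as the model (both are stand-ins, not the original poster's setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
col1 = rng.integers(0, 2, size=n)
col2 = 1 - col1                      # fully determined by col1
other = rng.normal(size=n)
y = (col1 + 0.5 * other > 0.5).astype(int)

X_both = np.column_stack([col1, col2, other])
X_drop = np.column_stack([col1, other])


def scores_over_seeds(X, y, seeds=range(5)):
    """Train/test on several random states; return mean and std of test accuracy."""
    out = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        out.append(model.fit(X_tr, y_tr).score(X_te, y_te))
    return float(np.mean(out)), float(np.std(out))


mean_both, std_both = scores_over_seeds(X_both, y)
mean_drop, std_drop = scores_over_seeds(X_drop, y)
print(f"both features: {mean_both:.3f} +/- {std_both:.3f}")
print(f"one dropped:   {mean_drop:.3f} +/- {std_drop:.3f}")
```

If the two means differ by less than the spread across seeds, dropping the column made no measurable difference.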

obsidian bone
#

hello, my competition submissions say: Submissions Scoring Error
even though I checked the sample file, it's the same as my predictions Dataframe?

glass cave
zealous creek
glass cave
# glass cave

Why does the groupby function create these 3 columns even though it's not supposed to?

glass cave
zealous creek
viral grove
#

Hello, guidance and recommendations needed here please. I am still new to Kaggle and the AI world. I am currently taking this course on Udemy, https://www.udemy.com/course/machinelearning/learn and I just finished the regression portion of the course. Are there any exercises you recommend I do before I move on to classification? Are there any videos you recommend I watch to soak in the skills I learned? Or shall I move on to classification and work on a project later? Would love to have a quick call or chat with someone who can help me understand how each of these aspects (regression, classification, clustering, deep learning) work together to form AI, the applications, use cases, etc.

vestal lichen
#

Good day everyone! I am still new to machine learning. I am looking for help and guidance regarding a dataset I'm working with. It has a csv file of video paths indicating the location of the videos. I just want to understand how I am supposed to load and preprocess the data based on the csv file containing the video paths. Here is the csv file:

#

I hope to receive help from you guys, thank you in advance.

arctic gorge
#

Where can I get notes about various topics in data science in pdf format?

foggy monolith
#

When I got my predictions using my model for the titanic competition, I got predictions that were between 0 and 1. I ended up just rounding to the nearest digit but is there a better way to deal with predictions that are between 0 and 1 (especially if they are 0.5)?

deft fox
# foggy monolith When I got my predictions using my model for the titanic competition, I got pred...

In most Kaggle competitions predictions on the 0-1 scale are expected for submissions. That means no rounding. The way you are doing it - predicting classes rather than probabilities for each class - hasn't been in use in a long time. Rounding up 0.7 to 1 doesn't really tell us how close the prediction was to 1. On the other hand, 0.7 not only has the information that class 1 is more likely than 0, but also tells us about the confidence in that prediction. Even though both 0.7 and 0.99 round to 1, the latter is more confident. By the way, 0.5 rounds to 1.
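
A short sketch of keeping the probabilities and only thresholding when class labels are actually required (worth checking on the competition's evaluation page), assuming a scikit-learn LogisticRegression on made-up data. One caveat on the rounding remark: 0.5 goes to 1 under round-half-up, but Python's round and NumPy's np.round use round-half-to-even, so an explicit threshold is less surprising.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# Probability of class 1: keeps the confidence information.
proba = clf.predict_proba(X)[:, 1]

# Explicit threshold when 0/1 labels are required; 0.5 itself maps to 1 here.
labels = (proba >= 0.5).astype(int)

# Python and NumPy round halves to the nearest even number.
print(round(0.5), np.round(0.5), np.round(1.5))
```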

shell tusk
#

is it a good idea to store your dataset in /kaggle/temp/ for training? does it persist? asking because all my training runs seem to stop after a certain number of train steps

#

here are few runs that stopped then crashed for silly errors like "np" is not defined

#

the graphs just stop changing

#

i should just let it run but i don't wanna waste my gpu hours

shell tusk
#

ok they are moving just very slowly mb

shell tusk
#

that aside is there any reason as to why it doesnt ever seem to converge

#

ive always had loss curves like this even with another dataset

eager brook
#

Hey! I have a multilabel image classification problem, but for 2 of the labels the percentage of 1s is about 95%. I think this will harm the training. How can I increase the percentage of 0s?
Generally when I do data augmentation I do it randomly using an image data generator, but how can I augment only the images that do not contain a person or machine?

deft fox
# eager brook Hey ! I have a problem of multilabel image classification but 2 labels the perc...

This is a classic problem of imbalanced datasets. Most classifiers in such cases are pushed towards classifying the majority class well, because that will guarantee high accuracy. In your case, a classifier that gives label 1 to each data point would still be 95% accurate. That sounds great, but it would be a useless classifier. One way around it is to change the class weight to correct the imbalance. I would try that first, as all modern classifiers will have that option. You can also try to selectively upsample only the minority class, but I would make that a second option.
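A minimal sketch of the class-weight idea, assuming a single binary label with a 95/5 split. It uses the common "balanced" heuristic, weight = n_samples / (n_classes * class_count); the function name and the toy labels are illustrative only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy label column: 95% ones, 5% zeros.
labels = [1] * 95 + [0] * 5
weights = balanced_class_weights(labels)
```

A dictionary like this can typically be passed to a training call that accepts per-class weights (for example the `class_weight` argument of Keras `model.fit`). For a multilabel problem the same computation would be repeated per label.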

deft fox
# shell tusk

I would say it is converging, but slowly. It may help to try a larger batch size, and to gradually decrease the learning rate.
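A rough sketch of what "gradually decrease the learning rate" can look like, assuming a simple multiplicative decay per epoch. The starting rate and decay factor are hypothetical values, not tuned for this model.

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs of multiplicative decay."""
    return initial_lr * decay_rate ** epoch


# Hypothetical settings: start at 1e-3 and shrink by 10% every epoch.
schedule = [exponential_decay(1e-3, 0.9, epoch) for epoch in range(5)]
```

Most frameworks ship an equivalent (for example `tf.keras.optimizers.schedules.ExponentialDecay` or the `ReduceLROnPlateau` callback in Keras), so this would rarely need to be hand-written.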

young apex
#

Does anyone know if there's an app for practicing pandas? Something like Duolingo but for python programming.

coral tartan
#

Hello, I would like to finetune Gemma on Python questions.

Is it necessary to sanitize the data and if so, which characters should be removed from the dataset?

Is there a specific rule for LLM?

Thanks in advance!

verbal crest
#

@vapid valve We have pretty lightweight coverage and rely on automation mostly. Trying our best!

nimble swift
ionic owl
#

What should be the ideal way for me to start learning data science and ai? I do competitive programming in python and have computer science background and I am good at math

strong lynx
#

I am building a multi-output regression model to predict 6 different plant traits which have very different ranges, so I wrote a loss function in which each activation of the last layer is dedicated to predicting one trait, and I also multiply the loss by weights (1/mean of each trait, normalized).

But I don't think my model is learning when I look at the loss plot.

I'd love to hear your feedback on my approach, as I am very new to machine learning and this approach is not doing very well in the Kaggle competition.
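One alternative to weighting the loss by 1/mean is to standardize each target column before training, so that every trait contributes on a comparable scale, and to map predictions back afterwards. This is a sketch of that swapped-in technique, not the approach described above; the function names and the two toy trait columns are assumptions for illustration.

```python
from statistics import mean, pstdev


def fit_target_scaler(columns):
    """Return a (mean, std) pair for each target column."""
    return [(mean(col), pstdev(col)) for col in columns]


def standardize(columns, stats):
    """Scale each column to zero mean and unit standard deviation."""
    return [[(v - m) / s for v in col] for col, (m, s) in zip(columns, stats)]


def unstandardize(columns, stats):
    """Map standardized values (e.g. predictions) back to original units."""
    return [[v * s + m for v in col] for col, (m, s) in zip(columns, stats)]


# Two toy traits with very different ranges.
traits = [
    [0.1, 0.2, 0.3, 0.4],
    [1000.0, 2000.0, 3000.0, 4000.0],
]
stats = fit_target_scaler(traits)
scaled = standardize(traits, stats)
restored = unstandardize(scaled, stats)
```

The statistics should be computed on the training split only and reused for validation and test predictions.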

odd rain
#

hello i am new to AI and have this school project for computer science in which i decided to create a pygame zelda style open world pixel game that contains NPC's that use an ollama 2 model to generate text, is this possible to do

haughty phoenix
#

What is transport in bemda

#

Hi

lunar ridge
#

not sure if there's a bug, but I keep trying to upload a transformer model and something is failing... on the model detail page it shows no variations, but if I go to add a new variation, I can see both there. and the model can't be used, code doesn't show...

feral spade
#

Hey guys, so I have never given Kaggle a fair shot and I am beginner to intermediate at Machine Learning. Does Kaggle genuinely help you learn machine learning?

feral spade
#

Hmm. Does Kaggle equate to the leetcode -> software engineering, Kaggle -> Machine Learning?

mighty topaz
#

i want to make sure this is not a scam and am hoping a staff member can confirm this for me

#

@verbal crest

glass cave
#

If the test data have a specific range and the train data have a larger range can I train the model specifically to the same range of test or is it considered cheating

verbal crest
#

@mighty topaz I can confirm this is real and not a scam.

low elbow
#

Hello I am completely new to data science and programming, I am doing the Intro to programming course ( + started learning python through youtube very recently ) and was wondering if I should wait until I get the hang of basics to understand the code for titanic or just follow the tutorial and do it

Thank you ❤️

glass cave
#

Bro what happening how in the world the code add 5 or more columns of age when the word age unavailable in the database neither the code

glass cave
desert tusk
shut fable
#

Hello guys, I have a question about a personal project I have created and the type of AI models I should use. Is it appropriate to ask my question in the kaggle discord ? If yes, in which channel ? Thx

fervent ocean
#

function ConnectButton(){
console.log("Connect pushed");
document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click()
}
setInterval(ConnectButton, 60000);

this is the code for colab notebooks so that they dont get interrupted due to inactivity. how to do the same in kaggle notebooks?

#

@tender trench @verbal crest

verbal crest
#

We don't support that on Kaggle. If you want to run a long job you should click "save version", which will run your code without any interruptions

fast magnet
#

Hello, I have a question about the "Cabin" feature in the Titanic competition. Why does this feature have so many missing values? I think there should not be missing values because there obviously should be a list containing passengers and their cabins.

proper scroll
#

Why does the code throw the following error:
ModuleNotFoundError: No module named 'tensorflow.keras.layers.experimental'

import tensorflow
from tensorflow.keras import models, layers, optimizers
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
deft fox
proper scroll
# deft fox It is most likely that you have an old version of TensorFlow. The issue should b...

What do you mean with old version of TensorFlow? I'm running this now within Kaggle in the latest environment.

Running this only raises the error on the last line. The TensorFlow build of November 2023, 2.15.0, is installed.

import tensorflow as tf
print(tf.__version__)
from tensorflow.keras import models, layers, optimizers
from tensorflow.keras.layers import Input, TextVectorization, Embedding, Conv1D, MaxPooling1D, Flatten, LSTM, Bidirectional
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
deft fox
# proper scroll What do you mean with old version of TensorFlow? I'm running this now within Kag...

Same answer as before: Google it but add TensorFlow 2.15.0 to search terms. An educated guess is that somewhere between version 2.2.0 and 2.15.0 TF guys decided to drop this module, because that's what the error says: No module named 'tensorflow.keras.layers.experimental'. Maybe they renamed it. Yet another thing to try is to remove tensorflow from the import and try getting this module directly from Keras: from keras.layers.experimental.preprocessing import TextVectorization

deft fox
proper scroll
# deft fox You can also downgrade TF to 2.2.0 and that might work.

Downgrading stuff defeats the purpose of having updates.

Am I grasping this correctly? How can you see this in the TensorFlow docs?

What in TensorFlow 2.2.0 was:

from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

... is in TensorFlow 2.15.0:

from tensorflow.keras.layers import TextVectorization
lapis canopy
#

Hi everyone, I usually see/understand the purpose of a Pipeline, although I find that in practice it can create some inconveniences. Example:

Creating a sklearn Pipeline with XGBoost & passing the fit parameters, handling cross validation when XGBoost fit method requires eval_set param with stopping round.

So, my question is about how to structure your code/work when working on a particular problem.

Do you tend to choose 1 model to work with after looking at the data, or do you still go through different possible models, and if so, how do you structure your work? Do you make one notebook for every model architecture type, do you use Pipeline in practice?

hearty girder
#

Hi, everyone! I'm currently new to this. I was wondering what are your best practices if the raw data has a missing value which is supposed to be the primary keys like VIN?

lapis canopy
#

if VIN is supposed to be unique and doesn't give any specific information about the vehicle, wouldn't you be better off dropping the column entirely in this case?

#

if it gives some information, then you can encode it and put a special value when missing I guess

deft fox
hearty girder
#

I appreciate your answers! Thank you so much!

alpine ocean
#

I need help with hyperparameter optimization for random forests with a Bayesian optimizer!

icy wind
#

Hi guys, I'm working on a dataset with a huge number of missing values. I used NaNImputer to generate the missing values. While it worked fine for my training data, it is throwing a KeyError for my test data. I have checked my columns: they match, and there are no duplicates. Can anyone suggest how to resolve this issue? Would love some advice on this.

fresh ermine
#

Which competition would I learn the most as practice if I am going to build a machine learning model for predicting how likely horses are to win a race in horse racing?

#

And would i learn more from an active or completed competition?

forest remnant
#

I'm currently working on a project involving the CodeLlama model, which originally utilizes a Decoder-Only architecture with Masked-Self-Attention and KV_cache. However, I'm looking to replace the Decoder-Only architecture with my own Encoder-Only model, which employs Dilated-Attention as used in LongNet.

Here's a breakdown of the steps I've taken:

1. Code Initialization:
    Initialized the CodeLlama model using the AutoModelForCausalLM class from the Transformers library.

2. Model Inspection:
    Examined the structure of the CodeLlama model, including the layers and configurations, to understand its architecture.

4. Custom Configuration:
    Created a custom configuration class, CondensedLlamaConfig, inheriting from LlamaConfig, to adjust parameters for the new Encoder-Only model.

5. Attention Mechanism Replacement:
    Developed a new attention mechanism, MultiheadDilatedAttention, based on Dilated-Attention as described in LongNet.

6. Model Reconstruction:
    Reconstructed the model using the custom configuration and replaced the Decoder layers with the new Encoder layers.

7. Weight Transfer:
    Implemented weight transfer logic to transfer relevant weights from the original Decoder-Only model to the new Encoder-Only model.
#

correct if i am wrong please

summer drum
#

Hi, I am currently trying to read a PDF as a set of images using the pdf2image package.
Here is the code I used:

imgreader = convert_from_path('/kaggle/input/the-test-on-pdfv/Scan_30_Mar_24_105590.pdf', poppler_path='/opt/conda/lib/python3.10/site-packages/poppler/')
#

and an error occurs, stating that I do not have poppler, which I have already installed and imported

#

Is there any way to get around the problem? Or is there any other way to extract images from a PDF?

merry dragon
#

@summer drum the convert_from_path('/kaggle/input/whatever is that from / or is that from the directory/folder you are in when you start python?

#

I am trying to do this on my own computer.

#

Or this line: train_data = pd.read_csv("/kaggle/input/titanic/train.csv")

#

Is that from your os root directory? Or is it like from the directory that you run python from?
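A small sketch on the path question: a path starting with `/` is absolute, i.e. resolved from the filesystem root rather than from the directory Python was started in. To run the same code on a local machine, one option is to fall back to a local folder when the Kaggle path is missing. The helper name `resolve_data_path` and the `data` folder are hypothetical.

```python
from pathlib import Path, PurePosixPath

# A leading "/" makes this an absolute path (it starts at the filesystem root).
kaggle_path = PurePosixPath("/kaggle/input/titanic/train.csv")
is_absolute = kaggle_path.is_absolute()


def resolve_data_path(kaggle_file, local_dir="data"):
    """Use the Kaggle path if it exists, else the same file name under local_dir."""
    candidate = Path(kaggle_file)
    return candidate if candidate.exists() else Path(local_dir) / candidate.name


# On a machine without /kaggle/input this points at data/train.csv instead.
csv_path = resolve_data_path("/kaggle/input/titanic/train.csv")
```

The resulting `csv_path` can then be passed to `pd.read_csv` as usual, after downloading the competition files into the local folder.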

summer drum
#

I used the Kaggle directory as well, and it still forces me to install poppler and provide a path for it, which I have no clue about

#

I finished installing it, yet I don't know the path

vernal ibex
#

Hey! Is it ok to have something else than my real name as a display name on Kaggle? Can I still participate competitions and such? I did set up my name and surname at first, but I don't really want to use my legal name as a nickname on discord.

muted talon
#

Quick question:
Say i have two architectures, a small and a large variant (e.g convnext)
For the small, i can run batches of size 64, for the larger variant i can only run batches of size 32
Would it be comparable to run batch 64 on the small, and batch 32 with accumulation of 2, to only update weights after 64 samples? Or are there any other underlying differences?
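A toy check of the accumulation question, assuming a mean-reduced loss and a model without batch-dependent layers: averaging the gradients of two micro-batches of 32 gives the same gradient as one batch of 64. The one-parameter linear model and the synthetic data are illustrative only.

```python
import random


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(64)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]
w = 0.5

# One step on the full batch of 64.
full_grad = mse_grad(w, xs, ys)

# Two micro-batches of 32; each gradient is divided by the number of steps.
accum_steps = 2
accum_grad = 0.0
for i in range(accum_steps):
    chunk = slice(i * 32, (i + 1) * 32)
    accum_grad += mse_grad(w, xs[chunk], ys[chunk]) / accum_steps
```

The match stops being exact when a layer computes statistics per batch: batch normalization, for instance, would see 32 samples at a time rather than 64.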

opal berry
#

I started machine learning and data science about 5-6 months ago. In the next 6 months, I need to create a recommendation system for a startup platform, but I don't know how to do it. Can you give me suggestions and guidance?

deft fox
# opal berry I started machine learning and data science about 5-6 months ago. In the next 6 ...

It is difficult to give you more than general advice because we don't know what type of data you have, and whether you want to do user-based or item-based recommendations. Recommendation systems have been around for a long time and a Google search will give you many useful leads. I suggest you first search Kaggle notebooks and discussions for "recommender" or a similar keyword. There should be many complete notebooks that you can analyze and adapt to your own needs.

misty cove
#

I want to build a unique personal project in machine learning, particularly around LLM and NLP. How can I look for such a problem statement to work on?

strong lynx
cerulean pier
#

Hello 👋 Guys,

I have a doubt,
I'm taking part in ML competitions on Kaggle, so can I consider the notebooks i make during the competition as my project also? Because it's the same thing as taking a dataset from Kaggle and making a project.

lusty bronze
#

this is gonna be a dumb question, but I am struggling to open the training set for this comp. There's just a red circle at the bottom left on Colab, and when I try to open it in Excel it's just grey. This isn't an issue with any other dataset I've tried to open. Does anyone know why or how to fix it?

lusty bronze
gray juniper
#

Do you guys use cloud services to run models or do you use your own machine? I try to use mine but it's a mid ahh laptop so it's really slow but I'm really struggling to figure out cloud services

lusty bronze
#

I personally use Google Colab

gray juniper
#

Ohh wait i never knew this existed it is perfect!! Thank you!!

spice elbow
#

Hello! one quick question. I'm training a transformer for symbolic music generation. Thing is the model stays at a loss of 0.818 (using cross entropy) after 5 epochs. I'm training it with 81260 songs. Is the problem more likely to be with the way i preprocessed my data or a wrong implementation of the transformer? I followed this tutorial (making some changes because i'm not doing a translation task) https://github.com/hkproj/pytorch-transformer/tree/main.

Thanks!

#

Also, which metric can i use for validation? I'm using rouge. Thanks.

graceful axle
#

It's personally much faster than Google colab and you get higher run time

full anvil
#

hi! newbie here, i wonder is there any projects or competitions that are suitable for ML and DL newbie like me to take, so i can learn and master the techniques by practicing

gray juniper
quasi nacelle
#

hey anyone know much about the probabilistic machine learning book series

heavy granite
#

how to get kaggle_api_url

heavy granite
#

?

marble tendon
#

Hi, I've just got a couple basic questions- 1. What does a typical workflow look like? I'm doing the Olympiad competition, and my model takes forever to run, so is there anything I should be doing in that time, like improving other parts of the code? What do you personally do? 2. Are APIs allowed for final submissions? Seems pretty pay-to-win if you fork the 30 bucks and pay for gpt4 while everyone else is stuck using open-source models

wanton orchid
#

What is the difference between learntools ex5 and ex3

fresh ermine
#

Would you agree with these are the feature engineering tasks required for each model (as generated by chatGPT)?

deft fox
hearty drift
#

Hi! I'm training a text classifier with BERT and I'm getting a validation and regular accuracy of 1. Is that not bad? - Should it at max not be like 0.99?

glass cave
#

About the ai Olympiad math competition can I use a ATP or I should just use NLP

finite galleon
#

Hello, has anyone here worked with MedSAM? I have a few doubts about it. Please DM me.

pallid delta
#

Hello everyone, I'm going to start working on a project investigating the impact of the new metro system on traffic patterns in Quito, Ecuador using satellite imagery but I'm really new and would greatly appreciate any guidance. Does anyone of you have some experience doing this type of thing? Where can I find reliable satellite images of Quito for my analysis?

ancient hinge
#

Is it possible to save a fine-tuned LLM on the Kaggle notebook and use it later?

deft fox
deft fox
obtuse ginkgo
#

After a competition ends for example, HMS-harmful-brain-activity , can I keep the dataset for further research (self study as a student)?

deft fox
obtuse ginkgo
#

ty for clarifying, appreciated

remote snow
#

I am currently doing a kaggle comp for my class and I want to set up a way to run r gbm model through my gpu for faster run times. I am new to the kaggle world and want to know if this is even possible. Thanks in advance

spice sigil
#

please share any open internship offers for data science and Machine learning

ancient hinge
empty venture
#

hello, everyone. Is there a channel to discuss and find dataset we want?

grim pilot
#

Each train folder has 100 images and each image is 800x800.

#

The issue i am facing is that if i am trying to calculate rays on all 100 images it is taking forever, i dont know if i am doing anything wrong,

#

i am even using cupy for GPU utilization on VS code but i dont think its working. Please help me and sorry if this is the wrong channel to post these type of things.

strange crescent
#

Anyone here pretty good with the networkx python library? I’m trying to use it in one of my scripts but having trouble with it

hallow comet
#

I am a beginner in data science how do I get started with kaggle competitions?

vernal ibex
# hallow comet I am a beginner in data science how do I get started with kaggle competitions?

There is a demo competition on the website, The Titanic one, so go through a tutorial, and everything is explained there. On top of that, you need to know statistics, math, some programming, and some data modeling and data analytical skills. Pick the competition you feel like you want to participate and start exploring the given dataset. There are plenty beginner competitions there! Enjoy!

hallow comet
vernal ibex
# hallow comet Can you recommend any books or courses etc that you used for progress

Where are you in your journey? I used many many resources, Khan Academy, math textbooks, university classes, a lot of books from No Starch Press on programming, DataCamp courses, and many more. It's hard to recommend something if I don't know what exactly you need. For example, I learned A LOT about neural networks and machine learning by writing my own neural network from scratch in raw python using the Sentdex yt video series and an accompanying book. It's of course not a viable or efficient project for real life application, but I learned tons doing it. While doing it, I also read tons around the topic.
If you start completely from zero, Kaggle has some introductory courses on programming and data analysis in Learn section, Khan Academy teaches all the math and stats you need, and maybe grab an IBM Data Science Professional Certificate series of courses - you can audit them for free on Coursera, and they kind of show you what direction you need to follow. Start from learning math and statistics. Also, here you have an entire DS curriculum if you want to follow: https://github.com/ossu/data-science

muted cradle
#

If I am training on some data and, although there isn't a distinct point, at some point the loss stops decreasing and instead increases (from 0.5 to like 30), what could be the reason?

#

The learning rate im sure is ok

#

but the model trains really well, but there will be a point where learning just begins to spike

#

and is there a way to diagnose the issue, or a way to prevent it?
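One commonly tried safeguard against sudden loss spikes is gradient clipping; logging the gradient norm before clipping also helps to see whether exploding gradients are the cause at all. This is a minimal sketch under that assumption, not a confirmed diagnosis. The function name and the toy gradient values are illustrative.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so that their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping,
    which is useful to log when diagnosing spikes.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Toy gradient with norm 5, clipped to norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Frameworks provide this directly, for example `torch.nn.utils.clip_grad_norm_` in PyTorch or the `clipnorm` argument of Keras optimizers.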

empty venture
#

i am doing a task of nlp. and i need a dataset which consists of texts posted by doctors or physician.does anyone know how to make a dataset like that?

broken heron
#

Hi I have one serious doubt for Stable Diffusion architecture

#

There are two things written denoising unet and and denoising step

#

Are these both identical or both different

#

Because in the SD paper there is no clear information related to it

snow gull
#

I want to submit a notebook to the prompt prediction competition but the submission uses libraries, is there any way to copy the installed libraries into a dataset and then use it with no internet?

undone nexus
#

As a beginner, I'm curious about learning more about neural network architecture(from the basics to transformers to the edge of current research). Does anybody have any recommendations for a deep understanding about neural networks and their architecture(i.e. textbooks, videos, etc.)

worldly panther
#

why my kaggle notebook keeps running endlessly? and to fix that I constantly would have to ---> factory reset and run

#

or sometimes it's stuck at a certain cell with the star

lunar ridge
#

are there site issues? notebooks seem to be stuck trying to load inputs

small geode
#

What are the best practices I can apply towards improving the accuracy of a neural network?
I've been playing around with a model on the House prices regression competition and I've gotten a 0.16 RMSE score:
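from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model

def build_model():
    # One input unit per preprocessed feature column
    input_shape = X_train_preprocessed.shape[1]
    inputs = Input(shape=(input_shape,))
    x = Dense(128, activation='relu')(inputs)
    x = Dropout(0.2)(x)
    x = Dense(64, activation='relu')(x)
    outputs = Dense(1)(x)  # single linear output for regression
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model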
adding layers and playing with the dropout rate yielded an RMSE score of 0.14 on the training data
def build_model():
    input_shape = X_train_preprocessed.shape[1]
    inputs = Input(shape=(input_shape,))
    x = Dense(1024, activation='relu')(inputs)
    x = Dropout(0.4)(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.2)(x)
    x = Dense(128, activation='relu')(x)
    outputs = Dense(1)(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

still mural
deft fox
deft fox
# small geode What are the best practices i can apply towards improving the accuracy of a neur...

If you make a wide enough and deep enough neural network, it will be able to memorize the training data and it will keep giving lower RMSE values. This doesn't mean the NN will be able to generalize well. The idea is to set up a cross-validation and test how this works on unseen data. If the NN keeps reducing RMSE on train data but not so on the validation fraction, the training must be stopped. That is one of the oldest competitions on Kaggle and there are many notebooks where these concepts are explained. I suggest you go through them by searching for NN notebooks that work well and use the information to re-implement your NN.
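The stopping rule described above can be sketched as follows: track the validation metric each epoch and stop once it has not improved for a set number of epochs. The function name, the `patience` value and the validation RMSE numbers are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has failed to improve
    by more than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative validation RMSE: improves, then drifts upward (overfitting).
val_rmse = [0.30, 0.22, 0.18, 0.16, 0.17, 0.17, 0.19, 0.21]
stop_epoch, best_epoch = early_stopping_epoch(val_rmse, patience=3)
```

In Keras the same behaviour is available through the `EarlyStopping` callback (for example with `monitor="val_loss"` and `restore_best_weights=True`), so the weights from the best epoch are the ones kept.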

small geode
undone nexus
winged zenith
#

Does anyone know why this error is happening?

E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 0: 3.18444, expected 2.27506
Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.

I am using a GPU P100 and have a CNN network. This error happens when I train using the GPU. When I try with just a CPU, the accuracy also drastically increases.

hushed spindle
#

How can I install rag-101 ?
I was actually on llama-index and RAG the main code of libraries is (i am working in vs-code )

carmine oracle
#

Guys, does anyone have free resources to learn deep learning and Generative AI?

next pebble
carmine oracle
#

Okay

velvet yoke
#

Hi guys, I'm relative new to kaggle and I just got into a code competition.
So in a code competition, we are required to submit a notebook right?
Can I build a model on my local machine and then upload it to a Kaggle notebook that imports that model and generates a submission csv? If that model works well, I would then upload the training program and use that to make a submission as a whole.
Is this appropriate according to the competition rules?
Or should I:
1 - do the training and predicting only on Kaggle?
2 - train & test on the given dataset, and if the model works well on the public dataset, copy the training code to Kaggle and make a submission?

Any help will be appreciated!

ancient hinge
#

Is it possible to make Kaggle datasets with hugging face dataset?

barren nebula
deft fox
slim burrow
#

does anyone have good resources (preferably a book or video lectures) about deep learning? the course I'm doing in my uni has very scattered information in slides and some of it is incomplete.

wet mason
slim burrow
barren nebula
#

Why is tensorflow throwing me so many errors?

snow gull
#

does submission scoring take gpu quota?

frank plinth
#

can someone link me to any tutorial about making my gpu work with TF > 2.10? I'm really at my wits end here, nothing seems to work

opaque barn
#

Hey yall, I have a quick question. So, basically, I'm working on developing a neural network for an image classification problem (this one: https://cs231n.github.io/classification/) and basically I wrote an algo for creating and training a CNN. As input, a CNN takes tensors of shape (image_height, image_width, color_channels). Then, I define the convolutional base using a common pattern: a stack of Conv2D, MaxPooling2D and dropout layers. This is the code for my base_cnn() function:

#
def base_cnn():
  """
  Define a convolutional neural network using the Sequential model. This is the
  basic CNN that you will need to reuse for the remaining parts of the assignment.
  It would be good to familiarize yourself with the workings of this basic CNN.
  """
  model = Sequential()
  '''
  Add 2D convolution layers that perform spatial convolution over images. This
  layer creates a convolution kernel that is convolved with the layer input to
  produce a tensor of outputs. When using this layer as the first layer in a
  model, provide the keyword argument 'input_shape' (tuple of integers). Besides,
  the Conv2D function takes as input
  - filters: Integer, the dimensionality of the output space (i.e. the number of
   output filters in the convolution). We set it to 32.
  - kernel_size: An integer or tuple/list of 2 integers, specifying the height
   and width of the 2D convolution window. Can be a single integer to specify
   the same value for all spatial dimensions. We set it to (3, 3).

  Here, we create a stack of (CONV2D, Activation, CONV2D, Activation) layers with
  the ReLu activation function
  '''
  model.add(Conv2D(32, (3, 3), padding='same',input_shape=x_train.shape[1:]))
  model.add(Activation('relu'))
  model.add(Conv2D(32, (3, 3), padding='same'))
  model.add(Activation('relu'))
  '''
  Perform MaxPooling operation for 2D spatial data. This downsamples the input
  along its spatial dimensions (height and width) by taking the maximum value
  over an input window of size 2X2 for each channel of the input.
  '''
  model.add(MaxPooling2D(pool_size=(2, 2)))
#
  '''
  Add a Dropout layer that  randomly sets input units to 0 with a frequency of
  'rate' at each step during training time, which helps prevent overfitting.
  Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all
  inputs is unchanged. We set the rate to 0.25 for Dropout.
  '''
  model.add(Dropout(0.25))
  '''
  Create another stack of (CONV2D, Activation, CONV2D, Activation) layers with
  the ReLu activation function. Set the 'filters' to 64.
  '''
  model.add(Conv2D(64, (3, 3), padding='same'))
  model.add(Activation('relu'))
  model.add(Conv2D(64, (3, 3), padding='same'))
  model.add(Activation('relu'))
  '''
  Perform MaxPooling and Dropout similar to the one defined earlier.
  '''
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Dropout(0.25))
  '''
  The image is still in 3D. It needs be unrolled from 3D to 1D using the Flatten
  layer. Then add a Dense layer on top of it followed by ReLu activation and
  dropout of 0.5. This helps to create a fully-connected layer.
  '''
  model.add(Flatten())
  model.add(Dense(512))
  model.add(Activation('relu'))
  model.add(Dropout(0.5))
  '''
  Create the output layer using the Dense layer with 'softmax' activation. The
  number of predicted output needs to be equal to 'num_classes'.
  '''
  model.add(Dense(num_classes))
  model.add(Activation('softmax'))
#
  '''
  Set the optimizer for doing mini-batch gradient descent. Here, we make use of
  the RMSprop optimizer that comes with Keras. We supply some default values for
  the parameters learning_rate and decay. Do not modify them.
  '''
  opt = keras.optimizers.RMSprop(learning_rate=0.0001, weight_decay=1e-6)
  '''
  Compile the model for training. Since this is a multi-class classification
  problem, we use the 'categorical_crossentropy' loss function and 'accuracy' as
  the desired performance metric.
  '''
  model.compile(loss='categorical_crossentropy',
                optimizer=opt,
                metrics=['accuracy'])
  print(model.summary())

  return model
#

I'm currently trying to pass different activation functions to my CNN model and plot their accuracies on training and validation data, but for some reason I'm getting a flat line for the CNN + sigmoid activation function, and I'm quite unsure what I'm doing wrong here:

#
def base_cnn_activation(activation):
  """
  The base_cnn() function sets the activation function to 'relu' by default. Modify
  the code so that it can work with a user-supplied activation function instead
  of the default 'relu' activation. Do not change the 'softmax' activation.
  
  Compare the accuracy achieved by rectified linear units and sigmoid units in the base CNN. Produce two graphs (one for training accuracy and one for validation accuracy) that each contain 2 curves (one for rectified linear units and another one for sigmoid units). The y-axis is the accuracy and the x-axis is the number of epochs. Train the neural networks for 25 epochs. Although 25 epochs is not sufficient to reach convergence, it is sufficient to see the trend. Save the following results in your Jupyter notebook:

 The two graphs for training and validation accuracy.
 For each activation function, print the test accuracy of the model that achieved the best validation accuracy among all epochs (i.e., one best test accuracy per activation function).
  """
  model = Sequential()
  model.add(Conv2D(32, (3, 3), padding='same',input_shape=x_train.shape[1:]))
  model.add(Activation(activation))
  model.add(Conv2D(32, (3, 3), padding='same'))
  model.add(Activation(activation))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Dropout(0.25))

  model.add(Conv2D(64, (3, 3), padding='same'))
  model.add(Activation(activation))
  model.add(Conv2D(64, (3, 3), padding='same'))
  model.add(Activation(activation))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Dropout(0.25))

  model.add(Flatten())
  model.add(Dense(512))
  model.add(Activation(activation))
  model.add(Dropout(0.5))

  model.add(Dense(num_classes))
  model.add(Activation('softmax'))

  opt = keras.optimizers.RMSprop(learning_rate=0.0001, weight_decay=1e-6)
  model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
  print(model.summary())

  return model
#

I'm getting graphs that look like this:

#

but the sigmoid is also supposed to be a curve instead of flat line with 10% accuracy

#

Can anyone tell me what I'm doing wrong here in the base_cnn_activation() function please?
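One possible explanation (an assumption, not a confirmed diagnosis of this code) is vanishing gradients: the derivative of the sigmoid never exceeds 0.25, so the signal reaching the first layers through five sigmoid activations is scaled down sharply, and with a learning rate of 0.0001 the network may barely move in 25 epochs. A flat 10% would then simply be chance level if there are 10 classes. The arithmetic can be sketched as:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0, where it equals 0.25.
max_grad = sigmoid_grad(0.0)

# base_cnn_activation applies the activation five times (4 conv + 1 dense),
# so the activation derivatives alone contribute at most 0.25 ** 5.
n_activation_layers = 5
upper_bound = max_grad ** n_activation_layers
```

If this is the cause, the function itself may be fine, and the sigmoid curve would only start to rise with more epochs or a larger learning rate; that would need to be checked experimentally.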

true bridge
#

is there any free dataset of drilling or open hole log?

snow gull
devout bobcat
#

Good day! I am an Applied Computer Science student, currently in my second year with a focus on Artificial Intelligence. For our Deep Learning Project, we have to choose a competition on Kaggle where we have to use at least one of the following:

  • MLP
  • CNN
  • RNN
  • Auto-encoder
  • NLP

As we are very limited in time (only 10 days), they have advised us against using CNN, as training would take too long. So my question is which of the current competitions would you suggest me to choose? Thank you all!
devout goblet
#

Hey everyone !!
I hope everyone is doing great
This is my first ever Kaggle competition, so I don't have prior experience with submissions. I just wanted to ask how submission works. Do we have to upload the model or a pipeline (script) somewhere? (I could only see an option to upload the notebook.) Apologies if the question is repeated (I just asked the same question in competition-general, assuming it was the correct channel).
It would be of great help if any of you could provide some information about this

misty roost
left jewel
#

I am looking for an African Climate Dataset. If anyone has it or has an idea of where I can get it please reach out

deft fox
shut yoke
real patio
#

Hello, I've been having issues applying a hybrid model to my project and can't get it to work. Can someone please help? Thanks

olive tinsel
#

Can anyone help me in implementing this paper "Robust and accurate object detection through adversarial learning" https://arxiv.org/abs/2103.13886?

cloud mango
static nest
#

Can someone help me out with this?

devout bobcat
static nest
#

@devout bobcat hey, got it cleared earlier. Thanks for replying

devout bobcat
#

No worries, feel free to ask if you have any other questions!

worldly panther
deft brook
#

Hello Kagglers! I am trying to make my first competition submission. Do you know how long scoring takes on average?

verbal crest
#

For the most part webuis are intentionally blocked on Kaggle for consuming too much compute. Kaggle compute is for learning and data science work, not as a tool for generating art or deepfakes.

TOS and Documentation:
https://www.kaggle.com/terms

https://www.kaggle.com/docs/notebooks

silent venture
#

Is it allowed to upload my solutions for a titanic dataset on my github repo? Basically I want to solve titanic survival competition using multiple logistic regression algorithms and store them in a github repo.

glossy finch
#

anyone have any tips on improving unsupervised learning knowledge (i am already familiar with supervised)? like using images to classify the image or like using images to detect something?

glacial linden
verbal crest
#

@glacial linden I'd try asking in the forum for that competition or its specific Discord channel #playground-series-s4e4

atomic blaze
#

hey guys, while using Google Colab do I have to rerun the entire code every time from the beginning? (including training ds) because I can't seem to save the progress?

potent oyster
#

iirc webuis on kaggle were banned (might be wrong) so i would like to ask are things like rvc banned?

visual gate
#

Hi I am facing some issues with tensorflow multihead attention.

Even though the input shape is consistent. It's still throwing an error saying incorrect input dimension.

Can any one help please.

I am creating a local transformer single layer

visual gate
#

Any one please help

worthy copper
#

Good day everyone, I have a few questions about my model for the Titanic competition. I extracted the title of each person and added it as an extra feature. Then I normalized my data and did PCA. Then I took the 10 best principal components and fit a few models on these. All of my models are performing very badly, even after an extensive grid search for each model. Does anyone have any tips?

misty roost
#

Can anybody recommend a book about data science related to finance? I mean an up-to-date book relevant for 2024.

green coyote
#

I somehow got laughed at and downvoted for making the discussion post?

cloud quest
#

can anybody share learning materials on how to do prediction on multivariate time series?
I know the classical algorithms (SES, DES, TES, ARIMA, SARIMA) but don't know how to forecast when other independent columns can be used in the forecast. Also, is there any vlog on decomposition? How do I decompose a time series? Please help me, buddies, I need this urgently.
please, seniors, help me

static nest
#

can anyone help me with this. The solution was correct but couldn't proceed further and the question is also not marked as correct

static nest
#

even this is not working

wide crescent
#

#DOUBT_HELP: I am doing this Titanic Kaggle challenge - https://www.kaggle.com/c/titanic/data?select=train.csv. I'm confused about whether the following columns should be dropped for EDA or not:

  1. sibsp: # of siblings / spouses aboard the Titanic
  2. parch: # of parents / children aboard the Titanic
  3. Cabin number
  4. Port of Embarkation

(I think these factors do not contribute to whether someone could survive or not; they seem like unnecessary info for this analysis. For example, how can it make sense to say that people who boarded first had a higher chance of dying when the ship sank, or that people in cabin no. ABC could survive better!?) Please help

honest perch
#

As you'd expect, sex has the greatest importance for the dependent variable. Though surprisingly Ticket and Cabin are quite important.

honest perch
#

Here is a partial dependence graph with those 3 columns, as you can see there is a trend for fare and Cabin and a weird relationship for ticket

#

My categorical data are converted to ints alphanumerically

ie cabins :
  0: 'A10',
  1: 'A14',
#

They were probably correlated with fare or class idk
/shrug

wide crescent
#

Thank you so much @honest perch I will check out your notebook.

#

By the way, it's my first challenge and my leaderboard position is 13334, wonder if it's any good? Any tips on improving?

weary gull
honest perch
honest perch
honest perch
#

For reference this simple decision tree got 0.545 and position of 15326

wide crescent
#

@honest perch I read ur notebook - I think the Cabin column, with 687 NaNs, should be removed entirely, whereas u replaced NaN with the mode values

halcyon island
#

I have an absurd regression problem
as a part of the assignment i am supposed to perform regression on a dataset with almost 2 million rows and 2400 columns/attributes
and the test dataset on which i need to run the trained model has 400,000 rows
Please suggest possible methods to solve this

arctic kayak
#

Hi, guys! I'm participating in my first Kaggle Code Competition, so I'd be grateful if you could help me, a beginner, understand the requirements for an eligible submission.

From what I've read in the Kaggle documentation, it is necessary for the submission notebook to run "top to bottom" in less than 9 hours of CPU / GPU runtime. That means that in my submission I should train the model on the training dataset and also predict on the test dataset in less than 9 hours, right? So, if I'm using an ensemble solution, I should manage to train all my component models within the time limit. Or am I getting it all wrong?!

I'm asking this question because I've noticed in this and other competitions' code tabs that there are public inference-only notebooks that import model(s) trained elsewhere (uploaded as Kaggle datasets) and use them directly to predict on the test dataset. This shortens the total runtime of those notebooks. Is this kind of notebook allowed to be a final submission? Or is this just a way to avoid exhausting the weekly GPU quota, while also allowing one to see how well their predictions perform on the public leaderboard and making certain notebooks public for the community without revealing too much of the training process used?

If this isn't allowed, then how are my submissions supposed to compete with these sped-up notebooks, with high public scores, especially in the efficiency section of the contest?

Thank you in advance!

honest perch
honest perch
weary gull
honest perch
past sleet
#

I already asked my question on discussions but I have to cross post here since kaggle discussions are getting so much spam. This is the post. https://www.kaggle.com/discussions/general/498808

I saw some of the old discussions about some teams getting removed because they shared something before merging teams. My teammate shared a notebook and literally after 10 minutes we teamed up. We haven't done any submissions in last 2 weeks. We have teamed up 2 times before and finished top 10 in both competitions and we have no history of cheating. Are we safe? @steel sundial @verbal crest

cyan pelican
#

Hi,
Does anyone have knowledge of the pytorch_forecasting library?
I have a question about it.

honest perch
# wide crescent <@244045452083724289> I read ur notebook - I think col Cabin with 687 Nans sho...

Actually I had a look at the feature engineering notebook, and I believe unknown cabin is quite informative. I should have made it a separate category if I spent more time on feature engineering.

To conclude, cabins used by 1st class passengers have higher survival rates than cabins used by 2nd and 3rd class passengers. In my opinion M (Missing Cabin values) has the lowest survival rate because they couldn't retrieve the cabin data of the victims. That's why I believe labeling that group as M is a reasonable way to handle the missing data.

https://www.kaggle.com/code/gunesevitan/titanic-advanced-feature-engineering-tutorial
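
The "label missing cabins as M" idea above can be sketched in plain Python. This is only an illustration under assumptions: `cabin_to_deck` is a hypothetical helper name, and taking the first letter of the cabin string as the deck is an assumed simplification, not necessarily what the linked notebook does.

```python
# Hypothetical helper (not from the linked notebook): turn a raw Cabin value
# into a deck letter, and keep missing cabins as their own category 'M'.
def cabin_to_deck(cabin):
    """Return the first letter of the cabin string, or 'M' when it is missing."""
    if cabin is None:
        return "M"
    text = str(cabin).strip()
    # str(float('nan')) is 'nan', so NaN values read from a CSV also land here.
    if text == "" or text.lower() == "nan":
        return "M"
    return text[0].upper()


# Made-up sample values in the style of the Titanic Cabin column.
cabins = ["C85", None, "E46", float("nan"), "B57 B59 B63 B66", ""]
decks = [cabin_to_deck(c) for c in cabins]
print(decks)
```

Keeping `M` as a separate category lets a model learn from the fact that the cabin is unknown, instead of dropping the column or filling it with the mode.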

wheat kettle
cyan arch
#

How come my code is able to run on jupyter notebook, but when I try to run it on the kaggle notebook I get a value error?

shut moon
#

I was re-creating a hate speech detection model, and I'm quite a noob with certain parameters. I ran into an error while measuring the model accuracy with the "Naïve Bayes BoW classifier"; I have attached a screenshot. I hope that's clear enough.

cyan arch
#

is it vocab or vocabs

winter furnace
#

Can anyone point me to the right direction,

I'm trying to build a model that matches a book's paragraphs in one language with the matching paragraphs in a translation (for example, let's take the little prince's english version and its japanese translated version)
The idea would be to create a version of a book where its original and translation are laid out side by side for language learning

I'm not too sure yet how to approach this kind of problem (what model to use, what kind of problem it is, etc.) so i'd appreciate some guidance

as of now, my idea would be to vectorize/tokenize the words, compute something like a vector sum per paragraphs, then maybe match the resultant vector using a dot product with the vectors in the other language, the thing tho is that since these are two different languages, the way the words would be vectorized would probably result in vectors where the dimensions aren't the same, so not yet sure how to deal with that


TLDR: I'd like to create a model that automates the creation of something like this: http://bilinguis.com/book/alice/jp/en/c1/ where the model aligns the text from an original language to an official human-translated text

Any suggestions would be appreciated!

cyan arch
#

multilingual model
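
A rough sketch of how the multilingual-model suggestion could address the dimension worry: if both languages are embedded by the same multilingual encoder, the paragraph vectors live in one shared space and can be compared directly with cosine similarity. The vectors below are made-up stand-ins for encoder output, and `align` is a hypothetical helper that does a simple best-match lookup, not a full alignment algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align(src_vecs, tgt_vecs):
    """For each source paragraph, return the index of the most similar target paragraph."""
    matches = []
    for u in src_vecs:
        scores = [cosine(u, v) for v in tgt_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Toy 3-dimensional "embeddings" (assumed, not real model output).
english = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]
japanese = [[0.1, 0.9, 0.1], [0.8, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(align(english, japanese))
```

For a real book the paragraphs mostly appear in the same order, so a practical version would probably add an ordering constraint on top of the raw similarity scores.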

misty roost
haughty oriole
#

hi guys, if I have 3 boxes, each with a 3-digit number inside, what is the best deep learning method to predict them?

outer geyser
#

how you guys been running the llms? do you pay for the vertex ai subscription or use private hardware?

shut moon
sterile cliff
#

@shut moon First of all, this is a function and you are calling it; can you show me the function itself, defined BEFORE that line of code?

shut moon
sterile cliff
#

do you remember if there's something with the same name as this?

#

before that line of code?

shut moon
#

In the whole code I didn't find anything like that. At first I thought it was part of the 'naive_bayes' module, but it wasn't

sterile cliff
#

no it's not, and even if it is, this is not the right way to train the model

#

so the answer is no

#

I guess I will have to wait to see an answer; I might help

sterile cliff
#

and I might now

shut moon
#

Much APPRECIATED

misty roost
#

Hello I would like to ask question about Kaggle Tier progression

  1. In order to become Kaggle Expert, do I have to become expert in all 4 categories (competition, notebook, discussion, datasets). Or is simply becoming expert in one of these enough to become overall Kaggle expert?

  2. Is skipping Tiers possible theoretically? Let's say I am Kaggle contributor and I don't meet criteria for becoming "Kaggle Expert" but I meet criteria for becoming "Kaggle Master". Do I become Kaggle Master, skipping the Kaggle expert phase? Or does it not work like that and I have to progress gradually?

thanks for explanation

shut moon
sterile cliff
#

@shut moon Oh sorry, something came up, and yes I saw it

#

[Errno 2] No such file or directory: 'train.tsv'

#

I guess you do have it

#

@shut moon and decent accuracy at the end I guess?

#

it isn't even a good accuracy

#

but give me your whole lines of code

#

and I might be able to help

shut moon
#

I will send you my notebook

sterile cliff
#

ok

shut moon
sterile cliff
#

Ya I downloaded it just hold on a sec

shut moon
#

Okay

sterile cliff
#

ok here's the thing my friend, it's a function, not something that you can get from naive_bayes
so there are 2 possibilities here, from my understanding.
either he forgot to include the cell that has the actual function, in this case "get_vocabs"
or
the cell works on your PC and not mine; in that case I don't understand why, but I am very sure it's not something from the library,

#

I mean the guy who produced it in the first place

#

@shut moon

#

I guess you will have to create this function?

shut moon
#

Okay. Let me see. I am not sure if I can do that, I have not that experience, I will start learning, come back to it later.

sterile cliff
#

I don't have experience either, my friend, but if you want to learn now, here's some quick advice for creating this function

#

simply look at the difference between the 3 variables that he created as objects, and what happened to them after the function was applied

#

and see what's in the train data

#

again see the difference and see what happened

#

and start making notes of the changes that happened

#

then you will create your function very easily

#

I was always doing that when I was learning python so give it a try @shut moon

#

all you need is information, loads of it

shut moon
#

RecursionError: maximum recursion depth exceeded

sterile cliff
#

can you show me the function

hallow dagger
#

Hi, I'd like some help with the problem I'm working on (it's an image classification competition by CVPR). It's basically about classifying different species of snakes from their images.

I've been trying transfer learning with several models but am still not getting good results

models I've tried.
facebookresearch/Hiera -- Tiny version
vit_base_patch16_224 ( trained over imagenet data)

Any suggestions for the base model or any augmentations to try? I've been using fastai + albumentations for the augmentations on the training images (taking reference from this link): https://github.com/benihime91/kgl-pogchamps-3-corn/blob/main/nbs/NB_EXP_V2_008_swin_base_patch4_window12_384_in22k.ipynb

would like to have a discussion if anyone is interested 🙂

Thank you 🙂


hallow dagger
# shut moon RecursionError: maximum recursion depth exceeded

@shut moon the limit is determined by the system. I'd probably recommend using some DP approach instead of recursion if possible, since the limit is being exceeded, but as a workaround you can raise the limit as described here

https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-and-how-to-increase-it
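
To illustrate the advice above: a minimal sketch, using a made-up counting function rather than the function from the notebook, showing that the recursive form hits the interpreter's limit while an equivalent loop does not.

```python
import sys


def count_recursive(n):
    """Count down recursively; each call adds one stack frame."""
    return 0 if n == 0 else 1 + count_recursive(n - 1)


def count_iterative(n):
    """Same result with a loop, so the recursion limit never matters."""
    count = 0
    while n > 0:
        count += 1
        n -= 1
    return count


# Pick an input that is deeper than the current recursion limit.
big = sys.getrecursionlimit() + 100

try:
    count_recursive(big)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

print(recursion_failed, count_iterative(big) == big)
```

`sys.setrecursionlimit` can raise the limit as in the linked answer, but a very high value can crash the interpreter, so rewriting the loop is usually the safer fix. If the error appears on a small input, the more likely cause is a function that calls itself without ever reaching a base case.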

wheat kettle
analog bear
#

Hi all, I am working on my first ML project independently. The project is titled 'House Prices - Advanced Regression Techniques.' We have been provided with separate files for training and testing. Why do we need to split the data into train and test again before running the model?

celest dust
#

oh and you have to split the training data into train and test again since you want to see the accuracy of your model before sending it into the competition

#

otherwise it's like throwing a dart while blindfolded
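
A minimal sketch of that second split, in plain Python with made-up rows. `holdout_split` is a hypothetical helper; in practice most people use a library function such as scikit-learn's `train_test_split` for the same job.

```python
import random


def holdout_split(rows, valid_fraction=0.2, seed=0):
    """Shuffle the labeled rows and hold out a fraction for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_valid = int(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]


# Stand-in for the competition's labeled train file: (feature, target) pairs.
labeled_rows = [(i, i % 2) for i in range(10)]
train_rows, valid_rows = holdout_split(labeled_rows)
print(len(train_rows), len(valid_rows))
```

The competition's test file has no target column, so the held-out part of the training data is the only place you can measure the error yourself before submitting.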

sterile cliff
#

@wheat kettle unfortunately for you I don't have such a dataset

sterile cliff
#

@shut moon and this function is very wrong

wheat kettle
# sterile cliff <@1233477275614187593>unfortuntly for you I don't have such a dataset

can you suggest a dataset that I can use to predict diabetes? I found this dataset https://data.mendeley.com/datasets/wj9rwkp9c2/1 but the fields are different from the PIMA dataset https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database?resource=download

sterile cliff
#

@wheat kettle I told you that I don't have that kind of information about datasets

steel matrix
#

how can I remove useless data from my heatmap? I filter it using this:
`corr_scaled = corrmat_top_features[(corrmat_top_features['mpg'] >= 0.3) | (corrmat_top_features['mpg'] <= -0.3)]

mask = np.triu(np.ones_like(corr_scaled, dtype=bool))`

but in the heatmap i still have all of them

#

plt.figure(figsize=(192, 50))
sns.heatmap(corr_scaled , mask=mask, annot=True, annot_kws={'fontsize': 8}, cmap='coolwarm', vmin=-1, vmax=1, square=True, linewidths=0.5)

sterile cliff
#

@steel matrix mmmm, can you show me the columns that you put into it?

#

the corr_scaled, show me the result of that variable

obsidian bone
#

Here is the thing: I imported an LLM using AutoModel and added my own custom classifier at the end of the model with a custom PyTorch class. But every time I try to pass an input to the model created from the PyTorch class, it runs out of memory. I used quantization and acceleration, and the input's batch size is 1.

is there a way to feed data to a model like this without running out of memory? thanks.

untold helm
#

I want to get into predictive analytics... any book recomendations? Im pretty new at this 🙂

sterile cliff
#

@untold helm Kaggle has a bunch of machine learning courses that can teach you this at no cost; they will teach you predictive analysis, and how to actually make predictions about the future

#

whatever you want

sterile cliff
#

@untold helm yw

analog bear
fading swift
#

hello,
I want to scrape data from a PDF stored in Google Drive, and then I need to train a model to scrape it dynamically (whenever a new PDF is added) and get the complete data into Excel or a Google Sheet. Can anyone tell me how to approach this? I have tried getting the PDF data into JSON first and then converting it to an Excel sheet. I was also thinking of doing this in SQL, but the column names are different across the PDFs. Please suggest an optimized way of achieving this. Thank you

sterile cliff
#

@fading swift search for something called tesseract; it will extract the text from any given file (such as a pdf). it requires some research, but you've got this. it's also Python code, not a program, I mean a library I suppose

#

@analog bear it doesn't have to be those, this is an example, not an answer; choose your features yourself if you want

#

@fading swift also you will find how it works on youtube (I hope)

fading swift
sterile cliff
#

np

analog bear
sterile cliff
#

@analog bear anything can be a feature, but before you add them, just remember to improve the data's sensibility and reliability (in other words, data cleaning)
you might remove a feature because it simply doesn't tell you enough, or because it's just random numbers that somebody typed in; it's up to you to discover that

wide crescent
#

Guys, if somebody has a good Kaggle profile, does that help in landing DS jobs?

frank plinth
#

i was looking through the codes and saw that the top ones didnt really create coherent pokemon images

it was more like different blobs of colour that might resemble a pokemon from a distance

#

they use a dcgan

#

so is the size of the dataset itself the bigger problem here?

#

because its 819 images

fringe arch
#

Hello everyone, I want to start building a Fake News Detection Model to practice machine learning. I am reading a lot of fellow Kagglers' notebooks on this subject and learning what libraries, models, and algorithms to use, but I still haven't wrapped my head around it, like what's the best practice for developing the model as a beginner, then as an intermediate, then as an expert.

sterile cliff
#

@fringe arch learn from kaggle

fringe arch
#

bro

sterile cliff
#

@fringe arch do you want a project? you will see more in the kaggle courses

#

do you want a competition? believe me, start from the courses

#

do you want an actual project for beginners? this is hard, because a complete project is only easy if it's described step by step, and that's what the courses do

#

I don't know what else you want me to say

weak compass
#

Hello. Does anyone know what is going on with this

#

The file path seems correct but the program can't find it?

#

nvm it is fixed after refreshing....

sterile cliff
#

ok

sterile cliff
#

isn't there a team that I can join to work on projects and competitions?

analog bear
pallid radish
#

Hello guys, I am facing an issue running my code on a TPU for a binary classification dataset which contains 23k files for one class and 3k files for the other.
I am importing my dataset in the form of TFRecords and then converting it into a TF Dataset. My sparse accuracy is coming out to be 7%, instead it should be between 0-1.

Can someone please help? It's really urgent.

quasi gyro
quasi gyro
#

but now I have a new question: how do I use both GPUs? I only see one T4 being used in training

sterile cliff
#

@quasi gyro that is torch, I am not familiar with that issue

#

@pallid radish and that's an AI classifier? mmmmm not sure about that one

#

guess I don't know everything lol

quasi gyro
sterile cliff
#

@quasi gyro well, consider it a gift compared with the complete silence from most of the users who don't answer (which is disappointing)

#

but if you want help I can search with you

quasi gyro
#

No I found solutions. Thanks anyways. I don't know about using 2 gpus yet

#

But I'll figure it out thanks

sterile cliff
#

np

karmic spear
#

How can I join a team??

pallid radish
pallid radish
sterile cliff
#

@pallid radish I used to have this kind of problem before; however, I've completely forgotten how to handle the TPU, I am sorry

#

@karmic spear I am looking for a team myself, kinda noob here, but if nothing comes up I will create a team here

echo latch
#

I am working on a community detection project for a social network (telecom data). I wanted to ask if this approach is good:
Transform the data into a graph (PyG)
Train a GNN model (GAT)
Generate embeddings with the GNN
Apply K-means for clustering
I would appreciate any advice or guidance

shut moon
#

I was trying to run some different ML models on the liver_cirrhosis dataset. I got results for logistic regression and SVC; both of these models were about 50% accurate. Then I wanted to run XGBoost. The first error I found was that the categorical values should be like 0,1,2 but one-hot encoding gave 1,2,3, so I did label encoding and ran XGBoost again; the error still persists. Here are two screenshots to clarify the scenario.
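
For reference, the remapping that XGBoost's classifier expects (class labels running from 0 up to the number of classes minus one) can be sketched in plain Python. `encode_labels` is a hypothetical helper that mirrors what scikit-learn's `LabelEncoder` does, and the sample values are made up.

```python
def encode_labels(labels):
    """Map arbitrary class labels to consecutive integers starting at 0."""
    classes = sorted(set(labels))
    to_index = {label: index for index, label in enumerate(classes)}
    return [to_index[label] for label in labels], classes


# Made-up target column whose classes start at 1 instead of 0.
stages = [1, 2, 3, 2, 1, 3]
encoded, classes = encode_labels(stages)
print(encoded)   # labels remapped to start at 0
print(classes)   # original labels, to map predictions back: classes[pred]
```

The encoding applies to the target column `y` only. If the error persists after this, one thing to check is that the encoded target, not the original one, is what gets passed to `fit`.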

stark frost
#

Hey, can anyone please tell me where I can find a good SciPy tutorial?

pastel fossil
#

How often do y'all use Calculus in your models? I'm early into my endeavor with machine learning, trying to get my fundamentals in place, and curious about your thoughts. 🤔

shut moon
#

I somehow solved the previous problem, but the accuracy seems a bit low. How do I increase it?

wide crescent
#

#query For Titanic - Machine Learning from Disaster, I used logistic regression for training. I split my train data into train and test to check accuracy, and it was 1. But when I used the same EDA and training for the test data and submitted, it gave me 0.7 accuracy. Is this overfitting, and how do I overcome it?

sterile cliff
#

@shut moon good for you that you solved the problem, I was just about to try some trial and error on it. second, if you want to increase the accuracy, you have to first understand the data. how can you do that? get a box plot, look at the outliers that mess up the data, and see how they are related. next, look at your features, I mean column by column, BEFORE one-hot encoding them; there might be something interesting. third, did you check for duplicates? the NA data? do any of the values not make sense within the same column? do you have to delete some of the columns? if so, why?

these are the kinds of questions you have to ask yourself before you put the data into the model

because a 1.74% accuracy is pretty PRETTY bad

#

@pastel fossil well, the theory of machine learning and AI is built entirely on calculus. you can study it and see how the models actually work with just the numbers; the only problem is that it takes a lot of time, which is why it's done in one line of code, and voilà, it works.
but we don't do the calculus by hand when coding; we have other things to be careful about, and that's not one of them

#

@wide crescent are you sure you split your TRAINING data into training and test? if so, there's no way, I mean NO WAY, it will be 1; something is wrong in your code. either you evaluated on the same data the model was trained on, or you didn't split it at all

#

@wide crescent either way, you have to show me what you did

zinc orbit
#

I accidentally closed the settings panel on the right side of the notebook. How can I reopen it?

sterile cliff
#

i don't remember, take a screenshot so I can tell you

#

@zinc orbit

modest delta
#

how to fix this? editor loading forever...cant do anything in the page

empty mural
#

hello Guys
i hope you are well

please, I have a question: I'm taking part in an LLM competition and I'm facing a small problem. I'm working on Colab and I'm having a problem with the model I want to use. I'd like to know how to train an LLM.

can anyone help me?

zinc orbit
frank plinth
patent kiln
#

i dont think theres a single different entry

deft fox
feral spade
#

What are the best social media for a highly technical code blog?

sterile cliff
#

@feral spade github?

slim frigate
#

hallo

#

I want to know is the certificate of Python useful?

sterile cliff
#

@slim frigate not much, the most important thing is your knowledge, and if you can work and make projects with it or not

wide crescent
wide crescent
sterile cliff
#

@wide crescent good, hope you make the best model ever 😃😃😃

wide crescent
sterile cliff
#

@wide crescent well send in the notebook, I might help

patent kiln
celest dust
#

hey! how do you guys decide what hyperparameters to set for a CNN? (more specifically, a computer vision model)
whenever I try to create a model, it fails to converge. I also recently tried to remake the AlexNet architecture on a food101tiny dataset but it fails to converge, so I'm not so sure if it's about hyperparameters in that case.

sterile cliff
#

@celest dust Well, depending on the data, you need to understand what the parameters do exactly before you apply them, OOOOORRRRRRRRRR............................

You can just write a for loop over a small portion of the data ("this part is VERY VERY important"), and change the value on each iteration of the loop.
if one of the values you put in the loop gets a high score then you've got a winner; then you move on to the next parameter in the next for loop, and voilà, you have an unbeatable boss
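
The loop described above could look roughly like this. It is a sketch under assumptions: `tune_one_at_a_time` is a hypothetical helper, and `toy_score` is a made-up stand-in for "train briefly on a small subset and return the validation accuracy".

```python
def tune_one_at_a_time(score_fn, grid, defaults):
    """Tune each hyperparameter in turn, keeping the best value found so far."""
    best = dict(defaults)
    for name, candidates in grid.items():
        # Try every candidate for this parameter while the others stay fixed.
        scored = [(score_fn({**best, name: value}), value) for value in candidates]
        best[name] = max(scored, key=lambda pair: pair[0])[1]
    return best


def toy_score(params):
    """Made-up score that peaks at learning_rate=0.01 and batch_size=64."""
    return -abs(params["learning_rate"] - 0.01) - 0.001 * abs(params["batch_size"] - 64)


grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [16, 64, 256]}
best = tune_one_at_a_time(toy_score, grid, {"learning_rate": 0.1, "batch_size": 16})
print(best)
```

Tuning one parameter at a time is cheap but ignores interactions between parameters, so the values it finds on a small subset are best treated as a starting point for the full run.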

celest dust
#

yeah that'd be cool, but it would take longer to run that process for every hyperparameter than to just understand what those params do exactly lol

#

I somewhat understand most of them, but I'm not sure about the metrics that are usually used for the parameters

#

like if I had a dataset of size x, I couldn't tell how many layers of conv and fc I would need

sterile cliff
#

a small portion, I said a small portion

celest dust
#

ah, and that's the part my brain decided to skip, even tho u said it's VERY VERY important 😂

#

hmm I'll try that, but is it going to work on deeper networks too, like one with a VGG architecture for example?

lean pine
#

I have this problem: my "Pclass" column is of type category and I don't know how to fix it. Should I change it to string or something else?
Thanks in advance

modern comet
#

I have this problem in my Kaggle notebook: the program is running but the output is not updating

sudden nacelle
#

hey guys, I'm using the RVC AI voice cloner on Kaggle and I'm facing the following error. any help would be greatly appreciated, thx

stable current
#

I am on the classic California Housing Dataset trying to predict the median house value. So, the dataset contains NaN values in total bedrooms. I used the Simple Imputer to replace them with median values but when I go on to train the model I still get NaN values. I have checked the dataframe and there aren't any after my preprocessing . But when I try to preprocess and run the model together via a pipeline, I don't know what goes wrong. I have linked the notebook and shared screenshots of the problem I'm facing. Any help will be greatly appreciated. Thanks!
https://colab.research.google.com/drive/1gpaI2xJE2tY0gxEAD1oFGUGsBgRIal5q?usp=sharing

analog bear
#

HI All, This might be a vague question. What criteria do you use to select the correct features?

elder ferry
#

hello, I am a beginner in computer science and my project is: identifying whether a text was generated by artificial intelligence or by a human being.

I really don't have the skills to build it; I need a mentor or someone to help me
please, I really need help

glass violet
#

Hello everyone,

I have a question related to data science in general. In short: I have a tabular dataset where the target has an exponential-like distribution, but from a business point of view the most valuable thing is to predict the right tail correctly (above the 95th percentile), because it brings in about 90% of revenue, and the values there are sometimes extremely high (like 10^9)

I wonder how to handle such a situation, when you are interested in correct predictions for the “outliers” but don't really care much about the other 95% of the data?
Currently I'm thinking of training one regression model for cases below the 95th percentile and a second one for cases above it, plus a classifier for outlier detection to choose the appropriate model.

Any suggestions, links or ideas are welcome.
Thanks a lot guys!

iron lodge
#

Has anyone worked on deepfake detection? If yes, reach out to me, I need to discuss some things

crisp gust
#

hi guys :), I want to create a new collection with the same name as a collection that I deleted before. It keeps giving me "collection with this name has already been created" (or something similar) even though I already deleted it. Do I have to wait for the next reset (the same time as the GPU/TPU quota reset)? or is there any way to solve it?

verbal crest
cosmic ginkgo
#

Does anyone know why excel would add curly brackets around a formula that was cut and pasted over several columns as an array?

low elk
#

I'm working on Regression with a Flood Prediction Dataset. I tried to submit predictions by ensembling Ridge Regression and LightGBM, but when I saved & ran the notebook, I encountered the error message 'Your notebook tried to allocate more memory than is available. It has restarted.' Could you please advise on how to resolve this issue?

dull lagoon
#

I want to annotate my image like this one ( upper one is real and bottom one is annotated)

#

This is the image I want to annotate

#

How can I do ? What are the possible ways ?

sterile cliff
#

@analog bear well, there's something called feature engineering, and you can use the correlation of each feature with the output "y"; it will help you pick features statistically.
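
A small sketch of the correlation idea, in plain Python with made-up numbers. The feature names and values are invented for illustration, and `pearson` is written out by hand here (pandas' `DataFrame.corr` computes the same quantity).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: three candidate features and the target y.
features = {
    "area": [50, 80, 120, 200],
    "age": [40, 30, 20, 10],
    "noise": [3, 1, 4, 1],
}
price = [100, 160, 240, 400]

# Rank features by the absolute value of their correlation with the target.
ranking = sorted(features, key=lambda name: abs(pearson(features[name], price)), reverse=True)
print(ranking)
```

Correlation only captures linear relationships with the target, so it is a first filter rather than a complete feature selection method.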

#

@elder ferry that's easy, search for something called tesseract. look at its features, look at its tutorials; it might help you, it might not, but I assure you it's something related to text

#

@glass violet well, there are 2 possibilities for this kind of situation.
First, you can delete them and be done with it, because they mess up your data and its average (and, to a lesser degree, its median). It may also mean that people mistyped their numbers.
Second, you can ISOLATE them and treat them as a separate case that you can build your entire data story on (they might be a great deal or something).

Either way, it's your job to see if it's a mistake or a real deal that the company made a lot of money off.
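Both options start with flagging the suspicious rows. A minimal sketch, assuming the values live in a pandas Series and using the common 1.5 × IQR rule as the cutoff (the rule and the toy numbers are assumptions, not something from the original question):

```python
import pandas as pd

# Hypothetical sales figures with one suspiciously large value.
sales = pd.Series([120, 135, 128, 140, 131, 125, 5000], name="sales")

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = sales.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (sales < q1 - 1.5 * iqr) | (sales > q3 + 1.5 * iqr)

outliers = sales[is_outlier]   # option 2: isolate and study these separately
cleaned = sales[~is_outlier]   # option 1: drop them if they are mistakes

print(outliers.tolist())
print(sales.mean(), cleaned.mean())
```

Comparing the mean before and after shows how much a single extreme value can pull the average.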

#

@iron lodge if there's something, tell me. But here's a hint: use a neural network and a lot of data (pictures of deepfakes) to be able to detect deepfakes. Good luck

#

@crisp gust is that in Kaggle?

#

@low elk https://stackoverflow.com/questions/62311260/your-notebook-tried-to-allocate-more-memory-than-is-available-it-has-restarted

#

@dull lagoon use the OpenCV library. I used it once in one of my projects; you can learn it and use it to change colors, rotate, and manipulate images, technically anything

dull lagoon
sterile cliff
#

@dull lagoon actually I don't have a link, but there are tons of tutorials on YouTube. It's easy, don't worry.

crisp gust
#

putting an old collection's name will just give an error of "a name with this collection already exist" even though it has been deleted

sterile cliff
#

@crisp gust weird. Jupyter notebook is always a choice

crisp gust
#

wdym?

#

I want to create a new collection to organize them

#

but it's so weird, because some of them have only 6 notebooks but display 11 inside

#

it keeps making me confused, (basically my OCD :P)

dull lagoon
# sterile cliff @dull lagoon actually I don't have a link, but there's tons of tutorial...

I saw one tutorial on yt
( https://youtu.be/UUP_omOSKuc?si=nLv3DL0BQBphyy_a )
But by following this video I was not able to do it perfectly like that sample image

dull lagoon
sterile cliff
#

@crisp gust I don't know about Kaggle stuff yet, I mainly use Jupyter. So I can't help ya 😃😃

#

@dull lagoon I know, that's where you will start your search. As I told you, OpenCV is a really, REALLY powerful tool when it comes to images. If you didn't find the thing you want in this tutorial, try another, or keep learning OpenCV until you reach what you want

daring obsidian
#

Regarding the AIMO competition and the new api:
If I submit a copy of just this notebook: https://www.kaggle.com/code/ryanholbrook/aimo-submission-example for evaluation (via the Submit button in the "Submit to competition" frame), then test.csv does not have the 50 tests, but just the 3 standard examples. What am I doing wrong?
Would love to get your help. Thanks!

patent tangle
#

if i have a column for storing yes/no status in sql, which one should i use?
Yes, No
YES, NO
1, 0
Y, N

worthy copper
#

Good day everyone, I was wondering something about the progression system. Do you have to complete each task in each category, i.e. datasets, competitions, ..., or can you just choose one to rank up?

#

Like to rank up your account

trim stone
#

Hello all, i was given the task of carrying out research on
“Analysis of Algorithms, specifically across Artificial Intelligence body of work: understanding applicable scenarios and performance considerations.

You should study algorithms that belong to the following classes: Machine Learning and sub classes such as Deep Learning, LLMs etc

For each of these models and algorithms identified within each descendant subclass, identify the following:
Algorithms and Use Cases (inclusive of performance analysis)
Weakness and associated use risks
Financial Services Applications

You are to come up with a detailed analysis and presentation ”

Can anyone point me to books, videos or research papers that would help me in achieving this task 🙏

trail gorge
stray knot
#

Hey,
I'm working on a recommendation algorithm for clothing products. I chose to go with a content-based filtering approach. After much tinkering, I've decided to go with a pretrained vectorizer + autoencoder approach. Basically, each product in the dataset consists of 5 distinct images (resized and normalised), a general description (100 tokens, padded if necessary), and the top 5 reviews (100 tokens each, padded if necessary). I plan to pass the images through ResNet to obtain the embeddings and then concatenate the 5 embedding vectors to pass them through an autoencoder. Same for the text, except I use BERT to tokenize it. I then pass the two embedding vectors through another autoencoder (much denser) to obtain the final embedding, which I'll use to find similar embeddings in the vector space through cosine similarity.
I have not yet trained this model, but just wanted to get the opinions of an expert, if this is the right approach. Thank you for your time!
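The final lookup step can be sketched independently of how the embeddings are produced. This is a rough illustration only: the random matrix stands in for the final product embeddings, and `top_k_similar` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def top_k_similar(embeddings, query_index, k=3):
    """Return indices of the k items most cosine-similar to the query item."""
    # L2-normalise rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # never recommend the item itself
    return np.argsort(scores)[::-1][:k]

# Stand-in for the final product embeddings (random, for illustration only).
rng = np.random.default_rng(0)
product_embeddings = rng.normal(size=(100, 64))
# Make item 1 a near-duplicate of item 0 so the result is easy to check.
product_embeddings[1] = product_embeddings[0] + 0.01 * rng.normal(size=64)

print(top_k_similar(product_embeddings, query_index=0, k=3))
```

Running this kind of lookup on the raw ResNet and BERT embeddings before adding any autoencoder gives a baseline to compare the full pipeline against.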

half basin
#

I have just started the journey of DS. Can I compete in the competition?

rocky yacht
#

URGENT!!! Please Help!
hello @everyone, I got access to Meta's Llama3-8b. But how do I get my access token? Any idea? Please help, it's urgent!!

wild relic
#

Hi All, I'm a data science student currently creating a personal project and working in IT/automation, looking to learn new skills and expand my portfolio. Apart from competitions, how can I achieve this? I would love to contribute towards some projects and collaborate with others but I don't know how to begin doing that

weary cairn
#

I wanted help with using kaggle! Can anyone guide me!

plucky vector
#

Hello everyone, I would like to ask for advice on a chemistry project I am doing. It involves analysis of many electron microscopy images, which I don't want to measure by hand. Let me explain my procedure so far:

  1. Use ImageJ (FOSS software) to extract the image scale from metadata, then threshold the image (divide it into foreground and background), and then use ImageJ's Analysis function to measure the length, width, circularity etc. of the particles in every image (done via script in batch mode).

  2. Load the results after some pre-processing into Kaggle, where I have a trained classifier to distinguish between the categories "rods" (elongated particles), "spheres" (round particles), and "trash" (agglomerated particles or defects due to irregular background, which confused the thresholding).

Here below is a picture illustrating the procedure:

#

This is a representative example picture, showing that at the level of thresholding, two problems occur:

  1. The background is not removed reliably, as you can see in the area of the grey spot. Since the background is different in every picture and the pictures were sadly all taken at different size scales, it cannot be filtered out easily.

  2. Some particles appear to be sticking together, despite touching only slightly or not touching at all in the original picture. I tried expansion and erosion, but it changed the size of some particles (which I want to measure), and I also tried watershed, but that is an algorithm intended for separation of round particles. Many of my rods are not evenly coloured in the picture, and get "cut apart" by the watershed algorithm. Because I also want to measure the ratio of rods to spheres, this is unacceptable for my purposes.

#

My current approach is to discard all particles sticking together in the analysis, which, for some samples, leaves me with too few particles for reliable statistics.

So my question in this context is:
How can Machine Learning /Computer Vision help me with the thresholding step to separate the particles which are close to each other?

sterile cliff
#

@wild relic you can collaborate with me. The only problem is, I will not start now because I am completing some courses in feature engineering. You should start as well, with machine learning and data analysis

#

@plucky vector look into the OpenCV library, see its courses and YouTube videos. I used it once in one of my projects and not since, but I believe it's a very powerful tool

#

@weary cairn Kaggle as a whole, or how to use Kaggle notebooks? Please demon

#

Demonstrate

weary cairn
#

How to use notebooks, datasets, etc.! And how to earn medals! How to get better at data science using Kaggle!

plucky vector
sterile cliff
#

@weary cairn well, on the left side of Kaggle you will see Courses. Click on it and you will find an abundance of lessons. Learn them, make notebooks and projects that are special, and you will become an expert once you have done enough

#

Plus, people should see your notebooks a lot, so don't forget to share what you did

#

@plucky vector the only way to find out is to compare, and if you don't get better results there's always the option of changing the parameters or adding more layers. Iterating always works

sterile cliff
#

@weary cairn np

plucky vector
sterile cliff
#

@plucky vector you need to install it first and then import it

#

@plucky vector I think there are YouTube videos explaining those

#

@plucky vector and you're welcome

maiden sparrow
#

Hi, I am working on a project to extract frames from classroom lectures. My main focus is on extracting the frames where the whiteboard is full of text or figures drawn by the prof. The use case is extracting frames such that a 1 hour long video can be summarized in 20-30 full-board frames. I need help with this because I cannot figure out how to detect the full-board scenarios. Any advice will be really appreciated. THANKS

sterile cliff
#

@maiden sparrow very easy: try turning each picture into a complete dataframe (there will be A LOT of columns and rows, because each pixel is a value from 0 to 255 according to its color).
Gather the sum of the pixels for the blank whiteboard and the sum of the pixels for the full board, and compare
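A rough sketch of that pixel comparison, using the fraction of dark pixels instead of a raw sum so the score does not depend on frame size. It assumes grayscale frames already cropped to the board area and dark marker on a white board; `ink_fraction`, `fullest_frames`, and the threshold of 128 are illustrative choices, and the synthetic arrays stand in for real video frames.

```python
import numpy as np

def ink_fraction(gray_frame, dark_threshold=128):
    """Fraction of pixels darker than the threshold in a grayscale frame."""
    return float((gray_frame < dark_threshold).mean())

def fullest_frames(frames, top_n=2):
    """Indices of the frames with the most 'ink', fullest first."""
    scores = [ink_fraction(f) for f in frames]
    return sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:top_n]

# Synthetic stand-ins for grayscale video frames (0 = black, 255 = white).
empty_board = np.full((90, 160), 255, dtype=np.uint8)
half_board = empty_board.copy()
half_board[10:40, 10:150:3] = 0      # some marker strokes
full_board = empty_board.copy()
full_board[10:80, 10:150:3] = 0      # strokes over most of the board

frames = [empty_board, half_board, full_board]
print([round(ink_fraction(f), 3) for f in frames])
print(fullest_frames(frames, top_n=1))
```

On a real lecture the score would likely rise while the board fills and drop sharply when it is erased, so the frames just before each drop are natural candidates. A person standing in front of the board would also count as "ink" here, which this sketch does not handle.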

#

THERE'S another way

#

You can train an AI model (I'm not very experienced at that): get a lot of data (example: pictures of a blank whiteboard labelled True and pictures of a full board labelled False)

1000 pictures, or 500, to make it accurate, I think

#

TensorFlow can do that, I think

plucky vector
grizzled cargo
#

Hey all. I am working on a model for a task. It is a handwriting recognition classification model but I am having issues with the accuracy score. I need to compare models, and in my case I am using KNN, Bayes and CNN. I have to include KNN and Bayes. I have been sitting on this thing for 2 weeks and I feel like my brain is becoming smoother by the minute. If anyone could please help, I don't mind DMs if you require more info.

Thank you in advance

sterile cliff
#

@grizzled cargo first of all, do you have enough data for each handwriting class? And even if you have enough data, is it for a particular student, or are you asking about handwriting in general, like French and British handwriting of the 17th and 18th centuries?

#

can you please let me see the training data and the names on it

tulip tangle
#

Hello all,
I am reading the book Approaching Any ML Problem and I am at the section where he talks about one-hot encoding. I am a bit confused about the order of splitting and fitting the encoder. According to the book, it is okay to fit the encoder on full_data, which comprises both df_train and df_valid, but I can't quite digest it. Won't it lead to some form of data contamination and ultimately misleading results? It would be great if someone could clear up this tiny thing for me.

Here is the actual code (Pg-110)

# get training data using folds
df_train = df[df.kfold != fold].reset_index(drop=True)
# get validation data using folds
df_valid = df[df.kfold == fold].reset_index(drop=True)
# initialize OneHotEncoder from scikit-learn
ohe = preprocessing.OneHotEncoder()
# fit ohe on training + validation features
full_data = pd.concat(
    [df_train[features], df_valid[features]],
    axis=0
)
ohe.fit(full_data[features])
# transform training data
x_train = ohe.transform(df_train[features])
# transform validation data
x_valid = ohe.transform(df_valid[features])

tulip tangle
#

oh I think I realise why he'd do that

let's say that there is a categorical feature f1 that has attributes ['a1', 'a2', 'a3'], and by sheer luck I did not get a1 in my training set. The one-hot encoder in that case would not be able to process a1 when it sees it during the validation phase
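For comparison, here is a small sketch of the alternative: fitting on the training fold only and letting scikit-learn's `handle_unknown="ignore"` option deal with the unseen category. This is not the book's code; the toy frames reuse the `f1` / `a1` names from the example above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy split where category "a1" never appears in the training fold.
df_train = pd.DataFrame({"f1": ["a2", "a3", "a2", "a3"]})
df_valid = pd.DataFrame({"f1": ["a1", "a2"]})

# Fit on the training fold only; unseen categories become all-zero rows.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(df_train[["f1"]])

x_train = ohe.transform(df_train[["f1"]])
x_valid = ohe.transform(df_valid[["f1"]])

print(ohe.categories_)
print(x_valid.toarray())
```

With the default `handle_unknown="error"`, the transform on `df_valid` would raise instead, which is the failure described above.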

sterile cliff
#

@tulip tangle easy: in preprocessing, the machine learning model relies on one-hot encoding to understand the categorical data, and that encoding needs to be applied to both the test AND the train data. Why?

#

Because the predictions and the test data need to be the same kind of numbers, not categories with names, so you can compare them

tulip tangle
#

right, thanks

rose marsh
#

Hello
I am not getting the SMS with the verification code needed to participate in some competitions
Can anyone help regarding this matter?

vague onyx
#

Hello everyone, I am working on a data mining project.
I want to find the best model that can estimate the class of the response column.
I have a training dataset with the response column and a test dataset without the response column.

I engineered the data by introducing dummy variables to express categorical data, log-transforming data with high variance, and making new columns by adding two related columns.
Also, I used RandomizedSearchCV with RandomForest to find the model with highest CV accuracy.

These questions arose during the process:

  1. How do I determine which columns need log-transformation, and is it better to drop the original columns after applying the log-transformation or keep them?
  2. What are some effective feature selection methods, and how can I determine which columns to apply them to? Similar to question 1, should I keep the original columns after feature selection?
  3. I've used GridSearchCV and RandomizedSearchCV, but I find it challenging to decide the types and ranges of parameters to change. What is the most effective way to find a well-predicting model?

Despite these questions, I just want to find a model with high accuracy. Any help would be greatly appreciated.

noble dragon
#

Hi guys, I'm new in the field and started doing some Kaggle competitions.
I wanted to know how teams work in a competition or a project; even when I was in a team, I only worked by myself, as did the others.

plucky vector
sterile cliff
#

@vague onyx 1- after you apply the dummies transformation, yes, you should delete the original columns and keep the ones after the transformation. Beware, naming matters

2- everything can be a feature; the question is which ones are the most logical and sensible. That's why asking people in the field is crucial. You can also use correlation and a heat map to find how strong the relation of each feature to the response is.
3- for loop: you can write a function that changes the parameters until you have high accuracy. I know that grid search does that, but it takes time, so the best way is to try a very low value, a medium value and a very high value. For example, if you want to change the depth of the random forest, it will be like this: [10, 100, 1000]
AFTER you find which value has the highest accuracy (for example 100), you zoom in around it with something like [80, 90, 100, 110, etc.]
Then you stop at the highest accuracy and move on to the next parameter
This way it will not burn your PC

#

Repeat the steps for the next parameter
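The coarse-then-fine idea can be sketched for a single parameter as below. `coarse_to_fine` and `fake_cv_accuracy` are hypothetical names; the fake scoring function only stands in for "train the model and return cross-validated accuracy", so the numbers mean nothing beyond the illustration.

```python
def coarse_to_fine(score_fn, coarse_values, step, n_fine=2):
    """Pick the best coarse value, then search a finer grid around it."""
    best_coarse = max(coarse_values, key=score_fn)
    fine_values = [best_coarse + i * step for i in range(-n_fine, n_fine + 1)]
    fine_values = [v for v in fine_values if v > 0]
    best_fine = max(fine_values, key=score_fn)
    return best_coarse, best_fine

# Hypothetical stand-in for "cross-validated accuracy at this parameter value".
# In practice this would train and evaluate a model for the given value.
def fake_cv_accuracy(value):
    return 0.9 - abs(value - 110) / 1000

best_coarse, best_fine = coarse_to_fine(fake_cv_accuracy, [10, 100, 1000], step=10)
print(best_coarse, best_fine)
```

One caveat: tuning parameters one at a time ignores interactions between them, so the result can differ from what a joint search such as RandomizedSearchCV would find.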

maiden wasp
#

Is there any way to test a Kaggle agent before I submit it? I submitted it but nothing happens

analog skiff
#

Hii everyone!
I am currently working on semantic chunking of a YouTube video and have a problem with time-aligning the transcript. Can anyone please help with this?

sterile cliff
#

@analog skiff that's new, time-aligned transcript, hmmmm, explain more

analog skiff
#

I have been given the task of doing semantic chunking of a YouTube video. The process that was provided to me is to first download the video, then extract its audio and transcript. After that we have to time-align the transcript with the audio, which means getting the transcript text for each span of time. This is followed by the semantic chunking itself, where for each chunk I have to produce its chunk id, chunk_length, text, start_time, end_time. @sterile cliff
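Once the transcript is time-aligned, the output records can be assembled roughly as follows. This is a sketch under several assumptions: the segments already carry `start` / `end` times in seconds, `chunk_length` is taken to mean the chunk duration, and a simple duration cap stands in for the real semantic boundary detection.

```python
def chunk_transcript(segments, max_seconds=15.0):
    """Group time-aligned transcript segments into chunks of bounded duration."""
    chunks, current = [], []
    for seg in segments:
        # Start a new chunk if adding this segment would exceed the limit.
        if current and seg["end"] - current[0]["start"] > max_seconds:
            chunks.append(current)
            current = []
        current.append(seg)
    if current:
        chunks.append(current)

    return [
        {
            "chunk_id": i,
            "chunk_length": round(group[-1]["end"] - group[0]["start"], 2),
            "text": " ".join(s["text"] for s in group),
            "start_time": group[0]["start"],
            "end_time": group[-1]["end"],
        }
        for i, group in enumerate(chunks)
    ]

# Hypothetical time-aligned segments (seconds), e.g. from a speech-to-text tool.
segments = [
    {"start": 0.0, "end": 6.0, "text": "Welcome to the lecture."},
    {"start": 6.0, "end": 13.0, "text": "Today we cover gradient descent."},
    {"start": 13.0, "end": 20.0, "text": "First, recall what a derivative is."},
    {"start": 20.0, "end": 27.0, "text": "Now the update rule."},
]

for chunk in chunk_transcript(segments):
    print(chunk)
```

A semantic version would replace the duration check with a test on how similar the next segment is to the current chunk, while keeping the same record layout.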

sterile cliff
#

@analog skiff damn, it's really related to unsupervised learning, it will take time to revise all of that, damn

#

In other words, I will try and find something related to your problem but it will take time for me and there's no guarantee I will have the answer

analog skiff
#

That's not a problem.. thank you 😊

spring plume
#

Hi everyone, how can I achieve constant time intervals? I'm working with a stock values dataset, so I would like to know what the appropriate approach would be 🫡

sterile cliff
#

@spring plume so you want everything to be 1 day?

sterile cliff
#

@spring plume you can use .replace to change the 4 to 1

#

easy peasy

dim torrent
#

interpolate the values in between

spring plume
#

i forward filled
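The two suggestions above (interpolating and forward filling) can be compared on a small pandas sketch. The dates and prices are invented, and the series is assumed to be indexed by date.

```python
import pandas as pd

# Hypothetical closing prices with gaps (weekend and a missing day).
prices = pd.Series(
    [100.0, 101.0, 104.0, 103.0],
    index=pd.to_datetime(["2024-01-04", "2024-01-05", "2024-01-08", "2024-01-10"]),
    name="close",
)

# Reindex onto a regular daily grid; missing days become NaN.
daily = prices.asfreq("D")

filled = daily.ffill()                          # carry the last known price forward
interpolated = daily.interpolate(method="time")  # straight line between known prices

print(pd.DataFrame({"ffill": filled, "interpolate": interpolated}))
```

For stock data, forward fill only uses information that was available at the time, whereas interpolation uses the next known price and so leaks future values into the gap. That matters if the series is later used for forecasting.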

tiny anvil
#

Total beginner, doing the Python loops and list comprehensions exercise, and I was wondering why my code isn't working. I added comments showing how I think it should work, but I feel like I am missing something basic since I am getting outputs that are way off.

Here is what the function call resulted in. I don't get why we need to subtract 1 from it.

wary nova
tiny anvil
worldly panther
#

Hey guys. A beginner in ML, I went through some ML courses, and got some basic hands-on practice on Kaggle, got some experience with NumPy, Pandas & basic algorithms.
Now going through a Deep Learning course and got to the point where I need to choose a framework and get my hands dirty with it.

So, the main question is, which should I choose? Tensorflow or PyTorch?
Mainly looking to build, not to research. And the main priority is to get into the AI market asap, and as I understand it there is more demand for TensorFlow?

obsidian bone
#

does anyone know how to do gradient ascent in pytorch?

analog bear
#

Does anyone know why I cannot submit my notebook?

verbal crest
analog bear
haughty mulch
#

Hello, can anyone please tell me how many hours a week a Fellow is expected to work during the 15 weeks of the KaggleX Fellowship Program?

ashen snow
#

suggest LLM projects from beginner level to advanced level, i really want to get good at this domain

storm moon
#

guys I was trying to use the w-okada voice changer with ngrok on Kaggle but there's an issue

#
#

but idk, there are a few errors and I couldn't use it

#

can someone help me fix it, or is it a Kaggle issue?

sly inlet
#

are you trying to do it locally on jupyter notebook ?

storm moon
#

notebook

#

didn't get how to use this

sly inlet
storm moon
#

open kaggle and create notebook

#

and i pasted these cells code

sly inlet
#

and install the pyngrok module as well

storm moon
#

and i run

sly inlet
storm moon
#

yellow text is

#

WARNING: Error parsing requirements for aiohttp: [Errno 2] No such file or directory: '/opt/conda/lib/python3.10/site-packages/aiohttp-3.9.1.dist-info/METADATA'
/kaggle/working/Hmod

sly inlet
sly inlet
storm moon
#

only it says

#

i just copied whole things

sly inlet
storm moon
#

and ran the code cells

sly inlet
#

!pip install pyngrok

#

and change the path to your local path

#

you can't use kaggle path in your local jupyter notebook

storm moon
#

I downloaded the w-okada voice changer notebook and I'll use it via ngrok

sly inlet
storm moon
#

i really dont understand

storm moon
sly inlet
storm moon
#

let me check

#

is it kaggle or jupyter?

sly inlet
sly inlet
#

It looks like jupyter

storm moon
sly inlet
#

Ok

#

So what is the error now

sly inlet
#

What was the problem? What did you do?

sly inlet
storm moon
storm moon
storm moon
#

it gave an error

#

when I was using Colab I could upload my model through the model_dir folder in my Drive folders

storm moon
#

in model_dir folder

sly inlet
storm moon
#

w-okada folder

#

how can i say

#

I can't upload my model to w-okada