#💎┊models


slow kiln
#

We recently worked with the LMSYS team to get Vicuna on Kaggle Models (more to come). If anyone is interested in creating some code examples in Kaggle Notebooks, let me know! I can send a Kaggle swag to a couple of folks who create high quality notebooks. https://www.kaggle.com/models/lmsysorg/vicuna

late rune
#

@surreal salmon sir, as I was saying, traditional deep learning models use activation functions to activate the neurons, whereas in liquid neural networks each neuron's state is governed by differential equations

#

Also, the model can resume learning after it has been trained on a dataset

surreal salmon
late rune
#

Code is written in TensorFlow

#

It contains Equations like these
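The equations from the attachment are not reproduced in this log. As a rough illustration of the idea only, here is a minimal sketch of one commonly cited form, the liquid time-constant neuron, `dx/dt = -(1/tau + f) * x + f * A`, integrated with explicit Euler steps. The sigmoid gate and every parameter value (`tau`, `A`, `w`, `u`, `b`, `dt`) are assumptions chosen for the example, and it is plain Python rather than the TensorFlow code mentioned above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ltc_step(x, inp, dt, tau=1.0, A=1.0, w=0.5, u=1.0, b=0.0):
    """One explicit-Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    f = sigmoid(w * x + u * inp + b)  # gate that depends on state and input
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt


def simulate(inputs, dt=0.01, x0=0.0):
    """Drive a single neuron with a sequence of inputs and record its state."""
    x, trace = x0, []
    for inp in inputs:
        x = ltc_step(x, inp, dt)
        trace.append(x)
    return trace


# Constant input for 500 steps of 0.01 time units each.
trace = simulate([1.0] * 500)
print(f"final state: {trace[-1]:.4f}")
```

Because the gate `f` multiplies the decay term, the effective time constant changes with the input, which is where the "liquid" name comes from.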

surreal salmon
#

Thanks for sharing, I'll definitely check it out!

slow kiln
native charm
#

Thanks for sharing. Checking it out. Looks very interesting actually.

sharp marsh
mossy trellis
oblique trail
#

It's my first project

iron hatch
#

does anyone know a good framework that supports adding custom interactions for gpt model like this:

lament quail
#

Does anyone know if there are multimodal large models available on Kaggle ?

slow kiln
vale sand
royal locust
late mesa
#

Can someone remind me, if you have a large number of features, consisting of counts (positive integers, including 0), what is a good method for dimensionality reduction / logistic regression?

#

I vaguely remember, first apply log? And SVD is better than PCA? It's been so long since I studied this at university.

#

Can SVD be applied directly to count data, or should I first do the log transform?

warped yarrow
# late mesa Can someone remind me, if you have a large number of features, consisting of cou...

PCA is fine, but you have to normalize the data first. For some types of data that can mean a log transformation, but in general a log transformation will not normalize all features equally well. SVD is preferred over PCA for sparse data, because PCA centers the data first and centering destroys the sparsity. More modern, non-linear dimensionality reduction methods include t-SNE, UMAP, self-organizing maps, and autoencoders. In many cases they give better visualizations, but at the expense of not preserving linear distances between data points.
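A minimal sketch of the "log first, then SVD" route for count features. It uses `log1p` so that zeros stay zero, and a plain power iteration to get the first singular vector without centering the data. The toy count matrix is made up for the example; in practice a library routine such as scikit-learn's `TruncatedSVD` would normally replace the hand-written loop.

```python
import math
import random


def log1p_rows(counts):
    """Apply log(1 + x) element-wise; zeros stay zero, so sparsity is kept."""
    return [[math.log1p(v) for v in row] for row in counts]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_singular_vector(X, n_iter=200, seed=0):
    """Power iteration on X^T X: estimate the largest singular value and its vector."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    v = [rng.random() for _ in range(n_cols)]
    for _ in range(n_iter):
        u = [dot(row, v) for row in X]                    # u = X v
        w = [sum(X[r][c] * u[r] for r in range(n_rows))   # w = X^T u
             for c in range(n_cols)]
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    sigma = math.sqrt(sum(dot(row, v) ** 2 for row in X))
    return sigma, v


# Hypothetical count data: 4 samples, 5 count features.
counts = [
    [0, 3, 0, 12, 1],
    [5, 0, 0, 40, 0],
    [0, 1, 7, 0, 0],
    [2, 0, 0, 9, 3],
]
X = log1p_rows(counts)
sigma, v = top_singular_vector(X)
scores = [dot(row, v) for row in X]  # 1-D projection of each sample
print(sigma, scores)
```

The one-dimensional `scores` (or the first few components from a library SVD) can then be fed to logistic regression in place of the raw counts.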

late mesa
#

Thank you!

rigid vapor
#

We are doing a project based on anomaly detection through video surveillance. Our project is used mainly in sports stadiums to detect anomalies such as assault, explosion, fighting among fans etc. The surveillance video is captured by slave robots, which can reposition themselves autonomously, through cameras. These robots then check for the anomalies. If an anomaly is found, it sends the video footage to a central server for anomaly classification. We want an unsupervised model which takes videos as inputs. It also learns from the live video it detects during deployment. Can anyone suggest a model to be used at the slave robot cameras or at the central servers?

pastel crest
midnight gust
#

Does anyone know an ML model that we can use to count the number of people in a photo? Something like a model from Hugging Face, to demonstrate the impact of ML

icy wedge
icy wedge
warped yarrow
# icy wedge Why do you have to normalize data for PCA?

Features must be on the same scale for PCA or their contribution may be calculated incorrectly. https://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-principal-component-analysis-pca
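A small sketch of why scale matters: PCA partitions total variance, so a feature measured in large units takes almost all of it before standardization. The two features and their values below are hypothetical.

```python
import statistics


def variance_share(columns):
    """Fraction of the total variance contributed by each feature."""
    variances = [statistics.pvariance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]


def standardize(col):
    """Rescale a feature to zero mean and unit variance (z-scores)."""
    mean = statistics.mean(col)
    sd = statistics.pstdev(col)
    return [(x - mean) / sd for x in col]


# Hypothetical features on very different scales.
height_m = [1.60, 1.75, 1.82, 1.68, 1.90]
income = [32000, 54000, 41000, 78000, 60000]

raw = variance_share([height_m, income])
scaled = variance_share([standardize(height_m), standardize(income)])
print("raw shares:   ", raw)
print("scaled shares:", scaled)
```

On the raw data the first principal component would point almost entirely along `income`; after standardization both features contribute equally.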

solar gyroBOT
#
grimsqueaker has been warned

Reason: Posted an invite

oblique crypt
#

Been playing about with my model, with regard to setting the best number of iterations. ChatGPT says I have a total of 480 combinations, as seen in the image. In theory 480 iterations would cover all bases, but given my resources, mainly time, that isn't possible for me. What do you think a good number of iterations would be?
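One way to reason about the number of iterations is random search: sample a subset of the 480 combinations and ask how likely it is that the subset contains at least one of the best few. A minimal sketch follows. The grid itself is hypothetical (the real one is only in the image), chosen so that it also has 4 x 5 x 4 x 6 = 480 combinations, and the choice of 60 trials and "top 5%" are assumptions for the example.

```python
import itertools
import random

# Hypothetical grid with 4 * 5 * 4 * 6 = 480 combinations.
grid = {
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "max_depth": [3, 4, 5, 6, 8],
    "subsample": [0.6, 0.7, 0.8, 1.0],
    "n_estimators": [100, 200, 300, 400, 500, 600],
}
all_combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]


def sample_configs(combos, n_trials, seed=0):
    """Pick n_trials distinct configurations uniformly at random."""
    return random.Random(seed).sample(combos, n_trials)


def prob_hit_top(n_total, n_trials, top_fraction=0.05):
    """Chance that a random sample contains at least one top-fraction config."""
    n_top = round(n_total * top_fraction)
    miss = 1.0
    for i in range(n_trials):
        miss *= (n_total - n_top - i) / (n_total - i)
    return 1.0 - miss


trials = sample_configs(all_combos, n_trials=60)
print(len(all_combos), "combinations,", len(trials), "sampled")
print(f"P(at least one top-5% config in 60 trials) = {prob_hit_top(480, 60):.3f}")
```

This only says how likely the sample is to include a near-best configuration from the grid; whether the gap between configurations matters for the model is a separate question.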

warped yarrow
oblique crypt
#

I’ll give it a bash thank you

zinc hemlock
#

Hi, I've been trying to use BERT from kaggle and it's not working for me. I hope this is the right channel to post this. I am trying to use BERT for text classification. Pardon me if I make mistakes, I'm very new to NLP. I've used exactly the example code on kaggle with a softmax dense layer:

```
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the custom ops the BERT preprocessor needs

# Raw strings go in; the hub preprocessor tokenizes them for the encoder
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocessor = hub.KerasLayer(
    "https://kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/en-uncased-preprocess/versions/3")
encoder_inputs = preprocessor(text_input)
encoder = hub.KerasLayer(
    "https://www.kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/bert-en-uncased-l-10-h-128-a-2/versions/2",
    trainable=True)
outputs = encoder(encoder_inputs)

# Neural network layers: dropout, then a 4-class softmax head on the pooled output
l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])
l = tf.keras.layers.Dense(4, activation='softmax', name="output")(l)

# Create the model
# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs=[l])
```

I'm running this in a Kaggle notebook. I get the following error just by running the cell:

```
ValueError: A KerasTensor is symbolic: it's a placeholder for a shape an a dtype. It doesn't have any actual numerical value. You cannot convert it to a NumPy array.
```

Thank you!

I have tried disabling TensorFlow eager execution and it still gives me the same error.
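This error text comes from Keras 3, and a commonly reported cause is `hub.KerasLayer` being called on Keras 3 symbolic tensors. One workaround worth trying, sketched below, is to switch TensorFlow back to Keras 2 with the `TF_USE_LEGACY_KERAS` environment variable before TensorFlow is imported. It assumes the `tf_keras` package is installed in the notebook, and the helper name `enable_legacy_keras` is made up for this example.

```python
import os
import sys


def enable_legacy_keras() -> bool:
    """Ask TensorFlow to use Keras 2 (the tf_keras package) instead of Keras 3.

    Returns True if the flag was set before TensorFlow was imported, which is
    when it takes effect. False means the kernel should be restarted first.
    """
    set_in_time = "tensorflow" not in sys.modules
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    return set_in_time


if not enable_legacy_keras():
    print("TensorFlow is already imported: restart the kernel and run this cell first.")

# In the same fresh session, the model-building cell can then be run unchanged.
```

Run this as the very first cell of the notebook, before any `import tensorflow`.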

quartz onyx
#

How long would it take to train 3 ML models at once

pliant veldt
#

I am training deberta-v3-base on my dataset. num-labels==1. The dataset looks like a "Bot: ... User: .... Bot: ....." conversation and the target is the score the user gives to the FINAL BOT MESSAGE. I added a newline token

Why did 512 context do worse than 256?! Any ideas?

pliant veldt
graceful falcon
#

Hey guys,

This is Arsalan from CAMB AI -- we've spent the last month building and training the 5th iteration of MARS, which we've now open sourced in English on GitHub https://github.com/camb-ai/mars5-tts

We've also been featured on VentureBeat: Check it out here.
We'd really love if you guys could check it out and let us know your feedback. Thank you!


left siren
late mesa
#

You can also use LoRA to fine-tune Llama 3

#

If you're interested in this, DM me.

rapid kayak
#

Hello,
I am trying to use "shap" to extract details like feature importance from my binary classification model.
Among the plots I'm using are a force plot, a dependence plot and a summary plot, but one of my variables appears to be excluded.
How does one interpret these? I'm kind of fumbling my way through this to find the most important features. I want to compare these to a decision tree as well, so I'm taking the time to pull out the most important features

fossil oxide
#

💥Shoutout to BlackForestLabs for publishing their Flux.1 model on Kaggle! 🎉 The FLUX.1 suite of text-to-image models sets a new standard in image quality, prompt adherence, style diversity, and scene complexity for text-to-image synthesis.

https://www.kaggle.com/models/black-forest-labs/flux

left siren
#

But my data is really domain specific, can you suggest some tabular models

gaunt glacier
#

hey guys, I'm using Kaggle to train my voice model and I keep getting the error in the screenshot. Could someone help me fix this?

#

@left siren

dim pelican
#

I added an example of my EfficientNet shrunk model for this plant diseases dataset: https://www.kaggle.com/code/timothylovett/plant-disease-shrunken-efficientnet showing 98% on my stratified splits for validation and test. 96% for the new plant diseases dataset (https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset). I used the github repository that "new plant diseases" one used as a base. I noticed that dataset has some unrealistic augmentations as part of its validation set like changing the color profile of the images completely so opted to use the github source for my training.

Model is 105546 params (412.29 KB), 33 outputs. Not fully TinyML size as I only brought the input size down to (200, 200, 3) so the RAM requirements are still quite large but I wanted to minimize accuracy loss for this example.

dim pelican
#

99% F1 Score Vision Transformer model detecting pavement issues, both have same F1 score, (240, 240, 3), initial model: 21395650 parameters, shrunken 431746 parameters - used stratified splits for training, valid, test to keep them separate and fairly evenly distributed notebook - model - license: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) - I don't know how to do a gradcam for the ViT type yet so left out. 2% of the original size and slightly more accurate.

#

97.3% F1 Score, Input Shape (224, 224, 3), 544,699 parameters (initially 3,847,227 parameters so 14% of the initial size), 523 outputs - EfficientNetLiteB0 base - Not Fully TinyML sized given the parameter count (but smallest I could shrink without accuracy loss) birds model. I used stratified splits on the training folder from the initial dataset to get the train and validation sets and combined the datasets valid and test for a test set (they only included 5 photos for each bird in the valid and test so I felt this would give a more generalized model vs the existing splits). Unfortunately the dataset was set to private for the 525 birds dataset but it was CC0: Public Domain so I've pushed up a private dataset sans 2 bad classes I pruned -- a human one and an unprocessed bird with various aspect ratios. notebook - model - license: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
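The size figures quoted in these posts can be checked with a little arithmetic. The sketch below assumes the KB figure is the raw float32 weight size (4 bytes per parameter, 1 KB = 1024 bytes), which is what reproduces the 412.29 KB quoted for 105546 parameters. The helper names are made up for the example.

```python
BYTES_PER_FLOAT32 = 4


def float32_size_kb(n_params: int) -> float:
    """Raw weight size in KB if every parameter is stored as float32."""
    return n_params * BYTES_PER_FLOAT32 / 1024


def size_ratio_pct(shrunken: int, original: int) -> float:
    """Shrunken parameter count as a percentage of the original."""
    return 100 * shrunken / original


print(f"plant disease model: {float32_size_kb(105546):.2f} KB")
print(f"pavement ViT: {size_ratio_pct(431746, 21395650):.1f}% of the original size")
print(f"birds model:  {size_ratio_pct(544699, 3847227):.1f}% of the original size")
```

These are weight sizes only; as noted above, the RAM needed at inference also depends on the input size and the activations.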

fossil oxide
#

🗓️ Happy Monday folks! I hope everyone saw the incredibly exciting news that I'll be hosting the Cohere For AI Aya team in an "Ask Me Anything" on Thursday. We are still looking for some great questions to bring to their data scientists. Please contribute your questions here! And be sure to tune in at 11 am EST on Thursday for the conversation! https://www.kaggle.com/discussions/general/542389

fossil oxide
rose ferry
#

🎉 Achievement Unlocked! 🎉
I just completed a COVID-19 detection project using a Convolutional Neural Network (CNN) and achieved an impressive 92% accuracy! 🚀

👨‍💻 Tech Highlights:

Model: CNN 🧠
Dataset: [Covid-19 Image Dataset]
Accuracy: 92% 📊
📈 Super excited to continue exploring AI and deep learning applications in healthcare! 🙌
https://www.kaggle.com/code/ahmedashraf299/covid-19-with-acc-92

light escarp
high elk
#

🚀 Dive into the Future of AI with Marco O1!

I’ve just published a comprehensive deep dive into Alibaba’s groundbreaking AI model, Marco O1, designed for open-ended reasoning. This article unpacks how Marco O1 is setting new standards for developers and innovators with its cutting-edge capabilities.

Whether you’re an AI enthusiast, a developer, or just curious about where open-source tech is heading, this piece covers it all – from core functionalities to its game-changing applications in the real world.

🔗 Check it out here: https://www.linkedin.com/pulse/marco-o1-alibabas-advanced-groundbreaking-ai-model-nalkheda-wala-uhkmf

💡 Trust me, this is more than just an overview – it’s a must-read deep dive for anyone passionate about the future of AI!

Feel free to share your thoughts – would love to know how you see this impacting the tech landscape! 🚀


solid storm
#

guys i need help setting up the kaggle gpu, for some reason it's not working even when it's set on the session

warped plover
#

I'm looking for the most efficient and powerful multimodal AI model for complex video analysis through agents. Can you help me find one?

stuck lance
#

When will QwQ-32b be added?

stuck lance
last oracle
late mesa
#

Guys, my CNN and VGG16 models give a different accuracy and loss every time they are retrained. The difference is really big: one time the accuracy is 40% and another time it is 90%. What should I do?
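A first step that is often suggested for run-to-run swings like this is to fix the random seed and to repeat training over several seeds, then look at the mean and spread rather than a single run. A minimal sketch of that bookkeeping follows. `toy_train` is a hypothetical stand-in that returns a fake accuracy, not a real training loop. With TensorFlow, the real training function would typically call `tf.keras.utils.set_random_seed(seed)` at its start.

```python
import random
import statistics


def toy_train(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a fake accuracy."""
    rng = random.Random(seed)  # all randomness flows from the seed
    return 0.4 + 0.5 * rng.random()


def summarize(train_fn, seeds):
    """Run training once per seed and report mean and standard deviation."""
    scores = [train_fn(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores), scores


mean_acc, std_acc, scores = summarize(toy_train, seeds=[0, 1, 2, 3, 4])
print(f"accuracy over 5 seeds: {mean_acc:.3f} +/- {std_acc:.3f}")

# The same seed reproduces the same result.
print(toy_train(0) == toy_train(0))
```

If the spread stays as large as 40% versus 90% across seeds, that usually points at the training setup itself (for example the learning rate or a very small validation set) rather than at randomness alone.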

rigid zephyr
balmy scarab
#

Need 1bit quantised Gemma 1b 🙇‍♂️

pine glen
gentle stirrup
#

need help with time series modeling
data:

Project  Year  Month  MoneyLeft
prj1     2024  1      1000
prj1     2024  2      800
prj1     2024  3      400
prj1     2024  4      100
prj2     2022  3      5000
prj2     2022  4      3493
prj2     2022  5      2000
prj2     2022  6      1000

Imagine this extended to 10 to 20 projects; each project can have 12 to 18 months of data.
For a new project, given MoneyLeft for 2 or 3 months, the model should predict MoneyLeft for the next 4 months.
Models like ARIMA, SARIMA, exponential smoothing, etc. fit a single trend or seasonality, which means we can train them on only one project at a time.

1. One solution I have: convert this time series problem into a regression problem. We can create lags or windows of three months and predict the next 4 months. The problem is that it will train on those lags or windows only; it should also give importance to the project name (I do not know how to do that).
2. The other solution would be to train a model for each project, which is not feasible in this case.

How should I do this?
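The windowing idea in option 1 can be sketched as below, pooling all projects into one training set. Each window is divided by its first value so that projects with different budgets become comparable. That scaling is an assumption of this sketch, not something from the post. Because the sample table has only 4 months per project, the demo uses 2 lags and a horizon of 2; on the full data these would be 3 and 4.

```python
# The sample rows from the post: (project, year, month, money_left).
rows = [
    ("prj1", 2024, 1, 1000), ("prj1", 2024, 2, 800),
    ("prj1", 2024, 3, 400), ("prj1", 2024, 4, 100),
    ("prj2", 2022, 3, 5000), ("prj2", 2022, 4, 3493),
    ("prj2", 2022, 5, 2000), ("prj2", 2022, 6, 1000),
]


def build_windows(rows, n_lags, horizon):
    """Turn per-project series into (project, lag_features, targets) samples."""
    series = {}
    for project, _year, _month, money in sorted(rows):  # chronological per project
        series.setdefault(project, []).append(money)

    samples = []
    for project, values in series.items():
        for i in range(len(values) - n_lags - horizon + 1):
            window = values[i:i + n_lags + horizon]
            base = window[0]  # scale by the first month in the window
            scaled = [v / base for v in window]
            samples.append((project, scaled[:n_lags], scaled[n_lags:]))
    return samples


samples = build_windows(rows, n_lags=2, horizon=2)
for project, lags, targets in samples:
    print(project, lags, targets)
```

Any multi-output regressor can then be trained on the pooled `lags` and `targets`. Since a new project has a name the model has never seen, project-level features such as the starting budget are likely to generalize better than the name itself.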
karmic sage
#

code review request - time series using FFT-PCA-XGB and LSTM
Hi folks,
I am new to the field and just finishing a data science course. As part of my final project, I created two notebooks: https://www.kaggle.com/code/sheroleg/lstm-motionsense-naya and https://www.kaggle.com/code/sheroleg/xgb-motionsense-naya.
They are two different ways to deal with this time series.
I would really appreciate comments, and please point out any bugs.
best regards
Oleg Sher
https://www.linkedin.com/in/oleg-sher-802865344/

worldly storm
# gentle stirrup need help in time series modeling data: Project  year  Month  MoneyLeft prj1  ...

Just a suggestion: try converting the time-domain data to the frequency domain with an FFT (fast Fourier transform) and interpreting the frequency and amplitude. The real and imaginary components together also give you the phase. The square of the amplitude can be read as energy, the amplitude itself as a kind of intensity, and the reciprocal of the frequency as the period of a cycle. The phase (a lag or lead in the range -180° to +180°) could be interpreted as how synchronised the cycle is. This is my intuitive understanding. For example, if the amplitude peaks at a frequency of 0.5 cycles per year, then a notable cyclic movement is observed every 2 years. Note that this is my intuitive understanding of the FFT of typical time series data such as daily stock prices. Please check whether this interpretation suits your data, since I do not know the domain characteristics of your data and its time attributes.
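The reading of frequency, amplitude and phase described above can be sketched with a plain discrete Fourier transform. The signal is synthetic: monthly samples of a cosine with a 2-year period, amplitude 3 and a 30° phase lead, all chosen for the example. A real project would normally use a library FFT rather than this O(n²) loop.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), enough for a short series."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_component(signal, sample_spacing):
    """Return (frequency, amplitude, phase in degrees) of the strongest cycle."""
    n = len(signal)
    mean = sum(signal) / n
    spectrum = dft([s - mean for s in signal])  # remove the constant offset
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    frequency = k / (n * sample_spacing)
    amplitude = 2 * abs(spectrum[k]) / n
    phase_deg = math.degrees(cmath.phase(spectrum[k]))
    return frequency, amplitude, phase_deg


# Synthetic series: 8 years of monthly samples, one cycle every 24 months.
months = range(96)
signal = [3.0 * math.cos(2 * math.pi * m / 24 + math.radians(30)) for m in months]

freq, amp, phase = dominant_component(signal, sample_spacing=1 / 12)  # spacing in years
print(f"frequency: {freq:.3f} cycles/year, period: {1 / freq:.1f} years")
print(f"amplitude: {amp:.2f}, phase: {phase:.1f} degrees")
```

Short, steadily decreasing series such as the MoneyLeft example may not show a clear peak at all, so it is worth plotting the spectrum before relying on it.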

glad plover
#

I am currently in the process of developing a research proposal on disease detection, and I would greatly appreciate your guidance or suggestions to help refine my approach.

thorny rover
balmy scarab
dim pelican
#

https://www.kaggle.com/code/timothylovett/flower-102-tinyml-70k-params - Oxford 102 output flower model using 70k parameters for 87% on the hold out test set. Definitely not as accurate as the multi million models but given the size (about 2% of MobileNetV2) and number of outputs not terrible either (I think I could have got up to 91% if I had augmented it properly initially).

regal aspen
#

Hi, @everybody
I have one question. I'm training ML models for a 3-class classification problem where the number of samples per class is similar, but the predictions are skewed.
The first and second classes are predicted, though with low precision, and the third class is never predicted. What could be the reason? I can't find it.
Earlier, when I applied reinforcement learning with the three classes assigned to three actions, one action was never selected either.
The model is actually a prediction model for forex EUR/USD.

dim pelican
#

https://www.kaggle.com/code/timothylovett/birds-523-shrunken-model - used the birds 525 (pruned 2 bad classes) dataset from huggingface and trained a model with 523 outputs and around 91% accuracy on the holdout test set. 396,862 (1.51 MB) parameters total for the model. Included the splits I used for training (I exported it as a json file to reduce risk of ever introducing leakage across the sets).

dim pelican
#

160x160: https://www.kaggle.com/code/timothylovett/plant-disease-shrunken-tflite-quantization?scriptVersionId=265717345
128x128: https://www.kaggle.com/code/timothylovett/plant-disease-shrunken-tflite-quantization?scriptVersionId=265793865
160x160: https://studio.edgeimpulse.com/public/782690/v3
128x128: https://studio.edgeimpulse.com/public/782690/v4
https://www.kaggle.com/models/timothylovett/plants-160x160x3-input-tinyml-efficientnetlite

Total params: 75969 (296.75 KB) (160x160: 98% / 128x128: 97%)
Input Size: (160,160,3) / (128,128,3)
Outputs: 33
Quantized: 128.86 KB

Was trained on the PlantVillage dataset (https://github.com/spMohanty/PlantVillage-Dataset/). Initially trained using two outputs the first the plant, second the disease, and later switched to just the one output for the disease. For the training I split the dataset into train, valid, and a holdout validation (I uploaded those splits to Kaggle https://www.kaggle.com/datasets/timothylovett/plantvillage-splits/data). Accuracy of around 98% (160,160,3) / 97% (128,128,3) on the holdout test set post quantization.

The holdout was not utilized at all until this notebook for testing it. I ran CleanVision against the dataset and pruned duplicate images in valid/test prior to testing to reduce leakage affecting the accuracy score.

int8 quantization -- it quantized well given it's using a partial EfficientNetLiteB0 (so relu6 activations which quantize well plus no SE layers relative to EfficientNetB0).
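The point about relu6 can be illustrated with the usual affine int8 scheme, `real = (q - zero_point) * scale`. Because relu6 outputs are bounded to [0, 6], the 256 available levels are spread over a small, known range. This is a sketch of the arithmetic only, not of the TFLite converter; the function names and the [0, 60] comparison range are assumptions for the example.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Scale and zero point that map the real range [rmin, rmax] onto int8."""
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Real value -> int8 level, clamped to the representable range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """int8 level -> approximate real value."""
    return (q - zero_point) * scale


# relu6 activations live in [0, 6].
scale6, zp6 = quant_params(0.0, 6.0)
# Hypothetical unbounded activation whose observed range reaches 60.
scale60, zp60 = quant_params(0.0, 60.0)

values = [i * 0.01 for i in range(601)]  # 0.00 .. 6.00
max_err = max(abs(dequantize(quantize(v, scale6, zp6), scale6, zp6) - v) for v in values)

print(f"relu6 step size: {scale6:.4f}, worst round-trip error: {max_err:.4f}")
print(f"step size with a 0..60 range: {scale60:.4f}")
```

The round-trip error is at most half a step, so a tenfold wider activation range means a tenfold coarser grid for the same values.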

Was a bit of pain as keras 3 was not working for quantization for me (kept crashing) given: https://github.com/tensorflow/tensorflow/issues/63987 https://github.com/tensorflow/tensorflow/issues/64273 but managed to rebuild it and load the weights with the keras 2 override env var and then quantize it.

weak nimbus
#

Hello everyone,

Does anyone know how to train an LLM on a local machine?

balmy scarab
red bane
#

can anyone help with my problem with gridsearch?

grand ember
#

Hey everyone, just wanted to share a new baseline we found for the ARC-AGI-2 eval set.

We managed to hit 24% accuracy with a tiny 15M param model (TOPAS-DSPL), which is a pretty big jump over the standard TRM baseline (~8%).

We open-sourced the full training pipeline and the TTT (Test-Time Training) evaluator. If anyone is grinding on the ARC competition, the augmentation pipeline in the repo might be useful for your larger runs.

Repo: https://github.com/Bitterbot-AI/topas_DSLPv1

sharp epoch
sharp epoch
sharp epoch
small sparrow
#

Greetings. I've been searching for projects to build for my resume. I asked AI and it suggested a few. I later zeroed in on a "Credit default risk" project using the LendingClub dataset. But I feel like it's overdone; the AI keeps telling me that making it production grade is what will make it stand out. Still, if you're reading this, what are your thoughts?

#

You can also share projects, especially difficult ones that can make my resume stand out

dense egret
mental prairie
#

The World Has a Data Problem. We Fix It.

Every AI team hits the same wall eventually.
You have the model. You have the architecture. You have the engineers. But you don't have the data, and everything stops.
Maybe your dataset is too small to train on. Maybe it carries sensitive patient records, financial transactions, or personal identifiers that legal won't let you touch. Maybe you've been waiting months for a vendor to deliver labeled data that still isn't ready. Maybe your edge cases are so rare in real life that your model keeps failing exactly where it matters most.
This is not a skill problem. This is a data problem. And it is quietly killing more AI projects than any other single reason.
We generate synthetic data.
Not as a workaround. Not as a compromise. As a legitimate, statistically rigorous alternative that lets your team move again. We produce tabular, text, image, and time-series synthetic datasets that mirror the distributions, correlations, and behavioral patterns of real-world data without exposing a single real record.
We have solved this for teams in healthcare who couldn't share patient data across departments. For fintech companies building fraud detection models with almost no real fraud examples to train on. For startups that needed 10x their dataset size before a funding deadline. For enterprises blocked by GDPR, HIPAA, and compliance teams that said no to everything.
The problem you are sitting with right now, whether it is a privacy blocker, a data scarcity issue, a class imbalance, a regulatory wall, or a timeline that real data collection simply cannot meet, has a solution. We will tell you exactly what it is within 24 hours of hearing from you.
No long sales cycles. No vague proposals. You describe your data problem in plain language, and we come back with a concrete plan.
Send us your situation: [synthox.ai@gmail.com]
The only thing worse than a data problem is spending another month pretending it will resolve itself.

fossil oxide
mental prairie
#

👍

weak drift
#

Hey guys, I've got a quick question. I am building an MoE + MLA model from scratch on 2x T4 GPUs using DeepSpeed stage 2 with expert parallelism (ep_size = 2).
I'm running micro batch size 64 with 4 gradient accumulation steps at 512 context, using 8-bit AdamW. I am going to train the model on the FineWeb-Edu, Cosmopedia v2, OpenWebMath, and wiki datasets with ratios 0.57, 0.23, 0.14, and 0.06. Before I start training, I'd like advice from someone experienced and a review of my configs to see whether they are optimal.
NOTE: I am planning to extend the context to 2048 with YaRN.
Here is my current model config:
```
vocab_size: int = 32000
hidden_size: int = 768
num_layers: int = 16
initializer_range: float = 0.02
tie_word_embeddings: bool = True
max_seq_len: int = 512
max_batch_size: int = 1

# RMSNorm
rms_norm_eps: float = 1e-6

# RoPE & YaRN
rope_theta: float = 10000.0
rope_type: str = "default"
beta_slow: float = 1.0
beta_fast: float = 32.0
factor: float = 1.0
mscale: Optional[float] = None
original_max_seq_len: int = 512

# MLA
num_attention_heads: int = 8
kv_lora_rank: int = 96
qk_nope_dim: int = 64
qk_rope_dim: int = 32
attn_impl: Literal["sdpa", "flash_attn"] = "sdpa"

# MoE
num_experts: int = 8
num_experts_per_token: int = 2
moe_intermediate_size: int = 1024
num_shared_experts: int = 1
```
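Before a review, a back-of-the-envelope count can help sanity-check a config like this. The sketch below makes several assumptions that the post does not state: every layer is an MoE layer, each expert is a SwiGLU MLP with three weight matrices and no biases, the router is a single linear layer, and the micro batch size of 64 is per GPU. Attention (MLA) and norm parameters are left out because the config does not give every dimension needed to count them.

```python
# Values taken from the config above.
hidden_size = 768
num_layers = 16
vocab_size = 32000
moe_intermediate_size = 1024
num_experts = 8
num_experts_per_token = 2
num_shared_experts = 1

# Assumption: each expert is SwiGLU (gate, up, down projections), no biases.
expert_params = 3 * hidden_size * moe_intermediate_size
router_params = hidden_size * num_experts  # assumption: one linear router per layer

total_moe = num_layers * ((num_experts + num_shared_experts) * expert_params + router_params)
active_moe = num_layers * (
    (num_experts_per_token + num_shared_experts) * expert_params + router_params
)
embedding_params = vocab_size * hidden_size  # counted once, embeddings are tied

# Assumption: micro batch 64 is per GPU, on 2 GPUs, with 4 accumulation steps.
tokens_per_step = 64 * 4 * 2 * 512

print(f"params per expert:      {expert_params:,}")
print(f"MoE params (all):       {total_moe:,}")
print(f"MoE params (per token): {active_moe:,}")
print(f"embedding params:       {embedding_params:,}")
print(f"tokens per optimizer step: {tokens_per_step:,}")
```

Under these assumptions the experts hold most of the weights, while only about a third of the MoE parameters are used for any single token.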