#💎┊models | Kaggle | Page 1

slow kiln Aug 7, 2023, 6:20 PM

#

We recently worked with the LMSYS team to get Vicuna on Kaggle Models (more to come). If anyone is interested in creating some code examples in Kaggle Notebooks, let me know! I can send a Kaggle swag to a couple of folks who create high quality notebooks. https://www.kaggle.com/models/lmsysorg/vicuna

vicuna

Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT

late rune Aug 8, 2023, 7:50 PM

#

@surreal salmon sir as i was saying traditional deep learning models uses the activation functions to activate the neurons , where as in liquid neural networks , neuron uses the differential equations

#

Also the model can also resume its learning after being trained on dataset

surreal salmon Aug 8, 2023, 8:01 PM

#

late rune <@1101209061871067309> sir as i was saying traditional deep learning models uses...

Sounds interesting! Do you know of any papers or resources to learn more about them?

late rune Aug 8, 2023, 8:05 PM

#

surreal salmon Sounds interesting! Do you know of any papers or resources to learn more about t...

📎 2006.04439.pdf

#

https://github.com/raminmh/liquid_time_constant_networks This is the code repo

GitHub

GitHub - raminmh/liquid_time_constant_networks: Code Repository for...

Code Repository for Liquid Time-Constant Networks (LTCs) - GitHub - raminmh/liquid_time_constant_networks: Code Repository for Liquid Time-Constant Networks (LTCs)

#

Code is written in TensorFlow

#

It contains Equations like these

#

#

#

https://blog.roboflow.com/liquid-neural-netowrks/

Roboflow Blog

Liquid Neural Networks in Computer Vision

Excitement is building in the artificial intelligence community around MIT's recent release of liquid neural networks. The breakthroughs that Hasani and team have made are incredible. In this post, we will discuss the new liquid neural networks and what they might mean for the vision field.

surreal salmon Aug 8, 2023, 8:37 PM

#

Thanks for sharing, I'll definitely check it out!

slow kiln Aug 28, 2023, 2:28 PM

#

Tracking LLM + RLHF architecture openness https://opening-up-chatgpt.github.io/

native charm Sep 6, 2023, 10:02 PM

#

Thanks for sharing. Checking it out. Looks very interesting actually.

sharp marsh Sep 12, 2023, 12:36 PM

#

Some of us may get confused by the variety of the Bert models in Huggingface. I am definitely one of them.
Therefore I did some googling and researching and summarised my findings in this post.

Please feel free to give it a look. and if you find it helpful, please give it an upvote.🙂
https://www.kaggle.com/datasets/xhlulu/huggingface-bert/discussion/438733

Huggingface BERT

BERT models directly retrieved and updated from: https://huggingface.co/

mossy trellis Sep 20, 2023, 6:31 AM

#

Are all Generative AI and LLM conversations taking place in the #💾┊data and #💎┊models channels only?

oblique trail Sep 22, 2023, 5:44 PM

#

Not too good on terminology but Ive started work on a self driving car lol. Has anyone done anything similar?

#

Its my first project

iron hatch Oct 20, 2023, 7:15 PM

#

does anyone know a good framework that supports adding custom interactions for gpt model like this:

lament quail Oct 25, 2023, 12:35 PM

#

Does anyone know if there are multimodal large models available on Kaggle ?

slow kiln Oct 26, 2023, 2:33 AM

#

lament quail Does anyone know if there are multimodal large models available on Kaggle ?

i don't think so ... yet! which would you like to see?

vale sand Oct 26, 2023, 5:46 PM

#

iron hatch does anyone know a good framework that supports adding custom interactions for g...

Check out the ChatGPT API. Such custom instructions would be provided in a system role. https://platform.openai.com/docs/guides/gpt/chat-completions-api

royal locust Nov 11, 2023, 7:24 PM

#

https://www.kaggle.com/code/ayeshairshadcoder/big-mart-sales-prediction

My model is Overfitting .... how to deal with it

big mart sales prediction

Explore and run machine learning code with Kaggle Notebooks | Using data from BigMart Sales Data

late mesa Nov 13, 2023, 9:37 AM

#

Can someone remind me, if you have a large number of features, consisting of counts (positive integers, including 0), what is a good method for dimensionality reduction / logistic regression?

#

I vaguely remember, first apply log? And SVD is better than PCA? It's been so long since I studied this at university.

#

Can SVD be applied directly to count data, or should I first do the log transform?

warped yarrow Nov 13, 2023, 9:45 PM

#

late mesa Can someone remind me, if you have a large number of features, consisting of cou...

PCA is fine but you have to normalize the data first. That may be by doing a log transformation for some type of data, but in general the log transformation will not normalize all the data equally well. SVD takes precedence over PCA for sparse data. More modern dimensionality reduction methods - and non-linear - are tSNE, UMAP, self-organizing maps, autoencoders. In many cases they tend to give better visualization, but at the expense of not preserving linear distances between data points.

late mesa Nov 14, 2023, 10:32 AM

#

Thank you!

rigid vapor Nov 19, 2023, 3:12 PM

#

We are doing a project based on anomaly detection through video surveillance. Our project is used mainly in sports stadiums to detect anomalies such as assault, explosion, fighting among fans etc. The surveillance video is captured by slave robots, which can reposition themselves autonomously, through cameras. These robots then check for the anomalies. If an anomaly is found, it sends the video footage to a central server for anomaly classification. We want an unsupervised model which takes videos as inputs. It also learns from the live video it detects during deployment. Can anyone suggest a model to be used at the slave robot cameras or at the central servers?

pastel crest Nov 24, 2023, 9:54 AM

#

royal locust https://www.kaggle.com/code/ayeshairshadcoder/big-mart-sales-prediction My mode...

Sorry can't help you with the results with your train and test data but your data analysis work within the project is great. Can you share the resources for data analysis ?

midnight gust Dec 11, 2023, 4:34 PM

#

Does anyone know a ml model that we can use to count the number of people in a photo? Like a model from hugging face to demonstrate the impact of ml

vital junco Feb 22, 2024, 9:55 PM

#

oblique trail Not too good on terminology but Ive started work on a self driving car lol. Has ...

Nice

icy wedge Mar 8, 2024, 8:41 AM

#

warped yarrow PCA is fine but you have to normalize the data first. That may be by doing a log...

Why do you have to normalize data for PCA?

icy wedge Mar 8, 2024, 8:48 AM

#

midnight gust Does anyone know a ml model that we can use to count the number of people in a p...

I am not super sure about this but maybe you can try with a conolutional neural network with a vertical edges detection kernel

midnight gust Mar 8, 2024, 8:49 AM

#

icy wedge I am not super sure about this but maybe you can try with a conolutional neural ...

Ohhh great thanks

warped yarrow Mar 9, 2024, 12:43 AM

#

icy wedge Why do you have to normalize data for PCA?

Features must be on the same scale for PCA or their contribution may be calculated incorrectly. https://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-principal-component-analysis-pca

Cross Validated

Why do we need to normalize data before principal component analysi...

I'm doing principal component analysis on my dataset and my professor told me that I should normalize the data before doing the analysis. Why?
What would happen If I did PCA without normalization?...

solar gyroBOT Mar 31, 2024, 8:39 AM

#

grimsqueaker has been warned

Reason: Posted an invite

oblique crypt Apr 1, 2024, 4:13 PM

#

Been playing about with my model, in regards to setting the best number of iterations, chatgpt says i have a total of 480 combinations as seen in the image, in theroy 480 itterations would cover all bases but resouces, mainly time, this isnt possible for me, what do you think a good number of iterations would be?

warped yarrow Apr 1, 2024, 5:26 PM

#

oblique crypt Been playing about with my model, in regards to setting the best number of itera...

I suggest you try Bayesian optimization, which should be able to find a near-optimal combination of parameters in 50-100 iterations. Look for Hyperopt and BayesOpt.

oblique crypt Apr 1, 2024, 5:27 PM

#

I’ll give it a bash thank you

zinc hemlock Apr 6, 2024, 7:12 PM

#

Hi, I've been trying to use BERT from kaggle and it's not working for me. I hope this is the right channel to post this. I am trying to use BERT for text classification. Pardon me if I make mistakes, I'm very new to NLP. I've used exactly the example code on kaggle with a softmax dense layer:

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocessor = hub.KerasLayer(
    "https://kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/en-uncased-preprocess/versions/3")
encoder_inputs = preprocessor(text_input)
encoder = hub.KerasLayer(
    "https://www.kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/bert-en-uncased-l-10-h-128-a-2/versions/2",
    trainable=True)
outputs = encoder(encoder_inputs)

# Neural network layers
l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])
l = tf.keras.layers.Dense(4, activation='softmax', name="output")(l)

# Create the model
# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs = [l])

I'm running this in a kaggle notebook. I get the following error by simply running the cell:```

ValueError: A KerasTensor is symbolic: it's a placeholder for a shape an a dtype. It doesn't have any actual numerical value. You cannot convert it to a NumPy array.``` Thank you!

I have tried disabling tensorflow eager execution and it still gives me the same error.

quartz onyx Apr 19, 2024, 3:21 AM

#

How long would it take to train 3 ML models at once

pliant veldt Apr 29, 2024, 3:34 PM

#

I am training debeta-v3-base on my dataset. num-labels==1. The dataset looks like "Bot: ... User: .... Bot: ....." conversation and the target is the score the user gives to the FINAL BOT MESSAGE. I added a newline token

Why did 512 context do worse than 256?! Any ideas?

pliant veldt Apr 29, 2024, 9:34 PM

#

graceful falcon Jun 11, 2024, 7:38 PM

#

Hey guys,

This is Arsalan from CAMB AI -- we've spent the last month building and training the 5th iteration of MARS, which we've now open sourced in English on GitHub https://github.com/camb-ai/mars5-tts

We've have also been featured on VentureBeat: Check it out here.
We'd really love if you guys could check it out and let us know your feedback. Thank you!

GitHub

GitHub - Camb-ai/MARS5-TTS: MARS5 speech model (TTS) from CAMB.AI

MARS5 speech model (TTS) from CAMB.AI. Contribute to Camb-ai/MARS5-TTS development by creating an account on GitHub.

left siren Jun 13, 2024, 6:56 AM

#

Hello, How can I finetune llama3 for MultiLabel classfication. Do I need to follow the same prompt format for each row as mentioned https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook/notebookhere.
Or is there any other method like we do for BERT?

late mesa Jul 20, 2024, 1:23 PM

#

Also you can use LoRA fine-tuning tech to finetuning llama3

#

If you intested in this, DM me.

rapid kayak Aug 5, 2024, 9:08 PM

#

Hello,
I am trying to use "shap" to extract details like feature importance from my binary classification model.
Among plots im utilizing, I'm using a force plot, a dependence plot and a summary plot but one of my variables appears to be excluded.
How does one interpret these? I'm kinda fumbling my way through this to find most important features - I want to compare these to a decision tree as well so I'm taking the time to pull up most important features

fossil oxide Aug 29, 2024, 6:33 PM

#

💥Shoutout to BlackForestLabs for publishing their Flux.1 model on Kaggle! 🎉 The FLUX.1 suite of text-to-image models sets a new standard in image quality, prompt adherence, style diversity, and scene complexity for text-to-image synthesis.

https://www.kaggle.com/models/black-forest-labs/flux

Black Forest Labs | FLUX.1 | Kaggle

The FLUX.1 models are 12 billion parameter rectified flow transformers that generate images given a text prompt.

left siren Sep 1, 2024, 8:40 PM

#

But my data is really domain specific, can you suggest some tabular models

gaunt glacier Sep 3, 2024, 6:15 PM

#

hey guys im using kaggle to train my voice model and im keep getting thsis error in the screen shot so could someone help me how to fix this?

Screenshot_2024-09-03_at_11.42.46_PM.png

#

@left siren

dim pelican Sep 19, 2024, 12:07 AM

#

I added an example of my EfficientNet shrunk model for this plant diseases dataset: https://www.kaggle.com/code/timothylovett/plant-disease-shrunken-efficientnet showing 98% on my stratified splits for validation and test. 96% for the new plant diseases dataset (https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset). I used the github repository that "new plant diseases" one used as a base. I noticed that dataset has some unrealistic augmentations as part of its validation set like changing the color profile of the images completely so opted to use the github source for my training.

Model is 105546 params (412.29 KB), 33 outputs. Not fully TinyML size as I only brought the input size down to (200, 200, 3) so the RAM requirements are still quite large but I wanted to minimize accuracy loss for this example.

dim pelican Oct 8, 2024, 1:36 PM

#

99% F1 Score Vision Transformer model detecting pavement issues, both have same F1 score, (240, 240, 3), initial model: 21395650 parameters, shrunken 431746 parameters - used stratified splits for training, valid, test to keep them separate and fairly evenly distributed notebook - model - license: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) - I don't know how to do a gradcam for the ViT type yet so left out. 2% of the original size and slightly more accurate.

#

97.3% F1 Score, Input Shape (224, 224, 3), 544,699 parameters (initially 3,847,227 parameters so 14% of the initial size), 523 outputs - EfficientNetLiteB0 base - Not Fully TinyML sized given the parameter count (but smallest I could shrink without accuracy loss) birds model. I used stratified splits on the training folder from the initial dataset to get the train and validation sets and combined the datasets valid and test for a test set (they only included 5 photos for each bird in the valid and test so I felt this would give a more generalized model vs the existing splits). Unfortunately the dataset was set to private for the 525 birds dataset but it was CC0: Public Domain so I've pushed up a private dataset sans 2 bad classes I pruned -- a human one and an unprocessed bird with various aspect ratios. notebook - model - license: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

fossil oxide Oct 28, 2024, 1:26 PM

#

🗓️ Happy Monday folks! I hope everyone saw the incredibly exciting news that I'll be hosting the Cohere For AI Aya team in an "Ask Me Anything" on Thursday. We are still looking for some great questions to bring to their data scientists. Please contribute your questions here! And be sure to tune in at 11 am EST on Thursday for the conversation! https://www.kaggle.com/discussions/general/542389

Aya Expanse Ask Me Anything - SUBMIT YOUR QUESTIONS HERE! | Kaggle

Aya Expanse Ask Me Anything - SUBMIT YOUR QUESTIONS HERE!.

fossil oxide Oct 29, 2024, 3:15 PM

#

New model alert! We've added some awesome models to Kaggle's model hub - and can't see how you use them! 🤹 We've got new offerings from Meta, Deepseek and InternAI! https://www.kaggle.com/discussions/general/543266

5 models added: Deepseek, Meta and Intern AI | Kaggle

5 models added: Deepseek, Meta and Intern AI.

rose ferry Nov 19, 2024, 12:43 PM

#

🎉 Achievement Unlocked! 🎉
I just completed a COVID-19 detection project using a Convolutional Neural Network (CNN) and achieved an impressive 92% accuracy! 🚀

👨‍💻 Tech Highlights:

Model: CNN 🧠
Dataset: [Covid-19 Image Dataset]
Accuracy: 92% 📊
📈 Super excited to continue exploring AI and deep learning applications in healthcare! 🙌
https://www.kaggle.com/code/ahmedashraf299/covid-19-with-acc-92

Covid-19 with acc<<92

Explore and run machine learning code with Kaggle Notebooks | Using data from Covid-19 Image Dataset

light escarp Nov 20, 2024, 10:12 AM

#

rose ferry 🎉 Achievement Unlocked! 🎉 I just completed a COVID-19 detection project using ...

DM if interested in spine imaging analysis

high elk Nov 25, 2024, 6:38 AM

#

🚀 Dive into the Future of AI with Marco O1!

I’ve just published a comprehensive deep dive into Alibaba’s groundbreaking AI model, Marco O1, designed for open-ended reasoning. This article unpacks how Marco O1 is setting new standards for developers and innovators with its cutting-edge capabilities.

Whether you’re an AI enthusiast, a developer, or just curious about where open-source tech is heading, this piece covers it all – from core functionalities to its game-changing applications in the real world.

🔗 Check it out here: https://www.linkedin.com/pulse/marco-o1-alibabas-advanced-groundbreaking-ai-model-nalkheda-wala-uhkmf

💡 Trust me, this is more than just an overview – it’s a must-read deep dive for anyone passionate about the future of AI!

Feel free to share your thoughts – would love to know how you see this impacting the tech landscape! 🚀

🚀 Marco O1: Alibaba’s Advanced & Groundbreaking AI Model for Open-...

Explore Alibaba's Marco O1, a groundbreaking AI model for open-ended reasoning. This in-depth analysis covers its capabilities, impact on developers, and future

solid storm Mar 4, 2025, 2:11 AM

#

guys i need help setting up the kaggle gpu, for some reason it's not working even when it's set on the session

warped plover Mar 4, 2025, 8:05 AM

#

I'm looking for best efficient and powerful multimodal ai model for complex video analysis through agents can you help me to find out?

stuck lance Mar 6, 2025, 8:16 AM

#

When will QwQ-32b be added?

stuck lance Mar 6, 2025, 12:50 PM

#

stuck lance When will QwQ-32b be added?

https://www.kaggle.com/models/qwen-lm/qwq-32b
Added 6 hours ago itself.

QwenLM | QwQ-32B | Kaggle

QwQ is the reasoning model of the Qwen series which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems.

last oracle Mar 28, 2025, 8:13 PM

#

I have a set of models That I've been doing over the past few days check them out they are for scientific applications https://www.kaggle.com/allanwandia/models

Allanatrix | Contributor

I'm just your average dude doing his best to contribute to the Data science community

late mesa Apr 3, 2025, 3:13 PM

#

Guys. My cnn,vgg16 models gives different accuracy and loss for every time it's retrained. I mean the difference is too big. One time it gets accuracy of 40% and another time it gets 90%. What should I do

rigid zephyr Apr 9, 2025, 3:46 PM

#

Here's my 100% accuracy Naive Bayes Model, if this helped you don't forget to upvote 🙂 https://www.kaggle.com/code/lucipils/naive-bayes-100-accuracy

NAIVE BAYES - 100% ACCURACY

Explore and run machine learning code with Kaggle Notebooks | Using data from Play Badminton

balmy scarab Apr 12, 2025, 7:26 AM

#

Need 1bit quantised Gemma 1b 🙇‍♂️

pine glen Apr 19, 2025, 9:19 AM

#

https://youtube.com/shorts/72Zjsb3deOM?si=ubqCxYL6-Nv0b3E6

YouTube

Pria Bijaksana

Kami Tak Sempurna: Pria Seratus Persen Pernah Tersesat Jadi Budak N...

#fyp #shorts #priabijaksana #priaseratuspersen #raymondchin #timothyronald

▶ Play video

gentle stirrup Apr 26, 2025, 4:27 AM

#

need help in time series modeling
data:

Project year Month MoneyLeft
prj1 2024 1 1000
prj1 2024 2 800
prj1 2024 3 400
prj1 2024 4 100
prj2 2022 3 5000
prj2 2022 4 3493
prj2 2022 5 2000
prj2 2022 6 1000
fabrciate this for 10 to 20 projects ,each prorjecr can have month 12 to month 18
for a new project given moneyLeft for 2 or 3 months it should predcit next 4 months moneyLeft
the models like ARIMA ,SARIMA ,EXPONENETIAL SMOOTHING ETC will take only one season or trend,whick means we can train these model only on single project
1 .I have one solution like we can convert this time series problem to regression problem ,we can create lags or windows for three months and can predict for next 4 months , the problem here is it will train on that lags or windows only ,it should also be giving importance for project name (I do not no how to do)

other solution would be we can train the model for each project which is not feasible here in this case
how to do this

karmic sage May 3, 2025, 11:16 AM

#

code review request - time series using FFT-PCA-XGB and LSTM
Hi folks,
i am new in field and just finishing course data science. As part of final project, I created two notebooks https://www.kaggle.com/code/sheroleg/lstm-motionsense-naya and https://www.kaggle.com/code/sheroleg/xgb-motionsense-naya.
Two different ways to deal this time series.
Will be very appreciate for comments and please point to bugs.
best regards
Oleg Sher
https://www.linkedin.com/in/oleg-sher-802865344/

lstm_motionsense_Naya

Explore and run machine learning code with Kaggle Notebooks | Using data from MotionSense Dataset : Smartphone Sensor Data - HAR

worldly storm Jun 10, 2025, 4:39 PM

#

gentle stirrup need help in time series modeling data: Project year Month MoneyLeft prj1 ...

Just a suggestion. Try to convert time domain data to frequency domain through FFT (Fourier transform), try an interpretation Frequency and Amplitude. And if you consider the imaginary component it will provide you the phase. The Square of Amplitude can be said of the energy, Amplitude itself being a kind of intensity while reciprocal of frequency could indicate a kind of periodic cycle, and possibly the phase (representing lag or lead) in the range of -180° to +180° could be interpreted as an interpretation of a kind of synchronisation in the cycle. I am providing my intuitive understanding. For example if the frequency is 0.5 when Amplitude is the highest then a remarkable cyclic moment is observed every 2 years. Note this is my intuitive understanding of FTT of a typical time series data like, that of say, daily stock price. Please check up whether this intuitive interpretation suits your data, since I do not know the domain characteristic of your data and related time attributes.

glad plover Jul 14, 2025, 7:09 AM

#

I am currently in the process of developing a research proposal on disease detection, and I would greatly appreciate your guidance or suggestions to help refine my approach.

thorny rover Sep 4, 2025, 1:13 AM

#

glad plover I am currently in the process of developing a research proposal on disease detec...

hi this is good does you need a colaborator thanks

balmy scarab Sep 8, 2025, 1:51 PM

#

im currently in the process of developing a framework for developing opensource alts to proprietary llms using superweights https://github.com/Ash-Blanc/paper2sw

dim pelican Sep 10, 2025, 11:23 PM

#

https://www.kaggle.com/code/timothylovett/flower-102-tinyml-70k-params - Oxford 102 output flower model using 70k parameters for 87% on the hold out test set. Definitely not as accurate as the multi million models but given the size (about 2% of MobileNetV2) and number of outputs not terrible either (I think I could have got up to 91% if I had augmented it properly initially).

regal aspen Sep 15, 2025, 6:57 PM

#

Hi, @everybody
I have one question, I'm training ml models for the prediction, which is classification problem of 3 classes, where the number of samples are similar but the predition is skewed.
First class and second class is predicted with low precision tough, third class is never predicted. What's the reason? I can' t find the reason.
Before, when I applyed reinforcement learning, where the three classes were assigned to three actions and one action is never selected, too.
Actually, that is the preeiction model of forex eur/usd.

dim pelican Sep 30, 2025, 4:28 PM

#

https://www.kaggle.com/code/timothylovett/birds-523-shrunken-model - used the birds 525 (pruned 2 bad classes) dataset from huggingface and trained a model with 523 outputs and around 91% accuracy on the holdout test set. 396,862 (1.51 MB) parameters total for the model. Included the splits I used for training (I exported it as a json file to reduce risk of ever introducing leakage across the sets).

dim pelican Oct 5, 2025, 3:29 AM

#

160x160: https://www.kaggle.com/code/timothylovett/plant-disease-shrunken-tflite-quantization?scriptVersionId=265717345
128x128: https://www.kaggle.com/code/timothylovett/plant-disease-shrunken-tflite-quantization?scriptVersionId=265793865
160x160: https://studio.edgeimpulse.com/public/782690/v3
128x128: https://studio.edgeimpulse.com/public/782690/v4
https://www.kaggle.com/models/timothylovett/plants-160x160x3-input-tinyml-efficientnetlite

Total params: 75969 (296.75 KB) (160x160: 98% / 128x128: 97%)
Input Size: (160,160,3) / (128,128,3)
Outputs: 33
Quantized: 128.86 KB

Was trained on the PlantVillage dataset (https://github.com/spMohanty/PlantVillage-Dataset/). Initially trained using two outputs the first the plant, second the disease, and later switched to just the one output for the disease. For the training I split the dataset into train, valid, and a holdout validation (I uploaded those splits to Kaggle https://www.kaggle.com/datasets/timothylovett/plantvillage-splits/data). Accuracy of around 98% (160,160,3) / 97% (128,128,3) on the holdout test set post quantization.

The holdout was not utilized at all until this notebook for testing it. I ran CleanVision against the dataset and pruned duplicate images in valid/test prior to testing to reduce leakage affecting the accuracy score.

int8 quantization -- it quantized well given it's using a partial EfficientNetLiteB0 (so relu6 activations which quantize well plus no SE layers relative to EfficientNetB0).

Was a bit of pain as keras 3 was not working for quantization for me (kept crashing) given: https://github.com/tensorflow/tensorflow/issues/63987 https://github.com/tensorflow/tensorflow/issues/64273 but managed to rebuild it and load the weights with the keras 2 override env var and then quantize it.

weak nimbus Nov 20, 2025, 2:36 PM

#

Hello everyone,

Anyone know how to train the llm model in the local machine?

balmy scarab Dec 5, 2025, 12:16 PM

#

https://github.com/Ash-Blanc/cocoblt do give feedback and contributions welcome too

red bane Dec 14, 2025, 5:17 PM

#

can anyone help with my problem with gridsearch?

grand ember Dec 30, 2025, 9:37 PM

#

Hey everyone, just wanted to share a new baseline we found for the ARC-AGI-2 eval set.

We managed to hit 24% accuracy with a tiny 15M param model (TOPAS-DSPL), which is a pretty big jump over the standard TRM baseline (~8%).

We open-sourced the full training pipeline and the TTT (Test-Time Training) evaluator. If anyone is grinding on the ARC competition, the augmentation pipeline in the repo might be useful for your larger runs.

Repo: https://github.com/Bitterbot-AI/topas_DSLPv1

sharp epoch Feb 17, 2026, 12:06 PM

#

please vote for my notebook.
https://www.kaggle.com/code/hammadansari7/pakistan-air-quality-crisis-2025-2026

sharp epoch Feb 19, 2026, 11:20 AM

#

Assalam o alikum. @everyone
Ramazan Mubarik
https://www.kaggle.com/code/hammadansari7/milk-s-effect-on-human-health

sharp epoch Feb 21, 2026, 4:39 PM

#

Assalam o alikum!
please vote my notebook.
https://www.kaggle.com/code/hammadansari7/global-electricity-access-economic-indicators

small sparrow Feb 26, 2026, 9:52 PM

#

Greetings. I've been searching for projects to build for my resume. I asked AI and it suggested few. I later zeroed in on a "Credit default risk" project using the lendingclub dataset. But I feel like it's overdone, the AI keeps telling me making it production grade is what will make it stand out. But still, if you're reading this what are you thoughts?

#

You can also share projects, especially difficult ones that can make my resume stand out

dense egret Mar 18, 2026, 3:45 PM

#

8 specialized AI model types
https://x.com/ingliguori/status/2033610508582973696?s=20

mental prairie May 24, 2026, 9:59 AM

#

The World Has a Data Problem. We Fix It.

Every AI team hits the same wall eventually.
You have the model. You have the architecture. You have the engineers. But you don't have the data, and everything stops.
Maybe your dataset is too small to train on. Maybe it carries sensitive patient records, financial transactions, or personal identifiers that legal won't let you touch. Maybe you've been waiting months for a vendor to deliver labeled data that still isn't ready. Maybe your edge cases are so rare in real life that your model keeps failing exactly where it matters most.
This is not a skill problem. This is a data problem. And it is quietly killing more AI projects than any other single reason.
We generate synthetic data.
Not as a workaround. Not as a compromise. As a legitimate, statistically rigorous alternative that lets your team move again. We produce tabular, text, image, and time-series synthetic datasets that mirror the distributions, correlations, and behavioral patterns of real-world data without exposing a single real record.
We have solved this for teams in healthcare who couldn't share patient data across departments. For fintech companies building fraud detection models with almost no real fraud examples to train on. For startups that needed 10x their dataset size before a funding deadline. For enterprises blocked by GDPR, HIPAA, and compliance teams that said no to everything.
The problem you are sitting with right now, whether it is a privacy blocker, a data scarcity issue, a class imbalance, a regulatory wall, or a timeline that real data collection simply cannot meet, has a solution. We will tell you exactly what it is within 24 hours of hearing from you.
No long sales cycles. No vague proposals. You describe your data problem in plain language, and we come back with a concrete plan.
Send us your situation: [synthox.ai@gmail.com]
The only thing worse than a data problem is spending another month pretending it will resolve itself.

fossil oxide May 27, 2026, 3:04 PM

#

mental prairie **The World Has a Data Problem. We Fix It.** Every AI team hits the same wall e...

It is against server rules to post the same thing in multiple channels. Please do not.

mental prairie May 27, 2026, 3:45 PM

#

👍

weak drift May 28, 2026, 10:13 AM

#

Hey guys i got a quick question i am building a moe + mla model from scratch on 2x t4 gpus using deepspeed stage 2 with expert parallelism ep_size = 2
Running micro batch size 64 with 4 gradient accum steps at 512 context i am using 8bitadamw and i am going to train the model on fineweb edu cosmopedia v2 openwebmath and wiki datasets with these ratios 0.57 0.23 0.14 0.06 so my question is before i start training i need an experienced guy advice and review on my configs to see if they are optimal or not
NOTE: I am planning to extend the context to 2048 with yarn
Here is my current model config
vocab_size: int = 32000
hidden_size: int = 768
num_layers: int = 16
initializer_range: float = 0.02
tie_word_embeddings: bool = True
max_seq_len: int = 512
max_batch_size: int = 1

RMSNorm
rms_norm_eps: float = 1e-6

RoPE & YaRN 
rope_theta: float = 10000.0
rope_type: str = "default"
beta_slow: float = 1.0
beta_fast: float = 32.0
factor: float = 1.0
mscale: Optional[float] = None
original_max_seq_len: int = 512

MLA
num_attention_heads: int = 8
kv_lora_rank: int = 96
qk_nope_dim: int = 64
qk_rope_dim: int = 32
attn_impl: Literal["sdpa", "flash_attn"] = "sdpa"

MoE
num_experts: int = 8
num_experts_per_token: int = 2
moe_intermediate_size: int = 1024
num_shared_experts: int = 1