#computer-vision
a brain tumor mri classifier
Maybe try asking in #❓┊ask-a-question ? Or even Streamlit's community forums. I'm not able to tell what part of your question is relevant to computer vision.
My app is based on computer vision but you are right
Exciting, I've worked in Radiography based classifiers
Hi @everyone, I am working on a use case involving image-based detection of a battery's fluid level and calculating the percentage of fluid. Any help with this is highly appreciated. Please let me know if you have any ideas or suggestions for handling this use case.
I’m working on building a CV model to aid farmers. Would deploy on streamlit and mobile 🙂
Hello Everyone great to be here😃👍
Hi everyone
I am currently working with a library called OpenPose
Need help with the basic concepts
Sorry. Haven't heard of it.
can you be a bit more specific about what help you need?
@visual quartz actually I am working on a live attendance system
It would recognise head and hand positions, calculate the time, and mark attendance in the system.
okay, and what is not working for you? Or is everything working and you want someone to review your code?
Can visual prompting be used for image classification?
Hello everyone, I'm working on a pose estimation model that can be used to detect/diagnose early-onset neurodegenerative conditions. I have a working model, but I'm finding it really hard to get datasets/videos; one particular platform, Dementias Platform UK, has a 28-day waiting list. Any leads on where I can get video data would be awesome.
Anyone working on remote sensing + deep learning?
Hi.
I'm looking for an image dataset of small plastics on the beach.
The idea is to identify plastic straws, candy wrappers, popsicle packaging, plastic bags, bottle caps, plastic labels, etc.
Can someone point me to a project about this or similar to it?
Thank you.
Need a satellite dataset in a hurry for a segmentation graduation project!
Anyone working on https://www.kaggle.com/competitions/rsna-2022-cervical-spine-fracture-detection/overview?
Identify cervical fractures from scans
I will do segmentation on the xDB dataset, which contains images and masks; each one has a pre-disaster and a post-disaster image or mask. How do I preprocess this data and make it ready for my model, and how should the model be set up so I can give it the two images (before and after) and their labels and get the final mask?
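One common option for feeding a before/after pair to a single segmentation network is early fusion: stack the two RGB images along the channel axis and give the model a 6-channel input. A minimal sketch, assuming both images are already aligned and the same size; `make_pair_input` is a hypothetical helper name, and Siamese encoders are another valid design:

```python
import numpy as np


def make_pair_input(pre_rgb, post_rgb):
    """Stack pre- and post-disaster HxWx3 uint8 images into one HxWx6 float array in [0, 1]."""
    if pre_rgb.shape != post_rgb.shape:
        raise ValueError("pre and post images must share the same shape")
    pair = np.concatenate([pre_rgb, post_rgb], axis=-1)  # channels: pre RGB, then post RGB
    return pair.astype(np.float32) / 255.0


# Random stand-ins for a real image pair
pre = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
post = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
pair = make_pair_input(pre, post)
print(pair.shape)  # (64, 64, 6)
```

The first convolution of the network then needs 6 input channels instead of 3, and the target stays the single final mask.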
Hello Kagglers. Who can help me with the point cloud reconstruction?
what technique are you looking to use
there's this really cool paper called RL-GAN-Net
Is it the best approach?
I don't think so, it was a paper from 2019 iirc and 4 years is a long time in deep learning
But it's definitely very interesting, and holds a lot of potential imo
if you're going for a production use tho, something like samplenet or yolo3D might be better
I haven't done a benchmark study of this area but these are some interesting things to check out ig ^^
ty
@everyone I have a las file. Who can help me to reconstruct this point cloud data using AI?
Man this has not been active in a looooong time
Hi everyone, if you are into computer vision we can have a separate group to form a team for competitions and projects. Just write to me.
Hey, in training a cv model, how much variation in mAP50 and mAP90 values is considered ok?
I want to fine-tune the SAM model on a dataset of 256x256 images, but SAM only takes 1024x1024 inputs. How can I solve this problem?
Hey everyone,
I hope you're all doing well. I'm currently facing a challenge with an object detection dataset, specifically related to class imbalance. My dataset is in YOLOv5 format. I'm exploring image augmentation techniques to address it. Although I can generate augmented images, the missing piece is the corresponding annotations, specifically creating annotation files like label.txt.
I'm a bit unsure about the best practices for generating these annotations for augmented images. If anyone has insights or guidance on this matter, I'd really appreciate your help!
Thanks a ton!
Latifur Rahman Zihad
Undergrad student
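For geometric augmentations, the labels can be transformed with the same rule as the image. A minimal sketch for flips, assuming each label row is "class x_center y_center width height" with values normalized to [0, 1]; the helper names are hypothetical:

```python
def hflip_yolo_labels(lines):
    """Labels for a horizontally flipped image: only x_center changes."""
    flipped = []
    for line in lines:
        cls, xc, yc, w, h = line.split()
        new_xc = 1.0 - float(xc)
        flipped.append(f"{cls} {new_xc:.6f} {float(yc):.6f} {float(w):.6f} {float(h):.6f}")
    return flipped


def vflip_yolo_labels(lines):
    """Labels for a vertically flipped image: only y_center changes."""
    flipped = []
    for line in lines:
        cls, xc, yc, w, h = line.split()
        new_yc = 1.0 - float(yc)
        flipped.append(f"{cls} {float(xc):.6f} {new_yc:.6f} {float(w):.6f} {float(h):.6f}")
    return flipped


labels = ["0 0.25 0.4 0.1 0.2"]
print(hflip_yolo_labels(labels))  # ['0 0.750000 0.400000 0.100000 0.200000']
```

Write the returned rows to a .txt file named after the augmented image. For rotations, crops, and scaling the box math gets more involved, so a library that transforms boxes together with the image may be easier there.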
did you solve your issue? i think i can help you
!!!! URGENT
Is this the normal/right format for images going into the model after normalizing them?
Hi everyone. I am working with the ROCO dataset and trying to produce captions for radiology images. #llms I need help on this
I am looking for expert advice so that I can get a solution. llms llm-detect-ai-generated-text
Here is the link of my code. https://colab.research.google.com/drive/1SLHq31HcjhUK1dIuNaCNP4dkR66TMd71?usp=sharing
Hi everyone! Check my notebook about Melanoma cancer detection. Your support is greatly appreciated!
https://www.kaggle.com/code/seifwael123/melanoma-cancer-efficientnet
Has anyone here tried to fine-tune Meta's Segment Anything model on satellite images?
I replicated the TinyVGG architecture for the CIFAR dataset, but it didn't work and gave very low accuracy. Is there a good model for this task?
- For those interested in Medical Image Segmentation, I'm sharing two preprocessed benchmark datasets for cardiac segmentation.
- Additionally, weakly-supervised learning, particularly scribble-supervised learning, has been gaining popularity in recent years. This is due to the high cost and difficulty of traditional labeling, especially in the medical field where data sensitivity is paramount.
- Therefore, each image in my datasets also comes with corresponding scribble labels, facilitating superior learning in cardiac segmentation.
- Moreover, I've included notebooks to guide you on how to load and visualize the data. To learn more about these datasets and access the code, feel free to visit the links below:
https://www.kaggle.com/datasets/anhoangvo/acdc-dataset
https://www.kaggle.com/datasets/anhoangvo/mscmrseg
I'd just resize to 1024x1024
For SAM, they have a preprocess function that gets the image into the correct format; see their GitHub here: https://github.com/facebookresearch/segment-anything/blob/6fdee8f2727f4506cfbbe553e23b895e27956588/segment_anything/modeling/sam.py#L164 . Also, here is a code example: image_for_sam = sam_model.preprocess(image)  # here image would be a tensor in (B, C, H, W) format where C=3 and the pixel/color values range from 0 to 255 (it is kinda odd)
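To make the resize and preprocess suggestions concrete, here is a NumPy sketch of roughly what happens to a 256x256 image: upscale to 1024, normalize the 0-255 pixels per channel, and zero-pad on the bottom/right to a square. It is a stand-in, not SAM's own API; the mean/std values are the ones published in the segment-anything repo, and the helper names and nearest-neighbour upscaling are assumptions for illustration:

```python
import numpy as np

# Per-channel constants from the segment-anything repo (ImageNet statistics on a 0-255 scale)
PIXEL_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32).reshape(3, 1, 1)
PIXEL_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32).reshape(3, 1, 1)
TARGET = 1024  # input size expected by SAM's image encoder


def upscale_nearest(image_chw, factor):
    """Nearest-neighbour upscaling; a stand-in for a proper interpolating resize."""
    return image_chw.repeat(factor, axis=1).repeat(factor, axis=2)


def normalize_and_pad(image_chw):
    """Normalize 0-255 pixels, then zero-pad bottom/right to TARGET x TARGET."""
    x = (image_chw.astype(np.float32) - PIXEL_MEAN) / PIXEL_STD
    _, h, w = x.shape
    return np.pad(x, ((0, 0), (0, TARGET - h), (0, TARGET - w)))


small = np.random.randint(0, 256, size=(3, 256, 256), dtype=np.uint8)
big = upscale_nearest(small, TARGET // small.shape[1])
ready = normalize_and_pad(big)
print(ready.shape)  # (3, 1024, 1024)
```

If you resize the images, remember to resize the masks the same way (nearest-neighbour for masks, so class ids are not blended).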
hello everyone .
Is there anyone who can help me improve my background extraction code, which extracts the garment in the foreground from any type of background?
Here is my code:
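# Imports assumed; the original snippet did not show them.
import numpy as np
from scipy.ndimage import binary_fill_holes, convolve, gaussian_filter
from skimage.color import rgb2gray
from skimage.filters import threshold_li
from skimage.morphology import disk, erosion, white_tophat

def background_extraction(image):
    # Drop the alpha channel if the image is RGBA.
    if image.shape[2] == 4:
        image = image[:, :, 0:3]
    gray = rgb2gray(image)
    #img_equalized=exposure.equalize_hist(gray)
    #gauss=gaussian_filter(img_equalized,1)
    # White top-hat keeps small bright structures.
    gauss_top = white_tophat(gray, disk(2))
    # The original called extraction_class.outline_detection; the function below is called directly here.
    outline = outline_detection(gauss_top, sigma=1)
    #outline = white_tophat(gauss, disk(2))
    # Fill the closed outline to get a foreground mask, then erode to smooth its edge.
    filled_outline = binary_fill_holes(outline)
    structurant_element = disk(2)
    smoothing = erosion(filled_outline, structurant_element).astype(bool)
    #smoothing=dilation(filled_outline,structurant_element)
    # Zero out every pixel outside the mask.
    extraction = np.copy(image)
    extraction[~smoothing] = 0
    return extraction

def outline_detection(image, sigma):
    # Smooth, then estimate the gradient with Sobel kernels.
    gauss = gaussian_filter(image, sigma)
    sobelX = np.array([[-1,0,1],[-2,0,2],[-1,0,1]])
    sobelY = np.array([[-1,-2,-1],[0,0,0],[1,2,1]])
    derivX = convolve(gauss, sobelX)
    derivY = convolve(gauss, sobelY)
    # Gradient magnitude, computed as the modulus of a complex number.
    gradient = derivX + derivY*1j
    G = np.absolute(gradient)
    # Suppress weak edges below Li's threshold (vectorized form of the original loop).
    threshold = threshold_li(G)
    G[G < threshold] = 0.0
    return G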
anyone want to work in collab for solving the Petal classification on TPU challenge?
Me.
Thanks for replying. Are you on twitter?
No.
Hello, I want to build some computer vision AI projects to put on my resume. Do you have any great computer vision projects that you would recommend?
Oh yeah, I have tried a hand-recognition calculator like the one Apple just introduced. Use MediaPipe for hand gesture and movement tracking, plus character recognition and some logic to make it work.
Hi, Kaggle community!
My friend Alex and I are developing a tool to make deep neural network training interactive, user-friendly, and deterministic.
Are you tired of the guesswork and never-ending experimentation?
Are you seeking an alternative methodology that makes the process less painful and more understandable?
We'd love to hear from you! Please share your thoughts and help us to upgrade the current methodology.
LINK: https://calendly.com/graybx/30min
Hey guys, is the ImageDataGenerator in Keras deprecated? If so, what's the latest standard way of doing image data augmentation?
Can we use the augmentation layers directly before the main model, or is it better to create a separate Sequential pipeline for augmentation?
Is it better to use OpenCL or just OpenCV?
hi guys, does anyone have any experience with roboflow models? I'm doing an ML project for the first time and I was instructed to make my dataset and train my model there, but I'm lost on how to proceed further. if anyone has any experience, please lmk so I can consult you. thank you in advance
https://www.kaggle.com/datasets/alanjo/gpu-scores-with-cuda-metal-opencl-vulkan
sharing OpenCL, Metal and Vulkan scores dataset
Hello fellow Computer Vision enthusiasts 😄
Hi! how's the day going?
hey guys, I know this is a dumb question
but does anyone know how to take the pretrained YOLO model out of ultralytics and train it with a self-defined training loop?

Hello @here guys, I'm starting my project on air quality detection and I'm searching for image datasets. If you know any source, please let me know
https://www.kaggle.com/datasets/adarshrouniyar/air-pollution-image-dataset-from-india-and-nepal/data
Hi, I am Abdullah. I am an ML engineer and want to join a team to participate in Kaggle competitions
Hi Tom
Howdy!
Hey, please help me find out a good reference doc for CV (like whitepapers in 5dgai session), especially focusing on implementing OCR.
Would love to connect Tom. Can you help me with this stuff?
What have you found from your own searching? Have you tried any of the many OCR tutorials found on youtube/google search/etc?
Yes, I can search the internet, but I'm worried that I'd end up with non-relevant docs that would confuse me more. That's why I'm seeking expert help
I have done a course on the basic architecture of CNN models. I need help with the implementations
My advice: try one of the "non-relevant docs" that specifically says it's a tutorial on how to implement OCR. If that doesn't work, try another one and see what the differences are. If you don't understand something in the tutorial, search specifically on that topic and figure out what you don't understand.
I apologize if it's not exactly what you were looking for, but it's the way to learn on your own. In my opinion, it's better to seek advice after you tried something and have specific questions, rather than asking for someone to walk you through something you haven't even tried.
Yeah, got it
Hi everyone! I'm exploring ways to use image-to-image models to predict deformations and failure loads in stone arch structures based on initial geometry. If anyone has worked on similar structural prediction tasks, used GANs for FEA simulations, or has some insights and paper references, please send me a DM.
Beautifully said
♥️
That's kind of you to say -- it's a lesson I learned the hard way, too
😁
I made a CV script that shows viewers their emotions in real time. Got it running on a buggy Intel tablet, but with a proper GPU behind it there’s a lot of fun to be had. Some people freak out when it shows their emotion changing and others laugh. It makes for an interesting night.
that sounds so cool!
Do you want to see it running? I think it’s got enough power to do a choppy screen record.
me as well can you check your dm
Hi, what is the best model for human fall detection? Any help would be appreciated
That depends how you want to detect human falls
✍️
Hi everyone, I am looking for research ideas in computer vision. Can anyone help me by suggesting some ideas?
I'm stuck with my college minor project. Can anyone help me?🥹
There's this persistent bug that I've been stuck on for hours:
ValueError: It looks like you are using a PerReplica object while not inside a replica context, which is not supported. Try running your op or function inside a replica context by using strategy.run
https://www.kaggle.com/code/sidharthad/custom-training
Handwritten Persian numerals - Generative Adversarial Networks: DCGAN, CycleGAN
DCGAN : https://www.kaggle.com/code/omidsakaki1370/handwritten-persian-numerals-dcgans-pytorch
CycleGAN : https://www.kaggle.com/code/omidsakaki1370/handwritten-persian-numerals-cyclegans-pytorch
Website: https://omidsakaki.ir/Projects
Hi, I am new to computer vision and am currently trying multiclass image segmentation in PyTorch, from scratch. I have been stuck on it for the past couple of days. Could anyone help me, please?
I am working with YOLO and I am new to it, so I want to start by understanding its architecture. I understand YOLO has a backbone, neck, and head. I want to train the head first and then train the backbone and neck to fine-tune the model, so how can I find out which layers count as the backbone and neck?
Hello everyone, I recently added some YOLO notebooks related to object detection, image segmentation, and image classification. If you're interested, you can click the link below.
Excited to Share My YOLOv8 Projects: Object Detection, Image Classification, and Segmentation.
I need a little help with my project from anyone who has good experience in image enhancement and preprocessing
Guys, I need help setting up the Kaggle GPU. For some reason it's not working even though it's enabled for the session
Generative AI in Computer Vision - Fake Human Face Generator
Website: https://omidsakaki.ir/projects/29
Github: https://github.com/omid-sakaki-ghazvini/Projects/blob/main/generative-ai-face-image-dcgans-cyclegans.ipynb
kaggle: https://www.kaggle.com/code/omidsakaki1370/generative-ai-face-image-dcgans-cyclegans
I'm having trouble with multi-label classification on medical datasets with 15 classes. I am using BCEWithLogitsLoss to combat heavy class imbalance but am still getting very low F1 scores and strict accuracies. Any advice to improve my performance?
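One thing worth checking is whether the loss is actually weighted per class; plain BCE-with-logits does nothing about imbalance on its own. A minimal NumPy sketch of the negatives-to-positives weighting that PyTorch's BCEWithLogitsLoss accepts through its `pos_weight` argument; the function names and the tiny label matrix are made up for illustration:

```python
import numpy as np


def pos_weight_from_labels(labels):
    """Per-class ratio of negatives to positives for an (N, C) 0/1 label matrix."""
    pos = labels.sum(axis=0)
    neg = labels.shape[0] - pos
    return neg / np.maximum(pos, 1)  # guard against classes with no positives


def weighted_bce_with_logits(logits, targets, pos_weight):
    """Numerically stable BCE on logits; the positive term is scaled by pos_weight."""
    log_sig = -np.logaddexp(0.0, -logits)       # log(sigmoid(x))
    log_one_minus = -np.logaddexp(0.0, logits)  # log(1 - sigmoid(x))
    loss = -(pos_weight * targets * log_sig + (1 - targets) * log_one_minus)
    return loss.mean()


labels = np.array([[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float64)
weights = pos_weight_from_labels(labels)
print(weights)
print(weighted_bce_with_logits(np.zeros((4, 3)), labels, weights))
```

Tuning the decision threshold per class on a validation set, instead of using 0.5 everywhere, is another cheap thing to try for F1.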
<@&1303433601177751593> scam?
Is anyone interested in reviewing the code I wrote for a custom CNN (it is a Colab notebook), e.g. what I need to improve or how much I have got correct? It would also be helpful if anyone could guide me on the next steps. Currently I have been able to create a feature map from multiple neurons that slide over the image and do convolution, but all the neurons in the same layer are producing the same output. Is this correct, or is there anything I need to change here?
here is the github link:-
https://github.com/TronYash10101/Custom_CNN
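On the identical-outputs question: without seeing the notebook this is only a guess, but one common cause is that every filter in the layer starts from the same weights, so they all compute the same feature map and receive the same gradient. A small NumPy sketch of the effect; `conv2d_valid` is a hypothetical helper, not code from the repo:

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.random((6, 6))

same_init = [np.full((3, 3), 0.1) for _ in range(2)]            # identical filters
random_init = [rng.normal(0.0, 0.1, (3, 3)) for _ in range(2)]  # independently drawn filters

same_maps = [conv2d_valid(image, k) for k in same_init]
random_maps = [conv2d_valid(image, k) for k in random_init]
print(np.allclose(same_maps[0], same_maps[1]))      # True: identical filters, identical maps
print(np.allclose(random_maps[0], random_maps[1]))  # False: random init breaks the symmetry
```

If the filters are already randomly initialized, the cause is somewhere else, e.g. one kernel object being reused for every neuron.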
Hi, I'm currently working on a CNN built from scratch, and I'm training it on a dataset of MNIST-style digits plus operator symbols. I've trained for 160 batches of 64 images within 1 epoch, and my cross-entropy loss has plateaued at 2.639, which corresponds to a 1/14 chance of predicting correctly, meaning my CNN is randomly guessing. I spent the past 7 days debugging this and still can't find the issue. If anyone can help, I have more details, just DM 😄
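The arithmetic behind that reading can be checked in a couple of lines: a model that assigns equal probability to each of 14 classes has a cross-entropy of -ln(1/14). The class count is taken from the message; the variable names are just for illustration:

```python
import math

num_classes = 14  # from the "1/14 chance" in the message
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 3))  # 2.639
```

A loss that sits at exactly this value usually means the output layer is effectively producing the same logits for every input, so checking that gradients actually reach the weights is a reasonable next step.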
Hi everyone,
I just open-sourced YOLOv1-PyTorch, a from-scratch PyTorch reimplementation of the original YOLOv1—complete with a hands-on notebook that walks you through every detail:
YOLO-V1-Explanation.ipynb
A comprehensive tutorial that covers:
- Environment & Data: setting up PyTorch, downloading Pascal VOC 2007/2012, inspecting class distributions and annotation formats
- Data Loader & Augmentation: parsing XML to YOLO’s S×S grid, handling edge cases, and applying on-the-fly transforms (flips, color jitter)
- Model Architecture: building each convolutional layer and prediction head exactly as in the original paper, with tensor-shape diagrams
- Loss Function: step-by-step derivation of localization, confidence, and classification losses, directly tied to code
- Training Loop: configuring hyperparameters, real-time plotting of total vs. per-term losses, checkpointing
- Evaluation & Inference: computing IoU/mAP, visualizing ground-truth vs. predictions, implementing non-max suppression, and generating inline GIF demos
YOLO-V1-Pure-Code.ipynb
The same pipeline stripped of commentary—ideal for quick experimentation or integration into your own projects.
Live examples
Pre-rendered outputs (sheep, bicycle) so you can see detection quality before running a single cell.
https://github.com/franciszekparma/YOLOv1-PyTorch
Whether you’re teaching, researching, or prototyping in classic object detection, this repo guides you through both the “why” and the “how.” Feel free to clone, star, file issues, or send PRs!
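For readers who want a feel for the IoU and non-max suppression steps mentioned in the list before opening the notebooks, here is a small pure-Python sketch. It is not taken from the repo; boxes are assumed to be (x1, y1, x2, y2) corner tuples and the 0.5 threshold is arbitrary:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In a detector this is normally run per class, after low-confidence boxes have been filtered out.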
Hello everyone,
I need some guidance regarding deep learning for our Final Year Project (FYP).
Our project heavily depends on model training using both images and text, but currently, we are focusing on medical image model training. The problem is—we are not sure about the proper direction to follow. When we try to follow GPT instructions, the model doesn’t train effectively, and we lack clarity on how to evaluate and decide what to apply at each step.
Could anyone experienced in deep learning please guide us on the correct path? Specifically, we need clarity on:
Understanding graphs/visualizations (loss vs accuracy, confusion matrix, PR curves, etc.)
Model selection (e.g., ResNet vs DenseNet vs EfficientNet, pretrained vs from scratch)
Evaluation metrics (accuracy, precision, recall, F1, AUC, etc.)
Data augmentation techniques (rotation, flipping, normalization, mixup, etc.)
Overfitting/underfitting (how to detect and solve using dropout, regularization, early stopping)
Hyperparameter tuning (learning rate, batch size, optimizers, etc.)
Training workflow (train/val/test split, cross-validation, reproducibility)
Basically, GPT gives us the “what to do” list, but not the how to properly evaluate, compare, and decide among these options. What’s the right way to structure and approach this whole process so we can train and evaluate our model effectively?
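On the evaluation-metrics item, a small worked example may help show why accuracy alone can mislead on imbalanced medical data. A minimal sketch with made-up labels (1 = disease, 0 = healthy); `binary_metrics` is a hypothetical helper:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.7 even though two of the three positive cases are missed (recall of one third), which is the kind of gap the confusion matrix and PR curve make visible. Compute these on a held-out validation split, and keep the test split untouched until the end.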
Job Title: Part-Time Senior AI/ML Engineer (Remote)
We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.
Requirements:
-Minimum of 9–10 years of professional software development experience
-Proven experience working effectively in a remote environment
-Advanced English proficiency (C1 or higher); an American accent is preferred
-Availability to work 10–15 hours per week during EST or CST business hours
If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (910) 386-2433
Make a DL agent inspired by MLE-STAR that can guide you better
Job Title: Part-Time Senior AI/ML Engineer (Remote)
We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.
Requirements:
-Minimum of 7–10 years of professional software development experience
-Proven experience working effectively in a remote environment
-Advanced English proficiency (C1 or higher); an American accent is preferred
-Availability to work 10–15 hours per week during EST or CST business hours
If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384
Does anyone know what's the best way to detect characters/symbols on cube-shaped objects?
Like, what's the best way to augment the data? 2D augmentation or 3D augmentation?
Hi! Hope you're doing well. Actually that's pretty blurry for me; I'd say that if your data comes from a real 3D model, you could simply render each face and apply simple matching
If it comes from an image, you'll need to apply a transformation to your original matching dataset. I'd say if you can detect the cube and its edges, you'll be able to determine the transformation you need
You'll always be free to use Huffman strategies or whatever to optimize your model, but the core goal is for your model to work, not to be perfectly optimized. If your model has a broader utility, you'll have to teach it similar methods. Think from first principles. Hope I didn't miss the point
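The transformation mentioned above can be sketched as a homography estimated from the four corners of a cube face. This is a minimal NumPy version of the direct linear transform, assuming the corners have already been detected; the function names and the corner coordinates are made up for illustration:

```python
import numpy as np


def homography_from_points(src, dst):
    """Direct linear transform: 3x3 H that maps four src (x, y) points onto dst."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)  # null-space vector of the 8x9 system
    return h / h[2, 2]


def warp_point(h, point):
    """Apply H to one (x, y) point using homogeneous coordinates."""
    x, y, w = h @ np.array([point[0], point[1], 1.0])
    return x / w, y / w


# Corners of a flat template symbol and where they appear on a cube face in a photo
template = [(0, 0), (1, 0), (1, 1), (0, 1)]
in_photo = [(10, 10), (50, 15), (45, 60), (5, 50)]
H = homography_from_points(template, in_photo)
print(warp_point(H, (0.5, 0.5)))
```

The same matrix can drive 2D augmentation of flat symbol images (warp them the way a cube face would appear), and its inverse can rectify a detected face back to a frontal view before recognition.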
Hi! I just finished the Intro to DL and CV courses + the ML Zero to Hero series from TensorFlow. Apart from the getting-started ones, what competitions would you recommend I try to get some experience?
hi
if you find anything please tell me too
Hi, just joined the server
I'm looking for a US developer to collaborate with. If anybody is interested, please DM me.
Hello!
I'm proficient in CNNs. Do you all know of any free courses with certification?
Hi I'm a CV engineer from Nigeria
Hi, I'm a newbie in LM from Belarus. I want to learn CV
Hey everyone 👋
Quick question for folks building chatbots / LLM apps here —
how are you currently handling long-term user memory beyond a single session?
Curious what’s actually working in practice (RAG, DB, custom hacks, etc).
Newbie in PyTorch from Pakistan. Excited to get to know y'all
https://www.kaggle.com/code/dastgeerjutt/generative-ai-full-map-kaggle-professional
Please upvote @everyone
Can I use the MS COCO dataset to train a model for commercial use? As I read it, the annotations are under a license that allows it, but the images themselves were taken from Flickr and some of them are under their own licenses... so I don't know whether I am allowed to use it or not.
I read on some website that I am allowed to use it for training in commercial use, but I don't know if I want to trust some random website
So was no one ever concerned about it, or do we not have many computer vision engineers who have worked on commercial projects?
Or am I just being ignored?
Hello,
Did you find anything by any chance?
Are all datasets on Roboflow intended for commercial use?
Like, I see a subset of MS COCO with a license that in theory allows commercial use, but again, is it there just because someone put it there, or does Roboflow own the rights to put any kind of respective license on datasets posted on their website?
Thanks to everyone for supporting me!
I uploaded a new notebook, "Retail Store Product Sales Simulation—Exploratory Data Analysis (EDA)."
Link: https://www.kaggle.com/code/hammadansari7/retail-store-product-sales-exploratory-analysis
@everyone
Yeah, a few patterns seem to work in practice:
• Explicit memory layers outside the LLM (DB / KV store) for user facts & preferences
• Embedding-based recall (RAG) for long-term, fuzzy memory
• A light memory manager that decides what to store vs what to forget
In one of my projects (orbmem, www.orbmem.online), we separate:
short-term conversational state
long-term user memory (facts, preferences, habits)
and retrieve only relevant memories per turn instead of stuffing everything into context.
RAG alone wasn’t enough — combining structured memory + embeddings worked better for us.
Curious what others here are using in production.
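A toy illustration of the structured memory + embedding recall split described above. This is not how orbmem works; the bag-of-words `embed` is a stand-in for a real embedding model, and `UserMemory` and every other name here are hypothetical:

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class UserMemory:
    def __init__(self):
        self.facts = {}  # structured long-term memory: exact key -> value
        self.notes = []  # fuzzy long-term memory: (text, vector) pairs

    def remember_fact(self, key, value):
        self.facts[key] = value

    def remember_note(self, text):
        self.notes.append((text, embed(text)))

    def recall(self, query, top_k=1):
        """Return only the top_k most similar notes instead of the whole history."""
        q = embed(query)
        ranked = sorted(self.notes, key=lambda n: cosine(q, n[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


memory = UserMemory()
memory.remember_fact("name", "Sam")
memory.remember_note("user prefers pytorch over tensorflow")
memory.remember_note("user is training a segmentation model on satellite images")
print(memory.facts["name"], memory.recall("which framework does the user prefer pytorch"))
```

Per turn, only the exact facts and the top-ranked notes would be added to the prompt, which keeps the context small.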
anyone working in computer vision and space applications?
What kind of space application ?