#computer-vision


zinc plaza
#

Hey all, How are we doing? Would love to hear what you are working on 🙂

past terrace
crisp meadow
#

Maybe try asking in #❓┊ask-a-question ? Or even Streamlit's community forums. I'm not able to tell what part of your question is relevant to computer vision.

obtuse veldt
zinc plaza
ancient swift
#

Hi @everyone, I am working on a use case that involves detecting a battery's fluid level from images and calculating the fluid percentage. Any help is highly appreciated. Please let me know if you have any ideas or suggestions for handling this use case.

icy pebble
#

I’m working on building a CV model to aid farmers. I plan to deploy it on Streamlit and mobile 🙂

eternal remnant
#

Hello Everyone great to be here😃👍

azure nymph
#

Hi everyone
I am currently working with a library called OpenPose
Need help with the basic concepts

visual quartz
azure nymph
#

@visual quartz Actually, I am working on a live attendance system
that would recognise head and hand positions, track the time, and mark attendance.

visual quartz
icy pebble
#

Can visual prompting be used for image classification?

pearl sequoia
#

Hello everyone, I'm working on a pose estimation model that can be used to detect/diagnose early-onset neurodegenerative conditions. I have a working model, but I'm finding it really hard to get datasets/videos; one particular platform, Dementias Platform UK, has a 28-day waiting list. Any leads on where I can get video data would be awesome.

rich temple
#

Anyone working on remote sensing + deep learning?

rare current
#

Hi.
I'm looking for an image dataset of small plastics on the beach.
The idea is to identify plastic straws, candy wrappers, popsicle packaging, plastic bags, bottle caps, plastic labels, etc.
Can someone point me to a project on this or something similar?
Thank you.

drowsy sundial
#

Need a satellite dataset in a hurry for a segmentation graduation project!

zenith rampart
drowsy sundial
#

I will do segmentation on the xBD dataset, which contains images and masks; each sample has a pre-disaster and a post-disaster image and mask. How do I preprocess this data to make it ready for my model, and what should the model look like so that I can give it the before and after images with their labels and get the final mask?

heady sage
edgy wedge
#

Hello Kagglers. Who can help me with the point cloud reconstruction?

boreal grove
#

there's this really cool paper called RL-GAN-Net

edgy wedge
boreal grove
#

I don't think so, it was a paper from 2019 iirc and 4 years is a long time in deep learning
But it's definitely very interesting, and holds a lot of potential imo

#

if you're going for a production use tho, something like samplenet or yolo3D might be better

#

I haven't done a benchmark study of this area but these are some interesting things to check out ig ^^

edgy wedge
#

@everyone I have a LAS file. Who can help me reconstruct this point cloud data using AI?

vocal whale
#

Man, this has not been active in a looooong time

wispy galleon
#

Hi everyone, if you are into computer vision, we can set up a separate group to form a team for competitions and projects. Just write to me.

eager wagon
#

Hey, when training a CV model, how much variation in mAP50 and mAP90 values is considered OK?

drowsy sundial
#

I want to fine-tune the SAM model on a dataset of 256-pixel images, but SAM only takes 1024-pixel inputs. How can I solve this problem?
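One possible workaround for the size mismatch described above is simply to upscale each 256-pixel image (and its mask) by a factor of 4 before feeding it to the model. The sketch below is only an illustration: the helper name `upscale_nearest` is hypothetical, and it uses nearest-neighbour repetition in NumPy. In practice a smoother interpolation is usually preferred for the image, while nearest-neighbour keeps mask labels intact.

```python
import numpy as np

def upscale_nearest(arr, factor):
    """Nearest-neighbour upscaling of an (H, W) or (H, W, C) array by an integer factor."""
    # Repeat rows, then columns; channel axis (if any) is left untouched.
    return np.repeat(np.repeat(arr, factor, axis=0), factor, axis=1)

# Hypothetical 256x256 sample and its segmentation mask.
image = np.zeros((256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)

image_1024 = upscale_nearest(image, 4)  # shape (1024, 1024, 3)
mask_1024 = upscale_nearest(mask, 4)    # shape (1024, 1024)
```

Predicted masks can be brought back to 256 pixels by taking every fourth row and column, e.g. `mask_1024[::4, ::4]`.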

grand moat
#

Hey everyone,

I hope you're all doing well. I'm currently facing a challenge with an object detection dataset, specifically related to class imbalance. My dataset is in YOLOv5 format. I'm exploring image augmentation techniques to address it. Although I can generate augmented images, the missing piece is the corresponding annotation, specifically creating annotation files like label.txt.

I'm a bit unsure about the best practices for generating these annotations for augmented images. If anyone has insights or guidance on this matter, I'd really appreciate your help!

Thanks a ton!

Latifur Rahman Zihad
Undergrad student
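
For geometric augmentations, the annotation can usually be derived from the original label rather than redrawn. As a minimal sketch, assuming the standard YOLO row layout `class x_center y_center width height` with values normalized to 0-1, a flip only mirrors the box center. The function name `flip_yolo_labels` is hypothetical; libraries such as Albumentations can also transform YOLO-format boxes together with the image.

```python
def flip_yolo_labels(lines, horizontal=True):
    """Mirror YOLO-format boxes so they match a flipped image."""
    flipped = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        cls, x, y, w, h = parts[0], *map(float, parts[1:])
        if horizontal:
            x = 1.0 - x  # left-right flip mirrors the x center
        else:
            y = 1.0 - y  # up-down flip mirrors the y center
        # Width and height are unchanged by a flip.
        flipped.append(f"{cls} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return flipped

# Hypothetical contents of one label file.
labels = ["0 0.25 0.40 0.10 0.20", "2 0.80 0.50 0.30 0.30"]
for row in flip_yolo_labels(labels):
    print(row)
```

The returned rows can be written to a new `.txt` file that shares its base name with the augmented image.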

lament perch
drowsy sundial
#

!!!! URGENT

Is this the normal, correct format for images going into the model after normalization?

tawny saffron
crystal sequoia
drowsy sundial
#

Has anyone here tried to fine-tune Meta's Segment Anything model on satellite images?

blazing plinth
#

I replicated the TinyVGG architecture for the CIFAR dataset, but it didn't work and gives very low accuracy. Is there a good model for this task?

chilly tulip
#
  • For those interested in Medical Image Segmentation, I'm sharing two preprocessed benchmark datasets for cardiac segmentation.
  • Additionally, weakly-supervised learning, particularly scribble-supervised learning, has been gaining popularity in recent years. This is due to the high cost and difficulty of traditional labeling, especially in the medical field where data sensitivity is paramount.
  • Therefore, each image in my datasets also comes with corresponding scribble labels, facilitating superior learning in cardiac segmentation.
  • Moreover, I've included notebooks to guide you on how to load and visualize the data. To learn more about these datasets and access the code, feel free to visit the links below:
    https://www.kaggle.com/datasets/anhoangvo/acdc-dataset
    https://www.kaggle.com/datasets/anhoangvo/mscmrseg
cinder charm
# drowsy sundial !!!! URGENT Is this the normal or the right format for the images to go to the...

For SAM, they have a preprocess function that puts the image in the correct format; see their GitHub here: https://github.com/facebookresearch/segment-anything/blob/6fdee8f2727f4506cfbbe553e23b895e27956588/segment_anything/modeling/sam.py#L164 . Here is a code example: image_for_sam = sam_model.preprocess(image)  # here image is a tensor in (B, C, H, W) format where C=3 and the pixel/color values range from 0-255 (it is kinda odd)
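
To make the expected format concrete, here is a rough NumPy sketch of what that preprocess step does: per-channel normalization of 0-255 pixel values, then zero-padding on the bottom and right up to a square input. It is a re-implementation for illustration, not the library code itself; the name `preprocess_like_sam` is hypothetical, and the mean/std constants are the defaults used in the SAM repository's model builders.

```python
import numpy as np

# Default per-channel statistics from the SAM repository (RGB, 0-255 scale).
PIXEL_MEAN = np.array([123.675, 116.28, 103.53]).reshape(3, 1, 1)
PIXEL_STD = np.array([58.395, 57.12, 57.375]).reshape(3, 1, 1)
IMG_SIZE = 1024  # side length expected by the image encoder

def preprocess_like_sam(image):
    """image: float array (C, H, W), RGB, values in 0-255, longest side <= IMG_SIZE."""
    # Normalize each channel.
    x = (image - PIXEL_MEAN) / PIXEL_STD
    # Zero-pad on the bottom and right to reach IMG_SIZE x IMG_SIZE.
    _, h, w = x.shape
    return np.pad(x, ((0, 0), (0, IMG_SIZE - h), (0, IMG_SIZE - w)))

# Hypothetical image already resized so its longest side is 1024.
example = np.full((3, 768, 1024), 128.0)
print(preprocess_like_sam(example).shape)
```

Resizing so that the longest side is 1024 happens before this step in the repository's pipeline, so the sketch assumes it has already been done.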

flat summit
#

Hello everyone.
Is there anyone who can help me improve my background extraction code, which extracts the garment in the foreground from any type of background?
Here is my code:
import numpy as np
from scipy.ndimage import binary_fill_holes, convolve, gaussian_filter
from skimage.color import rgb2gray
from skimage.filters import threshold_li
from skimage.morphology import disk, erosion, white_tophat

def background_extraction(image):
    # Drop the alpha channel if the image is RGBA.
    if image.shape[2] == 4:
        image = image[:, :, 0:3]
    gray = rgb2gray(image)

    #img_equalized=exposure.equalize_hist(gray)
    #gauss=gaussian_filter(img_equalized,1)
    # White top-hat keeps small bright details before edge detection.
    gauss_top = white_tophat(gray, disk(2))
    # Called as a plain function here; originally extraction_class.outline_detection.
    outline = outline_detection(gauss_top, sigma=1)
    #outline = white_tophat(gauss, disk(2))
    filled_outline = binary_fill_holes(outline)

    # Erode the filled mask to smooth its border.
    structurant_element = disk(2)
    smoothing = erosion(filled_outline, structurant_element)
    #smoothing=dilation(filled_outline,structurant_element)
    extraction = np.copy(image)
    extraction[~smoothing] = 0  # zero out background pixels

    return extraction

def outline_detection(image, sigma):
    # Smooth, then take the Sobel gradient magnitude.
    gauss = gaussian_filter(image, sigma)
    sobelX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    sobelY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
    derivX = convolve(gauss, sobelX)
    derivY = convolve(gauss, sobelY)
    gradient = derivX + derivY * 1j
    G = np.absolute(gradient)
    # Zero everything below Li's threshold (vectorized form of the original double loop).
    G[G < threshold_li(G)] = 0.0
    return G

slate granite
#

Hello guys

#

I wanna ask

#

How do I do object detection in Java?

carmine flame
#

anyone want to work in collab for solving the Petal classification on TPU challenge?

carmine flame
shell kite
mystic heath
#

Hello, I want to build some computer vision AI projects to put on my resume. Do you have any great computer vision projects that you would recommend?

slate granite
#

Oh yeah, I tried a hand-recognition calculator like the one Apple just introduced. Use MediaPipe for hand gesture and movement tracking, plus character recognition and some logic to make it work.

rancid hemlock
#

Hi, Kaggle community!

My friend Alex and I are developing a tool to make deep neural network training interactive, user-friendly, and deterministic.

Are you tired of the guesswork and never-ending experimentation?
Are you seeking an alternative methodology that makes the process less painful and more understandable?

We'd love to hear from you! Please share your thoughts and help us to upgrade the current methodology.

LINK: https://calendly.com/graybx/30min <<

dapper path
#

Hey guys, is the ImageDataGenerator in Keras deprecated? If so, what's the current standard way of doing image data augmentation?
Can we use the augmentation layers directly before the main model, or is it better to create a separate Sequential pipeline for augmentation?

stone seal
#

Is it better to use OpenCL or just OpenCV?

finite nova
#

hi guys, does anyone have any experience with roboflow models? I'm doing an ML project for the first time and I was instructed to make my dataset and train my model there, but I'm lost on how to proceed further. if anyone has any experience, please lmk so I can consult you. thank you in advance

fathom crater
river bobcat
#

Hello fellow Computer Vision enthusiasts 😄

hot perch
vagrant moon
#

hey guys, I know this is a dumb question

#

but does anyone know how to take the (pretrained) YOLO model out of ultralytics and train it with a self-defined training loop?

robust reef
#

Hello @here guys, I'm starting my project on air quality detection and I'm searching for image datasets. If you know any sources, please let me know.

fallen saddle
#

Hi, I am Abdullah. I am an ML engineer and want to join a team to participate in Kaggle competitions.

inner willow
river bobcat
#

Howdy!

inner willow
#

Hey, please help me find out a good reference doc for CV (like whitepapers in 5dgai session), especially focusing on implementing OCR.

#

Would love to connect Tom. Can you help me with this stuff?

river bobcat
#

What have you found from your own searching? Have you tried any of the many OCR tutorials found on youtube/google search/etc?

inner willow
#

Yes, I can search the internet, but I'm worried I'd end up with irrelevant docs that would confuse me more. That's why I'm seeking expert help.

#

I have done a course on the basic architecture of CNN models. I need help with the implementation.

river bobcat
#

My advice: try one of the "non-relevant docs" that specifically says it's a tutorial on how to implement OCR. If that doesn't work, try another one and see what the differences are. If you don't understand something in the tutorial, search specifically on that topic and figure out what you don't understand.

inner willow
#

Why don't I start with one suggested doc?

#

BTW, great advice buddy, thanks

river bobcat
#

I apologize if it's not exactly what you were looking for, but it's the way to learn on your own. In my opinion, it's better to seek advice after you tried something and have specific questions, rather than asking for someone to walk you through something you haven't even tried.

inner willow
#

Yeah, got it

lapis cosmos
#

Hi everyone! I'm exploring ways to use image-to-image models to predict deformations and failure loads in stone arch structures based on initial geometry. If anyone has worked on similar structural prediction tasks, used GANs for FEA simulations, or has some insights and paper references, please send me a DM.

river bobcat
austere arch
#

😁

drowsy jacinth
#

I made a CV script that shows viewers their emotions in real time. Got it running on a buggy Intel tablet, but with a proper GPU behind it there’s a lot of fun to be had. Some people freak out when it shows their emotion changing and others laugh. It’s an interesting night.

drowsy jacinth
#

Do you want to see it running? I think it’s got enough power to do a choppy screen record.

shell kite
topaz bough
#

Hi, what is the best model for human fall detection? Any help would be appreciated.

worthy obsidian
frigid jay
#

Hi everyone, I am looking for research ideas in computer vision. Can anyone help me by suggesting some?

karmic cedar
#

I'm stuck with my college minor project. Can anyone help me? 🥹
There's this persistent bug that I've been stuck on for hours:
ValueError: It looks like you are using a PerReplica object while not inside a replica context, which is not supported. Try running your op or function inside a replica context by using strategy.run
https://www.kaggle.com/code/sidharthad/custom-training

hazy tapir
#
round stream
#

Hi, I am new to computer vision and I am currently trying multiclass image segmentation with PyTorch, from scratch. I have been stuck on it for the past couple of days. Could anyone help me, please?

trail pebble
#

I am working with YOLO and I'm new to it, so I want to start by understanding its architecture. I understand YOLO has a backbone, neck, and head. I want to train the head first and then train the backbone and neck to fine-tune the model. How can I find out which layers count as the backbone and which as the neck?

rough night
tough heath
#

I need a little help with my project from anyone who has good experience in image enhancement and preprocessing.

mellow forge
#

Guys, I need help setting up the Kaggle GPU. For some reason it's not working even when it's enabled for the session.

hazy tapir
#
GitHub

داده پردازان هوش یار. Contribute to omid-sakaki-ghazvini/Projects development by creating an account on GitHub.

leaden wraith
#

I'm having trouble with multi-label classification on medical datasets with 15 classes. I am using BCEWithLogitsLoss to combat heavy class imbalance but am still getting very low F1 scores and strict accuracies. Any advice to improve my performance?
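
One thing worth checking in a setup like this is whether the loss is weighted at all: by default BCEWithLogitsLoss treats every class equally, and its `pos_weight` argument is the usual place to compensate for rare positives. A minimal sketch of computing that weight as negatives divided by positives per class follows; the helper name `pos_weight_from_labels` and the toy label matrix are assumptions for illustration.

```python
import numpy as np

def pos_weight_from_labels(labels):
    """labels: (N, C) multi-hot array. Returns negatives / positives for each class."""
    labels = np.asarray(labels, dtype=np.float64)
    pos = labels.sum(axis=0)       # positives per class
    neg = labels.shape[0] - pos    # negatives per class
    # Clip so classes with no positives do not divide by zero.
    return neg / np.clip(pos, 1.0, None)

# Hypothetical toy labels: 4 samples, 2 classes.
toy = [[1, 0], [0, 0], [0, 0], [0, 1]]
print(pos_weight_from_labels(toy))
```

The resulting vector can be converted to a tensor and passed as `pos_weight` when constructing the loss. Tuning a decision threshold per class on the validation set is another common lever for F1.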

fading jetty
#

<@&1303433601177751593> scam?

small flare
#

Is anyone interested in reviewing the code I wrote for a custom CNN (it is a Colab notebook)? I'd like to know what I need to improve and how much I got right. It would also be helpful if anyone could guide me on the next steps. So far I have been able to create a feature map made of multiple neurons that slide over the image and do convolution, but all the neurons in the same layer produce the same output. Is this correct, or do I need to change something here?

Here is the GitHub link:
https://github.com/TronYash10101/Custom_CNN

rigid jetty
#

Hi, I'm currently working on a CNN built from scratch, and I'm training it on a dataset of MNIST-style digits plus operator symbols. I've trained for 160 batches of 64 images within the first epoch, and my cross-entropy loss has plateaued at 2.639, which corresponds to a 1/14 chance of predicting correctly, which means my CNN is randomly guessing. I spent the past 7 days debugging this and still can't find the issue. If anyone can help, I have more details, just DM 😄
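
The arithmetic behind that diagnosis can be checked directly: a model that assigns a uniform probability of 1/14 to every class has a cross-entropy of -ln(1/14). The snippet below assumes 14 classes, as implied by the message.

```python
import math

num_classes = 14  # taken from the 1/14 figure in the message
# Cross-entropy of a uniform prediction: -ln(1 / num_classes) = ln(num_classes).
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 3))  # 2.639
```

A loss sitting at this value means the outputs carry no information about the input, which usually points toward the gradient flow, the weight update step, or the label alignment rather than the architecture itself.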

quaint needle
#

from scratch means numpy only stuff?

#

😭

left magnet
#

Hi everyone,

I just open-sourced YOLOv1-PyTorch, a from-scratch PyTorch reimplementation of the original YOLOv1—complete with a hands-on notebook that walks you through every detail:

YOLO-V1-Explanation.ipynb
A comprehensive tutorial that covers:

  • Environment & Data: setting up PyTorch, downloading Pascal VOC 2007/2012, inspecting class distributions and annotation formats
  • Data Loader & Augmentation: parsing XML to YOLO’s S×S grid, handling edge cases, and applying on-the-fly transforms (flips, color jitter)
  • Model Architecture: building each convolutional layer and prediction head exactly as in the original paper, with tensor-shape diagrams
  • Loss Function: step-by-step derivation of localization, confidence, and classification losses, directly tied to code
  • Training Loop: configuring hyperparameters, real-time plotting of total vs. per-term losses, checkpointing
  • Evaluation & Inference: computing IoU/mAP, visualizing ground-truth vs. predictions, implementing non-max suppression, and generating inline GIF demos

YOLO-V1-Pure-Code.ipynb
The same pipeline stripped of commentary—ideal for quick experimentation or integration into your own projects.

Live examples
Pre-rendered outputs (sheep, bicycle) so you can see detection quality before running a single cell.

https://github.com/franciszekparma/YOLOv1-PyTorch

Whether you’re teaching, researching, or prototyping in classic object detection, this repo guides you through both the “why” and the “how.” Feel free to clone, star, file issues, or send PRs!

primal plinth
#

Hello everyone,
I need some guidance regarding deep learning for our Final Year Project (FYP).

Our project heavily depends on model training using both images and text, but currently, we are focusing on medical image model training. The problem is—we are not sure about the proper direction to follow. When we try to follow GPT instructions, the model doesn’t train effectively, and we lack clarity on how to evaluate and decide what to apply at each step.

Could anyone experienced in deep learning please guide us on the correct path? Specifically, we need clarity on:

Understanding graphs/visualizations (loss vs accuracy, confusion matrix, PR curves, etc.)

Model selection (e.g., ResNet vs DenseNet vs EfficientNet, pretrained vs from scratch)

Evaluation metrics (accuracy, precision, recall, F1, AUC, etc.)

Data augmentation techniques (rotation, flipping, normalization, mixup, etc.)

Overfitting/underfitting (how to detect and solve using dropout, regularization, early stopping)

Hyperparameter tuning (learning rate, batch size, optimizers, etc.)

Training workflow (train/val/test split, cross-validation, reproducibility)

Basically, GPT gives us the “what to do” list, but not the how to properly evaluate, compare, and decide among these options. What’s the right way to structure and approach this whole process so we can train and evaluate our model effectively?

snow bluff
#

Job Title: Part-Time Senior AI/ML Engineer (Remote)

We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.

Requirements:
-Minimum of 9–10 years of professional software development experience

-Proven experience working effectively in a remote environment

-Advanced English proficiency (C1 or higher); an American accent is preferred

-Availability to work 10–15 hours per week during EST or CST business hours

If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (910) 386-2433

vast horizon
snow bluff
#

Job Title: Part-Time Senior AI/ML Engineer (Remote)

We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.

Requirements:
-Minimum of 7–10 years of professional software development experience

-Proven experience working effectively in a remote environment

-Advanced English proficiency (C1 or higher); an American accent is preferred

-Availability to work 10–15 hours per week during EST or CST business hours

If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384

shadow musk
#

Does anyone know what's the best way to detect characters/symbols on cube-shaped objects?

#

Like, what's the best way to augment the data? 2D augmentation or 3D augmentation?

true sedge
# shadow musk Does anyone know what's the best way to detect characters/symbols on cube-shaped...

Hi! Hope you're doing well. Actually, that's a bit unclear to me. I'd say that if your data comes from a real 3D model, you could simply render each face and apply simple matching.

If it comes from an image, you'll need to apply a transformation to your original matching dataset. I'd say if you can detect the cube and its edges, you'll be able to determine the transformation you need.

You'll always be free to use Huffman strategies or whatever to optimize your model, but the core goal is for your model to work, not to be perfectly optimized. If your model has a broader use, you'll have to teach it similar methods. Think from first principles. Hope I didn't miss the point.
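
The transformation mentioned for the image case is typically a planar homography: once the four corners of a cube face are detected, a 3x3 matrix maps that face onto a flat square where the symbol can be matched. Below is a minimal NumPy sketch under that assumption. The function names and corner coordinates are hypothetical, and a real pipeline would more likely use an existing computer vision library for the warp.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 matrix H mapping 4 src points onto 4 dst points (H[2, 2] fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per point correspondence.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, point):
    """Apply H to a 2D point using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Hypothetical detected corners of one cube face, mapped to a 64x64 square.
face_corners = [(10, 20), (90, 10), (100, 80), (5, 95)]
square = [(0, 0), (64, 0), (64, 64), (0, 64)]
H = homography_from_corners(face_corners, square)
print(warp_point(H, face_corners[2]))
```

The same matrix can be applied in the other direction (via its inverse) to project flat training symbols onto tilted faces, which is one way to do the 2D augmentation asked about above.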

narrow spoke
#

Hi! I just finished the Intro to DL and CV courses plus the ML Zero to Hero series from TensorFlow. Which competitions would you recommend I try, other than the Getting Started ones, to get some experience?

rose zealot
#

hi

plucky elk
wary zinc
#

Hi, just joined the server

kind knoll
#

I'm looking for a US developer to collaborate with. If anybody is interested, please DM me.

rocky flax
#

Hello!

rocky flax
#

I'm proficient in CNNs. Do you all know any free courses with certification?

reef bluff
#

Hi I'm a CV engineer from Nigeria

obtuse rain
slim olive
#

Hey everyone 👋
Quick question for folks building chatbots / LLM apps here —
how are you currently handling long-term user memory beyond a single session?
Curious what’s actually working in practice (RAG, DB, custom hacks, etc).

raven zephyr
#

Newbie in PyTorch from Pakistan. Excited to get to know y'all

tacit idol
lean latch
#

Can I use the MS COCO dataset to train a model for commercial use? As I read it, the annotations are under a license that allows it, but since the images themselves were taken from Flickr, some of them are under other licenses... so I don't know whether I am allowed to use it or not.

#

I read on some website that I am allowed to use it for training in commercial use, but I don't know if I want to trust some random website

lean latch
#

So no one was ever concerned about it, or we don't have many computer vision engineers who have worked on commercial projects

#

or i am just being ignored

twilit meadow
lean latch
#

Are all datasets on Roboflow intended for commercial use?

#

Like, I see some subsets of MS COCO with a license that in theory allows commercial use, but is it there just because someone put it there, or does Roboflow own the rights to put any kind of license on datasets posted on their website?

plain belfry
slim olive
# twilit meadow Hello, Did you find anything by any chance?

Yeah, a few patterns seem to work in practice:
• Explicit memory layers outside the LLM (DB / KV store) for user facts & preferences
• Embedding-based recall (RAG) for long-term, fuzzy memory
• A light memory manager that decides what to store vs what to forget
In one of my projects, orbmem (www.orbmem.online), we separate:
• short-term conversational state
• long-term user memory (facts, preferences, habits)
and retrieve only relevant memories per turn instead of stuffing everything into context.
RAG alone wasn’t enough — combining structured memory + embeddings worked better for us.
Curious what others here are using in production.
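
As a toy illustration of the "structured memory + embeddings" split described above (not the orbmem implementation), the sketch below keeps exact facts in a dictionary and ranks free-text notes by similarity to the query. The class name `UserMemory` is hypothetical, and the bag-of-words "embedding" is a stand-in assumption for a real embedding model.

```python
import math
from collections import Counter

class UserMemory:
    def __init__(self):
        self.facts = {}  # structured memory: exact key -> value lookups
        self.notes = []  # free-text memories for fuzzy recall

    def remember_fact(self, key, value):
        self.facts[key] = value

    def remember_note(self, text):
        self.notes.append(text)

    @staticmethod
    def _embed(text):
        # Stand-in for an embedding model: lowercase word counts.
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def recall(self, query, top_k=1):
        # Return only the most relevant notes instead of the whole history.
        q = self._embed(query)
        ranked = sorted(self.notes, key=lambda n: self._cosine(q, self._embed(n)), reverse=True)
        return ranked[:top_k]

memory = UserMemory()
memory.remember_fact("preferred_language", "python")
memory.remember_note("likes hiking in the mountains")
memory.remember_note("prefers python over java")
print(memory.facts["preferred_language"], memory.recall("which language python or java"))
```

Only the recalled notes and any matching facts would be added to the prompt for a given turn.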

prisma zealot
#

anyone working in computer vision and space applications?

grim dragon
#

What kind of space application ?