#byu-locating-bacterial-flagellar-motors-2025
Hey guys! My name's Andrew Darley and I'm one of the hosts for this Kaggle competition. Feel free to reach out if you have any questions! We're so excited to be doing this with you!
Hello all, I'm Muhammad Yousif, a BS IT student and data science and ML practitioner. I'm looking for a team for this competition. If you're interested, DM me. Thanks!
Add me to your team, I have 3 years of experience in IT and computer science.
Andrew's the competition host, they're not competing
Hi all! I'm looking to join a team. I'm a PhD student in Biostatistics with previous/current experience in data science and statistics. I have given the competition a start, and would love to partner up to continue on. Thanks!
can you dm me?
Let's team up and build a team ourselves.
Hello! Quick question, any reason why the series have the slices saved as separate files and not as a single 3D image? Wouldn't it be more beneficial for those looking to make use of 3D conv nets or other similar methods? I'm guessing the images are similar to MRIs with voxel spacing and all
You can just do that with some preprocessing code
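A minimal sketch of that preprocessing step, assuming the slice file names sort into slice order and that Pillow is available for reading the JPEGs. The names `load_slice` and `stack_tomogram` are made up for illustration, not part of any competition code.

```python
from pathlib import Path

import numpy as np


def load_slice(path):
    """Default loader: read one JPEG slice as a 2D grayscale array (needs Pillow)."""
    from PIL import Image  # imported lazily so the stacking logic works without Pillow

    return np.asarray(Image.open(path).convert("L"))


def stack_tomogram(slice_dir, loader=load_slice, pattern="*.jpg"):
    """Stack every slice in slice_dir into one (z, y, x) volume.

    Assumption: file names sort into slice order (e.g. zero-padded indices).
    """
    paths = sorted(Path(slice_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {slice_dir}")
    return np.stack([loader(p) for p in paths], axis=0)
```

The `loader` argument is only there so the stacking can be swapped to another image reader or tested without real images.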
I mean yea I'm just asking if the images work in the same way as MRIs
"A tomogram is a 3D volumetric representation of an object. In this competition, each tomogram is provided as a set of 2D image slices (JPEG) stored in a unique directory."
I'm guessing voxel spacing will be in the jpeg metadata then, I'm planning on resampling to a common voxel spacing so I need to figure the averages out
It's in train.csv
Don’t think it will work at inference time for submission because there is no CSV file for the test data. Would be great if there is some metadata in the image files themselves but not sure.
How good is the XZ/YZ resolution? I mean, resizing the 2D slices sounds good, but is resizing along z feasible?
I mean it sounds odd to me, when working with MR images resampling to a common voxel spacing is a pretty common practice. Unsure if tomography has more nuance as to how it handles voxels but that seems odd to me
I'll check the image metadata in a bit
I read someone saying they want our models to be able to generalize across tomogram data of different resolutions / spacing.
I don’t see voxel spacing in the JPEG metadata so it has to be assumed unknown at inference
It's not an inference thing, it's just that we're unable to resample the tomograms to a common voxel spacing
I would have to guess that step is already taken care of by the comp hosts
You can resample according to voxel spacing in the training data since it’s part of the labels file, but we don’t have that for the test set, that’s what I meant by inference
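For the training side, resampling to a common spacing could look like the sketch below. It assumes an isotropic spacing value per tomogram (read from the labels file) and uses plain nearest-neighbour indexing to stay dependency-light; `resample_nearest` is a hypothetical helper, and a real pipeline would probably use an interpolating resampler instead.

```python
import numpy as np


def resample_nearest(volume, spacing, target_spacing):
    """Nearest-neighbour resample of a (z, y, x) volume to a new isotropic spacing.

    spacing and target_spacing are scalars in the same unit (assumed isotropic).
    """
    scale = spacing / target_spacing
    new_shape = tuple(max(1, int(round(n * scale))) for n in volume.shape)
    # For each output index, pick the nearest (floored) source index on that axis.
    idx = [
        np.minimum((np.arange(m) / scale).astype(int), n - 1)
        for m, n in zip(new_shape, volume.shape)
    ]
    return volume[np.ix_(*idx)]
```

Any motor coordinates would need to be multiplied by the same `spacing / target_spacing` factor. As noted above, this only helps where the spacing is actually known.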
Because when you process it, you're going to process slices.
These are examples of XZ and YZ slices
Thank you.
Yea I guess that's one way of approaching it
How long does scoring take? It's been about an hour and still at it
This seems odd, my notebook runtime was less than 1 hour (just testing the submission pipeline) and it's still here
Anyone want to collaborate?
Note to self and anyone submitting: make sure your submission.csv doesn't have typos in the filename
Anyone lmk if you want to collaborate. I'm doing a 3D U-Net but might be looking at object detection based solutions soon
I have a question: each image given is 2D, so should I process a single image at a time with YOLO, or does the sequence matter in this problem?
They are 3D images, so you can consider it a sequence. But it would be simpler to consider it a volume.
Or just treat them as 2D images and stack the predictions. Not sure which approach is more popular.
I've also submitted. How long does scoring take?
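One possible way to "stack" per-slice predictions into a single 3D location, as a rough sketch only: group detections that sit close together across neighbouring slices and take the confidence-weighted centre of the strongest group. The function name, the `(z, y, x, conf)` tuple layout, and the `radius` / `min_conf` defaults are all assumptions for illustration.

```python
import numpy as np


def merge_slice_detections(dets, radius=20.0, min_conf=0.3):
    """Merge per-slice 2D detections into one 3D point.

    dets: iterable of (z, y, x, conf) tuples from a 2D detector run slice by slice.
    Returns the confidence-weighted (z, y, x) centre of the strongest cluster,
    or None when nothing passes min_conf.
    """
    dets = np.asarray([d for d in dets if d[3] >= min_conf], dtype=float)
    if len(dets) == 0:
        return None
    best_score, best_centre = -1.0, None
    for seed in dets:
        # Every detection within `radius` of this seed forms a candidate cluster.
        near = dets[np.linalg.norm(dets[:, :3] - seed[:3], axis=1) <= radius]
        score = near[:, 3].sum()
        if score > best_score:
            best_score = score
            best_centre = np.average(near[:, :3], axis=0, weights=near[:, 3])
    return tuple(float(c) for c in best_centre)
```

Summing confidences favours a motor seen on several adjacent slices over a single confident false positive.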
?
have you tried 3D unet ?
ok so my notebook runs fine and here's an example submissions_csv generated
the notebook for eval runs fine but then scoring gets timed out, has anyone else had this similar issue?
Which model are you using? I used the yolov8n model and got a score of 0.268, now training the large model
i'm using a unet
and gonna try V-Net
3D?
2d
Oh, thanks for the information
i mean on a small subset it does fine i just don't understand why it times out on scoring
is my csv set up wrong?
ok i think i found the issue, sec
Yup, it was incorrect CSV formatting. Don't forget to add index=False when saving your CSV
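To see what that flag changes, here is a small pandas sketch. The ids, values, and column names below are placeholders (assumptions); the real header should be copied from the competition's sample submission.

```python
import io

import pandas as pd

# Placeholder rows and column names, for illustration only.
sub = pd.DataFrame(
    {
        "tomo_id": ["tomo_a", "tomo_b"],
        "Motor axis 0": [12.0, 40.0],
        "Motor axis 1": [250.0, 310.0],
        "Motor axis 2": [480.0, 95.0],
    }
)


def to_csv_text(df, index):
    """Return the CSV text pandas would write, without touching the disk."""
    buf = io.StringIO()
    df.to_csv(buf, index=index)
    return buf.getvalue()


with_index = to_csv_text(sub, index=True)      # header gains an empty leading column
without_index = to_csv_text(sub, index=False)  # header starts with the first real column

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

With the default `index=True`, pandas writes the row index as an extra unnamed first column, which is the kind of formatting difference a scorer can choke on.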
hidden test has more samples than local test
I have been trying 3D U-Net for a while with no luck. I've seen people say it can work, but I can't get the model to learn anything. The data might be too large & sparse for U-Net
Has anyone got 3d unet to learn?
3D for me works well on training and validation locally but scored poorly on LB, not using UNet though
How long does submission take? I'm 10 hours in with a 2D unet and still nothing
What kind of 3D model are you using?
CNN
I think the “mask” would be too sparse for UNet — did try it but shifted away quickly
If you’re going 3D I think the challenge is more in how you’re preprocessing, what your objective/loss is, and any postprocessing
Also augmentation is key
Absolutely. I was doing random crop, stretch & rotation using ~ 100^3 dim volumes w gaussian sphere target, but I think the data is too sparse. Tried many combinations of losses with no luck.
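For reference, a Gaussian sphere target of the kind described could be built like this. It is a sketch, not the poster's actual code; `gaussian_target` and the `sigma` default are assumptions.

```python
import numpy as np


def gaussian_target(shape, centre, sigma=3.0):
    """Return a float32 (z, y, x) heatmap with a Gaussian blob peaking at 1.0 on centre."""
    zz, yy, xx = np.meshgrid(*(np.arange(n) for n in shape), indexing="ij")
    d2 = (zz - centre[0]) ** 2 + (yy - centre[1]) ** 2 + (xx - centre[2]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2)).astype(np.float32)


# Example: one motor in a 32^3 crop, and the fraction of "positive" voxels.
target = gaussian_target((32, 32, 32), centre=(16, 8, 24))
positive_fraction = float((target > 0.5).mean())
```

With a small `sigma`, only a tiny fraction of voxels in a crop end up above 0.5, which illustrates the sparsity problem being discussed.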
Going to look into 2d&3d object detection instead
It’s possible to get some great validation scores on the data provided but I think one issue is that the array sizes vary between training and holdout
I'm doing Unet on a small subset of the data and it's doing fine, my issue is my notebooks keep timing out and that's why I asked you how long submission takes with a 3d net. I made the unet shallower and trying a submission again
Took mine about 3.5 hours to run through the holdout data
T4x2
I see, I made my Unet shallower and testing again
Are you not able to run it on Kaggle through some train or test data to see how long per tomogram then have an estimate for the full 900 holdouts?
The array sizes vary but I think it was pretty close to what I estimated by doing that
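That estimate can be scripted in a few lines. This is a rough sketch: `estimate_total_seconds` and `process_one` are hypothetical names, and the extrapolation is purely linear, so it ignores the varying array sizes mentioned above.

```python
import time


def estimate_total_seconds(process_one, samples, n_total):
    """Time process_one on a few samples and extrapolate linearly to n_total items."""
    samples = list(samples)
    start = time.perf_counter()
    for s in samples:
        process_one(s)
    per_item = (time.perf_counter() - start) / len(samples)
    return per_item * n_total


# Example with a stand-in workload; replace the lambda with real per-tomogram inference.
est = estimate_total_seconds(lambda s: time.sleep(0.01), range(3), n_total=900)
print(f"estimated total: {est / 3600:.2f} h")
```

Timing a handful of the larger tomograms rather than the first few gives a more conservative number.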
it was the model taking too long, finally got a submission to work
so proud of this
So heads up: if anyone is trying to use a 2D U-Net, I wouldn't go past depth 3. Maybe with something like mixed precision it could be faster, but since the task is getting coordinates and not segmentation, keep the depth to a minimum
Anyone want to collab?
I have no credentials or anything
I've participated in czii-2024 and gone through all the solutions to an extent
Thinking of trying out various object detection models from previous 1st place solution
I am open to collab
What are the ways or models to denoise the 3D volume?
There are methods to denoise each slice individually that you could adapt
https://www.kaggle.com/code/andreipaulavets/byu-denoising-cryo-et-slices/notebook
https://www.kaggle.com/code/andreipaulavets/byu-denoising-cryo-et-with-noise2void
I’m curious whether the dataset used in this competition is synthetic (generated) or derived from real experimental data. Thanks in advance!
Pls, has anyone done their capstone project for the just-concluded 5-day training?
Don't you think CV fails because train might already be augmented, so different folds can get the same tomogram just rotated/translated/flipped/etc.?
I mean, there are a lot of images that look rotated...
All the data is authentic. It is actually currently impossible to generate synthetic data as we can't model the noise
I'm looking for motivated teammates.
I'm particularly interested in collaborating with people who are passionate, diligent, and eager to learn together. I'm from South Korea, so teammates comfortable with international collaboration and open communication would be ideal.
How do I know when my YOLO has reached its full potential? Current mAP@50 is 0.978 and Public LB is 0.769
I was wondering if I should start working on a custom model now or should I work on improving yolo only?
Try harder augmentation, or write new training code for 2.5D
Hi, I was wondering: if we need to turn off the internet connection for the notebook, does that also mean we cannot install any packages from the internet?
Is there any reason my notebook is still showing as running a few hours after it finished? It's still not showing its score
900 hidden samples
Has anyone faced a public score of 0 before?! I'm getting good results on the training and validation part of the data. I think it might be related to something other than the model performance! appreciate any help in advance.
I also get great results on the provided data but poor performance on the test set, I think it’s because the image scale is different
Just noticed that some folders inside the train folder are not present in the train labels CSV file. Is that on purpose, or just something missing in the data?
Oh, there are 8 more folders than labels, first time I've noticed
I've just checked and there are 648 of both, not sure where I've seen more folders than ids...
and just to be clear, all 648 folders are in labels csv
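A quick way to run that check yourself, using only the standard library. The `id_column` default of `"tomo_id"` is an assumption about the labels file header, and `compare_folders_to_labels` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def compare_folders_to_labels(train_dir, labels_csv, id_column="tomo_id"):
    """Return (folders with no label row, labelled ids with no folder)."""
    folders = {p.name for p in Path(train_dir).iterdir() if p.is_dir()}
    with open(labels_csv, newline="") as f:
        ids = {row[id_column] for row in csv.DictReader(f)}
    return sorted(folders - ids), sorted(ids - folders)
```

Both returned lists being empty means the folders and the labels file agree.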