#blood-vessel-segmentation | Kaggle | Page 1

hollow schooner Nov 8, 2023, 2:18 PM

#

Anyone willing to join a team?

lavish lily Nov 9, 2023, 12:26 AM

#

hollow schooner Anyone willing to join a team?

I want!

severe stump Nov 9, 2023, 1:05 PM

#

what kind of label in this segmentation work? 0-background 1-kidney 2-vasculature?

lilac oak Nov 11, 2023, 5:37 AM

#

Hi! I would like to join a team. Please DM me if you have a vacancy.

agile pier Nov 12, 2023, 12:43 PM

#

What method are you guys using to obtain the run-length encoding based on the mask? There's one given in the discussion thread (https://www.kaggle.com/competitions/blood-vessel-segmentation/discussion/453831#2522158) but i think it is not consistent with the RLE given in 'train_rles.csv' (see my comment on the linked post).

#

Nevermind, this (https://www.kaggle.com/code/paulorzp/run-length-encode-and-decode/script) is linked in the competition overview and it seems to work
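For readers who want to see what such an encoder looks like, here is a minimal sketch in the spirit of the linked script. It assumes a binary 2D mask, 1-indexed start positions and row-major pixel order; the function names are illustrative, and the output should be checked against 'train_rles.csv' before relying on it.

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary 2D mask as 'start length' pairs (1-indexed, row-major)."""
    pixels = mask.flatten()
    # Pad with zeros so runs touching the first or last pixel are detected.
    pixels = np.concatenate([[0], pixels, [0]])
    # Positions where the value changes mark the start and end of each run.
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn every end position into a run length.
    runs[1::2] -= runs[::2]
    return " ".join(str(x) for x in runs)

def rle_decode(rle: str, shape: tuple) -> np.ndarray:
    """Decode 'start length' pairs back into a binary mask of the given shape."""
    values = rle.split()
    starts = np.asarray(values[0::2], dtype=int) - 1
    lengths = np.asarray(values[1::2], dtype=int)
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape)
```

Note that an all-zero mask encodes to an empty string here, so the placeholder for empty masks discussed later in this channel has to be substituted separately.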

naive lintel Nov 12, 2023, 5:12 PM

#

Hey, I have worked extensively with 3D image segmentation and currently I'd be participating in Hacking the Human Vasculature in 3D. I am a final year UG student in the department of Information Technology. Past experiences: Summer Intern at Wells Fargo as SWE, Scholar at Google ML Bootcamp
HMU asap for team formation

thick vine Nov 12, 2023, 5:22 PM

#

Has anyone tried converting to NIfTI or a similar 3D format?

#

its a bit weird, considering we would only have 3 samples, but with 2d tiff we lose all spatial information

naive lintel Nov 13, 2023, 4:51 AM

#

thick vine its a bit weird, considering we would only have 3 samples, but with 2d tiff we l...

true

naive lintel Nov 13, 2023, 6:06 AM

#

Btw what's the difference between the RLE masks in train_rles.csv and the labels in the train directory?

thick vine Nov 13, 2023, 7:44 AM

#

Rle is the encoded submission format

#

Labels are the actual binary masks

ionic comet Nov 14, 2023, 1:06 AM

#

Hi, first time competing. The Code Competition FAQ says that some competitions allow uploading external data. I cannot see anything in the rules allowing or disallowing it. Do more experienced users know what the default is?

ionic comet Nov 14, 2023, 1:07 AM

#

thick vine its a bit weird, considering we would only have 3 samples, but with 2d tiff we l...

Yes, I agree. I think it's not smart to discard the very valuable 3D information, but it seems to me that the test datasets will be 2D instead of 3D, right?

thick vine Nov 14, 2023, 7:49 AM

#

It still seems 3d, just not a complete voi but a few slices

thick vine Nov 16, 2023, 3:03 PM

#

Any particular trick to denoise/make more clear the 6 test cases?

#

Very strong CV and holdout results, however the test samples are horrible

gleaming acorn Nov 16, 2023, 3:51 PM

#

thick vine Very strong CV and holdout results, however the test samples are horrible

got the same issue, test samples are so different from what i've seen in training

agile pier Nov 16, 2023, 4:53 PM

#

gleaming acorn got the same issue, test samples are so different from what i've seen in trainin...

What do you mean? How can you tell they are different?

thick vine Nov 16, 2023, 8:31 PM

#

Inspecting the images, they look like the initial slices of kidney_1_dense

#

which have no annotations in their masks

agile pier Nov 16, 2023, 9:18 PM

#

thick vine Inspecting the images, they look like the initial slices of kidney_1_dense

You cannot inspect the full test set yet, since kaggle only gives you access to a small subset of the test set (this is very common for image classification/segmentation competitions).
When you submit a model it will be tested on a much larger test set, also containing images with non-empty masks.

gleaming acorn Nov 16, 2023, 9:44 PM

#

agile pier What do you mean? How can you tell they are different?

i just meant that compared to a random image in the training set they looked a bit odd to me

thick vine Nov 16, 2023, 10:40 PM

#

agile pier You cannot inspect the full test set yet, since kaggle only gives you access to ...

ah ok, that does make sense. I have never participated in a notebook competition; I assumed we would get a larger test set later on, but was wondering if we needed to do something special with the 6 samples we got so far.
Thanks!

ionic comet Nov 17, 2023, 5:18 PM

#

agile pier You cannot inspect the full test set yet, since kaggle only gives you access to ...

So, how does it work? The notebook must be ready to be self-executed and process all the image files in the 'test' folder?

agile pier Nov 17, 2023, 5:26 PM

#

ionic comet So, how does it work? The notebook must be ready to be self-executed and proces...

Yes, at some point in your notebook you need to read the test files from "path/test/images", do your prediction and put your predictions in "submission.csv", then submit the csv file. When submitting the notebook Kaggle will read the proper test set where you read the images from "path/test/images".

ionic comet Nov 17, 2023, 5:27 PM

#

Can we pre-train the predictor off-line and just load it during evaluation time to cut time? The rules seem to indicate that the training and evaluation needs to be done in those <=9 hrs, right?

agile pier Nov 17, 2023, 6:01 PM

#

I'm not quite sure. I know that in a lot of previous competitions people have done exactly that, but it might depend on the rules of the competition. Better to ask that in the discussion forum on Kaggle.

thick vine Nov 18, 2023, 1:06 PM

#

ionic comet Can we pre-train the predictor off-line and just load it during evaluation time ...

Hello, did you get any answers on that?

ionic comet Nov 18, 2023, 2:21 PM

#

thick vine Hello, did you get any answers on that?

Not definitely, but I'm assuming it's not allowed until it's explicitly mentioned by the host.

thick vine Nov 18, 2023, 2:40 PM

#

ionic comet Not definitely, but I'm assuming it's not allowed until it's explicitly mentione...

Thanks!

#

although, that is unfortunate, will probably drop from the comp. if that is the case

languid bison Nov 19, 2023, 1:14 PM

#

hollow schooner Anyone willing to join a team?

Hey i am willing to join ur team

languid bison Nov 19, 2023, 1:16 PM

#

naive lintel Hey, I have worked extensively with 3D image segmentation and currently I'd be p...

Hey i am willing to join ur team

rustic talon Nov 19, 2023, 6:04 PM

#

I'm wondering how much variation we can expect between the given training set and the hidden test set. Could some submissions hit resource constraints (memory, or the <=9 hr time-out) that didn't show up during training? For example, this could be an issue if the test set is of a different image quality.

thick vine Nov 19, 2023, 6:55 PM

#

If inference takes more than 9h then there is something seriously wrong with the architecture

vestal igloo Nov 22, 2023, 7:00 AM

#

thick vine It still seems 3d, just not a complete voi but a few slices

if you could guess, would u say the top scores are all from 3D ones?

#

cause the baseline 2D public notebook i saw had 0.14 score

thick vine Nov 22, 2023, 8:37 AM

#

vestal igloo if you could guess, would u say the top scores are all from 3D ones?

2D. You have 3 training samples in total from a 3D perspective, and since the data is stored as TIFF with no additional metadata file, there is not enough information (spacing, orientation, centering, etc.) to make a 3D volume

vestal igloo Nov 22, 2023, 8:42 AM

#

ah all right. thanks

severe stump Nov 24, 2023, 8:05 AM

#

Are all the images in one folder the same size?

Is there a relationship between resolution and image size?

vestal igloo Nov 25, 2023, 8:18 AM

#

i was wondering about splitting the train and validation sets

#

in this case, ig we have to keep the kidneys separate right? is it appropriate to do traditional 80-20 splits here

thick vine Nov 25, 2023, 10:32 AM

#

ideally in medical imaging you want to split based on patient / study IDs, however you only have 3 samples this time. One possible approach would be to take portions of the top, middle and bottom slices of each sample you will use. E.g., if you are going to use kidney 1 dense and voi, take x slices from the top, middle and bottom of the dense and voi data
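The top/middle/bottom idea above can be sketched as follows. This is only an illustration under assumptions: slices are addressed by index along the z axis, the holdout is taken as three contiguous blocks, and the function name and default fraction are hypothetical. Contiguous blocks are used because neighbouring slices are nearly identical, so a random per-slice split would leak information into the holdout.

```python
def split_slices(num_slices: int, holdout_fraction: float = 0.15):
    """Return (train_idx, holdout_idx) for one kidney.

    The holdout is made of three contiguous blocks of slices taken from the
    top, the middle and the bottom of the stack.
    """
    # Size of each of the three blocks (at least one slice per block).
    block = max(1, int(num_slices * holdout_fraction / 3))
    mid_start = (num_slices - block) // 2

    holdout = set(range(0, block))                        # top block
    holdout |= set(range(mid_start, mid_start + block))   # middle block
    holdout |= set(range(num_slices - block, num_slices))  # bottom block

    train = [i for i in range(num_slices) if i not in holdout]
    return train, sorted(holdout)
```

Applied per dataset (for example once for kidney 1 dense and once for the voi data), the remaining training indices can then be used for cross-validation.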

vestal igloo Nov 27, 2023, 4:25 PM

#

so, i've been following this notebook for submissions: https://www.kaggle.com/code/kashiwaba/sennet-hoa-inference-unet-simple-baseline/notebook. It mentions: "As of Nov. 13, a Scoring Error occurs when an empty mask is predicted as '1 0'. To avoid this, this Notebook tentatively sets the empty mask prediction to '1 1'. As soon as this issue is resolved, we plan to update the Notebook." However, here, https://www.kaggle.com/code/hengck23/lb0-534-baseline-simple-unet-seresnext26d-32x4/notebook, "1 1" is still used. Can anyone please elaborate on this? I would like to discuss the implications of it, like why one or the other is chosen or works.

agile pier Nov 28, 2023, 7:53 AM

#

vestal igloo so, i've been following this notebook for submissions: https://www.kaggle.com/co...

Honestly, i think everyone is still slightly confused about the OOM error regarding the competition metric. In this thread (https://www.kaggle.com/competitions/blood-vessel-segmentation/discussion/455220) one of the organizers of the competition wrote (about a week ago) that the competition metric should be fully functional now.
I've also heard that an abundance of small components in your predictions can cause the OOM error, so another approach would be to remove small regions of the predictions where the model is not very confident.
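As an illustration of the small-component idea above, the following sketch drops connected regions below a size limit. It assumes a binary mask (2D or 3D) and uses scipy.ndimage.label with its default connectivity; the function name and the min_size parameter are hypothetical, and filtering by model confidence instead of size is not shown.

```python
import numpy as np
from scipy import ndimage

def remove_small_components(mask: np.ndarray, min_size: int) -> np.ndarray:
    """Keep only connected components with at least min_size pixels/voxels."""
    labeled, num = ndimage.label(mask)
    if num == 0:
        return np.zeros_like(mask)
    # sizes[i] is the number of pixels carrying label i (label 0 = background).
    sizes = np.bincount(labeled.ravel())
    keep = sizes >= min_size
    keep[0] = False  # never keep the background
    # Look up, for every pixel, whether its component is kept.
    return keep[labeled].astype(mask.dtype)
```

Whether this actually avoids the scoring error is not confirmed in the thread; it only reduces the number of tiny predicted regions.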

#

Does anybody have a recommendation for 3D visualization of the data in a Kaggle notebook? In this thread (https://www.kaggle.com/competitions/blood-vessel-segmentation/discussion/457702) there's some code given to do this, but as the author mentions this library does not seem to function in Kaggle unfortunately.

thick vine Nov 28, 2023, 11:16 AM

#

agile pier Does anybody have a recommendation for 3D visualization of the data in a Kaggle ...

not a notebook, but you can stack all slices into a single array, use SimpleITK to save it as a NIfTI file, and view it in a lightweight medical imaging viewer such as ITK-SNAP
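A rough sketch of that workflow, assuming the slices are already loaded as equally sized 2D NumPy arrays in z order. The function names are illustrative, and the spacing default is a placeholder, since the TIFF files carry no spacing metadata (as noted earlier in this channel).

```python
import numpy as np

def stack_slices(slices):
    """Stack equally sized 2D arrays into one volume of shape (Z, H, W)."""
    return np.stack(slices, axis=0)

def save_volume_as_nifti(volume, path, spacing=(1.0, 1.0, 1.0)):
    """Write the volume to a NIfTI file such as 'kidney.nii.gz'.

    SimpleITK is imported lazily so that stack_slices works without it.
    The spacing is given in (x, y, z) order and is a placeholder value.
    """
    import SimpleITK as sitk

    image = sitk.GetImageFromArray(volume)  # array axes (z, y, x) -> image (x, y, z)
    image.SetSpacing(spacing)
    sitk.WriteImage(image, path)
```

The resulting file can then be opened in a viewer such as ITK-SNAP outside of Kaggle.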

unreal knoll Nov 28, 2023, 12:16 PM

#

vestal igloo so, i've been following this notebook for submissions: https://www.kaggle.com/co...

agile pier Dec 6, 2023, 3:54 PM

#

Does anybody have an idea about how to train the model to generalize well? My current approach is to use Kidney 1 and 2 for training, Kidney 3 for validation. From kidneys 1 and 2 I randomly select 256x256 subtiles, rotate them, mirror them, normalize them and adjust their brightness. But while the model does seem to learn the training dataset (kidney 1, 2) quite well, the performance on the validation dataset (kidney 3) does not seem to be improving. I'm using a standard U-Net architecture trained on the 2D slices of the CT scans. Any ideas on how to improve generalization here?

cosmic tundra Dec 7, 2023, 7:12 AM

#

agile pier Does anybody have an idea about how to train the model to generalize well? My cu...

how was your LB score? I trained my model on kidney 1 with 800x800 subtiles and only reached 0.36; after I trained the model with kidney 2 data the score dropped. I feel like just increasing the patch size to capture more information doesn't guarantee better performance on the validation dataset

agile pier Dec 7, 2023, 8:50 AM

#

cosmic tundra how was your LB score? I trained my model on kidney 1 800x800 subtiles, only rea...

Interesting. I haven't submitted anything yet, only run local tests. Will submit something today and see how it goes. Thanks!

thick vine Dec 8, 2023, 6:04 PM

#

@agile pier I quit the competition a while ago so I don't fully remember the details of the data; however, from past experience in medical imaging segmentation, you want to get a high degree of variability in your training set.
Neither of the kidney datasets had temporal continuity, so using one entirely as a hold-out test set doesn't feel like a good choice. A suggestion would be to, e.g., remove 15% of each dataset to make your holdout, and use the remaining data for CV.
That way you expose your model to a higher degree of data variability, which can be good since the test set is not disclosed, so there is no way to know which resolution is better.

agile pier Dec 9, 2023, 3:13 PM

#

thick vine @agile pier quit the competition a while ago so don't fully remember the...

Thanks for the advice! But what do you mean by "no temporal continuity" of the datasets?

thick vine Dec 9, 2023, 4:15 PM

#

agile pier Thanks for the advice! But what do you mean by "no temporal continuity" of the d...

e.g. say you had samples from 2021, 2022 and 2023. You would want to leave more of 2023 out for testing compared to the other cohorts, to see if the model performance was degrading

thick vine Dec 9, 2023, 4:36 PM

#

https://arxiv.org/ftp/arxiv/papers/2210/2210.05673.pdf

twilit sierra Dec 12, 2023, 10:21 PM

#

with a paper like the following. https://arxiv.org/pdf/2311.13319.pdf why does there even need to be a competition? It seems to cover the bases really well. What could be improved?

thick vine Dec 12, 2023, 11:49 PM

#

The only thing that comes to mind is smaller models / ensemble strategies / MoE, only because this is a notebook competition.
If this was a "regular" competition, going for anything other than an nnUNet would be a bad decision, but since it needs to run in a notebook / within a limited time, the goal is probably to figure out some smart way to leverage smaller models
(pure speculation tho, might be completely wrong)

twilit sierra Dec 13, 2023, 12:34 AM

#

I see. I just got done looking at the nnU-Net docs. Also it looks like the paper gives the right order: Conv2d (ReLU) -> MaxPool2d -> Upsample. Seems like the whole competition revolves around the efficacy of preprocessing and the handling of false negatives and false positives.

keen rapids Dec 15, 2023, 6:17 PM

#

Hey guys just found out 395 images have all 0 masks:

https://www.kaggle.com/code/pranshubahadur/so-hoa-eda-395-rows-w-all-0-masks

prisma grove Dec 15, 2023, 11:20 PM

#

sure, the ones on top and bottom

#

they are slices of the kidneys along the z axis; the vessels grow from the middle

keen rapids Dec 20, 2023, 5:29 AM

#

Ok got it - thought this was an outlier set haha

#

Hey btw is it just me or are submissions super hard 😂

marsh idol Dec 20, 2023, 6:56 AM

#

keen rapids Hey btw is it just me or are submissions super hard 😂

yeah Im really confused on how to submit

keen rapids Dec 20, 2023, 1:18 PM

#

hey guys how long does scoring take?

cosmic tundra Dec 20, 2023, 5:02 PM

#

it depends on your model size, mine was like 1:30 hours

vestal igloo Dec 20, 2023, 6:10 PM

#

I have some queries about the tiff image shapes. Might be very basic but please bear with me.

#

So when i read the images with cv2.IMREAD_UNCHANGED, the shapes are 2D. So, are the tiff images grayscale images i.e single channel?

#

in another baseline notebook i saw the images being read as cv2.IMREAD_GRAYSCALE. What is the reason and implication behind choosing these parameters? If its single channel, arent these basically same❓

marsh idol Dec 21, 2023, 6:36 AM

#

hey guys Im really confused about how submission works

#

do I just code random values in for the submission just so I have an output file

#

also am I able to see the outputs of my submission?

marsh idol Dec 21, 2023, 9:53 AM

#

also is anyone else here getting submission csv not found

prisma grove Dec 21, 2023, 7:56 PM

#

vestal igloo So when i read the images with cv2.IMREAD_UNCHANGED, the shapes are 2D. So, are ...

they are single channel with very high values

prisma grove Dec 21, 2023, 7:56 PM

#

vestal igloo in another baseline notebook i saw the images being read as cv2.IMREAD_GRAYSCALE...

I assume that they will be normalized

#

the thing is that different datasets can have different value ranges depending on the machine, the person, ... so normalizing them somehow helps
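One common way to do this, shown here only as a sketch: read the TIFF with its original bit depth (cv2.IMREAD_UNCHANGED keeps it, while cv2.IMREAD_GRAYSCALE on its own returns an 8-bit single-channel image) and rescale by percentiles of each image. The function name and the 1st/99th percentile defaults are assumptions, not values taken from any notebook in this thread.

```python
import numpy as np

def normalize_percentile(image: np.ndarray, low: float = 1.0, high: float = 99.0) -> np.ndarray:
    """Rescale an image to [0, 1] using its own low/high percentiles.

    Works for any integer or float input, e.g. uint16 slices with large values.
    """
    img = image.astype(np.float32)
    lo, hi = np.percentile(img, [low, high])
    if hi <= lo:
        # Constant (or nearly constant) image: nothing meaningful to rescale.
        return np.zeros_like(img)
    scaled = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
    return scaled.astype(np.float32)
```

Clipping at percentiles instead of the raw min/max makes the result less sensitive to a few extreme pixels, which is one reason the approach is often used when scanners or settings differ between datasets.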

prisma grove Dec 21, 2023, 7:59 PM

#

marsh idol also is anyone else here getting submission csv not found

in the submission window you must specify the name of the output file, and your code must write the file with that same name when it runs

prisma grove Dec 21, 2023, 8:01 PM

#

marsh idol do I just code random values in for the submissoin just so I have an output file

you have access locally to 6 slices (3 per donor) in the test folder. When submitting, your inference code will execute on a secret test folder with the real slices to be scored. No, you can't see the actual submission.

#

There are a couple of threads about the submission issues. The two main problems are the unforgiving metric and its computation cost, in addition to the fact that the real test set is secret, so the error messages can be cryptic sometimes.

marsh idol Dec 21, 2023, 8:06 PM

#

prisma grove you have access locally to 6 slices (3 per donor) in the test folder. When submittin...

I ended up figuring out how to code the submission part

#

the problem now is that for some reason it’s not creating the csv during submission

prisma grove Dec 21, 2023, 8:07 PM

#

I suggest a very simple, even dummy, starting point such as pd.DataFrame({'id': ids, 'rle': rles}).to_csv('submission.csv', index=False)

#

just extracting the ids from a generic test folder and assigning '1 0' to everything

#

from there change slowly to your solution
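A possible shape for such a dummy starting point is sketched below. The folder layout `<test_root>/<kidney>/images/<slice>.tif` and the id format `<kidney>_<slice stem>` are assumptions based on the ids mentioned in this channel; both function names are hypothetical. The ids are taken from the file names that actually exist, so nothing about their numbering is assumed.

```python
from pathlib import Path

import pandas as pd

def collect_ids(test_root):
    """Build submission ids from the files found under the test folder."""
    ids = []
    for tif in sorted(Path(test_root).glob("*/images/*.tif")):
        kidney = tif.parent.parent.name  # e.g. the folder name of one kidney
        ids.append(f"{kidney}_{tif.stem}")
    return ids

def write_dummy_submission(test_root, out_path="submission.csv"):
    """Write a submission that predicts an empty mask ('1 0') for every slice."""
    ids = collect_ids(test_root)
    df = pd.DataFrame({"id": ids, "rle": ["1 0"] * len(ids)})
    # index=False keeps the file to exactly the two columns 'id' and 'rle'.
    df.to_csv(out_path, index=False)
    return df
```

Once this runs end to end, the constant '1 0' can be replaced step by step with real encoded predictions.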

marsh idol Dec 21, 2023, 8:25 PM

#

ok

marsh idol Dec 22, 2023, 4:38 AM

#

@prisma grove do you know if the actual test data is in 3d form?

prisma grove Dec 22, 2023, 8:09 AM

#

As they say, kidney_5 and kidney_6 are continuous slices from different donors. So yes, it should be

marsh idol Dec 22, 2023, 5:12 PM

#

prisma grove As they say kidney_5 and kidney_6 are continuous slices from different donors. S...

so what’s the point of the 2d data?

marsh idol Dec 22, 2023, 6:13 PM

#

prisma grove As they say kidney_5 and kidney_6 are continuous slices from different donors. S...

ah this is probably why i’m getting errors. My setup is meant for 2D data

prisma grove Dec 22, 2023, 6:14 PM

#

the test are 2d slices from a continuous 3d kidney section, as the train ones, like a 3d puzzle

marsh idol Dec 23, 2023, 6:06 AM

#

#

anyone else here getting this?

prisma grove Dec 23, 2023, 7:37 AM

#

what did you name your submission file, submission.csv?

marsh idol Dec 23, 2023, 4:13 PM

#

prisma grove how you call your submission file, submission.csv?

yes

marsh idol Dec 23, 2023, 7:55 PM

#

I got it working

ocean wraith Dec 23, 2023, 10:17 PM

#

Hello I'm new to this competition and this area and I want to participate in this competition mostly to learn and understand these type of projects . can someone explain the competition in simple terms ? what should we do and what type of task is this and what is the data
and what is the input and the expected output ?

Thanks

vestal igloo Dec 24, 2023, 3:06 AM

#

Hi! This is an image segmentation competition in simple terms. You have input images and corresponding masks, and the task is to build a model to accurately predict segmentation masks, which are binary images. Here, a part of the challenge is that the images are continuous 2D slices from a 3D scan, representing blood vessels in a kidney.

#

There are excellent starter notebooks to help you get started. You'll also find a lot of resources in the discussions section. Try to read through the EDA notebooks to gain some more insights on the data.

ocean wraith Dec 24, 2023, 6:51 AM

#

Thanks a lot

marsh idol Dec 24, 2023, 7:52 PM

#

Hey I'm kind of confused on the kidney 3 dense dataset. For the images are we supposed to be using the kidney 3 sparse images?

prisma grove Dec 24, 2023, 11:39 PM

#

yes, the images are the same, but the masks are more detailed in dense; decide which one to use, or use both

marsh idol Dec 25, 2023, 12:22 AM

#

ohhh okay that makes more sense

marsh idol Dec 26, 2023, 7:21 PM

#

anyone else getting the score error still?

ocean wraith Dec 26, 2023, 8:54 PM

#

do you have any starter example ?

prisma grove Dec 26, 2023, 10:59 PM

#

In case you are padding your inputs, have you checked that you are cropping back to the original sizes before RLE encoding and submitting?
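The check described above can be sketched like this, assuming 2D slices and a model that needs input sizes divisible by some factor. The factor of 32, the reflect padding and the function names are assumptions for illustration only.

```python
import numpy as np

def pad_to_multiple(image: np.ndarray, multiple: int = 32):
    """Pad a 2D image at the bottom/right so both sides are multiples of `multiple`.

    Returns the padded image and the original (height, width).
    """
    h, w = image.shape
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="reflect")
    return padded, (h, w)

def crop_to_original(pred: np.ndarray, original_shape):
    """Undo pad_to_multiple on a prediction before RLE encoding."""
    h, w = original_shape
    return pred[:h, :w]
```

Because the RLE start positions depend on the image width, encoding a padded prediction shifts every run, which would explain very low scores even when the mask looks right.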

marsh idol Dec 27, 2023, 9:29 PM

#

prisma grove In case you are padding your inputs have you checked that you are cropping to the ...

Turns out my model was making wacky inferences

#

also I tried resubmitting one of my older models that did work with the newfound knowledge(recropping to original size) and I got a higher score

#

so I think now it’s just a matter of training a better model

marsh idol Dec 28, 2023, 8:03 PM

#

For anyone who has scored anything above the 0.00[x] range, how did you get past the 0.00[x] phase?

keen rapids Jan 2, 2024, 6:38 AM

#

hey guys so I figured out a way to use the original surface dice metric they provided!
https://www.kaggle.com/code/pranshubahadur/sn-hoa-hf-sdm-dl-training

lmk what yall think (especially if im doing it wrong haha)

tawdry lantern Jan 4, 2024, 10:26 AM

#

Hello guys, I am new to this competition and have never participated in any image segmentation competition.
Can anyone help me by providing some insights on the competition.
Regarding training: with such a huge dataset, can the model be trained on Kaggle btw?
I don't know how image segmentation competitions work.
Thanks a lot!!

prisma grove Jan 4, 2024, 7:48 PM

#

At the moment there are some high score public notebooks that you can check. Their score is 99% based on hyperparameter tuning/optimization but the basis is still close to the standard state of the art.

#

The data consists of several datasets of *.tif files corresponding to the 2D slices of a 3D volume from kidneys of different donors. For each dataset you have the inputs in the images folder and the targets in the labels folder.

#

The objective is to classify each pixel of the images as vessel or not vessel. The most common model for this is UNet: a series of convolutional blocks separated by downscaling (e.g. max pooling) and upscaling steps, with concatenating bridges (skip connections) between the compatible down/up steps. So from a starting input (in the 2D case) of size HxW you get a set of HxW feature maps that you can use in a final 1x1 convolution to get a classification for each pixel.

#

You can train on kaggle yes.

tawdry lantern Jan 5, 2024, 5:44 AM

#

@prisma grove woah Thanks a lot dude!!

vestal igloo Jan 6, 2024, 9:08 AM

#

hello, i've prepared this notebook for 3D training w/ MONAI so you can more easily experiment with SOTA networks and tools. Have a look and let me know what you think, and thanks for the time: https://www.kaggle.com/code/tahseenislamsajon/sennet-hoa-3d-training-monai-pytorch

mellow tartan Jan 7, 2024, 10:03 AM

#

Hello guys, I'm new to this competition. What is the voi data at 5.2um resolution used for?

prisma grove Jan 7, 2024, 11:15 AM

#

you can check the robustness of your model to a different resolution, or you can train for robustness with it

marsh idol Jan 7, 2024, 5:35 PM

#

vestal igloo hello, i've prepared this notebook for 3D training w/ MONAI so you can more easi...

FINALLY another MONAI USER!

silk vault Jan 8, 2024, 4:14 AM

#

Hi, I'm new to the competition.
I inspected the test images of kidney 5/6 and noticed they appeared identical.
To confirm, I checked the file hashes and found they were indeed the same.
Why do we need two identical sets of images?

silk vault Jan 8, 2024, 6:08 AM

#

The kidney 5/6 images seem to be the first 3 from kidney_1_images without the labels.
Does this mean that submitting the corresponding labels would yield 100%?
In which case, should I be holding out the images from kidney_1_dense during training?
Apologies in advance if the question is dumb, this is my first competition.

prisma grove Jan 8, 2024, 9:11 AM

#

thats a local example test folder

#

the submission will be executed on another secret test folder

silk vault Jan 8, 2024, 9:24 AM

#

Ahh I see, thanks!

stray roost Jan 8, 2024, 1:31 PM

#

Finished: webinar about SenNet blood-vessel-segmentation.

Today there was a live video webinar about the SenNet competition on Telegram.

Link:
https://t.me/data_science_winners

Time:
Finished

vestal igloo Jan 8, 2024, 6:53 PM

#

was it in english?

#

the webinar

wicked wraith Jan 10, 2024, 3:09 PM

#

Hello guys. I was getting a submission error with my predictions so I tried to write a submission.csv file with only 1 0 as rle segmentation (basically no inference) and I still get a submission error...

This is the code that I use to write the resulting csv file

submission = []
for dataset_name, test_data_loader in test_dataloaders.items():
    for idx, batch in enumerate(test_data_loader):
        print(dataset_name, idx)
        rle_predictions = '1 0'#rle_encode(prediction)
        id_ = f"{dataset_name}_{idx:04}"
        submission_data = {"id": id_, "rle": rle_predictions}
        current_submission = pd.DataFrame(data= submission_data, index=[0])
        submission.append(current_submission)
submission_df = pd.concat(submission)
submission_df.to_csv("submission.csv", index=False)
print(submission)

EDIT 0: deleting unnecessary lines of code
EDIT 1: I forgot to add that the notebook is not throwing any exception, it is the scoring part that is giving the error
EDIT 2: I also tried to use 1 1 as encoding for empty masks, it still fails

prisma grove Jan 10, 2024, 4:42 PM

#

enumerate starts from 0, no one says that actual test idx has to start from 0

wicked wraith Jan 10, 2024, 6:33 PM

#

That makes perfect sense, thank you very much for your help!

manic barn Jan 24, 2024, 7:43 PM

#

Haven't found in the discussion, has anybody posted a notebook/script to go through all of the steps of

pytorch tensor of full 3d segmentation --> rle encoding --> calculating surface dice score using the method in the competition

?

manic barn Jan 24, 2024, 8:02 PM

#

all I can find is methods that process 1 .tif slice at a time and go directly to RLE from there so will have to slightly modify that I guess.... Just really need to be careful I don't do the encoding incorrectly

prisma grove Jan 24, 2024, 9:20 PM

#

A while ago I did a first attempt with a custom model: https://www.kaggle.com/code/sacuscreed/unet3d-torch-gpu It has the full process, with terrible results. I haven't actually tested the inference, but it should work. More recently I made a more serious notebook on how to generalize any SMP pretrained backbone to 3D. It gives the best results I've got with a 3D UNet: https://www.kaggle.com/code/sacuscreed/adapting-2d-smp-pretrained-back-bones-to-3d It's not exactly my model, but it's how I started it.

prisma grove Jan 25, 2024, 9:19 AM

#

Newbie technical question about Kaggle, if anyone knows. I made a dataset from the output of a notebook a couple of days ago. Yesterday I edited that notebook and ran it again; the output has apparently been saved directly to the previous dataset, since there are no other files in the finished notebook. But the dataset info doesn't show that it has been updated. Can I trust that the output is in the dataset, or have I lost 10 hours of GPU quota? Thanks.

mellow tartan Jan 25, 2024, 8:35 PM

#

Sorry, I have a stupid question. What is meant by training on XY, XZ, YZ? Does that mean training the model in multi-view? To achieve that, do I need to train 3 separate U-Nets and fuse the results?

prisma grove Jan 25, 2024, 9:19 PM

#

the original data (each *.tif file) are the XY slices along the Z axis. For each dataset, all of them together compose the full XYZ volume. So XY, XZ, YZ means slicing the full volume along the remaining axes (slicing along the Y axis gives XZ planes and slicing along the X axis gives YZ planes). It's just a trick to triple the original 2D slices for training, mainly for 2D models. No, you don't need 3 models. The vessels are vessels, no matter from where you look at them. And, to be more precise, the grid spacing should be the same along every axis.

mellow tartan Jan 25, 2024, 9:59 PM

#

Thanks for the explanation

prisma grove Jan 25, 2024, 10:38 PM

#

I think you've got it already. But just to be more clear, the full volume would be V.shape = (Z,X,Y)

#

the initial slices would be each of the Z possible XY slices, V[z,:,:]

#

same for X slices (ZY planes), V[:,x,:]

#

and finally for Y slices (ZX planes), V[:,:,y]
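The three slicing directions above can be written as a single NumPy helper. This is a sketch that assumes the whole volume fits in memory as one array with the axis order (Z, X, Y) used in these messages; the function name is illustrative.

```python
import numpy as np

def slices_along_axis(volume: np.ndarray, axis: int):
    """Return the list of 2D planes obtained by fixing one index along `axis`.

    axis=0 gives V[z, :, :], axis=1 gives V[:, x, :], axis=2 gives V[:, :, y].
    """
    return [np.take(volume, i, axis=axis) for i in range(volume.shape[axis])]
```

For a volume of shape (Z, X, Y) this yields Z + X + Y planes in total; the same function applied to the label volume gives the matching masks, as long as images and labels are sliced along the same axis.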

mellow tartan Jan 26, 2024, 1:05 PM

#

That makes perfect sense! Thanks again for the explanation

manic barn Jan 26, 2024, 4:04 PM

#

Was able to figure out the submission process and can get my 3d output through to get a score, although wondering if I'm doing something wrong in my encoding because LB score is lower than I would have thought. I've got the submission down, but still have not found good resources that go through calculating surface dice locally though. @prisma grove I didn't see anything in those notebooks you linked about surface dice calculation unfortunately.

How are y'all calculating it, are you just submitting to the LB to get a surface dice score?

#

I am also still a bit confused about the rle and the "3d" vs "2d" surface dice calc mode. My model works w/ 3d awareness so would very much like the calc to happen in 3d mode. Right now for rle encoding I am just coming up with columns 'id' and 'rle', but I see there's options for more columns such as 'width', 'height', 'group', 'slice' etc. Should I be using those?? no clue.

Also, getting ~.89-.94 regular dice locally, and picking up small vessels decently, but surface dice from comp came back at .485, does this line up with what y'all have seen? would not imagine such a drop in score for surface dice

manic barn Jan 26, 2024, 5:00 PM

#

Oop should have been looking at the notebooks tab not just discussion I am silly 🙈

flat zealot Jan 30, 2024, 7:27 PM

#

is it required for IDs in the submission.csv file to be in sequential order. kidney_2_0000.tif to kidney_2_0005?

prisma grove Jan 30, 2024, 7:28 PM

#

read them from the file names and reuse them sorted, don't try to generate them

flat zealot Jan 30, 2024, 7:38 PM

#

okay thanks will do that!

flat zealot Jan 30, 2024, 8:06 PM

#

this is my submission.csv. is it due to missing id? should i develop custom code to identify any missing ids in the submissions and append them with 1 1 as the rle?

prisma grove Jan 30, 2024, 8:19 PM

#

why kidney_2?

#

in the test folder there will be some files, apparently kidney_5 and kidney_6, but don't assume anything, it will be easier

#

write general code that reads the file names in test, sorts them and saves them in a list (as keys, for example)

#

"should i develop custom code to identify any missing ids in the submissions and append them with 1 1 as the rle?" No, at real test folder there is all the files that need to be submitted, the test files at local folder are just an example

flat zealot Jan 30, 2024, 8:41 PM

#

oops right thank you for the guidance!

vestal igloo Jan 31, 2024, 7:13 AM

#

Can someone please explain the implications of different TH percentiles in the top scoring training notebooks? Like this one: https://www.kaggle.com/code/misakimatsutomo/inference-1024-should-have-a-percentile-of-0-00149

it seems the authors experimented with many percentiles. Also, any discussion on how to choose these percentiles would be very much appreciated. I'm new to image segmentation in general and looking to learn. Thanks.

prisma grove Jan 31, 2024, 7:56 AM

#

Most models (any I can imagine) have a final output (or one that can easily be projected into one) that can be interpreted as a score for being mask or not. Let's say a number between 0 and 1: closer to zero means not mask, closer to one means mask. The natural threshold (TH) would be 0.5, so scores > 0.5 would be labeled as mask. Those notebooks tune the threshold so that the number of positives (and probably false positives) is reduced.

#

Note that the TH percentile does not tune the threshold directly. It sets the threshold to the score that marks off the top-scoring percentile.

#

A risky practice in my opinion since, as I said, the natural value, and the one our models have been trained with, is 0.5. I don't think a value optimized on the public test set would generalize to the private set or to other samples.

vestal igloo Feb 4, 2024, 11:17 AM

#

Sorry @prisma grove I had to go out of town. I understood the threshold but didn't quite get the TH percentile and what you meant there. Basically could u please elaborate a bit ur last two paragraphs

prisma grove Feb 4, 2024, 11:18 AM

#

the output is flattened and sorted (np.partition) by score

#

so you have a ranking of scores to be mask, the higher scored are more probable mask

#

the natural process would be to say everything above 0.5 is mask, since the models are trained for that behavior

#

but those notebooks tune the TH (threshold), choosing it as the one that keeps a certain portion (percentile) of the output labeled as mask

#

so a percentile of 0.00149 takes the top-scored 0.00149 portion of the output, sees what the limiting score is, and labels everything above that score as mask
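In code, the procedure described in these messages might look roughly like the sketch below. It assumes the "percentile" is used as a fraction of all output values (so 0.00149 would mean about 0.149% of them) and that scores at or above the limiting score are kept; the function names are hypothetical and this is not copied from the linked notebook.

```python
import numpy as np

def percentile_threshold(scores: np.ndarray, keep_fraction: float) -> float:
    """Return the score that separates the top `keep_fraction` of values."""
    flat = scores.ravel()
    k = max(1, int(flat.size * keep_fraction))  # number of values to keep
    idx = flat.size - k
    # np.partition puts the idx-th smallest value at position idx
    # without fully sorting the array.
    return np.partition(flat, idx)[idx]

def mask_from_top_fraction(scores: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Label the top-scoring fraction of the output as mask."""
    return scores >= percentile_threshold(scores, keep_fraction)
```

The threshold produced this way depends on the score distribution of the data it is computed on, which is the reason given above for doubting that a fraction tuned on the public test set carries over to other samples.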

vestal igloo Feb 4, 2024, 11:50 AM

#

ahhh. so i understand now why it might not generalize well to the test data