#blood-vessel-segmentation

hollow schooner
#

Anyone willing to join a team?

lavish lily
severe stump
#

What kind of labels does this segmentation task use? 0 = background, 1 = kidney, 2 = vasculature?

lilac oak
#

Hi! I would like to join a team. Please DM me if you have a vacancy.

agile pier
naive lintel
#

Hey, I have worked extensively with 3D image segmentation and I'm currently participating in Hacking the Human Vasculature in 3D. I am a final year UG student in the department of Information Technology. Past experiences: Summer Intern at Wells Fargo as SWE, Scholar at Google ML Bootcamp
HMU asap for team formation

thick vine
#

Has anyone tried converting to NIfTI or a similar 3D format?

#

It's a bit weird, considering we would only have 3 samples, but with 2D TIFFs we lose all the spatial information between slices

naive lintel
#

Btw what's the difference between RLE masks in train.csv and the labels in train directory?

thick vine
#

RLE is the encoded submission format

#

Labels are the actual binary masks
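
To make the RLE vs. label distinction concrete, here is a minimal sketch of converting between a binary mask and an RLE string. It assumes the convention seen in public notebooks (1-indexed "start length" pairs over a row-major flattened mask, with "1 0" as the empty-mask placeholder); check the competition's evaluation page before relying on it. The function names are illustrative.

```python
import numpy as np


def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary 2D mask as 'start length' pairs (1-indexed, row-major)."""
    pixels = np.concatenate([[0], mask.astype(np.uint8).ravel(), [0]])
    # Positions where the value changes mark the start and end of each run of 1s.
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]  # turn run ends into run lengths
    rle = " ".join(str(r) for r in runs)
    return rle if rle else "1 0"  # assumed placeholder for an empty mask


def rle_decode(rle: str, shape: tuple) -> np.ndarray:
    """Rebuild the binary mask of the given (height, width) from an RLE string."""
    tokens = [int(t) for t in rle.split()]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(tokens[0::2], tokens[1::2]):
        flat[start - 1 : start - 1 + length] = 1
    return flat.reshape(shape)
```

A round trip (`rle_decode(rle_encode(mask), mask.shape)`) should give back the original mask, which is a cheap way to catch encoding mistakes before submitting.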

ionic comet
#

Hi, first time competing. In the Code Competition FAQ it says that in some competitions it is allowed to upload/use external data. I cannot see anything in the rules allowing/disallowing it. Do more experienced users know what the default is?

ionic comet
thick vine
#

It still seems to be 3D, just not a complete VOI but a few slices

thick vine
#

Is there any particular trick to denoise or sharpen the 6 test cases?

#

Very strong CV and holdout results, however the test samples are horrible

gleaming acorn
agile pier
thick vine
#

Inspecting the images, they look like the initial slices of kidney_1_dense

#

which have no annotations in their masks

agile pier
gleaming acorn
thick vine
ionic comet
agile pier
ionic comet
#

Can we pre-train the predictor offline and just load it at evaluation time to save time? The rules seem to indicate that the training and evaluation need to be done in those <=9 hrs, right?

agile pier
#

I'm not quite sure. I know that in a lot of previous competitions people have done exactly that, but it might depend on the rules of the competition. Better to ask that in the discussion forum on Kaggle.

thick vine
ionic comet
thick vine
#

Although that is unfortunate, I will probably drop out of the comp if that is the case

languid bison
languid bison
rustic talon
#

I'm wondering how much variation we can expect between the given training set and the hidden test set. Could some submissions hit resource constraints (out of memory, or the <=9 hrs time-out) that didn't show up during training? This could be an issue for some, for example if the test set is of a different image quality.

thick vine
#

If inference takes more than 9h then there is something seriously wrong with the architecture

vestal igloo
#

'cause the baseline 2D public notebook I saw had a 0.14 score

thick vine
vestal igloo
#

ah all right. thanks

severe stump
#

Are all the images in one folder the same size?

Is there a relationship between resolution and image size?

vestal igloo
#

i was wondering about splitting the train and validation sets

#

In this case, I guess we have to keep the kidneys separate, right? Is it appropriate to do a traditional 80-20 split here?

thick vine
#

Ideally in medical imaging you want to split based on patients / study IDs, however you only have 3 samples this time. One possible approach would be to take portions of the top, middle and bottom slices of each sample you will use. E.g., if you are going to use kidney 1 dense and VOI, take x slices from the top, middle and bottom of the dense and VOI data
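
One way to read the top/middle/bottom suggestion is to hold out a block of consecutive slices from each of those three regions for validation and train on the rest. The sketch below assumes that reading; the function name and the `block` parameter are made up for illustration.

```python
def top_middle_bottom_split(n_slices: int, block: int):
    """Return (train_idx, val_idx) for one volume.

    Validation gets `block` consecutive slices from the top, the middle and
    the bottom of the stack; everything else goes to training.
    """
    if block <= 0 or 3 * block > n_slices:
        raise ValueError("need 0 < 3 * block <= n_slices")
    mid_start = (n_slices - block) // 2
    val = set(range(0, block))                        # top block
    val |= set(range(mid_start, mid_start + block))   # middle block
    val |= set(range(n_slices - block, n_slices))     # bottom block
    train = [i for i in range(n_slices) if i not in val]
    return train, sorted(val)
```

Applied to each dataset separately (e.g. dense and VOI), this keeps every volume represented in both training and validation. Neighbouring slices are highly correlated, so the slices right next to a validation block still leak some information.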

vestal igloo
#

So, I've been following this notebook for submissions: https://www.kaggle.com/code/kashiwaba/sennet-hoa-inference-unet-simple-baseline/notebook. It mentions: "As of Nov. 13, Scoring Error occurs when empty mask is predicted as "1 0". To avoid this, this Notebook tentatively sets the empty mask prediction to "1 1". As soon as this issue is resolved, we plan to update the Notebook." However, here, https://www.kaggle.com/code/hengck23/lb0-534-baseline-simple-unet-seresnext26d-32x4/notebook, "1 1" is still used. Can anyone please elaborate on this? I would like to discuss the implications, like why one or the other is chosen/working.

agile pier
# vestal igloo so, i've been following this notebook for submissions: https://www.kaggle.com/co...

Honestly, I think everyone is still slightly confused about the OOM error regarding the competition metric. In this thread (https://www.kaggle.com/competitions/blood-vessel-segmentation/discussion/455220) one of the organizers of the competition wrote (about a week ago) that the competition metric should be fully functional now.
I've also heard that an abundance of small components in your predictions can cause the OOM error, so another approach would be to remove small regions of the predictions where the model is not very confident.
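
A rough sketch of the "remove small regions" idea, assuming a 2D binary prediction and a size cut-off (`min_size`, a made-up parameter). It uses a plain breadth-first search with 4-connectivity so it only needs NumPy; in practice a connected-components routine from an image library would be much faster. Whether this actually avoids the OOM is hearsay in the thread, not something verified here.

```python
from collections import deque

import numpy as np


def remove_small_components(mask: np.ndarray, min_size: int) -> np.ndarray:
    """Drop 4-connected foreground components with fewer than min_size pixels."""
    mask = np.asarray(mask).astype(bool)
    cleaned = np.zeros_like(mask)
    visited = np.zeros_like(mask)
    h, w = mask.shape
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        # Flood-fill one component starting from this unvisited pixel.
        queue = deque([(r0, c0)])
        visited[r0, c0] = True
        component = [(r0, c0)]
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not visited[rr, cc]:
                    visited[rr, cc] = True
                    queue.append((rr, cc))
                    component.append((rr, cc))
        if len(component) >= min_size:
            rows, cols = zip(*component)
            cleaned[list(rows), list(cols)] = True
    return cleaned.astype(np.uint8)
```

Thin vessels can be genuinely small in a single slice, so an aggressive `min_size` may also delete true positives.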

thick vine
agile pier
#

Does anybody have an idea about how to train the model to generalize well? My current approach is to use Kidney 1 and 2 for training, Kidney 3 for validation. From kidneys 1 and 2 I randomly select 256x256 subtiles, rotate them, mirror them, normalize them and adjust their brightness. But while the model does seem to learn the training dataset (kidney 1, 2) quite well, the performance on the validation dataset (kidney 3) does not seem to be improving. I'm using a standard U-Net architecture trained on the 2D slices of the CT scans. Any ideas on how to improve generalization here?
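
For reference, a minimal sketch of the geometric part of the pipeline described above (random tile, rotation by a multiple of 90 degrees, mirroring), applied identically to image and mask. The function name is illustrative, and the brightness and normalization steps are left out; those would be applied to the image only.

```python
import numpy as np


def random_tile(image: np.ndarray, mask: np.ndarray, size: int, rng: np.random.Generator):
    """Cut a random size x size tile and apply the same rotation/flip to image and mask."""
    h, w = image.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    img = image[top : top + size, left : left + size]
    msk = mask[top : top + size, left : left + size]
    k = int(rng.integers(0, 4))  # number of 90-degree rotations
    img, msk = np.rot90(img, k), np.rot90(msk, k)
    if rng.random() < 0.5:  # mirror left-right half of the time
        img, msk = np.fliplr(img), np.fliplr(msk)
    # rot90/fliplr return views; copy so downstream code gets contiguous arrays.
    return np.ascontiguousarray(img), np.ascontiguousarray(msk)
```

Usage would look like `img, msk = random_tile(slice_2d, label_2d, 256, np.random.default_rng(0))`.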

cosmic tundra
agile pier
thick vine
#

@agile pier I quit the competition a while ago so I don't fully remember the details of the data, however from past experience in medical imaging segmentation, you want to get a high degree of variability in your training set.
Neither of the kidney datasets had a temporal continuity, so using one entirely as a hold-out test set doesn't feel like a good choice. A suggestion would be to, e.g., remove 15% of each dataset to make your holdout, and use the remaining data for CV.
That way you expose your model to a higher degree of data variability, which can be good since the test set is not disclosed, so there is no way to know which resolution is better

agile pier
thick vine
twilit sierra
thick vine
#

The only thing that comes to mind is smaller models / ensemble strategies / MoE, only because this is a notebook competition.
If this was a "regular" competition, going for anything other than an nnUNet would be a bad decision, but since it needs to run in a notebook / within a limited time, the goal is probably to figure out some smart way to leverage smaller models
(pure speculation tho, might be completely wrong)

twilit sierra
#

I see. I just got done looking at the nnUNet docs. Also it looks like the paper gives the right order: Conv2d (ReLU) -> MaxPool2d -> Upsample. Seems like the whole competition revolves around the efficacy of preprocessing and false negative / false positive handling.

keen rapids
prisma grove
#

sure, the ones on top and bottom

#

They are the slices of the kidneys across the z axis; the vessels grow from the middle

keen rapids
#

Ok got it - thought this was an outlier set haha

#

Hey btw is it just me or are submissions super hard 😂

marsh idol
keen rapids
#

hey guys how long does scoring take?

cosmic tundra
#

it depends on your model size, mine was like 1:30 hours

vestal igloo
#

I have some queries about the tiff image shapes. Might be very basic but please bear with me.

#

So when I read the images with cv2.IMREAD_UNCHANGED, the shapes are 2D. So, are the TIFF images grayscale, i.e. single channel?

#

In another baseline notebook I saw the images being read with cv2.IMREAD_GRAYSCALE. What is the reason and implication behind choosing these parameters? If it's single channel, aren't these basically the same❓

marsh idol
#

hey guys I'm really confused about how submission works

#

do I just code random values in for the submission just so I have an output file

#

also am I able to see the outputs of my submission?

marsh idol
#

also is anyone else here getting "submission csv not found"

prisma grove
prisma grove
#

The thing is that different datasets can have different value ranges depending on the machine, the operator, and so on, so normalizing them somehow helps
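
One common way to "normalize them somehow" is to clip each volume to a pair of intensity percentiles and rescale to [0, 1]. This is a sketch of that idea, not what any particular notebook does; the 1st/99th percentile defaults are arbitrary choices for illustration.

```python
import numpy as np


def normalize_percentile(volume: np.ndarray, low: float = 1.0, high: float = 99.0) -> np.ndarray:
    """Clip to the [low, high] intensity percentiles and rescale to [0, 1]."""
    v = volume.astype(np.float32)
    lo, hi = np.percentile(v, [low, high])
    if hi <= lo:  # constant volume: nothing to scale
        return np.zeros_like(v)
    return np.clip((v - lo) / (hi - lo), 0.0, 1.0)
```

Because the percentiles are computed per volume, two scans of the same tissue with different gain and offset end up on roughly the same scale.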

prisma grove
prisma grove
#

There are a couple of threads about the submission issues. The two main problems are the unforgiving metric and its computation cost, plus the fact that the real test set is secret, so the error messages can be cryptic sometimes.

marsh idol
#

the problem now is that for some reason it’s not creating the csv during submission

prisma grove
#

I suggest a very simple, even dummy, starting point such as pd.DataFrame({'id':id,'rle':rle}).to_csv('submission.csv')

#

just extracting the ids from a generic test folder and assigning '1 0' to everything

#

from there, change it step by step toward your solution
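
A dummy starting point along those lines could look like the sketch below. It assumes a folder layout of `<test_dir>/<dataset>/images/<slice>.tif` and ids of the form `<dataset>_<slice>`; both are assumptions to check against the real data. It uses the standard `csv` module instead of pandas so it has no dependencies.

```python
import csv
from pathlib import Path


def write_dummy_submission(test_dir, out_path="submission.csv", empty_rle="1 0"):
    """Write an all-empty submission with one row per .tif found under test_dir."""
    # Assumed layout: <test_dir>/<dataset>/images/<slice>.tif
    paths = sorted(Path(test_dir).glob("*/images/*.tif"))
    # Build ids from the actual file names instead of generating indices.
    rows = [(f"{p.parent.parent.name}_{p.stem}", empty_rle) for p in paths]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "rle"])
        writer.writerows(rows)
    return [row_id for row_id, _ in rows]
```

With pandas, the equivalent would be building the same two columns and calling `to_csv("submission.csv", index=False)` so that no extra index column is written.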

marsh idol
#

ok

marsh idol
#

@prisma grove do you know if the actual test data is in 3d form?

prisma grove
#

As they say, kidney_5 and kidney_6 are continuous slices from different donors. So yes, it should be

marsh idol
marsh idol
prisma grove
#

The test data are 2D slices from a continuous 3D kidney section, like the train ones, like a 3D puzzle

marsh idol
#

anyone else here getting this?

prisma grove
#

What did you name your submission file? submission.csv?

marsh idol
#

I got it working

ocean wraith
#

Hello, I'm new to this competition and this area, and I want to participate mostly to learn and understand these types of projects. Can someone explain the competition in simple terms? What should we do, what type of task is this, what is the data,
and what are the input and the expected output?

Thanks

vestal igloo
#

Hi! This is an image segmentation competition in simple terms. You have input images and corresponding masks, and the task is to build a model to accurately predict segmentation masks, which are binary images. Here, a part of the challenge is that the images are continuous 2D slices from a 3D scan, representing blood vessels in a kidney.

#

There are excellent starter notebooks to help you get started. You'll also find a lot of resources in the discussions section. Try to read through the EDA notebooks to gain some more insights on the data.

ocean wraith
#

Thanks a lot

marsh idol
#

Hey I'm kind of confused on the kidney 3 dense dataset. For the images are we supposed to be using the kidney 3 sparse images?

prisma grove
#

Yes, the images are the same, but the masks are more detailed in dense. Decide which one to use, or use both

marsh idol
#

ohhh okay that makes more sense

marsh idol
#

anyone else getting the score error still?

ocean wraith
#

do you have any starter example ?

prisma grove
#

In case you are padding your inputs, have you checked that you are cropping back to the original sizes before RLE encoding and submitting?
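
A small sketch of the pad-then-crop bookkeeping, assuming the model needs height and width to be multiples of some number (e.g. 32 for a typical U-Net encoder) and that padding is added at the bottom and right. The helper names are made up.

```python
import numpy as np


def pad_to_multiple(image: np.ndarray, multiple: int):
    """Pad bottom/right so both sides are multiples of `multiple`; return the original shape too."""
    h, w = image.shape
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="reflect")
    return padded, (h, w)


def crop_to_original(pred: np.ndarray, original_shape: tuple) -> np.ndarray:
    """Undo pad_to_multiple so the prediction matches the input size again."""
    h, w = original_shape
    return pred[:h, :w]
```

The RLE string encodes pixel positions in the flattened mask, so encoding a padded prediction shifts every run and the score collapses even if the model itself is fine.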

marsh idol
#

also I tried resubmitting one of my older models that did work, with the newfound knowledge (recropping to the original size), and I got a higher score

#

so I think now it’s just a matter of training a better model

marsh idol
#

For anyone who has scored anything above the 0.00[x] range, how did you get past the 0.00[x] phase?

keen rapids
tawdry lantern
#

Hello guys, I am new to this competition and have never participated in any image segmentation competition.
Can anyone help me by providing some insights on the competition?
Regarding training: with such a huge dataset, can the model be trained on Kaggle, btw?
I don't know how image segmentation competitions work.
Thanks a lot!!

prisma grove
#

At the moment there are some high score public notebooks that you can check. Their score is 99% based on hyperparameter tuning/optimization but the basis is still close to the standard state of the art.

#

The data are several datasets of *.tif files corresponding to the 2D slices of a 3D volume from kidneys of different donors. For each dataset you have the inputs in the images folder and the targets in the labels folder.

#

The objective is to classify each pixel of the images as vessel or not vessel. The most common model for this is UNet: a series of convolutional blocks separated by downscaling (e.g. max pooling) and upscaling steps, with concatenating bridges (skip connections) between the matching down/up steps. So from a starting input of size HxW (in the 2D case) you get a set of HxW feature maps that you can feed to a final 1x1 convolution to get a classification for each pixel.

#

You can train on kaggle yes.

tawdry lantern
#

@prisma grove woah Thanks a lot dude!!

vestal igloo
mellow tartan
#

Hello guys, I'm new to this competition. What is the VOI data at 5.2um resolution used for?

prisma grove
#

You can check the robustness of your model to a different resolution, or you can train for robustness with it

silk vault
#

Hi, I'm new to the competition.
I inspected the test images of kidney 5/6 and noticed they appeared identical.
To confirm, I checked the file hashes and found they were indeed the same.
Why do we need two identical sets of images?

silk vault
#

The kidney 5/6 images seem to be the first 3 from kidney_1_images without the labels.
Does this mean that submitting the corresponding labels would yield 100%?
In which case, should I be holding out the images from kidney_1_dense during training?
Apologies in advance if the question is dumb, this is my first competition.

prisma grove
#

That's a local example test folder

#

the submission will be executed on another secret test folder

silk vault
#

Ahh I see, thanks!

stray roost
#

Finished: webinar about SenNet blood-vessel-segmentation.

Today there was a live video webinar about the SenNet competition on Telegram.

Link:
https://t.me/data_science_winners

Time:
Finished

vestal igloo
#

was it in english?

#

the webinar

wicked wraith
#

Hello guys. I was getting a submission error with my predictions so I tried to write a submission.csv file with only 1 0 as rle segmentation (basically no inference) and I still get a submission error...

This is the code that I use to write the resulting csv file

submission = []
for dataset_name, test_data_loader in test_dataloaders.items():
    for idx, batch in enumerate(test_data_loader):
        print(dataset_name, idx)
        rle_predictions = '1 0'#rle_encode(prediction)
        id_ = f"{dataset_name}_{idx:04}"
        submission_data = {"id": id_, "rle": rle_predictions}
        current_submission = pd.DataFrame(data= submission_data, index=[0])
        submission.append(current_submission)
submission_df = pd.concat(submission)
submission_df.to_csv("submission.csv", index=False)
print(submission)

EDIT 0: deleting unnecessary lines of code
EDIT 1: I forgot to add that the notebook is not throwing any exception, it is the scoring part that is giving the error
EDIT 2: I also tried to use 1 1 as encoding for empty masks, it still fails


prisma grove
#

enumerate starts from 0; nothing says that the actual test indices have to start from 0

wicked wraith
#

That makes perfect sense, thank you very much for your help!

manic barn
#

Haven't found in the discussion, has anybody posted a notebook/script to go through all of the steps of

PyTorch tensor of full 3D segmentation --> RLE encoding --> calculating surface dice score using the method in the competition

?

manic barn
#

all I can find is methods that process 1 .tif slice at a time and go directly to RLE from there so will have to slightly modify that I guess.... Just really need to be careful I don't do the encoding incorrectly

prisma grove
#

A while ago I did a first attempt with a custom model: https://www.kaggle.com/code/sacuscreed/unet3d-torch-gpu It has the full process, with terrible results. I haven't actually tested the inference, but it should work. And more recently, a more serious notebook on how to generalize any SMP pretrained backbone to 3D. It gave the best results I've got with a 3D UNet: https://www.kaggle.com/code/sacuscreed/adapting-2d-smp-pretrained-back-bones-to-3d It's not exactly my model, but it's how I started it.

prisma grove
#

Newbie technical question about Kaggle, if anyone knows. I made a dataset from the output of a notebook a couple of days ago. Yesterday I edited that notebook and ran it again; the output has apparently been saved directly to the previous dataset, since there are no other files in the finished notebook. But the dataset info doesn't show that it has been updated. Can I trust that the output is in the dataset, or have I lost 10 hours of GPU quota? Thanks.

mellow tartan
#

Sorry, I have a stupid question. What is meant by training on XY, XZ, YZ? Does that mean training the model in multi-view? To achieve that, do I need to train 3 separate U-Nets and fuse the results?

prisma grove
#

The original data (each *.tif file) are the XY slices across the Z axis. For each dataset, all of them together make up the full XYZ volume. So XY, XZ, YZ means also slicing the full volume along the remaining axes (slicing along the Y axis gives XZ planes and slicing along the X axis gives YZ planes). It's just a trick to triple the original 2D slices for training, mainly for 2D models. No, you don't need 3 models. The vessels are vessels, no matter from where you look at them. Strictly speaking, though, this assumes the grid spacing is the same along all three axes.

mellow tartan
#

Thanks for the explanation

prisma grove
#

I think you've got it already. But just to be clearer, the full volume would be V with V.shape = (Z,X,Y)

#

the initial slices would be each of the Z possible XY slices, V[z,:,:]

#

same for the X slices, the ZY planes V[:,x,:]

#

and finally for the Y slices, the ZX planes V[:,:,y]
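
The same indexing in NumPy, as a small sketch: `iter_planes` is a made-up helper that moves the chosen axis to the front so that iterating over the result yields the 2D planes for that view. It follows the (Z, X, Y) axis order used above.

```python
import numpy as np


def iter_planes(volume: np.ndarray, axis: int) -> np.ndarray:
    """Return a view whose first axis indexes the 2D planes cut along `axis`."""
    return np.moveaxis(volume, axis, 0)


# Toy volume with shape (Z, X, Y) = (2, 3, 4).
V = np.arange(2 * 3 * 4).reshape(2, 3, 4)

xy_planes = iter_planes(V, 0)  # xy_planes[z] is V[z, :, :], shape (3, 4)
zy_planes = iter_planes(V, 1)  # zy_planes[x] is V[:, x, :], shape (2, 4)
zx_planes = iter_planes(V, 2)  # zx_planes[y] is V[:, :, y], shape (2, 3)
```

A 2D model trained on all three stacks sees roughly three times as many slices; at inference the three per-view predictions can be mapped back into the same (Z, X, Y) grid and averaged, if desired.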

mellow tartan
#

That makes perfect sense! Thanks again for the explanation

manic barn
#

Was able to figure out the submission process and can get my 3d output through to get a score, although wondering if I'm doing something wrong in my encoding because LB score is lower than I would have thought. I've got the submission down, but still have not found good resources that go through calculating surface dice locally though. @prisma grove I didn't see anything in those notebooks you linked about surface dice calculation unfortunately.

How are y'all calculating it, are you just submitting to the LB to get a surface dice score?

#

I am also still a bit confused about the rle and the "3d" vs "2d" surface dice calc mode. My model works w/ 3d awareness so would very much like the calc to happen in 3d mode. Right now for rle encoding I am just coming up with columns 'id' and 'rle', but I see there's options for more columns such as 'width', 'height', 'group', 'slice' etc. Should I be using those?? no clue.

Also, getting ~.89-.94 regular dice locally, and picking up small vessels decently, but surface dice from comp came back at .485, does this line up with what y'all have seen? would not imagine such a drop in score for surface dice

manic barn
#

Oop should have been looking at the notebooks tab not just discussion I am silly 🙈

flat zealot
#

Is it required for the IDs in the submission.csv file to be in sequential order, e.g. kidney_2_0000 to kidney_2_0005?

prisma grove
#

Read them from the files and reuse them sorted, don't try to generate them

flat zealot
#

okay thanks will do that!

flat zealot
#

this is my submission.csv. is it due to missing id? should i develop custom code to identify any missing ids in the submissions and append them with 1 1 as the rle?

prisma grove
#

why kidney_2?

#

In the test folder there will be some files, apparently kidney_5 and kidney_6, but don't assume anything, it will be easier

#

write generic code that reads the file names in test, sorts them and saves them in a list (as keys, for example)

#

"should i develop custom code to identify any missing ids in the submissions and append them with 1 1 as the rle?" No, the real test folder contains all the files that need to be submitted; the test files in the local folder are just an example

flat zealot
#

oops right thank you for the guidance!

vestal igloo
#

Can someone please explain the implications of different TH percentiles in the top scoring training notebooks? Like this one: https://www.kaggle.com/code/misakimatsutomo/inference-1024-should-have-a-percentile-of-0-00149

it seems the authors experimented with many percentiles. Also, any discussion on how to choose these percentiles would be very much appreciated. I'm new to image segmentation in general and looking to learn. Thanks.

prisma grove
#

Most models (any I can imagine) have a final output that is (or can easily be projected into) a score for being mask or not. Let's say a number between 0 and 1: closer to zero is not mask, closer to one is mask. The natural threshold (TH) would be 0.5, so scores > 0.5 would be labeled as mask. Those notebooks tune the threshold so that the number of positives (and probably false positives) is reduced.

#

Note that the TH percentile does not tune the threshold directly. It sets the threshold to the score at the boundary of the top-scoring percentile.

#

A risky practice in my opinion since, as I said, the natural value, and the one we have been training our models with, is 0.5. I don't think a value optimized on the public test set would generalize to the private set or to other samples.

vestal igloo
#

Sorry @prisma grove I had to go out of town. I understood the threshold but didn't quite get the TH percentile and what you meant there. Basically could u please elaborate a bit ur last two paragraphs

prisma grove
#

the output is flattened and sorted (with np.partition) by score

#

so you have a ranking of scores for being mask; the higher scored ones are more probably mask

#

the natural process would be to say everything above 0.5 is mask, since the models are trained for that behavior

#

but those notebooks tune the TH (threshold) by choosing the one that keeps a certain portion (percentile) of the output labeled as mask

#

so a percentile of 0.00149 takes the top-scored 0.00149 portion, sees what the limiting score is, and labels everything above that score as mask
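
A minimal sketch of that percentile thresholding, assuming the number is a fraction of all pixels (so 0.00149 would keep the top 0.149% of scores); the linked notebook may define it differently. The function name is made up.

```python
import numpy as np


def threshold_from_top_fraction(scores: np.ndarray, keep_fraction: float) -> float:
    """Return the score such that roughly the top keep_fraction of pixels are at or above it."""
    flat = np.asarray(scores).ravel()
    k = max(1, int(round(flat.size * keep_fraction)))  # number of pixels to keep
    kth = flat.size - k
    # np.partition puts the element that would sit at index kth in sorted order
    # at that index, without fully sorting the array.
    return float(np.partition(flat, kth)[kth])
```

The mask is then `scores >= threshold_from_top_fraction(scores, keep_fraction)`. The threshold adapts to each volume's score distribution, which also means a fixed fraction forces the same share of positive pixels everywhere, whether or not the volume actually contains that many vessels.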

vestal igloo
#

ahhh. so i understand now why it might not generalize well to the test data