#rsna-2024-lumbar-spine-degenerative-classification

royal aurora
#

Anybody else really struggling to create a submission that is accepted?

craggy oak
#

I don't understand how the submission file should be generated? Do we only need to make predictions for the one study (44036939) that's in the test_images folder?
And I don't understand this part -> “This competition uses a hidden test. When your submitted notebook is scored, the actual test data (including a full length sample submission) will be made available to your notebook.” Do I have to go through the test_images folder and make predictions for each study in it (as it says there will be other data added when submitting)? So our submission.csv file has to have more than 25 lines? What is checked in our notebook and in our submission file?

warm hawk
#

I haven't tried anything yet but...

#

if it's a code competition with hidden test the local test that you see is only an example

#

your code will be executed in another environment with the actual test

#

so it must be capable of reading a general test folder with an unknown number of samples and unknown ids

#

so don't assume anything

#

make your code as simple as you can in terms of reading test and predicting it

#

read the samples and the keys, store them, and use them to predict and generate the corresponding submission.csv without any assumption about the format. Use the keys exactly as they were read

warm hawk
#

since there is no test.csv

#

I suspect you must find the hidden test id on train.csv and extract the other relevant information to generate the key from there

white valley
#

hi, everyone. I wish to know if this competition has anything to do with image registration or if it is more like a pure detection task? I'm interested in learning things like Voxel-Morph or Unet and I'm not sure if they play some part in this competition. Thank you.

warm hawk
#

Hi. We don't have any Voxel-Morph labels so, although you're free to be imaginative, it doesn't look like a detection task.

#

Maybe you could try some kind of semi-supervised approach to autogenerate them on positives... but it's not trivial.

white valley
#

sounds great, thanks

lucid kiln
#

Hi @everyone, can someone please confirm whether the instance number is the slice number within a 3D image? Also, there are some images which don't have a level; do we have to remove them? Images vary by study_id, so are not all patients covered for 5 conditions on 5 levels (ideally 5x5=25 images)?

slender bluff
#

Creating a point ROI at the coordinate and applying the post-registration transformation matrix to that ROI to get the new coordinates should fix the issue, if someone were to proceed with registration.

unborn haven
#

Hi @everyone, I've first been trying to extract the ROIs using the train label coordinates. My error metric is MSELoss, but I'm kind of stuck because no matter what I've tried the train loss is always 10 times more than the val loss. The more data I use, the higher the magnitude of loss I get. They do converge, it's just at a really high loss. Any suggestions are appreciated.

opal moth
#

Would anyone be willing to clarify how the instance number relates to the image sequences or the images themselves?

rain token
#

Hello all, am I wrong in thinking that the way to go should be to have an object detection model for each condition_level and then a classifier for the severity? I haven't seen any notebook going down this path yet

warm hawk
#

At the beginning I thought we didn't have any information about Voxel-Morph labels, but I was wrong

#

We have the 2D pixel of the center of the label. That's better than nothing for training detection models.

#

[x/y] - The x/y coordinates for the center of the area that defined the label.

#

So yes. Seems a reasonable schema to proceed.

olive sentinel
#

Looking for a teammate

smoky bone
#

Hello people! My name is paul and I'm currently taking part in the RSNA 2024 lumbar spine.... competition, but I need some help. I understand the data given to me by the competition, but I don't get how to format the x_train/test and/or the y_train/test. Basically, I don't know what to do with the data I'm given: I'm stuck on what I should feed the model and what the output should be. Please help if you can.

zenith pilot
#

Hi everyone,
My notebook takes 2.5 hours to run, but my current submission has been running for 5 hours and still isn't finished. Is this normal? If not, what could be causing the issue, and how can I resolve it?

gloomy yoke
#

does anyone know what the folder names (the long numbers, e.g. 4003253) mean?

glossy rock
#

Hello everyone, I am venkatkumar. I am currently looking for a teammate in this competition. I'd like to learn from this data and also try to achieve a good LB score in this competition

If anyone is interested, ping me

https://www.kaggle.com/venkatkumar001

warm hawk
#

Yes, it was probably another one-off bug, and the 2.5 hours was already for the hidden test.

warm hawk
#

or training included

icy gorge
#

@warm hawk - The final images are in the variable "coor_entries". In the call to "train_test_split()" below, where should the variable "coor_entries" and the labels be passed to get the test and train samples?

X_temp, X_test, y_temp, y_test = train_test_split(np.array(images_decreased), labels, test_size=0.1, random_state=42, stratify=labels)

warm hawk
#

If what you want is to split your inputs ("coor_entries") and they are aligned with an equal number of labels stored in labels... I guess they go in place of "np.array(images_decreased)"

#

But without the complete context it is not trivial to say

#

If that split is the one from sklearn, it takes two aligned sets X (inputs) and y (labels) and splits them into X_temp, X_test and y_temp, y_test with a test proportion of test_size=0.1 (that's 9 temp samples for each 1 test sample). With stratify=labels it will keep the proportion of labels the same in both splits.
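
A minimal sketch of that behaviour, assuming scikit-learn and NumPy are installed. The toy arrays `X` and `y` are made up for illustration and have nothing to do with the competition data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 aligned samples, three imbalanced classes.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 70 + [1] * 20 + [2] * 10)

# X and y must have the same number of rows; stratify=y keeps the
# 70/20/10 class ratio in both the 90% and the 10% split.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y
)

print(X_temp.shape, X_test.shape)
print(np.bincount(y_temp), np.bincount(y_test))
```

The key constraint is that the first two arguments have one entry per sample each, which is why passing a 25-row frame together with a 48692-row frame cannot work.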

icy gorge
#

@warm hawk :
I went over the notebook at https://www.kaggle.com/code/abhinavsuri/anatomy-image-visualization-overview-rsna-raids . It gives two variables, "coor_entries" and "df_coor".

coor_entries = df_coor[df_coor['study_id'] == int(patient['study_id'])]

and

df_coor = pd.read_csv('/kaggle/input/rsna-2024-lumbar-spine-degenerative-classification/train_label_coordinates.csv')

I am trying to split the data using the train_test_split() function. I passed both "coor_entries" and "df_coor" into the function to get X_train, X_test, y_train, y_test. When I do that, I get a "ValueError".

X_train, X_test, y_train, y_test = train_test_split(np.array(coor_entries), df_coor , test_size=0.1, random_state=42,stratify=df_coor) # Complete the code to split the data with test_size as 0.1

Error:
ValueError: Found input variables with inconsistent numbers of samples: [25, 48692]

Question: How can I split the train data into X_train, X_test, y_train, y_test? Once this is completed, I can do the model building as in the attachment.

warm hawk
#

but coor_entries is one single sample (the one selected by study_id), it is not a set

#

and it has only the pixels from some slices that have been diagnosed. This notebook is meant to familiarize you with the data and the problem. Read it and understand it. It is not a solution.

#

Here is a notebook that shows how to train a CNN and use it for inference

icy gorge
#

Thank you. I will have a look

near mason
#

Hello, I am new to kaggle. I have a quick silly question. I see only one study_id in test_series_descriptions.csv. Does it mean the only test result for the one study_id should be in submission.csv ?

warm hawk
#

that's a local example as a reference for what to expect

#

the code will run on the hidden test, which has a similar structure but a different number of study_id values and corresponding series_id values

near mason
#

Thank you for your answer. So that sounds like what's in submission.csv is not important for the score. Is it correct?

near mason
#

Still do we need to submit the submission.csv for the score even though it has testing result for the only one study_id?

warm hawk
#

We need to submit general code that reads a general test_series_descriptions.csv (which will contain different ids than the one you see right now) and generates from it a submission.csv with the predictions for the corresponding ids.
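
A rough sketch of what such generic code could look like, using only the standard library. The condition names, level names, column headers and the `row_id` layout are assumptions that should be checked against the real sample_submission.csv, and `predict_study` is a hypothetical placeholder for an actual model:

```python
import csv

# Assumed names: verify against sample_submission.csv before relying on them.
CONDITIONS = [
    "spinal_canal_stenosis",
    "left_neural_foraminal_narrowing",
    "right_neural_foraminal_narrowing",
    "left_subarticular_stenosis",
    "right_subarticular_stenosis",
]
LEVELS = ["l1_l2", "l2_l3", "l3_l4", "l4_l5", "l5_s1"]
SEVERITY_COLUMNS = ["normal_mild", "moderate", "severe"]


def read_study_ids(descriptions_csv):
    """Return the unique study_id values in first-seen order."""
    seen = {}
    with open(descriptions_csv, newline="") as f:
        for row in csv.DictReader(f):
            seen.setdefault(row["study_id"], None)
    return list(seen)


def predict_study(study_id):
    """Hypothetical placeholder: uniform probabilities for every (condition, level)."""
    uniform = [1 / 3, 1 / 3, 1 / 3]
    return {(c, lv): uniform for c in CONDITIONS for lv in LEVELS}


def write_submission(descriptions_csv, out_csv):
    """Write 25 rows per study_id found in the descriptions file."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["row_id"] + SEVERITY_COLUMNS)
        for study_id in read_study_ids(descriptions_csv):
            preds = predict_study(study_id)
            for c in CONDITIONS:
                for lv in LEVELS:
                    writer.writerow([f"{study_id}_{c}_{lv}"] + preds[(c, lv)])
```

Nothing here hard-codes a study id or a number of studies, so the same code produces 25 rows for the single local example and 25 rows per study on whatever file is present at scoring time.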

#

You can cancel the local execution as soon as scoring triggers.

near mason
#

Ah now I understand. Thank you so much!

solemn dock
#

Anyone looking for a team for RSNA-2024??

orchid sun
#

I have a question about the data format

#

Are you given an image and the condition + level to predict the probabilities?

warm hawk
#

They give a set of study_id values, each with its corresponding set of series_id values, which refer to different sets of images (slices of the lumbar spine) in different MRI formats (T1/T2, sagittal/axial). With them we have to predict the probabilities of 3 classes (Normal/Mild, Moderate, Severe) for all 5 conditions (left and right foraminal, left and right subarticular, and spinal) and all vertebral levels (L1L2, L2L3, L3L4, L4L5 and L5S1).

solemn dock
#

I think both of these are answered on the Data page

tall wigeon
#

I have a question regarding the severity levels. Is the severity dependent on the degree of 'pinching' of the nerves, or is it a combination of factors such as the state of the disc, muscle atrophy, degree of bone deformation, etc.?

timid onyx
#

I see no distinction between the levels, only the conditions. We don't get told what the levels are, so an approach could be to train a model to predict the levels for the test set as part of the inference pipeline, but like I said, I don't see any clear difference between the levels, only the conditions. I tried training a model to predict the levels but, as I expected, it performed poorly. What are your thoughts? I appreciate any feedback, thanks.

warm hawk
#

There are a few notebooks that do that already, check them.

#

Just to be clear, in all sagittal images you'll find each of the 5 levels. You'll need at least the pixel locations to train them.

timid onyx
#

Please elaborate. So you're saying that the best option would be to train a model to predict the levels and then use that model to find the levels for each image in the test set? These two images have two different filepaths. The one on the left has level L1/L2 and the image on the right has level L2/L3, but they look identical? We see a clear difference in the images across the different conditions, but not levels. And when you say "You'll need at least the pixel locations to train them," what data are you referring to? The X and Y coordinates? The X and Y coordinates are not present in the test set either

warm hawk
#

No, both images have all levels in them; they've just been used to label different levels. But obviously, every sagittal slice where you see the vertebral column has all 5 levels in it, even if they haven't been annotated.

timid onyx
#

okay so you would recommend to train a model to find the conditions and the severity but not levels

upbeat raven
#

The notebook also provides code to crop images of every disc, which you can use to feed a severity classification model

pulsar field
#

IMO the best approach is a per vertebra segmentation, but those models do not perform well across modalities either so you need lumbar T1 or T2 labels specifically

#

As opposed to using something trained on a different dataset

upbeat raven
#

I mean, this is just a baseline approach. There's a lot to improve, but for now I'm focusing on achieving a complete solution before diving deep into this topic

sonic parcel
#

how was your experience with it?

waxen parcel
#

I built my own model to detect the levels for sagittal T1. I am focused on a center point of a region of interest, and will use some other tools I developed from there. It's working great on the test images, however the model is pretty big. I am doing everything offline for testing. Anyone know the size limitations on model uploads? The model to just predict lumbar levels in sagittal T1 images is about 500 MB. I imagine the T2 and axial models will be similar in size.

#

the sad part is how easy this part is for humans, and hard for the computers

warm hawk
#

I've been uploading 5 folds of 650 MB, you're fine

#

But I've uploaded them as datasets. I suppose the model tool has similar storage limitations.

#

The hard part is when L6 appears or L5 vanishes, even for humans.

pulsar field
#
PubMed Central (PMC)

The sacrum, by virtue of its anatomic location plays a key role in providing stability and strength to the pelvis. Presence of intervertebral discs in sacrum and coccyx is rare. Knowledge of its variations is of utmost importance to surgeons and radiologists. ...

#

So don’t spend too much time on keypoint detection as your stage 1. You’re far better off with vertebral (not disc) segmentation

warm hawk
#

Yes and no. I've just realized that I've been training the predictor with the crops from my locator (which obviously fails more often than I'd like), and that it would be much better to train it with the provided crops (the real ones). I've obtained a worse score, which I interpret as: OK, the predictor is better, but it makes predictions over the wrong crops. So if you're using a 2-stage model, don't forget the locator.

waxen parcel
#

well at this point at least i am trapped, i am creating ROIs centered on the x,y locations in the training data, then doing feature extraction then basic ML tasks on these locations, then comparing the locations in the test images to these features. this has worked for me on similar data sets, so we will see how it works here, otherwise the custom masks may be a far better option and i will have to adjust the locator functions and models
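
For the ROI step, a minimal sketch of computing a fixed-size crop window around a labelled (x, y) point. The helper name `roi_box` and the choice to shift the window back inside the image near the borders are illustrative assumptions, not anyone's actual pipeline:

```python
def roi_box(x, y, size, width, height):
    """Return (left, top, right, bottom) for a size x size window centred
    near (x, y), shifted as needed to stay inside a width x height image."""
    half = size // 2
    # Clamp the top-left corner so the window never leaves the image.
    left = min(max(int(round(x)) - half, 0), max(width - size, 0))
    top = min(max(int(round(y)) - half, 0), max(height - size, 0))
    right = min(left + size, width)
    bottom = min(top + size, height)
    return left, top, right, bottom


# Example: a 64-pixel window around a point in a 512 x 512 slice.
left, top, right, bottom = roi_box(100, 100, 64, 512, 512)
```

With a NumPy image array the crop would then be `image[top:bottom, left:right]`.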

pulsar field
#

But why go through all that trouble when you can just run Canny on mid slices and get perfectly good segmentation masks with maybe just a little manual labeling effort?

timid onyx
#

i still have some computes left on my google colab. I want to team up and learn

pulsar field
#

Retrieve hidden test images as well
Hmm

warm hawk
#

I suppose he means not literally. "test_series_descriptions.csv" will contain all necessary info you'll need on hidden. If your code can retrieve all local test images from local csv, will also do it from hidden. You can use train_series_descriptions.csv if you want to test a bigger file.

oblique wharf
#

can anyone share other datasets for this competition?

ebon nymph
#

Hi, everybody.
I'm looking for a teammate to lead me or to learn from each other.
Actually, I'm new to Kaggle competitions, so I want to collaborate and assist in this competition.
Even though I'm new to Kaggle, I think I know LLMs well.
This is my profile.
https://www.kaggle.com/jasperjack

timid onyx
#

I have developed a separate model for each unique combination of condition and level. I encounter the CUDA out-of-memory error while trying to load them all for inference. I would be open to collaborating if you have any insights or can assist me with a mutual learning experience. My kaggle account is: https://www.kaggle.com/josephmargaryan

warm hawk
#

reduce the batch size and/or load and run inference model by model
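
A sketch of the model-by-model idea, written framework-agnostically. `load_model` is a hypothetical callable (path in, callable model out) and `batches` is assumed to be a list that can be iterated more than once:

```python
import gc


def predict_one_model_at_a_time(model_paths, load_model, batches):
    """Run each model over all batches, keeping only one model alive at once.

    load_model is a hypothetical loader: path -> model, where model(batch)
    returns the predictions for that batch.
    """
    all_preds = {}
    for path in model_paths:
        model = load_model(path)
        all_preds[path] = [model(batch) for batch in batches]
        # Drop the only reference before the next model is loaded.
        del model
        gc.collect()
        # With PyTorch on GPU, torch.cuda.empty_cache() is commonly called here too.
    return all_preds
```

Peak memory is then bounded by the largest single model plus one batch, rather than by all 25 models together.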

oblique wharf
#

I wanna know, should the probabilities add up to 1?

warm hawk
#

yes

pulsar field
#

Biggest takeaways are LogisticCumulativeLink for ordinal regression i.e learning a continuous severity feature from ordinal labels
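
To illustrate the general idea of a logistic cumulative link (not the code of any particular library or solution): a single continuous severity score is compared against ordered cutpoints, and the class probabilities fall out as differences of sigmoids. The cutpoint values below are made up:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_link_probs(score, cutpoints):
    """Class probabilities from one latent score and ascending cutpoints.

    P(y <= k) = sigmoid(cutpoints[k] - score); each class probability is the
    difference between consecutive cumulative probabilities.
    """
    cumulative = [sigmoid(c - score) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


# Two cutpoints give three ordered classes (e.g. normal/mild, moderate, severe).
low = cumulative_link_probs(-3.0, [-1.0, 1.0])
high = cumulative_link_probs(3.0, [-1.0, 1.0])
```

Because the cutpoints are ordered, the probabilities always sum to 1, and raising the score moves mass monotonically from the first class towards the last.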

#

And the affine transforms to project all slices from all views into the same coordinate space
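
A minimal sketch of that projection for one pixel, using the standard DICOM image-plane attributes (ImagePositionPatient, ImageOrientationPatient, PixelSpacing). The function name is made up, and reading the attributes from the file (for example with pydicom) is left out:

```python
def pixel_to_patient(col, row, position, orientation, pixel_spacing):
    """Map a pixel (col, row) to patient-space (x, y, z) in mm.

    position:      ImagePositionPatient, 3 floats (centre of the top-left pixel)
    orientation:   ImageOrientationPatient, 6 floats (row cosines, then column cosines)
    pixel_spacing: PixelSpacing, [row spacing, column spacing] in mm
    """
    row_dir = orientation[:3]  # direction in which the column index increases
    col_dir = orientation[3:]  # direction in which the row index increases
    row_spacing, col_spacing = pixel_spacing
    return tuple(
        position[i] + col * col_spacing * row_dir[i] + row * row_spacing * col_dir[i]
        for i in range(3)
    )


# Example with an axis-aligned slice (values are made up).
point = pixel_to_patient(3, 4, (10.0, 20.0, 30.0), [1, 0, 0, 0, 1, 0], [0.5, 2.0])
```

Applying this to a labelled point in a sagittal slice gives a 3D location that can then be matched against the planes of the axial series from the same study.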

azure acorn
#

This is my first time dealing with a complex dataset like this. It appears to require a lot of preprocessing. Can someone point me in the right direction to learn how to approach this? I.e., I know what to do when the dataset is simply a set of fairly homogeneous images, but I'm overwhelmed by this dataset (do I need to use a NN to identify zones of interest, is there some way to composite the stack of 2D images into a 3D entity even though different series have different numbers of images in them, etc.). Any help is greatly appreciated!

south hawk
#

Hi, can anyone please explain the following:
"This competition uses a hidden test. When your submitted notebook is scored, the actual test data (including a full length sample submission) will be made available to your notebook."

Will there be a submission file similar to the file given?
Will the test data be arranged in the same format that is given?
Will there be a test_series_description.csv file ?
Should we make a prediction function which has test_folder, test_series_description and submission file as input and the final submission file as output?

Please help!!!

warm hawk
#

the environment will be the same. Just the test folders, test_series_descriptions.csv and sample_submission.csv will change according to the hidden samples. If you have submission errors, don't waste daily submissions on debugging. Run your code over a chunk of train instead. There are many submission and submission-debugging notebooks in the Code section that you can check.

#

"Should we make a prediction function which has test_folder, test_series_description and submission file as input and the final submission file as output?" That looks reasonable.

south hawk
#

thank you very much 🙂

south hawk
#

Hi, I was scratching my head about the model to detect the type of degeneration shown in the submission file. If the whole submission file is available in this format, I think it will be easier to create a model to predict the xy values based on the l1l2 or other l values given in the submission file. We don't have to create a model to predict the type of degeneration also. Am I in the correct path? Please help!!! I am really struggling here.

warm hawk
#

in the hidden test you will have some study_id values with some series_id values associated. For each study_id you'll have to predict the probability of all 3 severities ('Normal/Mild', 'Moderate' and 'Severe'), for all 5 pathologies ('Left and Right Foraminal', 'Left and Right Subarticular' and 'Spinal') and for all five vertebral levels ('L1L2', 'L2L3', 'L3L4', 'L4L5' and 'L5S1'). Not sure what exactly you are asking for.

#

The challenge is neither as simple nor as complicated as it may look

#

Take some time to check the available codes and discussions and you'll be fine

south hawk
#

Hi, thank you. I am exploring all the discussions and I think I get it now. Again, thank you very much you are a kind person 🙂

covert hazel
#

can anyone help me? whenever I submit, it shows an inference error and gets rejected

pulsar field
#

Make sure all of the study ids and conditions are getting populated. If you are doing series based inference, some study ids do not have all of the series

pulsar field
#

Look at the sample submission. You want all of those rows, for all of the studies

odd hinge
#

Hello, I have a question regarding the submission.csv file. Are we going to send a submission.csv file for each image inside this folder, or should we run it for all images in this folder, get predictions for each column, and then write submission.csv?

#

this is how we are supposed to upload. it says that for each row_id we should predict. but this row_id points to the test folder, which contains 3 folders. each folder contains many images. does it mean that we should calculate the prediction for each image separately and upload submission.csv, or must we calculate the probability of each row across all images in the test folder?

warm hawk
#

Hello again Sima, we are predicting exactly 25 rows per case, that is per each study_id, no matter how many series_id inside that study_id folder, no matter how many images inside each series_id folder

#

I suggest carefully reading the first pinned notebook on Kaggle to understand how to manage this heterogeneous data

#

I'll elaborate a bit more. Let's say your model takes a single 2D image from a Sagittal T1 series folder and predicts foraminal severity for all levels visible in it. And even more, you did this for all images in the folder. You still need to determine which side (left or right) you are predicting. For this task, the suggested notebook doesn't help much. I suggest instead searching for "DICOM" in the Code section to see some of the most popular approaches.

warm hawk
#

Or approach it as a single-stage procedure. There are also some examples in the Code section, one of them linked a few lines above.

odd hinge
#

hello, my submission gets a scoring error. does anyone know the reason?

pulsar field
#

Make sure your submission.csv has all 25 conditions x levels for each of the study ids

#

Also make sure you are saving without autoindex i.e
my_results.to_csv("/kaggle/working/submission.csv", index=False)

odd hinge
#

I did it but it gave me the same error. Submission scoring error

pulsar field
#

Aside from that, it can be per-row probabilities not summing to 1, incorrect column headers or incorrect row ids. Most likely some study ids are somehow missing, though
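
A small local checker for those failure modes, as a sketch only. The expected header and the assumption that `row_id` starts with the study id followed by an underscore should be verified against sample_submission.csv:

```python
import csv
import math

# Assumed header: verify against sample_submission.csv.
EXPECTED_COLUMNS = ["row_id", "normal_mild", "moderate", "severe"]


def check_submission(path, rows_per_study=25):
    """Return a list of human-readable problems found in a submission file."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            # A stray index column or wrong names both show up here.
            return [f"unexpected header: {reader.fieldnames}"]
        per_study = {}
        for row in reader:
            study_id = row["row_id"].split("_", 1)[0]
            per_study[study_id] = per_study.get(study_id, 0) + 1
            total = sum(float(row[c]) for c in EXPECTED_COLUMNS[1:])
            if not math.isclose(total, 1.0, abs_tol=1e-6):
                problems.append(f"{row['row_id']}: probabilities sum to {total:.4f}")
    for study_id, count in per_study.items():
        if count != rows_per_study:
            problems.append(f"study {study_id}: {count} rows, expected {rows_per_study}")
    return problems
```

This cannot see a study that is missing entirely, so it is also worth comparing the set of `row_id` values against the sample submission available in the same environment.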

odd hinge
#

Thank you for your help. However, I have another question. I submitted my notebook to this competition and encountered an error. After saving and running my notebook, I clicked on the "Submit" button on the right side of the notebook. When I clicked it, I selected my notebook and submitted it. In the end, I generated a submission.csv file in my notebook, which contains my results.

My question is: Is this the correct way to submit the results, or is there another way where we can upload the submission.csv file separately?

warm hawk
#

We can't submit results separately because those you see are the single local case as an example. The true test cases remain hidden and are predicted with your code in another environment.

odd hinge
#

I have another question, are we allowed to train models on our local pc and then use them in our notebook?

mint sparrow
#

Because I have been having issues with computation and it's all taking a lot of time, so I thought I'd use CUDA, but this might cause problems when the code is running in the new environment

warm hawk
#

depends on the environment chosen for the notebook

mint sparrow
#

Ok

pulsar field
#

So, is the final submission deadline Oct 8th 00:00 GMT, or 23:59 GMT?

warm hawk
#

23:59? I'm not sure

#

work as if it's 00:00; if there's more time, then great

clear spindle
#

any idea what might cause this? first time doing a competition. Worked on it for several days and a bit frustrating to get stuck on it towards the end.
The notebook works fine on the one test case we are given.