#rsna-2023-abdominal-trauma-detection
Hello Friends, I am looking to team up with somebody for this competition. I am a data analytics graduate and would prefer to join someone from the medical domain.
Hi everyone, thanks for getting this channel kicked off! I see you've posted in the #👥┊looking-for-a-team channel, @vital axle. That's great, thank you! @bitter bone, you'll notice I've just posted a template in that channel if you'd like to post there as well!
Hello all - happy to be here. Am also looking to form a team or be part of a team
Hey if you are looking to form a team or be in a team we do have a channel where you can further your conversation - see https://discord.com/channels/1101210829807956100/1130572338182762657
I hope this helps
@neon vine good timing, we clicked enter at the same time 😄
Haha, touché! 😉
it's still a little difficult to solicit team members
Yeah, I understand. There's our "looking for a team" thread on our forums at: https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection/discussion/427220. We just launched this Discord a few days ago and expect activity to continue increasing quite a bit day over day, so I'd just give it a bit more time!
Detect and classify traumatic abdominal injuries
yup - appreciate the efforts
Getting an error during submission. Can anyone explain the submission process?
Hi all. I’m just getting started on this competition. Thought I’d share my plan for a first model. I’m loading the entire 3D scan series, permuting it to get axial, sagittal, and coronal views and taking the middle slice of each. Then concatting them together into a 2D image. I still need to apply the normalization ideas people are talking about in the forum
I’m not expecting this model to perform very well, but it’s a start. Later I want to try using the segmentation data to grab a 3D crop of relevant organs, and then take middle slice of those and concat them.
anyone say about this
Seems like a solid idea. Is there a reason you concatenate them into a single 2D image instead of, say, store them in a (height, width, 4) array (i.e. store each perspective as a separate 2D array)?
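A minimal sketch of the channel-stacking idea, assuming the series is already loaded as a NumPy array of shape (depth, height, width). The helper names (`middle_slices`, `pad_to_square`, `stack_views`) are made up for illustration, and zero-padding is just one way to give the three views a common shape (resizing would be another):

```python
import numpy as np

def middle_slices(volume: np.ndarray) -> list:
    """Return the middle axial, coronal and sagittal slices of a (D, H, W) volume."""
    d, h, w = volume.shape
    axial = volume[d // 2, :, :]     # shape (H, W)
    coronal = volume[:, h // 2, :]   # shape (D, W)
    sagittal = volume[:, :, w // 2]  # shape (D, H)
    return [axial, coronal, sagittal]

def pad_to_square(img: np.ndarray, size: int) -> np.ndarray:
    """Zero-pad a 2D slice at the bottom/right so it becomes (size, size)."""
    out = np.zeros((size, size), dtype=img.dtype)
    out[: img.shape[0], : img.shape[1]] = img
    return out

def stack_views(volume: np.ndarray) -> np.ndarray:
    """Stack the three middle slices as channels, giving (size, size, 3)."""
    size = max(volume.shape)
    views = [pad_to_square(v, size) for v in middle_slices(volume)]
    return np.stack(views, axis=-1)

# Synthetic stand-in for a CT series: 40 slices of 64x64 pixels.
volume = np.random.rand(40, 64, 64).astype(np.float32)
stacked = stack_views(volume)
```

Keeping the views as separate channels lets a standard 2D CNN treat them like an RGB image, whereas concatenating them side by side keeps a single channel but produces a wider image.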
Has anybody thought of how the 3D arrays (CT scans) could be properly normalized. The range of the pixel-values in each .dcm file is not the normal 0-255 range, so dividing by 255.0, as you usually do with images, does not seem to be a good normalization. Any idea?
@sinful niche - Can you elaborate on what error message you are receiving?
Can you tell us what the error message says? It's hard to tell what the issue is without that knowledge. Maybe take a look at a sample submission in the 'code' section of the competition to see what a working submission looks like
The notebook runs, but it fails during scoring
It shows an error: submission CSV not found
Where do we need to upload it, and where do we get it from? Please help solve this problem
see error
see the issue bro
How are you submitting / uploading the file? Please see the submission section in this FAQ and let us know if it helps - https://www.kaggle.com/docs/competitions#notebooks-only-FAQ
i have read it still not getting
Did you write "df.to_csv('submission.csv', index=False)" for your prediction dataframe 'df'? See for example the notebook https://www.kaggle.com/code/jakebrusca/rsna23-weighted-mean-baseline for a working submission
help me bro plz
It says submission csv isn't found. Something is happening that is making a file with submission.csv not show up. Either you have too many other files in your working directory or your code is failing before it writes out your submission file.
On the private leaderboard it will run on different data that you don't have exposure to so you have to write your code to be defensive against new data, you can't have anything hard coded in terms of file paths or naming that might throw it off.
The general process is load in the data, run your model against this data, write the predictions out to fill in the sample_submission.csv and then write it back out as submission.csv
we need to load our note book in data later ???
Does your notebook generate the submission.csv file?
This would be an example (see the notebook i linked above):
The crucial part is (i) loading the submission file (first line) (ii) filling it with your predictions (second line) and (iii) submitting it by writing submission.to_csv('submission.csv', index=False)
Here 'Injuries' is a list of the column names in the submission dataframe
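The three steps described above can be sketched roughly as follows. This is not the code from the linked notebook: the column names in `injuries` are an assumption based on the label grouping discussed in this channel, the sample file is written to a temporary folder so the snippet runs anywhere, and random numbers stand in for model predictions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumed submission columns (2 binary groups + 3 ternary groups = 13 columns).
injuries = [
    "bowel_healthy", "bowel_injury",
    "extravasation_healthy", "extravasation_injury",
    "kidney_healthy", "kidney_low", "kidney_high",
    "liver_healthy", "liver_low", "liver_high",
    "spleen_healthy", "spleen_low", "spleen_high",
]

# Stand-in for the competition's sample_submission.csv.
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "sample_submission.csv")
pd.DataFrame({"patient_id": [1, 2, 3], **{c: 0.5 for c in injuries}}).to_csv(
    sample_path, index=False
)

# (i) Load the sample submission so ids and column order are already correct.
submission = pd.read_csv(sample_path)

# (ii) Fill it with predictions (random placeholders here).
predictions = np.random.rand(len(submission), len(injuries))
submission[injuries] = predictions

# (iii) Write it out; on Kaggle this would be 'submission.csv' in the working directory.
out_path = os.path.join(workdir, "submission.csv")
submission.to_csv(out_path, index=False)
```

Starting from the sample file rather than building a DataFrame from scratch avoids most row and column mismatches when the notebook is rerun on the hidden test set.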
To bring this back up: I went through the first 100 samples and recorded the largest and smallest pixel values in the pixel_array of the .dcm files and it seems that the pixel values range between -32768 and +32768, so one way to normalize the images would be to do (img + 32768)/(2*32768), so that all pixels (or at least most pixels) are in [0,1].
Did anybody do anything different? Any other ideas?
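For reference, the fixed-range normalization proposed above could look like this. It is only a sketch: it assumes the pixel data is signed 16-bit, so the actual upper bound is 32767 rather than 32768, and `normalize_fixed_range` is a made-up helper name.

```python
import numpy as np

def normalize_fixed_range(img: np.ndarray) -> np.ndarray:
    """Map signed 16-bit pixel values from [-32768, 32767] into [0, 1)."""
    return (img.astype(np.float32) + 32768.0) / (2 * 32768.0)

# Tiny synthetic image covering the extremes of the int16 range.
img = np.array([[-32768, 0], [1000, 32767]], dtype=np.int16)
scaled = normalize_fixed_range(img)
```

Because the same constants are used for every scan, the result is uniform across the dataset, but most tissue values end up squeezed into a narrow band around 0.5.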
This is where "Windowing" comes in Felix.
There's a problem with PyDICOM and it doesn't handle VOI LUTs properly from this dataset.
pydicom does not apply the WindowWidth and WindowCenter values properly inside its apply_voi_lut() and apply_windowing() functions.
The "window" we see is almost a "bone" window and is not optimal for soft tissue.
This is what a PyDicom export looks like ..
This is what it should look like ..
I exported that image with non-python software and it honors the windowing values.
not the same slice, but the concept remains.
Thanks! That's very helpful. I found this notebook (https://www.kaggle.com/code/redwankarimsony/ct-scans-dicom-files-windowing-explained?rvi=1) explaining a little more about windowing. It seems that a soft tissue window might be a good idea?
can someone let me know if this empty space in a windowed ct scan is expected? I'm using the default windowing values (400 and 50)
Why not sklearn minmax scaling?
That's air in the bowel
Because the minimum/maximum varies a lot between different CT scans and i want a uniform way to normalize the data. But thanks for the recommendation, i was not aware of that function.
That's why you should use sklearn MinMaxScaler. By writing one line of code it will be normalized.
Would that not require loading in all of the data at once into an array X?
the problem with that approach is that it's lossy. That is, we're going from 12-16 bit numbers down to 8 bit numbers.
some data is always lost during normalization. The key is to select which data is most important to the specific anatomy. That's why we "window" CTs.
Yes, need array. pydicom will take .dcm as array. sample for a single image
from sklearn.preprocessing import MinMaxScaler
import pydicom as dicom
scaler = MinMaxScaler()
img = dicom.dcmread('/kaggle/input/rsna-2023-abdominal-trauma-detection/train_images/10300/31085/10.dcm').pixel_array
print('max pixel before scaling =', img.max())
img = scaler.fit_transform(img)
print('max pixel after scaling =', img.max())
what is 'window' CT, is it alternative of normalization?
Yes, that would work, but the data is >400GB, so there's no way to load all the data into one array.
For that case, you need to input data batch wise and load image one by one with pydicom. 460GB is nothing for batch iterator
"windowing" or "leveling" CT's is what radiologists do to dial in the proper image contrast. Most CTs have special DICOM tags that specify up to three "windows" to best view the image in. I posted two images above. The top one uses 0-1 normalization, and the bottom one is "Windowed" to the default DICOM tag values of 500/50 .. which is a common abdomen window.
Think of a CT as a latent image. Because it has pixel values greater than the total amount of shades of gray our consumer-grade monitors can display, we must choose some to view and some to throw away. If we choose properly, we can maximize the contrast between similar tissue. If we just normalize 0-1, we lose valuable information in most cases.
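A minimal sketch of what windowing does numerically, assuming the pixel values have already been converted to Hounsfield units. `apply_window` is a hypothetical helper, and this is the simplified clip-and-rescale form rather than the exact formula from the DICOM standard:

```python
import numpy as np

def apply_window(hu: np.ndarray, center: float, width: float) -> np.ndarray:
    """Clip Hounsfield units to [center - width/2, center + width/2] and rescale to [0, 1]."""
    lo = center - width / 2.0
    hi = center + width / 2.0
    clipped = np.clip(hu.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

# Example values: air, window floor, soft tissue, window ceiling, dense bone.
hu = np.array([-1000.0, -150.0, 50.0, 250.0, 1500.0])
windowed = apply_window(hu, center=50, width=400)
# Everything at or below -150 HU maps to 0, everything at or above 250 HU maps to 1.
```

All of the available contrast is spent on the 400 HU range around soft tissue, which is why a windowed image separates similar tissues better than a plain 0-1 normalization over the full range.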
Hello all - Have a lot of reading to do - but I wanted to check if I am on the right track. I'll break my comment into multiple sections so that the replies can be tracked against each section. From the data provided, it appears this is a classification task - possibly within 13 multi-classifications? There seem to be 5 higher level classifications - two binary (bowel health / injury, extravasation health / injury), 3 ternary ( kidney healthy / low / high, liver healthy / low / high, spleen healthy / low / high). What is notable (seems to me) is that probabilities do not seem to add to 1 across all classifications. They seem to add to 1 across the 5 higher level classifications.
I am next wondering what constitutes the training data (features and labels). It seems to me that entries in train.csv are all labels (except of course for ids); the information contained in the .dcm and .nii files is likely all the training data. I am not entirely sure yet where the metadata, the aortic_hu (Hounsfield units), fits in. As you can see, I have a long way to go 😦
Lastly - I am still trying to read through past competitions / notebooks to understand the nature of the .dcm and .nii files, what preprocessing is needed, what models should be used/reused, and is there one model or ultimately multiple models that will somehow work together
I should add that there are the image level labels that seems to indicate bowel injury or extravasation injury
Yes, there are multiple labels and for each patient you'll have to classify whether bowel/extravasation/kidney/liver/spleen is healthy or not. This means, in each binary category the probabilities will have to add up to 1 and in each ternary category the probabilities will have to add up to 1; but all of the values together will not add up to 1.
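One way to get outputs that sum to 1 within each group is a separate softmax per group. This is a sketch under assumptions: the column order (bowel, extravasation, kidney, liver, spleen) and the `groups` sizes are illustrative, and random numbers stand in for raw model scores.

```python
import numpy as np

# Number of columns per label group: two binary groups and three ternary groups.
groups = {"bowel": 2, "extravasation": 2, "kidney": 3, "liver": 3, "spleen": 3}

def normalize_groups(raw_scores: np.ndarray) -> np.ndarray:
    """Apply a softmax within each group so every group sums to 1 per patient."""
    out = np.empty_like(raw_scores, dtype=np.float64)
    start = 0
    for size in groups.values():
        block = raw_scores[:, start:start + size]
        # Subtract the row maximum for numerical stability before exponentiating.
        exp = np.exp(block - block.max(axis=1, keepdims=True))
        out[:, start:start + size] = exp / exp.sum(axis=1, keepdims=True)
        start += size
    return out

raw = np.random.randn(4, 13)   # 4 patients, 13 raw scores each
probs = normalize_groups(raw)
```

Each row of `probs` then sums to 5 overall (one per group), not to 1.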
The most important training data (i think) are the CT scans in train_images. In each subfolder you will find a sequence of .dcm files. Each dcm file represents a 2D slice of a CT scan, which means all of the slices together represent a 3D volume of the patient that is being scanned.
hello, i'm preprocessing a slice as such:
# https://pydicom.github.io/pydicom/stable/old/working_with_pixel_data.html
def open_file(p):
    f = dicom.read_file(p)
    im = f.pixel_array
    im = apply_modality_lut(im, f)  # convert to Hounsfield units
    im = apply_voi_lut(im, f)  # windowing
    im = (im - im.min()) / (im.max() - im.min())  # scale to [0,1]
    return im * 255
is this correct? I do get some images that are "inverted", i.e. mostly white, can someone explain what causes that?
You're probably missing this
if dicom.PhotometricInterpretation == "MONOCHROME1":
    img = 1 - img
tried this but still few images which are mostly white, I checked their PhotometricInterpretation and they're MONOCHROME2 as well
Have you tried this
def standardize_pixel_array(dcm: pydicom.dataset.FileDataset) -> np.ndarray:
    # Correct DICOM pixel_array if PixelRepresentation == 1.
    pixel_array = dcm.pixel_array
    if dcm.PixelRepresentation == 1:
        bit_shift = dcm.BitsAllocated - dcm.BitsStored
        dtype = pixel_array.dtype
        pixel_array = (pixel_array << bit_shift).astype(dtype) >> bit_shift
    return pixel_array
from this topic https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection/discussion/427217 ?
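To see what the bit shift in that function does, here is a small standalone illustration on a plain NumPy array instead of a real DICOM file. The values assume 12 stored bits inside 16 allocated bits with signed pixels, and `fix_signed_pixels` is a hypothetical stand-in for the core of the function above:

```python
import numpy as np

def fix_signed_pixels(pixel_array: np.ndarray, bits_allocated: int, bits_stored: int) -> np.ndarray:
    """Sign-extend values stored in fewer bits than the array dtype provides."""
    bit_shift = bits_allocated - bits_stored
    dtype = pixel_array.dtype
    # Shift left so the stored sign bit becomes the dtype's sign bit,
    # then shift right arithmetically to restore the magnitude.
    return (pixel_array << bit_shift).astype(dtype) >> bit_shift

# 4095 (0x0FFF) and 2048 (0x0800) are negative numbers in signed 12-bit.
raw = np.array([0, 100, 4095, 2048], dtype=np.int16)
fixed = fix_signed_pixels(raw, bits_allocated=16, bits_stored=12)
```

Without this correction, negative values show up as large positive ones, which is one plausible cause of slices that look mostly white after min-max scaling.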
This might be it thank you!!!
Can anyone explain how to open or preprocess the DICOM files in train_images, and how to separate the data and images in the parquet file?
@ anyone
Remove the apply_modality_lut() and apply_voi_lut() function calls. They don't work properly with most of the images in this dataset.
Ohh got it thank you
I feel like I saw this somewhere but can't remember where. Is there some metadata the tells you how many mm each "slice" represents? When I view them in coronal or sagittal orientation, some images look really squashed and some look really stretched.
SliceThickness for Z dim, PixelSpacing for X and Y dim
Thank you!
Don't use Slice Thickness to reformat images from plane-to-plane. Use ImagePositionPatient coordinates instead.
Slice Thickness does not always represent the distance between slices. In fact, Slice Thickness does not match in most cases.
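A rough sketch of estimating the real slice spacing from ImagePositionPatient, assuming an axial series whose slices are stacked along the z axis with no gantry tilt. The positions and the `slice_thickness` value below are synthetic, and `slice_spacing` is a made-up helper:

```python
import numpy as np

def slice_spacing(positions) -> float:
    """Estimate slice spacing (mm) from ImagePositionPatient (x, y, z) tuples."""
    z = np.sort(np.asarray([p[2] for p in positions], dtype=np.float64))
    # The median of the gaps is robust to an occasional missing slice.
    return float(np.median(np.diff(z)))

# Synthetic series: 20 slices that are 2.5 mm apart along z.
positions = [(-170.0, -170.0, 10.0 + 2.5 * i) for i in range(20)]
slice_thickness = 5.0  # hypothetical SliceThickness tag that disagrees with the geometry

spacing = slice_spacing(positions)
```

Using `spacing` together with PixelSpacing as the voxel size for coronal or sagittal reformats should avoid the squashed or stretched look described above.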
Is it legal in this competition to add some labels by hand to the training data?
FYI to get an official answer you will need to ask in the forums, competition staff are not watching this channel.
why they have provided segmentation data ?? as it is a classification problem there is no need of segmentations
can anyone comment on this
https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection/discussion/428538
"In order to provide more anatomical context to where the injuries will will be present, we have provided voxelwise segmentations on a subset of 206 training cases that are enriched for the presence of significant injuries."
Wondering if anyone has any hints on dealing with weak labels? I got a 2.5D RGB representation that I think ought to work okay, but when I tried to train ViT on it, it seemed to just quickly learn to always say "no injury" for all categories. I'm guessing that's because in the training data you have like 90+% "healthy" for each category.
What are some strategies for dealing with this? Use image augmentation to synthetically inflate the number of positive examples? Pre-train on some coarser categories like combine "low" and "high" injuries at first to get more positive examples, and then fine tune later with them separated back out?
@pale sonnet Yes, it's a very common problem when working with imbalanced datasets. You may try using a different sampling strategy, so the healthy images and those with injuries appear more equally. In the case of single-target classification, you can use e.g. WeightedRandomSampler from PyTorch - https://pytorch.org/docs/stable/data.html . For multi-target, as in the case of this challenge, I'm not sure how to solve it yet.
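The idea behind weighted sampling can be sketched with plain NumPy for the single-target case. This is not PyTorch's WeightedRandomSampler itself, just the same inverse-frequency principle; the 90/10 label split is synthetic and `inverse_frequency_weights` is a hypothetical helper:

```python
import numpy as np

def inverse_frequency_weights(labels) -> np.ndarray:
    """Per-sample sampling probabilities that give every class the same total mass."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    class_weight = {c: 1.0 / n for c, n in zip(classes, counts)}
    weights = np.array([class_weight[y] for y in labels])
    return weights / weights.sum()

rng = np.random.default_rng(0)
labels = np.array([0] * 90 + [1] * 10)   # 90% healthy, 10% injured
probs = inverse_frequency_weights(labels)

# Draw indices with replacement, as a sampler would do for one epoch.
sampled = rng.choice(len(labels), size=10_000, replace=True, p=probs)
positive_fraction = labels[sampled].mean()
```

With these weights roughly half of the drawn samples are positives, so the model can no longer reach a low loss by always predicting "healthy".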
hi, I am new to this competition and I do not know how to work on such large datasets. Should I work on a subset of images or is there some way to process the dicom files into arrays and input the entire dataset in a neural network?
Am pretty much a novice myself but found https://www.kaggle.com/code/stpeteishii/segmentation-image-of-the-matched-id and https://www.kaggle.com/code/stpeteishii/abdominal-trauma-unet as potentially helpful starting points
I dont have access to these links :(
My bad - corrected the links ....
Thanks, ill check them out.
Perhaps this notebook (https://www.kaggle.com/code/alenic/dataset-size-reduction-400gb-to-7-5gb) can help you
Is anyone having trouble with the Public API for this competition? In google colab when I run: !kaggle competitions files rsna-2023-abdominal-trauma-detection
It seems like a bunch of the train_image patient ids are missing.
Certainly, 591. However, this outcome is not surprising.
Well u can use inbuilt functionalities of Python to work with large data sets and yes we usually divide our data for the purpose of training,testing and validating
Do you mind explaining why this isn't surprising? Sorry I'm not well versed in Kaggle competitions.
Hi guys I am new to this challenge and kaggle. I am wondering what is the optimal way to deal with this big dataset. I have few questions:
Is pydicom slow? Is it better to process the whole dataset into other formats? I want to create a 2.5D image for each subject, meaning I have to load the entire volume for coronal and sagittal views.
I split the data into train and val, then I created a TensorFlow pipeline for inputting the data, but when I run this for training, Kaggle throws an error and tells me to shift to a Google Cloud notebook. This is probably because my batch_size is 64 and Kaggle does not have enough RAM to accommodate that. My question was whether people who are working intensively on this are using Google Cloud?
Hi, how to handle the extremely large dataset? What is the best way to deal with this?
google cloud's VMs are very confusing. And I tried every location and never got an actual gpu
always unavailable
Is it possible to write code for this competition without GPU
I don't think so. That would take days to train one epoch if you're using any NN
I mean you can process the data with CPU, of course
but I don't think you can do the training with CPU
Yes, that is bowel gas.
Should we use some sort of sampling technique to work on a small amount of data to test the neural network and the results, and based on those results apply the same architecture to the entire dataset?
Kaggle API has "undocumented" restrictions, like file size, number and timeout limits, and requesting a list of 1.5 million files probably falls under these limits. One can achieve desired result from meta data files such as train.csv.
Hello guys. Have you guys encountered a bottleneck using online GPUs? I rented an A6000 with the dataset downloaded on persistent storage (I tested it at about 500MB/s). However, my model is significantly slower than running on Kaggle's P100. About 5 times.
I noticed the model spent most of the time loading data to the GPU, but I don't think there is a bottleneck in disk speed.
Never mind guys. I increased num_workers in the dataloader and it runs very fast now
Thank you, I was not intending on downloading 1.5 million files this way, it's more for downloading all of the files of a single patient when I want to explore poor model performance
Hi, I am new to Kaggle contests, and I have a question about what we should submit. Do we need to submit the code that processes the data, trains the model and makes the prediction? And should that run in less than 9 hours? Or should just the trained model run in less than 9 hours when making the predictions?
Yes the trained model should run in less than 9 hours when running the predictions.
You can train your model anywhere you want, in kaggle notebook, in your local computer, google colab etc.
Then upload the model weights to a kaggle dataset, and attach this dataset to your inference notebook.
The inference notebook must run in less than 9 hours and must have internet access disabled.
Thank you for the clarification!
Is anyone making progress? I'm facing a lot of issues with the datasets...
Hi, I just started with this competition and looked at the best scored notebook(0.66), can someone please explain what exactly is it doing?
The notebook creates constant probabilities for each test sample.
The probability for each class was calculated by taking the mean label of that class.
You can read the notebook, the code is short and simple.
https://www.kaggle.com/code/mirenaborisova/rsna-0-66-lb
so essentially, each column in the test data was multiplied by a number such that the values of the column is the same as the average value of its corresponding column in the training set. am i correct?
Guys, is it true that for detecting extravasation you need to have two phases of scans?
If true, does the dataset always provide two scans for potential extravasation subjects?
Will resizing the data from 512×512 to 256×256 have any impact on model performance?
Am thinking of cropping instead. Any suggestions?
Someone's taken the time to put out a minified dataset here with 256x256 images in JPEG format. Benchmarking against the original dataset with a small model should tell if it's something worth pursuing
https://www.kaggle.com/datasets/alenic/rsna-2023-atd-reduced-256-5mm
The average probability of each class is multiplied by a factor so that the log loss is the lowest. Just calculating an average probability doesn’t give the minimum log loss because of the weights on each class (check the evaluation section of this competition’s description)
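A small numeric illustration of why a scaled mean can beat the plain mean under a weighted log loss. It is a simplified binary sketch: the weights `w_pos` and `w_neg` are hypothetical (the real ones are in the competition's evaluation section), the labels are synthetic, and the search over `factors` is a plain grid search rather than whatever the linked notebook does.

```python
import numpy as np

def weighted_log_loss(y: np.ndarray, p: float, w_pos: float, w_neg: float) -> float:
    """Sample-weighted binary log loss for a constant prediction p."""
    w = np.where(y == 1, w_pos, w_neg)
    losses = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float((w * losses).sum() / w.sum())

y = np.array([1] * 10 + [0] * 90)   # 10% positives
w_pos, w_neg = 2.0, 1.0             # hypothetical class weights

mean_p = y.mean()                   # plain average label
factors = np.linspace(0.5, 4.0, 351)
losses = [weighted_log_loss(y, f * mean_p, w_pos, w_neg) for f in factors]
best_factor = float(factors[int(np.argmin(losses))])
best_p = best_factor * mean_p

# Minimizing the weighted loss analytically gives the weighted positive share.
closed_form = w_pos * mean_p / (w_pos * mean_p + w_neg * (1 - mean_p))
```

With positives weighted more heavily, the best constant prediction is larger than the raw positive rate, so the best factor comes out above 1.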
I have the same question, does anyone know?
Yes, I recreated the thing just 5 mins ago :/ I understand it now. Thanks a lot for your help!
Hello guys, I have a problem with submitting my predictions. It always fails and gives me the following error: "Submission Scoring Error
Your notebook generated a submission file with incorrect format. Some examples causing this are: wrong number of rows or columns, empty values, an incorrect data type for a value, or invalid submission values from what is expected. See more debugging tips". I inspected the notebook and compared it to the sample_submission.csv and there is no difference in the shape/format.
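The causes listed in that error message can be checked locally before submitting. The sketch below is a hypothetical helper, not a Kaggle tool; it assumes the first column of the sample file is the id column and that all other columns hold probabilities in [0, 1].

```python
import numpy as np
import pandas as pd

def check_submission(submission: pd.DataFrame, sample: pd.DataFrame) -> list:
    """Return a list of problems found when comparing a submission to the sample file."""
    problems = []
    if list(submission.columns) != list(sample.columns):
        problems.append("column names or order differ from the sample")
        return problems
    if len(submission) != len(sample):
        problems.append("row count differs from the sample")
    id_col = sample.columns[0]  # assumed to be the id column
    if set(submission[id_col]) != set(sample[id_col]):
        problems.append("ids differ from the sample")
    values = submission.drop(columns=[id_col])
    if values.isna().any().any():
        problems.append("empty (NaN) values present")
    if not all(pd.api.types.is_numeric_dtype(t) for t in values.dtypes):
        problems.append("non-numeric prediction column")
    elif ((values < 0) | (values > 1)).any().any():
        problems.append("predictions outside [0, 1]")
    return problems

# Tiny synthetic example with only two prediction columns.
sample = pd.DataFrame({"patient_id": [1, 2, 3],
                       "bowel_healthy": [0.5] * 3,
                       "bowel_injury": [0.5] * 3})
good = sample.copy()
good["bowel_injury"] = [0.1, 0.2, 0.3]
bad = good.copy()
bad.loc[1, "bowel_healthy"] = np.nan

good_problems = check_submission(good, sample)
bad_problems = check_submission(bad, sample)
```

A local check only covers the visible sample rows, so the notebook can still fail on the hidden test set if, for example, a patient is skipped there.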
Have you tried submitting one of the public ones that already works? I submitted this one and it worked for me: https://www.kaggle.com/code/cjrrobotpsychologist/rsna23-weighted-mean-baseline?scriptVersionId=140594886 but none of that is my work. It's just a format that I know will work once I get my model trained.
ty very much for your time gonna look into that soon
All the files are in .nii or .dcm format, which causes a major problem... PyTorch simply can't read them.
How to fix it ?
Read the .dcm file using e.g. pydicom. The data contains an attribute named 'pixel_array', which will give you a standard 2D array you can use.
Hey, I need a quick summary of how to submit. I've seen several notebooks, been following discussions, but the available notebooks load the submission file and modify it using the weights of the sample results. When I create my DataFrame based on predictions from the model and save it exactly as the submission looks, I get an error during scoring.
Hi guys, this is my first Kaggle competition. I have a quick and easy question. Do I need to fill the submission.csv file with the results of my model running in training images? Or should I use another data source to evaluate and send the submission.csv file? Should I use the data that is in testing folder?
Can I get some help on my notebook? I'm having trouble with my private submission. https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection/discussion/437000
Thnks bro
Hello everyone. According to train_series_meta.csv, there are incomplete organs in the training set. Does anybody know whether or not the test set has incomplete organs?
The data is so large, how do we manage to download them to local computer for training.?
This is what I did:
I clicked on the download button on kaggle. It took about 28 hours for the download to complete. The data is archived, so the total size is around 330GB.
If you have a low internet speed or your network is not stable, you can download the dataset in chunks.
https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection/discussion/427427
This post would be helpful. Theo Viel has converted the dicom files to png files, and uploaded them as kaggle datasets. There are 8 parts, 15GB each.
Thank you
Hi, I am new in kaggle. I submitted my submission.csv file and the scoring process is taking more than 3 hours. Is it ok? for just 3 test data images? I read the log and it takes some minutes to run, but does the scoring step take hours?
The real test dataset contains more than 3 images. Kaggle only gives you access to 3 test images to prevent you from tailoring the solution to the test dataset. And yes, it can happen that submission takes ~3 hours. There's probably a time limit which you will find if you take a look at the rules of the competition, but 3 hours should be fine.
Hello @everyone, I have create datasets with 3 different types of Image! Please have a look and upvote if you find it useful! Thank you all!
https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection/discussion/438690
Hey, I am blocked when generating the csv file. I use the to_csv function from pandas dataframe, I removed the index before saving the file. Also I am using the sample_submission.csv and adding all the rows at the end. But I am still getting Submission Scoring Error. I checked the file encoding, and don't know what else to do. I tried this way https://www.kaggle.com/competitions/icr-identify-age-related-conditions/discussion/412946 and nothing. Have you experienced the same issue? Thanks for your help!
Hello, would be happy to hear your thoughts, thank you in advance!
There's a successful participator who recommends to use 3DCNN. Do you think it's possible to train those models only having the machine kaggle offers for free? Thank you in advance for your answer.
Im talkin about this person and their model
https://www.kaggle.com/code/hengck23/lb0-55-2-5d-3d-sample-model
from kaggle_helper import *
from kaggle_metric import *
while importing it is showing no kaggle_helper module
can anyone help
If you ran into this while executing the notebook @spice spruce shared, I believe kaggle_helper.py and kaggle_metrics.py are separate files created by the author of the notebook.
You can checkout this notebook to see how that can be done:
https://www.kaggle.com/code/rtatman/import-functions-from-kaggle-script
how we will do for helper and metrics am not getting
DICOM file format contains more information than just image data. It also contains metadata.
I created a dataset that contains all the DICOM metadata for the competition.
Hope it helps someone 🙂
https://www.kaggle.com/datasets/tobetek/rsna-atd-2023-dicom-metadata/code
Help me,please. Failed submission from notebook with CNN in pytorch.do I need to process the dataframe in some way before submission?
from slice_model import Net as SliceNet any idea regarding this model
where we find it
How to fix it: dicomsdl-0.109.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl is not a supported wheel on this platform?
you can refer here https://www.kaggle.com/code/satheeshbhukya1/rsna-read-dicom-set/notebook
how to fix error: from kaggle_helper import * No module named 'kaggle_helper'?
same issue with me ..can anyone help
@everyone
Now
Webinar, TRAUMA Abdominal Detection Competitions
Please connect:
https://us02web.zoom.us/j/82192733489?pwd=VThkMTZRZkx0aUREdU1nRlZxMElXZz09
Hi everyone! I'm stuck on that scoring error issue. I made sure that every label is distributed according to its type like binary ones sum to 1 where triple categories sum to 1. I have no idea where I'm doing wrong. Can you please take a look to my notebook and advise me what I should do? Thank you. https://www.kaggle.com/code/noelyoda/fork-of-dr-oi-model-w-h5/settings?scriptVersionId=145408457
Has anyone tried running the Keras Infer Kernel that was pinned in the competition? https://www.kaggle.com/code/aritrag/kerascv-starter-notebook-infer I have not had any success running this and keep getting the attached error message. This is my first competition and I'm really struggling with these technical details 😕 If anyone has thoughts to share, I'd greatly appreciate it!
The error is coming from this line
model = keras.models.load_model(INPUT_MODEL_PATH)
which should instead be
model = keras.models.load_model(MODEL_PATH)
Tricky variable names. Keras wants to update the model as it learns, but can't write to the kaggle/input/ folder as it is Read-only, as shown.