#ubc-ocean
@languid swan I have the same problem 😅
Ok, thanks for getting the chat started. I tried both GPU and CPU inference and then submitted. I tried different batch sizes. Finally, one submission went through: CPU inference, batch size of one, but it took like 5+ hours. Everything else failed. I'm sure there is a faster way to do it.
Oh, I haven't tried the CPU version. Thank you for sharing!
I think the competition page mentioned that some test set images were too big and that they were looking into it
Not sure what the solution is gonna end up being
You all are using whole images or taking patches of them or something?
I tried both.
But both threw an OOM error.
Even when using CPU, for me 😅

Hmm
I saw there are some libraries that let you load a patch without putting the whole image into memory; gonna try that later
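The message doesn't name the library; pyvips (mentioned later in this channel) is one option. Below is a minimal, library-agnostic sketch of the tiling part only, assuming 512-pixel tiles by default: compute a grid of (left, top, width, height) windows and read one window at a time. `tile_windows` is a hypothetical helper, not part of any library.

```python
from typing import Iterator, Optional, Tuple

# (left, top, width, height) in full-resolution pixel coordinates
Window = Tuple[int, int, int, int]


def tile_windows(image_width: int, image_height: int, tile: int = 512,
                 stride: Optional[int] = None) -> Iterator[Window]:
    """Yield windows that cover the whole image, row by row.

    stride == tile gives non-overlapping tiles; a smaller stride gives overlap.
    Tiles on the right/bottom edge are clipped to the image border.
    """
    step = stride if stride is not None else tile
    if tile <= 0 or step <= 0:
        raise ValueError("tile and stride must be positive")
    for top in range(0, image_height, step):
        for left in range(0, image_width, step):
            yield (left, top,
                   min(tile, image_width - left),
                   min(tile, image_height - top))


# Hypothetical usage with pyvips (not run here; `path` is a placeholder):
# import pyvips
# slide = pyvips.Image.new_from_file(path)
# for left, top, w, h in tile_windows(slide.width, slide.height, tile=512):
#     patch = slide.crop(left, top, w, h)  # convert to an array as needed
```

Edge tiles are clipped rather than padded; if your model expects a fixed input size, padding them is an equally valid choice. How much memory a crop actually needs still depends on the file format and the library.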
what library is that?
What worked for me was downsizing the images (using transforms) for training and applying the same transformation to the test dataset, so the images are processed at 128x128 or 224x224 with a batch size of 1 or 2. It takes 8+ hours for the notebook to finish scoring. Hopefully, when they release a solution next week for handling the large images in the test set, things will improve. If you know of any other workaround, please let me know.
Also, I ran the notebook on CPU, not GPU; even with the above workaround, the GPU notebook failed every time. I guess it was due to the issue they mentioned: the test images are too large to fit in GPU memory.
https://www.kaggle.com/code/pjmathematician/ucbo-256-tiles-loading/notebook (a dataset of small patches of the images)
Hey guys. I just prepared my submission notebook and it ran successfully, but in the submission tab I always get the "Notebook Threw Exception" error. It's impossible to know where (or why) the error occurred. Do any of you know of abnormalities in the test images?
Hi everyone. How do you deal with those heavy images, just to get started?
I created a simple notebook that returns some patches of the original, huge image and saves them in the output folder. It runs for about 8 h, and then the output is available for download.
Currently using 512 by 512, and it takes around 7 hours. I think there's a trade-off between data size and model size
Have you tried setting batch size and number of workers to 1?
I could resolve it by using the thumbnail .png images of the WSI images in the test dataset!
In my opinion, the batch size should be the same for train and test.
And the number of workers depends on CPU capacity, not GPU.
How do you train on a tailed image? Is it possible to train on just a part of it? I think I need to train on the whole thing, but I don't have a clue how to do that. Is there any code I can refer to?
Hi guys, I'm facing this problem when I try to submit my notebook; any insights from you would be very helpful 🙂
In your notebook options you need to turn off the internet for submission.
thank you so much @opal sedge
What do you mean by tailed image?
I got validation and test accuracy of 1.00. I trained the model on CPU because my Kaggle GPU and TPU quotas were exhausted, and it took 1 hr 😔
Isn't it overfitting?
Yes, I'm getting a test accuracy of 0.85
But I don't know why I'm getting an error when submitting, a scoring error too
Does it say in the logs what the error is?
1 hr cpu training for that is pretty good
You trained on image patches?
Maybe there’s a problem with how you convert the solution set to patches
What image patches? I trained on thumbnails. The notebook runs fine and completes successfully, but the submission shows an error. What could be the problem in the code? The submission file is also in the right format.
That's weird
if you can't find an error message then idk how you'll debug
start with a dead simple solution that just guesses the same category for everything and build up from there until you find the part that breaks it
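A minimal sketch of that "guess the same category" baseline, assuming the test CSV has an `image_id` column and the submission needs `image_id,label` columns. The label set and the paths in the comments are assumptions to check against the competition's Data tab, and HGSC is an arbitrary choice of constant.

```python
import csv
import io
from typing import Iterable, List

# Assumption: UBC-OCEAN subtype labels; verify against the competition's Data tab.
LABELS = ("CC", "EC", "HGSC", "LGSC", "MC", "Other")


def read_test_ids(lines: Iterable[str]) -> List[str]:
    """Read the image_id column from test.csv (an open file or any iterable of lines)."""
    return [row["image_id"] for row in csv.DictReader(lines)]


def build_constant_submission(image_ids: Iterable[str], label: str = "HGSC") -> str:
    """Return submission CSV text that predicts the same label for every image."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["image_id", "label"])
    for image_id in image_ids:
        writer.writerow([image_id, label])
    return buf.getvalue()


# Hypothetical usage in a Kaggle notebook (paths are assumptions):
# with open("/kaggle/input/UBC-OCEAN/test.csv", newline="") as f:
#     ids = read_test_ids(f)
# with open("submission.csv", "w", newline="") as f:
#     f.write(build_constant_submission(ids))
```

If this scores, add back one real piece at a time (image loading, preprocessing, model) until the submission breaks.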
It says something about a hidden dataset. I think the hidden dataset has larger images, and I trained the model on thumbnail images and also normalised on thumbnail images; this might be the issue. I have to train on the actual data
Yea I guess just double check your code that loads the images
Did you train on the thumbnail data or the actual large image files?
I’m using this right now
Haven’t actually figured out how to process the test set though lol
@wind cairn in your code to load and use the test set, you can try treating the train image folder as if it was the test set and see if it breaks. Maybe your code works fine on the sample test set because it’s only one image but breaks if it has to load multiple
But what do u use for training, train_thumbnails or just the train_image folder? Even with a data loader, when I try to run on the train_image folder with TPU it gives me a memory-full error
Oh I meant thumbnails since that’s what you are planning to use as your model input
So run your submission code but use train thumbnails instead of test thumbnails just to see if it crashes or not
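One way to run that check, sketched under assumptions: `find_failures` is a hypothetical helper that runs any prediction function over a list of files and records errors instead of stopping at the first one, so you see every image that breaks the pipeline. The folder path and `my_predict` in the comments are placeholders.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple, Union

PathLike = Union[str, Path]


def find_failures(paths: Iterable[PathLike], predict: Callable[[PathLike], str]
                  ) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Run `predict` on every path, collecting predictions and failures separately.

    Unlike a plain loop, one bad image does not stop the run, so every file
    that breaks the pipeline is reported together with its exception.
    """
    predictions: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for path in paths:
        name = Path(path).name
        try:
            predictions[name] = predict(path)
        except Exception as exc:
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return predictions, failures


# Hypothetical usage (folder name and my_predict are assumptions):
# paths = sorted(Path("/kaggle/input/UBC-OCEAN/train_thumbnails").glob("*.png"))
# preds, failures = find_failures(paths, my_predict)
# for name, error in failures:
#     print(name, error)
```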
OK, so I need to use train_thumbnails for testing purposes, to see whether it crashes or not
The test image took 150 GB of RAM to load
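A rough way to see why full-size slides blow up RAM: a decoded image needs width × height × channels × bytes per value, before counting any temporary copies made while loading. The 50,000 × 50,000 size below is hypothetical, not the dimensions of the image mentioned above.

```python
def decoded_size_gb(width: int, height: int, channels: int = 3,
                    bytes_per_value: int = 1) -> float:
    """RAM needed for one fully decoded image array, in GB (1 GB = 10**9 bytes).

    bytes_per_value is 1 for uint8 pixels and 4 for float32 (e.g. after
    normalization). Temporary copies made while loading are not counted.
    """
    return width * height * channels * bytes_per_value / 1e9


# Hypothetical 50,000 x 50,000 RGB slide (not a real test image):
print(decoded_size_gb(50_000, 50_000))                     # 7.5
print(decoded_size_gb(50_000, 50_000, bytes_per_value=4))  # 30.0
```

Converting to float32 for normalization alone multiplies the footprint by four, which is one reason thumbnails or tiles fit where full slides don't.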
Yea use thumbnails
Do I need to use the train_image folder for training?
Even on TPU I'm getting a memory error with the dataloader
I will tell u
I used train_thumbnail for training
Then I tested it on the test image
Yea so just stick with using thumbnails
So test on test_thumbnails too
If u just use the whole images it’ll probably run out of memory
I tried to test it on the train_image folder to predict images, but no luck. I tried looping over the images one by one: no luck. I also tried sliding tile by tile over an image, but no luck 😔
Yea it’ll probably run out of memory if u try to run it on the full images
So how is it possible to run it on the hidden dataset? Is the hidden set made of thumbnails or the actual high-resolution images?
The hidden set has both full images and thumbnails I think?
So if you load the test_thumbnails folder and submit, then in the hidden submission run it will replace the single image there with the hidden dataset's thumbnails
I tried predicting on train_thumbnails and got no memory errors,
But when I submit the notebook, the submission runs and then fails at scoring. What could be the problem?
Can u post the notebook? It’s pretty hard to tell without an error message
Yes sure
I'll DM you
https://www.kaggle.com/code/aritrag/kerascv-train-and-infer-on-thumbnails
Just posting it here in case it helps people get started with KerasCV.
hello!
is there any benchmark available on how long it takes to read all the train/test images?
Hi everyone, when I submit my notebook I get the error "Notebook Out of Memory: Your notebook requested more memory (RAM) than is available". With some research, I discovered the pyvips library, which enables you to compress images.
I would like to know how to use pyvips to compress the train images so as to reduce memory usage.
Is Kaggle’s discussion search broken?
There is a guide here, take a look:
https://www.kaggle.com/code/aliabbasi/ubc-eda-pyvips-in-offline-mode
Help me.
For some reason, code submission fails.
Hello, please help me.
The prediction submission fails with the error "Submission score error". The submission file is created by the image-slicing code; it fills all the image_id rows with labels and writes the output file "submission.csv". Look at this.
Anyway, I'm sharing my notebook. Thanks for taking a look. https://www.kaggle.com/code/mdquilindo/ubc-submit-large-images
Hi everybody,
How are you guys handling the long running time of the submission notebook due to large images? Any recommendations or sample notebooks to look at?
My submission times out after 12 hours of running; apparently it doesn’t run fast enough
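A back-of-the-envelope check before submitting, assuming you time a few images yourself: multiply the per-image time by the number of test images and compare it with the 12-hour limit mentioned above. The hidden test set size isn't given in this thread, so `n_images` is yours to fill in, and the 0.8 safety factor is an arbitrary margin.

```python
def projected_runtime_hours(seconds_per_image: float, n_images: int,
                            overhead_seconds: float = 0.0) -> float:
    """Total wall-clock estimate from a per-image timing measured on a few images."""
    return (seconds_per_image * n_images + overhead_seconds) / 3600


def max_seconds_per_image(n_images: int, limit_hours: float = 12.0,
                          safety: float = 0.8) -> float:
    """Per-image budget that keeps the whole run under safety * limit_hours."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return limit_hours * 3600 * safety / n_images
```

For example, 36 s per image over 1,000 images projects to 10 hours, and a 1,000-image run with the default margin allows about 34.6 s per image. Time images of different sizes, since full slides vary a lot.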
welcome, what is your leaderboard score?
Yeah I'm looking for
Just joined the group! Sorry if this was asked before, but is there a resized version of the dataset hosted somewhere that is lighter in size? I just cannot download about a TB of raw data unfortunately :/ (Thx!)
https://www.kaggle.com/datasets/aifahim/ubc-ocean-jpeg-compress-datasetgunes-approach has the full-sized images converted to JPEG for a download size of 17 GB.
But doesn't it lose quality? Apparently there are tiny features that indicate which class the image belongs to
Yes, there is some lost quality due to resizing and saving as a JPEG image at 80%. The reason to use the JPEG images is to be able to download the data and test out various ideas locally without having to have an Internet connection.
Gives 404
I just checked that link and it appears to be OK. The page should have a button to download a zip file which is stored on a Google service. I didn't create the dataset so if you still have problems accessing it, report the problem to Kaggle by creating a new Discussion topic in this competition.
Somehow now it worked, thanks.
Going to try a weird technique I came up with
Hello everyone, I have had issues with submission for the past three days. This is my first challenge on Kaggle. I really need your help, if anyone here can help
hey @orchid flame , what is the issue with the submission?
Even now, I have just submitted and I get the error: Notebook Threw Exception. Very frustrating
Here is a screenshot
Hi Mukesh, I happen to be getting the same error. Did you solve yours?
Nope
Hi @orchid flame, here's the approach that worked for me:
1. Save models, upload, and create a Kaggle dataset:
   - Save your trained models.
   - Upload them to Kaggle and organize them into a dataset.
2. Create a new notebook for inference in the competition:
   - Develop a new notebook specifically for running inference in the competition.
3. Add your model dataset as input:
   - Include your model dataset as an input to the notebook.
4. Write code to load the models and set up your inference pipeline:
   - Write code to load your pre-trained models efficiently.
   - Build a robust inference pipeline that matches the competition's requirements.
5. Optimize preprocessing steps for computational efficiency:
   - Make sure your preprocessing steps are efficient.
   - Minimize the time spent on preprocessing to improve overall runtime.
6. Switch to CPU and submit:
   - Consider switching to CPU for final submissions to meet the competition constraints.
   - Validate and fine-tune your code on CPU to ensure compatibility before submitting your results.
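A sketch of the "robust inference pipeline" step under one assumption of mine: if a single image raises an error, it gets a fixed fallback label instead of crashing the whole notebook (which otherwise surfaces only as "Notebook Threw Exception"). `predict_one` stands in for your own load, preprocess, and predict function, and the fallback label is arbitrary.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def predict_all(image_ids: Iterable[str], predict_one: Callable[[str], str],
                fallback_label: str = "HGSC"
                ) -> Tuple[List[Tuple[str, str]], Dict[str, object]]:
    """Predict every image, never letting one failure abort the whole run.

    Returns (rows, stats): rows are (image_id, label) pairs in input order;
    stats records which ids fell back and the slowest per-image time.
    """
    rows: List[Tuple[str, str]] = []
    failed: List[str] = []
    slowest = 0.0
    for image_id in image_ids:
        start = time.perf_counter()
        try:
            label = predict_one(image_id)
        except Exception:  # Python-level errors only; a killed kernel is not caught
            label = fallback_label
            failed.append(image_id)
        slowest = max(slowest, time.perf_counter() - start)
        rows.append((image_id, label))
    return rows, {"failed": failed, "slowest_seconds": slowest}
```

The trade-off: a fallback hides bugs, so check `stats["failed"]` while testing on train thumbnails. It also only catches Python exceptions; if the notebook exceeds its RAM limit, the kernel is typically killed and nothing inside the loop can recover.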
Thanks @heavy cloud. This sounds solid. I'm going to start on it tomorrow and will report back with the results
Hello everyone,
I want to submit my UBC Ovarian Cancer Subtype notebook, but there's a problem with the submission: it fails after about 30 seconds and shows the message "Notebook Threw Exception".
That's a really short time, so I think it definitely isn't related to the submission itself.
Please help, just 2 days left!
Hey @heavy cloud, I have just submitted my notebook after following your steps. Though my score is not that good, I'm happy, as this is my first competition. Thank you very much 😆
@orchid flame There's always a first try. Mine didn't improve either, although my metrics for each model were good; the final pipeline on the test set wasn't good, as reflected on the LB. Apparently that means the models predict only some classes well while doing poorly on the others. I'm waiting to see the methods others used so I can learn from them.
Indeed, this is a great challenge altogether. But thank you for the well-articulated directions you gave me. I found them straightforward, clear, and precise.
Dear @heavy cloud, I have uploaded my models through Data in the input section,
but when I submit in offline mode, it can't load the model,
and when I save and run it fails, but when the internet is on (I mean the notebook's internet) it runs and saves successfully.
This is my first submission, please help
@indigo onyx You have to create a dataset from the models you have trained and saved. First download your saved models to your local machine, then in your Kaggle notebook click "Add Data" and upload them as a dataset. Then add it to the input section as you mentioned. Next, load the models from your new dataset in the input and it will work
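With the internet off, anything that downloads weights at runtime fails, so the model has to come from the attached dataset. A minimal sketch for locating it, assuming the dataset is mounted under `/kaggle/input` (where Kaggle places notebook inputs) and the weights use one of the listed extensions; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

# Assumed weight-file extensions (PyTorch / Keras); adjust to what you saved.
MODEL_SUFFIXES = (".pt", ".pth", ".h5", ".keras")


def filter_model_paths(paths: Iterable[Path],
                       suffixes: Sequence[str] = MODEL_SUFFIXES) -> List[Path]:
    """Keep only paths whose extension looks like saved model weights."""
    return sorted(p for p in paths if p.suffix.lower() in suffixes)


def find_model_files(input_root: str = "/kaggle/input") -> List[Path]:
    """Search the attached input datasets for model files (no internet needed)."""
    root = Path(input_root)
    if not root.is_dir():
        return []
    return filter_model_paths(p for p in root.rglob("*") if p.is_file())


def require_model_file(input_root: str = "/kaggle/input") -> Path:
    """Return the first model file found, or fail with a hint about the dataset."""
    found = find_model_files(input_root)
    if not found:
        raise FileNotFoundError(
            f"no model file under {input_root}; is the model dataset added as input?")
    return found[0]

# Hypothetical loading once the path is known (not run here):
# torch.load(require_model_file(), map_location="cpu")   # PyTorch
# keras.models.load_model(require_model_file())          # Keras
```

Failing early with a clear message is easier to debug locally than a generic "Notebook Threw Exception" after submission.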
thanks i will try
thanks a lot it worked for me
Congratulations to Team "bootstrap" for winning this competition.🏅
But what is the difference between the public and private leaderboards?