CNN not good at testing new data | Learn AI Together | Page 1

atomic wedge Dec 1, 2022, 5:04 AM

Why does my CNN model perform well on training and validation, but very badly when tested on new data?

what's my dataset?
People lying in one of three sleeping positions: Supine, Left Lateral, and Right Lateral. The background of each image is black (to reduce complexity).

Does my model overfit?
I don't think so. Please refer to the images (model accuracy and model loss).

Did I augment my data?
yes

What's my training and testing data?
I have approx. 4.5k images. I used ImageDataGenerator for augmenting, shuffling, and splitting them into training and validation sets.

obsidian canyon Dec 1, 2022, 6:46 PM

What you'd want to check is how different your test set is from your train+val set. It's actually possible to overfit on your train+val set when it is not representative of the test set or real-world data.

atomic wedge Dec 2, 2022, 5:16 AM

The test set is not very different from my train+val data. I have a black background in all of the datasets. Not really sure why the model isn't picking up the patterns on the test set.

obsidian canyon Dec 2, 2022, 10:25 AM

What does the classification report look like?

atomic wedge Dec 2, 2022, 2:06 PM

Oh, about that: I forgot to save the output since I'm constantly tinkering with the model.

obsidian canyon Dec 2, 2022, 2:08 PM

atomic wedge Oh, about that I forgot to save the output since I'm constantly tinkering with t...

Debugging is a lot of analysis. Having some information on the metrics and parameters would be really helpful
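
For readers following along, a classification report boils down to per-class precision, recall, and F1. The sketch below computes them from two label lists in plain Python; the class names follow the thread, while the labels and the `per_class_report` helper are made up for illustration and are not the poster's actual results.

```python
from collections import Counter

# Class names taken from the thread; the label lists below are invented.
CLASSES = ["supine", "left_lateral", "right_lateral"]

def per_class_report(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1, support)} from two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    report = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1, tp[c] + fn[c])
    return report

# Hypothetical test labels and predictions.
y_true = ["supine", "supine", "left_lateral", "left_lateral",
          "right_lateral", "right_lateral"]
y_pred = ["supine", "left_lateral", "left_lateral", "left_lateral",
          "right_lateral", "left_lateral"]

report = per_class_report(y_true, y_pred)
for name, (p, r, f1, n) in report.items():
    print(f"{name:14s} precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
```

A class with high recall but low precision (here `left_lateral`) is one the model over-predicts, which a single accuracy number would hide.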

atomic wedge Dec 2, 2022, 2:10 PM

this is my latest classification report

this is the result from the training:

i know it overfits, but that's the best result so far based on the classification report

The model's evaluation on the test set is [0.705134391784668, 0.7232142686843872]
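
A note on reading that pair: assuming the model was compiled with a single accuracy metric (the compile call is not shown in the thread), Keras `model.evaluate` returns the loss first and the metric second, so this would be a loss of about 0.71 and an accuracy of about 72%. The sketch below, with made-up probabilities and a hypothetical helper name, shows how the two numbers are computed and why the loss is not a percentage.

```python
import math

def cross_entropy_and_accuracy(probs, labels):
    """Mean categorical cross-entropy and accuracy for integer class labels."""
    # Loss: negative log of the probability assigned to the true class.
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    # Accuracy: fraction of samples whose highest-probability class is correct.
    correct = [max(range(len(p)), key=p.__getitem__) == y
               for p, y in zip(probs, labels)]
    return sum(losses) / len(losses), sum(correct) / len(correct)

# Invented softmax outputs for four images and three classes.
probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6], [0.5, 0.3, 0.2]]
labels = [0, 0, 2, 1]

loss, acc = cross_entropy_and_accuracy(probs, labels)
print(f"loss={loss:.3f} accuracy={acc:.2f}")
```

The loss is unbounded above and also penalizes under-confident correct predictions, so it is best compared between runs rather than read as a score out of 100.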

obsidian canyon Dec 2, 2022, 9:25 PM

When you say your train and test sets are not much different, what's the distribution of each class? Do you have imbalanced data?
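
One way to answer that question is to compare the class proportions of each split side by side. This is a minimal sketch with invented label lists; `class_distribution` is a hypothetical helper, and the counts are not the poster's data.

```python
from collections import Counter

def class_distribution(labels):
    """Return {class: fraction of samples} for one split."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Invented label lists: a balanced training set and a skewed test set.
train_labels = ["supine"] * 100 + ["left_lateral"] * 100 + ["right_lateral"] * 100
test_labels = ["supine"] * 60 + ["left_lateral"] * 25 + ["right_lateral"] * 15

train_dist = class_distribution(train_labels)
test_dist = class_distribution(test_labels)
for c in sorted(train_dist):
    print(f"{c:14s} train={train_dist[c]:.2f} test={test_dist.get(c, 0.0):.2f}")
```

If the proportions differ a lot between splits, overall accuracy on the test set is dominated by whichever class is most common there.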

solar flare Dec 2, 2022, 9:48 PM

atomic wedge the test set is not very different on my train+val data. i have black background...

One common problem in CV is data leaking between the different datasets. Even if the images are not exactly the same, if the same person in the same bed in a slightly different position appears in both train and validation but is absent from the test data (or worse, production!), there will be serious differences in performance.

For a little experiment you could check this yourself, but as you scale up this is a great example of why tracking data provenance is so important.
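
Such a check could look like the sketch below, assuming every image can be tagged with an ID for the person in it. The `(subject_id, filename)` pairs, the file names, and the `subject_overlap` helper are all hypothetical.

```python
def subject_overlap(split_a, split_b):
    """Return the subject IDs that appear in both splits."""
    return {s for s, _ in split_a} & {s for s, _ in split_b}

# Hypothetical (subject_id, filename) records for each split.
train = [("p01", "p01_supine_001.png"), ("p02", "p02_left_001.png")]
val = [("p01", "p01_supine_002.png"), ("p03", "p03_right_001.png")]
test = [("p04", "p04_supine_001.png")]

print("train/val overlap: ", sorted(subject_overlap(train, val)))
print("train/test overlap:", sorted(subject_overlap(train, test)))
```

A non-empty train/val overlap combined with an empty train/test overlap is the pattern that makes validation metrics look better than test metrics.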

atomic wedge Dec 2, 2022, 10:03 PM

obsidian canyon When you say your train and test set are not much different, what's the distribu...

The training set has an equal number of images for every class.

atomic wedge Dec 2, 2022, 10:09 PM

solar flare one common problem in CV is leaking data between the different datasets. Even if...

So it's better to have most (if not all) of the people doing all the positions in the train and val data?

Not sure if I understand your suggestion correctly.

solar flare Dec 2, 2022, 10:11 PM

It’s like this: if you have multiple photos of the same person (especially frames from a video, for example), they should all be in the same set.

Otherwise there is leakage the model can take advantage of, and the metrics will not represent performance on unseen data.
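
Keeping each person in a single set means splitting by person rather than by image. The sketch below does that in plain Python; the subject IDs, file names, and `split_by_subject` helper are hypothetical. scikit-learn's `GroupShuffleSplit` offers the same idea if that library is available.

```python
import random

def split_by_subject(samples, val_fraction=0.2, seed=0):
    """Split (subject_id, item) pairs so each subject lands in exactly one set."""
    subjects = sorted({s for s, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out whole subjects, not individual images.
    n_val = max(1, round(len(subjects) * val_fraction))
    val_subjects = set(subjects[:n_val])
    train = [x for x in samples if x[0] not in val_subjects]
    val = [x for x in samples if x[0] in val_subjects]
    return train, val

# Hypothetical data: 10 people with 5 frames each.
samples = [(f"p{i:02d}", f"p{i:02d}_frame{j}.png")
           for i in range(10) for j in range(5)]

train, val = split_by_subject(samples)
print(len(train), "training images,", len(val), "validation images")
print("shared subjects:", {s for s, _ in train} & {s for s, _ in val})
```

With this kind of split the validation score is measured on people the model has never seen, so it should track test performance more closely.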

atomic wedge Dec 2, 2022, 10:14 PM

solar flare It’s like if you have multiple photos of the same person (especially frame from ...

I do have that in my train and val sets. The testing set is images of other people, whom I haven't trained on, doing sleeping positions.

solar flare Dec 2, 2022, 10:18 PM

atomic wedge I do have that on my train and val set. The testing set is images from other peo...

That is probably the cause then!

atomic wedge Dec 2, 2022, 10:19 PM

But I'm confused. Isn't it the CNN's job to identify the position in unseen data?

solar flare Dec 2, 2022, 10:20 PM

Well yeah but if you have frames from the same video or something else very similar to the train set in your validation set, that metric will not reflect the accuracy on unseen data

It’s a much bigger problem if this accidentally leaks into your test set and you then deploy the model without realizing how poorly it will generalize.

atomic wedge Dec 2, 2022, 10:21 PM

hmmm got it

But I tried training the model on the training set and validating on unseen data. Will this method give the model the ability to generalize better to more unseen data?

solar flare Dec 2, 2022, 11:40 PM

It won’t improve your model as much as improve your analysis of the model.

Again, since at least nothing is leaking into the test data, you should still get a better idea of how it generalizes.

atomic wedge Dec 3, 2022, 2:07 PM

Hello, I have one last question. I trained the model earlier and it's by far the best model I have. On the test data it got a loss of 0.60 and 81% accuracy (based on model.evaluate). Is this an acceptable result to continue with more real data?

solar flare Dec 3, 2022, 2:55 PM

This is where you need domain knowledge/context to answer. Do you have any similar model to compare it to? Do you know how they performed with a similar amount of data?

It is hard to say for sure, but since you are getting mid-to-high 90%s on training, the model is probably at least somewhat sound. So yes, I think the best way to close the gap on your test data could be gathering more training data. If adding more data does not help, then you should reassess.

atomic wedge Dec 3, 2022, 3:12 PM

Maybe I'll just try more variations built on this base model, which has the best performance so far, and then compare all the models using the test data.

thanks!

solar flare Dec 3, 2022, 3:39 PM

atomic wedge maybe i just try to more different models using this base model with the current...

Yep! It's usually a good idea to alternate between working on the data and working on the model, while constantly monitoring your baseline metrics, if labeling/generating more data is something you can/will be doing.
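
Monitoring baseline metrics can be as simple as recording each run's test results and reporting the change against a fixed baseline. This sketch uses the two results quoted earlier in the thread (rounded), assuming both came from the same test set; the run names and the `delta_vs_baseline` helper are hypothetical.

```python
# Test-set results quoted in the thread, rounded; run names are made up.
runs = {
    "baseline": {"test_loss": 0.705, "test_acc": 0.723},
    "later_run": {"test_loss": 0.60, "test_acc": 0.81},
}

def delta_vs_baseline(baseline, candidate):
    """Return the per-metric change of a candidate run relative to the baseline."""
    return {name: candidate[name] - baseline[name] for name in baseline}

delta = delta_vs_baseline(runs["baseline"], runs["later_run"])
for name, change in delta.items():
    print(f"{name}: {change:+.3f}")
```

Logging these numbers after every data or model change makes it clear which kind of change actually moved the test metrics.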
