# General questions about ML Classification

kind arrow
#

Hello everyone! I have asked myself some questions about the classification aspect and would appreciate some help:

Let's say we have a labeled dataset with some features and two classes, but there is no real (significant) difference between the two classes. My first question is whether ML algorithms (e.g. NNs) would still be able to "detect a difference", i.e. perform the classification task with sufficient accuracy, even though conceptually/logically it shouldn't really be possible. As far as I know, training an NN can be seen as an optimization problem with regard to the cost function, so would it be possible to just keep optimizing and get a good accuracy, even though the result makes no sense in reality? I hope this is understandable haha

My second question concerns those accuracy scores. Can we expect them to be lower on such a nonsense classification, essentially showing us that this is not going to work, since there just isn't enough difference among the data to do proper classification, or can it still end up high enough, because minimizing a cost function can always be pushed further, giving good scores?
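The scenario in these first two questions can be tried out directly. What follows is a minimal sketch, not a definitive experiment: the features are pure Gaussian noise, the labels are coin flips, and a 1-nearest-neighbor classifier stands in for a neural network as a simple model flexible enough to memorize its training set. All names (`make_noise_dataset`, `nearest_neighbor_predict`, the sample sizes) are made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_noise_dataset(n_samples, n_features=5):
    """Features are Gaussian noise; labels are coin flips unrelated to the features."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    best_index = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best_index]


def accuracy(train_X, train_y, X, y):
    """Fraction of samples in (X, y) that the 1-NN rule labels correctly."""
    hits = sum(
        nearest_neighbor_predict(train_X, train_y, x) == label
        for x, label in zip(X, y)
    )
    return hits / len(y)


train_X, train_y = make_noise_dataset(200)
test_X, test_y = make_noise_dataset(1000)

# Scored on the data it memorized, the model looks perfect.
train_acc = accuracy(train_X, train_y, train_X, train_y)
# Scored on fresh data, there is nothing real to generalize from.
test_acc = accuracy(train_X, train_y, test_X, test_y)

print(f"training accuracy: {train_acc:.3f}")
print(f"held-out accuracy: {test_acc:.3f}")
```

Under these assumptions the training accuracy comes out perfect, because every training point is its own nearest neighbor, while the held-out accuracy should land near 0.5, i.e. chance level for two balanced classes. So the optimization can be "pushed" on the training data, but the score on data the model has not seen is what reveals that there is nothing to learn.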

My last question is about what ML can tell us about the data at hand in general. Regardless of whether the classes really differ (i.e. whether proper classification is possible), IF we see our ML algorithm achieve good classification performance and high accuracy, does this allow us to conclude that the data of the two classes do indeed differ? So, if I have two classes, healthy and sick, and features like heart rate, and the algorithm classifies with very good accuracy, can we conclude from this alone that healthy and sick people show differences in their heart rate?
(I know that this would normally be done differently, e.g. with a t-test, but I am just curious about what ML alone can or cannot tell us, i.e. its limitations when it comes to interpreting results.)

#

Disclaimer:

I am not formally educated in ML, so I hope these questions aren't so dumb that an intro ML class would have answered them lol. Thanks in advance for any answers tho!

keen lava
#

Sometimes models capture information or insights that are not readily apparent to the human eye. Some forms of data analysis depend on training models on the data and observing what they learned.

#

If the accuracy is good on data the model did not see during training, then there is a difference; you just can't see it without doing extensive data analysis.
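One way to check whether a good accuracy reflects a real difference rather than luck is a label permutation test: shuffle the labels many times, retrain each time, and see how often the shuffled data scores as well as the real labels. The sketch below is only an illustration under stated assumptions. The heart-rate numbers are synthetic and invented for this example (they are not real medical data), the classifier is a simple nearest-centroid rule rather than a neural network, and all function names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def make_heart_rate_data(n_per_class=200):
    """Synthetic, made-up heart rates: label 0 = healthy, label 1 = sick."""
    healthy = [(rng.gauss(70.0, 8.0), 0) for _ in range(n_per_class)]
    sick = [(rng.gauss(85.0, 8.0), 1) for _ in range(n_per_class)]
    data = healthy + sick
    rng.shuffle(data)
    return [x for x, _ in data], [y for _, y in data]


def held_out_accuracy(values, labels, n_train):
    """Fit a nearest-centroid rule on the first n_train samples, score on the rest."""
    class0 = [x for x, y in zip(values[:n_train], labels[:n_train]) if y == 0]
    class1 = [x for x, y in zip(values[:n_train], labels[:n_train]) if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    hits = 0
    for x, y in zip(values[n_train:], labels[n_train:]):
        predicted = 0 if abs(x - mean0) <= abs(x - mean1) else 1
        hits += predicted == y
    return hits / (len(values) - n_train)


values, labels = make_heart_rate_data()
n_train = len(values) // 2

# Accuracy with the real labels, measured on samples not used for fitting.
observed_acc = held_out_accuracy(values, labels, n_train)

# Accuracy when the link between heart rate and label is destroyed by shuffling.
n_permutations = 200
at_least_as_good = 0
for _ in range(n_permutations):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    if held_out_accuracy(values, shuffled, n_train) >= observed_acc:
        at_least_as_good += 1

# Share of shuffled runs that match or beat the real one (with +1 smoothing).
p_value = (at_least_as_good + 1) / (n_permutations + 1)

print(f"held-out accuracy with real labels: {observed_acc:.3f}")
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value here says that the held-out accuracy would be very unlikely if the labels had nothing to do with the feature. It does not say why the classes differ or which feature matters in a model with many features, so this kind of check supports "there is a difference" without replacing the follow-up analysis.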