#Mathematical formulation of predicted probabilities into a Score

bright junco
#

I'm currently working on a machine learning model that predicts the presence of a disease given a sample. The model outputs the probabilities associated with the classes (positive, negative, suspicious). I want to create a sort of score which has the following properties:

  1. If the score is above a particular threshold (let's say t1), the predicted class is interpreted as positive.
  2. If the score is between two thresholds (t1 and t2, where t2 < t1), the predicted class is interpreted as suspicious.
  3. If the score is below the threshold t2, the predicted class is interpreted as negative.
  4. The score should preferably be between 0 and 1.

I'm also not sure how to figure out the threshold values t1 and t2 for this case.

Also, from what I have found on the internet, there is an option to give weights to the different cases, thereby reducing type 1 errors. Is this advisable? It feels like manual intervention rather than improving the model's performance directly.

I'm a newbie in machine learning, so please be patient in case my doubt is too basic.

Thanks a lot >.<

gilded wyvern
#

t1 and t2 are just subjective judgements (perhaps not even really required). i am unsure what u mean by "weights to different cases"

tepid wing
#

I assume you have a dataset with classes [positive, negative, suspicious]. The choice of a machine learning model plays a crucial role. For instance, if you go for K-Nearest Neighbors (KNN), it depends on feature values mapped in an n-dimensional feature space. There's no concept of t1 or t2 thresholds; instead, it relies on the nearest neighbors for classification.

On the other hand, if you opt for a Neural Network (NN) with softmax output, the model provides probabilities for each class. In such scenarios, the learned weights determine the probability assigned to each class, and you typically choose the label with the maximum probability.
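
As a rough illustration of the softmax-then-argmax step described above, here is a minimal sketch in plain Python. The class order in `CLASSES` and the function names are assumptions made for the example, not something taken from the original model.

```python
import math

# Assumed class order for this sketch; the real model may order them differently.
CLASSES = ("negative", "suspicious", "positive")


def softmax(logits):
    """Turn raw network outputs (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable class together with all class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


label, probs = predict_label([0.1, 0.2, 2.0])
print(label, [round(p, 3) for p in probs])
```

Note that picking the maximum probability involves no t1/t2 at all; thresholds only come into play if the probabilities are first collapsed into a single score.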

However, if you're determined to stick to a specific approach, you could use feature nodes mapped to a single neuron with sigmoid activation. Record the min and max of the output for each class, although this doesn't guarantee that the class ranges won't overlap, because in most real-world scenarios there are outliers in abundance.

gilded wyvern
#

i wouldn't really even do that cuz u can just.. use a single logit and output the NN's confidence score, then decide whatever u would like t1/t2 to be. better yet, calibrate it properly so u can report more accurate confidence scores, and just use that
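
A minimal sketch of the single-logit idea, assuming the network emits one raw logit per sample. The threshold values `t1=0.7` and `t2=0.3` are placeholders picked only for illustration, and the choice to treat a score exactly equal to a threshold as the higher class is also an assumption.

```python
import math


def sigmoid(logit):
    """Squash a single logit into a score in (0, 1), in a numerically stable way."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def interpret(score, t1=0.7, t2=0.3):
    """Map a score in [0, 1] to a class using two thresholds with t2 < t1.

    The default thresholds are illustrative placeholders, not recommended values.
    """
    if not 0.0 <= t2 < t1 <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= t2 < t1 <= 1")
    if score >= t1:
        return "positive"
    if score >= t2:
        return "suspicious"
    return "negative"


for logit in (3.0, 0.0, -3.0):
    score = sigmoid(logit)
    print(f"logit={logit:+.1f} score={score:.3f} -> {interpret(score)}")
```

In practice t1 and t2 would probably be chosen on a held-out validation set according to how costly false positives and false negatives are, ideally after the score has been calibrated.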

gilded wyvern
#

There is also (ordered) probit if u want to model ordinal variables, e.g. negative < suspicious < positive
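
For reference, a minimal sketch of how an ordered probit turns one latent score into three ordered class probabilities. It assumes the ordering negative < suspicious < positive, and the latent value `z` and cutpoints `c1`, `c2` are hypothetical inputs that would normally be fitted from data.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(z, c1, c2):
    """Class probabilities for a latent score z and cutpoints c1 < c2.

    P(negative)   = Phi(c1 - z)
    P(suspicious) = Phi(c2 - z) - Phi(c1 - z)
    P(positive)   = 1 - Phi(c2 - z)
    """
    if not c1 < c2:
        raise ValueError("cutpoints must satisfy c1 < c2")
    p_neg = norm_cdf(c1 - z)
    p_susp = norm_cdf(c2 - z) - p_neg
    p_pos = 1.0 - norm_cdf(c2 - z)
    return {"negative": p_neg, "suspicious": p_susp, "positive": p_pos}


probs = ordered_probit_probs(z=0.0, c1=-1.0, c2=1.0)
print({name: round(p, 3) for name, p in probs.items()})
```

Here the cutpoints play a role similar to t2 and t1, except that they are learned together with the model instead of being picked by hand.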

#

Tho idk if it's required here and idk if pytorch has that

#

Idk if even sklearn has it

bright junco