#What is the derivative of the softmax function with respect to the logits?

steady lark
#

Hi, I’m still fairly new to machine learning. I’ve learned the foundations of ML and of neural networks, and as practice I am implementing my own neural network library from scratch, similar to TensorFlow but of course not as advanced. Right now I am trying to work out the derivative of the activation value output by the softmax function with respect to the logits. I tried to derive it myself, but my result does not match the derivatives I found online. I’d like to know whether my solution is correct and, if it isn’t, where my mistake is or how to get to the actual solution. Thanks🙏.
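
For reference, here is a minimal sketch of the standard result, assuming the usual definition s_i = exp(z_i) / sum_k exp(z_k). Under that definition the derivative is a full matrix (the Jacobian): d s_i / d z_j = s_i * (1 - s_i) when i = j, and -s_i * s_j when i != j. A common slip is to keep only the diagonal term and drop the off-diagonal ones. The function names `softmax_jacobian` and `numerical_jacobian` are made up for illustration; the finite-difference check is one way to test a hand derivation, not the only one.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)  # subtracting the max does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """Analytic Jacobian: J[i][j] = d s_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numerical_jacobian(z, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        s_plus, s_minus = softmax(z_plus), softmax(z_minus)
        for i in range(n):
            J[i][j] = (s_plus[i] - s_minus[i]) / (2 * eps)
    return J

logits = [1.0, 2.0, 0.5]  # arbitrary example values
analytic = softmax_jacobian(logits)
numeric = numerical_jacobian(logits)
max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(3) for j in range(3))
print("largest difference between analytic and numeric:", max_err)
```

If a hand-derived formula disagrees with `numerical_jacobian` by much more than rounding error, the derivation is the likely culprit. Each row of the Jacobian also sums to zero, which is a quick sanity check.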

#

Also, on a side note, how should I store the activation function? I have a neuron class and a layer class. To be honest, I don’t see the point of a neuron class, but I’m keeping it in case individual neurons ever need to be modified. Should the activation function live at the neuron level, meaning it is a member of the neuron class and each neuron’s output is passed through the activation function given to that neuron? Or at the layer level, meaning the outputs of all the neurons are computed first and then passed through one activation function assigned to the layer?

#

Putting the activation function at the layer level means the neurons in a layer cannot each be assigned a different activation function. I don’t know why someone would want that, but it might be helpful.
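
One way the layer-level option could look, as a rough sketch rather than a recommended design: each neuron returns only its pre-activation (weighted sum plus bias), and the layer applies one activation to the whole list of pre-activations. The class and method names (`Neuron`, `Layer`, `pre_activation`, `forward`) are assumptions made up for this example, not taken from any library.

```python
import math

def relu(values):
    """Element-wise ReLU over a list of pre-activations."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Softmax over the whole list; needs every pre-activation at once."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

class Neuron:
    """Holds weights and a bias; returns the pre-activation only."""
    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def pre_activation(self, inputs):
        return sum(w * x for w, x in zip(self.weights, inputs)) + self.bias

class Layer:
    """Owns the activation and applies it to all pre-activations together."""
    def __init__(self, neurons, activation):
        self.neurons = list(neurons)
        self.activation = activation  # a function from list to list

    def forward(self, inputs):
        z = [n.pre_activation(inputs) for n in self.neurons]
        return self.activation(z)

# Tiny two-layer example with hand-picked weights.
hidden = Layer([Neuron([1.0, -1.0]), Neuron([0.5, 0.5], bias=-2.0)], relu)
output = Layer([Neuron([1.0, 0.0]), Neuron([0.0, 1.0])], softmax)
probs = output.forward(hidden.forward([2.0, 1.0]))
print(probs)  # two probabilities that sum to 1
```

With this layout the individual neurons can still be inspected or modified, while activations that depend on the whole layer, such as softmax, have a natural place to live.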

noble sedge
#

Maybe some activation functions can be coded at the neuron level, but not softmax, since it needs the outputs of every neuron in the layer.
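
A small sketch of that difference, with made-up values: ReLU gives the same answer whether it is applied neuron by neuron or to the whole list, whereas softmax applied to one value at a time always returns 1.0, because each value is normalised only against itself.

```python
import math

def relu_scalar(x):
    """ReLU depends only on its own input, so it works per neuron."""
    return max(0.0, x)

def softmax(values):
    """Softmax normalises over every value it is given."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, -1.0, 0.5]  # pre-activations of three neurons in one layer

per_neuron_relu = [relu_scalar(v) for v in z]       # fine per neuron
per_neuron_softmax = [softmax([v])[0] for v in z]   # degenerate: every entry is 1.0
layer_softmax = softmax(z)                          # proper distribution over the layer

print(per_neuron_relu)
print(per_neuron_softmax)
print(layer_softmax)
```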

steady lark
#

Alright, thank you so much, I’ll check it out.