#Why is my simple backpropagation incorrectly calculating the gradients?

slow urchin
#

I am trying to use Julia to create a simple and readable backpropagation implementation (mostly following https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/ ), but it trains really slowly (it takes more steps to decrease the error, and bottoms out at a higher error) compared to a supposedly identical reference implementation that uses Zygote's AD to calculate the gradients.

Here is the from scratch backprop: https://github.com/AlistairKeiller/MLjulia/blob/main/simple.jl
And here is the zygote implementation: https://github.com/AlistairKeiller/MLjulia/blob/main/zygote.jl

Here is a run from custom backprop:
error: 1.0007393
error: 0.24435842
error: 0.21032272
error: 0.19531001
error: 0.18624645
error: 0.18023509
error: 0.17640144
error: 0.17355171
error: 0.17135184
error: 0.16990495

Here is a run from zygote backprop:
error: 0.9004709
error: 0.16243693
error: 0.12893249
error: 0.11488452

What am I doing wrong?

topaz drum
#

first off, correct me if I'm wrong, since I don't know the inner workings of backpropagation yet

#

zygote does rand(out) / sqrt(in)

#

you do rand(out) / sqrt(out), so I'd assume you're initialising your layers differently than the zygote version does
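
To make the difference concrete, here is a minimal sketch in Python (the language of the linked tutorial) rather than Julia. It assumes uniform random weights divided by the square root of either the input count or the output count. The helper `init_layer`, the layer sizes, and the seed are hypothetical and are not taken from the linked files.

```python
import math
import random

def init_layer(n_in, n_out, scale_by, seed=0):
    """Hypothetical helper: n_out x n_in uniform [0, 1) weights divided by sqrt(scale_by)."""
    rng = random.Random(seed)
    return [[rng.random() / math.sqrt(scale_by) for _ in range(n_in)]
            for _ in range(n_out)]

n_in, n_out = 64, 4  # assumed layer sizes, chosen only for illustration

# Same random draws (same seed), different scaling factor.
w_fan_in = init_layer(n_in, n_out, scale_by=n_in)    # like rand(out) / sqrt(in)
w_fan_out = init_layer(n_in, n_out, scale_by=n_out)  # like rand(out) / sqrt(out)

# Every weight differs by the same factor, sqrt(n_in / n_out).
ratio = w_fan_out[0][0] / w_fan_in[0][0]
print(ratio)
```

With these assumed sizes the two schemes start from weights that differ by a factor of sqrt(64 / 4) = 4, which is enough on its own to make two otherwise identical training runs behave differently.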

slow urchin
#

Ohhh, thank you so much! I thought I had made them identical, but after seeing that mistake I looked further and found that I had left a softmax on the last layer of the zygote version 🤦‍♂️

#

Now their gradient descents are performing comparably, so this issue is solved ✅
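
For anyone who finds this thread with a similar problem: besides comparing against an AD library, hand-written gradients can be checked against central finite differences. The sketch below is in Python (as in the linked tutorial) and uses a single sigmoid neuron with squared error. That toy model, the function names, and the sample values are assumptions for illustration and are not code from the linked repository.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Squared error of one sigmoid neuron: 0.5 * (sigmoid(w . x) - y)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (sigmoid(z) - y) ** 2

def manual_grad(w, x, y):
    """Hand-derived gradient via the chain rule (the backprop step)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    a = sigmoid(z)
    delta = (a - y) * a * (1.0 - a)  # dL/da * da/dz
    return [delta * xi for xi in x]  # dz/dw_i = x_i

def numeric_grad(w, x, y, eps=1e-6):
    """Central finite differences, one weight at a time."""
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grads.append((loss(w_plus, x, y) - loss(w_minus, x, y)) / (2.0 * eps))
    return grads

# Arbitrary sample values, chosen only for the check.
w = [0.1, -0.2, 0.3]
x = [1.0, 2.0, -1.0]
y = 1.0

max_diff = max(abs(m - n) for m, n in zip(manual_grad(w, x, y), numeric_grad(w, x, y)))
print(max_diff)
```

If the two gradients agree to within a small tolerance, the backprop math is probably fine, and the mismatch is more likely in the setup (initialisation, activations on the last layer, learning rate), as it turned out to be here.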