Transformer | OpenAI | Page 1

spiral marsh Dec 19, 2022, 11:40 PM

#

And I'll create a new thread just in case, because I don't want it to obstruct other conversations.

hollow mauve Dec 19, 2022, 11:53 PM

#

hi

#

wass up

spiral marsh Dec 20, 2022, 12:20 AM

#

hollow mauve wass up

Hi! So I was wondering, does the "Masked Multi-Head Attention" block just mean that it performs multi-head attention only on the tokens that precede the token currently being produced? Hopefully I explained that alright.

#

I guess I'm wondering exactly how the masked multi-head attention and regular multi-head attention blocks differ.
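For what it's worth, here is a rough sketch of that difference, assuming the usual scaled dot-product formulation. As far as I understand it, the two blocks do the same computation, except the masked one sets the score for any later position to negative infinity before the softmax, so position i can only attend to positions j <= i. It shows a single head with no learned projections, and the names `attention_weights` and `causal` are made up for illustration, not taken from any library.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, causal=False):
    """Attention weights for one head (hypothetical helper).

    queries, keys: lists of equal-length vectors, one per token position.
    causal=True applies the "masked" variant: position i ignores j > i.
    """
    d = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if causal and j > i:
                s = float("-inf")  # mask out positions after i
            scores.append(s)
        weights.append(softmax(scores))
    return weights

# Three toy token vectors, used as both queries and keys (self-attention).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
masked = attention_weights(toy, toy, causal=True)
regular = attention_weights(toy, toy, causal=False)
```

In `masked`, everything above the diagonal is exactly zero, while every entry of `regular` is positive. Note the mask still lets a position attend to itself. In the decoder the inputs are shifted right by one, so in practice that means each output only depends on tokens that were already produced.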

#

Also, does "layer normalization" mean that the individual signals of the nodes in a layer are scaled based on some sort of "total" measure of the strength of the signals throughout the layer (to keep them from getting too reduced / large)?
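That intuition seems roughly right, with one refinement: as I understand it, layer normalization works on each token's feature vector separately, subtracting that vector's mean and dividing by its standard deviation, then applying a learned per-feature scale and shift. A minimal sketch, assuming that standard definition (the function name, `gamma`, `beta`, and the `eps` value here are illustrative choices, not a specific library's API):

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one token's feature vector x (hypothetical helper).

    gamma and beta stand in for the learned scale and shift;
    they default to 1 and 0, which leaves the normalized values as-is.
    """
    n = len(x)
    if gamma is None:
        gamma = [1.0] * n
    if beta is None:
        beta = [0.0] * n
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance over features
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

small = layer_norm([1.0, 2.0, 3.0, 4.0])
large = layer_norm([10.0, 20.0, 30.0, 40.0])
```

Here `small` and `large` come out almost identical even though the inputs differ by a factor of ten, which is the "keep the signals from getting too small or too large" effect described in the question.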

#Transformer