#Transformer


spiral marsh

And I'll create a new thread just in case, because I don't want it to obstruct other conversations.

hollow mauve

hi


wass up

spiral marsh
Replying to hollow mauve: wass up

Hi! So I was wondering, does the "Masked Multi-Head Attention" block just mean that it performs multi-head attention only on the tokens that precede the token that is currently being produced? Hopefully I explained that alright
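Roughly, yes. In the decoder's masked self-attention, each position may attend to itself and to the positions before it, never to later ones. In the original Transformer this is done by setting the scores for "future" positions to -inf before the softmax, so they get exactly zero weight. Below is a minimal single-head sketch in plain Python. It assumes Q = K = V = the raw token vectors (real models use learned projections), and the helper names `causal_mask` and `attention` are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; exp(-inf) is 0.0, so blocked slots get zero weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(n):
    # allowed[i][j] is True when query position i may look at key position j (j <= i).
    return [[j <= i for j in range(n)] for i in range(n)]

def attention(Q, K, V, allowed=None):
    """Single-head scaled dot-product attention over lists of vectors (toy, no learned weights)."""
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
            if allowed is not None and not allowed[i][j]:
                s = -math.inf  # "future" token: blocked before the softmax
            scores.append(s)
        w = softmax(scores)
        weights.append(w)
        # Output i is the attention-weighted average of the value vectors.
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return outputs, weights

# Hypothetical 3-token sequence; Q = K = V = X only to keep the example short.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, W = attention(X, X, X, allowed=causal_mask(3))
for row in W:
    print([round(w, 3) for w in row])  # entries above the diagonal are 0.0
```

One nuance: during training the whole target sequence is processed in parallel, so the mask is what stops position i from "seeing" tokens i+1, i+2, ... that are already sitting in the input. At generation time those later tokens don't exist yet, but the same mask keeps training and generation consistent.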


I guess, I'm wondering exactly how the masked multi-head attention and regular multi-head attention blocks differ.
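The two blocks are otherwise identical: same Q/K/V, same scaling, same softmax, same splitting into heads and concatenating the results. The only difference is the -inf step applied to later positions, and every head uses that same mask. (In the original Transformer, the encoder self-attention and the decoder's encoder-decoder attention are unmasked; only the decoder self-attention is masked.) The toy sketch below assumes the learned projections W_Q, W_K, W_V, W_O are identity maps, so each head just takes a slice of the features; `multi_head_attention` and its `causal` flag are hypothetical names. It checks whether changing the last token changes the output at position 0.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0, so masked slots get zero weight
    total = sum(exps)
    return [e / total for e in exps]

def head_attention(Q, K, V, causal):
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):
        scores = [
            -math.inf if (causal and j > i)  # the ONLY difference between masked and regular
            else sum(a * b for a, b in zip(Q[i], K[j])) / math.sqrt(d)
            for j in range(n)
        ]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][c] for j in range(n)) for c in range(len(V[0]))])
    return out

def multi_head_attention(X, num_heads, causal=False):
    """Toy multi-head self-attention: each head sees one slice of the features.

    Assumption for brevity: W_Q, W_K, W_V, W_O are identity maps, so each head
    uses its slice of X directly as Q, K and V.
    """
    d_head = len(X[0]) // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        Xh = [row[cols] for row in X]
        heads.append(head_attention(Xh, Xh, Xh, causal))
    # Concatenate the per-head outputs back into full-width vectors.
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(X))]

X = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.9], [1.0, 1.0, 0.8, 0.1]]
X_changed = [row[:] for row in X]
X_changed[2] = [5.0, -3.0, 2.0, 4.0]  # alter only the last token

for causal in (False, True):
    before = multi_head_attention(X, num_heads=2, causal=causal)[0]
    after = multi_head_attention(X_changed, num_heads=2, causal=causal)[0]
    # regular: True (position 0 sees the last token); masked: False
    print("masked" if causal else "regular", "position 0 changed:", before != after)
```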


Also, does "layer normalization" mean that the individual signals of the nodes in a layer are scaled based on some sort of "total" measure of the strength of the signals throughout the layer (to keep them from getting too small or too large)?
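Close. Layer normalization takes one token's vector (all the features that layer produced for that token), subtracts its mean and divides by its standard deviation, then applies a learned per-feature scale γ and shift β: y = γ · (x − μ) / √(σ² + ε) + β. So it re-centers as well as rescales, and each token is normalized on its own, independently of other tokens and of other examples in the batch. A minimal plain-Python sketch, where `layer_norm` and its default γ = 1, β = 0 are illustrative assumptions:

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one token's feature vector x across its own features."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance over the features
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    gamma = gamma if gamma is not None else [1.0] * n  # learned scale (initialized to 1)
    beta = beta if beta is not None else [0.0] * n     # learned shift (initialized to 0)
    return [g * v + b for g, v, b in zip(gamma, x_hat, beta)]

# Two hypothetical token vectors; the second is 50x larger than the first.
tokens = [[2.0, 4.0, 6.0, 8.0], [100.0, 200.0, 300.0, 400.0]]
for t in tokens:
    print([round(v, 3) for v in layer_norm(t)])
```

Both tokens come out with mean about 0 and variance about 1, which is what keeps activations from drifting too small or too large as they pass through many stacked layers.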