#Positional Encoding Function in "Attention is All You Need"


hushed hazel
#

Can someone explain to me what i is?

scenic delta
#

This is the function that computes a unique vector for each token according to its position in the sentence.
Without it, the model would have no way of knowing what order the tokens are in, and so which tokens are next to which.

candid estuary
#

Long story short, i indexes the columns of the positional encoding matrix that goes into the transformer model

hushed hazel
#

I know what positional encoding is

#

I just don't really know how i works

scenic delta
# hushed hazel I just don't really know how i works

it generates signals at different frequencies in a high-dimensional space according to the position of the token in the sequence
the variable pos is your position, and i is the dimension index (well, it gets mapped to the actual dimension indices 2i and 2i+1: the sine goes in 2i, the cosine in 2i+1)
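
For reference, this is the function from the paper written out. Here d_model is the embedding size, so i runs from 0 to d_model/2 - 1 and each value of i fills two neighbouring dimensions:

```latex
\begin{align}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\end{align}
```

As i grows, the denominator grows, so the higher dimensions oscillate more slowly with pos. That is where the "different frequencies" come from.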

hushed hazel
scenic delta
#

is that helping you?

hushed hazel
#

Yes, I think so. In that example, you would just input values of 0 to 499 as i to produce a matrix at the end?

scenic delta
#

for 1 token you would get a single vector; for a whole sequence you would get a matrix, yes
the values of i from 0 to 499 would be repeated for each token

#

but since the pos is different for each token, you get a different vector for each token
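
A minimal sketch of that loop in plain Python, in case it helps to see it spelled out. The function name `positional_encoding` is made up for illustration, and `d_model=1000` is an assumption inferred from i going from 0 to 499 in this thread; the base of 10000 is the constant from the paper.

```python
import math


def positional_encoding(seq_len, d_model=1000, base=10000.0):
    """Return a seq_len x d_model list of lists of sinusoidal encodings.

    d_model=1000 is an assumption chosen so that i runs over 0..499.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so every i has a 2i and a 2i+1 slot")

    matrix = []
    for pos in range(seq_len):            # one row (vector) per token position
        row = [0.0] * d_model
        for i in range(d_model // 2):     # the same i values are reused for every token
            angle = pos / base ** (2 * i / d_model)
            row[2 * i] = math.sin(angle)      # even dimension 2i
            row[2 * i + 1] = math.cos(angle)  # odd dimension 2i+1
        matrix.append(row)
    return matrix


pe = positional_encoding(seq_len=4)
print(len(pe), len(pe[0]))  # number of tokens, number of dimensions
```

The inner loop over i is identical for every token; only pos changes between rows, which is why each token ends up with a different vector.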

hushed hazel
#

Alright, that makes sense.

#

Thank you.

candid estuary
#

@hushed hazel If you had read the article, you would have found your answer

hushed hazel
#

I already did, and it doesn't explain it as well.

cobalt urchin
#

😂