#Positional Encoding Function in "Attention is All You Need"
This is the function that computes a unique vector for each token according to its position in the sentence.
Without it, the model would have no way of knowing what order the tokens are in, i.e. which tokens are next to which.
Here is a good explanation of positional encoding: https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/ and it also answers what i is
Long story short, i indexes the columns of the positional encoding matrix that goes into the transformer model
the function generates signals at different frequencies in a high-dimensional space according to the position of the token in the sequence
the variable pos is your position, and i is the dimension index (well, it gets mapped to the true dimension indices 2i and 2i+1)
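for reference, this is the sinusoidal formula from the paper that pos and i refer to. The symbol d_model (the embedding dimension) is the paper's notation and does not appear elsewhere in this thread:

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$

so each value of i fills two entries of the vector: a sine at index 2i and a cosine at index 2i+1, both at the same frequency.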
Yes, but how is i defined? What value do you use as i?
let's say your embedding has 1000 dimensions, then i takes values from 0 to 499; each i fills two dimensions (2i and 2i+1), so the formula above produces a 1000-dimensional vector
does that help?
Yes, I think so. In that example, you would just input values of 0 to 499 as i to produce a matrix at the end?
for 1 token you would get a single vector; for a whole sequence you would get a matrix with one row per token, yes
the values of i from 0 to 499 would be repeated for each token
but since the pos is different for each token, you get a different vector for each token
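to make the above concrete, here is a minimal sketch in plain Python of how i and pos combine. The function names, the `base` parameter, and the sequence length of 4 are made up for illustration, and it assumes an even number of dimensions; real implementations are usually vectorized with a tensor library.

```python
import math


def positional_encoding(pos, d_model, base=10000.0):
    """Encoding vector for a single position (assumes d_model is even)."""
    vec = [0.0] * d_model
    # i runs from 0 to d_model/2 - 1, e.g. 0..499 for 1000 dimensions
    for i in range(d_model // 2):
        angle = pos / base ** (2 * i / d_model)
        vec[2 * i] = math.sin(angle)      # even index 2i gets the sine
        vec[2 * i + 1] = math.cos(angle)  # odd index 2i+1 gets the cosine
    return vec


def positional_encoding_matrix(seq_len, d_model):
    """One row per token: the same i values are reused, only pos changes."""
    return [positional_encoding(pos, d_model) for pos in range(seq_len)]


# 4 tokens, 1000-dimensional embeddings, as in the example above
pe = positional_encoding_matrix(seq_len=4, d_model=1000)
print(len(pe), len(pe[0]))
```

each row is the vector for one token, and the rows differ only because pos differs.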
@hushed hazel If you had read the article, you would have found your answer
I already did, and it doesn't explain it as well.
The emperor of mankind will do as he pleases
/s
😂