Hyper-efficient self-attention | Stockfish | Page 1

eager night Feb 25, 2025, 5:53 PM

#

You need to include something to model positional information

#

Unless you’re assuming Qh, Kh, Vh have been produced with biases per token
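A minimal sketch of why the positional term matters, assuming a plain single-head attention in NumPy (the function name, the shapes, and the 64-token board layout are illustrative assumptions, not the design proposed in this thread). Without positional information, shuffling the input tokens only shuffles the outputs, so the layer cannot tell squares apart. Per-token biases on Q, K, V, as mentioned above, break that symmetry.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax, shifted by the row max for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv, pos_bias=None):
    """Single-head self-attention; pos_bias is an optional (bq, bk, bv) tuple
    of per-position biases, each of shape (n_tokens, d_head)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    if pos_bias is not None:
        bq, bk, bv = pos_bias
        q, k, v = q + bq, k + bk, v + bv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(logits) @ v

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 64, 16, 8  # assumed: one token per square
x = rng.normal(size=(n_tokens, d_model))
wq, wk, wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
perm = rng.permutation(n_tokens)

# No positional term: permuting the inputs just permutes the outputs.
out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)
equivariant_plain = np.allclose(out[perm], out_perm)

# Per-position biases stay attached to the slot, not the token,
# so the same shuffle now changes the result.
biases = tuple(rng.normal(size=(n_tokens, d_head)) for _ in range(3))
out_b = self_attention(x, wq, wk, wv, biases)
out_b_perm = self_attention(x[perm], wq, wk, wv, biases)
equivariant_biased = np.allclose(out_b[perm], out_b_perm)

print(equivariant_plain, equivariant_biased)
```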

deft goblet Feb 26, 2025, 5:47 PM

#

eager night You need to include something to model positional information

forgot

#

on it

#

wait can this not just be used with rope

eager night Feb 26, 2025, 7:09 PM

#

deft goblet wait can this not just be used with rope

You should read the LC0 paper if you haven’t already https://arxiv.org/pdf/2409.12272

#

I don’t think there’s really any downside to the relative positional embeddings (Shaw et al.) they use; they can be implemented to run really fast, no trouble for CPU inference
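A rough sketch of what Shaw-style relative position terms could look like for a 64-square board, in NumPy. The formulation (a learned vector added to the key and to the value for each query/key pair) follows Shaw et al.; indexing those vectors by 2D (rank, file) offset, and every name and shape below, are assumptions made for illustration and are not taken from the LC0 paper. Since the board is fixed, the pair-to-offset table can be built once, which is one reason this is cheap at inference time.

```python
import numpy as np

BOARD = 8
N_SQ = BOARD * BOARD
N_OFF = 2 * BOARD - 1  # offsets -7..7 along each axis

def relative_index():
    """(64, 64) table mapping a square pair (i, j) to one of 15*15 offset ids."""
    rank, file = np.divmod(np.arange(N_SQ), BOARD)
    d_rank = rank[None, :] - rank[:, None] + BOARD - 1
    d_file = file[None, :] - file[:, None] + BOARD - 1
    return d_rank * N_OFF + d_file

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def shaw_attention(q, k, v, rel_k, rel_v, idx):
    """Single-head attention with relative key and value embeddings.
    rel_k, rel_v: (15*15, d_head) tables; idx: output of relative_index()."""
    a_k = rel_k[idx]  # (64, 64, d_head), fixed for a fixed board
    a_v = rel_v[idx]
    d = q.shape[-1]
    # logit_ij = q_i . (k_j + a_k[i, j]) / sqrt(d)
    logits = (q @ k.T + np.einsum("id,ijd->ij", q, a_k)) / np.sqrt(d)
    w = softmax(logits)
    # out_i = sum_j w_ij * (v_j + a_v[i, j])
    return w @ v + np.einsum("ij,ijd->id", w, a_v)

rng = np.random.default_rng(0)
d_head = 8  # assumed head size
q, k, v = (rng.normal(size=(N_SQ, d_head)) for _ in range(3))
rel_k = rng.normal(size=(N_OFF * N_OFF, d_head))  # stand-ins for learned tables
rel_v = rng.normal(size=(N_OFF * N_OFF, d_head))

idx = relative_index()
out = shaw_attention(q, k, v, rel_k, rel_v, idx)
print(out.shape)
```

Pairs of squares with the same offset share an embedding (for example a1→b2 and b1→c2), so the tables hold 225 vectors rather than one per pair.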

#

Anyways, you’re proposing a rather obvious simplification of the self-attention mechanism without saying how to feasibly integrate it into a chess network. How many layers, what you want your inputs to look like, etc. are all more difficult and more important questions to answer

deft goblet Feb 27, 2025, 8:47 AM

#

fair
