#Hyper-efficient self-attention
forgot
on it
wait, can this not just be used with RoPE?
You should read the LC0 paper if you haven’t already https://arxiv.org/pdf/2409.12272
I don’t think there’s really any downside to the relative positional embeddings (Shaw et al.) they use; they can be implemented to run really fast, with no trouble for CPU inference
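For reference, a minimal sketch of what Shaw-style relative position terms look like in single-head self-attention. It assumes a 1D sequence with clipped offsets (a chess board would need 2D offsets between squares, which this does not cover), and the function name, shapes, and table sizes are illustrative assumptions, not taken from the LC0 paper:

```python
import numpy as np

def relative_attention(x, wq, wk, wv, rel_k, rel_v, max_dist):
    """Single-head self-attention with Shaw-style relative position terms.

    x: (n, d_model) inputs; wq/wk/wv: (d_model, d_head) projections.
    rel_k/rel_v: (2 * max_dist + 1, d_head) relative-position tables
    (learned in a real model; hypothetical placeholders here).
    """
    n = x.shape[0]
    q, k, v = x @ wq, x @ wk, x @ wv

    # Relative offset j - i, clipped to [-max_dist, max_dist] and shifted
    # so it can index into the tables.
    offsets = np.arange(n)[None, :] - np.arange(n)[:, None]
    idx = np.clip(offsets, -max_dist, max_dist) + max_dist
    a_k = rel_k[idx]  # (n, n, d_head)
    a_v = rel_v[idx]  # (n, n, d_head)

    # logits[i, j] = q_i . (k_j + a_k[i, j]) / sqrt(d_head)
    logits = (q @ k.T + np.einsum("id,ijd->ij", q, a_k)) / np.sqrt(q.shape[-1])

    # Numerically stable softmax over j.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # out[i] = sum_j weights[i, j] * (v_j + a_v[i, j])
    return weights @ v + np.einsum("ij,ijd->id", weights, a_v)
```

With a fixed sequence length, `idx`, `a_k`, and `a_v` depend only on positions, so in this sketch they could be gathered once and reused across inputs.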
Anyways, you’re proposing a rather obvious simplification of the self-attention mechanism, with no proposal for how to feasibly integrate it into a chess network. How many layers, what you want your inputs to look like, etc. are all more difficult and more important questions to answer
fair