#Bit QK attention

upper elbow
#

Can we make use of tensor core single-bit operations to speed up the QK part? We could either calculate Q and K directly in one bit, or add a random (or trainable) projection, which I think would let us interpret the one-bit matmul as a locality-sensitive hashing lookup.

zealous oyster
#

Can you give an example of previous work on "which I think would let us interpret the one-bit matmul as a locality-sensitive hashing lookup"? I'm familiar with Reformer (https://arxiv.org/pdf/2001.04451), but the purpose of LSH via random projections there was not to deal with low precision.

upper elbow
#

OK, so I think the word "lookup" is probably not a good choice here. What I mean is just that random-projection LSH uses a random hyperplane h for each hash function and sets the corresponding bit to sgn(<h, k>). If we generate these bits for each key and query, then the 1-bit tensor core matmul in XOR mode (XOR followed by a popcount) should just give us the number of hash bits in which each key and query differ, i.e. the (negative) fraction of hash bits in which they coincide, up to an offset and scaling.
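
To make that concrete, here is a minimal pure-Python sketch of the sign-hash plus XOR/popcount idea. It only emulates the arithmetic on the CPU and is not tensor core code; the function names (`random_hyperplanes`, `sign_hash`, `match_fraction`), the Gaussian choice for the hyperplanes, and the convention that a zero inner product maps to bit 1 are all assumptions made for illustration.

```python
import random

def random_hyperplanes(n_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane normal h per hash bit (assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def sign_hash(x, planes):
    """Pack the bits sgn(<h, x>) for every hyperplane h into one integer."""
    code = 0
    for i, h in enumerate(planes):
        dot = sum(hi * xi for hi, xi in zip(h, x))
        if dot >= 0.0:  # assumption: a zero inner product counts as bit 1
            code |= 1 << i
    return code

def hamming(a, b):
    """XOR then popcount: the number of hash bits in which a and b differ."""
    return bin(a ^ b).count("1")

def match_fraction(a, b, n_bits):
    """Fraction of hash bits in which the two codes coincide."""
    return 1.0 - hamming(a, b) / n_bits

# Tiny usage example with one query and two keys.
n_bits, dim = 64, 4
planes = random_hyperplanes(n_bits, dim)
q = [1.0, 2.0, -0.5, 0.3]
k_same = list(q)             # identical direction: every bit matches
k_flip = [-v for v in q]     # opposite direction: every bit differs
q_code = sign_hash(q, planes)
same_frac = match_fraction(q_code, sign_hash(k_same, planes), n_bits)
flip_frac = match_fraction(q_code, sign_hash(k_flip, planes), n_bits)
```

For random hyperplanes, the expected match fraction between two vectors at angle θ is 1 − θ/π, which is why the bit agreement can stand in for a similarity score.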

#

Then you could do the higher-precision inner products for the most promising candidates (I think this would correspond to Reformer), but maybe we could also just feed those directly into the softmax. Really, I'm just trying to find something to get some use out of tensor core bit operations 🙂
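
The two options could be sketched roughly like this, again in plain Python on already-computed hash codes. The names `rerank_attention` and `hash_score_attention`, the `top_k` and `temperature` parameters, and the 1/sqrt(d) scaling are assumptions for illustration, not something taken from Reformer or any library.

```python
import math

def hamming(a, b):
    """XOR then popcount: number of differing hash bits."""
    return bin(a ^ b).count("1")

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rerank_attention(q, keys, q_code, key_codes, top_k):
    """Option 1: 1-bit scores only pick candidates; those get exact inner products."""
    order = sorted(range(len(keys)), key=lambda j: hamming(q_code, key_codes[j]))
    candidates = order[:top_k]
    scale = 1.0 / math.sqrt(len(q))  # assumed standard attention scaling
    exact = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in candidates]
    return dict(zip(candidates, softmax(exact)))

def hash_score_attention(q_code, key_codes, n_bits, temperature=1.0):
    """Option 2: feed the 1-bit agreement straight into the softmax."""
    # n_bits - 2 * hamming equals the dot product of the codes read as +/-1 vectors.
    scores = [(n_bits - 2 * hamming(q_code, kc)) / (temperature * n_bits)
              for kc in key_codes]
    return softmax(scores)

# Toy example: 4-bit codes written by hand (hypothetical values).
q = [1.0, 0.0]
keys = [[1.0, 0.0], [-1.0, 0.0], [0.9, 0.1]]
q_code = 0b1011
key_codes = [0b1011, 0b0100, 0b1010]
reranked = rerank_attention(q, keys, q_code, key_codes, top_k=2)
direct = hash_score_attention(q_code, key_codes, n_bits=4)
```

In option 1 the keys that are not selected get zero attention weight, while option 2 keeps every key but with much coarser scores; which trade-off is better would need to be tested.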

zealous oyster