#The NNUE is sparse
The later layers are so small that they would need a significant improvement before any overall speedup could be measured.
Maybe that could be the case...
But what if we made the later layers bigger?
I actually tried replacing the affine transform with the sparse-input affine transform, but the bench became unstable.
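For anyone following along, here is a minimal scalar sketch of the difference between a dense affine transform and a sparse-input one. It is not the engine's actual code: the function names, the weight layout (one contiguous block of weights per input) and the data types are assumptions chosen for illustration. The sparse version first collects the indices of nonzero inputs and then only accumulates the weights for those, so it saves work only when many inputs are zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: weights[i * outDims + o] is the weight from input i
// to output o, so all weights for one input are contiguous.

// Dense version: every input is multiplied in, zero or not.
std::vector<std::int32_t> affine_dense(const std::vector<std::int8_t>& weights,
                                       const std::vector<std::int32_t>& biases,
                                       const std::vector<std::uint8_t>& input) {
    const std::size_t outDims = biases.size();
    std::vector<std::int32_t> out(biases);  // start from the bias vector
    for (std::size_t i = 0; i < input.size(); ++i)
        for (std::size_t o = 0; o < outDims; ++o)
            out[o] += std::int32_t(weights[i * outDims + o]) * input[i];
    return out;
}

// Sparse-input version: gather nonzero input indices first, then
// accumulate only the weight blocks that belong to those inputs.
std::vector<std::int32_t> affine_sparse_input(const std::vector<std::int8_t>& weights,
                                              const std::vector<std::int32_t>& biases,
                                              const std::vector<std::uint8_t>& input) {
    const std::size_t outDims = biases.size();

    std::vector<std::size_t> nonzero;
    for (std::size_t i = 0; i < input.size(); ++i)
        if (input[i] != 0)
            nonzero.push_back(i);

    std::vector<std::int32_t> out(biases);
    for (std::size_t i : nonzero)
        for (std::size_t o = 0; o < outDims; ++o)
            out[o] += std::int32_t(weights[i * outDims + o]) * input[i];
    return out;
}
```

Both functions should return identical outputs for the same input. The index-gathering pass is fixed overhead, which is why the trick tends to pay off only on layers with many inputs.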
Yes, because the later layers are so small that even the entire L1 output doesn't fill a whole AVX-512 vector.
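For a rough sense of scale on the AVX-512 point, here is a small compile-time sketch of the lane arithmetic. The register width (512 bits) is a fact; the layer size of 16 used in the checks is a hypothetical value picked for illustration, not a number taken from this thread, and the helper names are made up.

```cpp
#include <cstddef>

// One AVX-512 register is 512 bits wide, i.e. 64 bytes.
constexpr std::size_t kAvx512Bytes = 512 / 8;

// How many elements of the given byte width fit in one register.
constexpr std::size_t lanes(std::size_t elemBytes) {
    return kAvx512Bytes / elemBytes;
}

// Whether `count` elements of the given byte width occupy at least one full register.
constexpr bool fills_register(std::size_t count, std::size_t elemBytes) {
    return count * elemBytes >= kAvx512Bytes;
}

static_assert(lanes(1) == 64, "64 int8 lanes per register");
static_assert(lanes(4) == 16, "16 int32 lanes per register");

// Hypothetical layer with 16 activations:
static_assert(!fills_register(16, 1), "16 int8 values use only 16 of 64 bytes");
static_assert(fills_register(16, 4), "16 int32 values use exactly 64 bytes");
```

With a layer that narrow there is at most one register's worth of data per step, so there is very little for a sparse-skipping scheme to skip.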