Deliberately make L1 sparse for small net. | Stockfish

steep skiff Oct 6, 2025, 3:57 PM

#

An intriguing idea. If we sparsify the net, we can get increased speed. Since the feature of the small net is speed, maybe what we really need is a sparse one, not necessarily the smallest one.

junior edge Oct 6, 2025, 3:59 PM

#

I think that's a fair idea to try.

pastel condor Oct 7, 2025, 3:32 PM

#

Any details on how to do this? One way, I think, is to rewrite the loss function so that it accounts for sparsity, but then training steps would be immensely slower.

junior edge Oct 7, 2025, 3:37 PM

#

smallnet is hardly limited by the loss function, I think. Yes, I agree one can add a suitable norm on the weights to achieve this.

#

https://github.com/moskomule/l0.pytorch

vivid flame Oct 7, 2025, 7:50 PM

#

how much sparser would the net need to be for a nonnegligible speed improvement

junior edge Oct 7, 2025, 8:13 PM

#

idk

steep skiff Oct 8, 2025, 11:14 AM

#

My proposal is actually just sigmoid(x-1) for each l1 or something.

#

Or something like one-sided Huber loss.

#

From say -0.5 upward.

pastel condor Oct 13, 2025, 7:38 AM

#

I first tested how much AffineTransformSparseInput speeds up inference for smallnet:

    using L1AffineTransform = std::conditional_t<L1 == TransformedFeatureDimensionsBig,
                                                Layers::AffineTransformSparseInput<L1, FC_0_OUTPUTS + 1>,
                                                Layers::AffineTransform<L1, FC_0_OUTPUTS + 1>>;

    L1AffineTransform fc_0;

But interestingly:

stockfish-master: 28313813
stockfish: 28455070

It's within the error range, but the dense layer seems faster than the sparse layer.

vivid flame Oct 13, 2025, 7:39 AM

#

I think that checks out, there's a decent constant overhead for the sparse processing

pastel condor Oct 13, 2025, 7:40 AM

#

Yes, also I guess making small net more sparse wouldn't benefit that much overall.

vivid flame Oct 13, 2025, 7:40 AM

#

ye

#

amdahl's law

#

😩

pastel condor Oct 13, 2025, 7:55 AM

#

Excluding sparse layer entirely:

stockfish: 23650404

~16.5% slowdown (equivalently, the sparse layer gives a +19.7% speedup). That's also interesting, because this speedup is larger than what AndrovT originally suggested (~10%)

#

I guess bigger L1 size is the main factor for that...
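
The two percentages come from the same pair of benchmark figures, just with a different baseline. A small sketch of the arithmetic; the helper and constant names are invented here, and only the two numbers are from the thread.

```cpp
#include <cstdint>

// Percentage change of `candidate` relative to `baseline`.
inline double percent_change(std::uint64_t baseline, std::uint64_t candidate) {
    return 100.0 * (static_cast<double>(candidate) - static_cast<double>(baseline))
           / static_cast<double>(baseline);
}

constexpr std::uint64_t kMaster   = 28313813;  // stockfish-master figure quoted above
constexpr std::uint64_t kAllDense = 23650404;  // sparse layer excluded entirely

// percent_change(kMaster, kAllDense) is about -16.5 (dense relative to master)
// percent_change(kAllDense, kMaster) is about +19.7 (master relative to dense)
```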

vivid flame Oct 13, 2025, 7:59 AM

#

pastel condor Excluding sparse layer entirely: ``` stockfish: 23650404 ``` ~16.5% slowdown (+1...

this figure is for L0?

pastel condor Oct 13, 2025, 7:59 AM

#

What do you mean?

vivid flame Oct 13, 2025, 8:00 AM

#

where is this number from

#

sry I'm a little confused

pastel condor Oct 13, 2025, 8:01 AM

#

diff --git a/src/nnue/nnue_architecture.h b/src/nnue/nnue_architecture.h
index c020ce05..d417a69d 100644
--- a/src/nnue/nnue_architecture.h
+++ b/src/nnue/nnue_architecture.h
@@ -61,7 +61,7 @@ struct NetworkArchitecture {
     static constexpr int       FC_0_OUTPUTS                 = L2;
     static constexpr int       FC_1_OUTPUTS                 = L3;
 
-    Layers::AffineTransformSparseInput<TransformedFeatureDimensions, FC_0_OUTPUTS + 1> fc_0;
+    Layers::AffineTransform<TransformedFeatureDimensions, FC_0_OUTPUTS + 1> fc_0;
     Layers::SqrClippedReLU<FC_0_OUTPUTS + 1>                                           ac_sqr_0;
     Layers::ClippedReLU<FC_0_OUTPUTS + 1>                                              ac_0;
     Layers::AffineTransform<FC_0_OUTPUTS * 2, FC_1_OUTPUTS>                            fc_1;

vivid flame Oct 13, 2025, 8:01 AM

#

yeah ok

#

that's what I thought

gritty axle Oct 17, 2025, 6:48 PM

#

pastel condor Excluding sparse layer entirely: ``` stockfish: 23650404 ``` ~16.5% slowdown (+1...

weight permutation has also gotten better since then i think, but unsure by how much (that and bigger L1)

pastel condor Oct 27, 2025, 3:20 AM

#

Update: https://stockfish.mineta.dev/nnue/readme/experiment-4

steep skiff Oct 27, 2025, 3:36 AM

#

pastel condor Update: <https://stockfish.mineta.dev/nnue/readme/experiment-4>

Okay... but a slowdown even without strength penalty?

#

That is weird.

vivid flame Oct 27, 2025, 3:38 AM

#

how small is small net

pastel condor Oct 27, 2025, 3:41 AM

#

vivid flame how small is small net

L1=128, 32 x 32b blocks processed in find_nnz (big net is L1=3072)
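
To make the "32 x 32b blocks" concrete: 128 one-byte inputs form 32 four-byte blocks, and the non-zero scan only has to report which of those blocks contain anything. Below is a plain scalar sketch of that idea. It is not Stockfish's find_nnz, which is vectorized; the function name and return type are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar sketch: return the indices of the 32-bit blocks of `input` that
// contain at least one non-zero byte. Trailing bytes that do not fill a
// whole block are ignored.
inline std::vector<std::uint16_t> find_nnz_scalar(const std::uint8_t* input,
                                                  std::size_t bytes) {
    std::vector<std::uint16_t> nnz;
    const std::size_t blocks = bytes / sizeof(std::uint32_t);
    nnz.reserve(blocks);
    for (std::size_t i = 0; i < blocks; ++i) {
        std::uint32_t block;
        // memcpy avoids alignment and strict-aliasing problems.
        std::memcpy(&block, input + i * sizeof(block), sizeof(block));
        if (block != 0) nnz.push_back(static_cast<std::uint16_t>(i));
    }
    return nnz;
}
```

With only 32 blocks to scan, the fixed cost of building the index list is large relative to the dense multiply it would save.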

vivid flame Oct 27, 2025, 3:41 AM

#

ah yeah

#

that's just too smol I think :(

#

maybe if we move the find_nnz calculation earlier

pastel condor Oct 27, 2025, 3:42 AM

#

I was thinking of applying the same method to big net but 1) training takes forever and 2) features are already sparse enough (above 70%?)

vivid flame Oct 27, 2025, 3:42 AM

#

how does ur method work

#

sorry I don't know anything about neural networks

pastel condor Oct 27, 2025, 3:43 AM

#

Just add a smooth L0 regularization term to the loss so that training drives the transformed features towards zero

#

Mean of 1 - exp(-20.0 * |w|) where w is a feature tensor
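
A scalar sketch of that term, for illustration only: the actual training code would compute it as a tensor expression, and the function name, the `sharpness` parameter name, and the empty-input behaviour are assumptions. How strongly the term is weighted against the main loss is not stated in the thread.

```cpp
#include <cmath>
#include <vector>

// Smooth L0 surrogate: mean of 1 - exp(-sharpness * |w|) over all entries.
// Each term is 0 at w = 0 and approaches 1 as |w| grows, so the mean roughly
// counts the fraction of entries that are not close to zero.
inline double smooth_l0_penalty(const std::vector<double>& w, double sharpness = 20.0) {
    if (w.empty()) return 0.0;
    double sum = 0.0;
    for (double v : w) sum += 1.0 - std::exp(-sharpness * std::fabs(v));
    return sum / static_cast<double>(w.size());
}
```

Unlike an L1 norm, the per-entry cost saturates, so large activations are not pushed harder than moderately sized ones.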

#

For visual representation:
