rocky vigil Oct 18, 2025, 2:13 PM

#

Yes it should be using ^ for the factorized features

violet badger Oct 18, 2025, 2:13 PM

#

so, I'd better setup an additional run with that enabled...

#

makes me wonder why the first quick test of stage 1 seemed so good?

#

it did finish stage 2... so I guess we could test again.

rocky vigil Oct 18, 2025, 2:16 PM

#

Closer if you consider that we are around this close to master w/o spsa for a standard net

violet badger Oct 18, 2025, 2:16 PM

#

yes, I've been keeping that in mind 🙂

rocky vigil Oct 18, 2025, 2:17 PM

#

Though I hope there is still some 5-10 elo more we can squeeze in total

#

I don’t think the fishtest actually helps let us know the scaling

#

At least it shouldn’t scale poorly

violet badger Oct 18, 2025, 2:18 PM

#

well... #1336647760388034610 message is a test for scaling to some extent. Looks good but no cigar

rocky vigil Oct 18, 2025, 2:19 PM

#

Oh shoot I thought that was also stc just done faster locally

violet badger Oct 18, 2025, 2:20 PM

#

stc, but with a few more threads

green moat Oct 18, 2025, 2:20 PM

#

violet badger so, I'd better setup an additional run with that enabled...

Another 3 days waiting
😭

violet badger Oct 18, 2025, 2:21 PM

#

no need to ping ...

rocky vigil Oct 18, 2025, 2:22 PM

#

It remains to be seen whether 1280 can be worth it over 1024

#

Stage 4 should be done now right?

green moat Oct 18, 2025, 2:23 PM

#

rocky vigil Stage 4 should be done now right?

yes, currently Stage 5 is ongoing

rocky vigil Oct 18, 2025, 3:10 PM

#

@stray reef do you think lookup could still be faster than current indexing scheme post-cj speedup?

stray reef Oct 18, 2025, 3:12 PM

#

i think it's worth trying the full lookup table for sure

#

but i personally will try this way as well

lofty cedar Oct 18, 2025, 3:14 PM

#

Another potential speedup patch inbound!

stray reef Oct 18, 2025, 3:16 PM

#

ig it's a question of larger table vs. no branches/one pseudo_attacks lookup and popcount saved

lofty cedar Oct 18, 2025, 3:38 PM

#

I think larger tables are relatively free if their cache lines aren't used.

#

They just stay out of the caches and don't cause slowdown.

rocky vigil Oct 18, 2025, 3:59 PM

#

Can try it ig

rocky vigil Oct 18, 2025, 4:34 PM

#

oh

#

size limit increased

#

gonna try 1280 stc / ltc ig

twilit oriole Oct 18, 2025, 4:47 PM

#

is this with factoriser

rocky vigil Oct 18, 2025, 4:55 PM

#

nope

#

is just stage 4

#

also it's on fishtest now

#

so you can just pull the two branches involved

#

if you want to run locally

twilit oriole Oct 18, 2025, 4:57 PM

#

violet badger it did finish stage 2... so I guess we could test again.

should test this also ig

rocky vigil Oct 18, 2025, 4:57 PM

#

yeah idk why "factorized" stage 1 is so much better this time

#

training luck?

twilit oriole Oct 18, 2025, 4:58 PM

#

https://tests.stockfishchess.org/tests/view/68f31c6c28e6d77fcffa0611 i started the stage 1 test again to see if its just a fluke

sharp sail Oct 18, 2025, 4:58 PM

#

how much is factorized training worth?

twilit oriole Oct 18, 2025, 4:59 PM

#

this net isnt factorized. they thought it was but it wasnt

rocky vigil Oct 18, 2025, 4:59 PM

#

we will find out for threat inputs in ~ 3 days

#

real factorization

sharp sail Oct 18, 2025, 4:59 PM

#

was the master net trained without factorizer?

rocky vigil Oct 18, 2025, 5:00 PM

#

the current best net is w/o factorizer

twilit oriole Oct 18, 2025, 5:00 PM

#

the master net is with factorizer

#

ofc

rocky vigil Oct 18, 2025, 5:00 PM

#

I still feel that https://tests.stockfishchess.org/tests/view/68f091a228e6d77fcffa0128 will help 1280

#

since this one reduced avg. number of threats updated

#

oh well

twilit oriole Oct 18, 2025, 5:01 PM

#

continue the fake factoriser run

#

the sprt became even better after resuming

rocky vigil Oct 18, 2025, 5:01 PM

#

yeah both are active rn

#

it'll be really funny though

#

if just better training luck == elo

#

though bad overall in long term

twilit oriole Oct 18, 2025, 5:02 PM

#

thats not normal. bullet doesnt have this

rocky vigil Oct 18, 2025, 5:02 PM

#

idk how aggressive the skipping / filtering is

#

but each stage is much less than 1 epoch I think

violet badger Oct 18, 2025, 5:03 PM

#

I wonder if the branch had an additional fix?

rocky vigil Oct 18, 2025, 5:03 PM

#

i did not edit anything touching the original threat inputs definition so I wouldn't expect that to be the case

#

but who knows

#

what is going on at ltc rn

#

with the 1280 net

violet badger Oct 18, 2025, 5:17 PM

#

idk..

twilit oriole Oct 18, 2025, 5:17 PM

#

Can you do a training run getting rid of king buckets? at l1 1280 again

violet badger Oct 18, 2025, 5:17 PM

#

no..

#

I think I've seen what the issue is.

twilit oriole Oct 18, 2025, 5:18 PM

#

yeah i mean in general lol

#

not related to this

rocky vigil Oct 18, 2025, 5:18 PM

#

need to define new feature set

twilit oriole Oct 18, 2025, 5:18 PM

#

I tested it at L1 3072 and it was -20 fixed nodes at that L1 size. But others tried at smaller L1s and it gained a lot

rocky vigil Oct 18, 2025, 5:18 PM

#

it could be done though

#

let's wait to see

#

if this single machine is just very anomalous

violet badger Oct 18, 2025, 5:20 PM

#

```diff
diff --git a/model/features/full_threats.py b/model/features/full_threats.py
index 8219c47..6ff36fd 100644
--- a/model/features/full_threats.py
+++ b/model/features/full_threats.py
@@ -160,9 +160,11 @@ class FactorizedFeatures(FeatureBlock):
     def get_feature_factors(self, idx: int) -> list[int]:
         if idx >= self.num_real_features:
             raise Exception("Feature must be real")
-
-        a_idx = idx % NUM_PLANES_REAL
-        k_idx = idx // NUM_PLANES_REAL
+        if idx < 79856:
+            return [idx]
+        
+        a_idx = (idx - 79856) % NUM_PLANES_REAL
+        k_idx = (idx - 79856) // NUM_PLANES_REAL
 
         if a_idx // NUM_SQ == 10 and k_idx != KingBuckets[a_idx % NUM_SQ]:
             a_idx += NUM_SQ
```
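
(A rough sketch of the index split in the diff above, for reading it rather than running the trainer. The offset 79856 is taken from the diff; `NUM_PLANES_REAL = 704` and the helper name `split_index` are placeholders assumed for illustration, and the king-bucket adjustment that follows in the real method is left out.)

```python
from typing import Optional, Tuple

OFFSET = 79856          # from the diff: indices below this are returned unchanged
NUM_PLANES_REAL = 704   # hypothetical value, for illustration only


def split_index(idx: int) -> Optional[Tuple[int, int]]:
    """Return (a_idx, k_idx) for indices at or above OFFSET, else None."""
    if idx < OFFSET:
        # The patched method returns [idx] here: no extra factors.
        return None
    rel = idx - OFFSET  # re-base before splitting, as the patch does
    return rel % NUM_PLANES_REAL, rel // NUM_PLANES_REAL
```

Before the patch the modulo and division were applied to `idx` itself, so an index in the first 79856 would also have been split as if it belonged to the later block.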

#

is that specific to the factorizer or a bug fix in general?

rocky vigil Oct 18, 2025, 5:21 PM

#

this is specific to factorizer

#

i think

#

isn't this code not called?

#

when factorizer is not used

violet badger Oct 18, 2025, 5:22 PM

#

Ok, looking at the diff between the trainers:
git diff 73696ad5f56e6ba216ba693bf5ad41a278004e36 5bcb0036825206ad6a23df6ed1b07211e3a73f58

#

which are the shas of the two versions used for training.

#

It also contains the change to the rng.. but I doubt that matters?

rocky vigil Oct 18, 2025, 5:24 PM

#

probably not

#

i really don't know how training luck is 30 elo though

violet badger Oct 18, 2025, 5:24 PM

#

I have certainly never seen that.

#

At the end of the pipeline there is maybe 1-2 Elo variation

#

(certainly <5Elo)

rocky vigil Oct 18, 2025, 5:26 PM

#

approx how long will it take for stage 1 of the real factorized run?

#

we could also compare then

violet badger Oct 18, 2025, 5:26 PM

#

The usual <24h.

#

Prob 14h.

rocky vigil Oct 18, 2025, 5:26 PM

#

yeah so it won't take that long to find out...

violet badger Oct 18, 2025, 5:28 PM

#

I wonder if it makes sense to start training L1 = 768...

rocky vigil Oct 18, 2025, 5:28 PM

#

perhaps

#

stc of 1280 is holding steady so far

#

i guess really just wait a while for ltc

violet badger Oct 18, 2025, 5:38 PM

#

I think the better performance of that factorized net is probably because of the reference net used, I don't see it to be the stage 2 equivalent. The training run has these nets:

Step 1 : starting from None leading to 574f3061fd9e
--> step 1 is final already. Result: /workspace/scratch/574f3061fd9e/run/lightning_logs/version_1/checkpoints/nn-3e22bf1f564d.nnue
Step 2 : starting from 574f3061fd9e leading to e3109a97a662
--> step 2 is final already. Result: /workspace/scratch/e3109a97a662/run/lightning_logs/version_1/checkpoints/nn-a878500a97a8.nnue
Step 3 : starting from e3109a97a662 leading to 6d0eccfc51a2
--> step 3 is final already. Result: /workspace/scratch/6d0eccfc51a2/run/lightning_logs/version_1/checkpoints/nn-bf4519f857f4.nnue
Step 4 : starting from 6d0eccfc51a2 leading to bedc9e9b73fd
--> step 4 is final already. Result: /workspace/scratch/bedc9e9b73fd/run/lightning_logs/version_2/checkpoints/nn-598188c9a702.nnue
Step 5 : starting from bedc9e9b73fd leading to e919dd3ada1a
--> step 5 is final already. Result: /workspace/scratch/e919dd3ada1a/run/lightning_logs/version_1/checkpoints/nn-d1dc1ab9cb1c.nnue

#

while the test has a different base net, not sure what it is. https://tests.stockfishchess.org/tests/view/68f31c6c28e6d77fcffa0611

#

Step 1 : starting from None leading to fbfaa6b547c6
--> step 1 is final already. Result: /workspace/scratch/fbfaa6b547c6/run/lightning_logs/version_1/checkpoints/nn-020430fc567b.nnue

twilit oriole Oct 18, 2025, 5:41 PM

#

has to be, this sprt result is impossible lol

violet badger Oct 18, 2025, 5:41 PM

#

so the proper test would be nn-3e22bf1f564d.nnue vs nn-020430fc567b.nnue

rocky vigil Oct 18, 2025, 5:44 PM

#

lmao

#

nn-fd9f...

#

that base net

#

is a 100 SB net

#

so 1/8 of the real stage 1

#

yeah this test is meaningless

twilit oriole Oct 18, 2025, 5:47 PM

#

kekw

#

first hashgate now netgate Kappa

rocky vigil Oct 18, 2025, 5:49 PM

#

i kinda wanna see in a few days

#

if we are at the level of master replication attempt

#

(i.e. pre-spsa)

violet badger Oct 18, 2025, 5:51 PM

#

so, at least we know we can't stop training after 100epochs.

regal steeple Oct 18, 2025, 6:05 PM

#

Could someone test if this is a speedup https://tests.stockfishchess.org/tests/view/68f3d698637acd2a11e71ffe ? My local test says it is but I dont really trust my tests anymore

violet badger Oct 18, 2025, 6:08 PM

#

let me try..

#

probably no difference?

Result of 100 runs
==================
base (./stockfish.base         ) =     977742  +/- 3024
test (./stockfish.new          ) =     977027  +/- 3049
diff                             =       -715  +/- 2094

speedup        = -0.0007
P(speedup > 0) =  0.2520

But well, always tricky to measure small difference
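
(For anyone reading the numbers above: one plausible way they relate, as a sketch. It assumes the `+/-` on `diff` is a 95% interval of a normal distribution; the actual benchmark script may compute this differently. `summarize` is a made-up name.)

```python
from statistics import NormalDist


def summarize(base_nps: float, diff: float, diff_err95: float):
    """Relative speedup and P(speedup > 0) under a normal approximation."""
    speedup = diff / base_nps
    sigma = diff_err95 / 1.96  # assumption: +/- is a 95% half-width
    p_positive = NormalDist().cdf(diff / sigma)
    return speedup, p_positive


speedup, p = summarize(977742, -715, 2094)
print(f"speedup = {speedup:+.4f}, P(speedup > 0) = {p:.4f}")
```

With these inputs the sketch lands close to the reported -0.0007 and 0.25, which is why a diff well inside its own error bar should be read as "no measurable difference".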

#

it might also be that things now are a bit HW dependent

rocky vigil Oct 18, 2025, 6:26 PM

#

it might? be time to try 768

#

with the factorizer

regal steeple Oct 18, 2025, 6:29 PM

#

Thank you, thats quite interesting, my local testing showed a decent speedup, but im not sure, maybe I made some mistake in my test

violet badger Oct 18, 2025, 6:31 PM

#

I think it depends a bit on what dominates, and probably in my case slow memory access dominates.

#

but well, we will see what fishtest figures out..

rocky vigil Oct 18, 2025, 6:32 PM

#

the 1280 results are strange

twilit oriole Oct 18, 2025, 6:35 PM

#

i dont think they are. can be explained its just too slow, maybe undertrained etc

#

I think the current net size is well selected and we should focus on optimising it fully first

#

in terms of training schedules etc

violet badger Oct 18, 2025, 6:36 PM

#

I already played quite a bit around with lr / alpha, but no gains so far.

rocky vigil Oct 18, 2025, 6:37 PM

#

yeah 1024 seems good, we could give 768 a try later just to confirm

violet badger Oct 18, 2025, 6:37 PM

#

I'll start 768 in a short while.

#

https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/pipelines/2107270270

green moat Oct 18, 2025, 9:19 PM

#

nn-e0189470ae73.nnue available for "Use 1280" pipeline (based on threat_inputs branch)

rocky vigil Oct 18, 2025, 9:39 PM

#

kind of a wash though at least with current estimates

#

like it will be 1-2 elo stronger than stage 4

#

that'll put it at maybe -10 elo stc, -4 elo ltc

#

768 might be more interesting

#

probably antiscales but maybe the base stc gain will be high

prime mica Oct 18, 2025, 10:16 PM

#

hm

#

do u have an estimate on how much incremental threat updates would help the situation

#

I also wonder whether some of the threats are "low information" in the sense that they're already encoded somehow in the main net

#

like if you have a queen right next to the king lol

rocky vigil Oct 18, 2025, 10:19 PM

#

prime mica do u have an estimate on how much incremental threat updates would help the situ...

1-2% is my guess, idk, we won’t know until we try

prime mica Oct 18, 2025, 10:19 PM

#

😩

#

piddly

rocky vigil Oct 18, 2025, 10:19 PM

#

Like threat tracking is ~5% of runtime rn

prime mica Oct 18, 2025, 10:19 PM

#

oh that's not much

rocky vigil Oct 18, 2025, 10:19 PM

#

Threat indexing is another 5% still

#

It adds up

prime mica Oct 18, 2025, 10:19 PM

#

true

rocky vigil Oct 18, 2025, 10:19 PM

#

The biggest time sink is accumulating

#

Which is 20%
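
(A back-of-the-envelope sketch of what those percentages allow, using Amdahl's law. The 5% and 20% figures are the ones quoted above; the speedup factors are made-up inputs, not measurements.)

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    is made `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Upper bound if threat tracking (5% of runtime) cost nothing at all.
print(overall_speedup(0.05, float("inf")))
# Halving the cost of accumulation (20% of runtime).
print(overall_speedup(0.20, 2.0))
```

So even a perfect fix to a 5% component caps out a little above 5% overall, while the 20% accumulation loop is where a given improvement factor pays the most.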

prime mica Oct 18, 2025, 10:20 PM

#

🤮

#

I need to gather more data on whether it's truly memory bound on other computers

rocky vigil Oct 18, 2025, 10:20 PM

#

Idk if that can be done faster though, unless the raw number of updates is decreased

prime mica Oct 18, 2025, 10:20 PM

#

my computer is weird because it has extremely high arithmetic throughput

rocky vigil Oct 18, 2025, 10:20 PM

#

Wait what machine do you have

prime mica Oct 18, 2025, 10:21 PM

#

it's a recent AMD EPYC machine

rocky vigil Oct 18, 2025, 10:21 PM

#

Oh yeah

rocky vigil Oct 18, 2025, 10:21 PM

#

rocky vigil Idk if that can be done faster though, unless the raw number of updates is decre...

Bc it's a very simple loop

prime mica Oct 18, 2025, 10:21 PM

#

so it can do 4x 512-bit vpaddw per cycle

#

er wait no

#

2x

#

but still quite a bit more than most computers on fishtest

rocky vigil Oct 18, 2025, 10:22 PM

#

Using many threads probably stresses the avx / memory more

prime mica Oct 18, 2025, 10:22 PM

#

true

rocky vigil Oct 18, 2025, 10:22 PM

#

That’s why I think viren said i8 would only be worth it at smp

prime mica Oct 18, 2025, 10:22 PM

#

ye

rocky vigil Oct 18, 2025, 10:23 PM

#

But fishtest conditions are also multithread basically in terms of memory pressure

#

Bc of concurrency

prime mica Oct 18, 2025, 10:23 PM

#

right

#

although hm

#

what if we tried combining the shared memory branch

#

like

rocky vigil Oct 18, 2025, 10:23 PM

#

I also learned there is no simd i8 * i8 = i16 mul

prime mica Oct 18, 2025, 10:23 PM

#

threat inputs, shared memory vs. master, shared memory

rocky vigil Oct 18, 2025, 10:23 PM

#

So i8 requires double add

#

Or we drop mulhi trick

prime mica Oct 18, 2025, 10:23 PM

#

a tragedy

#

let's call up the CPU manufacturers and have them add vpsfaddsubw

rocky vigil Oct 18, 2025, 10:24 PM

#

prime mica threat inputs, shared memory vs. master, shared memory

According to viren / jw’s experience with Monty this indeed favors threat inputs

prime mica Oct 18, 2025, 10:24 PM

#

gotcha

#

what is Monty?

rocky vigil Oct 18, 2025, 10:25 PM

#

CPU mcts engine

prime mica Oct 18, 2025, 10:25 PM

#

oh! cool

rocky vigil Oct 18, 2025, 10:25 PM

#

The one where this idea originated from

#

Since they got it to work in Monty first

#

And then it worked in Yukari, then Plentychess

#

And soon hopefully sf

prime mica Oct 18, 2025, 10:26 PM

#

I see

#

it's cool that stockfish imports ideas from other engines!

#

like big ideas would probably be really hard to test and push through bc master is so carefully tuned

rocky vigil Oct 18, 2025, 10:27 PM

#

I mean that’s the purpose of all this stuff being open source

prime mica Oct 18, 2025, 10:27 PM

#

sure

rocky vigil Oct 18, 2025, 10:27 PM

#

And the collaborative nature

prime mica Oct 18, 2025, 10:27 PM

#

msheart_eyes

rocky vigil Oct 18, 2025, 10:27 PM

#

corrhist which was a big gain (like 6 elo, it’s literally a whole third of the progress from 17 to 17.1) shortly after SF17 was also originally done in other ab engines

rocky vigil Oct 18, 2025, 10:30 PM

#

prime mica like big ideas would probably be really hard to test and push through bc master ...

Technically anything above -5 stc / ltc vs master is a win because we can’t get a new net above that without spsa either

#

But I’m hoping we can go the full way

#

And yeah big ideas like this require many many people

green moat Oct 18, 2025, 10:32 PM

#

prime mica what is Monty?

https://github.com/official-monty/Monty
https://tests.montychess.org/tests

prime mica Oct 18, 2025, 10:32 PM

#

https://tenor.com/view/ferris-rust-rustlang-crab-cute-gif-26396486

rocky vigil Oct 18, 2025, 10:49 PM

#

@prime mica one of these is the current version

#

the one on the right is

prime mica Oct 18, 2025, 10:53 PM

#

cool beans

sharp sail Oct 19, 2025, 12:14 AM

#

prime mica https://tenor.com/view/ferris-rust-rustlang-crab-cute-gif-26396486

oh this is the model this nice twitch streamer made: https://www.twitch.tv/raymarch
he has a very aesthetic stream

naive comet Oct 19, 2025, 12:52 AM

#

I have another idea

#

but the thing is that all my smart ideas fail and my dumb ideas tend to work

#

just look at my last speedup for example

#

like how tf does that give 2%

twilit oriole Oct 19, 2025, 4:54 AM

#

-11.6 STC to https://tests.stockfishchess.org/tests/view/68f3c3fa637acd2a11e71fde

naive comet Oct 19, 2025, 5:07 AM

#

pogey

prime mica Oct 19, 2025, 5:18 AM

#

sharp sail oh this is the model this nice twitch streamer made: <https://www.twitch.tv/raym...

lol that's cute

violet badger Oct 19, 2025, 5:22 AM

#

5 stages of 1280:
1: Elo: -83.06 +/- 1.85, nElo: -158.79 +/- 3.40 nn-8f15e80a1212.nnue
2: Elo: -44.34 +/- 1.84, nElo: -82.88 +/- 3.40 nn-ee65bf2468c5.nnue
3: Elo: -41.99 +/- 1.84, nElo: -78.44 +/- 3.40 nn-da4726ad1062.nnue
4: Elo: -38.09 +/- 1.84, nElo: -71.15 +/- 3.40 nn-07f85ae62b17.nnue
5: Elo: -36.27 +/- 1.86, nElo: -67.03 +/- 3.40 nn-e0189470ae73.nnue

#

vs #1336647760388034610 message of 1024

#

(not the latest optimized SF playing of course)

rocky vigil Oct 19, 2025, 5:26 AM

#

yeah -11 stc to neutral ltc what is this

twilit oriole Oct 19, 2025, 5:27 AM

#

violet badger 5 stages of 1280: 1: Elo: -83.06 +/- 1.85, nElo: -158.79 +/- 3.40 nn-8f15e80a121...

this gives impression of being undertrained

#

the convergence time increased

twilit oriole Oct 19, 2025, 5:29 AM

#

rocky vigil yeah -11 stc to neutral ltc what is this

maybe stage 5 is enough to pass ltc

rocky vigil Oct 19, 2025, 5:29 AM

#

well 768 will probably be huge antiscaler at this rate

#

💀

#

oh well

#

might as well see

violet badger Oct 19, 2025, 5:31 AM

#

twilit oriole this gives impression of being undertrained

so, just replace last step with e.g. 1200 epochs, add a step of 1200 epochs, or redo all 🙂

twilit oriole Oct 19, 2025, 5:32 AM

#

i would assume safest bet is to increase length of all stages and redo. dunno how other stuff might affect things

#

but wait for factoriser first ig that should help a bit

violet badger Oct 19, 2025, 5:33 AM

#

right, maybe smarter to wait for the factorizer.

#

that one (the for real one) should finish step 1 soon (2h?) and I think we should run a sanity check against the corresponding step without factorizer.

stray gyro Oct 19, 2025, 5:47 AM

#

Is there a reason why we don't push indices directly to active in append_active_indices?

#

I'm seeing ~1.5% speedup, also bench looks identical (to xu-shawn/threats_inputs)

naive comet Oct 19, 2025, 5:51 AM

#

yeah

#

it's just a useless intermediate list

stray gyro Oct 19, 2025, 5:51 AM

#

Another free speedup...

naive comet Oct 19, 2025, 5:51 AM

#

yeah

rocky vigil Oct 19, 2025, 5:52 AM

#

wait shoot where'd that come from

#

i thought I removed that

#

it's tech debt back when I was doing "UE at home"

stray gyro Oct 19, 2025, 5:52 AM

#

Idk what's the latest version, I just looked at shawn's branch and it seemed strange to have that.

naive comet Oct 19, 2025, 5:52 AM

#

also mineta ray is unused in the threats updates in Position

rocky vigil Oct 19, 2025, 5:52 AM

#

yeah i thought i removed it in the retry

naive comet Oct 19, 2025, 5:52 AM

#

you should prolly include that too

rocky vigil Oct 19, 2025, 5:53 AM

#

i guess i didn't, and then shawn copied it over

#

i mean free speedups always nice

stray gyro Oct 19, 2025, 5:54 AM

#

naive comet you should prolly include that too

You mean ray &= BetweenBB[s][threatened_sq]; ?

rocky vigil Oct 19, 2025, 6:06 AM

#

yeah i guess just put it up on fishtest

#

should be free 2-3 elo

stray gyro Oct 19, 2025, 6:11 AM

#

I'd rather have it included directly because it's trivial. fishtest is already a bit under strain atm.

violet badger Oct 19, 2025, 6:17 AM

#

rocky vigil yeah -11 stc to neutral ltc what is this

I'm not really seeing that difference in fixed game tests (for 1024) tbh. Let's not forget this is just a sprt run.

#

maybe 1280 is different... who knows. Still some work to do, but we're making progress.

naive comet Oct 19, 2025, 6:24 AM

#

stray gyro You mean `ray &= BetweenBB[s][threatened_sq];` ?

yeah

violet badger Oct 19, 2025, 6:35 AM

#

(60+0.6, 72t, 32000MB, UHO_Lichess_4852_v1.epd):
   # PLAYER    :  RATING  ERROR  POINTS  PLAYED   (%)
   1 master    :     0.0   ----  4173.0    8192    51
   2 patch     :    -6.7    5.5  4019.0    8192    49

(patch is 3fc4b6a58c288001f929acc560cb8b28adf03125 (cj-latest-speedup branch))
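
(Sanity check on the table, as a sketch: the plain logistic Elo formula applied to the points column. The rating tool that produced the table may use a different model, so a small mismatch with its RATING column is expected. `elo_from_points` is a made-up helper.)

```python
import math


def elo_from_points(points: float, games: int) -> float:
    """Logistic Elo difference implied by a score of points/games."""
    score = points / games
    return -400.0 * math.log10(1.0 / score - 1.0)


# patch row above: 4019.0 points out of 8192 games
print(round(elo_from_points(4019.0, 8192), 2))
```

This gives roughly -6.5 for the patch, in line with the -6.7 +/- 5.5 shown.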

#

relative to #1336647760388034610 message

prime mica Oct 19, 2025, 6:36 AM

#

this is 8192 games, each one having engines with 72 threads?

violet badger Oct 19, 2025, 6:36 AM

#

yes

prime mica Oct 19, 2025, 6:36 AM

#

gotcha

rocky vigil Oct 19, 2025, 6:48 AM

#

ok so 1024 is approximately neutral scaling with master

#

in threads and time

#

so probably it is a good spot

#

though of course 768 might pull a surprise...

violet badger Oct 19, 2025, 6:56 AM

#

well, I should probably compute the results with 1 thread... I think there is good scaling actually. One thing we're definitely seeing, as with all arch changes, is that the real result depends on the HW.

rocky vigil Oct 19, 2025, 6:58 AM

#

LTCs on fishtest do get a larger diversity of machines (at least for the same number of games), so that might play a role

naive comet Oct 19, 2025, 7:12 AM

#

rocky vigil ok so 1024 is approximately neutral scaling with master

yikes...

#

then would 1280 have more promise?

rocky vigil Oct 19, 2025, 7:27 AM

#

i still think 1024 is good

#

but yeah a longer trained 1280 is definitely promising

rocky vigil Oct 19, 2025, 7:29 AM

#

naive comet yikes...

to be fair, all this test shows is that stc 72t and ltc 72t are around the same, as stc, at -7

twilit oriole Oct 19, 2025, 7:29 AM

#

rocky vigil yeah i guess just put it up on fishtest

Is there a test yet

rocky vigil Oct 19, 2025, 7:29 AM

#

was waiting for mineta to do so but I can make one

twilit oriole Oct 19, 2025, 7:30 AM

#

I don't see what is 'yikes' about -6 at this stage it's completely fine

#

Need a bit of patience...

#

There are gainers to come

stray gyro Oct 19, 2025, 7:33 AM

#

If there are no pending improvements incoming I can make a test

naive comet Oct 19, 2025, 7:34 AM

#

I think sscg recently made one

stray gyro Oct 19, 2025, 7:34 AM

#

You mean vs master or vs latest threat?

#

oh good

rocky vigil Oct 19, 2025, 7:34 AM

#

oh i already have the branch set up

#

but have not made test

#

i'll make it

#

it'll run slightly faster

stray gyro Oct 19, 2025, 7:35 AM

#

sure

rocky vigil Oct 19, 2025, 7:35 AM

#

https://tests.stockfishchess.org/tests/view/68f494ca637acd2a11e720d4

#

here we go

#

btw smth strange

#

https://tests.stockfishchess.org/tests/view/68f091a228e6d77fcffa0128

#

@regal steeple can this test be reconciled with upstream changes

#

in particular https://github.com/sscg13/Stockfish/commit/8201c3ee003fe618aee0cbeeaf64c820ac204d8c

violet badger Oct 19, 2025, 7:52 AM

#

so, thread scaling is real in this context. Relative to #1336647760388034610 message

(60+0.6, 1t, 64MB, UHO_Lichess_4852_v1.epd)
   # PLAYER    :  RATING  ERROR  POINTS  PLAYED   (%)
   1 master    :     0.0   ----  9367.0   17920    52
   2 patch     :   -16.2    3.7  8553.0   17920    48

prime mica Oct 19, 2025, 7:54 AM

#

😲

violet badger Oct 19, 2025, 7:55 AM

#

I assume quite a bit comes from the memory pressure (i.e. the known thing where nets are not (yet) shared between processes, using 72t is an easy workaround).

prime mica Oct 19, 2025, 7:55 AM

#

was about to say that

violet badger Oct 19, 2025, 7:55 AM

#

So, for completeness... STC to follow.

rocky vigil Oct 19, 2025, 7:57 AM

#

violet badger I assume quite a bit comes from the memory pressure (i.e. the known thing where ...

if it ever gets added to fishtest I assume it benefits threat input net more than master yeah

#

btw factorized stage 1 is done

#

so I'll put that on fishtest

#

and hopefully avoid netgate...

prime mica Oct 19, 2025, 7:58 AM

#

rocky vigil and hopefully avoid netgate...

what does this mean lol

rocky vigil Oct 19, 2025, 7:58 AM

#

shawn tested the last "fake factorized" stage 1 net against the test net which only had 1/8 of a stage

#

and surprise surprise +30 elo

violet badger Oct 19, 2025, 7:58 AM

#

prime mica what does this mean lol

https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal

#

(chess equivalent, obviously)

prime mica Oct 19, 2025, 7:59 AM

#

lol

#

that's actually rly funny

#

defrauded by threat inputs

regal steeple Oct 19, 2025, 8:04 AM

#

rocky vigil @regal steeple can this test be reconciled with upstream changes

Im not entirely sure, I can speedup test this in a bit and resubmit if that test seems promising

rocky vigil Oct 19, 2025, 8:07 AM

#

bruh forgot to change the fixed games preset

#

can just stop after 20k or so if the results are clear

rocky vigil Oct 19, 2025, 8:09 AM

#

regal steeple Im not entirely sure, I can speedup test this in a bit and resubmit if that test...

ideally if the gain is still there it is amplified now that the engine is overall faster

regal steeple Oct 19, 2025, 8:36 AM

#

regal steeple Im not entirely sure, I can speedup test this in a bit and resubmit if that test...

I get

speedup        = +0.0113
P(speedup > 0) =  1.0000

I submitted the test https://tests.stockfishchess.org/tests/view/68f4a2e3637acd2a11e72101

violet badger Oct 19, 2025, 8:45 AM

#

so, looks like the factorizer still works as advertised: https://tests.stockfishchess.org/tests/view/68f49c11637acd2a11e720f0 ... we'll have to see how much of this remains after 4-5 stages, but it is a good start

violet badger Oct 19, 2025, 10:42 AM

#

violet badger so, thread scaling is real in this context. Relative to https://discord.com/chan...

and finally the STC number in this set.

   (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd)
   # PLAYER    :  RATING  ERROR  POINTS  PLAYED   (%)
   1 master    :     0.0   ----  9485.5   17920    53
   2 patch     :   -20.8    3.7  8434.5   17920    47

So summarizing, at l1=1024, we have

STC,  1t: -20.8
LTC,  1t: -16.2
STC, 72t:  -6.6
LTC, 72t:  -6.7

regal steeple Oct 19, 2025, 10:44 AM

#

Could you speedup test this https://github.com/rn5f107s2/Stockfish/tree/maybemaybemaybe vondele? 10 runs or so would suffice

violet badger Oct 19, 2025, 10:45 AM

#

against the shawn threat_inputs branch, I assume

regal steeple Oct 19, 2025, 10:46 AM

#

violet badger against the shawn threat_inputs branch, I assume

Yes

violet badger Oct 19, 2025, 10:48 AM

#

speedup        = +0.0135
P(speedup > 0) =  1.0000

regal steeple Oct 19, 2025, 10:48 AM

#

Thank you that looks good I guess

violet badger Oct 19, 2025, 10:48 AM

#

yes, it does

regal steeple Oct 19, 2025, 10:49 AM

#

I got around the same value, so hardware difference seems to not be an issue

stray reef Oct 19, 2025, 11:11 AM

#

Some data from pohls tests

PlentyChess 7 TI Test

STC (3min+1sec, ratinglist conditions, 512MB):

Torch 4 a512             : 1000 (+215,=472,-313), 45.1 %, -34 +- 15
Stockfish 17.1 250330    : 1000 (+122,=503,-375), 37.4 %, -89 +- 15


LTC (30min+10sec, 512MB):

PlentyChess 7.0.0 a512   : 1000 (+259,=497,-244), 50.8 %, 5 +- 15
Torch 4 a512             : 1000 (+221,=476,-303), 45.9 %, -28 +- 15
Stockfish 17.1 250330    : 1000 (+142,=491,-367), 38.8 %, -79 +- 15

the error bars are not great ofc, but the trend is there.
and keep in mind it's 512MB hash in both cases, he doesn't have RAM for more

plain flower Oct 19, 2025, 11:13 AM

#

no STC plentychess results?

violet badger Oct 19, 2025, 11:15 AM

#

violet badger and finally the STC number in this set. ``` (10+0.1, 1t, 16MB, UHO_Lichess_48...

and now l1=1280 (but won't repeat all tests for this one):

(10+0.1, 72t, 32000MB, UHO_Lichess_4852_v1.epd)
   # PLAYER        :  RATING  ERROR  POINTS  PLAYED   (%)
   1 master        :     0.0   ----  8304.5   16384    51
   2 patch.1280    :    -4.9    3.9  8079.5   16384    49

stray reef Oct 19, 2025, 11:16 AM

#

plain flower no STC plentychess results?

those were not played actually, the "STC" is from his normal ratinglist run

violet badger Oct 19, 2025, 11:42 AM

#

I've also started a factorized 1280 training run.

naive comet Oct 19, 2025, 12:04 PM

#

regal steeple Could you speedup test this https://github.com/rn5f107s2/Stockfish/tree/maybemay...

maybemaybemaybe you can template put_piece

regal steeple Oct 19, 2025, 12:59 PM

#

naive comet maybemaybemaybe you can template `put_piece`

Oh yeah, I also just saw that im not initializing threatened and threatening square anywhere, so im gonna resubmit the test

finite wind Oct 19, 2025, 4:22 PM

#

what do you predict? what is the future of threat inputs?

violet badger Oct 19, 2025, 4:30 PM

#

threatened ? More seriously, not quite stronger than master, but close. There are still inference patches that will speed things up, training sessions that will improve the nets, and tests to be done. So quite some work. Even if it is not certain that this will be merged, it definitely helped to revamp some of our tools and processes.

twilit oriole Oct 19, 2025, 5:11 PM

#

https://tests.stockfishchess.org/tests/view/68f494ca637acd2a11e720d4 some of these patches are not running with the correct bounds @rocky vigil

rocky vigil Oct 19, 2025, 5:11 PM

#

this could be simplification yes

#

oh bruh

#

i thought it would've sped up at least by some nontrivial amount

#

who knows

#

eh I'll just recalculate the llr later

#

it's kind of a waste of games to just restart the test
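
(Sketch of recomputing an LLR for different bounds from game counts that are already in. This is the simple win/draw/loss normal approximation with logistic Elo bounds. Fishtest itself works from pentanomial game-pair statistics and normalized Elo, so treat this as an illustration of the idea and not as what fishtest computes. Function names are made up.)

```python
def expected_score(elo: float) -> float:
    """Logistic expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score; zero if every game had the same result.
    var = (wins + 0.25 * draws) / n - mean * mean
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
```

The same counts can be fed through with gainer or simplification bounds, which is why restarting the test is not needed just to change the bounds.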

rocky vigil Oct 19, 2025, 5:38 PM

#

violet badger and now l1=1280 (but won't repeat all tests for this one): ``` (10+0.1, 72t, 320...

i do feel like memory bottleneck is big

violet badger Oct 19, 2025, 6:33 PM

#

at 72t it should not be, the full net should fit in the socket's L3 cache.

rocky vigil Oct 19, 2025, 6:52 PM

#

for the stc 1t i mean

violet badger Oct 19, 2025, 6:53 PM

#

yeah, in that case most likely.

rocky vigil Oct 20, 2025, 3:10 AM

#

gg speedup

#

first one in a little bit

naive comet Oct 20, 2025, 4:01 AM

#

coolio

green moat Oct 20, 2025, 9:53 AM

#

final net nn-6b685002b4b6.nnue available for "Factorized" pipeline:
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/pipelines/2107827770

violet badger Oct 20, 2025, 10:38 AM

#

The Elo difference to reference is much better in the tests that are executed in the end, but one would need to check if the shas of the playing engines are the same. If the engines are the same, would be quite an improvement.

#

Probably the playing engine that is different.

Previous:
75edbee01e6f8cb53a2555499192ccaddb883577  b7f553ee8b28a4abace6c1056dceb1d69169873a
Elo: -25.29 +/- 1.82, nElo: -47.45 +/- 3.40
Factorized:
75edbee01e6f8cb53a2555499192ccaddb883577  d5fad05e412e3118f94ab79aa5e03067ac86d204
Elo: -21.06 +/- 2.19, nElo: -39.16 +/- 4.06

So, 4 Elo progress in this test, but we can't attribute to net or playing engine.

#

quite a few moving targets.

green moat Oct 20, 2025, 10:49 AM

#

Would it be better, then, to test nn-6b685002b4b6.nnue against nn-598188c9a702.nnue on Fishtest?

violet badger Oct 20, 2025, 10:50 AM

#

yes.

#

that's the test to run, or maybe once the testing at the end of the pipeline finishes, so we take the proper net.

#

but it is already rather clear that this time step 5 is better (nn-6b685002b4b6.nnue)

stray reef Oct 20, 2025, 10:53 AM

#

@twilit oriole I just tried i8 input weights, it's 13% faster but unfortunately -20 fixed nodes (https://furybench.com/test/3426/). I guess the difference to monty is the i8 l1 matmul. but maybe you (or someone else) have/has any ideas how to sneak more accuracy into my impl?

#

there is still a factor of 2 up for grabs technically, if i replace _mm512_cmpgt_epi32_mask with _mm512_cmpneq_epi32_mask for nnz calculation, though i've not found a nice way to do cmpneq on avx2 and below yet

#

another thing is, i saw you using 128 (or 127?) for input weight quantisation, whereas I am limited to 64/63 due to the factoriser ([-98, 125] is the range rn). my weights are clamped to [-0.99, 0.99] during training

naive comet Oct 20, 2025, 11:09 AM

#

stray reef there is still a factor of 2 up for grabs technically, if i replace `_mm512_cmpg...

or just cmpeq and invert the mask?

stray reef Oct 20, 2025, 11:09 AM

#

cmpeq_epi32_mask (also cmpneq) is all avx512 only

lofty cedar Oct 20, 2025, 11:48 AM

#

Where do I make a PR for threat inputs?

violet badger Oct 20, 2025, 11:57 AM

#

I think follow this example https://github.com/xu-shawn/Stockfish/pull/17

lofty cedar Oct 20, 2025, 12:13 PM

#

My test has a bit of a problem due to one worker having a residual of 10... which may subject the passed test to additional tests.

violet badger Oct 20, 2025, 12:16 PM

#

don't worry, that one will be purged

naive comet Oct 20, 2025, 12:17 PM

#

stray reef `cmpeq_epi32_mask` (also cmpneq) is all avx512 only

you can just do what you currently do for avx2, just replace cmpgt with cmpeq and invert the mask??

stray reef Oct 20, 2025, 12:23 PM

#

ohh didn't see _mm256_cmpeq_epi32 is avx2, yes, thx
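
The "cmpeq and invert the mask" suggestion can be modelled without intrinsics. The sketch below is a pure-Python model of eight signed 32-bit lanes, not engine code; the lane contents are made up, and the idea that lanes may have their top bit set (which is where the two compares disagree) is an assumption about why the distinction would matter.

```python
def wrap_i32(x):
    """Interpret the low 32 bits of x as a signed 32-bit lane."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def mask_cmpgt_zero(lanes):
    """Bit i is set when lane i is > 0 (signed compare against zero)."""
    return sum(1 << i for i, v in enumerate(lanes) if wrap_i32(v) > 0)

def mask_cmpeq_inverted(lanes):
    """Bit i is set when lane i is != 0: compare equal to zero, then invert."""
    eq = sum(1 << i for i, v in enumerate(lanes) if wrap_i32(v) == 0)
    return ~eq & ((1 << len(lanes)) - 1)

# Eight 32-bit lanes, each standing in for four packed u8 values (hypothetical data).
lanes = [0x00000000, 0x00000001, 0x7F000000, 0x80000000,
         0x000000FF, 0x00000000, 0xFF000000, 0x00010000]

gt = mask_cmpgt_zero(lanes)
ne = mask_cmpeq_inverted(lanes)
print(f"{gt:08b} {ne:08b}")
```

The two masks agree on every lane whose sign bit is clear and differ only on lanes 3 and 6, where the top byte is 0x80 or above, so the signed greater-than compare treats a nonzero lane as negative.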

#

doubling quantisation accuracy for the threat weights does a bit, but not much it seems https://furybench.com/test/3430/

green moat Oct 20, 2025, 12:32 PM

#

@lofty cedar
Is your "Threat_input_speedup" orthogonal to rn5f107s2 PR (https://github.com/xu-shawn/Stockfish/pull/17) ?

lofty cedar Oct 20, 2025, 12:38 PM

#

Yes, it just changes the update_piece_threats to basically precompute the pawn attack bitboard for 1 square (which simply was not there because we didn't need it), and saved a few attack recalculations by re-using values we need anyway.

naive comet Oct 20, 2025, 1:34 PM

#

I think https://tests.stockfishchess.org/tests/view/68f3adba28e6d77fcffa0727 might need a retest after https://github.com/xu-shawn/Stockfish/pull/17 because with rn5's patch, Threat_input_speedup now does extra computation in the mutate_piece phase rather than less

#

@lofty cedar

#

@frosty imp @regal steeple

#

also aside from that patch, @stray reef do you have Finny tables for your threat inputs? I was thinking about it, do you do a large bitmask and calculate difference that way by any chance?

lofty cedar Oct 20, 2025, 1:40 PM

#

naive comet I think <https://tests.stockfishchess.org/tests/view/68f3adba28e6d77fcffa0727> m...

Why?

naive comet Oct 20, 2025, 1:41 PM

#

cuz your patch hinges on the fact that we always compute rAttacks, qAttacks and bAttacks

#

but with rn5's patch we dont do it on mutate_piece

#

so combining the 2 means you are computing rAttacks, qAttacks, bAttacks even when unnecessary

#

ofc this is a trivial specialcasing away but still

lofty cedar Oct 20, 2025, 1:43 PM

#

Oh... okay...

#

But well, I think the compiler won't miss special case. After all, it's a template argument.

naive comet Oct 20, 2025, 1:46 PM

#

well still good to at least do a speedtest using either speedtest or pyshbench before merging

lofty cedar Oct 20, 2025, 1:46 PM

#

And if one were to be pedantic, one could also say that if the compiler didn't recognize the fact that you can elide qAttack in the case, then you don't need to compute that...

#

I see.

regal steeple Oct 20, 2025, 2:20 PM

#

naive comet @frosty imp @regal steeple

I think its fine unless im missing something, slider attacks are still needed for the new attacks to the square the piece changed (~~https://github.com/xu-shawn/Stockfish/blob/293d3a673f8a7cc0983d48feb9b202f4286e9985/src/position.cpp#L1039C9-L1039C79~~), since threats-to-square are no longer getting incrementally updated. But I really dont like the new table, is it maybe possible to use attacks_bb<PAWN>(s, C) instead?

#

Sorry wrong link, I meant this line https://github.com/xu-shawn/Stockfish/blob/293d3a673f8a7cc0983d48feb9b202f4286e9985/src/position.cpp#L1072

stray reef Oct 20, 2025, 2:27 PM

#

naive comet also aside from that patch, @stray reef do you have Finny tables for y...

no, the threat refreshes literally don't matter according to my profiling

naive comet Oct 20, 2025, 2:29 PM

#

oh

stray reef Oct 20, 2025, 2:29 PM

#

with some improvements, i8 input weights are -14 fixed nodes now, and seem to fail STC
https://furybench.com/test/3437/
https://furybench.com/test/3438/
I know QAT has been tried and declared neutral, but maybe with tighter quantisations there's some elo up for grabs? @formal smelt @twilit oriole

naive comet Oct 20, 2025, 2:30 PM

#

regal steeple I think its fine unless im missing something, slider attacks are still needed fo...

ohhh um actually you can put compute_rays on the outside

#

skip the while loop even

#

at least my impl did that

regal steeple Oct 20, 2025, 2:31 PM

#

I think that was possible while threats to square were still getting incrementally tracked because threats by sliders were getting added in the third loop but now they are getting added in the second loop

naive comet Oct 20, 2025, 2:33 PM

#

surely you can still do that by:

<pseudocode>
threats_remove()
threats_add()
mutate_board()

or something in that order

#

needs checking at least

regal steeple Oct 20, 2025, 2:37 PM

#

Im not sure im following, say we have a position like this 6k1/8/5n2/4p3/4P3/8/6B1/6K1 b - - 0 1 with f6e4 played, we still need to remove the bishop to pawn threat and add the bishop to knight threat

prisma hatchBOT Oct 20, 2025, 2:37 PM

#

6k1/8/5n2/4p3/4P3/8/6B1/6K1 b - - 0 1

naive comet Oct 20, 2025, 2:38 PM

#

ill experiment after your pr gets merged

lofty cedar Oct 20, 2025, 3:07 PM

#

Okay! STC passed.

prime mica Oct 20, 2025, 3:12 PM

#

the goat

violet badger Oct 20, 2025, 3:16 PM

#

The net from the factorized-not-really-factorized pipeline is essentially the same strength as what we have, but maybe 1 Elo progress:

   1 nn-6b685002b4b6.nnue    :  2300.6    0.6  73981.5  147456    50
   2 nn-598188c9a702.nnue    :  2299.4    0.6  73474.5  147456    50

#

(I guess good enough to include in the branch..)

#

(and also uploaded to make that easy..)

formal smelt Oct 20, 2025, 3:33 PM

#

stray reef with some improvements, i8 input weights are -14 fixed nodes now, and seem to fa...

Yeah I think it is worth trying

#

We were going to try it also

#

Relatively soon

stray reef Oct 20, 2025, 3:34 PM

#

awesome

#

i know nothing about QAT, is there something in some git repo that shows how it's done in bullet?

frosty imp Oct 20, 2025, 3:37 PM

#

merged speedup and net

lofty cedar Oct 20, 2025, 3:40 PM

#

Has anyone tried doubling the size of the later layers in threat net?

violet badger Oct 20, 2025, 3:41 PM

#

long ago (at least for SF) people tried these things, at that point it didn't help much?

#

might be things have changed, but I somehow doubt.

lofty cedar Oct 20, 2025, 3:42 PM

#

Yeah... but neural networks show that a more detailed input scheme often requires a larger net to interpret.

#

Also, the slowdown in later layers is not that significant anyway if it really helps.

violet badger Oct 20, 2025, 3:44 PM

#

pure guess here, but I think for later layers to be really useful, they might need to be significantly wider. Somehow neither 32 nor 64 can present enough features to the later layers to be able to reason much about the board...

#

They might help a little bit introducing non-linearity or so, but not 'reasoning' or 'tactics'

lofty cedar Oct 20, 2025, 3:45 PM

#

I see.

rocky vigil Oct 20, 2025, 4:23 PM

#

lofty cedar Where do I make a PR for threat inputs?

You PR’d to wrong branch

#

Put it against xu-shawn threat_inputs not master

lofty cedar Oct 20, 2025, 4:24 PM

#

Hmm? Correct branch? I did put it against threat_inputs?

#

Oops...

#

I tried to put it against threat_input...

#

But for some reason, it sent to master.

rocky vigil Oct 20, 2025, 5:40 PM

#

frosty imp merged speedup and net

what's avg. number of threat updates as compared to previous?

#

wanna see if 1280 might be better now

#

in terms of speed

formal smelt Oct 20, 2025, 7:51 PM

#

stray reef i know nothing about QAT, is there something in some git repo that shows how it'...

https://github.com/jw1912/bullet/commit/f270e3ea72b35d1b3dfaec90be2d964ca18543a8
its there for you now :))

tulip gust Oct 20, 2025, 8:10 PM

#

fucking hell jw you are completely unstoppable

rocky vigil Oct 20, 2025, 8:17 PM

#

@lofty cedar can you rebase https://tests.stockfishchess.org/tests/view/68f637eb637acd2a11e72348 ?

#

And are we merging the old version

rocky vigil Oct 20, 2025, 8:20 PM

#

stray reef with some improvements, i8 input weights are -14 fixed nodes now, and seem to fa...

Is it possible for you to try only quantizing threat weights to i8?

violet badger Oct 20, 2025, 8:30 PM

#

Is there anything speaking against l1=896 ?

frosty imp Oct 20, 2025, 8:30 PM

#

was suggesting that a while ago

violet badger Oct 20, 2025, 8:31 PM

#

I think I better start that as well.

#

we'll need to squeeze a few more Elo I'm afraid.

rocky vigil Oct 20, 2025, 8:35 PM

#

Ah yeah maybe we need finer increments to test

#

Was unsure if 7*128 ran into problems with avx registers but I guess not

violet badger Oct 20, 2025, 8:36 PM

#

for sure better than 8*127

#

but yeah, I think 512bits is the unit of concern.
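
A small worked check of the register-width concern, as a sketch. It assumes the L1 accumulator entries are 16-bit integers (the chat mentions i16 feature weights elsewhere, but the element width used here is an assumption) and simply asks whether each candidate L1 size fills a whole number of 512-bit registers.

```python
REGISTER_BITS = 512
ELEMENT_BITS = 16  # assumption: int16 accumulator entries

def registers_needed(l1):
    """Return (full registers, leftover elements) for an L1 of the given width."""
    per_register = REGISTER_BITS // ELEMENT_BITS
    return divmod(l1, per_register)

for l1 in (768, 896, 1024, 1280, 8 * 127):
    full, leftover = registers_needed(l1)
    print(l1, full, leftover)
```

With 32 elements per register, 896 = 7*128 splits into 28 full registers with nothing left over, while 8*127 = 1016 leaves 24 elements in a partial register.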

#

good old days of 512~~bytes~~ word vectors are gone 😉

#

https://en.wikipedia.org/wiki/NEC_SX#/media/File:NEC_SX-5_supercomputer.jpg

rocky vigil Oct 20, 2025, 8:39 PM

#

violet badger we'll need to squeeze a few more Elo I'm afraid.

Repeated speedups failing to get past error bars against master
Still kind of disheartening every time it happens though

violet badger Oct 20, 2025, 8:40 PM

#

they should be run until the end, the gains are small enough now that incomplete sprts are probably not very informative.

prime mica Oct 20, 2025, 10:31 PM

#

really dumb question, does it ever make sense to like, train 8 nets in parallel and then select the best one

#

or do they all end up having the same strength

#

(of course it's computationally annoying but just wondering the variance in training)

tulip gust Oct 20, 2025, 10:37 PM

#

prime mica really dumb question, does it ever make sense to like, train 8 nets in parallel ...

this is kinda what nets are doing internally already due to how subcircuits work

#

but you can also do model souping which is like a stronger version of this

prime mica Oct 20, 2025, 10:38 PM

#

O

#

to be clear I don't mean selecting the best one at runtime

tulip gust Oct 20, 2025, 10:38 PM

#

i know, yeah

frosty imp Oct 20, 2025, 10:38 PM

#

iirc linrock always did multiple runs

prime mica Oct 20, 2025, 10:38 PM

#

oh ok

#

idk what a subcircuit is

#

lol who is this legendary linrock and where did they go

#

did they move on from computer chess

frosty imp Oct 20, 2025, 10:39 PM

#

linrock trained SF's network for a long while

tulip gust Oct 20, 2025, 10:39 PM

#

prime mica idk what a subcircuit is

good read: https://distill.pub/2020/circuits/zoom-in/

frosty imp Oct 20, 2025, 10:40 PM

#

because he was always responsible for it, nobody cared to reproduce his training setup

#

so when he took an indefinite break progress literally stopped

prime mica Oct 20, 2025, 10:41 PM

#

😩

#

does he still respond to questions

rocky vigil Oct 20, 2025, 10:42 PM

#

He is still active somewhat

#

Approved one of the recent threat input tests

#

But overall I think he is mostly happy to move on now that vondele got sufficiently close with reproducing a pre-spsa net

rocky vigil Oct 20, 2025, 10:50 PM

#

rocky vigil @lofty cedar can you rebase <https://tests.stockfishchess.org/tests/vie...

I think you should rebase since Shawn already merged your passed version

frosty imp Oct 20, 2025, 10:50 PM

#

@lofty cedar

#

eh gonna set TP to 25

green moat Oct 20, 2025, 10:54 PM

#

prime mica does he still respond to questions

He gets tagged here on the SF Discord now and then, and he often responds

lofty cedar Oct 21, 2025, 12:12 AM

#

Oh... I see.

rocky vigil Oct 21, 2025, 1:44 AM

#

new progtest concluded with no difference to last one

#

it is missing the very slightly better net

#

btw @frosty imp if "speedups" reaches simp bounds threshold should I just stop there

rocky vigil Oct 21, 2025, 1:55 AM

#

twilit oriole https://tests.stockfishchess.org/tests/view/68f494ca637acd2a11e720d4 some of the...

as per suggestion

rocky vigil Oct 21, 2025, 1:56 AM

#

rocky vigil btw @frosty imp if "speedups" reaches simp bounds threshold should I j...

as per latest result it is actually 2.95 at simp bounds rn

frosty imp Oct 21, 2025, 1:59 AM

#

just pr now

#

idt that it even needs testing

rocky vigil Oct 21, 2025, 2:01 AM

#

alright well yeah it is simp bound passing so i feel slightly more at ease

#

cool

#

stopped

rocky vigil Oct 21, 2025, 2:03 AM

#

frosty imp just pr now

https://github.com/xu-shawn/Stockfish/pull/20

lofty cedar Oct 21, 2025, 2:50 AM

#

Oopsies. The new "speedup" doesn't interact well with the newer patches.

stray reef Oct 21, 2025, 4:32 AM

#

rocky vigil Is it possible for you to try only quantizing threat weights to i8?

ofc, tho I will try QAT first

violet badger Oct 21, 2025, 4:55 AM

#

prime mica really dumb question, does it ever make sense to like, train 8 nets in parallel ...

yes, but the variation between fully trained nets is small, 1-2 Elo.

lofty cedar Oct 21, 2025, 10:19 AM

#

Does weight permutation work on threat input?

stray reef Oct 21, 2025, 10:32 AM

#

Yes

violet badger Oct 21, 2025, 10:35 AM

#

and is being used AFAICT.

stray reef Oct 21, 2025, 2:04 PM

#

@formal smelt https://github.com/Yoshie2000/bullet/blob/plenty/examples/plenty/0126rrr4.rs#L870-L900
Does this look reasonable to you? L1 biases aren't quantised in-engine ofc, but I doubt that makes a huge difference. Loss looks reasonable (i'm fine-tuning an existing net with this new config, loss is about 5% higher than normal in the first SB of the fine-tune)

formal smelt Oct 21, 2025, 2:14 PM

#

stray reef @formal smelt <https://github.com/Yoshie2000/bullet/blob/plenty/examples...

you can .faux_quantise(value, true);

#

lgtm, i would probably just have the function "quantise" the weights only rather than also doing the affine op

#

```rust
fn quantise<'a>(mut layer: Affine<'a, CudaMarker>, value: f32) -> Affine<'a, CudaMarker> {
    layer.weights = layer.weights.faux_quantise(value, true);
    layer.bias = layer.bias.faux_quantise(value, true);
    layer
}
```
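
For readers who, like stray reef earlier, have not seen QAT before: a minimal sketch of what a fake ("faux") quantisation step does in the forward pass. This is plain Python, not bullet's API; the function name `fake_quantise`, the interpretation of the scale argument, and the clamp range are all assumptions made for illustration, and the gradient handling used during training is not shown.

```python
def fake_quantise(weights, scale, lo=-128, hi=127):
    """Round each weight to the nearest multiple of 1/scale, saturating to [lo, hi].

    The result stays in floating point, so the rest of the forward pass sees
    the same rounding error an integer engine would introduce.
    """
    out = []
    for w in weights:
        q = round(w * scale)       # integer the engine would store
        q = max(lo, min(hi, q))    # saturate to the target integer type
        out.append(q / scale)      # back to float for training
    return out

weights = [0.30, -0.99, 0.004, 2.5]
print(fake_quantise(weights, scale=64))
```

With a scale of 64, small weights such as 0.004 collapse to zero and out-of-range weights such as 2.5 saturate, which is the error the fine-tune gets a chance to adapt to.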

rocky vigil Oct 21, 2025, 2:21 PM

#

@stray reef how feasible would it be to try and separate the threat tracking to only be done when accumulator update is required?
(on the flip side, if we keep the current structure, is it a sane idea to attempt to prefetch the corresponding weights when the indices are computed?)

formal smelt Oct 21, 2025, 2:21 PM

#

stray reef @formal smelt <https://github.com/Yoshie2000/bullet/blob/plenty/examples...

actually i think you should also quantise after the pairwise?

stray reef Oct 21, 2025, 2:22 PM

#

formal smelt lgtm, i would probably just have the function "quantise" the weights only rather...

I see, thank you!

prime mica Oct 21, 2025, 2:22 PM

#

prefetch corresponding weights when the indices are computed
I tried this with the main net and it didn't help

#

U should still try it but just a data point

formal smelt Oct 21, 2025, 2:22 PM

#

formal smelt actually i think you should also quantise after the pairwise?

because you do the shift?

stray reef Oct 21, 2025, 2:22 PM

#

formal smelt actually i think you should also quantise after the pairwise?

I do (after concatting)

#

line 891

formal smelt Oct 21, 2025, 2:22 PM

#

oh yeah i'm blind

stray reef Oct 21, 2025, 2:23 PM

#

rocky vigil @stray reef how feasible would it be to try and separate the threat tr...

Definitely doable, only downside I see is you need sort of duplicate logic of handling moves, as you need to figure out from the move what threats to update in hindsight, and in SF it might be more difficult since it uses make-unmake, not sure

#

i'm not sure if this is worth anything

rocky vigil Oct 21, 2025, 2:24 PM

#

ah

#

so right now for L1=1024 threat tracking and indexing each take about 5% of the overall runtime

#

idk

formal smelt Oct 21, 2025, 2:25 PM

#

https://github.com/Yoshie2000/bullet/blob/plenty/examples/plenty/0126rrr4.rs#L851
@stray reef we have skipping at home :p

stray reef Oct 21, 2025, 2:25 PM

#

hehe yeah :P

#

still haven't gotten around to switching to binpacks

formal smelt Oct 21, 2025, 2:26 PM

#

how much did that gain?

stray reef Oct 21, 2025, 2:26 PM

#

3-4 SPRT elo

rocky vigil Oct 21, 2025, 9:31 PM

#

768 seems to have finished training

#

1024 to follow shortly

rocky vigil Oct 21, 2025, 9:38 PM

#

rocky vigil 768 seems to have finished training

Should I assume stage 5 is best net, or wait for local results

frosty imp Oct 22, 2025, 12:08 AM

#

just test stage 5 i guess

rocky vigil Oct 22, 2025, 12:45 AM

#

alright let's see if the threat-input-psq patch passes soon first

frosty imp Oct 22, 2025, 12:49 AM

#

Cursed kekgasm

#

I’d say that one needs LTC so it’s probably not getting in soon

rocky vigil Oct 22, 2025, 1:04 AM

#

eh fine

#

I'll just start STC + LTC then

#

for 768

lofty cedar Oct 22, 2025, 4:32 AM

#

What about a threat finny table? The idea is that when a piece moves to a square, instead of adding the threats of the entire board, we add the previous threat to that piece and the difference between the previous threat and the current threat.

frosty imp Oct 22, 2025, 4:34 AM

#

that's exactly what's happening now?

lofty cedar Oct 22, 2025, 4:35 AM

#

Really?

#

I mean... isn't the current approach that when a piece moves to a square, we add
the threat of that piece to/from every piece?

frosty imp Oct 22, 2025, 4:37 AM

#

nope

rocky vigil Oct 22, 2025, 4:37 AM

#

i think the biggest issue is fusing the add/sub like that massively inflates

#

there is a reason it is not done like that for standard psq either

frosty imp Oct 22, 2025, 4:37 AM

#

oh you mean the threats of that piece

#

how is that different from the first message

#

I see. wouldn't moving a piece then require you to update multiple finny entries

lofty cedar Oct 22, 2025, 4:39 AM

#

Uggh...

#

I see.

naive comet Oct 22, 2025, 8:47 AM

#

naive comet also aside from that patch, @stray reef do you have Finny tables for y...

@lofty cedar

naive comet Oct 22, 2025, 8:47 AM

#

stray reef no, the threat refreshes literally don't matter according to my profiling

^^

regal steeple Oct 22, 2025, 10:36 AM

#

@frosty imp In this patch https://tests.stockfishchess.org/tests/view/68f67ce0637acd2a11e723d9 maybe it's better to replace pawn_attacks_bb<BLACK>(s) with attacks_bb<PAWN>(s, BLACK) instead of pawn_attacks_bb<BLACK>(square_bb(s)). attacks_bb<PAWN>(s, BLACK) already uses a table, so it's basically equivalent to the previous version without having to create a new table (using a table saves a few bit shifts; not sure if that's significant, but the test seems to struggle a little).

naive comet Oct 22, 2025, 10:39 AM

#

^^^^

regal steeple Oct 22, 2025, 10:41 AM

#

Also did anyone measure whether this patch https://github.com/xu-shawn/Stockfish/commit/d9cbd59e29ea1cf9b04000c43cd971b275509dd7 is a slowdown? From my measurements it looks like a slowdown. I profiled it and the issue seems to be that the new pawn table gets calculated on every function call for some reason, so maybe it's better to revert that one

naive comet Oct 22, 2025, 12:02 PM

#

honestly the pawn_attacks_bb thing is just unnecessary

violet badger Oct 22, 2025, 12:14 PM

#

rocky vigil Should I assume stage 5 is best net, or wait for local results

for 768, step 5 is indeed the best, differences are not so large ( -63.62 +/- 1.82, -28.61 +/- 1.82, -27.43 +/- 1.82, -29.80 +/- 1.81, -25.85 +/- 1.81).

#

Meanwhile 1024 factorized is also ready (-58.02 +/- 1.82, -22.61 +/- 1.82, -20.32 +/- 1.82, -18.55 +/- 1.84, -18.67 +/- 1.82), also here step5 seems just fine. Maybe somebody can also kickoff a test, maybe against the 768 net? https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/11805606614/artifacts/browse/step_7ecb955f82bf/

#

I assume tomorrow evening we will have 1280, and probably the day after 896

stray reef Oct 22, 2025, 12:19 PM

#

Elo   | 9.47 +- 2.93 (95%)
Conf  | N=20000 Threads=1 Hash=16MB
Games | N: 20634 W: 6151 L: 5589 D: 8894
Penta | [338, 2253, 4631, 2699, 396]

https://furybench.com/test/3472/
fine-tuning with QAT vs. fine-tuning without QAT (for i8 feature weights)

#

maybe around -10 fixed nodes to master

naive comet Oct 22, 2025, 12:19 PM

#

pogey

desert tree Oct 22, 2025, 12:19 PM

#

QAT not copium for this then? nice

violet badger Oct 22, 2025, 12:20 PM

#

sounds like the right thing to do ...

stray reef Oct 22, 2025, 12:20 PM

#

there is still some packus accuracy to gain in this impl, maybe 3-4 elo from that. testing STC vs main now

formal smelt Oct 22, 2025, 12:21 PM

#

stray reef ``` Elo | 9.47 +- 2.93 (95%) Conf | N=20000 Threads=1 Hash=16MB Games | N: 20...

that is a great result tho

stray reef Oct 22, 2025, 12:54 PM

#

seems roughly neutral to master at STC. I'll run another fine-tune with more fine-grained quantisation, inference is a bit slower there but still faster than i16 feature weights

#

I suppose a positive STC+LTC result with -5 elo at fixed nodes is mergable? if monty did it too

formal smelt Oct 22, 2025, 12:56 PM

#

we also did an SMP test

stray reef Oct 22, 2025, 12:56 PM

#

how did that compare to STC?

formal smelt Oct 22, 2025, 12:56 PM

#

+11 rather than +16 or something

#

+- error

rocky vigil Oct 22, 2025, 6:06 PM

#

violet badger for 768, step 5 is indeed the best, differences are not so large ( -63.62 +/- 1....

Fishtest strangely doesn’t reflect the 5 elo loss against master in h2h against current 1024 net
I guess we’ll see in a few days

violet badger Oct 22, 2025, 6:28 PM

#

I guess careful testing is needed: within these sequences we know the inference is always the same, but between runs the sha of the testing binary might not be the same. I think 768 and 1024 are essentially equivalent.

#

But that needs a test on fishtest with care on picking the right version of SF.

#

(or some analysis of the sha of the SF used for playing)

regal steeple Oct 23, 2025, 8:13 AM

#

regal steeple Also did anyone measure whether this patch <https://github.com/xu-shawn/Stockfis...

I started a test for this https://tests.stockfishchess.org/tests/view/68f9e323637acd2a11e7299a , it clashes with this https://tests.stockfishchess.org/tests/view/68f67ce0637acd2a11e723d9 @frosty imp patch but the shawn patch is before the cleanup commit so the base doesnt have the suspected slowdown, I hope thats fine

lofty cedar Oct 23, 2025, 11:51 AM

#

I suspect we're at about -5 elo.

#

Which is about the pre-tune net.

prime mica Oct 23, 2025, 11:51 AM

#

💪

#

so close

regal steeple Oct 23, 2025, 3:32 PM

#

regal steeple Also did anyone measure whether this patch <https://github.com/xu-shawn/Stockfis...

Can someone measure this? I can measure a significant slowdown but the fishtest test doesn't look too promising so far

torn lagoon Oct 23, 2025, 3:33 PM

#

Vs master?

regal steeple Oct 23, 2025, 3:35 PM

#

This commit against the one prior to that one

#

https://github.com/xu-shawn/Stockfish/commits/threat_inputs/
so d9cbd59e29ea1cf9b04000c43cd971b275509dd7 vs d71b0865693593f5e9341bede4750a4cc4896ee5

torn lagoon Oct 23, 2025, 3:43 PM

#

```
Compiled by                : g++ (GNUC) 15.2.0 on Linux
Compilation architecture   : x86-64-avx512icl
Compilation settings       : 64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.2.0
Large pages                : yes
User invocation            : speedtest
Filled invocation          : speedtest 12 1536 150
Available processors       : 0-11
Thread count               : 12
Thread binding             : none
TT size [MiB]              : 1536
Hash max, avg [per mille]  :
    single search          : 56, 30
    single game            : 798, 566
Total nodes searched       : 2257313338
Total search time [s]      : 153.514
Nodes/second               : 14704283
```

#

```
Compiled by                : g++ (GNUC) 15.2.0 on Linux
Compilation architecture   : x86-64-avx512icl
Compilation settings       : 64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.2.0
Large pages                : yes
User invocation            : speedtest
Filled invocation          : speedtest 12 1536 150
Available processors       : 0-11
Thread count               : 12
Thread binding             : none
TT size [MiB]              : 1536
Hash max, avg [per mille]  :
    single search          : 53, 30
    single game            : 794, 563
Total nodes searched       : 2224392957
Total search time [s]      : 153.514
Nodes/second               : 14489837
```

#

Looks like a slowdown on zen 5

regal steeple Oct 23, 2025, 3:46 PM

#

Hm thank you that looks like about the same speed difference I get

#

Maybe im worrying too much here but according to the speedup Elo estimate formula a ~1.5% speedup should be ~3.15 Elo which is (barely) out of error at the moment, but lets see how the test goes
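
The ~1.5% figure can be checked against the two speedtest logs posted above. This sketch only computes the relative speed gap from the two Nodes/second values; the chat does not label which log belongs to which commit, and the speedup-to-Elo formula regal steeple refers to is not reproduced here.

```python
# Nodes/second from the two speedtest runs above (same machine, same settings).
nps_first = 14_704_283
nps_second = 14_489_837

def relative_speedup(fast, slow):
    """Fractional speed advantage of `fast` over `slow`."""
    return fast / slow - 1.0

gap = relative_speedup(nps_first, nps_second)
print(f"{gap * 100:.2f}% faster")
```

The gap comes out just under 1.5%, in line with the number used in the message above.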

torn lagoon Oct 23, 2025, 3:50 PM

#

I can test on a pre-avx Intel if it helps

#

To check for variance

regal steeple Oct 23, 2025, 3:51 PM

#

I think that one test is fine, I just wanted to rule out that this is some weird case where the slowdown only exists for me

rocky vigil Oct 23, 2025, 5:03 PM

#

ahhhh

#

why does pt never show progress

prime mica Oct 23, 2025, 5:04 PM

#

😭

#

nontransitivity

violet badger Oct 23, 2025, 6:23 PM

#

speedups ought to be transitive in any case.

#

anyway, tonight probably we have step 5 of the 1280 net, and maybe tomorrow the 896 net...

rocky vigil Oct 23, 2025, 6:25 PM

#

🙏

violet badger Oct 23, 2025, 6:25 PM

#

doesn't feel like there will be major breakthroughs though..

rocky vigil Oct 23, 2025, 6:25 PM

#

from local results factorized 1024 maybe another 2 or 3 elo

#

i just started stc on fishtest a bit ago

violet badger Oct 23, 2025, 6:26 PM

#

yes, seen that.

stray reef Oct 23, 2025, 6:59 PM

#

I tried another way of doing i8 quantisations, this time it was worse (last time STC vs main was neutral)
I think with larger L1s you will both see more benefit from it, as well as see less elo loss from stricter quantisations, it's 100% worth trying in SF

Maybe it's even worth for me to try passing L1=768 as i8 directly, to mitigate the fixed node loss, since it's pretty clear it won't pass my current LTC on its own

rocky vigil Oct 23, 2025, 7:03 PM

#

What was the different way

#

I also think threat only i8 is better

#

Since then you maybe don’t need to quantize so aggressively

stray reef Oct 23, 2025, 7:05 PM

#

rocky vigil What was the different way

less aggressive quantisations at the cost of tiny bit slower inference

#

ended up not mattering at fixed nodes which way I did it, but the tiny slowdown mattered at STC

rocky vigil Oct 23, 2025, 7:05 PM

#

Interesting

#

Have you gotten threat only i8 to work

stray reef Oct 23, 2025, 7:06 PM

#

(Both with QAT of course)

stray reef Oct 23, 2025, 7:06 PM

#

rocky vigil Have you gotten threat only i8 to work

Can try that tomorrow

rocky vigil Oct 23, 2025, 7:06 PM

#

Bc I feel like the factorizer screws with psq extra hard

#

Unless you have a way to bypass

stray reef Oct 23, 2025, 7:07 PM

#

i know nothing about how psq works in the sf arch

rocky vigil Oct 23, 2025, 7:07 PM

#

I meant like

#

Quantize the threat feature weights to i8

#

And keep the psq features as i16

stray reef Oct 23, 2025, 7:08 PM

#

ohh not that psq i see

rocky vigil Oct 23, 2025, 7:08 PM

#

💀

stray reef Oct 23, 2025, 7:09 PM

#

bullet now has this way to clamp not only factoriser weights and psq weights individually, but also in combination, which is ofc extremely useful for this. but it requires a full training run, not just a fine-tune for me

#

will do that eventually after my new gpu is set up

stray reef Oct 23, 2025, 7:12 PM

#

rocky vigil Have you gotten threat only i8 to work

the only downside to this is that it means you gotta quantise psq and threat weights differently, and have to scale them back somehow before adding and clamping them

#

which will make inference a bit slower

#

if they're different by a factor of 2 it's easy but it's essentially the same slowdown as in my second i8 test (~3 STC elo)
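
To illustrate why a factor of 2 between the two quantisations is the easy case, here is a small sketch of accumulating i16 psq weights and i8 threat weights into one sum. The scales 128 and 64 and all function names are hypothetical choices for the example, not values confirmed for either engine.

```python
PSQ_SCALE = 128     # hypothetical i16 quantisation scale for psq weights
THREAT_SCALE = 64   # hypothetical i8 scale, exactly half of PSQ_SCALE

def quantise(w, scale, lo, hi):
    """Round a float weight to an integer at the given scale, saturating to [lo, hi]."""
    return max(lo, min(hi, round(w * scale)))

def accumulate(psq_weights, threat_weights):
    """Sum the active feature weights in PSQ_SCALE units."""
    acc = 0
    for w in psq_weights:
        acc += quantise(w, PSQ_SCALE, -32768, 32767)
    for w in threat_weights:
        # One left shift converts THREAT_SCALE units into PSQ_SCALE units.
        acc += quantise(w, THREAT_SCALE, -128, 127) << 1
    return acc

acc = accumulate([0.5, -0.25], [0.125, 0.75])
print(acc, acc / PSQ_SCALE)
```

The extra shift is cheap but not free, which matches the point that it costs a little inference speed; a ratio that is not a power of two would need a multiply instead.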

rocky vigil Oct 23, 2025, 7:23 PM

#

stray reef the only downside to this is that it means you gotta quantise psq and threat wei...

Huh can you not just like clamp threat weights to 128

#

Most of them are small in absolute value anyways

stray reef Oct 23, 2025, 7:23 PM

#

dunno, if that's true then that'd work

rocky vigil Oct 23, 2025, 7:24 PM

#

Like in our nets the natural frequency of weights exceeding limit is close to 1%

#

But the x2 trick for mulhi is the real issue

rocky vigil Oct 23, 2025, 7:29 PM

#

stray reef I tried another way of doing i8 quantisations, this time it was worse (last time...

Actually how do you deal with mulhi

stray reef Oct 23, 2025, 7:37 PM

#

rocky vigil Actually how do you deal with mulhi

i'd say i do the standard stuff, what's the x2 thing in SF?

rocky vigil Oct 23, 2025, 7:39 PM

#

Uh x2 to use some mulhi trick

#

Because mulhi preserves sign

rocky vigil Oct 23, 2025, 7:41 PM

#

stray reef i'd say i do the standard stuff, what's the x2 thing in SF?

Wait you do the mulhi trick, it’s right in https://github.com/Yoshie2000/PlentyChess/blob/main/src/nnue.cpp#L356

#

Maybe quantization issue

#

In sf need to internally store 2x the weights

#

Otherwise one of the shifts will overflow

stray reef Oct 23, 2025, 8:34 PM

#

rocky vigil Otherwise one of the shifts will overflow

You can also get around the shift overflow by shifting both values, as in https://github.com/Yoshie2000/PlentyChess/blob/i8weights-3/src/nnue.cpp#L356

#

but for my master net it's currently not an issue, one shift works fine
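
The overflow being discussed can be reproduced with a scalar model of a 16-bit mulhi. This is a sketch under stated assumptions, not either engine's code: activations are taken to be clipped to [0, 127], the wanted result is the product divided by 128, and `mulhi_i16` models one signed 16-bit lane returning the high half of the 32-bit product.

```python
def wrap_i16(x):
    """Keep the low 16 bits of x and interpret them as a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def mulhi_i16(a, b):
    """High 16 bits of the signed 32-bit product of two 16-bit lanes."""
    return (wrap_i16(a) * wrap_i16(b)) >> 16

a = b = 127  # largest clipped activation in this sketch
target = (a * b) >> 7  # product divided by 128

single = mulhi_i16(a << 9, b)             # 127 << 9 no longer fits in int16
doubled = mulhi_i16((2 * a) << 7, 2 * b)  # values stored at 2x, one shift of 7
split = mulhi_i16(a << 5, b << 4)         # total shift of 9 spread over both operands
print(target, single, doubled, split)
```

Putting the whole shift of 9 on one operand wraps around and gives a wrong result, while both storing the values at 2x and splitting the shift across the two operands reach the target, which corresponds to the two workarounds mentioned in the chat.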

rocky vigil Oct 23, 2025, 8:34 PM

#

huh

stray reef Oct 23, 2025, 8:35 PM

#

So if this does end up being a problem for SF it's easy to solve

lofty cedar Oct 24, 2025, 3:37 AM

#

Anyone tried the new muon optimizer?

frosty imp Oct 24, 2025, 4:22 AM

#

out of scope for this post

rocky vigil Oct 24, 2025, 5:22 AM

#

L1=1280 looking like it'll be -10 stc again compared to factorized L1=1024

#

i guess can try speculative ltc soon

rocky vigil Oct 24, 2025, 5:23 AM

#

rocky vigil i guess can try speculative ltc soon

actually i think maybe a speculative stc smp is better

#

to reduce issues around copying 260 MB of net per concurrency

violet badger Oct 24, 2025, 5:59 AM

#

if anything I agree smp might be more interesting. However, it looks quite a bit weaker indeed. The 896 net should also be fully trained later today.

prime mica Oct 24, 2025, 9:34 AM

#

(sry I read smth wrong)

#

I thought it was the other way around

regal steeple Oct 24, 2025, 9:34 AM

#

It's not really a fail, the name was just chosen poorly

prime mica Oct 24, 2025, 9:34 AM

#

that it was actually a good change

#

okok

#

sry for ping

lofty cedar Oct 24, 2025, 10:07 AM

#

frosty imp out of scope for this post

I meant... Muon optimizer for threat input...

#

But maybe let's make another thread instead?

#

Factorized stage 5 STC passed!

@rocky vigil

#

It would probably pass LTC... but would it need testing anyway?

#

Okay... I'll LTC.

violet badger Oct 24, 2025, 10:30 AM

#

no need for LTC... we're developing a branch. It is obviously stronger than the existing STC, only when changing scales is that useful.

#

(I've stopped it).

#

In other news 896 finished.. that's probably more interesting, but I guess it will be weaker than 1024. Will get back with some more data later today I think.

rocky vigil Oct 24, 2025, 10:36 AM

#

I’ll pr to Shawn’s branch soon

#

The play I think now is to figure out how “cleanup” should be reverted

#

Because it looks like it should be reverted somehow

lofty cedar Oct 24, 2025, 10:39 AM

#

Just use the attack_bb<pawn> thing.

#

I guess it means that the attack_bb<pawn> is faster.

rocky vigil Oct 24, 2025, 11:43 AM

#

@frosty imp pr made

rocky vigil Oct 24, 2025, 12:55 PM

#

violet badger In other news 896 finished.. that's probably more interesting, but I guess it wi...

somehow the testing shows it as being worse than both 768 and 1024

violet badger Oct 24, 2025, 1:05 PM

#

yes, that's what I see as well, later today I'll come up with a graph. Want to find some time to do fixed nodes test as well. I wonder if somebody could measure once nps for the 4 sizes we have now (with consistent versions of the code, just net size changes).

violet badger Oct 24, 2025, 4:00 PM

#

so, collected the data now..

#

so, at fixed nodes outperforming master, at tc, underperforming.

#

I found the dip in performance for 896 and 1280 interesting, as if these versions are for whatever reason slower than 768 and 1024 (like performance goes up smoothly at fixed nodes)

#

raw data

$ cat ..

   # PLAYER                       :  RATING  ERROR    POINTS  PLAYED   (%)
   1 1280-nn-71f4e3cc3782.nnue    :    38.8    1.9   40878.0   73728    55
   2 1024-nn-26b0e5126117.nnue    :    25.3    1.8   39486.0   73728    54
   3 0896-nn-7347b2877a12.nnue    :    20.2    1.9   38958.0   73728    53
   4 0768-nn-914a5c3a46dc.nnue    :    11.5    1.9   38056.0   73728    52
   5 master                       :     0.0   ----  137534.0  294912    47

White advantage = 40.31 +/- 0.46
Draw rate (equal opponents) = 45.67 % +/- 0.09


   # PLAYER                       :  RATING  ERROR    POINTS  PLAYED   (%)
   1 master                       :     0.0   ----  155112.0  294912    53
   2 1024-nn-26b0e5126117.nnue    :   -15.1    1.8   35304.0   73728    48
   3 0768-nn-914a5c3a46dc.nnue    :   -16.0    1.8   35207.0   73728    48
   4 0896-nn-7347b2877a12.nnue    :   -19.8    1.8   34815.5   73728    47
   5 1280-nn-71f4e3cc3782.nnue    :   -23.1    1.8   34473.5   73728    47

White advantage = 41.79 +/- 0.45

rocky vigil Oct 24, 2025, 6:29 PM

#

Are we also merging cleanup revert

twilit oriole Oct 24, 2025, 6:35 PM

#

Can code changes be accompanied with a test always. Nobody is skilled enough to do a cleanup in a hot path and know with 100% certainty it has no slowdown

#

Compiler behaviour can be unpredictable

rocky vigil Oct 24, 2025, 6:38 PM

#

rocky vigil Are we also merging cleanup revert

After we do this I’ll spec stc smp the 1280

regal steeple Oct 24, 2025, 6:40 PM

#

I submitted a pr for this https://github.com/xu-shawn/Stockfish/pull/23, I think the rest of the cleanup commit is just formatting changes so I dont think we need to revert the whole commit

#

I can also run a non regr against the pre cleanup version if thats needed

rocky vigil Oct 24, 2025, 6:41 PM

#

Anyways this + net should take us to -5 stc

#

Or smth

twilit oriole Oct 24, 2025, 6:42 PM

#

Is there a 768 factorised net

rocky vigil Oct 24, 2025, 6:42 PM

#

twilit oriole Is there a 768 factorised net

Most likely worse than 1024 factorized

violet badger Oct 24, 2025, 6:42 PM

#

like how hard is to read that table #1336647760388034610 message

twilit oriole Oct 24, 2025, 6:43 PM

#

On mobile it is pretty hard :p

violet badger Oct 24, 2025, 6:43 PM

#

nokia hitting back hard 😉

rocky vigil Oct 24, 2025, 6:43 PM

#

This data seems to suggest “magic numbers” like 768, 1024, 1536 might be optimal

#

For whatever reason

violet badger Oct 24, 2025, 6:44 PM

#

well, I'm suggesting it must be speed related.

#

(given performance at fixed nodes)

#

if asked to explain I'll mumble cache associativity effects ...

#

but I really have no idea what's causing this

lofty cedar Oct 24, 2025, 6:46 PM

#

Why?
Cache lines are only 64-bytes long so at most 64 elements.

twilit oriole Oct 24, 2025, 6:46 PM

#

This data suggests factorisation benefitted the 1024 more than the 768. Might give some signal that convergence time is still a factor

daring wren Oct 24, 2025, 6:46 PM

#

rocky vigil This data seems to suggest “magic numbers” like 768, 1024, 1536 might be optimal

how many HL values fit in a register

#

tbh it's probably not related to that

lofty cedar Oct 24, 2025, 6:47 PM

#

There are normally 8 or 16 AVX registers I guess?

#

Each 64 bytes.

twilit oriole Oct 24, 2025, 6:47 PM

#

It is more related to the fetching of the weights itself I assume

violet badger Oct 24, 2025, 6:47 PM

#

lofty cedar Why? Cache lines are only 64-bytes long so at most 64 elements.

https://en.wikipedia.org/wiki/Cache_placement_policies

lofty cedar Oct 24, 2025, 6:48 PM

#

violet badger https://en.wikipedia.org/wiki/Cache_placement_policies

Yeah... though I thought that in modern caches this was an antique concern.

#

As in, in practice, shouldn't matter.

twilit oriole Oct 24, 2025, 6:49 PM

#

Huh

lofty cedar Oct 24, 2025, 6:49 PM

#

I mean... back in the day of direct mapped cache, it mattered a lot.

#

Nowadays, lots of people just assumed that approximately the last N lines accessed are in the cache.

lofty cedar Oct 24, 2025, 6:51 PM

#

violet badger https://en.wikipedia.org/wiki/Cache_placement_policies

But why would this matter? 1024 elements are only like 2kb and it's contiguous so even a direct-mapped cache could do.

violet badger Oct 24, 2025, 6:52 PM

#

happily exchanging this idea for the one shown to explain the effect on performance we measured 😉

lofty cedar Oct 24, 2025, 6:53 PM

#

Well, beyond 1024 elements, we run out of registers.

#

There are 32 registers in AVX512.

#

So, 2048 bytes or 1024 elements.

#

Though the trailing parts might be lagging.
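
The register arithmetic above can be checked directly. This is a small sketch, assuming one accumulator of `L1` int16 values (2 bytes per element, as in the "1024 elements are only like 2kb" remark) and 32 registers of 64 bytes each; the helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kZmmBytes = 64;  // one 512-bit register
constexpr std::size_t kZmmCount = 32;  // zmm0..zmm31 with AVX-512

// Bytes taken by one accumulator of l1 int16 values.
constexpr std::size_t accumulator_bytes(std::size_t l1) {
    return l1 * sizeof(std::int16_t);
}

// Number of 512-bit registers needed to hold the whole accumulator.
constexpr std::size_t registers_needed(std::size_t l1) {
    return (accumulator_bytes(l1) + kZmmBytes - 1) / kZmmBytes;
}

// True if the accumulator fits in the register file with nothing to spare
// for weights or temporaries (an optimistic upper bound).
constexpr bool fits_in_register_file(std::size_t l1) {
    return registers_needed(l1) <= kZmmCount;
}
```

For the four sizes tested this gives 24, 28, 32 and 40 registers for 768, 896, 1024 and 1280. By this count 896 fits comfortably, so the register file alone would not separate it from 768 and 1024.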

violet badger Oct 24, 2025, 6:54 PM

#

wait, we're looking for a reason why 896 is worse than 768 and 1024

#

(also 1280, but well)

lofty cedar Oct 24, 2025, 6:54 PM

#

Oh... well... that gets even more confusing.

#

Has anyone inspected the assembly?

rocky vigil Oct 24, 2025, 6:55 PM

#

I somehow suspect 896 is performing at same speed as 1024

twilit oriole Oct 24, 2025, 6:55 PM

#

I don't think 1280 is underperforming where it should be?

violet badger Oct 24, 2025, 6:55 PM

#

vs master at tc testing, it is the worst?

rocky vigil Oct 24, 2025, 6:55 PM

#

Like the fixed nodes looks fine

twilit oriole Oct 24, 2025, 6:55 PM

#

Yes that is what I expect

violet badger Oct 24, 2025, 6:56 PM

#

anyway, at TC testing the performance curve is not smooth, and that would need explanation, IMO

#

freelo

rocky vigil Oct 24, 2025, 6:56 PM

#

I suspect on these 72 core machines the net size is more harmful than at fishtest

#

And that would also explain 1280

violet badger Oct 24, 2025, 6:56 PM

#

possibly.

lofty cedar Oct 24, 2025, 6:56 PM

#

We really need deduplicate net!

violet badger Oct 24, 2025, 6:56 PM

#

but again, hard to explain the zigzag performance at tc testing

rocky vigil Oct 24, 2025, 6:57 PM

#

Or at least this hardware consistently gives results around -10 to fishtest

lofty cedar Oct 24, 2025, 6:57 PM

#

So we could finally test free from bias.

rocky vigil Oct 24, 2025, 6:57 PM

#

lofty cedar We really need deduplicate net!

smp is easier

#

In ~30 min I can set up stc smp for 1280

green moat Oct 24, 2025, 6:57 PM

#

twilit oriole Can code changes be accompanied with a test always. Nobody is skilled enough to ...

Also, as a layman, I find it very difficult to follow threat-inputs development inside Shawn Xu branch....
😕

violet badger Oct 24, 2025, 6:57 PM

#

I'll have smp results soon, but they won't get us further before the other PR is fixed

rocky vigil Oct 24, 2025, 6:58 PM

#

If Shawn merges the pr in the middle

rocky vigil Oct 24, 2025, 7:11 PM

#

rocky vigil If Shawn merges the pr in the middle

Can also do new stc vs master after this which I expect to be -5 or -6

#

maybe this time also stc smp

twilit oriole Oct 24, 2025, 7:12 PM

#

Why not do LTC Vs pre SPSA net and get a green finally

#

May be good to put things back into perspective

rocky vigil Oct 24, 2025, 7:13 PM

#

twilit oriole Why not do LTC Vs pre SPSA net and get a green finally

This also an option lol

#

What was the best pre-spsa net

twilit oriole Oct 24, 2025, 7:14 PM

#

I mean take the one vondele trained recently

#

That was -5.4 Elo without SPSA to master

rocky vigil Oct 24, 2025, 7:16 PM

#

twilit oriole Why not do LTC Vs pre SPSA net and get a green finally

Fixed games or real sprt

twilit oriole Oct 24, 2025, 7:17 PM

#

Sprt ig

violet badger Oct 24, 2025, 7:17 PM

#

I think that's not particularly useful

#

just add 5 Elo to that result.

twilit oriole Oct 24, 2025, 7:20 PM

#

Well it allows a sprt to be performed which gives a higher guarantee of pre SPSA superiority. Also I think doubling the training time of all the stages is still something to attempt

violet badger Oct 24, 2025, 7:20 PM

#

yeah, I think training for a bit longer is something that needs to be done.

#

but I suspect the gain is going to be small to be honest.

twilit oriole Oct 24, 2025, 7:26 PM

#

Yes. Maybe 1 Elo lol

twilit oriole Oct 24, 2025, 7:28 PM

#

rocky vigil Or at least this hardware consistently gives results around -10 to fishtest

We observed this in plenty testing. Big machines had to stay off the threat inputs tests otherwise they ruined the results. It does not seem to matter about mmap, it is about SMP

#

For whatever reason the big machines did not perform the same as other machines on STC or LTC tests (1 thread)

violet badger Oct 24, 2025, 7:29 PM

#

so this is not understood?

twilit oriole Oct 24, 2025, 7:29 PM

#

Yeah

violet badger Oct 24, 2025, 7:29 PM

#

(as in mmap not solving this)..

#

instruction cache?

prime mica Oct 24, 2025, 7:31 PM

#

shared memory is implemented in plentychess?

twilit oriole Oct 24, 2025, 7:31 PM

#

Well maybe it's something to do with only 32MB being real L3 cache. Just a speculation

#

Like the rest has to go through the infinity fabric

violet badger Oct 24, 2025, 7:31 PM

#

one big shared L3 on the 72 core testing..

#

(afaik)

green moat Oct 24, 2025, 7:35 PM

#

twilit oriole Well it allows a sprt to be performed which gives a higher guarantee of pre SPSA...

6 days for a net
😉

prime mica Oct 24, 2025, 7:35 PM

#

good things come to those who wait

violet badger Oct 24, 2025, 7:36 PM

#

korean saying

violet badger Oct 24, 2025, 7:59 PM

#

https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/pipelines/2119342422

rocky vigil Oct 24, 2025, 8:00 PM

#

@frosty imp can we merge again

frosty imp Oct 24, 2025, 8:03 PM

#

merged

rocky vigil Oct 24, 2025, 8:05 PM

#

cool

#

lemme set up some progtests ig

#

and the 1280

green moat Oct 24, 2025, 8:36 PM

#

violet badger <https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/29268...

1600 epochs
😮

#

(sorry for the ping 😭 )

regal steeple Oct 24, 2025, 9:16 PM

#

@rocky vigil Your progtest still uses the pre merge version, is that intended?

#

before both merges, so same version as this test https://tests.stockfishchess.org/tests/view/68fa1682637acd2a11e729fd

rocky vigil Oct 24, 2025, 9:22 PM

#

regal steeple @rocky vigil Your progtest still uses the pre merge version, is that in...

Shoot

#

I might’ve forgotten to push origin

prime mica Oct 24, 2025, 9:22 PM

#

oof

rocky vigil Oct 24, 2025, 9:23 PM

#

Uh stop the stc

#

Leave the stc smp

#

I cannot fix it right now

#

Not at pc

#

If you want you can submit a progtest ig

#

Otherwise it’ll be ~2 hours

regal steeple Oct 24, 2025, 9:24 PM

#

Im not an approver, I cant stop the STC either

regal steeple Oct 24, 2025, 9:30 PM

#

rocky vigil If you want you can submit a progtest ig

I submitted a new test

twilit oriole Oct 25, 2025, 2:08 AM

#

https://tests.stockfishchess.org/tests/view/68fbefeb637acd2a11e72d2a @rocky vigil why not run the corrected STC SMP to match this

rocky vigil Oct 25, 2025, 2:10 AM

#

twilit oriole https://tests.stockfishchess.org/tests/view/68fbefeb637acd2a11e72d2a <@693549181...

you can approve https://tests.stockfishchess.org/tests/view/68fc3184637acd2a11e72d4e lol

#

tbf the error bars mean it's not gonna say much

lofty cedar Oct 25, 2025, 5:14 AM

#

https://tests.stockfishchess.org/tests/view/68fbe29e637acd2a11e72d1d

#

Can anyone run speedtest?

#

Locally, I found some improvement.

#

But not sure how it works on other machines.

prime mica Oct 25, 2025, 5:34 AM

#

sure!

#

same comparison as in the fishtest?

violet badger Oct 25, 2025, 5:54 AM

#

at 72 threads:

#

   # PLAYER                       :  RATING  ERROR   POINTS  PLAYED   (%)
   1 1280-nn-71f4e3cc3782.nnue    :     0.1    2.7  15365.0   30720    50
   2 master                       :     0.0   ----  62236.0  122880    51
   3 1024-nn-26b0e5126117.nnue    :    -2.9    2.9  15236.0   30720    50
   4 0896-nn-7347b2877a12.nnue    :    -7.6    2.9  15032.5   30720    49
   5 0768-nn-914a5c3a46dc.nnue    :    -8.1    2.7  15010.5   30720    49

lofty cedar Oct 25, 2025, 6:04 AM

#

prime mica same comparison as in the fishtest?

Yeah...

lofty cedar Oct 25, 2025, 6:04 AM

#

violet badger at 72 threads:

Should we go with 1280 then?

#

Maybe we'd need VVLTC to get it passed on fishtest.

lofty cedar Oct 25, 2025, 6:08 AM

#

prime mica same comparison as in the fishtest?

How does it go?

prime mica Oct 25, 2025, 6:10 AM

#

setting it up rn

#

~~trying to remember how to use git~~

#

I'm getting different benches between threat_inputs and your branch

#

what am I doing wrong

#

oh oops I'm using the wrong one

#

lololl

lofty cedar Oct 25, 2025, 6:16 AM

#

This one was without the two last PRs.

#

Threat input moves quickly.

prime mica Oct 25, 2025, 6:16 AM

#

I feel threatened

#

ok let's see, checking out the older commit

violet badger Oct 25, 2025, 6:18 AM

#

lofty cedar Should we go with 1280 then?

not really.

prime mica Oct 25, 2025, 6:18 AM

#

ok bench is the same now

lofty cedar Oct 25, 2025, 6:19 AM

#

violet badger not really.

I thought we were optimizing for the strongest one at TCEC condition.

#

But maybe 1024 could make tunes easier to test?

#

IDK.

prime mica Oct 25, 2025, 6:22 AM

#

Result of 100 runs

base (...ish_0553b61e) = 1388113 +/- 1838
test (./stockfish _af2e862 ) = 1440920 +/- 1520
diff = +52807 +/- 2231

speedup = +0.0380
P(speedup > 0) = 1.0000

#

hopefully I did that right

#

keep in mind my computer behaves way off the mean worker on fishtest so idk

lofty cedar Oct 25, 2025, 6:23 AM

#

Ooh... quite a nice speedup. Almost 4%!

#

Mine was somewhere around 2% I think.
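
The "almost 4%" figure follows from the numbers posted above. This is a minimal sketch, assuming the reported speedup is simply `diff / base` on the mean nodes-per-second values; that assumption reproduces the printed 0.0380, but how the tool derives the error on the difference is not shown here and is not modelled.

```cpp
#include <cassert>

// Relative speedup of `test` over `base`, both given as mean nps.
inline double relative_speedup(double base_mean, double test_mean) {
    assert(base_mean > 0.0);
    return (test_mean - base_mean) / base_mean;
}
```

With `base_mean = 1388113` and `test_mean = 1440920` the difference is 52807, and 52807 / 1388113 is roughly 0.038.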

prime mica Oct 25, 2025, 6:23 AM

#

huzzah

#

what is the change?

lofty cedar Oct 25, 2025, 6:23 AM

#

Well, you monomorphize cases with small loop.

prime mica Oct 25, 2025, 6:23 AM

#

oh that is smort

#

what's the current threat_inputs branch? Still shawn's?

lofty cedar Oct 25, 2025, 6:24 AM

#

Yes.

prime mica Oct 25, 2025, 6:24 AM

#

ok now that I have it set up I'll take a look

rocky vigil Oct 25, 2025, 6:25 AM

#

it's always shawns unless said otherwise lol

prime mica Oct 25, 2025, 6:25 AM

#

lol

#

benevolent dictator

violet badger Oct 25, 2025, 6:28 AM

#

lofty cedar I thought we were optimizing for the strongest one at TCEC condition.

no that's wrong... contrary to most other engines, where this could be a goal, we actually have a few million users that run on normal hardware. Asking a million people to download 100MB more, to have an actually weaker engine is not good of a deal.

#

I think our target should be to not regress at the normal LTC and LTC SMP conditions.

#

(in fact be stronger at those).

rocky vigil Oct 25, 2025, 6:29 AM

#

smp is looking decent

#

i think single thread will be most of the issue going forward

lofty cedar Oct 25, 2025, 6:29 AM

#

I think it's maybe stronger at normal LTC already... if not for the fishtest condition.

rocky vigil Oct 25, 2025, 6:30 AM

#

i would be surprised actually if the current net caused a (significantly) larger download after compression

#

btw I think https://tests.stockfishchess.org/tests/view/68f67ce0637acd2a11e723d9 is outdated now

prime mica Oct 25, 2025, 6:34 AM

#

as I'm going through the code... is there a high-level description of the threats architecture anywhere?

lofty cedar Oct 25, 2025, 6:34 AM

#

violet badger no that's wrong... contrary to most other engines, where this could be a goal, w...

TBH... I was thinking of maybe sending a different version to the TCEC than the version we have normal people download but having multiple versions would complicate development.

lofty cedar Oct 25, 2025, 6:34 AM

#

prime mica as I'm going through the code... is there a high-level description of the threat...

The rest of the arch should be the same for now.

prime mica Oct 25, 2025, 6:34 AM

#

sure but like

#

what is a "threat" hahaha

lofty cedar Oct 25, 2025, 6:34 AM

#

The only change is in the threat... where each threat is an input.

#

Well, a potential where a piece can capture another piece iirc?

rocky vigil Oct 25, 2025, 6:35 AM

#

prime mica as I'm going through the code... is there a high-level description of the threat...

you are approximately tracking which pieces attack which other pieces

prime mica Oct 25, 2025, 6:35 AM

#

gotcha

rocky vigil Oct 25, 2025, 6:35 AM

#

so like

lofty cedar Oct 25, 2025, 6:35 AM

#

Though it doesn't include pinned piece logic iirc.

rocky vigil Oct 25, 2025, 6:35 AM

#

white pawn on b2 attacks white pawn on c3

#

yeah

prime mica Oct 25, 2025, 6:35 AM

#

and the feature index is (square, square) or (square, piece, square) or what

rocky vigil Oct 25, 2025, 6:35 AM

#

can't account for pins bc that would take way too long

prime mica Oct 25, 2025, 6:36 AM

#

```
    Square pc_sq, threatened_sq;
```

#

oh wow so it's the full (piece 1, piece 2, square 1, square 2)

violet badger Oct 25, 2025, 6:37 AM

#

lofty cedar TBH... I was thinking of maybe sending a different version to the TCEC than the ...

exactly, no way.

rocky vigil Oct 25, 2025, 6:38 AM

#

prime mica oh wow so it's the full (piece 1, piece 2, square 1, square 2)

yeah
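
To make the (piece 1, piece 2, square 1, square 2) tuple concrete, here is a deliberately naive sketch. Only `pc_sq` and `threatened_sq` appear in the snippet above; the struct name, the other field names, the 0..11 piece encoding and the dense index are assumptions for illustration. This is not the mapping done by `get_feature_index`, which, per the discussion, also de-duplicates threats.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical threat record: an attacking piece on one square and the
// piece it attacks on another square.
struct Threat {
    std::uint8_t pc;             // attacker, 0..11 (assumed: colour * 6 + type)
    std::uint8_t threatened_pc;  // attacked piece, same assumed encoding
    std::uint8_t pc_sq;          // attacker square, 0..63 (a1 = 0, h8 = 63)
    std::uint8_t threatened_sq;  // attacked square, 0..63
};

constexpr std::size_t kPieces = 12;
constexpr std::size_t kSquares = 64;

// Size of the naive dense index space: 12 * 12 * 64 * 64 = 589,824.
constexpr std::size_t kDenseSize = kPieces * kPieces * kSquares * kSquares;

// Naive dense index: every tuple gets a slot, including geometrically
// impossible ones. A real scheme would be far more compact.
constexpr std::size_t dense_index(const Threat& t) {
    return ((std::size_t(t.pc) * kPieces + t.threatened_pc) * kSquares + t.pc_sq)
               * kSquares
           + t.threatened_sq;
}
```

With this encoding the "white pawn on b2 attacks white pawn on c3" example is `Threat{0, 0, 9, 18}`. Most slots in such a dense table can never occur on a real board, which is one reason the actual index computation is more involved.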

twilit oriole Oct 25, 2025, 6:38 AM

#

The target is already reached anyways, moving net size unnecessary. The SMP test on fishtest is neutral without an spsa to master

rocky vigil Oct 25, 2025, 6:38 AM

#

one of the lower hanging fruits is to test if replacing get_feature_index with a lookup table is worth it

prime mica Oct 25, 2025, 6:40 AM

#

prime mica oh wow so it's the full (piece 1, piece 2, square 1, square 2)

yeah I was thinking that the mapping from this to the feature index must be convoluted lol

#

oh interesting, so you do de-duplicate threats where one threat implies the other

rocky vigil Oct 25, 2025, 6:41 AM

#

yep!

prime mica Oct 25, 2025, 6:41 AM

#

smort

rocky vigil Oct 25, 2025, 6:41 AM

#

it's prob worth a few % in speed

#

not that it's ever been tested

prime mica Oct 25, 2025, 6:41 AM

#

do you de-duplicate pawn->bishop, bishop->pawn or is that not possible

twilit oriole Oct 25, 2025, 6:42 AM

#

The pdf at the first post gives a lot of this info already

prime mica Oct 25, 2025, 6:42 AM

#

oh I didn't know there was one

rocky vigil Oct 25, 2025, 6:43 AM

#

rocky vigil one of the lower hanging fruits is to test if replacing get_feature_index with a...

it's worth trying i think

prime mica Oct 25, 2025, 6:43 AM

#

ok reading, thx

rocky vigil Oct 25, 2025, 6:43 AM

#

you replace a popcount and multiple lookups to small arrays

violet badger Oct 25, 2025, 6:46 AM

#

twilit oriole The pdf at the first post gives a lot of this info already

Elo: 91.39....

twilit oriole Oct 25, 2025, 6:48 AM

#

It's a L1 3072 net

violet badger Oct 25, 2025, 6:48 AM

#

that's -100 Elo on fishtest 😉

#

not quite, but we can extrapolate the graph.

#

#1336647760388034610 message

rocky vigil Oct 25, 2025, 6:48 AM

#

if we were really pushing it probably 1536 is optimal at TCEC conditions

violet badger Oct 25, 2025, 6:49 AM

#

again, tcec can't be the goal for us.

rocky vigil Oct 25, 2025, 6:49 AM

#

would depend on speed yes

#

it might also be okish at LTC SMP

#

but would definitely clock a double digit loss at stc

#

1024 is good

#

a nice number

twilit oriole Oct 25, 2025, 6:50 AM

#

Yes but obviously the test was not intended for any type of net size info... It's to demonstrate the concept only

#

Clearly it was adequate enough to do that given we are here

prime mica Oct 25, 2025, 6:51 AM

#

ok I think this makes sense now

twilit oriole Oct 25, 2025, 6:51 AM

#

It's using WDL 1, trained on captures and checks etc. it's a monty net plugged into SF lol

rocky vigil Oct 25, 2025, 6:52 AM

#

i still think we have a couple of tricks to pull on 1024

#

definitely will require more effort to squeeze out last elo though

violet badger Oct 25, 2025, 6:52 AM

#

ah, I now see it is a fixed node test.... so well, it means virtually nothing.

twilit oriole Oct 25, 2025, 6:53 AM

#

It means the threat inputs are worth something over the regular net. It is an important basis to establish at the start

violet badger Oct 25, 2025, 6:54 AM

#

so, first, obviously, this is still nice work etc, all appreciated. but even at same L1, the net is bigger right? So it is quite logical it is better?

rocky vigil Oct 25, 2025, 6:54 AM

#

twilit oriole It means the threat inputs are worth something over the regular net. It is an im...

i think it only really got rolling once lofty got it to work in yukari

twilit oriole Oct 25, 2025, 6:54 AM

#

I disagree tbh

#

There was work on it well before that (obviously yukari helped)

violet badger Oct 25, 2025, 6:55 AM

#

in SF 😉

#

but I think the discussion on history doesn't really matter to be honest.

#

I'm still most interested in getting a better SF out of this.

twilit oriole Oct 25, 2025, 6:56 AM

#

violet badger so, first, obviously, this is still nice work etc, all appreciated. but even at...

The magnitude of the fixed nodes test gain combined with the fact it is not at all optimised for usage in an AB engine. But it is not so relevant anyways, I just needed a big number to get ppl motivated to work on it

rocky vigil Oct 25, 2025, 6:56 AM

#

it does look like my prediction of -5 or -6 for stc progtest will be accurate

#

much of it probably from net

violet badger Oct 25, 2025, 6:57 AM

#

so, I think we'll probably still get 1-2Elo from net squeezing..

prime mica Oct 25, 2025, 6:58 AM

#

🍋

rocky vigil Oct 25, 2025, 6:58 AM

#

pre-spsa i guess

#

i want spsa to be a final step though

violet badger Oct 25, 2025, 6:58 AM

#

and we skip spsa 😉

prime mica Oct 25, 2025, 6:58 AM

#

lol

twilit oriole Oct 25, 2025, 6:58 AM

#

The spsa can be done later. It is inevitable anyways

rocky vigil Oct 25, 2025, 6:58 AM

#

like it makes sense to do it at the end

#

once the process is ironed out

#

in particular I would hope for i8 quantization tests before that

violet badger Oct 25, 2025, 6:59 AM

#

I think that's an example...

prime mica Oct 25, 2025, 6:59 AM

#

how many core hours were spent on the SPSA last time

violet badger Oct 25, 2025, 6:59 AM

#

I guess a few million games at VLTC?

prime mica Oct 25, 2025, 6:59 AM

#

mindblown_cat

violet badger Oct 25, 2025, 6:59 AM

#

and like it makes further testing so much more difficult.

rocky vigil Oct 25, 2025, 6:59 AM

#

each individual one is what 60k at ltc smp?

twilit oriole Oct 25, 2025, 6:59 AM

#

I don't think it is necessary that many. It was done in many stages because it had never been done before at that scale

violet badger Oct 25, 2025, 6:59 AM

#

take i8 as an example

rocky vigil Oct 25, 2025, 7:00 AM

#

yeah

violet badger Oct 25, 2025, 7:00 AM

#

if it comes after spsa, it is almost a lost case.

#

It makes incremental tweaks to training almost impossible.

rocky vigil Oct 25, 2025, 7:00 AM

#

the downside of spsa is that you need to compare Y + spsa vs X + spsa in every further test

#

i.e. if master net didn't have spsa we would already be beating it

twilit oriole Oct 25, 2025, 7:00 AM

#

I made these arguments before when the spsa stages started stacking up and didn't seem to matter too much then lol

rocky vigil Oct 25, 2025, 7:01 AM

#

i think since times have changed, linrock largely moved on

#

and then the net got stuck

#

so now opinion on that is different

lofty cedar Oct 25, 2025, 7:02 AM

#

But since Stockfish usually just accepts local improvements, it often means that SPSA gets accepted easily.

twilit oriole Oct 25, 2025, 7:02 AM

#

You can make the counter argument that Elo is now rarer. I don't think the conditions changed all that much

violet badger Oct 25, 2025, 7:03 AM

#

twilit oriole I made these arguments before when the spsa stages started stacking up and didn'...

we have made those since spsa was used the first time like in 2022.

#

but hard to resist Elo ..

lofty cedar Oct 25, 2025, 7:03 AM

#

I mean... SPSA-ing the net is often a way to gain easy elo.

violet badger Oct 25, 2025, 7:03 AM

#

but easy to get into a dead-end.

lofty cedar Oct 25, 2025, 7:03 AM

#

I mean... it should be a final stage where nothing seems to be improving anymore.

prime mica Oct 25, 2025, 7:04 AM

#

lol

#

there is something grotesque about spsa

twilit oriole Oct 25, 2025, 7:04 AM

#

Well I mean if you don't have a rule against it obviously ppl are going to do it lol

lofty cedar Oct 25, 2025, 7:04 AM

#

So, maybe we should set a period of say 6 months and if no new net comes out we SPSA.

#

Another thing we could do is mention it somewhere in the wiki or somewhere that newly trained nets should be compared to pre-SPSA nets.

twilit oriole Oct 25, 2025, 7:05 AM

#

Anyways this isn't actually threat net specific, can move to nnue dev. It is only coming up now because the regular arch had no new nets

rocky vigil Oct 25, 2025, 7:06 AM

#

I am really hoping that like we get this through and it boosts maybe morale or smth, since it must feel bad to have had the exact same net all the way for almost a year now

#

like, we show that master net is not invincible, and maybe then some floodgates will open

lofty cedar Oct 25, 2025, 7:19 AM

#

Do we try our chance with LTC SMP SPRT now? And then VLTC SMP (aka VVLTC).

If it gains, we merge the threat input.

twilit oriole Oct 25, 2025, 7:19 AM

#

No

lofty cedar Oct 25, 2025, 7:19 AM

#

What's left?

twilit oriole Oct 25, 2025, 7:19 AM

#

Have some patience

rocky vigil Oct 25, 2025, 7:19 AM

#

we still have speedup ideas left to try

#

while we try those in the meanwhile

#

we can wait about a week to see if double training time

#

squeezes out anything further

lofty cedar Oct 25, 2025, 7:20 AM

#

Oh, okay.

rocky vigil Oct 25, 2025, 7:20 AM

#

imo we should only do the (v)ltc smp as a formality

#

like only do it when we know it'll pass

twilit oriole Oct 25, 2025, 7:21 AM

#

just don't do it at all?

rocky vigil Oct 25, 2025, 7:21 AM

#

i think it has to be done before merging

lofty cedar Oct 25, 2025, 7:21 AM

#

I mean we do kinda want VVLTC as a progression test anyway.

rocky vigil Oct 25, 2025, 7:21 AM

#

so like eventually

twilit oriole Oct 25, 2025, 7:21 AM

#

A SMP STC and LTC SMP is all that is needed

#

I do not see where we need a VLTC SMP

#

That is not a normal test TC

rocky vigil Oct 25, 2025, 7:22 AM

#

i think is maintainer decision

#

whether we need non-smp ltc

#

vondele indicated he would prefer at least a nonreg on ltc

#

which I think we are also close to

#

maybe, -3 at ltc rn

twilit oriole Oct 25, 2025, 7:22 AM

#

This is not answering the question

rocky vigil Oct 25, 2025, 7:22 AM

#

on fishtest conditions

twilit oriole Oct 25, 2025, 7:23 AM

#

Where does that mean we need a VLTC SMP

rocky vigil Oct 25, 2025, 7:23 AM

#

twilit oriole I do not see where we need a VLTC SMP

oh wait i did not read, yeah there's no point in vltc smp

#

ltc is good enough

twilit oriole Oct 25, 2025, 7:24 AM

#

The SMP outperformance is from that similar threats are active across a search. So 1 multi threaded search benefits from this. I observed with the regular net also but not as severe

#

It's not a mystery really

rocky vigil Oct 25, 2025, 7:25 AM

#

so essentially that the memory is better able to optimize for hot indices in threats?

twilit oriole Oct 25, 2025, 7:25 AM

#

Yes

#

Less cache misses

violet badger Oct 25, 2025, 7:26 AM

#

measure with perf ....

rocky vigil Oct 25, 2025, 7:26 AM

#

then why would i8 be worse at smp than normal

twilit oriole Oct 25, 2025, 7:26 AM

#

rocky vigil then why would i8 be worse at smp than normal

Do you know that is related to speed?

#

(And out of error also)

rocky vigil Oct 25, 2025, 7:28 AM

#

twilit oriole Do you know that is related to speed?

could you elaborate further on this

twilit oriole Oct 25, 2025, 7:28 AM

#

No because it is a question lol. Did u measure speed at all

rocky vigil Oct 25, 2025, 7:29 AM

#

oh i was referring to monty results for this

#

i ofc have not measured speed

rare jacinth Oct 25, 2025, 7:29 AM

#

what is the threat inputs vs master pair ratio right now

rocky vigil Oct 25, 2025, 7:29 AM

#

https://tests.stockfishchess.org/tests/view/68fbefeb637acd2a11e72d2a (1t)
https://tests.stockfishchess.org/tests/view/68fc3184637acd2a11e72d4e (8t)

#

both stc

twilit oriole Oct 25, 2025, 7:29 AM

#

rocky vigil oh i was referring to monty results for this

Look at the result again

#

You missed the huge overlapping error bars I suppose

rocky vigil Oct 25, 2025, 7:30 AM

#

oh right

#

average megagainer in [0, 4] sprt

rocky vigil Oct 25, 2025, 7:31 AM

#

rare jacinth what is the threat inputs vs master pair ratio right now

to more directly answer, seems to be around 0.9 in stc and 1 in stc smp

#

tbh

#

the error bars on this are also more than I would like
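
For readers unfamiliar with the pair ratio: a sketch assuming the usual fishtest pentanomial convention, where game pairs are binned by the points scored over the two games (0, 0.5, 1, 1.5, 2) and the ratio compares winning pairs to losing pairs. The type and function names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pair counts by points scored over a game pair: [0, 0.5, 1, 1.5, 2].
using Pentanomial = std::array<std::uint64_t, 5>;

// Ratio of pairs won (1.5 or 2 points) to pairs lost (0 or 0.5 points).
// A value below 1 means more pairs were lost than won.
inline double pairs_ratio(const Pentanomial& p) {
    const std::uint64_t lost = p[0] + p[1];
    const std::uint64_t won = p[3] + p[4];
    assert(lost > 0);  // ratio is undefined when no pair was lost
    return static_cast<double>(won) / static_cast<double>(lost);
}
```

For instance, the made-up counts `{10, 190, 600, 171, 9}` give 180 won pairs against 200 lost pairs, a ratio of 0.9.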

rocky vigil Oct 25, 2025, 7:33 AM

#

rocky vigil btw I think <https://tests.stockfishchess.org/tests/view/68f67ce0637acd2a11e723d...

#

i think this got buried

violet badger Oct 25, 2025, 7:35 AM

#

I've put it to prio -1, but will let @frosty imp stop it himself (or modify back to 0 if needed).

rocky vigil Oct 25, 2025, 7:36 AM

#

in any case i think we can let yoshie be the trailblazer for i8 quant in a/b

violet badger Oct 25, 2025, 7:37 AM

#

would be happy, doesn't the QAT need some changes to the trainer?

rocky vigil Oct 25, 2025, 7:37 AM

#

yeah there is major work involved in this

twilit oriole Oct 25, 2025, 7:38 AM

#

The i8 quant is already close in plenty and has more advantage in SF

rocky vigil Oct 25, 2025, 7:38 AM

#

ofc will probably be worth it

#

at some point

twilit oriole Oct 25, 2025, 7:38 AM

#

Larger nets lose less at fixed nodes and have greater speedup

rocky vigil Oct 25, 2025, 7:38 AM

#

just like how lazy threat calculation should also be worth it

#

i need to pull up that discussion but i think the main bottleneck is duplicating some make/unmake logic

#

like the way it's structured right now is that the NNUE knows very little about how the position works

#

it pretty much gets fed the differences per position

#

and runs that through

#

so if we wanted to do lazy threats we either couple NNUE tighter with position or duplicate some essential position logic for those calculations

#

both of these are nontrivial
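The "fed the differences per position" design described above can be sketched as follows. This is a toy model under stated assumptions, not Stockfish's code: `refresh` and `apply_diff` are hypothetical names, and the weights are small integer lists rather than SIMD-friendly arrays. The accumulator only ever sees lists of added and removed feature indices, so it needs no knowledge of the position itself.

```python
def refresh(weights, bias, active):
    """Rebuild the accumulator from scratch from all active feature indices."""
    acc = list(bias)
    for f in active:
        for i, w in enumerate(weights[f]):
            acc[i] += w
    return acc

def apply_diff(acc, weights, added, removed):
    """Incremental update: the caller supplies only what changed after a move."""
    out = list(acc)
    for f in removed:
        for i, w in enumerate(weights[f]):
            out[i] -= w
    for f in added:
        for i, w in enumerate(weights[f]):
            out[i] += w
    return out

# Hypothetical 4-feature, 2-neuron layer.
weights = [[1, 2], [3, -1], [0, 5], [-2, 4]]
bias = [10, 20]

acc = refresh(weights, bias, [0, 2])                      # position before the move
acc = apply_diff(acc, weights, added=[3], removed=[2])    # move swaps feature 2 for 3
```

Lazy threats would mean deferring the computation of `added` and `removed` until an evaluation is actually needed, which is why the NNUE side would then need access to position logic.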

rare jacinth Oct 25, 2025, 7:52 AM

#

is there a speedup from the fact that a lot of threats are bidirectional (say rooks on the same file) so say we can merge the weights for a pawn on a2 and bishop on b3? not sure how much has been explored already, please link me relevant material
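One way to picture the idea in this question, as a sketch only: if a threat and its reverse are both present, keep a single canonical direction so that one feature (one weight row) covers both. The tuple layout `(attacker, attacker_sq, victim, victim_sq)`, the piece labels, and `fold_mutual` are all hypothetical and do not reflect the actual threat-input encoding.

```python
def fold_mutual(threats):
    """Drop the reverse direction of every mutual threat.

    A threat is (attacker, attacker_sq, victim, victim_sq). When both
    directions are present, only the lexicographically smaller tuple is kept.
    """
    present = set(threats)
    kept = []
    for threat in sorted(present):
        attacker, a_sq, victim, v_sq = threat
        reverse = (victim, v_sq, attacker, a_sq)
        if reverse in present and reverse < threat:
            continue  # already represented by its canonical twin
        kept.append(threat)
    return kept

# Squares numbered a1=0 ... h8=63. Hypothetical position fragments:
threats = [
    ("wR", 0, "bR", 56),   # rook a1 attacks rook a8 ...
    ("bR", 56, "wR", 0),   # ... and is attacked back along the file
    ("wP", 8, "bB", 17),   # pawn a2 attacks bishop b3 ...
    ("bB", 17, "wP", 8),   # ... and the bishop attacks the pawn
    ("wQ", 3, "bN", 27),   # queen d1 attacks knight d4, not mutual
]
folded = fold_mutual(threats)
```

Here five active features shrink to three, so fewer weight rows would be added to the accumulator. Whether this loses information depends on whether the reverse threat is always implied by the kept tuple, which is an assumption of this sketch.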

twilit oriole Oct 25, 2025, 7:53 AM

#

see the pdf on the first post. rough outline of how things are

rare jacinth Oct 25, 2025, 7:56 AM

#

I see that you handle all non-pawn symmetries, are pawn-bishop and pawn-queen bidirectional encodings too rare to matter?

prime mica Oct 25, 2025, 7:57 AM

#

rare jacinth I see that you handle all non-pawn symmetries, are pawn-bishop and pawn-queen bi...

lol we think alike

rare jacinth Oct 25, 2025, 7:57 AM

#

if we did I would have found all of your speedups before lol

prime mica Oct 25, 2025, 7:57 AM

#

lazy SMP

#

improvement on threat_inputs from specializing at the top for different pairs (added.size(), removed.size()) with total less than 4
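A rough sketch of what specializing on `(added.size(), removed.size())` could look like, assuming the gain comes from handling small updates in one fused pass over the accumulator instead of one pass per feature. The function names are hypothetical, only a few of the shapes with total below 4 are written out, and everything else falls back to the generic loop.

```python
def update_generic(acc, weights, added, removed):
    """One pass over the accumulator per changed feature."""
    out = list(acc)
    for f in added:
        for i, w in enumerate(weights[f]):
            out[i] += w
    for f in removed:
        for i, w in enumerate(weights[f]):
            out[i] -= w
    return out

def update_specialized(acc, weights, added, removed):
    """Dispatch once on the update shape, then use a fused single pass."""
    shape = (len(added), len(removed))
    if shape == (1, 0):
        return [x + a for x, a in zip(acc, weights[added[0]])]
    if shape == (0, 1):
        return [x - r for x, r in zip(acc, weights[removed[0]])]
    if shape == (1, 1):
        rows = zip(acc, weights[added[0]], weights[removed[0]])
        return [x + a - r for x, a, r in rows]
    if shape == (2, 1):
        rows = zip(acc, weights[added[0]], weights[added[1]], weights[removed[0]])
        return [x + a0 + a1 - r for x, a0, a1, r in rows]
    if shape == (1, 2):
        rows = zip(acc, weights[added[0]], weights[removed[0]], weights[removed[1]])
        return [x + a - r0 - r1 for x, a, r0, r1 in rows]
    # Remaining shapes (including larger updates) use the generic path.
    return update_generic(acc, weights, added, removed)
```

The dispatch happens once per update rather than inside the per-feature loop, so the common small cases avoid loop overhead and repeated accumulator reads and writes.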

twilit oriole Oct 25, 2025, 7:59 AM

#

rare jacinth I see that you handle all non-pawn symmetries, are pawn-bishop and pawn-queen bi...

There are many further improvements; I think those are already handled, but they're undocumented, so you have to read the code to see them

prime mica Oct 25, 2025, 7:59 AM

#

not sure whether it's independent from what Alice did tho

#

oh I see, it's similar but it's pulling it out of the loop

#

interesting

#UE Threat Inputs for AB

Result of 100 runs