UE Threat Inputs for AB | Stockfish | Page 9

prime mica Nov 4, 2025, 6:35 AM

#

huh how is that possible

rocky vigil Nov 4, 2025, 6:35 AM

#

it probably compresses

#

the

#

i16 part

#

actually

#

💀

prime mica Nov 4, 2025, 6:36 AM

#

lol

#

shapely

#

is this right?

rocky vigil Nov 4, 2025, 6:36 AM

#

yep

#

it's also listed in the commit msg

prime mica Nov 4, 2025, 6:37 AM

#

ok so try these two commits vs. each other

rocky vigil Nov 4, 2025, 6:37 AM

#

yep

#

speedtest

#

1 thread and then n thread

#

1 thread is probably neutral

#

since no big mem pressure

#

but for stuff like concurrency or smp it should help

prime mica Nov 4, 2025, 6:38 AM

#

ooh it seems considerably faster on bench actually

#

lemme run that first then I'll run some speedtest

#

@torn lagoon no this is an LSS

torn lagoon Nov 4, 2025, 6:40 AM

#

Bench can be noisy

prime mica Nov 4, 2025, 6:40 AM

#

we need an LSS emoji

torn lagoon Nov 4, 2025, 6:40 AM

#

VVLSS when

prime mica Nov 4, 2025, 6:40 AM

#

lololol

rocky vigil Nov 4, 2025, 6:41 AM

#

sample size and tc are inversely proportional

prime mica Nov 4, 2025, 6:41 AM

#

does anyone have a test script that uses speedtest

#

mine only uses bench

rocky vigil Nov 4, 2025, 6:41 AM

#

simply run VVSTC to get VVLSS

prime mica Nov 4, 2025, 6:41 AM

#

I guess I can just replace the command huh

frosty imp Nov 4, 2025, 6:41 AM

#

Think you also need to change the regex

prime mica Nov 4, 2025, 6:41 AM

#

Result of 100 runs
==================
base (./stockfish    ) =    1461604  +/- 1364
test (...sh.after.gcc) =    1527947  +/- 1821
diff                   =     +66343  +/- 2014

speedup        = +0.0454
P(speedup > 0) =  1.0000

very promising

rocky vigil Nov 4, 2025, 6:42 AM

#

👀👀

warm thistle Nov 4, 2025, 6:43 AM

#

wtf

#

👀

rocky vigil Nov 4, 2025, 6:43 AM

#

prime mica ```Result of 100 runs ================== base (./stockfish ) = 1461604 +/...

this is with concurrency right

prime mica Nov 4, 2025, 6:43 AM

#

oh no

#

this is bench

#

so single threaded

#

I'ma try speedtest now

rocky vigil Nov 4, 2025, 6:43 AM

#

wait is it just 100 in a row

prime mica Nov 4, 2025, 6:43 AM

#

yep

rocky vigil Nov 4, 2025, 6:43 AM

#

or 100 distributed over N cores

prime mica Nov 4, 2025, 6:44 AM

#

just need to figure out how to modify the trusty pyshbench...

rocky vigil Nov 4, 2025, 6:44 AM

#

i think you would also need to set the speedtest invocation to be much less than 150 seconds

prime mica Nov 4, 2025, 6:44 AM

#

yeah...

rocky vigil Nov 4, 2025, 6:45 AM

#

rocky vigil or 100 distributed over N cores

this would also grant approximately similar speedup

#

since it's just

#

total number of threads running search

#

at any given point

#

i think

#

in terms of mem pressure

#

nevertheless it'll definitely show on fishtest

#

assuming the net doesn't die

#

in the meanwhile lemme set up old stage 1 net...

stray reef Nov 4, 2025, 6:47 AM

#

prime mica ```Result of 100 runs ================== base (./stockfish ) = 1461604 +/...

is this i8threats already working?

prime mica Nov 4, 2025, 6:47 AM

#

yes unless I bungled something

stray reef Nov 4, 2025, 6:47 AM

#

when fishtest 👀

prime mica Nov 4, 2025, 6:47 AM

#

I just make -j profile-builded consecutive commits from mr. sscg13

rocky vigil Nov 4, 2025, 6:48 AM

#

stray reef when fishtest 👀

no full net

stray reef Nov 4, 2025, 6:48 AM

#

ah

rocky vigil Nov 4, 2025, 6:48 AM

#

stage 1

prime mica Nov 4, 2025, 6:48 AM

#

ok running speedtest 16 now, be back with results in ~10 minutes

#

have you tried this stuff locally?

rocky vigil Nov 4, 2025, 6:48 AM

#

sanity check this and full training should be done in 2 days

rocky vigil Nov 4, 2025, 6:49 AM

#

prime mica have you tried this stuff locally?

I'm currently training a nnue for Prolix lmao

native lake Nov 4, 2025, 6:49 AM

#

Should try bt4 data lol

rocky vigil Nov 4, 2025, 6:49 AM

#

so my laptop is anything but reliable rn

prime mica Nov 4, 2025, 6:49 AM

#

rocky vigil I'm currently training a nnue for Prolix lmao

LOL

#

aura

rocky vigil Nov 4, 2025, 6:49 AM

#

in fact the regular bench is 20% slower than with no load

#

and highly inconsistent ofc

#

i will get the stage 1 fixed games test up on fishtest tho

prime mica Nov 4, 2025, 6:51 AM

#

lol

#

movsxbw movsxbw movsxbw movsxbw

rocky vigil Nov 4, 2025, 6:51 AM

#

average memory bandwidth bottleneck

prime mica Nov 4, 2025, 6:51 AM

#

Gordon Moore shaking in his boots

rocky vigil Nov 4, 2025, 6:51 AM

#

do the i8 -> i16 conversions show up anywhere

prime mica Nov 4, 2025, 6:52 AM

#

I think that's what these are

#

unless you're referring to something else

rocky vigil Nov 4, 2025, 6:52 AM

#

I am referring to cvtepi8_epi16

prime mica Nov 4, 2025, 6:52 AM

#

ye that compiles to the venerable vpmovsxbw

rocky vigil Nov 4, 2025, 6:52 AM

#

o

prime mica Nov 4, 2025, 6:53 AM

#

move sign extend byte to word
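
A minimal sketch of what `vpmovsxbw` (the instruction behind `cvtepi8_epi16`) does to each lane, emulated in plain Python rather than with real SIMD. The helper names are made up for illustration and are not Stockfish code; the accumulator is assumed to wrap at 16 bits.

```python
def sign_extend_i8_to_i16(byte: int) -> int:
    """Return the 16-bit two's-complement pattern for an 8-bit pattern."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit pattern")
    # Copy bit 7 (the sign bit) into the upper eight bits.
    return byte | 0xFF00 if byte & 0x80 else byte


def widen_and_add(acc, weights_i8):
    """Add sign-extended i8 weights lane by lane into 16-bit lanes (wrapping)."""
    return [(a + sign_extend_i8_to_i16(w)) & 0xFFFF for a, w in zip(acc, weights_i8)]


# 0xFF is -1 as an i8, so adding it to 10 gives 9.
print(widen_and_add([10, 0], bytes([0xFF, 0x01])))
```

The point of storing weights as i8 is that half as many bytes are read from memory, at the cost of one widening step per load.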

#

first pair in, speedtest 16, +3.1%

#

so a bit more modest with more threads actually

rocky vigil Nov 4, 2025, 6:54 AM

#

hmm

prime mica Nov 4, 2025, 6:54 AM

#

but we'll see

naive comet Nov 4, 2025, 6:54 AM

#

wait I have an idea

prime mica Nov 4, 2025, 6:54 AM

#

pray tell

naive comet Nov 4, 2025, 6:55 AM

#

I need to double check first

prime mica Nov 4, 2025, 6:55 AM

#

yup then I'll move my king

#

I'm still hoping there's slightly more efficient ways to load the i8 to i16 than vpmovsxbw spam

#

my first idea didn't work when Yoshie tried it

#

but there might be some variant

#

will muck around later

rocky vigil Nov 4, 2025, 6:57 AM

#

yeah

#

if i8 to i16 could be sped up it would be good

#

though definitely it seems memory -> i8 -> i16 is faster than memory -> i16

prime mica Nov 4, 2025, 6:58 AM

#

ye

#

at least on some devices... we'll see on fishtest

#

make sure to turn off autopurge ^_^

rocky vigil Nov 4, 2025, 6:58 AM

#

curious if anyone has the 256 MB L3 cpus lying around

#

with this one it'll actually fit in L3

prime mica Nov 4, 2025, 6:58 AM

#

yum

rocky vigil Nov 4, 2025, 6:59 AM

#

and that might be very big

rocky vigil Nov 4, 2025, 6:59 AM

#

rocky vigil with this one it'll actually fit in L3

less mem usage than master net even

prime mica Nov 4, 2025, 6:59 AM

#

my computer has 512 MB but sadly it's split up across the CoRE ComPlExeS

#

so I don't think it would have ur intended effect

#

not sure tho

rocky vigil Nov 4, 2025, 6:59 AM

#

well i can dream

prime mica Nov 4, 2025, 6:59 AM

#

lol

#

good

#

  1   21663457   22347071  +683614
  2   21772336   22595945  +823609
  3   21680315   22189962  +509647
  4   22011912   22642714  +630802

Result of   4 runs
==================
base (./stockfish    ) =   21782005  +/- 157128
test (...sh.after.gcc) =   22443923  +/- 208742
diff                   =    +661918  +/- 127303

speedup        = +0.0304
P(speedup > 0) =  1.0000

#

speedtest 16
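
For reference, the `speedup` line can be reproduced from the four run pairs listed above. This is only a sketch of the arithmetic (mean of test minus mean of base, divided by mean of base); it does not try to reproduce the `+/-` columns, and the function name is an assumption rather than anything from pyshbench.

```python
from statistics import mean

# (base_nps, test_nps) pairs copied from the four `speedtest 16` runs.
runs = [
    (21663457, 22347071),
    (21772336, 22595945),
    (21680315, 22189962),
    (22011912, 22642714),
]


def relative_speedup(pairs):
    """Relative NPS gain of test over base, using the mean of each column."""
    base = mean(b for b, _ in pairs)
    test = mean(t for _, t in pairs)
    return (test - base) / base


print(f"{relative_speedup(runs):+.4f}")
```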

#

would you like me to try more/less threads and/or other parameters?

stray reef Nov 4, 2025, 7:01 AM

#

rocky vigil curious if anyone has the 256 MB L3 cpus lying around

@split warren has one, where we also tested it for plenty

rocky vigil Nov 4, 2025, 7:02 AM

#

ah nice

rocky vigil Nov 4, 2025, 7:02 AM

#

prime mica `speedtest 16`

this is 16 thread 4 run x 150 sec right

#

looks good

prime mica Nov 4, 2025, 7:03 AM

#

yessir

naive comet Nov 4, 2025, 7:03 AM

#

prime mica I'm still hoping there's slightly more efficient ways to load the i8 to i16 than...

I think we can interleave the weights on load,

ie aabb -> abab

use srai to extract high bits
use slli+srai to extract low bits

or was this your idea
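
A rough sketch of the interleaving idea, emulating a single 16-bit lane in plain Python. It assumes each lane packs two i8 weights (one in the high byte, one in the low byte) and that "srai" and "slli" mean 16-bit arithmetic shift right and shift left; all function names here are hypothetical.

```python
MASK16 = 0xFFFF


def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def srai16(lane: int, n: int) -> int:
    """Arithmetic shift right of a 16-bit lane (the sign bit is replicated)."""
    return to_signed16(lane) >> n


def slli16(lane: int, n: int) -> int:
    """Logical shift left of a 16-bit lane, discarding overflowed bits."""
    return (lane << n) & MASK16


def pack(lo: int, hi: int) -> int:
    """Store two i8 values in one 16-bit lane."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def unpack(lane: int):
    hi = srai16(lane, 8)              # high byte: one shift
    lo = srai16(slli16(lane, 8), 8)   # low byte: shift up, then back down
    return lo, hi


print(unpack(pack(-3, 5)))
```

Both halves come out of one full-width load, which is the appeal over issuing a separate widening load per half; whether that is actually faster is what the tests in the chat are trying to find out.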

prime mica Nov 4, 2025, 7:04 AM

#

yeah...

stray reef Nov 4, 2025, 7:04 AM

#

i used mulhi instead of slli+srai when testing it, but yes that was the idea

prime mica Nov 4, 2025, 7:04 AM

#

I'm still confused why it didn't work tbh

stray reef Nov 4, 2025, 7:04 AM

#

maybe mulhi latency is too high, with small L1 there's really not a lot of iterations

naive comet Nov 4, 2025, 7:04 AM

#

would slli+srai be better?

prime mica Nov 4, 2025, 7:04 AM

#

perhaps

#

yeah could try that

rocky vigil Nov 4, 2025, 7:05 AM

#

well first we should hold up and make sure https://tests.stockfishchess.org/tests/view/6909a35aea4b268f1fac2a61 doesn't die

stray reef Nov 4, 2025, 7:05 AM

#

can easily try that in about 45mins

rocky vigil Nov 4, 2025, 7:05 AM

#

if validation loss is anything to go by it should be fine in terms of eval quality

#

~~the absolute best case would be a double whammy of the QA=255 change making it both faster and better now~~

prime mica Nov 4, 2025, 7:07 AM

#

let us hope

#

u are doing god's work here

rocky vigil Nov 4, 2025, 7:07 AM

#

nah

prime mica Nov 4, 2025, 7:07 AM

#

new evaluation function is huge

rocky vigil Nov 4, 2025, 7:07 AM

#

the only real work I have contributed is impl

#

and the random suggestion to only apply i8 to threats

#

which turned out to be 🔥

prime mica Nov 4, 2025, 7:08 AM

#

that's like saying the people who built the Panama Canal only dug up 82 kilometers of soil, they didn't actually draw the line on a map

rocky vigil Nov 4, 2025, 7:08 AM

#

fair

prime mica Nov 4, 2025, 7:09 AM

#

😊

rocky vigil Nov 4, 2025, 7:09 AM

#

prime mica new evaluation function is huge

yeah it needed to be done, nn-1c000000000 has reigned as 👑 too long

prime mica Nov 4, 2025, 7:09 AM

#

👻nn-1c000000000👻

rocky vigil Nov 4, 2025, 7:10 AM

#

with this i hope ppl start exploring eval improvements again

prime mica Nov 4, 2025, 7:10 AM

#

watershed moment

#

next chess move won't know what hit them

rocky vigil Nov 4, 2025, 7:10 AM

#

ncm actually favorable to threat inputs

#

with the smp

prime mica Nov 4, 2025, 7:10 AM

#

yep

stray reef Nov 4, 2025, 7:11 AM

#

what is the L3 size on ncm? :P

prime mica Nov 4, 2025, 7:11 AM

#

once things are cleaned up I'ma try porting some of my layer combining work to threat inputs

#

I actually think it could work even better

rocky vigil Nov 4, 2025, 7:11 AM

#

"NCM uses Dell R7515 128-thread EPYC 7702 dedicated servers to perform its dev build tests. Each server plays 16 games concurrently with 30+0.3 time controls. Hash is set to 128MB, and Threads is set to 8."

#

256 MB shared

stray reef Nov 4, 2025, 7:11 AM

#

256MB l3

prime mica Nov 4, 2025, 7:11 AM

#

fancy schmancy

rocky vigil Nov 4, 2025, 7:11 AM

#

ok but then why didn't it have any effect

#

(the memory sharing patch)

prime mica Nov 4, 2025, 7:12 AM

#

I wondered that too

rocky vigil Nov 4, 2025, 7:12 AM

#

bc with that the entire net could've fit in that 256 MB

stray reef Nov 4, 2025, 7:12 AM

#

prime mica once things are cleaned up I'ma try porting some of my layer combining work to t...

what are your ideas here?

prime mica Nov 4, 2025, 7:12 AM

#

maybe they have IPC disabled

rocky vigil Nov 4, 2025, 7:12 AM

#

maybe it's split up across

prime mica Nov 4, 2025, 7:12 AM

#

stray reef what are your ideas here?

https://tests.stockfishchess.org/tests/live_elo/690008ee637acd2a11e73441 one idea I've been musing about is combining add/sub with featureTransformer

naive comet Nov 4, 2025, 7:13 AM

#

maybe shuf instead of left shift is better

#

idk free up ports? idk

#

concurrent execution smth smth

prime mica Nov 4, 2025, 7:13 AM

#

even with a noobish implementation like that one ^^ it's a decent bump

rocky vigil Nov 4, 2025, 7:13 AM

#

sss

prime mica Nov 4, 2025, 7:13 AM

#

lol

#

sss

rocky vigil Nov 4, 2025, 7:14 AM

#

random walk go

naive comet Nov 4, 2025, 7:14 AM

#

prime mica https://tests.stockfishchess.org/tests/live_elo/690008ee637acd2a11e73441 one ide...

in Alex we completely elide UE (only keeping finny tables) but do that and it's a speedup

#

https://github.com/PGG106/Alexandria/blob/master/src/nnue.cpp#L34

prime mica Nov 4, 2025, 7:14 AM

#

fancy schmancy

#

what is a finny table

#

oh is that the entryTile stuff

naive comet Nov 4, 2025, 7:15 AM

#

basically a small cache to store old accumulators and positions so we can diff the positions and UE that instead of refreshing from scratch
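
A toy sketch of that description: keep the last feature set and accumulator per slot, and on a refresh apply only the difference. The bucket indexing, the `toy_weight` function and the accumulator width are all made up for illustration; this is not the Alexandria or Stockfish implementation.

```python
HIDDEN = 4  # toy accumulator width


def toy_weight(feature: int):
    """Stand-in for one feature's weight column (hypothetical values)."""
    return [(feature * (k + 3)) % 7 - 3 for k in range(HIDDEN)]


def refresh_from_scratch(features):
    """Sum the weight columns of every active feature."""
    acc = [0] * HIDDEN
    for f in features:
        for k, w in enumerate(toy_weight(f)):
            acc[k] += w
    return acc


class FinnyTable:
    """One cached (feature set, accumulator) entry per bucket."""

    def __init__(self, buckets: int):
        self.entries = [(set(), [0] * HIDDEN) for _ in range(buckets)]

    def refresh(self, bucket: int, features):
        cached_features, cached_acc = self.entries[bucket]
        acc = list(cached_acc)
        for f in features - cached_features:   # newly active features
            for k, w in enumerate(toy_weight(f)):
                acc[k] += w
        for f in cached_features - features:   # features no longer active
            for k, w in enumerate(toy_weight(f)):
                acc[k] -= w
        self.entries[bucket] = (set(features), acc)
        return list(acc)


table = FinnyTable(buckets=2)
table.refresh(0, {1, 2, 3})
print(table.refresh(0, {2, 3, 9}) == refresh_from_scratch({2, 3, 9}))
```

The second refresh touches only two weight columns (add 9, remove 1) instead of three, and the saving grows with the number of features that stay unchanged between visits to the same bucket.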

naive comet Nov 4, 2025, 7:15 AM

#

naive comet <https://github.com/PGG106/Alexandria/blob/master/src/nnue.cpp#L34>

I thought about combining it with UE but I didn't want to get depression coding it, I guess you did XD

prime mica Nov 4, 2025, 7:16 AM

#

lol

#

you anticipated the psychological effects quite precisely

#

debugging it took a couple hours

#

Result of   4 runs
==================
base (./stockfish    ) =   70245225  +/- 1147585
test (...sh.after.gcc) =   72228060  +/- 894349
diff                   =   +1982836  +/- 871094

speedup        = +0.0282
P(speedup > 0) =  1.0000

64 threads

#

so seems to scale nicely enough (super noisy at these thread counts tho)

rocky vigil Nov 4, 2025, 7:17 AM

#

yea pretty consistent speedup

naive comet Nov 4, 2025, 7:17 AM

#

why does this guy have 64 threads

split warren Nov 4, 2025, 7:17 AM

#

stray reef @split warren has one, where we also tested it for plenty

I do, do you guys want me to run something specific on it locally?

rocky vigil Nov 4, 2025, 7:17 AM

#

it has occurred to me that I maybe should've tested fixed nodes sanity check

stray reef Nov 4, 2025, 7:17 AM

#

prime mica https://tests.stockfishchess.org/tests/live_elo/690008ee637acd2a11e73441 one ide...

trying to read SF inference on phone challenge (impossible)

will look later

split warren Nov 4, 2025, 7:17 AM

#

Though it'll probably be tomorrow morning as I'm not gonna do that on my phone lol

prime mica Nov 4, 2025, 7:18 AM

#

stray reef trying to read SF inference on phone challenge (impossible) will look later

np the code is shit tho

#

just probing to see whether it's a potentially good idea. Honestly if in the end it's <4 ELO STC I don't think it's worth it bc you completely break encapsulation of the NN layers

naive comet Nov 4, 2025, 7:19 AM

#

anematode you might want to combine this thing with refreshes like we do at alex

#

I think it gains more

#

and is less cancer to implement

#

but idk

prime mica Nov 4, 2025, 7:19 AM

#

yeah I will def try that!

stray reef Nov 4, 2025, 7:19 AM

#

naive comet <https://github.com/PGG106/Alexandria/blob/master/src/nnue.cpp#L34>

damn what??

#

that works?

naive comet Nov 4, 2025, 7:19 AM

#

😎

prime mica Nov 4, 2025, 7:19 AM

#

lol

#

'tis always fun to have convergent ideas

rocky vigil Nov 4, 2025, 7:20 AM

#

bruh this nnue training for Prolix is not ideal rn

#

i can only use like 3 concurrency for fixed nodes

#

sanity check

#

instead of 12

prime mica Nov 4, 2025, 7:20 AM

#

prolix

split warren Nov 4, 2025, 7:21 AM

#

So just to make sure I understood, I gotta clone shawns ti branch and then do speedup between the two branches tomorrow?

prime mica Nov 4, 2025, 7:21 AM

#

hm?

#

I think sscg13's branch actually

split warren Nov 4, 2025, 7:22 AM

#

Sscg13 is the dev, what's the base?

prime mica Nov 4, 2025, 7:22 AM

#

two consecutive commits in his branch

rocky vigil Nov 4, 2025, 7:22 AM

#

yea

split warren Nov 4, 2025, 7:22 AM

#

I'll post the results here tomorrow

stray reef Nov 4, 2025, 7:22 AM

#

i think just base & dev of this?
https://tests.stockfishchess.org/tests/view/6909a35aea4b268f1fac2a61

rocky vigil Nov 4, 2025, 7:22 AM

#

that would also work

#

just different nets

split warren Nov 4, 2025, 7:22 AM

#

stray reef i think just base & dev of this? https://tests.stockfishchess.org/tests/view/690...

This is perfect

rocky vigil Nov 4, 2025, 7:22 AM

#

it shouldn't matter

#

holy sss in that test tho

split warren Nov 4, 2025, 7:23 AM

#

For speedup wouldn't matter as long as I'm using same net in both branches so we cool

rocky vigil Nov 4, 2025, 7:23 AM

#

rocky vigil bruh this nnue training for Prolix is not ideal rn

nvm it just finished

#

right on time

#

lmao

split warren Nov 4, 2025, 7:23 AM

#

Bruh I have been meaning to, I'll do a Prolix net train for you while you do this, i gotcha

#

I know you've sent me the info, I'll actually get to it

prime mica Nov 4, 2025, 7:24 AM

#

kyoot

split warren Nov 4, 2025, 7:24 AM

#

Ngl, looking at ur net the last time, it's actually a pretty quick job 😉

twilit oriole Nov 4, 2025, 7:24 AM

#

If the benches between the branches are different a speedup test is not valid

naive comet Nov 4, 2025, 7:24 AM

#

^^^^^^

#

unless you do like fixed nodes or smth

split warren Nov 4, 2025, 7:25 AM

#

Hash gate, thread gate n now benchgate?

prime mica Nov 4, 2025, 7:25 AM

#

scandal central

twilit oriole Nov 4, 2025, 7:25 AM

#

naive comet unless you do like fixed nodes or smth

Still it is invalid

naive comet Nov 4, 2025, 7:25 AM

#

split warren Hash gate, thread gate n now benchgate?

idk I was scammed for an 11% speedup but it was just because bench was bigger so it had higher nps in pyshbench
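
A small sketch of the guard being argued for here: only report an NPS speedup when both builds searched the same number of bench nodes. The function name and its arguments are hypothetical, not part of pyshbench.

```python
def speedup_if_comparable(base_nodes, base_nps, test_nodes, test_nps):
    """Return the relative NPS speedup, or None if the benches differ."""
    if base_nodes != test_nodes:
        # Different node counts mean the two builds searched different trees,
        # so their NPS figures do not measure the same work.
        return None
    return test_nps / base_nps - 1.0


# Made-up numbers: same node count, so the comparison is reported.
print(speedup_if_comparable(1000, 100.0, 1000, 110.0))
# Made-up numbers: node counts differ, so no speedup is reported.
print(speedup_if_comparable(1000, 100.0, 1200, 111.0))
```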

rocky vigil Nov 4, 2025, 7:25 AM

#

so yeah

#

just do two consecutive

#

commits

naive comet Nov 4, 2025, 7:26 AM

#

twilit oriole Still it is invalid

much more accurate tho

rocky vigil Nov 4, 2025, 7:26 AM

#

to avoid that

naive comet Nov 4, 2025, 7:26 AM

#

and still a good approx imo

#

idk

#

at least in my experience

rocky vigil Nov 4, 2025, 7:26 AM

#

i have those up

#

one i16

#

one i8

naive comet Nov 4, 2025, 7:26 AM

#

ok

prime mica Nov 4, 2025, 7:27 AM

#

naive comet idk I was scammed for an 11% speedup but it was just because bench was bigger so...

story of my life

twilit oriole Nov 4, 2025, 7:30 AM

#

naive comet and still a good approx imo

I have seen 2-3% variability between patches with that. I tried

rocky vigil Nov 4, 2025, 7:31 AM

#

split warren Hash gate, thread gate n now benchgate?

anyways just run https://github.com/sscg13/Stockfish/commit/5a6633ad554f22ef1ad953bff2af74d0db3c0b79 vs https://github.com/sscg13/Stockfish/commit/83eb0e1d835e138194237c33cc968c48f42a6a68

rocky vigil Nov 4, 2025, 7:32 AM

#

split warren Bruh I have been meaning to, I'll do a Prolix net train for you while you do thi...

about this the data I uploaded is actually outdated now, lemme see if I can get the newest dataset uploaded

prime mica Nov 4, 2025, 7:35 AM

#

@naive comet lol your neural network code is lowk 10x easier to read and understand than Stockfish's

rocky vigil Nov 4, 2025, 7:36 AM

#

yeah sf nnue code is uh

#

ngl

#

it's not good

prime mica Nov 4, 2025, 7:36 AM

#

I feel like it's just overabstracted...

#

maybe once threat inputs are in an overhaul is in order

rocky vigil Nov 4, 2025, 7:37 AM

#

btw fixed nodes is concerning so lemme look into inference again first

...      Stockfish TI-i8 playing White: 144 - 105 - 251  [0.539] 500
...      Stockfish TI-i8 playing Black: 69 - 156 - 275  [0.413] 500
...      White vs Black: 300 - 174 - 526  [0.563] 1000
Elo difference: -16.7 +/- 14.8, LOS: 1.4 %, DrawRatio: 52.6 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
1000 of 1000 games finished.
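
The `Elo difference` figure can be checked from the win/loss/draw counts with the usual logistic formula. This is a sketch of the arithmetic only; it assumes the tool uses that formula, and it does not reproduce the error margin or LOS.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


# Totals for TI-i8, summed over its White and Black results.
wins = 144 + 69
losses = 105 + 156
draws = 251 + 275

print(f"{elo_from_wld(wins, losses, draws):+.1f}")
```

The score works out to (213 + 263) / 1000 = 0.476, which is where the roughly -16.7 Elo comes from.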

prime mica Nov 4, 2025, 7:38 AM

#

O no

#

when the i8 😭

rocky vigil Nov 4, 2025, 7:38 AM

#

no this must be net issue

#

or inference issue

prime mica Nov 4, 2025, 7:38 AM

#

yeah ik

rocky vigil Nov 4, 2025, 7:47 AM

#

@naive comet shouldn't this actually be 255 / 256

#

or am I throwing

naive comet Nov 4, 2025, 7:54 AM

#

this should be 255/256

naive comet Nov 4, 2025, 7:54 AM

#

prime mica @naive comet lol your neural network code is lowk 10x easier to read a...

it's also much simpler (although even sf arch in this won't be too complicated)

rocky vigil Nov 4, 2025, 7:58 AM

#

naive comet this should be 255/256

💀

#

could that be worth 15 elo

#

or however much stage 1 is losing at fixed nodes

naive comet Nov 4, 2025, 7:59 AM

#

rocky vigil could that be worth 15 elo

nah

#

for me at least it's like minimal

#

in my old experience

#

you can honestly retrain from this xd

stray reef Nov 4, 2025, 7:59 AM

#

https://furybench.com/test/3598/ srai(slli(...)) vs. cvtepi8_epi16 (VSTC)

naive comet Nov 4, 2025, 8:00 AM

#

I love the naming

rocky vigil Nov 4, 2025, 8:00 AM

#

then where'd the 15 elo go

#

💀

naive comet Nov 4, 2025, 8:00 AM

#

the remaining 5 stages?

rocky vigil Nov 4, 2025, 8:00 AM

#

remaining 4*

#

maybe

#

gotta 🙏 hard for this one

naive comet Nov 4, 2025, 8:00 AM

#

yeah 4 fucking stages bro

rocky vigil Nov 4, 2025, 8:01 AM

#

tbf factorizer stage 1 was also 15 elo above non-factorizer stage 1

#

and then only 3.5 elo

#

in the end

naive comet Nov 4, 2025, 8:01 AM

#

just update the branch and retrain from there

prime mica Nov 4, 2025, 8:09 AM

#

The x86 ISA and its consequences have been a disaster for the human race

#

Ugh hopefully we can figure the regression out 🤞

rocky vigil Nov 4, 2025, 8:15 AM

#

💀

#

i think the smallnet just died

#

with this

#

actually shoot

#

yeah

#

i killed the smallnet

#

that might regain some portion of elo

rocky vigil Nov 4, 2025, 8:50 AM

#

so

#

I have not been able to do i8 on the existing net

#

in a manner that doesn't lose 300 elo at fixed nodes

#

if anyone would like to attempt to fix it's at https://github.com/sscg13/Stockfish/tree/threat-inputs-i8-originalnet

stray reef Nov 4, 2025, 9:03 AM

#

stray reef <https://furybench.com/test/3598/> srai(slli(...)) vs. cvtepi8_epi16 (VSTC)

seems better than maddubs, that one was -5 at this TC, but also does not seem superior, and this is 2+0.02

#

i'll stop it so my tune finishes faster :P

rocky vigil Nov 4, 2025, 10:12 AM

#

lmao

#

salvation

rocky vigil Nov 4, 2025, 10:37 AM

#

@violet badger a minor fix in https://github.com/sscg13/nnue-pytorch/commit/68b56ad3bfa98a6433b3e37fd4b26ba9155fbf2c, doesn't require restarting training
stage 1 looks very impressive after fixes (so far, I am saying this as of 3000 games) so i think we should be good to go for the other 4 stages

lofty cedar Nov 4, 2025, 4:07 PM

#

The current i8 net loads half the vector at once and casts up.

#

But was that optimal?

#

Should we instead load the vector in full then split in half?

violet badger Nov 4, 2025, 4:31 PM

#

rocky vigil @violet badger a minor fix in <https://github.com/sscg13/nnue-pytorch/com...

you prefer training the later stages with the fix, or we go with the current setup full length, or I train in parallel?

#

am I reading fishtest correctly that the i8 branch gains about 10Elo STC? That would be impressive of course.

twilit oriole Nov 4, 2025, 4:33 PM

#

yoshie had similar results, it compresses a lot at LTC because of the slight fixed nodes loss also

#

https://furybench.com/test/3530/
https://furybench.com/test/3533/

violet badger Nov 4, 2025, 4:53 PM

#

rocky vigil @violet badger a minor fix in <https://github.com/sscg13/nnue-pytorch/com...

I've started a second training run with that fix integrated.

rocky vigil Nov 4, 2025, 5:29 PM

#

Ah either way works

#

It would’ve been fine to just go through with the original run

#

But either way the net will be done in ~2-3 days

rocky vigil Nov 4, 2025, 5:32 PM

#

violet badger am I reading fishtest correctly that the i8 branch gains about 10Elo STC? That w...

Stage 1 of course is not an entirely accurate comparison, but at least it would seem that there is not a large fixed nodes loss

lofty cedar Nov 4, 2025, 5:51 PM

#

Apruvu sama.

#

I tried loading and then using vector extract instruction to split instead of loading individual vectors.

violet badger Nov 4, 2025, 6:10 PM

#

rocky vigil It would’ve been fine to just go through with the original run

both runs are ongoing.. we'll see both.

lofty cedar Nov 4, 2025, 6:14 PM

#

Oops... fixed it not working on AVX2. Apruvu sama.

split warren Nov 4, 2025, 8:58 PM

#

rocky vigil anyways just run <https://github.com/sscg13/Stockfish/commit/5a6633ad554f22ef1ad...

  Run  1: 73598785 nps
  Run  2: 73103840 nps
  Run  3: 72480310 nps
  Run  4: 72441752 nps
  Run  5: 73022612 nps
  Run  6: 72743447 nps
  Run  7: 73217047 nps
  Run  8: 73797795 nps
  Run  9: 73452183 nps
  Run 10: 73739312 nps
Benchmarking 83eb0e1...
  Run  1: 67691034 nps
  Run  2: 67220747 nps
  Run  3: 67977216 nps
  Run  4: 67682587 nps
  Run  5: 68257834 nps
  Run  6: 67943234 nps
  Run  7: 67116181 nps
  Run  8: 68663434 nps
  Run  9: 67257098 nps
  Run 10: 67305347 nps

Engine                        Average NPS   Failures
------------------------- ---------------   --------
5a6633ad                         73159708          0
83eb0e1                          67711471          0

rocky vigil Nov 4, 2025, 8:59 PM

#

Hmm +8% pretty nice

#

All that’s left now is to wait for net

green moat Nov 4, 2025, 9:06 PM

#

Is this correct as of now?
"Some speedup, ~~i8 inference~~, ~~i8 nets training~~, ~~verbatim nets~~.....and eventually SPSA the net"

#

What would be next steps before merging TI?

torn lagoon Nov 4, 2025, 9:14 PM

#

green moat What would be next steps before merging TI?

Gain to master I believe

green moat Nov 4, 2025, 9:20 PM

#

torn lagoon Gain to master I believe

gain to master until any Elo can be squeezed, and then SPSA the net, I'm assuming

torn lagoon Nov 4, 2025, 9:29 PM

#

It was agreed to not spsa

violet badger Nov 4, 2025, 9:40 PM

#

rocky vigil Hmm +8% pretty nice

speedtest ratio for the two versions

18137844 / 14678640
1.23566243194192377495
18064633 / 14668557
1.23152079649007056385

#

(AMD Ryzen 9 3950X)

prime mica Nov 4, 2025, 9:42 PM

#

We are off to the races!!!

#

Interesting that my computer saw the least speed up this time…

violet badger Nov 4, 2025, 9:43 PM

#

really, this stuff is getting pretty HW dependent.

prime mica Nov 4, 2025, 9:43 PM

#

😩

violet badger Nov 4, 2025, 9:47 PM

#

in fact so HW dependent it is currently not compiling on ARM 😉

prime mica Nov 4, 2025, 9:47 PM

#

LOL true

violet badger Nov 4, 2025, 9:47 PM

#

nnue/nnue_accumulator.cpp:362:65: error: 'vec_convert_8_16' was not declared in this scope
  362 |                     acc[k] = vec_sub_16(acc[k], vec_convert_8_16(column[k]));
      |                                                 ~~~~~~~~~~~~~~~~^~~~~~~~~~~
nnue/layers/../simd.h:173:43: note: in definition of macro 'vec_sub_16'
  173 |     #define vec_sub_16(a, b) vsubq_s16(a, b)
      |                                           ^

prime mica Nov 4, 2025, 9:48 PM

#

I'll do an investigation about the best approach for ARM

#

I suspect we should have some improvement from vldq4_u8 or whatever it’s called which loads four vectors in one instruction

#

oh also, should we maybe merge shared memory into this and run another fishtest to see if it modifies the situation?

amber fern Nov 4, 2025, 10:13 PM

#

prime mica new evaluation function is huge

What's different about it?

#

Other than it being stronger I mean.

prime mica Nov 4, 2025, 10:16 PM

#

I mean it's just exciting to have a different evaluation scheme

#

shawn's thesis is that Elo stagnation is in large part due to unchanging evaluation and I'm inclined to agree

rare jacinth Nov 4, 2025, 10:17 PM

#

@prime mica btw the cached updates I suggested would probably be even more performant for threat inputs

prime mica Nov 4, 2025, 10:18 PM

#

yeeee I will try them soon

amber fern Nov 4, 2025, 10:44 PM

#

Hoping for a new stockfish with thread inputs as master as my early christmas present 🙂

prime mica Nov 4, 2025, 10:44 PM

#

XD

#

ok we'll try to get it in before Dec 25 :)

#

@violet badger got it working on ARM...

#

https://github.com/anematode/Stockfish/tree/arm-port

#

will do some apple silicon speed tests in a bit

#

wow, fantastic on ARM

#

==================
base (./stockfish    ) =    1070090  +/- 12943
test (./stockfish_i8 ) =    1196706  +/- 12269
diff                   =    +126615  +/- 6814

speedup        = +0.1183
P(speedup > 0) =  1.0000

CPU: 10 x arm
Hyperthreading: off

#

(Apple M1)

#

==================
base (./stockfish    ) =    1389464  +/- 16853
test (./stockfish_i8 ) =    1539608  +/- 18562
diff                   =    +150144  +/- 2732

speedup        = +0.1081
P(speedup > 0) =  1.0000

CPU: 12 x arm

(Apple M4)

daring wren Nov 4, 2025, 11:13 PM

#

amber fern Hoping for a new stockfish with thread inputs as master as my early christmas pr...

thread inputs 🥀

prime mica Nov 4, 2025, 11:19 PM

#

hm the arm64 codegen still looks suboptimal on Apple clang

#

I'll see if I can squeeze out a bit more with vld1q_s8_x4

prime mica Nov 5, 2025, 12:24 AM

#

@rocky vigil we no longer prescale weights?

rocky vigil Nov 5, 2025, 12:24 AM

#

Nope

prime mica Nov 5, 2025, 12:24 AM

#

gotcha

#

it's not even helpful anymore right

rocky vigil Nov 5, 2025, 12:25 AM

#

It would force extra x2s elsewhere since i8 is restrictive

prime mica Nov 5, 2025, 12:25 AM

#

or in theory (suppose you could double them for free in add/sub) would it be nice?

#

the reason I'm asking is bc

#

ARM's i8 -> i16 conversion instructions have a shift operand

rocky vigil Nov 5, 2025, 12:25 AM

#

rocky vigil in a manner that doesn't lose 300 elo at fixed nodes

Also ^^

prime mica Nov 5, 2025, 12:25 AM

#

lol

rocky vigil Nov 5, 2025, 12:26 AM

#

prime mica ARM's i8 -> i16 conversion instructions have a shift operand

I think just reduce the slli from 7 to 6 to compensate

prime mica Nov 5, 2025, 12:26 AM

#

yeah but does that even help

rocky vigil Nov 5, 2025, 12:26 AM

#

Or smth

prime mica Nov 5, 2025, 12:26 AM

#

for now I just have shift = 0

rocky vigil Nov 5, 2025, 12:26 AM

#

Idk

#

Wait how does mulhi trick work on arm

prime mica Nov 5, 2025, 12:26 AM

#

not sure

#

anyway we can flesh it out later, all that matters is it's already a huge win on ARM too

rocky vigil Nov 5, 2025, 12:27 AM

#

Yeah

rocky vigil Nov 5, 2025, 12:28 AM

#

green moat What would be next steps before merging TI?

Wait for the i8 net(s) to be trained, and after that it will probably be enough elo to pass the sprts against master

#

Code cleanup also needs to happen

prime mica Nov 5, 2025, 12:29 AM

#

what are your ideas for cleanup

rocky vigil Nov 5, 2025, 12:30 AM

#

Like just a generic statement

prime mica Nov 5, 2025, 12:30 AM

#

oh sure

rocky vigil Nov 5, 2025, 12:30 AM

#

I think for one it’s currently hacky how I redefine vec ONE depending on smallnet

#

Like vec(254 + use_threats)

prime mica Nov 5, 2025, 12:31 AM

#

couldn't we just get rid of use_threats

#

oh wait that's for small net only I see

rocky vigil Nov 5, 2025, 12:31 AM

#

Also need to add non-avx2 back

#

There’s a way that Plentychess uses but it was a little too complicated for me to bother copying

prime mica Nov 5, 2025, 12:32 AM

#

lol

#

non-avx2 mfs can upgrade

split warren Nov 5, 2025, 12:35 AM

#

I've wished this too many times to actually admit... But then there's always someone running a cpu from 2004 still

#

Most often it's these outdated xeon cores v2 or whatever pre avx2 was

prime mica Nov 5, 2025, 12:36 AM

#

sigh

rocky vigil Nov 5, 2025, 12:36 AM

#

rocky vigil There’s a way that Plentychess uses but it was a little too complicated for me t...

In particular it did not fit in one line lol

prime mica Nov 5, 2025, 12:36 AM

#

I mean we'll just write the straightforward translation and then it'll be good enough right

prime mica Nov 5, 2025, 1:29 AM

#

somewhat crazy idea

#

could it potentially be profitable to use VNNI instructions with multipliers of ±1 to further improve threat input updates

#

the pain point is that it accumulates to 32 bits

#

I'll probably try it once it's merged

amber fern Nov 5, 2025, 1:56 AM

#

daring wren thread inputs 🥀

threat inputs whoops lol

stray reef Nov 5, 2025, 6:45 AM

#

prime mica for now I just have shift = 0

i do this too fwiw

violet badger Nov 5, 2025, 6:51 AM

#

prime mica https://github.com/anematode/Stockfish/tree/arm-port

nice tests running... maybe @rocky vigil can merge this already in his branch.

rocky vigil Nov 5, 2025, 6:51 AM

#

yeah I can do that

prime mica Nov 5, 2025, 6:52 AM

#

I modified it inline so it won't compile on x86 anymore

rocky vigil Nov 5, 2025, 6:52 AM

#

oh

#

uh

prime mica Nov 5, 2025, 6:52 AM

#

but I can fix that

rocky vigil Nov 5, 2025, 6:52 AM

#

just get rid of that and then pr

#

ig

prime mica Nov 5, 2025, 6:52 AM

#

yep!

violet badger Nov 5, 2025, 6:54 AM

#

also didn't check this compiles on old arm...

#

anyway, progress..

#

==== 4a97c2ba244790c41bff09968d93430966ac5d48 ====
1 Nodes/second : 290009447
2 Nodes/second : 291033770
Average (over 2):  290521608
==== 83eb0e1d835e138194237c33cc968c48f42a6a68 ====
1 Nodes/second : 267842604
2 Nodes/second : 266942817
Average (over 2):  267392710

#

good 8% speedup

prime mica Nov 5, 2025, 7:02 AM

#

😎

#

@rocky vigil PRed

rocky vigil Nov 5, 2025, 7:06 AM

#

cool

#

merged

#

another name joins the eventual pr

prime mica Nov 5, 2025, 7:08 AM

#

big ball of moss

stray reef Nov 5, 2025, 7:08 AM

#

nice. how long until the net is fully trained?

rocky vigil Nov 5, 2025, 7:08 AM

#

stray reef nice. how long until the net is fully trained?

1-2 days

#

stage one being +10 elo at stc as expected

#

maybe +11

#

idk how much that minor additional smallnet fix is

prime mica Nov 5, 2025, 7:09 AM

#

vs. what?

rocky vigil Nov 5, 2025, 7:09 AM

#

vs last run stage 1

prime mica Nov 5, 2025, 7:09 AM

#

ah ok

#

so just measuring the effect of speed ups

violet badger Nov 5, 2025, 7:09 AM

#

39h I would guess.

rocky vigil Nov 5, 2025, 7:10 AM

#

for which run

#

i guess 127 / 128 vs 255 / 256 is minor

violet badger Nov 5, 2025, 7:10 AM

#

(for the first one)

#

2000 epochs remaining.

#

gives us time to think about QAT..

prime mica Nov 5, 2025, 7:11 AM

#

what is QAT

violet badger Nov 5, 2025, 7:11 AM

#

quantization aware training

prime mica Nov 5, 2025, 7:11 AM

#

ohhh

rocky vigil Nov 5, 2025, 7:11 AM

#

so far the only real change is it knows about the i8 limits

stray reef Nov 5, 2025, 7:12 AM

#

in my experiments it only helped when the quantisation was really tight (when all feature weights are i8)

#

right now quantisation isn't really different than before

#

but ofc maybe it's a way to squeeze another elo at the cost of training speed :P

prime mica Nov 5, 2025, 7:13 AM

#

stray reef in my experiments it only helped when the quantisation was really tight (when al...

oh you did try quantizing the main net?

#

what was the fixed-nodes loss

rocky vigil Nov 5, 2025, 7:15 AM

#

linrock claimed the quantization change is worth 1 elo or so

#

so it's likely we might not even see fixed nodes loss

violet badger Nov 5, 2025, 7:16 AM

#

if we're only losing 1Elo we're not quantizing hard enough 😉

prime mica Nov 5, 2025, 7:16 AM

#

lololol

violet badger Nov 5, 2025, 7:16 AM

#

int4 SF when?

prime mica Nov 5, 2025, 7:16 AM

#

ideal

prime mica Nov 5, 2025, 7:17 AM

#

rocky vigil so it's likely we might not even see fixed nodes loss

if that's for real then we should strongly consider that

rocky vigil Nov 5, 2025, 7:17 AM

#

oh i meant like

#

the 127 -> 255 QA change

#

maybe cancels out the slight loss

#

from i8

prime mica Nov 5, 2025, 7:17 AM

#

ohh I see

rocky vigil Nov 5, 2025, 7:17 AM

#

i guess it's a "good antiscaler"

#

big gain at stc, moderate gain at ltc, neutral at vvltc

stray reef Nov 5, 2025, 7:18 AM

#

prime mica oh you did try quantizing the main net?

ok i checked back what i actually tested. QA=63 + QAT was about as strong as QA=127 (both -10 fixed nodes to master). but with QA=127, QAT did not help (same result against master)

this is all full i8 (except master, that was still i16 back then)

prime mica Nov 5, 2025, 7:18 AM

#

interesting ok

rocky vigil Nov 5, 2025, 7:19 AM

#

rocky vigil big gain at stc, moderate gain at ltc, neutral at vvltc

such stuff is technically antiscaling but ppl would be fine adding it

#

i guess it's only a bad antiscaler if it goes negative

violet badger Nov 5, 2025, 7:20 AM

#

but that's where I think QAT could help. See how much it reduces fixed node Elo loss.

rocky vigil Nov 5, 2025, 7:20 AM

#

i thought the fixed node loss was only -2 or smth

violet badger Nov 5, 2025, 7:20 AM

#

so, freelo 😉

rocky vigil Nov 5, 2025, 7:21 AM

#

hopefully the good results continue up to stage 4/5

violet badger Nov 5, 2025, 7:21 AM

#

I think there is also some loss on the other parts of the net.

#

we could probably soon test stage 3.

#

that's pretty close to a converged net.

rocky vigil Nov 5, 2025, 7:22 AM

#

violet badger I think there is also some loss on the other parts of the net.

i thought the later layers remained unchanged

violet badger Nov 5, 2025, 7:22 AM

#

I mean they are also quantized from float to int

rocky vigil Nov 5, 2025, 7:22 AM

#

right yeah

#

it might benefit there slightly

violet badger Nov 5, 2025, 7:23 AM

#

I'm wondering if part of the SPSA gains are just related to cleaning up quantizing..

prime mica Nov 5, 2025, 7:23 AM

#

that would be cruel but hilarious

rocky vigil Nov 5, 2025, 7:24 AM

#

i remember viren saying a while ago that the quantization in later layers has a large effect

#

i think it is worth revisiting

frosty imp Nov 5, 2025, 7:26 AM

#

QAT should be like 10 lines max with pytorch

#

any branch to work on?

rocky vigil Nov 5, 2025, 7:35 AM

#

frosty imp any branch to work on?

https://github.com/sscg13/nnue-pytorch/tree/threat-i8-QA-255

#

though idt it's exclusive to threat inputs

#

(QAT on the later layers, that is)

frosty imp Nov 5, 2025, 7:50 AM

#

https://github.com/xu-shawn/nnue-pytorch/tree/threat-i8-QA-255-QAT

#

can't test rn but I think it should work

#

maybe we can extend the quantization to weights later

violet badger Nov 5, 2025, 8:10 AM

#

can try to run this branch as well. I'm just somewhat surprised that this is the way it is done. I would expect some term added to the loss, that drives weights to be close to quantized values.
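
The two QAT styles contrasted above can be sketched in a few lines. This is a minimal illustration, not the code in either linked branch: the function names are hypothetical, the scale of 255 is assumed from the "127 -> 255 QA change" mentioned earlier, and the clamp to [-128, 127] is the i8 range. In a real trainer the rounding in the forward pass would usually be paired with a straight-through gradient, which is not shown here.

```python
def fake_quantize(w: float, scale: int = 255, lo: int = -128, hi: int = 127) -> float:
    """Forward-pass view of QAT: snap w to the nearest i8 step, clamp to the i8 range."""
    q = round(w * scale)          # nearest integer step at the assumed scale
    q = max(lo, min(hi, q))       # clamp to what an i8 can hold
    return q / scale              # back to float so the rest of the net is unchanged


def quantization_penalty(weights, scale: int = 255) -> float:
    """Loss-term view: mean squared distance between each weight and its quantized value."""
    return sum((w - fake_quantize(w, scale)) ** 2 for w in weights) / len(weights)


# A weight of 1.0 does not fit in i8 at scale 255, so it saturates at 127/255.
saturated = fake_quantize(1.0)
# Weights that already sit on the grid contribute nothing to the penalty.
on_grid_penalty = quantization_penalty([127 / 255, 0.0])
```

The first function changes what the network sees during training; the second leaves the forward pass alone and only nudges weights toward the grid, which is the alternative violet badger describes.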

frosty imp Nov 5, 2025, 8:20 AM

#

hmm I'm not sure if the quantization is applied to the weights or activations just from the bullet commit alone

#

https://github.com/xu-shawn/nnue-pytorch/commit/ff987e34ddf263e8f30f9300f5e704f6106f050e

#

quantized weight version

prime mica Nov 5, 2025, 8:30 AM

#

💦

violet badger Nov 5, 2025, 8:40 AM

#

so the latter version is the thing to run, I assume?

frosty imp Nov 5, 2025, 8:40 AM

#

probably

violet badger Nov 5, 2025, 8:40 AM

#

let's start with that and see where we get.

#

started

violet badger Nov 5, 2025, 1:58 PM

#

step 3 training finished..

rocky vigil Nov 5, 2025, 1:58 PM

#

Hmm

#

Can give it a go

#

It should already gain at this stage

#

If +10 is real

prime mica Nov 5, 2025, 2:33 PM

#

Yes

#

Plz

violet badger Nov 5, 2025, 2:33 PM

#

test is up, but I'm not sure what is being tested against what 🙂

prime mica Nov 5, 2025, 2:34 PM

#

I think against stage 3 of previous threat weights run

rocky vigil Nov 5, 2025, 2:34 PM

#

nope

prime mica Nov 5, 2025, 2:34 PM

#

O?

rocky vigil Nov 5, 2025, 2:34 PM

#

just stage 3 against stage 5

prime mica Nov 5, 2025, 2:35 PM

#

I see ok

violet badger Nov 5, 2025, 2:35 PM

#

so best threats setup, against current stage 3 i8

rocky vigil Nov 5, 2025, 2:35 PM

#

stages 4/5 are worth max like 3 elo anyways from what we've seen

violet badger Nov 5, 2025, 2:35 PM

#

yeah.

rocky vigil Nov 5, 2025, 2:35 PM

#

so it should already gain at this point

#

hopefully that will be confirmed shortly

prime mica Nov 5, 2025, 2:36 PM

#

Against master or against previous threat inputs

#

I am giddy

#

Time to sip my morning coffee and watch the Elo while reading the news

violet badger Nov 5, 2025, 2:38 PM

#

you'd better sip Elo while watching the news.

prime mica Nov 5, 2025, 2:38 PM

#

lololol does it taste good

violet badger Nov 5, 2025, 2:38 PM

#

only one little sip and you're hooked, i've heard

prime mica Nov 5, 2025, 2:39 PM

#

Well shit I gotta find some then

violet badger Nov 5, 2025, 2:39 PM

#

let me trigger it a little bit

prime mica Nov 5, 2025, 2:40 PM

#

😍

rocky vigil Nov 5, 2025, 2:42 PM

#

~~gg sf 18 is here~~

prime mica Nov 5, 2025, 3:34 PM

#

what are we expecting Elo wise?

violet badger Nov 5, 2025, 4:06 PM

#

having no expectations is the safest, but some speedup O(10Elo) and some quantization error O(1Elo). As long as prefactors are not 1/3 and 3, all good.

prime mica Nov 5, 2025, 7:13 PM

#

Elo: 6.32 ± 3.6

#

comports with stage 4/5 being handful of points right?

#

big error bars tho

stray reef Nov 5, 2025, 7:23 PM

#

test vs master now?

violet badger Nov 5, 2025, 7:42 PM

#

I would wait for the stages 4/5 to finish.. we can then pick the best net

rocky vigil Nov 5, 2025, 10:35 PM

#

A few stage 4/5 runs are finishing in the next few days

rocky vigil Nov 5, 2025, 10:57 PM

#

Btw we can look into removing leb128 for the threat weights

#

It literally cannot perform better

#

And removing it would simplify the current parsing code

prime mica Nov 5, 2025, 11:00 PM

#

It should just be a memcpy at this point right

#

Followed by a permutation ofc

rocky vigil Nov 5, 2025, 11:00 PM

#

Yeah

#

Rn what is done is read the entire thing into a big array

#

And then move it into separate arrays

#

Because as it turns out our readleb128 also includes a length

#

So it actually cannot just be read directly
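
The claim that LEB128 "literally cannot perform better" for i8 weights can be checked with a small sketch. It uses the standard signed LEB128 encoding; the function name is hypothetical, and the length prefix that Stockfish's reader expects is not modelled here.

```python
def encode_sleb128(value: int) -> bytes:
    """Standard signed LEB128: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is arithmetic, so the sign is preserved
        sign_bit_set = bool(byte & 0x40)
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


# Encoded size of every possible i8 value.
sizes = [len(encode_sleb128(v)) for v in range(-128, 128)]
```

Every i8 value costs at least one byte, and values outside [-64, 63] cost two, so a raw one-byte-per-weight dump is never larger than the LEB128 stream.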

rocky vigil Nov 5, 2025, 11:04 PM

#

prime mica It should just be a memcpy at this point right

read_little_endian is our function for this

#

Which maybe also guards against the machine being big endian somehow

#

Ah nvm single byte

rocky vigil Nov 5, 2025, 11:08 PM

#

rocky vigil And then move it into separate arrays

This actually produces a noticeable startup time

prime mica Nov 5, 2025, 11:13 PM

#

that's surprising..

#

unless you mean read_leb_128 is slow

#

which would make sense to me

rocky vigil Nov 5, 2025, 11:14 PM

#

It’s like half a second

#

Or smth

#

Idk

#

At the very least if I open the exe and type uci right away it doesn’t process instantly

prime mica Nov 5, 2025, 11:29 PM

#

ai ya

#

that's definitely something to fix before merging

amber fern Nov 6, 2025, 12:13 AM

#

So the threat-inputs-i8 is the new optimisation that to my understanding reduces weight precision slightly to improve the speed of the network? Does that make it a slight antiscaler?

rocky vigil Nov 6, 2025, 12:18 AM

#

+10 stc +3 ltc is our prediction

#

I guess that is an antiscaler

#

Technically

amber fern Nov 6, 2025, 12:23 AM

#

what about vltc?

rocky vigil Nov 6, 2025, 12:32 AM

#

Neutral most likely

#

Well

#

The fixed nodes loss is probably maximum 1-2 elo

#

So the speedup is well worth it

#

Lots of variables here

amber fern Nov 6, 2025, 12:36 AM

#

has this been tried with the smallnet as well? I'm guessing it's also highly worth it

rocky vigil Nov 6, 2025, 12:43 AM

#

Smallnet isn’t using threats at all

#

Still the old smallnet

frosty imp Nov 6, 2025, 3:08 AM

#

rocky vigil Btw we can look into removing leb128 for the threat weights

Easy in SF. Difficult on the trainer side

naive comet Nov 6, 2025, 5:16 AM

#

I will try to optimise the code maybe

#

is the i8 merged?

rocky vigil Nov 6, 2025, 5:48 AM

#

naive comet is the i8 merged?

not yet, still waiting for nets to train

#

although stage 3 is already much better at stc

naive comet Nov 6, 2025, 5:49 AM

#

ugh

#

so I have to pull your branch instead of shawn

rocky vigil Nov 6, 2025, 6:00 AM

#

for now yeah

naive comet Nov 6, 2025, 6:08 AM

#

what is the latest branch?

#

for i8

prime mica Nov 6, 2025, 6:10 AM

#

https://github.com/sscg13/Stockfish/tree/threat-inputs-i8 methinks

naive comet Nov 6, 2025, 7:55 AM

#

wait hold up

#

why is the nnue code so nice now

violet badger Nov 6, 2025, 8:00 AM

#

you have seen the light?

green moat Nov 6, 2025, 1:48 PM

#

rocky vigil Smallnet isn’t using threats at all

Might be possible to train smallnet with TI as well?
Or do you think it wouldn't gain?

#

Also, does vondele have the recipe for smallnet cooking?
🤔

rocky vigil Nov 6, 2025, 1:51 PM

#

green moat Might be possible to train smallnet with TI as well? Or do you think it wouldn't...

The primary advantage of smallnet is speed, so I think keeping it without threat inputs is most beneficial

green moat Nov 6, 2025, 1:56 PM

#

Probably the Elo gain on TI paradigm for smallnet would be compensated by the speed loss, so in the end smallnet with TI might actually be neutral at best
😐

violet badger Nov 6, 2025, 2:08 PM

#

we did train a smallnet with threats.. but it can't gain IMO.

rocky vigil Nov 6, 2025, 2:13 PM

#

Viren also had an idea to use a single net but either psq inputs only or psq + threats, although experimenting with that can wait for after merge

dark stream Nov 6, 2025, 2:46 PM

#

How long until the net is fully trained?

green moat Nov 6, 2025, 3:25 PM

#

dark stream How long until the net is fully trained?

Those are the current pipelines:
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/pipelines/2139734382
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/pipelines/2140902410

prime mica Nov 6, 2025, 3:27 PM

#

rocky vigil Viren also had an idea to use a single net but either psq inputs only or psq + t...

that's an interesting idea actually

#

ok quick question, aren't the threats which involve a piece attacking a king in a way that can't be blocked (e.g. slider directly adjacent, or knight attack) completely redundant?

#

because they are implied by the corresponding main net feature...

#

like random example

#

the threat "queen on d8 attacks king on c7" is active if and only if the main net feature "king on c7 and queen on d8" is active

#

if I'm not mistaken we could actually test this post-training... just need to add the threat weights to the right part of the main net weights then zero out the original
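
The post-training test proposed above relies on the accumulator being a plain sum of weight rows: if one feature is active exactly when another is, its row can be added into the other and then zeroed without changing the sum. A toy sketch, with hypothetical feature names and tiny integer rows standing in for the real weight matrices:

```python
def accumulate(weights, active):
    """First-layer accumulator: element-wise sum of the rows of all active features."""
    dim = len(next(iter(weights.values())))
    acc = [0] * dim
    for feature in active:
        for i, w in enumerate(weights[feature]):
            acc[i] += w
    return acc


def fold(weights, redundant, target):
    """Move the redundant feature's row into the target's row and zero the original."""
    folded = {name: list(row) for name, row in weights.items()}
    folded[target] = [a + b for a, b in zip(weights[target], weights[redundant])]
    folded[redundant] = [0] * len(weights[redundant])
    return folded


weights = {
    "psq:Kc7,Qd8": [3, -1, 4],     # hypothetical main-net feature
    "threat:Qd8xKc7": [1, 5, -9],  # hypothetical threat feature, always co-active with it
    "psq:Kc7,Pa2": [2, 6, 5],
}
active = ["psq:Kc7,Qd8", "threat:Qd8xKc7", "psq:Kc7,Pa2"]
folded = fold(weights, "threat:Qd8xKc7", "psq:Kc7,Qd8")
```

This only shows the algebra. It ignores perspective and bucket details, and the summed row may no longer fit the quantized weight range.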

rocky vigil Nov 6, 2025, 3:39 PM

#

prime mica the threat "queen on d8 attacks king on c7" is active *if and only if* the main ...

But with threats now the total is not the sum of the parts

#

Knight on e5 and knight on f7 isn’t the sum of their individual weights

violet badger Nov 6, 2025, 3:40 PM

#

green moat Those are the current pipelines: https://gitlab.com/cscs-ci/ci-testing/webhook-c...

there are 3 running ...

stray reef Nov 6, 2025, 3:40 PM

#

prime mica if I'm not mistaken we could actually test this post-training... just need to ad...

ah this works for certain bucket setups yeah

#

but only for the correct stm

prime mica Nov 6, 2025, 3:40 PM

#

stray reef but only for the correct stm

yes sure

rocky vigil Nov 6, 2025, 3:40 PM

#

prime mica ok quick question, aren't the threats which involve a piece attacking a king in ...

In general they are redundant since we never evaluate positions in check, though rn5 was not able to get it to gain

prime mica Nov 6, 2025, 3:41 PM

#

oh right...

#

ok that part negates any benefit of my idea

green moat Nov 6, 2025, 3:41 PM

#

violet badger there are 3 running ...

This one?
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/pipelines/2138356933
But I can't see anything....
😐

prime mica Nov 6, 2025, 3:41 PM

#

do we ever train on positions in check? if not then those features will probably be driven to ~0 anyway...

violet badger Nov 6, 2025, 3:41 PM

#

no.. skipped

prime mica Nov 6, 2025, 3:42 PM

#

surely it would be good to skip add/sub for them though

#

I'll take another look at rn5's work

rocky vigil Nov 6, 2025, 3:42 PM

#

Also some interesting thing is bullet initializes threats / psq separately according to their individual sparsities

#

Idk if it’s any good

#

But can certainly be tested

prime mica Nov 6, 2025, 3:44 PM

#

interesting

#

so many avenues of exploration 😩

rocky vigil Nov 6, 2025, 3:44 PM

#

In general I wonder how much weight initialization matters

violet badger Nov 6, 2025, 3:45 PM

#

https://github.com/official-stockfish/nnue-pytorch/blob/5d18196172dcb181bb878922092ea45ec94aec29/training_data_loader.cpp#L821

#

that's where the skipping logic is.

violet badger Nov 6, 2025, 3:48 PM

#

green moat This one? https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076...

pipelines correspond to PR here https://github.com/vondele/nettest/pulls

#

right now it is a bit harder to see, as I pushed an additional commit to 2 of 3 PRs, and github doesn't show the pipeline on the previous commit to be active, despite it being active.

violet badger Nov 6, 2025, 9:33 PM

#

👀 https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12004945633

GitLab

step_5_05efd3f870d0_test (#12004945633) · Jobs · CSCS-CI / ci-tes...

Mirror of https://github.com/vondele/nettest

rocky vigil Nov 6, 2025, 9:33 PM

#

👀

violet badger Nov 6, 2025, 9:39 PM

#

seems real.

rocky vigil Nov 6, 2025, 9:40 PM

#

arm speedups for i8 did some big magic

#

as an aside, that is the strongest stage 2 I have ever seen

violet badger Nov 6, 2025, 9:42 PM

#

also.

#

but see stage 4 😉

#

anyway, looks very sweet now.

prime mica Nov 6, 2025, 9:44 PM

#

yw

#

what's "reference" in there, the previous threat inputs?

#

or stage 2 or what

violet badger Nov 6, 2025, 9:47 PM

#

master

#

see yaml description 🙂

#

https://github.com/vondele/nettest/pull/122/commits/4159b8aebab8304699ed3710ebc9b301a26d5b79

prime mica Nov 6, 2025, 9:48 PM

#

no fucking way

violet badger Nov 6, 2025, 9:49 PM

#

agree, that's sweet ...

prime mica Nov 6, 2025, 9:49 PM

#

fpoaijpoiajewpofijwofeij

#

I am so excited

violet badger Nov 6, 2025, 9:49 PM

#

not without reason.

prime mica Nov 6, 2025, 9:49 PM

#

impressively calm response

violet badger Nov 6, 2025, 9:50 PM

#

8640 messages later..

prime mica Nov 6, 2025, 9:55 PM

#

how can I download the net?

#

I'm curious how it'll be on my computer given the lesser speedup

violet badger Nov 6, 2025, 9:56 PM

#

https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/11961488158/artifacts/browse/step_05efd3f870d0/

prime mica Nov 6, 2025, 9:57 PM

#

Thxxxx

#

I think I’ll make another attempt at speedups this weekend on my train trip

violet badger Nov 6, 2025, 9:57 PM

#

(you can get there via the artifacts of the proper training step)

prime mica Nov 6, 2025, 9:57 PM

#

See if we can squeeze out a bit more..

violet badger Nov 6, 2025, 9:57 PM

#

excellent

prime mica Nov 6, 2025, 9:58 PM

#

I still can’t believe that’s against master…

#

Ok I’ll shut up

violet badger Nov 6, 2025, 9:58 PM

#

I agree, though. It is quite spectacular.

#

let's triple check somehow 😉

#

-engine name=reference cmd=/workspace/scratch/packages/stockfish/69a01b88f35db2a5003d42116f573207ca5c275b-profile-build/Stockfish/src/stockfish

#

undeniable...

foggy wind Nov 6, 2025, 9:59 PM

#

Maybe it doesn't scale at all and is therefore useless Kappa

violet badger Nov 6, 2025, 10:00 PM

#

threats known antiscaler.

desert tree Nov 6, 2025, 10:17 PM

#

is it happening👀

rocky vigil Nov 6, 2025, 10:17 PM

#

looks safe enough to use stage 4 so I'll start a few progtests on fishtest

violet badger Nov 6, 2025, 10:18 PM

#

might be sprt against master time?

rocky vigil Nov 6, 2025, 10:18 PM

#

or that

#

the whole shebang?

#

stc / ltc / stc smp / ltc smp?

violet badger Nov 6, 2025, 10:18 PM

#

well, one at a time..

rocky vigil Nov 6, 2025, 10:19 PM

#

ok

prime mica Nov 6, 2025, 10:19 PM

#

don't forget to turn off auto purge ;)

amber fern Nov 6, 2025, 11:11 PM

#

Wait, no way we are about to get a fishtest of threat inputs vs master that is gaining?! 🥺

prime mica Nov 6, 2025, 11:13 PM

#

threads

#

^_^

naive comet Nov 6, 2025, 11:14 PM

#

threat inputs

prime mica Nov 6, 2025, 11:14 PM

#

maybe I'll figure out a way to add thread count as a feature

#

then we'll have true thread inputs ;)

amber fern Nov 6, 2025, 11:20 PM

#

So is it on fishtest yet? 🙂

prime mica Nov 6, 2025, 11:20 PM

#

https://tests.stockfishchess.org/tests/live_elo/690d2514ec1d00d2c195beb5

#

this is so hype

#

NOT A DRILL

amber fern Nov 6, 2025, 11:22 PM

#

YOOOO!!!!

#

+14 guys xD

prime mica Nov 6, 2025, 11:23 PM

#

let's take bets

#

I'm betting +4

amber fern Nov 6, 2025, 11:23 PM

#

where do I check the error bars?

amber fern Nov 6, 2025, 11:23 PM

#

prime mica I'm betting +4

im betting +6 fr

#

okay maybe +5

plain flower Nov 6, 2025, 11:53 PM

#

it'll slide down and settle at +1 </pessimism>

desert tree Nov 6, 2025, 11:54 PM

#

im betting +7 because thats what the SPRT is currently saying

amber fern Nov 6, 2025, 11:57 PM

#

Okay, guess ill middleground my guesses: 5.5 +-1 😂

warm thistle Nov 7, 2025, 12:01 AM

#

i'm guessing 0 +/- 1199.99

split warren Nov 7, 2025, 1:13 AM

#

imma bet 4, not 3, not 5, but 4

amber fern Nov 7, 2025, 1:39 AM

#

Its gonna be 1 guys...

split warren Nov 7, 2025, 1:40 AM

#

vvltc smp banger

split warren Nov 7, 2025, 1:40 AM

#

violet badger 👀 https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/292...

what is different compared to this in this test?

prime mica Nov 7, 2025, 1:45 AM

#

different net... but not sure if that can explain the delta

#

and different arch ofc but I don't think that's entirely it either

lofty cedar Nov 7, 2025, 1:50 AM

#

All these for 1 elo?...

#

There must be more!

naive comet Nov 7, 2025, 2:07 AM

#

#bet fail red -0.69

lofty cedar Nov 7, 2025, 2:34 AM

#

Though I don't think threat input is an anti-scaler.

#

It's quite well-established that bigger neural networks are good scalers, not meaning that it would scale well to have a bloated net size, but that if the bigger net is already good at STC, it would probably be good at LTC and above.

#

However, I think it might have to do with search.

prime mica Nov 7, 2025, 2:36 AM

#

I think it was a joke

lofty cedar Nov 7, 2025, 2:37 AM

#

I think it may be an apparent anti-scaler if the search tuning was so heavily done on the old net that the search now adjusts for all the quirks of the old net.

#

And since the search-wide tuning was done at VVLTC, as you approach longer TC, you're fighting an increasingly uphill battle.

naive comet Nov 7, 2025, 2:43 AM

#

maybe it becomes anti scaler if the speedup was too good

#

in that case we just increase L1

prime mica Nov 7, 2025, 3:44 AM

#

(local SPRT, posting for future reference)

amber fern Nov 7, 2025, 3:45 AM

#

prime mica I think it was a joke

yes it was

prime mica Nov 7, 2025, 3:45 AM

#

it looks like a lot more stuff got inlined into evaluate in master than in threat inputs

#

I wonder if forcing inlining would be good or bad

#

partial_insertion_sort is still taking a disgusting amount of time 😩

amber fern Nov 7, 2025, 3:46 AM

#

rn the fishtest isn't going great... but I believe that better nets will come! And tuning for it will help a ton

prime mica Nov 7, 2025, 3:50 AM

#

lol

#

vpmovsxbw my beloved

amber fern Nov 7, 2025, 3:52 AM

#

Guys, which net is next? Like to be tested on fishtest, I assume the current threat-inputs-i8 (update net) isn't the strongest one coming?

prime mica Nov 7, 2025, 3:53 AM

#

not sure

amber fern Nov 7, 2025, 3:53 AM

#

Any good ideas that weren't put into that net?

prime mica Nov 7, 2025, 3:53 AM

#

if it's really close between master and threat inputs that's pretty great bc I'm positive there are more speedups to be found

amber fern Nov 7, 2025, 3:54 AM

#

prime mica if it's really close between master and threat inputs that's pretty great bc I'm...

yeah its -0.13 rn

warm thistle Nov 7, 2025, 3:54 AM

#

amber fern Any good ideas that weren't put into that net?

spsa i guess

prime mica Nov 7, 2025, 3:54 AM

#

amber fern yeah its -0.13 rn

massive error bars ofc haha

amber fern Nov 7, 2025, 3:54 AM

#

warm thistle spsa i guess

https://discord.com/channels/435943710472011776/1431539619446394880

prime mica Nov 7, 2025, 3:55 AM

#

oh you mean the log likelihood ratio

#

yeah we'll see

amber fern Nov 7, 2025, 3:55 AM

#

prime mica massive error bars ofc haha

yeah but only +- like 2 elo at this point right?

amber fern Nov 7, 2025, 3:55 AM

#

prime mica oh you mean the log likelihood ratio

no I meant elo haha

prime mica Nov 7, 2025, 3:55 AM

#

oh ok

prime mica Nov 7, 2025, 3:55 AM

#

amber fern yeah but only +- like 2 elo at this point right?

ehhh the trajectories are very variable lol

#

especially with this where there's probably large inter-computer differences

#

actually I haven't looked at the residuals yet let's see

amber fern Nov 7, 2025, 3:56 AM

#

it feels like a sport event watching the dials update live lol

#

nerdiest sporting event in history that is

prime mica Nov 7, 2025, 3:56 AM

#

lololol

#

yeah just eyeballing the residuals, AVX2 machines are suffering while AVX512 machines are doing swell

#

probably the vpmovsxbw spam 😩

amber fern Nov 7, 2025, 3:57 AM

#

prime mica yeah just eyeballing the residuals, AVX2 machines are suffering while AVX512 mac...

where are the avx2 matches???

#

this is the only link I use: https://tests.stockfishchess.org/tests/live_elo/690d2514ec1d00d2c195beb5

prime mica Nov 7, 2025, 3:58 AM

#

lol

#

https://tests.stockfishchess.org/tests/view/690d2514ec1d00d2c195beb5

#

you can see the per-worker results

amber fern Nov 7, 2025, 3:59 AM

#

prime mica you can see the per-worker results

where?

prime mica Nov 7, 2025, 3:59 AM

#

the link I sent

#

there's a dropdown somewhere

amber fern Nov 7, 2025, 4:01 AM

#

yeah is it here?

prime mica Nov 7, 2025, 4:01 AM

#

yep, you see the big table

#

one thing I've been meaning to add to fishtest is an ability to aggregate by some property

amber fern Nov 7, 2025, 4:01 AM

#

how do I read the avx2 vs 512 differences? The residuals?

prime mica Nov 7, 2025, 4:02 AM

#

yes but if I'm not mistaken the residual doesn't actually tell you whether the worker is significantly lower or higher than the mean

#

so you have to look at the pentanomial

#

anyway dw about it

#

we'll see in 12 hours where we're at

amber fern Nov 7, 2025, 4:02 AM

#

haha, yeah wait till it gets to 50k games ig

rare jacinth Nov 7, 2025, 4:04 AM

#

why hasn't l2 been increased to 31 to compensate for the smaller accumulator?

prime mica Nov 7, 2025, 5:31 AM

#

```
Results of New vs Base (30+0.3, 8t, 256MB, UHO_4060_v4.epd):
Elo: 9.17 +/- 10.71, nElo: 21.75 +/- 25.38
LOS: 95.35 %, DrawRatio: 64.17 %, PairsRatio: 1.35
Games: 720, Wins: 201, Losses: 182, Draws: 337, Points: 369.5 (51.32 %)
Ptnml(0-2): [0, 55, 231, 74, 0], WL/DD Ratio: 1.22
LLR: 0.25 (8.4%) (-2.94, 2.94) [0.00, 2.00]
```
results trickling in from a VVLTC run I'm doing

#

probably equivalent to 80+0.8 8t or so on fishtest

#

not quite as dramatic as the STC on vondele's CI...
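
For anyone reading these reports, the Elo and nElo figures can be recomputed from the Ptnml counts alone. The sketch below is an assumption about how such numbers are usually derived (logistic Elo from the mean score, normalized Elo from the per-pair variance); the function name is hypothetical, but the formulas do reproduce the 720-game report above.

```python
import math


def pentanomial_stats(ptnml):
    """Return (elo, nelo) from game-pair counts for pair scores 0, 0.5, 1, 1.5, 2."""
    pairs = sum(ptnml)
    scores = [0.0, 0.25, 0.5, 0.75, 1.0]  # pair score divided by 2, i.e. per-game score
    mean = sum(n * s for n, s in zip(ptnml, scores)) / pairs
    var = sum(n * (s - mean) ** 2 for n, s in zip(ptnml, scores)) / pairs
    elo = 400 * math.log10(mean / (1 - mean))                      # logistic Elo
    nelo = (mean - 0.5) / math.sqrt(2 * var) * 800 / math.log(10)  # normalized Elo
    return elo, nelo


elo, nelo = pentanomial_stats([0, 55, 231, 74, 0])
```

Because the variance comes from game pairs rather than single games, two workers with the same win/loss totals can still have different error bars, which is why the per-worker pentanomial matters when eyeballing residuals.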

rocky vigil Nov 7, 2025, 6:07 AM

#

amber fern Any good ideas that weren't put into that net?

Looking into some QAT

#

We have a couple more training runs so far, might brute force a better net by sheer luck

#

Also like Daniel said could also test increasing L2 size

stray reef Nov 7, 2025, 7:00 AM

#

naive comet maybe it becomes anti scaler if the speedup was too good

in that case elo diff vs master converges to some number > 0 as time -> infinity tho

#

is the SPRT that's running rn a stage 5 net?

rocky vigil Nov 7, 2025, 7:02 AM

#

stray reef is the SPRT that's running rn a stage 5 net?

both a stage 4 and stage 5 one are running rn

#

local testing had them ~equal but the stage 5 one is performing better on fishtest so far

amber fern Nov 7, 2025, 7:40 AM

#

yeah! so far: stage 4 net = -0.26 elo (30k games) stage 5 net = 2.25 elo (17k games)

rocky vigil Nov 7, 2025, 8:22 AM

#

btw it's time to look into preparing the branch for PR

#

so, what would we like the format of the net to be

#

some minor notes I have

#

change the mirroring of threat inputs to efgh (this can be done by permuting the weights, e.g.)
change the net format to store the i8 weights verbatim
ensure compilation works with all architectures
clean up the code generally
(optionally) do a lil bitcoin mining to rename the net

#

on the net side there are still a couple other things to try

#

check if L1=1280 works again after the i8 speedup

#

or check if L2=31(+1) works with the general L1 reduction

amber fern Nov 7, 2025, 8:29 AM

#

Looking forward to SF18! 🙂

rocky vigil Nov 7, 2025, 8:29 AM

#

heh would need to be like 10 elo gain for that

#

although

#

+X vvltc and another +Y from the (search) spsa that will happen after gets us closer

#

to sf 18

violet badger Nov 7, 2025, 9:15 AM

#

let's not confuse this thread with SF18, it is going to be complex enough without that aspect 😉

rocky vigil Nov 7, 2025, 10:16 AM

#

rare jacinth why hasn't l2 been increased to 31 to compensate for the smaller accumulator?

https://github.com/sscg13/nnue-pytorch/commit/07787fd8131ab171d76534ae29cd41eae0c91c21

#

would require a full retrain

#

though

#

so 3 days or so

amber fern Nov 7, 2025, 10:18 AM

#

rocky vigil so 3 days or so

Do it!

rocky vigil Nov 7, 2025, 10:18 AM

#

need @violet badger to set it up

#

would also be helpful if someone had a profile of latest

#

branch

#

my general estimation is that l2=31 will end up being -5% speed or so

violet badger Nov 7, 2025, 10:20 AM

#

rocky vigil would also be helpful if someone had a profile of latest

#1336647760388034610 message ?

rocky vigil Nov 7, 2025, 10:21 AM

#

difficult there to tell the relative runtime

#

I'll try

violet badger Nov 7, 2025, 10:24 AM

#

I'll set up the training. Ultimately, one needs to measure to get a real number

rocky vigil Nov 7, 2025, 10:25 AM

#

yeah

#

it would appear that l2 etc. take up 8% of total runtime

#

so I think -5% speed from doubling l2 seems reasonable
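
The back-of-the-envelope estimate above can be written out as a small calculation. This is only a sketch of the arithmetic, with a hypothetical helper name: it assumes that some fraction of total runtime gets multiplied by a cost factor while everything else stays fixed.

```python
def speed_change(fraction_of_runtime: float, cost_multiplier: float) -> float:
    """Relative change in nodes/second when part of the runtime gets more expensive."""
    new_runtime = 1.0 - fraction_of_runtime + fraction_of_runtime * cost_multiplier
    return 1.0 / new_runtime - 1.0


# Upper bound on the slowdown: all of the 8% doubles in cost.
worst_case = speed_change(0.08, 2.0)
```

If the whole 8% doubled, the slowdown would be about 7.4%. Since "l2 etc." presumably includes work that does not grow with L2, a figure nearer -5% is plausible, but as noted in the thread only a measurement settles it.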

violet badger Nov 7, 2025, 10:27 AM

#

so you change both l1 and l2?

rocky vigil Nov 7, 2025, 10:27 AM

#

no

#

i think the l1 setting there gets overridden by --l1 option

#

anyways

violet badger Nov 7, 2025, 10:27 AM

#

ah, it is just the diff showing it that way.

#

yeah, l1 is set by option

rocky vigil Nov 7, 2025, 10:28 AM

#

i just saw it and thought it should change to make it more accurate

#

for cosmetic purposes

violet badger Nov 7, 2025, 10:30 AM

#

so started

twilit oriole Nov 7, 2025, 10:50 AM

#

im going to predict that fails at TC lol

twilit oriole Nov 7, 2025, 10:54 AM

#

rocky vigil my general estimation is that l2=31 will end up being -5% speed or so

(if this figure is correct)

rocky vigil Nov 7, 2025, 10:55 AM

#

i mean why not give it a try lol

#

we'll have to see

#

the actual slowdown

twilit oriole Nov 7, 2025, 10:56 AM

#

Giving a prediction does not suggest not giving it a try (obviously)

#

The vibe I got was there was unusually high optimism for this

rocky vigil Nov 7, 2025, 10:59 AM

#

nah I suspect 16 is optimal as well

#

in fact I am surprised 16 is better than 8 even

#

but indeed there is the possibility that since input -> l1 takes more portion of total time relative to l1 -> l2, l2 could be increased

naive comet Nov 7, 2025, 12:34 PM

#

can I have some clarity - will threat-inputs-i8 eventually be merged into threat_inputs branch?

#

and if i want to write a patch that applies for both - which one do I base on/test against?

rocky vigil Nov 7, 2025, 12:40 PM

#

naive comet can I have some clarity - will threat-inputs-i8 eventually be merged into threat...

yes we in fact can merge it rn if shawn wants

rocky vigil Nov 7, 2025, 12:42 PM

#

naive comet and if i want to write a patch that applies for both - which one do I base on/te...

i think i8 has been proven to be much better so going forward it only matters to test on that

naive comet Nov 7, 2025, 12:42 PM

#

ok

rocky vigil Nov 7, 2025, 12:43 PM

#

rn anematode stage 5 branch performing slightly better on fishtest but still well within error bars

#

so idt it matters which of the i8 branches you use

lofty cedar Nov 7, 2025, 1:34 PM

#

Maybe someone could make Finny table work with threat input?

#

Though not sure if it would gain.

rocky vigil Nov 7, 2025, 1:35 PM

#

in what sense

#

full refreshes are basically inconsequential

naive comet Nov 7, 2025, 1:35 PM

#

we dont have buckets on threat inputs

#

only HM

lofty cedar Nov 7, 2025, 1:35 PM

#

Oh, I see.

#

I thought it had such a thing.

prime mica Nov 7, 2025, 1:46 PM

#

So what explains the chasm between the run on CI and on fishtest

rocky vigil Nov 7, 2025, 1:46 PM

#

hardware

#

x86 sees less benefit than arm from i8

#

apparently

prime mica Nov 7, 2025, 1:47 PM

#

😩

rocky vigil Nov 7, 2025, 1:47 PM

#

tbf why not start a 10k game ltc 1 thread

#

just to see what the scaling is like

stray reef Nov 7, 2025, 1:48 PM

#

that's reasonable imo

rocky vigil Nov 7, 2025, 1:48 PM

#

after i8 it should be neutral scaling

#

or so

prime mica Nov 7, 2025, 1:48 PM

#

Yeah we should

rocky vigil Nov 7, 2025, 1:48 PM

#

if it's antiscaling we have a slight issue

stray reef Nov 7, 2025, 1:48 PM

#

stray reef in that case elo diff vs master converges to some number > 0 as time -> infinity...

.

#

just requires positive STC in that case

#

which is doable

prime mica Nov 7, 2025, 1:54 PM

#

still sss but not looking too crazy at very long TC:

Results of New vs Base (30+0.3, 8t, 256MB, UHO_4060_v4.epd):
Elo: 3.43 +/- 3.73, nElo: 8.24 +/- 8.96
LOS: 96.43 %, DrawRatio: 65.57 %, PairsRatio: 1.12
Games: 5780, Wins: 1610, Losses: 1553, Draws: 2617, Points: 2918.5 (50.49 %)
Ptnml(0-2): [0, 470, 1895, 523, 2], WL/DD Ratio: 1.33
LLR: 0.69 (23.5%) (-2.94, 2.94) [0.00, 2.00]
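
For readers who want to reproduce the Elo figure from the `Ptnml(0-2)` line, here is a minimal sketch. It assumes the five buckets count game pairs worth 0, 0.5, 1, 1.5 and 2 points and uses the standard logistic Elo formula. The function names are illustrative and not taken from any tool used in the thread.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Pentanomial counts: game pairs that scored 0, 0.5, 1, 1.5 and 2 points.
using Pentanomial = std::array<long long, 5>;

// Fraction of the available points that were scored (each pair is two games).
double scoreFraction(const Pentanomial& p) {
    long long pairs  = 0;
    double    points = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        pairs += p[i];
        points += 0.5 * static_cast<double>(i) * static_cast<double>(p[i]);
    }
    assert(pairs > 0);
    return points / (2.0 * static_cast<double>(pairs));
}

// Logistic Elo difference implied by a score fraction strictly between 0 and 1.
double eloFromScore(double s) {
    assert(s > 0.0 && s < 1.0);
    return -400.0 * std::log10(1.0 / s - 1.0);
}
```

With `[0, 470, 1895, 523, 2]` this gives 2918.5 points out of 5780 games and about 3.43 Elo, matching the reported result. The error bars and nElo are not covered by this sketch.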

#

(this is master vs. the stage 5 net)

#

we'll see on fishtest ofc

#

also I'm still confused, I thought we benchmarked some very nice speed gains on x86

rocky vigil Nov 7, 2025, 1:57 PM

#

speedup

#

natural that speedup matters less at higher time control

prime mica Nov 7, 2025, 1:57 PM

#

but shouldn't that help quite a bit on the current SPRTs?

rocky vigil Nov 7, 2025, 1:57 PM

#

prime mica but shouldn't that help quite a bit on the current SPRTs?

indeed we went from -9 to 0

#

at stc

prime mica Nov 7, 2025, 1:57 PM

#

ohhh

foggy wind Nov 7, 2025, 1:57 PM

#

https://tests.stockfishchess.org/tests/view/690d2514ec1d00d2c195beb5


GROUPED BY ARCH

64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                | Elo:    -2.45 ±    2.62 | LOS:   3.3% | LLR: -1.77 | [89, 2356, 4773, 2172, 114]
64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo:     2.85 ±    2.92 | LOS:  97.2% | LLR:  1.11 | [60, 1760, 3752, 1861, 71]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT                            | Elo:    -4.12 ±    3.68 | LOS:   1.4% | LLR: -1.40 | [60, 1200, 2437, 1097, 54]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                       | Elo:    -2.55 ±    3.97 | LOS:  10.4% | LLR: -0.80 | [51, 1061, 2141, 970, 65]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                  | Elo:     4.68 ±    5.74 | LOS:  94.5% | LLR:  0.51 | [20, 445, 985, 498, 20]
64bit POPCNT NEON_DOTPROD                                     | Elo:    45.23 ±   23.07 | LOS: 100.0% | LLR:  0.33 | [0, 15, 55, 40, 2]
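
The per-architecture rows can also be pooled by adding their pentanomial counts, for example to compare VNNI against non-VNNI x86 workers. The sketch below is not foggy wind's script; `pool` and `elo` are hypothetical helpers, and the Elo is the plain logistic estimate from the pooled score.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pentanomial counts: game pairs that scored 0, 0.5, 1, 1.5 and 2 points.
using Pentanomial = std::array<long long, 5>;

// Add up the pentanomial counts of several groups, bucket by bucket.
Pentanomial pool(const std::vector<Pentanomial>& rows) {
    Pentanomial total{};  // value-initialised to all zeros
    for (const Pentanomial& row : rows)
        for (std::size_t i = 0; i < total.size(); ++i)
            total[i] += row[i];
    return total;
}

// Logistic Elo implied by the overall score of a pentanomial.
double elo(const Pentanomial& p) {
    long long pairs  = 0;
    double    points = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        pairs += p[i];
        points += 0.5 * static_cast<double>(i) * static_cast<double>(p[i]);
    }
    assert(pairs > 0);
    const double s = points / (2.0 * static_cast<double>(pairs));
    assert(s > 0.0 && s < 1.0);
    return 400.0 * std::log10(s / (1.0 - s));
}
```

Pooling the two VNNI rows gives `[80, 2205, 4737, 2359, 91]`, roughly +3.2 Elo. The three non-VNNI x86 rows pool to `[200, 4617, 9351, 4239, 233]`, roughly -2.9 Elo.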

rocky vigil Nov 7, 2025, 1:58 PM

#

lmao

#

arm be like

prime mica Nov 7, 2025, 1:58 PM

#

NEON_DOTPROD enjoyer

#

ok yeah that's dispositive

#

arghh

rocky vigil Nov 7, 2025, 1:58 PM

#

trust in scaling Kappa

#

smp is like +5 elo

prime mica Nov 7, 2025, 2:00 PM

#

ok well pending scaling tests... does that mean that if we get like a 3% speedup across the board on x86 specific to threat inputs (not saying this is easy!) then we should be ok?

rocky vigil Nov 7, 2025, 2:01 PM

#

1% is sufficient to make it pass stc sprt

#

anyways

#

it's like 1% -> 2 elo at stc

prime mica Nov 7, 2025, 2:02 PM

#

right

#

tantalizing...

violet badger Nov 7, 2025, 2:03 PM

#

Nice summary.. it basically is a question of having 'the right' HW on fishtest right now. VNNI seems to like this as well.

foggy wind Nov 7, 2025, 2:04 PM

#

With fleed it would pass in a second xD

violet badger Nov 7, 2025, 2:04 PM

#

ik...

prime mica Nov 7, 2025, 2:04 PM

#

LOL

#

> deploy ARM cores
> release SF 18

foggy wind Nov 7, 2025, 2:05 PM

#

We simply have to market this as mobile first. People only use their cell phones for everything anyway. Kappa

rocky vigil Nov 7, 2025, 2:05 PM

#

total neon penta over the 2 tests

stray reef Nov 7, 2025, 2:06 PM

#

what LLR for [0, 2] bounds?

rocky vigil Nov 7, 2025, 2:06 PM

#

0.5

stray reef Nov 7, 2025, 2:06 PM

#

kekw

rocky vigil Nov 7, 2025, 2:06 PM

#

still need 5000 more neon games

violet badger Nov 7, 2025, 2:10 PM

#

it is still somewhat surprising that neon does so well. Is there some particular instruction that works very well?

prime mica Nov 7, 2025, 2:10 PM

#

I think the i8 -> i16 conversions are much cheaper

#

maybe not the only cause but you can do one load of 128-bits and then unpack it in two instructions into two 128-bit registers fit for accumulation

#

whereas on x86 we're doing one load + one conversion per accumulation

#

Yoshie tried a technique I suggested to avoid that but it performed worse

#




#ifdef USE_NEON
                for (IndexType k = 0; k < Tiling::NumRegs; k += 2) {
                    acc[k] = vec_sub_16(acc[k], vmovl_s8(vget_low_s8(column[k / 2])));
                    acc[k + 1] = vec_sub_16(acc[k + 1], vmovl_high_s8(column[k / 2]));
                }
#else

#

(vget_low_s8 is a no-op in assembly, just for casting porpoises)

#

I don't think that explains the full gap tho
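
As a portable reference for what the NEON loop above computes per lane, here is a scalar sketch. It assumes `vec_sub_16` is a plain wrapping 16-bit subtract; `subWidened` is a hypothetical name. Each int8 weight is sign-extended to int16 (the job of `vmovl_s8` and `vmovl_high_s8`) and then subtracted from the accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the widening subtract: acc[i] -= sign_extend(column[i]).
// No saturation is applied, mirroring an ordinary 16-bit vector subtract.
void subWidened(std::vector<std::int16_t>& acc, const std::vector<std::int8_t>& column) {
    assert(acc.size() == column.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        const std::int16_t widened = static_cast<std::int16_t>(column[i]);  // i8 -> i16
        acc[i] = static_cast<std::int16_t>(acc[i] - widened);
    }
}
```

The vector versions differ only in how many lanes one load and one conversion cover, which is the cost difference being discussed.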

stray reef Nov 7, 2025, 2:13 PM

#

yeah there's no way this explains so much elo

#

especially since it's probably still mostly memory bound on x86

prime mica Nov 7, 2025, 2:14 PM

#

honestly I'm not so sure ab that anymore...

#

I'll do some profiling later on my friend's older Intel box

violet badger Nov 7, 2025, 2:15 PM

#

I think that big gap can almost only be explained by the used data now somehow fitting in some cache?

#

or by magic a better access pattern?

prime mica Nov 7, 2025, 2:15 PM

#

hm yeah, maybe the VNNI trend is because of newer machines being more well-endowed

violet badger Nov 7, 2025, 2:15 PM

#

possibly.

lofty cedar Nov 7, 2025, 2:16 PM

#

I tried fusing load before. Didn't work.

prime mica Nov 7, 2025, 2:16 PM

#

😩

prime mica Nov 7, 2025, 2:16 PM

#

lofty cedar I tried fusing load before. Didn't work.

ye I saw

stray reef Nov 7, 2025, 3:21 PM

#

@violet badger someone really wanted this STC to pass huh? xD

violet badger Nov 7, 2025, 3:22 PM

#

oh, it passed 😮

stray reef Nov 7, 2025, 3:22 PM

#

i see your arm machines there :P

#

congrats!

formal smelt Nov 7, 2025, 3:22 PM

#

#

fighting against the technologov cores lol

violet badger Nov 7, 2025, 3:23 PM

#

basically just documenting how HW dependent this is ....

#

let's wait a bit for the second test to pass, and after that submit LTC (for which I will remove again the arm machines).

rocky vigil Nov 7, 2025, 3:24 PM

#

cool

rocky vigil Nov 7, 2025, 4:01 PM

#

violet badger let's wait a bit for the second test to pass, and after that submit LTC (for whi...

it does seem the arm machines simply print llr on the tests

violet badger Nov 7, 2025, 4:11 PM

#

well given the Elo numbers we had in the pipeline, it is clear they can easily do that

#

I would say go ahead and submit one LTC.

rocky vigil Nov 7, 2025, 4:13 PM

#

ok

#

i suppose we just choose at random

#

since stage 4 and stage 5 are well within error bars at fishtest also

#

since I'm online rn i guess it'll just be stage 4 then

violet badger Nov 7, 2025, 4:15 PM

#

either is fine.

#

eventually we sprt things against each other.

#

in principle 4 is nicer, since it would establish a shorter training baseline.

rocky vigil Nov 7, 2025, 4:17 PM

#

ok

#

submitted

violet badger Nov 7, 2025, 4:27 PM

#

I've also updated the reference in the pipelines to be your branch at the f3f net.

#

and switched it to sprt

#

so we will more easily see what is better in future tests.

rocky vigil Nov 7, 2025, 4:29 PM

#

ah cool

#

actually the minor fix is finishing later today

#

so that'll be a cool start

violet badger Nov 7, 2025, 4:31 PM

#

in principle I'd have to stop and restart the pipeline for it to pick up that commit that changes the reference.

#

(right now I think the yaml it is testing is not yet with a suitable sf sha).

lapis parrot Nov 7, 2025, 6:58 PM

#

anematode machine is like solo killing this test kek

#

smth like -27 -39 +1 -29 pairs

foggy wind Nov 7, 2025, 7:03 PM

#

anematode-128cores-7b133829 | Elo: -6.29 ± 4.66 | LOS: 0.4% | LLR: -1.19 | [4, 608, 1351, 516, 5]

prime mica Nov 7, 2025, 7:08 PM

#

Yaaaa the i8 speedup was so small on my computer

#

The break even point is probably even higher TC

prime mica Nov 7, 2025, 8:29 PM

#

Silly suggestion, are there any things in search which are known to depend strongly on evaluation accuracy

#

or is the tuning far too diffuse

lapis parrot Nov 7, 2025, 8:30 PM

#

well, any static eval based heuristic more or less

#

but it's like hmm

#

+90 elo from eval with major slowdown ~= +15-20 elo from search patches

#

at least this is what it was when the very first NNUE was introduced

#

also in general tuning should handle it nicely anyway

prime mica Nov 7, 2025, 8:31 PM

#

right

#

like I'm wondering whether it makes sense given that we're within shooting distance of master to try some basic search tuning and see if we can exceed it

#

and that effort can be done in parallel with trying to speed up x86

lapis parrot Nov 7, 2025, 8:32 PM

#

eh, not really imho

#

search tuning can and probably should be done on top of the net imho

prime mica Nov 7, 2025, 8:33 PM

#

which net

lapis parrot Nov 7, 2025, 8:33 PM

#

which passes

daring wren Nov 7, 2025, 8:33 PM

#

prime mica like I'm wondering whether it makes sense given that we're within shooting dista...

well there's no way to prove that those spsa results actually only work with the threat inputs net

prime mica Nov 7, 2025, 8:33 PM

#

SPSA?

daring wren Nov 7, 2025, 8:33 PM

#

maybe you are just tuning the search to be better regardless of what net is being used

#

in that case, you're just hacking the net in and using the tune as an excuse to get a passing SPRT

prime mica Nov 7, 2025, 8:34 PM

#

sure... but that can be validated by using the same parameters with the master net, no?

#

or running two SPSAs although that's expensive

lapis parrot Nov 7, 2025, 8:35 PM

#

I would prefer to not touch search with new arch

#

I think maintainers will also be like this )

prime mica Nov 7, 2025, 8:35 PM

#

kk

violet badger Nov 7, 2025, 8:35 PM

#

yeah would be nice to break even with just net, and my expectation is that would lead to search tweaks afterwards.

#

ARMed riot against x86 right now

prime mica Nov 7, 2025, 8:36 PM

#

lol

#

https://hackaday.com/2024/03/21/why-x86-needs-to-die/ me rn

Hackaday

Julian Scheffers

Why X86 Needs To Die

As I’m sure many of you know, x86 architecture has been around for quite some time. It has its roots in Intel’s early 8086 processor, the first in the family. Indeed, even the original …

lapis parrot Nov 7, 2025, 8:37 PM

#

hmph about search I would guess that since it has threat inputs

#

we can be more aggressive with capture pruning and qsearch

#

maybe

#

ofc it's all pretty vague

prime mica Nov 7, 2025, 8:37 PM

#

sure

lapis parrot Nov 7, 2025, 8:37 PM

#

maybe finally history adjustment will work for captures?

#

or correction history adjustments ?

violet badger Nov 7, 2025, 8:42 PM

#

prime mica Nov 7, 2025, 8:42 PM

#

```
#ifdef USE_NEON
constexpr bool threat_inputs = true;
#else
constexpr bool threat_inputs = false;
#endif
```

lapis parrot Nov 7, 2025, 8:42 PM

#

well this one can actually be a rluke

prime mica Nov 7, 2025, 8:52 PM

#

I'm wondering whether speedups matter more at LTC than generally believed...

#

@foggy wind would u mind running your Elo bucketing script on the PT https://tests.stockfishchess.org/tests/view/68ee711328e6d77fcff9fd63 to see the difference between VNNI and non-VNNI architectures

lapis parrot Nov 7, 2025, 8:52 PM

#

they are like 70% of what they are at STC I think?

prime mica Nov 7, 2025, 8:53 PM

#

oh what

#

I thought it was like 30%

#

hm ok

lapis parrot Nov 7, 2025, 8:53 PM

#

elo wise? no

#

speedups are usually "mocked" for not scaling

foggy wind Nov 7, 2025, 8:53 PM

#

prime mica @foggy wind would u mind running your Elo bucketing script on the PT h...

GROUPED BY ARCH

64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT   | Elo: 30.50 ± 1.84 | LOS: 100.0% | LLR: 29.71 | [2, 1204, 7486, 3301, 4]
64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo: 25.27 ± 2.04 | LOS: 100.0% | LLR: 20.40 | [3, 1015, 6107, 2396, 4]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT        | Elo: 24.71 ± 2.93 | LOS: 100.0% | LLR:  9.85 | [0, 515, 2989, 1178, 1]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT             | Elo: 19.43 ± 3.43 | LOS: 100.0% | LLR:  5.70 | [0, 388, 2176, 755, 2]
64bit SSE41 SSSE3 SSE2 POPCNT                  | Elo: 21.28 ± 9.48 | LOS: 100.0% | LLR:  0.80 | [1, 57, 300, 115, 1]

lapis parrot Nov 7, 2025, 8:53 PM

#

because average patch is like ~ the same elo at LTC as it is at STC

#

and some even hyperscale

#

but it doesn't mean they are like 30% lol

prime mica Nov 7, 2025, 8:54 PM

#

O wow, +5 delta between VNNI and normal AVX512
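
To judge how solid that delta is, the two group estimates can be subtracted with their error bars combined in quadrature. This is a sketch under two assumptions: the groups are independent samples, and the ± values are symmetric intervals at the same confidence level. `Estimate` and `difference` are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// A point estimate with a symmetric error bar, e.g. "30.50 ± 1.84".
struct Estimate {
    double value;
    double halfWidth;
};

// Difference of two independent estimates; the error bars add in quadrature.
Estimate difference(const Estimate& a, const Estimate& b) {
    const double err = std::sqrt(a.halfWidth * a.halfWidth + b.halfWidth * b.halfWidth);
    return Estimate{a.value - b.value, err};
}
```

For the VNNI row (30.50 ± 1.84) against the AVX512 row (25.27 ± 2.04) this gives about 5.23 ± 2.75. The groups also differ in hardware generation, so the gap need not be due to VNNI alone.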

lapis parrot Nov 7, 2025, 8:54 PM

#

#

210/142

prime mica Nov 7, 2025, 8:54 PM

#

gotcha

lapis parrot Nov 7, 2025, 8:55 PM

#

is like slightly less than 1,5

prime mica Nov 7, 2025, 8:55 PM

#

➗

lapis parrot Nov 7, 2025, 8:55 PM

#

so speedups should be slightly above 66% stc -> ltc

#

just that usual stuff for sf releases is having this elo being 1:1

#

so logical patches in general scale better than speedups
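
The arithmetic in the messages above, as a small sketch. What the figures 210 and 142 measure is not stated in the surviving text, so the code treats them simply as an STC figure and an LTC figure. Combining the ratio with the earlier "1% -> 2 elo at stc" rule of thumb is an extrapolation, not something claimed in the thread.

```cpp
#include <cassert>

// Share of an STC gain expected to remain at LTC, taken as the ratio of the
// two figures quoted in the chat (assumed: 210 at STC, 142 at LTC).
double ltcShare(double stcFigure, double ltcFigure) {
    assert(stcFigure > 0.0);
    return ltcFigure / stcFigure;
}

// Rough LTC Elo for a speedup, given an assumed STC Elo-per-percent rule.
double ltcEloForSpeedup(double speedupPercent, double stcEloPerPercent, double share) {
    return speedupPercent * stcEloPerPercent * share;
}
```

`ltcShare(210.0, 142.0)` is about 0.676, i.e. slightly above 66%. With 2 Elo per 1% at STC, a 1% speedup would then be worth roughly 1.35 Elo at LTC under these assumptions.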

prime mica Nov 7, 2025, 8:56 PM

#

foggy wind ``` GROUPED BY ARCH 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo: 30.50...

how about for the previous PT, https://tests.stockfishchess.org/tests/view/68d98c39fa806e2e8393b7a1

lapis parrot Nov 7, 2025, 8:56 PM

#

even our last PT is 31 - 27

#

which is far above 1,5:1

violet badger Nov 7, 2025, 8:56 PM

#

a bit old data, I think that with the current book it is even more similar.

lapis parrot Nov 7, 2025, 8:56 PM

#

maybe

#

you can measure

#

🙂

prime mica Nov 7, 2025, 8:57 PM

#

📏

lapis parrot Nov 7, 2025, 8:57 PM

#

also yeah at some point compression hits you anyway

#

even with UHO books
