foggy wind Nov 7, 2025, 8:57 PM

#

GROUPED BY ARCH

64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT        | Elo: 20.85 ± 1.99 | LOS: 100.0% | LLR: 17.97 | [2, 1135, 6434, 2321, 2]
64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo: 21.13 ± 2.25 | LOS: 100.0% | LLR: 14.40 | [0, 944, 5114, 1908, 2]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT   | Elo: 27.43 ± 2.50 | LOS: 100.0% | LLR: 14.88 | [1, 717, 4149, 1753, 5]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT             | Elo: 21.21 ± 2.85 | LOS: 100.0% | LLR:  8.95 | [1, 598, 3197, 1204, 3]
64bit SSE41 SSSE3 SSE2 POPCNT                  | Elo: 22.17 ± 8.81 | LOS: 100.0% | LLR:  0.99 | [0, 57, 332, 120, 1]
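
For anyone reading the table above: a minimal sketch of how the Elo column can be recomputed from the bracketed counts, assuming they are pentanomial game-pair counts (pairs scoring 0, 0.5, 1, 1.5 and 2 points) and that the figure is plain logistic Elo. The function name is made up for illustration; under those assumptions the first row comes out at roughly 20.85.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Logistic Elo implied by a pentanomial game-pair histogram.
// counts[i] is the number of game pairs that scored i/2 points (0, 0.5, 1, 1.5, 2).
// Hypothetical helper, not taken from fishtest.
double elo_from_pentanomial(const std::array<std::int64_t, 5>& counts) {
    double pairs = 0.0, points = 0.0;
    for (int i = 0; i < 5; ++i) {
        pairs += static_cast<double>(counts[i]);
        points += 0.5 * i * static_cast<double>(counts[i]);
    }
    assert(pairs > 0.0);
    const double score = points / (2.0 * pairs);  // average score per game
    assert(score > 0.0 && score < 1.0);           // Elo is undefined at 0% or 100%
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

The error bars, LOS and LLR columns need the variance of the pair scores as well and are not covered by this sketch.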

lapis parrot Nov 7, 2025, 8:57 PM

#

you can have better game pair ratio but worse elo and worse nElo

prime mica Nov 7, 2025, 8:58 PM

#

foggy wind ``` GROUPED BY ARCH 64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo: 20.85...

huh where did the avx512 + vnni combination go

#

oh wait I'm reading it wrong

#

hm ok so the delta was always there... but then the distribution of workers changed a lot

prime mica Nov 7, 2025, 9:43 PM

#

ok, found an avx2-specific speedup... ~1% locally but we'll see on fishtest

#

the problem is that NumRegistersSIMD is 16

#

before, the loads would get consistently folded into the memory operand of vpaddw and vpsubw

#

but now they need a temp register for vpmovsxbw

#

and thus at least one acc register needs to be spilled...
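
A rough sketch of the register-pressure point above, assuming an accumulator update of the form `acc[i] += row[i]` with int8 weights and int16 accumulators. The helper name and loop shape are illustrative, not the branch's actual code. With int16 weights the weight load can be the memory operand of `vpaddw`; with int8 weights the sign-extended result of `vpmovsxbw` has to sit in a register of its own first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: acc[i] += row[i] for an int8 weight row and int16 accumulators.
void add_i8_row(std::int16_t* acc, const std::int8_t* row, std::size_t n) {
    assert(acc != nullptr && row != nullptr);
    std::size_t i = 0;
#if defined(__AVX2__)
    for (; i + 16 <= n; i += 16) {
        // vpmovsxbw: the widened weights occupy a temporary vector register.
        const __m256i w = _mm256_cvtepi8_epi16(
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(row + i)));
        const __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(acc + i));
        // vpaddw now takes two register operands instead of register + memory.
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(acc + i), _mm256_add_epi16(a, w));
    }
#endif
    for (; i < n; ++i)  // scalar tail, and the whole loop on non-AVX2 builds
        acc[i] = static_cast<std::int16_t>(acc[i] + row[i]);
}
```

In a real kernel the accumulators stay in registers across many rows, which is where the extra temporary starts to hurt when only 16 vector registers are available.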

frosty imp Nov 7, 2025, 9:49 PM

#

wait it passed?

prime mica Nov 7, 2025, 9:53 PM

#

would any AVX2 users be able to bench https://github.com/anematode/Stockfish/tree/threat-inputs-i8-st5 vs. the previous commit

summer swan Nov 7, 2025, 9:56 PM

#

frosty imp wait it passed?

STC had a lot of arm workers, and it seems to be doing better on arm.

#

Why not SMP BTW? Even if not willing to pass with VLTC. Shared memory is not as good because positions in a single engine are more similar.

#

BTW I get warning on clang (not new to this commit)

```
 1104 |         Bitboard threatened = ray & qAttacks & occupied;
      |                  ^
position.cpp:1057:14: note: previous declaration is here
 1057 |     Bitboard threatened;
      |
```

foggy wind Nov 7, 2025, 10:00 PM

#

summer swan BTW I get warning on clang (not new to this commit) ```position.cpp:1104:18: war...

same with gcc

foggy wind Nov 7, 2025, 10:09 PM

#

prime mica would any AVX2 users be able to bench https://github.com/anematode/Stockfish/tre...

ARCH=x86-64-avx2

Result of 200 runs
==================
base (...sh_avx2.base) =    1739819  +/- 3149
test (...nputs-i8-st5) =    1778412  +/- 3041
diff                   =     +38593  +/- 2346

speedup        = +0.0222
P(speedup > 0) =  1.0000
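
The `speedup` line is consistent with the mean difference divided by the base mean. A small sketch under that assumption (the function name is illustrative, and the benchmarking script itself may compute it differently, e.g. per run):

```cpp
#include <cassert>

// Relative speedup of test over base, from mean nodes-per-second figures.
double relative_speedup(double base_nps, double test_nps) {
    assert(base_nps > 0.0);
    return (test_nps - base_nps) / base_nps;
}
```

With the means above, `relative_speedup(1739819, 1778412)` is about 0.0222.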

prime mica Nov 7, 2025, 10:10 PM

#

ok, not terrible...

#

this is on Zen 5 though right? so AVX512 would be the default build

foggy wind Nov 7, 2025, 10:11 PM

#

Yes, but this way the AVX2 path is used.

prime mica Nov 7, 2025, 10:11 PM

#

sure

foggy wind Nov 7, 2025, 10:13 PM

#

I would say 2.2% is more than just not terrible.

summer swan Nov 7, 2025, 10:13 PM

#

prime mica would any AVX2 users be able to bench https://github.com/anematode/Stockfish/tre...

14600k speedtest (avx2 arch, also not default)
a071eec7: 19320421
f9886de7: 19729029
+.0211

prime mica Nov 7, 2025, 10:13 PM

#

ok I'ma hunt for more speedups until I get +4% or so and then put up on fishtest filtered to AVX2

lofty cedar Nov 7, 2025, 10:20 PM

#

So, it seems a massive slowdown on AVX2?

lofty cedar Nov 7, 2025, 10:36 PM

#

Oh... your patch was like 3.7 on AVX2 for my machine.

prime mica Nov 7, 2025, 10:36 PM

#

I made a mistake lol

#

wrong bench

#

will fix and we'll see

lofty cedar Nov 7, 2025, 10:37 PM

#

Woopsies!

prime mica Nov 7, 2025, 10:41 PM

#

ok pushed a fix u can retest

foggy wind Nov 7, 2025, 10:50 PM

#

prime mica ok pushed a fix u can retest

Result of 200 runs
==================
base (...sh_avx2.base) =    1738472  +/- 3302
test (...nputs-i8-st5) =    1765637  +/- 2975
diff                   =     +27165  +/- 1085

speedup        = +0.0156
P(speedup > 0) =  1.0000

prime mica Nov 7, 2025, 10:51 PM

#

hm ok

#

can u verify that the binary gives 1902842 as the bench?

foggy wind Nov 7, 2025, 10:52 PM

#

Nodes searched : 1902842

prime mica Nov 7, 2025, 10:52 PM

#

ok that's good

#

testing on my friend's Zen 2 EPYC now

#

compiler btw?

foggy wind Nov 7, 2025, 10:53 PM

#

gcc 15.2.1

prime mica Nov 7, 2025, 10:53 PM

#

gotcha

#

also could u git log

#

just wanna make sure it's the right commit as I pushed something else a few minutes ago

foggy wind Nov 7, 2025, 10:54 PM

#

its the fix commit

prime mica Nov 7, 2025, 10:54 PM

#

ooh ok

#

pull again and re-test?

#

I tried cramming DirtyThreat into 32 bits and I think it might help

foggy wind Nov 7, 2025, 10:54 PM

#

kk

prime mica Nov 7, 2025, 10:54 PM

#

ur the best

#

```
test (./stockfish    ) =     837952  +/- 1405
diff                   =      +7987  +/- 1581

speedup        = +0.0096
P(speedup > 0) =  1.0000

CPU: 32 x AMD EPYC 7502P 32-Core Processor
```

#

underwhelming

#

I'ma try the idea of making indices an LUT, forgot who said that

foggy wind Nov 7, 2025, 11:01 PM

#

Result of 200 runs
==================
base (...sh_avx2.base) =    1746542  +/- 3695
test (...nputs-i8-st5) =    1710156  +/- 2848
diff                   =     -36387  +/- 1309

speedup        = -0.0208
P(speedup > 0) =  0.0000

prime mica Nov 7, 2025, 11:02 PM

#

h u h

#

why??

#

it works fine on my end..

#

u sure u did profile-build right

foggy wind Nov 7, 2025, 11:04 PM

#

yeah definitely

prime mica Nov 7, 2025, 11:04 PM

#

😭

foggy wind Nov 7, 2025, 11:04 PM

#

❯ ./stockfish_avx2.threat-inputs-i8-st5 compiler
Stockfish dev-20251107-cd5e513d by the Stockfish developers (see AUTHORS file)

Compiled by                : g++ (GNUC) 15.2.1 on Linux
Compilation architecture   : x86-64-avx2
Compilation settings       : 64bit AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.2.1 20250813

prime mica Nov 7, 2025, 11:05 PM

#

that makes no sense

#

could you check that the bench is the same?

foggy wind Nov 7, 2025, 11:06 PM

#

it is

prime mica Nov 7, 2025, 11:06 PM

#

mmk

#

could you try comparing the commit "fix" vs the commit "cram"

#

if "cram" is rly bad then it should be -0.035

foggy wind Nov 7, 2025, 11:08 PM

#

hmm something is weird. I compiled again and got a different md5sum

prime mica Nov 7, 2025, 11:08 PM

#

that happens to me sometimes if I don't explicitly make clean

#

but not sure

foggy wind Nov 7, 2025, 11:09 PM

#

I'm pretty sure I did. But well

prime mica Nov 7, 2025, 11:09 PM

#

lol

#

cosmic ray /s

foggy wind Nov 7, 2025, 11:16 PM

#

so something was wrong with the binary, but still

Result of 200 runs
==================
base (...sh_avx2.base) =    1746615  +/- 4120
test (...nputs-i8-st5) =    1756439  +/- 3361
diff                   =      +9824  +/- 1416

speedup        = +0.0056
P(speedup > 0) =  1.0000

prime mica Nov 7, 2025, 11:18 PM

#

gross

#

that's between "use stage 5 net" and "cram" ?

foggy wind Nov 7, 2025, 11:19 PM

#

yes

prime mica Nov 7, 2025, 11:19 PM

#

sigh

#

ok I'll have to look at the disassembly on GCC 15 later

#

it probably did something rly dumb

lofty cedar Nov 7, 2025, 11:24 PM

#

I got around 1% speedup.

#

But well, not that many rounds.

rocky vigil Nov 7, 2025, 11:28 PM

#

i wonder, if we stockpile the claimed minor speedups and then combine them, is this sound methodology

prime mica Nov 7, 2025, 11:30 PM

#

LUT seems to be working... but my computer has plenty of cache

#

I'ma see if I can compact it a little then I'll push

prime mica Nov 7, 2025, 11:31 PM

#

rocky vigil i wonder, if stockpile the claimed minor speedups and then combine, is this soun...

I think it's ok... long term we can verify that each of them is real or not

rocky vigil Nov 7, 2025, 11:31 PM

#

prime mica LUT seems to be working... but my computer has plenty of cache

ah nice

prime mica Nov 7, 2025, 11:32 PM

#

Result of 100 runs
==================
base (....ti.avx2.gcc) =    1298550  +/- 1517
test (./stockfish    ) =    1329132  +/- 2045
diff                   =     +30582  +/- 1763

speedup        = +0.0236
P(speedup > 0) =  1.0000

CPU: 128 x AMD EPYC 9755 128-Core Emb Processor

bench matches...

#

lol ok I crammed it to 17 bits but split across two arrays which is dumb

#

probably makes the most sense to have it either be 24 bits or 17 bits but contiguous

#

ok yeah that destroyed perf

#

hm now that I think about it, the LUT might actually be accessed in a fairly ok way

#

depending on how we index it

#

anyway, I pushed LUT if any of y'all wanna test

#

we might have to put the LUT in shared memory lmao

#

big boi

#

testing on Zen 2 now

rocky vigil Nov 7, 2025, 11:49 PM

#

prime mica we might have to put the LUT in shared memory lmao

speaking of this aren't the pext LUTs also somewhat large

prime mica Nov 7, 2025, 11:49 PM

#

yeah...

#

definitely worth a shot

rocky vigil Nov 7, 2025, 11:50 PM

#

so it looks like we're currently 2 elo away in ltc 1thread

#

i kinda guessed this

#

after i8 "antiscaling"

prime mica Nov 7, 2025, 11:51 PM

#

that's kinda rough

twilit oriole Nov 7, 2025, 11:51 PM

#

It isn't anti scaling. It didn't have the arm benefit

prime mica Nov 7, 2025, 11:51 PM

#

does i8 antiscaling mean the whole enterprise is antiscaling?

amber fern Nov 7, 2025, 11:51 PM

#

Yo guys, anyone else notice that the Threat input fishtest is going pretty good right now? For the STC stage 4 and 5 nets 😄

prime mica Nov 7, 2025, 11:52 PM

#

yes you can read the thread above for details haha

#

long story short, it's a (ar)mirage

twilit oriole Nov 7, 2025, 11:52 PM

#

twilit oriole It isn't anti scaling. It didn't have the arm benefit

Plz interpret the results correctly otherwise this gets annoying fast

prime mica Nov 7, 2025, 11:52 PM

#

how are you sure

twilit oriole Nov 7, 2025, 11:52 PM

#

Well at least look into it instead of just declaring it anti scaling

prime mica Nov 7, 2025, 11:52 PM

#

(not saying you're wrong, just trying to understand)

#

I ran some VVLTC games and it was at 3 +/- 2 ELO while I think at STC it's -5 ELO on my machine

#

so there's some hope there... but fishtest seems to have the opposite trend

#

base (...kfish.ti.gcc) =     830469  +/- 1142
test (./stockfish    ) =     846138  +/- 1085
diff                   =     +15670  +/- 1420

speedup        = +0.0189
P(speedup > 0) =  1.0000

CPU: 32 x AMD EPYC 7502P 32-Core Processor

amber fern Nov 7, 2025, 11:53 PM

#

prime mica so there's some hope there... but fishtest seems to have the opposite trend

where is the LTC fishtest link?

prime mica Nov 7, 2025, 11:53 PM

#

ok I'ma put this up on fishtest, AVX2 only

#

https://tests.stockfishchess.org/tests/live_elo/690e1bbfec1d00d2c195c24e

rocky vigil Nov 7, 2025, 11:54 PM

#

yeah 8 threads tends to be +5 elo over 1 thread anyways

twilit oriole Nov 7, 2025, 11:54 PM

#

amber fern where is the LTC fishtest link?

God everyone is going to just read these results wrong now

rocky vigil Nov 7, 2025, 11:54 PM

#

we forcibly passed the stc

prime mica Nov 7, 2025, 11:54 PM

#

relax lol

amber fern Nov 7, 2025, 11:54 PM

#

prime mica https://tests.stockfishchess.org/tests/live_elo/690e1bbfec1d00d2c195c24e

is that stage 5 net or stage 4?

rocky vigil Nov 7, 2025, 11:54 PM

#

with vondele machines

prime mica Nov 7, 2025, 11:54 PM

#

we are just experimenting

rocky vigil Nov 7, 2025, 11:54 PM

#

stc was -1 elo before that

amber fern Nov 7, 2025, 11:54 PM

#

oh its stage 4

prime mica Nov 7, 2025, 11:54 PM

#

stage 4 == stage 5 for all intents and purposes

amber fern Nov 7, 2025, 11:55 PM

#

prime mica stage 4 == stage 5 for all intents and purposes

we'll just have to test stage 5 and see

twilit oriole Nov 7, 2025, 11:55 PM

#

prime mica we are just experimenting

I think you should add some things in the info so it is less misleading tbh

prime mica Nov 7, 2025, 11:55 PM

#

who are we misleading tho

twilit oriole Nov 7, 2025, 11:55 PM

#

It is far too easy to just read the Elo off the sprt and draw invalid conclusions

rocky vigil Nov 7, 2025, 11:55 PM

#

prime mica stage 4 == stage 5 for all intents and purposes

which suggests actually maybe we drop it?

#

or

prime mica Nov 7, 2025, 11:55 PM

#

everyone working on threat inputs knows that the SPRT has many asterisks attached to it

rocky vigil Nov 7, 2025, 11:55 PM

#

i have consistently noticed stage 5 is not much better than stage 4

prime mica Nov 7, 2025, 11:55 PM

#

ye

amber fern Nov 7, 2025, 11:56 PM

#

rocky vigil which suggests actually maybe we drop it?

there is a good chance tuning the search for the new net will fix the scaling problem, no?

twilit oriole Nov 7, 2025, 11:56 PM

#

prime mica who are we misleading tho

.

rocky vigil Nov 7, 2025, 11:56 PM

#

amber fern there is a good chance tuning the search for the new net will fix the scaling pr...

per maintainer decision this is not to be done until after we win w/o changing search

twilit oriole Nov 7, 2025, 11:56 PM

#

So now we declared a scaling problem?

prime mica Nov 7, 2025, 11:56 PM

#

no?

#

like I said my local tests even indicate it could scale well

rocky vigil Nov 7, 2025, 11:57 PM

#

might be time to try 1280 later

#

tho

twilit oriole Nov 7, 2025, 11:57 PM

#

amber fern there is a good chance tuning the search for the new net will fix the scaling pr...

.

prime mica Nov 7, 2025, 11:57 PM

#

but SSS, I only ran 5000 VVLTC games

#

why do u have a bee in your bonnet about this lol

twilit oriole Nov 7, 2025, 11:57 PM

#

I am referring to this message as a response to "who are we misleading". This is not a difficult chain of reasoning to follow

rocky vigil Nov 7, 2025, 11:57 PM

#

amber fern there is a good chance tuning the search for the new net will fix the scaling pr...

the "scaling" btw is neutral within error bars

twilit oriole Nov 7, 2025, 11:57 PM

#

Yes

rocky vigil Nov 7, 2025, 11:58 PM

#

i wouldn't read too much into it

twilit oriole Nov 7, 2025, 11:58 PM

#

Finally a logical response

rocky vigil Nov 7, 2025, 11:58 PM

#

all the sprt shows is that it's not some +5 superscaler at least for 1 thread

amber fern Nov 7, 2025, 11:58 PM

#

rocky vigil the "scaling" btw is neutral within error bars

but theoretically it could be better than neutral right?

rocky vigil Nov 7, 2025, 11:58 PM

#

yes?

#

the combined error bars are like 4 elo

amber fern Nov 7, 2025, 11:59 PM

#

I still think we should only be testing the stage 5 net, since its .5 elo higher on fishtest stc

rocky vigil Nov 7, 2025, 11:59 PM

#

performance is also really machine dependent

#

and not just the fact that arm machines have it at +15 elo

rocky vigil Nov 8, 2025, 12:00 AM

#

amber fern I still think we should only be testing the stage 5 net, since its .5 elo higher...

i would also caution against reading too much into this

amber fern Nov 8, 2025, 12:00 AM

#

rocky vigil and not just the fact that arm machines have it at +15 elo

jees, really!

rocky vigil Nov 8, 2025, 12:01 AM

#

rocky vigil we forcibly passed the stc

.

amber fern Nov 8, 2025, 12:01 AM

#

too bad I don't have an arm machine

#

wait, does it also improve on phone hardware? Since that's arm?

rocky vigil Nov 8, 2025, 12:03 AM

#

presumably

prime mica Nov 8, 2025, 12:04 AM

#

Result of 100 runs
==================
base (...kfish.ti.gcc) =    1517185  +/- 2488
test (./stockfish    ) =    1565808  +/- 2403
diff                   =     +48623  +/- 2682

speedup        = +0.0320
P(speedup > 0) =  1.0000

some improvements on Zen 5 as well...

#

ok I'ma put this up on fishtest

rocky vigil Nov 8, 2025, 12:04 AM

#

there was a little meming about releasing sf18 since for arm devices it's at that level

split warren Nov 8, 2025, 12:04 AM

#

Hmm is there a way to contribute cores to a specific test on fishtest?

twilit oriole Nov 8, 2025, 12:04 AM

#

Nope

#

Been a requested feature for years

rocky vigil Nov 8, 2025, 12:04 AM

#

yeah no vondele just flooded fishtest with machines until some joined the threat input tests

twilit oriole Nov 8, 2025, 12:05 AM

#

There is also the 8 Elo of net spsa Elo that can be added. Which is always forgotten

rocky vigil Nov 8, 2025, 12:05 AM

#

not forgotten

#

we've been well over pre-spsa 5 stage

#

since i8

twilit oriole Nov 8, 2025, 12:05 AM

#

Well not you lol. But I get messages why isn't threat inputs working well all the time

#

In DMs

rocky vigil Nov 8, 2025, 12:05 AM

#

💀

split warren Nov 8, 2025, 12:06 AM

#

I look at this as just the beginning

rocky vigil Nov 8, 2025, 12:06 AM

#

twilit oriole In DMs

do ppl just not comment here on public thread

prime mica Nov 8, 2025, 12:06 AM

#

twilit oriole In DMs

who is DMing you lol

rocky vigil Nov 8, 2025, 12:06 AM

#

or is this just the dev thread

prime mica Nov 8, 2025, 12:06 AM

#

celebrity 😩

twilit oriole Nov 8, 2025, 12:06 AM

#

Idk why they don't message here. I guess they just see the first post is me and then DM

amber fern Nov 8, 2025, 12:06 AM

#

rocky vigil or is this just the dev thread

I don't count as a dev and I'm here lol

split warren Nov 8, 2025, 12:07 AM

#

You need to be aware this exists and discussions are happening here, else discord ui is very good at hiding it from the rest of the world

rocky vigil Nov 8, 2025, 12:07 AM

#

yeah

#

afterwards it would be best to just make it mainstream

#

nnue-dev

#

etc.

twilit oriole Nov 8, 2025, 12:07 AM

#

Yeah they never messaged here so maybe they just don't see it. And only know it is from me from general engine dev

#

I mean I should make a copypasta for the response. It's like the same every time

prime mica Nov 8, 2025, 12:08 AM

#

lol

#

Threat inputs copypasta incoming

split warren Nov 8, 2025, 12:09 AM

#

You should just reply with 'and how's the alternative doing?'

twilit oriole Nov 8, 2025, 12:09 AM

#

Anyways yes it is true I am somewhat annoyed at getting messaged multiple times why isn't threat inputs "working" yet. When it is working perfectly fine when considering all factors

amber fern Nov 8, 2025, 12:09 AM

#

There, I put the link to here in the nnue channel

rocky vigil Nov 8, 2025, 12:10 AM

#

the most prudent thing for now is to wait for https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12017736932 to finish in 2 hours

#

this offers 2 more chances at getting a slightly better net

#

anyways I've modified the fishtest description

twilit oriole Nov 8, 2025, 12:13 AM

#

I'm too easy to bait tbh, character flaw. Someone sends me a sprt and asks 'does this mean threat inputs isn't promising' and I'll always respond kek

rocky vigil Nov 8, 2025, 12:13 AM

#

just in case ppl are lurking there

prime mica Nov 8, 2025, 12:14 AM

#

twilit oriole I'm too easy to bait tbh, character flaw. Someone sends me a sprt and asks 'does...

🎣

#

ok let's see...

#

I wish SF was better than shashchess 😩

#

maybe we should merge in some of his changes

rocky vigil Nov 8, 2025, 12:15 AM

#

prime mica maybe we should merge in some of his changes

nah

rocky vigil Nov 8, 2025, 12:15 AM

#

prime mica I wish SF was better than shashchess 😩

it literally is

#

by like 50 elo

#

at ltc

#

or smth

twilit oriole Nov 8, 2025, 12:15 AM

#

Baited lmao

prime mica Nov 8, 2025, 12:15 AM

#

lol

rocky vigil Nov 8, 2025, 12:15 AM

#

oh shoot

prime mica Nov 8, 2025, 12:15 AM

#

ez

rocky vigil Nov 8, 2025, 12:15 AM

#

🤡

amber fern Nov 8, 2025, 12:18 AM

#

rocky vigil or smth

more like 100, this is urgent guys, maybe shashchess changes will giga scale with threat inputs omg

rocky vigil Nov 8, 2025, 12:19 AM

#

💀

rocky vigil Nov 8, 2025, 1:12 AM

#

this got buried

#

what do others think

prime mica Nov 8, 2025, 1:17 AM

#

clean up the code generally
is the biggest one to me lmao

rocky vigil Nov 8, 2025, 1:17 AM

#

that's the one where other ppl are gonna have to suggest stuff

prime mica Nov 8, 2025, 1:17 AM

#

I'm not a C++ guru tbh so i don't know how to make things clean

#

but I'll think about it

rocky vigil Nov 8, 2025, 1:19 AM

#

https://github.com/anematode/Stockfish/blob/threat-inputs-i8-st5/src/nnue/features/full_threats.h#L41

#

did nobody ever bother to delete these 25 lines

#

this has actually been unused since like

prime mica Nov 8, 2025, 1:20 AM

#

what's that thing called in genetics

rocky vigil Nov 8, 2025, 1:20 AM

#

shawn's first dual-accumulator

prime mica Nov 8, 2025, 1:21 AM

#

https://en.wikipedia.org/wiki/Genetic_hitchhiking

frosty imp Nov 8, 2025, 1:21 AM

#

prime mica what's that thing called in genetics

vestigial structures? Kappa

prime mica Nov 8, 2025, 1:21 AM

#

that too

#

I thought it was copied from the other features

frosty imp Nov 8, 2025, 1:22 AM

#

true

rocky vigil Nov 8, 2025, 1:24 AM

#

prime mica I thought it was copied from the other features

it's a relic of when full_threats used to also include the halfkav2hm

prime mica Nov 8, 2025, 1:25 AM

#

why is there this random massive memcpy in do_move

#

are we passing something by value

rocky vigil Nov 8, 2025, 1:25 AM

#

prime mica ok I'ma put this up on fishtest

wow that 40 core worker actually dumped negative llr on that test

prime mica Nov 8, 2025, 1:26 AM

#

omg DirtyBoardData is getting copied around for some reason

#

return value optimization isn't happening

rocky vigil Nov 8, 2025, 1:26 AM

#

👀

prime mica Nov 8, 2025, 1:27 AM

#

disgusting

#

ok I guess I'll make it an out parameter
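
A minimal sketch of the by-value versus out-parameter trade-off mentioned above. `BigData`, `make_by_value` and `make_into` are hypothetical stand-ins, not the actual `DirtyBoardData` code, and the "return one of two locals" pattern is just one common way copy elision gets blocked; the real cause in `do_move` may be different.

```cpp
#include <cassert>

int g_copies = 0;  // counts copy-constructions of BigData

// Stand-in for a large bookkeeping struct; name and layout are hypothetical.
struct BigData {
    int payload[256] = {};
    BigData() = default;
    BigData(const BigData& other) {
        for (int i = 0; i < 256; ++i) payload[i] = other.payload[i];
        ++g_copies;
    }
};

// Returning one of two named locals cannot be elided: the result is copy-constructed.
BigData make_by_value(bool quiet) {
    BigData a, b;
    a.payload[0] = 1;
    b.payload[0] = 2;
    return quiet ? a : b;
}

// Out-parameter version: the caller owns the storage, so nothing is copied.
void make_into(bool quiet, BigData& out) {
    out.payload[0] = quiet ? 1 : 2;
}

// Number of copy-constructions triggered by one by-value call.
int copies_for_by_value_call() {
    g_copies = 0;
    BigData v = make_by_value(true);
    assert(v.payload[0] == 1);
    return g_copies;
}
```

The out-parameter form is less pretty at the call site, but it makes the "no copy" property independent of what the optimizer decides.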

amber fern Nov 8, 2025, 1:27 AM

#

The stage 4 Threat inputs ltc test is steadily improving somehow btw, like it's -1.3 now, it was -2.3 before

#

is it cos of the arm testing hardware?

prime mica Nov 8, 2025, 1:28 AM

#

amber fern The stage 4 Threat inputs ltc test is steadily improving somehow btw, like its -...

idk what ur looking at

#

oh are you looking at https://tests.stockfishchess.org/tests/live_elo/690e8be1ec1d00d2c195c37f ?

#

I think I bungled something here but I'm not sure, breaking it up into each change to find out what the problem is

amber fern Nov 8, 2025, 1:28 AM

#

prime mica idk what ur looking at

https://tests.stockfishchess.org/tests/live_elo/690e1bbfec1d00d2c195c24e

prime mica Nov 8, 2025, 1:28 AM

#

oh you mean Elo

#

yeah who knows

#

when a test is hardware dependent it bounces around everywhere

amber fern Nov 8, 2025, 1:29 AM

#

yeah

rocky vigil Nov 8, 2025, 1:29 AM

#

just gotta sneak in some fleet for next progtest Kappa

prime mica Nov 8, 2025, 1:29 AM

#

ah

amber fern Nov 8, 2025, 1:29 AM

#

kinda annoying, should really have different architecture separation

amber fern Nov 8, 2025, 1:30 AM

#

prime mica ah

whats so ah?

rocky vigil Nov 8, 2025, 1:31 AM

#

prime mica ah

is this one line using like 4% of total runtime?

prime mica Nov 8, 2025, 1:31 AM

#

no lol

rocky vigil Nov 8, 2025, 1:31 AM

#

if I interpret the vmovdqu correctly

prime mica Nov 8, 2025, 1:31 AM

#

but it's memcpying it around a lot

#

we'll see

rocky vigil Nov 8, 2025, 1:31 AM

#

like there are 22 vmovdqus

#

and their average is like 0.3%

prime mica Nov 8, 2025, 1:31 AM

#

oh

#

it's a fraction of the method's execution time, not the whole program

rocky vigil Nov 8, 2025, 1:32 AM

#

oh

prime mica Nov 8, 2025, 1:32 AM

#

and do_move is like 6%

#

but we'll see

rocky vigil Nov 8, 2025, 1:32 AM

#

i see

prime mica Nov 8, 2025, 1:42 AM

#

hm we might want to add a template parameter to do_move which is whether or not to do dirty piece calculations

rocky vigil Nov 8, 2025, 1:45 AM

#

wouldn't it be true

#

most of the time tho

prime mica Nov 8, 2025, 1:47 AM

#

most of the time... but still adds a branch everywhere
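
A sketch of the template-parameter idea, assuming C++17. `Board`, `DirtyLog` and the body of `do_move` are placeholders, not Stockfish's real signatures. The point is only that the check becomes a compile-time decision, so the no-tracking instantiation contains no branch at all.

```cpp
#include <cassert>

struct DirtyLog { int entries = 0; };  // hypothetical stand-in for dirty piece/threat data

struct Board {
    int ply = 0;

    // TrackDirty is resolved at compile time; each instantiation is branch-free.
    template<bool TrackDirty>
    void do_move(DirtyLog* log) {
        ++ply;  // placeholder for the actual move-making work
        if constexpr (TrackDirty) {
            assert(log != nullptr);
            ++log->entries;  // placeholder for the dirty bookkeeping
        }
    }
};
```

Call sites then pick `do_move<true>(&log)` or `do_move<false>(nullptr)`. The cost is that everything `do_move` calls with the same condition has to be templated too.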

naive comet Nov 8, 2025, 1:51 AM

#

prime mica hm we might want to add a template parameter to `do_move` which is whether or no...

I tried it b4 I think

#

but worth a second try

prime mica Nov 8, 2025, 2:06 AM

#

ugh this is pissing me off

#

too many things to template

#

lol it gets memcpyed 3 times I think

#

i'm too lazy to make things nullable so here's what we're gonna do

#

Result of 100 runs
==================
base (...vx2.69a01b88) =    1261593  +/- 2401
test (./stockfish    ) =    1274991  +/- 2447
diff                   =     +13398  +/- 2445

speedup        = +0.0106
P(speedup > 0) =  1.0000

CPU: 128 x AMD EPYC 9755 128-Core Emb Processor

#

ok let's hope this works on fishtest

#

that's just removing one copy lol

#

ok time to try to remove the other one

jolly tangle Nov 8, 2025, 2:29 AM

#

can someone point me to the latest current threats input branch? I'm going to spend the weekend understanding it

prime mica Nov 8, 2025, 2:30 AM

#

sscg's threat-inputs-i8 is what I'm working off of

#

and was the basis for the recent SPRTs

#

lol

#

I really like this line

#

just make sure it's really 0

rocky vigil Nov 8, 2025, 2:38 AM

#

they are two different things

prime mica Nov 8, 2025, 2:39 AM

#

oh wait

#

I can't READ

#

wait no wonder it's not working

#

thx babe

#

OK

#

time to put on fishtest

#

ok three speedups to try out on fishtest... let's see how they work

rocky vigil Nov 8, 2025, 2:42 AM

#

in other news the stage 2 nets are somehow now extremely strong relative to training time

prime mica Nov 8, 2025, 2:42 AM

#

oh that's exciting

naive comet Nov 8, 2025, 2:42 AM

#

prime mica ``` Result of 100 runs ================== base (...vx2.69a01b88) = 1261593 +...

!!!!!!!!!!

prime mica Nov 8, 2025, 2:43 AM

#

rocky vigil in other news the stage 2 nets are somehow now extremely strong relative to trai...

what's different...

#

just the 254/255 thing?

#

or is there something else

rocky vigil Nov 8, 2025, 2:44 AM

#

prime mica just the 254/255 thing?

the new run is this yeah

#

255/256 instead of 127/128

jolly tangle Nov 8, 2025, 2:49 AM

#

prime mica sscg's threat-inputs-i8 is what I'm working off of

is that in a fork somewhere?

prime mica Nov 8, 2025, 2:49 AM

#

yes

#

mr. sscg13's fork

#

https://github.com/sscg13/Stockfish/tree/threat-inputs-i8

#

ok going out but will look for more speedups when I get back

rocky vigil Nov 8, 2025, 2:52 AM

#

nice nice

twilit oriole Nov 8, 2025, 2:52 AM

#

It is 3am

prime mica Nov 8, 2025, 2:52 AM

#

where

twilit oriole Nov 8, 2025, 2:52 AM

#

uk

prime mica Nov 8, 2025, 2:52 AM

#

you are correct

twilit oriole Nov 8, 2025, 2:53 AM

#

going out = Asia?

#

Everywhere else is night lol

rocky vigil Nov 8, 2025, 2:53 AM

#

could also be western americas

prime mica Nov 8, 2025, 2:56 AM

#

Go to bed dude

frosty imp Nov 8, 2025, 2:56 AM

#

rocky vigil could also be western americas

It’s not party time yet Kappa

prime mica Nov 8, 2025, 2:56 AM

#

Lol

#

Just dinner with a friend

rocky vigil Nov 8, 2025, 2:56 AM

#

jolly tangle is that in a fork somewhere?

for indexing, there is also yoshie's bullet configs, that might be more readable

prime mica Nov 8, 2025, 2:57 AM

#

Are there any ways to make LUT more compact

rocky vigil Nov 8, 2025, 2:57 AM

#

tbh like [12][64][12][64] is good enough i think

jolly tangle Nov 8, 2025, 2:58 AM

#

does bullet support threat inputs natively, or you need to manually do it

rocky vigil Nov 8, 2025, 2:58 AM

#

jolly tangle does bullet support threat inputs natively, or you need to manually do it

no, read yoshie's configs

rocky vigil Nov 8, 2025, 3:00 AM

#

rocky vigil tbh like [12][64][12][64] is good enough i think

you can reduce that last [64] with a popcount and another lookup

#

actually

#

that kinda defeats the purpose

#

yeah idt there's a fast way to decrease the current 2^20 size
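
One way the 2^20 figure can arise, sketched under the assumption that piece codes are padded to 4 bits (0..15) and squares use 6 bits (0..63). That padding is a guess that happens to match the number, and `lut_index` is a made-up name; a dense [12][64][12][64] table would have 589,824 entries instead.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit layout: attacker piece | from square | victim piece | to square.
constexpr std::uint32_t kLutEntries = 1u << 20;
static_assert(12u * 64u * 12u * 64u == 589824u, "dense table size for comparison");

inline std::uint32_t lut_index(unsigned attacker, unsigned from, unsigned victim, unsigned to) {
    assert(attacker < 16 && victim < 16 && from < 64 && to < 64);
    return (attacker << 16) | (from << 10) | (victim << 6) | to;
}
```

Shifts and ORs keep the index computation cheap, which is the trade against the unused slots.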

rocky vigil Nov 8, 2025, 3:18 AM

#

twilit oriole Anyways yes it is true I am somewhat annoyed at getting messaged multiple times ...

jolly tangle Nov 8, 2025, 3:40 AM

#

so if I'm understanding correctly, the threat inputs are tuples of (from square, to square, piece on from square, piece threatened on to square), counting both 'threats' of capturing my own pieces (i.e. defended pieces) as well as threats to capture the opponent's pieces. And one optimization is you de-duplicate symmetric threats. So e.g. if you have a bishop on A1 threatening a bishop on B2, then obviously that implies the bishop on B2 threatens the bishop on A1, so you fold those inputs together into one. The idea is e.g. we don't need 'pawn threatens bishop' as an input because 'bishop threatens pawn' already contains that information based on the input squares. You also skip symmetric NvN or BvB etc. attacks with a from < to comparison.

So essentially the network knows where each piece is relative to the king, and also knows all the pieces that are defended/defending/attacked/attacking it.
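
A simplified sketch of the `from < to` de-duplication described above. The enum, struct and function are hypothetical, and the rule is reduced to "same non-pawn piece type on both ends"; the real feature set also handles colours, pawns and the cross-type folding, which are left out here.

```cpp
#include <cassert>

enum PieceType { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };  // hypothetical encoding

struct Threat {
    PieceType attacker;
    int from;  // square index 0..63, a1 = 0
    PieceType victim;
    int to;
};

// Of two mirrored same-type threats (e.g. BvB), keep only the one with from < to.
bool keep_threat(const Threat& t) {
    assert(t.from != t.to);
    const bool mirrored = t.attacker == t.victim && t.attacker != PAWN;
    return !mirrored || t.from < t.to;
}
```

For the bishop example, a1 (0) attacking b2 (9) is kept and b2 attacking a1 is dropped.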

rocky vigil Nov 8, 2025, 3:41 AM

#

yep

#

if you don't do eval in check you can also skip X -> K, but this so far hasn't gained at fishtest

#

the threat inputs also have horizontal mirroring, but no additional buckets bc that would blow up the size

amber fern Nov 8, 2025, 4:51 AM

#

rocky vigil in other news the stage 2 nets are somehow now extremely strong relative to trai...

keep us updated on stage 3! 🙂

violet badger Nov 8, 2025, 5:58 AM

#

rocky vigil

looking good.

stray reef Nov 8, 2025, 7:06 AM

#

jolly tangle is that in a fork somewhere?

u can also check out plenty main if you want simpler inference

jolly tangle Nov 8, 2025, 7:07 AM

#

yeah I have been using plenty more than SF to understand 😂

prime mica Nov 8, 2025, 7:35 AM

#

Dodo and mallard looking ok so far… both apply to avx2

#

Time to see what else there is

#

maybe optimizing make_index

#

updated profile (sscg's threat-inputs-i8)

rocky vigil Nov 8, 2025, 7:50 AM

#

prime mica updated profile (sscg's threat-inputs-i8)

ok so i guess increasing L2 to 31 will cause 3% slowdown is my estimate

#

we'll see in a few days

prime mica Nov 8, 2025, 7:51 AM

#

gotcha, that puts more work on which function exactly?

#

the vpdpbusd spam?

rocky vigil Nov 8, 2025, 7:51 AM

#

on eval::nnue::network<...>::evaluate()

#

I think

frosty imp Nov 8, 2025, 7:51 AM

#

how good is i8 now

prime mica Nov 8, 2025, 7:51 AM

#

rocky vigil on eval::nnue::network<...>::evaluate()

huh

#

evaluate?

#

oh I meant in the source code not the profile

rocky vigil Nov 8, 2025, 7:52 AM

#

6.97%/1.55%

#

for 1024 / 128

#

oh

#

uh

#

on https://github.com/sscg13/Stockfish/blob/threat-inputs-i8/src/nnue/layers/affine_transform_sparse_input.h#L254

#

it'll put more pressure

prime mica Nov 8, 2025, 7:53 AM

#

ah yup

#

that loop is the bane of my existence

#

I finally realized why GCC bungles it in the old form and it's so dumb

#

it's because biases is right before weights

#

so because it loads from biases first, it thinks it's a good idea to replace weights[x] with biases[x + 1]

#

and then disaster ensues

rocky vigil Nov 8, 2025, 7:53 AM

#

what

prime mica Nov 8, 2025, 7:54 AM

#

in the assembly I mean

rocky vigil Nov 8, 2025, 7:54 AM

#

oh

#

idk what u are referring to

#

but

#

gl

#

hf

#

well basically after you find the nnz's

#

instead of adding a bunch of 16

#

vectors

#

u add a bunch of 32 vectors
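
A scalar sketch of what "find the nnz's, then add vectors" means for a sparse-input affine layer. Names and layout are illustrative, not the code in `affine_transform_sparse_input.h`. Each nonzero input contributes one weight row of length `outputs`, so going from 16 to 32 outputs doubles the work per nonzero but not the nonzero search itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// out[j] = bias[j] + sum over nonzero inputs i of input[i] * weights[i][j].
// weights is row-major with shape [inputs][outputs].
std::vector<std::int32_t> sparse_affine(const std::vector<std::uint8_t>& input,
                                        const std::vector<std::int8_t>& weights,
                                        const std::vector<std::int32_t>& bias) {
    const std::size_t outputs = bias.size();
    assert(weights.size() == input.size() * outputs);
    std::vector<std::int32_t> out(bias);
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == 0) continue;  // the "nnz" step: skip zero activations
        for (std::size_t j = 0; j < outputs; ++j)
            out[j] += static_cast<std::int32_t>(input[i]) * weights[i * outputs + j];
    }
    return out;
}
```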

prime mica Nov 8, 2025, 7:54 AM

#

https://github.com/official-stockfish/Stockfish/pull/6331

GitHub

Shave some instructions off a hot loop in affine transform by anema...

Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 44672 W: 11841 L: 11527 D: 21304
Ptnml(0-2): 165, 4625, 12415, 4993, 138
Non-functional speedup, so per vondele, no LTC test nece...

rocky vigil Nov 8, 2025, 7:54 AM

#

idk how much slower that would be

prime mica Nov 8, 2025, 7:54 AM

#

rocky vigil u add a bunch of 32 vectors

hm ok

#

I actually think that'll be less than a 2x slowdown

#

because

rocky vigil Nov 8, 2025, 7:55 AM

#

the entire propagation takes like 7%

prime mica Nov 8, 2025, 7:55 AM

#

the loop is actually currently somewhat bottlenecked by index calculations rather than actual vector instruction spam

rocky vigil Nov 8, 2025, 7:55 AM

#

i assume this propagate takes most of that time

prime mica Nov 8, 2025, 7:55 AM

#

at least on VNNI, I think on AVX512 and AVX2 it's the vector instructions

rocky vigil Nov 8, 2025, 7:55 AM

#

so idk what the relative ratio of nnz / avx

prime mica Nov 8, 2025, 7:55 AM

#

nnz is very little time IME

#

although probably more time on AVX2

#

I have a complicated theory to speed it up on AVX2 that I'll get to at some point... but doesn't benefit threat inputs any more than master

rocky vigil Nov 8, 2025, 7:56 AM

#

but yeah in 2 days or so we'll find out

prime mica Nov 8, 2025, 7:56 AM

#

👍

rocky vigil Nov 8, 2025, 7:57 AM

#

you're welcome to try it, just change l2Big in the source code from 15 to 31

prime mica Nov 8, 2025, 7:57 AM

#

which net should I use?

rocky vigil Nov 8, 2025, 7:57 AM

#

and download stage 1 from https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12017677484

#

for speed testing purposes

prime mica Nov 8, 2025, 7:57 AM

#

perfect ok

frosty imp Nov 8, 2025, 8:01 AM

#

What’s the current progress?

#

Did QAT work? Has i8 beaten the baseline

rocky vigil Nov 8, 2025, 8:02 AM

#

frosty imp Did QAT work? Has i8 beaten the baseline

ye by like 10 elo lmao

#

at stc

#

i8 is effectively the main branch rn

frosty imp Nov 8, 2025, 8:03 AM

#

And QAT?

rocky vigil Nov 8, 2025, 8:03 AM

#

frosty imp And QAT?

still running

#

give it 12 hours

prime mica Nov 8, 2025, 8:06 AM

#

lol the branch predictor HATES this section

#

all the conditions are like 20 to 60% probability and unpredictable

#

idk how to reduce the # of branches tho, let's see

lapis parrot Nov 8, 2025, 8:07 AM

#

static exchange evaluation

prime mica Nov 8, 2025, 8:07 AM

#

u are a funny one

#

wait why is append_changed_indices only used in double_inc_update

#

oh FeatureSet is a template parameter blob_facepalm

prime mica Nov 8, 2025, 8:48 AM

#

@foggy wind (and others willing): could u do a bench on https://tests.stockfishchess.org/tests/view/690f03b0ec1d00d2c195c3da

#

```
==================
base (...vx2.69a01b88) =    1284931  +/- 1876
test (./stockfish    ) =    1295177  +/- 1889
diff                   =     +10246  +/- 2242

speedup        = +0.0080
P(speedup > 0) =  1.0000
```
for me but who knows if it's real

#

ok time to revisit LUT...

#

sscg I like your idea of making the table 64x smaller by not including from in the index

#

attacks_bb with two arguments looks pretty fast

#

another really demented idea is to re-index the features in a deliberately inefficient way... using the fact that the unused indices won't use any cache space

#

but making the indexing very fast

#

however it'd use a lot of RAM for zeros :/

rocky vigil Nov 8, 2025, 9:19 AM

#

prime mica sscg I like your idea of making the table 64x smaller by not including `from` in...

it's not 64x smaller

#

wait

#

no it can be 64x smaller

#

yeah

#

for the price of one attacks_bb lookup and some arithmetic

prime mica Nov 8, 2025, 9:21 AM

#

yep

rocky vigil Nov 8, 2025, 9:22 AM

#

forgot the exact order of indexing myself briefly lol

prime mica Nov 8, 2025, 9:22 AM

#

index luvr 9000

#

anyway signing off for tn we'll see if any of the speedups make it

rocky vigil Nov 8, 2025, 9:23 AM

#

ye

#

fair

torn lagoon Nov 8, 2025, 9:27 AM

#

prime mica @foggy wind (and others willing): could u do a bench on https://tests....

```
Result of 200 runs
==================
base (...at-inputs-i8) =    2027832  +/- 3402
test (../snowy-plover) =    2010917  +/- 3329
diff                   =     -16915  +/- 1227

speedup        = -0.0083
P(speedup > 0) =  0.0000

CPU: 6 x AMD Ryzen 5 9600X 6-Core Processor
Hyperthreading: on
```

```
Result of 200 runs
==================
base (...ad-inputs-i8) =     807325  +/- 3379
test (../snowy-plover) =     814006  +/- 3549
diff                   =      +6681  +/- 570

speedup        = +0.0083
P(speedup > 0) =  1.0000

CPU: 4 x Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
Hyperthreading: on
```

prime mica Nov 8, 2025, 9:28 AM

#

wut

#

compiler?

#

that is so weird this one shouldn't be arch dependent

torn lagoon Nov 8, 2025, 9:33 AM

#

prime mica compiler?

GCC 15.2.0 on AMD, GCC 15.2.1 on Intel

prime mica Nov 8, 2025, 9:33 AM

#

😭

#

can I just send u two binaries and u try

#

that way I can know the assembly

torn lagoon Nov 8, 2025, 9:34 AM

#

Yeah

prime mica Nov 8, 2025, 9:34 AM

#

although I only have Linux

torn lagoon Nov 8, 2025, 9:34 AM

#

prime mica although I only have Linux

Both boxes are linux

prime mica Nov 8, 2025, 9:34 AM

#

dunno how to cross compile to windows

#

Oh! perfect ok

#

I'll send tmrw

#

gn

torn lagoon Nov 8, 2025, 9:37 AM

#

gn

foggy wind Nov 8, 2025, 9:54 AM

#

prime mica @foggy wind (and others willing): could u do a bench on https://tests....

Result of 200 runs
==================
base (...sh_avx2.base) =    1730842  +/- 3170
test (...snowy-plover) =    1750735  +/- 3277
diff                   =     +19893  +/- 1960

speedup        = +0.0115
P(speedup > 0) =  1.0000

torn lagoon Nov 8, 2025, 10:07 AM

#

Oh my benchmarks were with the default arch, so avx512icl

torn lagoon Nov 8, 2025, 10:07 AM

#

torn lagoon ```Result of 200 runs ================== base (...at-inputs-i8) = 2027832 +/...

This may be the reason for those results

naive comet Nov 8, 2025, 10:40 AM

#

what if instead of slli+srai, we do shuf+srai? cuz idk free ports or smth smth @stray reef @prime mica

stray reef Nov 8, 2025, 11:08 AM

#

naive comet what if instead of slli+srai, we do shuf+srai? cuz idk free ports or smth smth <...

i got something to work with _mm512_shuffle_epi8, but it doesn't seem to be a speedup locally

naive comet Nov 8, 2025, 11:09 AM

#

mmm

plain flower Nov 8, 2025, 2:07 PM

#

p5 is usually quite constrained so i tend to avoid shuffles where possible as a rule of thumb

rocky vigil Nov 8, 2025, 2:08 PM

#

rocky vigil

speaking of this should we set this as reference

#

for the QAT run testing

violet badger Nov 8, 2025, 4:21 PM

#

probably, do you have this already merged in a branch?

violet badger Nov 8, 2025, 7:23 PM

#

I've updated reference for the QAT run, as this has been integrated in sscg13 latest branch

#

on the arm nodes, threat-inputs-i8 is now just 2% slower than master...

==== master ====
1 Nodes/second : 298577711
2 Nodes/second : 297081100
Average (over 2):  297829405
==== threat-inputs-i8 ====
1 Nodes/second : 291537416
2 Nodes/second : 292647054
Average (over 2):  292092235

violet badger Nov 8, 2025, 10:40 PM

#

so QAT training run doesn't seem to yield anything stronger than what we have so far.

prime mica Nov 8, 2025, 11:33 PM

#

:(

#

@foggy wind can u run your bucketing script on https://tests.stockfishchess.org/tests/view/690e9695ec1d00d2c195c38f ? I bungled the arch filtering so I can’t see the effect on AVX2 lmao

#

The patch does nothing elsewhere

foggy wind Nov 8, 2025, 11:36 PM

#

prime mica @foggy wind can u run your bucketing script on https://tests.stockfish...

GROUPED BY ARCH

64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                | Elo: -0.58 ± 1.67 | LOS:  24.7% | LLR: -1.46 | [131, 4300, 11127, 4261, 117]
64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo: -0.51 ± 2.20 | LOS:  32.5% | LLR: -0.79 | [64, 2592, 6337, 2592, 47]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                       | Elo:  5.24 ± 2.42 | LOS: 100.0% | LLR:  3.06 | [45, 1985, 5248, 2248, 58]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT                            | Elo:  0.38 ± 2.58 | LOS:  61.5% | LLR: -0.06 | [57, 1902, 4641, 1949, 43]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                  | Elo: -1.18 ± 4.58 | LOS:  30.7% | LLR: -0.30 | [24, 626, 1515, 615, 20]

prime mica Nov 8, 2025, 11:40 PM

#

Ok that’s good

#

I’ll restart it with the proper filtering ig

#

Thx legend

foggy wind Nov 8, 2025, 11:41 PM

#

Do you think that's necessary? The bmi2 and avx2 values are relevant, right? Together, they look good.

prime mica Nov 8, 2025, 11:43 PM

#

ya but it'd be nice to get an official estimate of the magnitude

#

idk

green moat Nov 8, 2025, 11:51 PM

#

@violet badger
The threats.yaml recipe
https://github.com/vondele/nettest/blob/main/threats.yaml
still contains duplicated binpacks that were removed from large.yaml recipe.
Do you think it might be worth to test the removal of those binpacks in threats.yaml recipe as well?

prime mica Nov 8, 2025, 11:55 PM

#

violet badger on the arm nodes, threat-inputs-i8 is now just 2% slower than master... ``` ====...

this is nuts

#

how is this even possible...

foggy wind Nov 9, 2025, 12:03 AM

#

prime mica ya but it'd be nice to get an official estimate of the magnitude

So, if I'm not mistaken, the AVX2 + BMI2 results have an Elo rating of 2.94 +/- 1.77 and an LLR of 2.98. So this is already a passed test in these architectures.
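The bucketing script itself is not shown in this thread, so the following is only a sketch of how the combined figure can be reproduced, assuming the five bracketed numbers are pentanomial game-pair counts (pairs scoring 0, 0.5, 1, 1.5 and 2 points) and that Elo is the usual logistic conversion of the mean score. The names `Ptnml`, `merge`, `mean_score` and `elo` are made up for illustration. Summing the BMI2 and plain AVX2 rows and converting gives roughly 2.94, matching the number quoted above.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Pentanomial counts: game pairs scoring 0, 0.5, 1, 1.5 and 2 points.
using Ptnml = std::array<std::int64_t, 5>;

// Element-wise sum of two architecture buckets.
inline Ptnml merge(const Ptnml& a, const Ptnml& b) {
    Ptnml out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}

// Mean score per game, in [0, 1]: bucket i is worth i/4 of the pair's points.
inline double mean_score(const Ptnml& p) {
    double pairs = 0.0, points = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        pairs += double(p[i]);
        points += double(p[i]) * (double(i) / 4.0);
    }
    return points / pairs;
}

// Logistic Elo difference implied by the mean score.
inline double elo(const Ptnml& p) {
    const double s = mean_score(p);
    return 400.0 * std::log10(s / (1.0 - s));
}
```

Usage: `elo(merge({45, 1985, 5248, 2248, 58}, {57, 1902, 4641, 1949, 43}))`. The error bar and LLR need the variance of the pair scores as well and are left out of this sketch.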

#

You could also get machines on fishtest to run with a simpler architecture than the default. I did it this way in the past: https://github.com/official-stockfish/Stockfish/compare/master...Torom:Stockfish:NoAVX512

prime mica Nov 9, 2025, 12:43 AM

#

ye

prime mica Nov 9, 2025, 12:44 AM

#

foggy wind You could also get machines on fishtest to run with a simpler architecture than ...

the issue with this is that we optimize for the default build

#

there are plenty of changes which are amazing on my computer's avx2 but fail on my friend's machine etc.

prime mica Nov 9, 2025, 1:33 AM

#

ugh why did snowy-egret fail

#

it must be the dumb optional dirty threats stuff

rocky vigil Nov 9, 2025, 1:42 AM

#

prime mica ugh why did snowy-egret fail

🐟 🧪 move_brilliant 📉 move_dubious

prime mica Nov 9, 2025, 1:42 AM

#

maybe the optimizer realizes the computations are unused and can throw them away on the original

prime mica Nov 9, 2025, 1:42 AM

#

rocky vigil 🐟 🧪 move_brilliant 📉 move_dubious

story of my life bro

green moat Nov 9, 2025, 1:44 AM

#

prime mica story of my life bro

seems good

prime mica Nov 9, 2025, 1:44 AM

#

kyoot

prime mica Nov 9, 2025, 2:00 AM

#

hmmm maybe instead of a big LUT we can just micro optimize the existing calculations

#

in particular not using LUTs and instead using constants...

rocky vigil Nov 9, 2025, 2:02 AM

#

prime mica in particular not using LUTs and instead using constants...

how would you propose doing this

prime mica Nov 9, 2025, 2:03 AM

#

```
0xcafebabe30913928 >> (attkr & 7)
```

#

type beat

#

especially because the constants could probably be hoisted given that make_index is usually used in a loop

#

we can generate the constants via constexpr so it shouldn't be unreadable

#

but idk I'll try the medium-sized LUT first

#

and see how it does on fishtest
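A minimal sketch of the "constant instead of LUT" idea from the messages above, assuming the table has at most eight byte-sized entries so it fits in one 64-bit immediate. The table values, `pack_bytes`, `kExampleTable` and `lookup` are placeholders invented for illustration, not anything from the actual patch; the point is only that the constant can be generated by `constexpr` code rather than typed by hand.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Pack eight byte-sized table entries into one 64-bit constant at compile time.
constexpr std::uint64_t pack_bytes(const std::array<std::uint8_t, 8>& table) {
    std::uint64_t packed = 0;
    for (std::size_t i = 0; i < table.size(); ++i)
        packed |= std::uint64_t(table[i]) << (8 * i);
    return packed;
}

// Placeholder values only; the real per-piece data is not shown in the chat.
constexpr std::array<std::uint8_t, 8> kExampleTable = {0, 3, 5, 7, 11, 13, 17, 0};
constexpr std::uint64_t kPacked = pack_bytes(kExampleTable);

// The "lookup" is a shift and a truncation: no memory load is involved.
constexpr std::uint8_t lookup(unsigned piece) {
    return std::uint8_t(kPacked >> (8 * (piece & 7)));
}

static_assert(lookup(4) == 11, "entry 4 of the placeholder table");
```

Because `kPacked` is a compile-time constant, a compiler is free to keep it in a register across a loop that calls `lookup` repeatedly, which is the hoisting mentioned above.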

rocky vigil Nov 9, 2025, 2:06 AM

#

yeah

#

[16][64][16] or smth

#

this isn't even that bad

#

much smaller than pext lookups
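A back-of-the-envelope check of the size claim, under stated assumptions: the `[16][64][16]` table is taken to be indexed by attacker piece, target square and attacked piece, the entry width of 4 bytes is a guess, and the pext attack table sizes (0x19000 rook entries and 0x1480 bishop entries of 8 bytes each) are the ones I believe Stockfish uses. All constant names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Number of entries in a three-dimensional table.
constexpr std::size_t lut_entries(std::size_t a, std::size_t b, std::size_t c) {
    return a * b * c;
}

constexpr std::size_t kEntryBytes = 4;                        // assumed entry width
constexpr std::size_t kMedium     = lut_entries(16, 64, 16);  // without `from`
constexpr std::size_t kWithFrom   = kMedium * 64;             // with `from` in the index

// Assumed sizes of the pext sliding-attack tables, in 8-byte bitboards.
constexpr std::size_t kPextBytes = (0x19000 + 0x1480) * sizeof(std::uint64_t);

static_assert(kMedium == 16384, "16 * 64 * 16 entries");
static_assert(kWithFrom / kMedium == 64, "dropping `from` shrinks the table 64x");
static_assert(kMedium * kEntryBytes < kPextBytes, "64 KiB versus roughly 841 KiB");
```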

prime mica Nov 9, 2025, 2:08 AM

#

^_^

naive comet Nov 9, 2025, 2:53 AM

#

prime mica ``` 0xcafebabe30913928 >> (attkr & 7)```

I recall doing smth like this before

#

https://tests.stockfishchess.org/tests/view/690190cb637acd2a11e737dc

#

combining this into 1 mask was slower for me tho locally

rocky vigil Nov 9, 2025, 2:55 AM

#

i feel like lookup tables aren't actually that slow as long as they're small and fit in cache

prime mica Nov 9, 2025, 2:56 AM

#

agree for the most part

#

main issue is if a branch depends on the result

#

because they can have quite high latency

rocky vigil Nov 9, 2025, 3:00 AM

#

i mean for the threat indexing i would guess the CPU attempts to do all of them simultaneously

#

while it waits for the lookups

#

citation needed

#

i am nowhere near a cpu expert

prime mica Nov 9, 2025, 3:01 AM

#

sure

#

it might end up being profitable summing two LUTs one indexed with attkr/to/attkd and one attkr/from/to

#

the latter can be byte-sized

#

will try both

#

actually yeah this seems rly succulent

#

the latter will have quite good cache locality while the first one will be relatively smol

prime mica Nov 9, 2025, 4:46 AM

#

why tf is PIECE_TYPE_NB 8 when there are 6 piece types (or 7 if you include no piece)

rocky vigil Nov 9, 2025, 4:47 AM

#

that would be because

#

0 and 7 mod 8

#

are no-piece

prime mica Nov 9, 2025, 4:50 AM

#

psyduck

#

advanced

rocky vigil Nov 9, 2025, 4:56 AM

#

wait

#

i forgor

prime mica Nov 9, 2025, 4:56 AM

#

ok this is the most demented codegen I've ever seen

rocky vigil Nov 9, 2025, 4:56 AM

#

maybe it's PAWN = 2

#

idk

prime mica Nov 9, 2025, 4:56 AM

#

GCC uses a mask register as a temp register

#

rocky vigil Nov 9, 2025, 4:56 AM

#

i just know that of 0-7 two are no piece

prime mica Nov 9, 2025, 4:57 AM

#

yeah I think 0 and 7 are the no piece
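A sketch of why eight slots per colour is convenient, assuming the common layout where a piece is packed as colour * 8 + type (this mirrors my understanding of Stockfish's enums, but treat the details as an assumption). With a power-of-two stride, extracting the type or colour is a single mask or shift. Under this layout 0 is the no-piece value and 7 is simply unused.

```cpp
enum Color : int { WHITE, BLACK };
enum PieceType : int {
    NO_PIECE_TYPE, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING,
    PIECE_TYPE_NB = 8  // padded to a power of two; value 7 is never a real type
};

// Pack colour and type into one small integer: colour in bit 3, type in bits 0-2.
constexpr int make_piece(Color c, PieceType pt) { return (c << 3) + pt; }

// Unpacking is a mask and a shift, with no division or table lookup.
constexpr PieceType type_of(int pc) { return PieceType(pc & 7); }
constexpr Color color_of(int pc) { return Color(pc >> 3); }

static_assert(type_of(make_piece(BLACK, KNIGHT)) == KNIGHT, "type survives the round trip");
static_assert(color_of(make_piece(BLACK, KNIGHT)) == BLACK, "colour survives the round trip");
```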

rocky vigil Nov 9, 2025, 4:57 AM

#

idk what that means

#

(using mask register as temp register)

#

lmao

prime mica Nov 9, 2025, 4:57 AM

#

mask registers are this obscure type of register introduced in AVX512

#

and it's very weird to be using them for integers...

#

sigh

#

can anyone bench https://tests.stockfishchess.org/tests/live_elo/69102050ec1d00d2c195c4f1 on their computer

warm thistle Nov 9, 2025, 5:12 AM

#

```
Result of  20 runs
==================
base (./sf-old       ) =    1380434  +/- 10321
test (./stockfish    ) =    1406364  +/- 10545
diff                   =     +25930  +/- 2466

speedup        = +0.0188
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 7700X 8-Core Processor
Hyperthreading: on
```
on my cpu

frosty imp Nov 9, 2025, 5:12 AM

#

prime mica ok this is the most demented codegen I've ever seen

what is this new optimization meta

prime mica Nov 9, 2025, 5:14 AM

#

warm thistle ``` Result of 20 runs ================== base (./sf-old ) = 1380434 +...

O shit

#

ok let's hope this works on fishtest 🤞

warm thistle Nov 9, 2025, 5:14 AM

#

👀

prime mica Nov 9, 2025, 5:14 AM

#

can't profile locally atm bc my friend is running some weird DFT simulation lol

prime mica Nov 9, 2025, 5:14 AM

#

frosty imp what is this new optimization meta

GNU

prime mica Nov 9, 2025, 5:16 AM

#

warm thistle 👀

could u double check that the benches are the same

#

just for my peace of mind lol

#

should be 24..

warm thistle Nov 9, 2025, 5:17 AM

#

matches up

prime mica Nov 9, 2025, 5:17 AM

#

😊 thank u sir

#

https://tests.stockfishchess.org/tests/view/69102686ec1d00d2c195c4f7 @warm thistle when u have time if u could check this one that'd be swell

warm thistle Nov 9, 2025, 5:29 AM

#

on it

#

(oh i'm compiling without pgo btw idk if that will affect anything)

prime mica Nov 9, 2025, 5:29 AM

#

that's ok

#

as long as both are w/o PGO

rocky vigil Nov 9, 2025, 5:30 AM

#

oh yeah later on u can update the net

#

forgot to mention

#

it's +3 elo on vondele local test

warm thistle Nov 9, 2025, 5:30 AM

#

Result of  20 runs
==================
base (./sf-old       ) =    1392913  +/- 11095
test (./stockfish    ) =    1421577  +/- 10865
diff                   =     +28663  +/- 3084

speedup        = +0.0206
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 7700X 8-Core Processor
Hyperthreading: on

prime mica Nov 9, 2025, 5:31 AM

#

ok, so maybe slightly better

prime mica Nov 9, 2025, 5:31 AM

#

rocky vigil it's +3 elo on vondele local test

ooh cool, versus which?

warm thistle Nov 9, 2025, 5:31 AM

#

warm thistle ``` Result of 20 runs ================== base (./sf-old ) = 1392913 +...

crazy speedup 🤤

prime mica Nov 9, 2025, 5:31 AM

#

I'm trying 😭

#

threat inputs my beloved

rocky vigil Nov 9, 2025, 5:31 AM

#

rocky vigil

@prime mica see meme

prime mica Nov 9, 2025, 5:31 AM

#

lollll

#

reference = master?

warm thistle Nov 9, 2025, 5:32 AM

#

[-1, 1] wtf lol

prime mica Nov 9, 2025, 5:32 AM

#

is this on ARM tho

rocky vigil Nov 9, 2025, 5:32 AM

#

prime mica `reference` = `master`?

reference = f3fa...

prime mica Nov 9, 2025, 5:32 AM

#

ah the stage 4 net we've been testing with?

rocky vigil Nov 9, 2025, 5:32 AM

#

yeah

prime mica Nov 9, 2025, 5:32 AM

#

ok that's exciting

rocky vigil Nov 9, 2025, 5:32 AM

#

new one is indeed very good

#

cj speaking misinfo 🗣️

#

that 255/256 instead of 127/128 is 3 elo

prime mica Nov 9, 2025, 5:33 AM

#

lololol

rocky vigil Nov 9, 2025, 5:33 AM

#

not negligible

prime mica Nov 9, 2025, 5:33 AM

#

maybe time for a new SPRT soon then?

rocky vigil Nov 9, 2025, 5:34 AM

#

ye after speedup

prime mica Nov 9, 2025, 5:34 AM

#

👍

lapis parrot Nov 9, 2025, 6:52 AM

#

@prime mica you should name at least one of your patches titmouse

#

funneh name will pass 100%

prime mica Nov 9, 2025, 6:52 AM

#

lol

#

bushtit

#

blue footed booby

#

etc.

#

African wild ass

naive comet Nov 9, 2025, 7:00 AM

#

prime mica can anyone bench https://tests.stockfishchess.org/tests/live_elo/69102050ec1d00d...

isnt this 2 changes in 1

#

I did this concept before previously and it ended up -0.5%...

prime mica Nov 9, 2025, 7:01 AM

#

can I see what you did exactly

#

screenshot maybe

prime mica Nov 9, 2025, 7:02 AM

#

naive comet combining this into 1 mask was slower for me tho locally

if it's this... then that's not the same idea

#

bc here we're plopping it into a 64 bit integer

naive comet Nov 9, 2025, 7:02 AM

#

"combining this into 1 mask was slower for me tho locally"

prime mica Nov 9, 2025, 7:03 AM

#

oh I see sorry

naive comet Nov 9, 2025, 7:03 AM

#

ill see if i can find it cuz i didnt commit cuz it was slower

#

uhh

#

let me look in my recycle bin

prime mica Nov 9, 2025, 7:03 AM

#

also in https://tests.stockfishchess.org/tests/live_elo/69102686ec1d00d2c195c4f7 I tried putting the shift amount in the LUT

#

which I think should work quite nicely tbh

#

WAIT

#

we don't even need that do we

#

it can just be a bit in the LUT

#

😭

#

I missed the forest for the trees

#

that is funny

naive comet Nov 9, 2025, 7:06 AM

#

prime mica also in https://tests.stockfishchess.org/tests/live_elo/69102686ec1d00d2c195c4f7...

I will support this concept ^^

prime mica Nov 9, 2025, 7:32 AM

#

ok ostrich time

#

I think this should be nearly optimal index calculation at least with this LUT setup

#

checking whether the feature doesn't exist is four instructions on ARM and x86 🤓

rocky vigil Nov 9, 2025, 7:35 AM

#

prime mica checking whether the feature doesn't exist is four instructions on ARM and x86 🤓...

you don't need to do "return dimensions" specifically, as long as you guarantee it's >= dimensions

prime mica Nov 9, 2025, 7:35 AM

#

yeah ik

#

my hope with the sf_assume(index != Dimensions) is that (some) compilers will be able jump thread the continue and/or elide one compare

#

GCC seems to do so, haven't tested clang
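The sentinel convention discussed above can be sketched as follows. This is not the patch's code: `make_index_sketch`, `active_indices` and the value of `Dimensions` are placeholders, and the real index function is table-driven. The sketch only shows that a nonexistent feature can be reported as any value `>= Dimensions` and rejected with one compare in the caller.

```cpp
#include <cstdint>
#include <vector>

using IndexType = std::uint32_t;
constexpr IndexType Dimensions = 64;  // placeholder, not the real feature count

// Stand-in for the index function: anything outside the valid range is
// reported as some value >= Dimensions (here exactly Dimensions).
inline IndexType make_index_sketch(int raw) {
    return (raw >= 0 && raw < int(Dimensions)) ? IndexType(raw) : Dimensions;
}

// Keep only the features that exist; a single compare filters every sentinel.
inline std::vector<IndexType> active_indices(const std::vector<int>& raws) {
    std::vector<IndexType> out;
    for (int raw : raws) {
        const IndexType index = make_index_sketch(raw);
        if (index >= Dimensions)
            continue;  // feature does not exist
        out.push_back(index);
    }
    return out;
}
```

An assumption hint such as the `sf_assume(index != Dimensions)` mentioned above would go after the `continue`, telling the compiler the sentinel cannot reach the code that follows.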

rocky vigil Nov 9, 2025, 7:37 AM

#

ah I see

prime mica Nov 9, 2025, 7:49 AM

#

ok so current state of affairs: we should be able to do plover + mallard + dodo + ostrich

#

dodo is AVX2 only but the rest are general

#

once all of them are resolved maybe we can SPRT against master with the new net

rocky vigil Nov 9, 2025, 7:50 AM

#

prime mica ok so current state of affairs: we should be able to do plover + mallard + dodo ...

oh

#

interesting

violet badger Nov 9, 2025, 7:50 AM

#

sounds good.

#

but why are we excluding Scolopax minor from the list?

torn lagoon Nov 9, 2025, 7:55 AM

#

torn lagoon ```Result of 200 runs ================== base (...at-inputs-i8) = 2027832 +/...

@prime mica do you still need speedtests for this patch?

prime mica Nov 9, 2025, 7:55 AM

#

ostrich > cassowary > woodcock

#

they're just iterations on the same idea

#

but I'm leaving them up in case I made a mistake...

prime mica Nov 9, 2025, 7:55 AM

#

torn lagoon @prime mica do you still need speedtests for this patch?

no need

#

but lemme try combining the four patches and we can see whether the speedups are ~additive

#

if u could benchmark that that'd be greatly appreciated

#

my computer is currently suffering my friend's DFT simulation

torn lagoon Nov 9, 2025, 7:57 AM

#

prime mica if u could benchmark that that'd be greatly appreciated

So ostrich?

prime mica Nov 9, 2025, 7:57 AM

#

oh yeah if you could do that one by itself that'd be great

warm thistle Nov 9, 2025, 7:57 AM

#

prime mica bushtit

👀

torn lagoon Nov 9, 2025, 7:57 AM

#

On it

prime mica Nov 9, 2025, 7:58 AM

#

Kevin got 2.88% speedup

#

which is roughly expected

#

but it'll probs be less on fishtest

torn lagoon Nov 9, 2025, 8:18 AM

#

```
==================
base (...at-inputs-i8) =    2023444  +/- 1829
test (../ostrich     ) =    2050542  +/- 1935
diff                   =     +27097  +/- 1232

speedup        = +0.0134
P(speedup > 0) =  1.0000

CPU: 6 x AMD Ryzen 5 9600X 6-Core Processor
Hyperthreading: on
```

```
Result of 200 runs
==================
base (...at-inputs-i8) =     813188  +/- 3192
test (../ostrich     ) =     828548  +/- 3321
diff                   =     +15360  +/- 588

speedup        = +0.0189
P(speedup > 0) =  1.0000

CPU: 4 x Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
Hyperthreading: on
```

prime mica Nov 9, 2025, 8:18 AM

#

ok cool

#

so not amazing but not zero

#

could u also try https://github.com/anematode/Stockfish/tree/moa

#

that has the things combined

rocky vigil Nov 9, 2025, 8:19 AM

#

lmao

#

everything everywhere all at once

prime mica Nov 9, 2025, 8:19 AM

#

yes

#

moas are an enormous extinct bird

warm thistle Nov 9, 2025, 8:20 AM

#

Result of  20 runs
==================
base (./sf-old       ) =    1347420  +/- 8957
test (./stockfish    ) =    1384998  +/- 8926
diff                   =     +37578  +/- 2477

speedup        = +0.0279
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 7700X 8-Core Processor
Hyperthreading: on

#

on mine

prime mica Nov 9, 2025, 8:21 AM

#

huh

#

that doesn't add up

#

should be more like +4% lol

#

could you try git revert de7fbe6b and re-compile/re-bench

#

possible that one of them doesn't like the other

warm thistle Nov 9, 2025, 8:24 AM

#

yes i'll try

prime mica Nov 9, 2025, 8:24 AM

#

thx legend

warm thistle Nov 9, 2025, 8:25 AM

#

helping in any way i can because i am not good at optim 💀

prime mica Nov 9, 2025, 8:25 AM

#

lol

#

I was not either until the pandemic when I was so bored I wrote https://github.com/anematode/high-perf-bogosort/tree/main

GitHub

GitHub - anematode/high-perf-bogosort: High-performance bogosort

High-performance bogosort. Contribute to anematode/high-perf-bogosort development by creating an account on GitHub.

#

then I got pissed off at how inefficient 95% of modern software is

warm thistle Nov 9, 2025, 8:25 AM

#

Result of  20 runs
==================
base (./sf-old       ) =    1308364  +/- 14174
test (./stockfish    ) =    1338017  +/- 14966
diff                   =     +29652  +/- 2742

speedup        = +0.0227
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 7700X 8-Core Processor
Hyperthreading: on

prime mica Nov 9, 2025, 8:26 AM

#

hm ok

warm thistle Nov 9, 2025, 8:26 AM

#

the only optim i know is rewriting things in asm ☠️

prime mica Nov 9, 2025, 8:26 AM

#

maybe the +2.88% measurement was a fluke then

warm thistle Nov 9, 2025, 8:26 AM

#

could be natural variation i guess lol

prime mica Nov 9, 2025, 8:26 AM

#

oh well we'll see on fishtest etc.

violet badger Nov 9, 2025, 8:26 AM

#

also running speedtest here..

#

4c902c7c443b6b71d06895097ecd8d85aa3a2e99 vs caee28c4e8fe2ea52d191ee56e27b1cb9cebabf3

prime mica Nov 9, 2025, 8:27 AM

#

lol what are those

violet badger Nov 9, 2025, 8:27 AM

#

moa vs no-bird

prime mica Nov 9, 2025, 8:27 AM

#

loll kk

violet badger Nov 9, 2025, 8:28 AM

#

somehow it helps often afterwards to understand what was really what. One reason to use shas in nettest to pin versions, not branches.

prime mica Nov 9, 2025, 8:28 AM

#

agree

#

these days I have a bins folder formatted as stockfish.<arch>.<compiler>.<first nibbles of hash>

violet badger Nov 9, 2025, 8:30 AM

#

makes sense... assuming you always use 'profile-build' 😉

prime mica Nov 9, 2025, 8:30 AM

#

lololol

#

that has bitten me before

violet badger Nov 9, 2025, 8:30 AM

#

yeah, easy enough.

#

virtually no gain on the arm nodes, but no damage done either:

==== 4c902c7c443b6b71d06895097ecd8d85aa3a2e99 ====
1 Nodes/second : 292248079
2 Nodes/second : 290685623
Average (over 2):  291466851
==== caee28c4e8fe2ea52d191ee56e27b1cb9cebabf3 ====
1 Nodes/second : 292968214
2 Nodes/second : 292490843
Average (over 2):  292729528

prime mica Nov 9, 2025, 8:44 AM

#

that's a little surprising honestly

#

I'll investigate later, I'm guessing at least one of the changes is counterproductive on ARM

violet badger Nov 9, 2025, 8:46 AM

#

let me just run a few more of the shas and we'll see in a bit.

prime mica Nov 9, 2025, 8:46 AM

#

cool thxx

#

testing on Apple silicon as well

violet badger Nov 9, 2025, 8:49 AM

#

Maybe somebody else can run the following script:

$ cat speedtesting.sh 
set -e
max_rep=2
for branch in 4c902c7c443b6b71d06895097ecd8d85aa3a2e99 caee28c4e8fe2ea52d191ee56e27b1cb9cebabf3 de7fbe6ba5c98b585aadb0a81ea7c3cb12fc4d54 a227ff9916e98a33c198a1749f23f2783934be8a 064d09f40f0ef0e9daf19609faaf79aa471ec362 a6721c73eea0b11eaf9b574ed509265211be5738 d8eb3207dbf7f264035ee52ddfa1db9c39471841 638e1786bde4e1f3eb17be5a32da3650666318d9
do
 echo "==== $branch ===="
 if [ ! -f stockfish.$branch ]; then
   git checkout $branch > compile.out.$branch 2>&1
   make -j profile-build >> compile.out.$branch 2>&1
   mv stockfish stockfish.$branch
 fi
 for iter in $(seq 1 ${max_rep})
 do
   if [ ! -f speedtest.out.$branch.$iter ]; then
     ./stockfish.$branch speedtest > speedtest.out.$branch.$iter 2>&1
   fi
   echo $iter $(grep "Nodes/second" speedtest.out.$branch.$iter)
 done
 echo "Average (over $max_rep): " $(grep "Nodes/second" speedtest.out.$branch.* | awk '{s=s+$NF; c++}END{printf("%16d\n",int(s/c))}')
done

prime mica Nov 9, 2025, 8:50 AM

#

not sure if all the commits are functional btw

#

but thx that's very useful

#

you are a bash god

violet badger Nov 9, 2025, 8:51 AM

#

nah, the syntax highlighting is done by discord

prime mica Nov 9, 2025, 8:51 AM

#

looking ok on Apple silicon

Result of 100 runs
==================
base (./stockfish.ti ) =    1566081  +/- 3859
test (./stockfish    ) =    1587327  +/- 3990
diff                   =     +21247  +/- 2395

speedup        = +0.0136
P(speedup > 0) =  1.0000

#

not what I was hoping for tho

violet badger Nov 9, 2025, 8:52 AM

#

well, I think the main interest right now is anyway x86

prime mica Nov 9, 2025, 8:52 AM

#

yeah true

torn lagoon Nov 9, 2025, 8:55 AM

#

```
==================
base (...at-inputs-i8) =    2017676  +/- 2737
test (../moa         ) =    2064445  +/- 2825
diff                   =     +46769  +/- 1582

speedup        = +0.0232
P(speedup > 0) =  1.0000

CPU: 6 x AMD Ryzen 5 9600X 6-Core Processor
Hyperthreading: on
```

prime mica Nov 9, 2025, 8:55 AM

#

ok cool

#

underwhelming but decent

#

==================
base (...kfish.ti.gcc) =     830128  +/- 901
test (./stockfish    ) =     844316  +/- 1042
diff                   =     +14188  +/- 1326

speedup        = +0.0171
P(speedup > 0) =  1.0000

on my friend's Zen 2 server

#

all that matters is beating master tho lol

#

i think that's looking plausible? 🤞

rocky vigil Nov 9, 2025, 8:59 AM

#

2% is good, it's like 4 elo

#

lol

#

at stc

prime mica Nov 9, 2025, 8:59 AM

#

kk

#

should we wait for my patches to finish or just SPRT early

rocky vigil Nov 9, 2025, 9:00 AM

#

i mean i feel like if the individual ones pass

#

just merge them

#

no need to run an extra sprt

prime mica Nov 9, 2025, 9:01 AM

#

oh sure I meant threat inputs against master

rocky vigil Nov 9, 2025, 9:05 AM

#

ah

#

lemme look

#

yeah just wing it, add the net change and try an stc sprt vs master

#

alone it should be +1 elo, with speedups +4 or smth

prime mica Nov 9, 2025, 9:06 AM

#

kk do u wanna do it or should I

rocky vigil Nov 9, 2025, 9:06 AM

#

I can do it

prime mica Nov 9, 2025, 9:06 AM

#

ok great

rocky vigil Nov 9, 2025, 9:06 AM

#

more throughput

prime mica Nov 9, 2025, 9:06 AM

#

yes true

rocky vigil Nov 9, 2025, 9:07 AM

#

so pull moa right

prime mica Nov 9, 2025, 9:08 AM

#

ye

rocky vigil Nov 9, 2025, 9:08 AM

#

ok

prime mica Nov 9, 2025, 9:08 AM

#

you haven't made any changes to threat-inputs-i8 recently right

rocky vigil Nov 9, 2025, 9:08 AM

#

no, except for net change

prime mica Nov 9, 2025, 9:09 AM

#

kk perfect

rocky vigil Nov 9, 2025, 9:10 AM

#

oh wow with the ultimate SSS on my laptop bench is up 10%

prime mica Nov 9, 2025, 9:10 AM

#

lol

#

SSS lover

#

probably bc different net

#

so different node count

rocky vigil Nov 9, 2025, 9:11 AM

#

no like baseline

#

vs the stuff I just pulled

prime mica Nov 9, 2025, 9:11 AM

#

ah gotcha

rocky vigil Nov 9, 2025, 9:11 AM

#

```
Nodes searched  : 2324801
Nodes/second    : 1043447
```
(baseline)
```
Total time (ms) : 1974
Nodes searched  : 2324801
Nodes/second    : 1177710
```
(new) lmao

#

sss

prime mica Nov 9, 2025, 9:11 AM

#

sss

#

I love this emoji so fucking much omg

#

I use it all the time elsewhere with no context

#

one person thought it was a reference to the Schutzstaffel for some reason?

#

need to get their mind out of the gutter

rocky vigil Nov 9, 2025, 9:14 AM

#

oh wow that was fast

#

https://tests.stockfishchess.org/tests/view/69105b3dec1d00d2c195c569

prime mica Nov 9, 2025, 9:14 AM

#

wait what about the new net

#

oh you changed it too ok

#

misleading name 😅

rocky vigil Nov 9, 2025, 9:15 AM

#

Oh yeah

#

Btw I’m curious what does reducing register count do

#

(In dodo)

violet badger Nov 9, 2025, 9:22 AM

#

OK, for what it is worth:

==== 4c902c7c443b6b71d06895097ecd8d85aa3a2e99 ====
1 Nodes/second : 292248079
2 Nodes/second : 290685623
Average (over 2):  291466851
==== caee28c4e8fe2ea52d191ee56e27b1cb9cebabf3 ====
1 Nodes/second : 292968214
2 Nodes/second : 292490843
Average (over 2):  292729528
==== de7fbe6ba5c98b585aadb0a81ea7c3cb12fc4d54 ====
1 Nodes/second : 292234932
2 Nodes/second : 293382923
Average (over 2):  292808927
==== a227ff9916e98a33c198a1749f23f2783934be8a ====
1 Nodes/second : 292874238
2 Nodes/second : 287634242
Average (over 2):  290254240
==== 064d09f40f0ef0e9daf19609faaf79aa471ec362 ====
1 Nodes/second : 287458645
2 Nodes/second : 289818521
Average (over 2):  288638583
==== a6721c73eea0b11eaf9b574ed509265211be5738 ====
1 Nodes/second : 292221507
2 Nodes/second : 293205229
Average (over 2):  292713368
==== d8eb3207dbf7f264035ee52ddfa1db9c39471841 ====
1 Nodes/second : 289783271
2 Nodes/second : 289119242
Average (over 2):  289451256
==== 638e1786bde4e1f3eb17be5a32da3650666318d9 ====
1 Nodes/second : 290801279
2 Nodes/second : 289798590
Average (over 2):  290299934

prime mica Nov 9, 2025, 9:23 AM

#

rocky vigil Btw I’m curious what does reducing register count do

so this part of the code looks like

#

start:
vpaddw ymm0, ymm0, [rax]
vpaddw ymm1, ymm1, [rax + 32]
vpaddw ymm2, ymm2, [rax + 64]
...
vpaddw ymm15, ymm15, [rax + ...]
;; increment base pointer, loop...
jle start

on master, right?

rocky vigil Nov 9, 2025, 9:24 AM

#

Yeah

prime mica Nov 9, 2025, 9:24 AM

#

because the compilers are very smort and they're able to keep every accumulator in a register, and use CISC to enjoy

#

but on threat inputs, it looks like

#

start:
vpmovsxbw ymm15, [rax]
vpaddw ymm0, ymm0, ymm15
vpmovsxbw ymm15, [rax + 16]
vpaddw ymm1, ymm1, ymm15
...
;; uh oh...
;; increment base pointer, loop...
jle start

#

you need one temp register for the i8 -> i16 conversion

#

so the compiler has to spill at least one of the accumulator registers

rocky vigil Nov 9, 2025, 9:25 AM

#

Right that

prime mica Nov 9, 2025, 9:25 AM

#

and then it is sad

#

because it has to store/load across every iteration

#

in theory we could use 16 for the main net and 12 for threats but I was lazy

rocky vigil Nov 9, 2025, 9:26 AM

#

Does this not apply to avx512

prime mica Nov 9, 2025, 9:26 AM

#

no bc we have registers set to 16 there as well

#

and avx512 has an extra set of 16 to play with
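
A toy model of the register budget described above, as a rough sketch only. It assumes one vector register per accumulator tile plus any temporaries, and counts what no longer fits. The function name `spilled_accumulators` and the idea of passing the counts as plain integers are illustrative, not taken from the Stockfish source, and real register allocators are more subtle than this.

```python
def spilled_accumulators(tile_registers: int, temp_registers: int, arch_registers: int) -> int:
    """Toy model: how many accumulator tiles cannot stay resident in registers."""
    needed = tile_registers + temp_registers
    return max(0, needed - arch_registers)


# AVX2 exposes 16 ymm registers.
# master: loads fold into the memory operand of vpaddw/vpsubw, so no temp is needed.
master_avx2 = spilled_accumulators(tile_registers=16, temp_registers=0, arch_registers=16)

# threat inputs: vpmovsxbw needs one temp register for the i8 -> i16 widening.
threats_avx2 = spilled_accumulators(tile_registers=16, temp_registers=1, arch_registers=16)

# dropping the tile count to 12 leaves headroom again.
reduced_avx2 = spilled_accumulators(tile_registers=12, temp_registers=1, arch_registers=16)

# AVX-512 has 32 vector registers, so 16 tiles plus a temp still fit.
threats_avx512 = spilled_accumulators(tile_registers=16, temp_registers=1, arch_registers=32)
```

In this model only the 16-tile AVX2 threat loop ends up short by one register, which matches the "at least one accumulator gets spilled" observation.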

rocky vigil Nov 9, 2025, 9:26 AM

#

Ah hah

violet badger Nov 9, 2025, 9:27 AM

#

prime mica in theory we could use 16 for the main net and 12 for threats but I was lazy

so smallnet might be slowed down by this patch?

prime mica Nov 9, 2025, 9:27 AM

#

NEON is fine as well

prime mica Nov 9, 2025, 9:27 AM

#

violet badger so smallnet might be slowed down by this patch?

yeah, main net too... but in my benchmarks the loss from 16 -> 12 was unmeasurable

#

definitely worth making it conditional on the feature set though

violet badger Nov 9, 2025, 9:27 AM

#

well, the main net will be threats, but it would be a speedup for the threats branch if the effect on smallnet is measurable.

prime mica Nov 9, 2025, 9:28 AM

#

prime mica ``` start: vpaddw ymm0, ymm0, [rax] vpaddw ymm1, ymm1, [rax + 32] vpaddw ymm2, y...

there's also a chance that some old compilers aren't clever enough to do this w/o spilling, and we've been spilling this whole time. But I find that unlikely

violet badger Nov 9, 2025, 9:28 AM

#

something is really cooking on fishtest..

prime mica Nov 9, 2025, 9:28 AM

#

violet badger well main net will be threats, but would be speedup of threats branch if the eff...

main net has accumulator refreshes for both the ksq components and the threats right?

#

idk in general, having written more x86 vector loops than I'd like to admit, I'm very scared to use all the registers unless you're writing in assembly

#

bc you are really relying on the compiler to be smort

#

ok I'm betting +2 against master

#

😊

prime mica Nov 9, 2025, 10:00 AM

#

violet badger OK, for what it is worth: ``` ==== 4c902c7c443b6b71d06895097ecd8d85aa3a2e99 ====...

so based on this... the branchless insertion into added/removed was pretty bad?

#

maybe try git revert a227ff and see if that helps?

#

I could definitely see that one being arch (and compiler) dependent

#

also are these nodes 70 cores, 140 cores, 280 cores or what

violet badger Nov 9, 2025, 10:01 AM

#

4 x 72

prime mica Nov 9, 2025, 10:01 AM

#

gotcha

violet badger Nov 9, 2025, 10:01 AM

#

will try later...

prime mica Nov 9, 2025, 10:01 AM

#

cool!

#

ostrich is really ~~flying~~ running away with it

dark stream Nov 9, 2025, 10:29 AM

#

https://tests.stockfishchess.org/tests/live_elo/69105b3dec1d00d2c195c569
Test was doing so well early on. SSS, I know, but still... 😔

prime mica Nov 9, 2025, 10:30 AM

#

yep, there's hope...

#

oh I see

#

yeah probably arch dependence tbh

#

one of the first workers was a NEON :P

dark stream Nov 9, 2025, 10:35 AM

#

prime mica yeah probably arch dependence tbh

Is there any hope of bringing the performance on other archs to similar to ARM?

prime mica Nov 9, 2025, 10:35 AM

#

ehhh

#

definitely needs more research

#

I don't think it'll ever be closed fully

dark stream Nov 9, 2025, 10:46 AM

#

Btw, if this is the case, will a specific binary be preferred for competitions like TCEC or CCC? I think that happens even now for other reasons?

rocky vigil Nov 9, 2025, 10:46 AM

#

There is no need to

prime mica Nov 9, 2025, 10:46 AM

#

good question, these competitions have standardized hardware so it won't be the case

dark stream Nov 9, 2025, 10:55 AM

#

Wth? I glanced away for like 10 mins, and the LLR shot up.

naive comet Nov 9, 2025, 10:56 AM

#

prime mica my hope with the `sf_assume(index != Dimensions)` is that (some) compilers will ...

you should assume < Dimensions instead cuz one of the checks uses that iirc

amber fern Nov 9, 2025, 12:20 PM

#

dark stream https://tests.stockfishchess.org/tests/live_elo/69105b3dec1d00d2c195c569 Test wa...

25000 games in and +2 elo now, looking good! 😄

#

Will LTC be tested soon? 🙂

desert tree Nov 9, 2025, 12:22 PM

#

presumably after the STC test, if it passes

violet badger Nov 9, 2025, 12:27 PM

#

@prime mica the more accurate measurements now:

==== caee28c4e8fe2ea52d191ee56e27b1cb9cebabf3 ====
Average (over 10):  292442968
==== de7fbe6ba5c98b585aadb0a81ea7c3cb12fc4d54 ====
Average (over 10):  292813467
==== a227ff9916e98a33c198a1749f23f2783934be8a ====
Average (over 10):  290405859
==== 064d09f40f0ef0e9daf19609faaf79aa471ec362 ====
Average (over 10):  289685089
==== a6721c73eea0b11eaf9b574ed509265211be5738 ====
Average (over 10):  292753228
==== d8eb3207dbf7f264035ee52ddfa1db9c39471841 ====
Average (over 10):  289360047
==== 638e1786bde4e1f3eb17be5a32da3650666318d9 ====
Average (over 10):  290681425
==== 4c902c7c443b6b71d06895097ecd8d85aa3a2e99 ====
Average (over 10):  291312522

prime mica Nov 9, 2025, 12:27 PM

#

danke

#

very interesting

#

so branchless added/removed was bad

#

064d09f.. being bad is no surprise, that's cassowary

violet badger Nov 9, 2025, 12:29 PM

#

I think that's like the most clear one?

prime mica Nov 9, 2025, 12:29 PM

#

yes

violet badger Nov 9, 2025, 12:29 PM

#

should I try with a revert of just that one?

prime mica Nov 9, 2025, 12:30 PM

#

oh that one was superseded

#

by the following few commits

violet badger Nov 9, 2025, 12:30 PM

#

I see.

#

so any particular one to revert, or just fine as is?

prime mica Nov 9, 2025, 12:30 PM

#

yeah no smoking gun unfortunately...

#

I think the general pattern though is that using LUTs is not as good on ARM

#

or at least the machine u are testing on

violet badger Nov 9, 2025, 12:31 PM

#

yeah, might be that right now it is in the memory sweet spot

prime mica Nov 9, 2025, 12:31 PM

#

the neoverse cores seem to have a very good amount of cache according to Wikipedia...

violet badger Nov 9, 2025, 12:32 PM

#

  Package L#0
    NUMANode L#0 (P#0 117GB)
    L3 L#0 (114MB)
      L2 L#0 (1024KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)

prime mica Nov 9, 2025, 12:32 PM

#

delicious

#

ok lemme try reverting just the LUT and you can retest?

violet badger Nov 9, 2025, 12:32 PM

#

sure.. maybe different branch to avoid confusion 😉

prime mica Nov 9, 2025, 12:32 PM

#

yep!

violet badger Nov 9, 2025, 12:33 PM

#

ptarmigan

prime mica Nov 9, 2025, 12:33 PM

#

lol

#

sure why not

violet badger Nov 9, 2025, 12:33 PM

#

good names are essential for passing sprt, I believe.

prime mica Nov 9, 2025, 12:36 PM

#

https://github.com/anematode/Stockfish/tree/ptarmigan-no-lut

GitHub

GitHub - anematode/Stockfish at ptarmigan-no-lut

A free and strong UCI chess engine. Contribute to anematode/Stockfish development by creating an account on GitHub.

#

you can try that one vs. the previous commit (which is the best lookup table–based make_index function I've cooked thus far)

#

i.e.
LUT: 6018b8cd092665c7cb7c0bb943a7f2fc48de72d9
no LUT: 9fb3700ef1c7e8578b676ffddf9c644eb9cf0b4d

violet badger Nov 9, 2025, 12:38 PM

#

ok, started both shas.

prime mica Nov 9, 2025, 12:39 PM

#

danke

#

out of curiosity have u ever gotten to see the nodes in person

#

or are they locked away in some massive basement

#

exciting

dark stream Nov 9, 2025, 1:08 PM

#

amber fern 25000 games in and +2 elo now, looking good! 😄

It was holding like +13 for a little bit. SSS, but OK. Not looking so hot now, though I guess it's at least on par with Master.

prime mica Nov 9, 2025, 1:15 PM

#

patience my friend

#

it'll probably settle down around 1.5 methinks

#

still a huge win. I have a couple more speedups in the pipeline + we will have training tweaks I'm guessing

#

also it should scale well with TC

dark stream Nov 9, 2025, 1:20 PM

#

Yeah, yeah, I know. I'm just kind of impatient for a new net because honestly, search improvements have kind of tapered off as of recently, and maybe this will change that.

violet badger Nov 9, 2025, 1:39 PM

#

prime mica i.e. LUT: 6018b8cd092665c7cb7c0bb943a7f2fc48de72d9 no LUT: 9fb3700ef1c7e8578b676...

==== 6018b8cd092665c7cb7c0bb943a7f2fc48de72d9 ====
Average (over 10):  290043742
==== 9fb3700ef1c7e8578b676ffddf9c644eb9cf0b4d ====
Average (over 10):  281965813

rocky vigil Nov 9, 2025, 2:21 PM

#

it would appear LUT is still effective

rocky vigil Nov 9, 2025, 2:25 PM

#

dark stream Yeah, yeah, I know. I'm just kind of impatient for a new net because honestly, s...

i'm pretty sure this is more likely to restart net improvements rather than search improvements heh

violet badger Nov 9, 2025, 2:26 PM

#

hopefully both 🙂

violet badger Nov 9, 2025, 2:28 PM

#

foggy wind ``` GROUPED BY ARCH 64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT ...

can we have another run of the script on https://tests.stockfishchess.org/tests/view/69105b3dec1d00d2c195c569 ? Maybe the script would be a nice feature on fishtest itself. Also category of windows/linux for certain patches.

foggy wind Nov 9, 2025, 2:29 PM

#

violet badger can we have another run of the script on https://tests.stockfishchess.org/tests/...

GROUPED BY ARCH

64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo:     0.28 ±    3.11 | LOS:  57.1% | LLR: -0.10 | [58, 1664, 3319, 1663, 64]
64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                | Elo:    -0.41 ±    3.43 | LOS:  40.7% | LLR: -0.32 | [54, 1325, 2762, 1306, 57]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                       | Elo:    -2.18 ±    4.37 | LOS:  16.4% | LLR: -0.57 | [27, 836, 1659, 796, 26]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT                            | Elo:     4.10 ±    4.82 | LOS:  95.2% | LLR:  0.62 | [22, 599, 1364, 668, 19]
64bit POPCNT NEON_DOTPROD                                     | Elo:    16.20 ±    8.23 | LOS: 100.0% | LLR:  0.92 | [5, 163, 467, 235, 10]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                  | Elo:    -5.97 ±   10.42 | LOS:  13.1% | LLR: -0.24 | [11, 159, 322, 137, 11]
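
For reference, a minimal sketch of how an Elo and error bar like the ones in this table can be recovered from the five counts. It assumes the list is a pentanomial in the order [LL, LD, DD or WL, WD, WW] for game pairs, a logistic Elo model, and a normal 95% interval on the mean pair score. The function names are made up here and this is not the actual script. With these assumptions the NEON_DOTPROD row comes out at roughly the reported 16.20 ± 8.23.

```python
import math


def logistic_elo(score: float) -> float:
    """Convert an expected score in (0, 1) to a logistic Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_from_pentanomial(counts):
    """counts = [LL, LD, DD or WL, WD, WW] game-pair outcomes (assumed order)."""
    pair_scores = [0.0, 0.25, 0.5, 0.75, 1.0]  # score per game in each pair outcome
    n = sum(counts)
    mean = sum(c * x for c, x in zip(counts, pair_scores)) / n
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, pair_scores)) / n
    half_width = 1.96 * math.sqrt(var / n)  # 95% interval on the mean score
    elo = logistic_elo(mean)
    err = (logistic_elo(mean + half_width) - logistic_elo(mean - half_width)) / 2.0
    return elo, err


# NEON_DOTPROD row from the table above.
neon_elo, neon_err = elo_from_pentanomial([5, 163, 467, 235, 10])
print(f"Elo: {neon_elo:.2f} +/- {neon_err:.2f}")
```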

rocky vigil Nov 9, 2025, 2:32 PM

#

nice to see we're still winning on arm machines

violet badger Nov 9, 2025, 2:32 PM

#

these are now mostly macs

#

the rest is still a bit unclear, all LLRs still close to 0

rocky vigil Nov 9, 2025, 2:34 PM

#

I am pretty sure the rest total to negative LLR

#

(slightly)

#

the total LLR is relatively close to the sum of the group LLRs
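
The arithmetic behind that claim, as a quick sketch using the LLR column of the table above. The dictionary keys are shortened labels for the arch rows, and treating group LLRs as additive is only the approximation mentioned in the chat, not an exact property of the fishtest SPRT.

```python
# LLR per arch group, copied from the table above (labels shortened).
group_llr = {
    "AVX512ICL VNNI": -0.10,
    "AVX512": -0.32,
    "BMI2": -0.57,
    "AVX2": 0.62,
    "NEON_DOTPROD": 0.92,
    "VNNI": -0.24,
}

# Everything except the ARM group.
x86_sum = sum(llr for arch, llr in group_llr.items() if arch != "NEON_DOTPROD")
total_sum = sum(group_llr.values())

print(f"x86 groups: {x86_sum:+.2f}, all groups: {total_sum:+.2f}")
```

So the x86 groups do sum to a slightly negative LLR, with the NEON group pulling the total back above zero.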

foggy wind Nov 9, 2025, 2:37 PM

#

foggy wind ``` GROUPED BY ARCH 64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPC...

Does pext play any role here, or should AVX2 and BMI2 perform equally?

rocky vigil Nov 9, 2025, 2:38 PM

#

I would've thought BMI2 would be good because it makes threat tracking slightly faster but I guess not

#

btw the first != assume is redundant right

violet badger Nov 9, 2025, 2:39 PM

#

should be redundant.

rocky vigil Nov 9, 2025, 2:45 PM

#

sad that the elo gains from net and speedups seem not additive

dark stream Nov 9, 2025, 2:50 PM

#

Maybe selectively choose speedups once (if) they pass? Idk, how practical that is.

rocky vigil Nov 9, 2025, 2:51 PM

#

spikiness is really something

#

standard tests don't exactly have W-L casually spiking up/down by 100 in the span of only 5k games

dark stream Nov 9, 2025, 2:52 PM

#

rocky vigil standard tests don't exactly have W-L casually spiking up/down by 100 in the spa...

Hmm, noticed that too.

violet badger Nov 9, 2025, 2:53 PM

#

that's just depending on which machines join the test I would say

dark stream Nov 9, 2025, 2:55 PM

#

violet badger that's just depending on which machines join the test I would say

I find that uniquely frustrating... But it is what it is.

rocky vigil Nov 9, 2025, 2:57 PM

#

violet badger that's just depending on which machines join the test I would say

i wonder what the individual machine results look like...

foggy wind Nov 9, 2025, 2:58 PM

#

rocky vigil i wonder what the individual machine results look like...

📎 message.txt

#

But usually it doesn't make much sense to go into such detail because the error bars simply become too large.

rocky vigil Nov 9, 2025, 2:59 PM

#

yep

violet badger Nov 9, 2025, 3:03 PM

#

rocky vigil i wonder what the individual machine results look like...

you can click on the link?

#

but right there you have it.

rocky vigil Nov 9, 2025, 3:04 PM

#

fair

#

I suspect individual machine results on "normal" tests might also look like this

#

due to the variance

violet badger Nov 9, 2025, 3:05 PM

#

yes, individual machines hard to converge, might be possible in local tests though.

#

locally, for example, the branch is still much slower for me:

Result of 100 runs
==================
base (./stockfish.master       ) =    1107456  +/- 3570
test (./stockfish.patch        ) =     984159  +/- 3285
diff                             =    -123297  +/- 4113

speedup        = -0.1113
P(speedup > 0) =  0.0000

CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor
Hyperthreading: on

rocky vigil Nov 9, 2025, 3:07 PM

#

ah yeah an x86 machine

violet badger Nov 9, 2025, 3:07 PM

#

but I suspect that slowdown is more than average, idk.

rocky vigil Nov 9, 2025, 3:07 PM

#

10% slowdown is actually really good

#

it used to be -30%

#

and then -20%

violet badger Nov 9, 2025, 3:08 PM

#

i see.

rocky vigil Nov 9, 2025, 3:08 PM

#

the persistent speedup work has really paid off with patience

violet badger Nov 9, 2025, 3:08 PM

#

well amazing process I think.

#

I will see if I can get some progress with net training, but not sure there is low hanging fruit. We'll see.

rocky vigil Nov 9, 2025, 3:10 PM

#

right the ideal process maybe is different than master

#

i also suspect low hanging fruit is mostly gone though

#

i wonder if the original NNUE development project resulted in a similar feeling

#

of slowly reaching towards hce, then exceeding it, then exceeding it by a lot

green moat Nov 9, 2025, 3:15 PM

#

violet badger I will see if I can get some progress with net training, but not sure there is l...

vondele, don't forget to remove and test (if needed) the duplicated binpack lines in threats.yaml recipe
🙂

violet badger Nov 9, 2025, 3:16 PM

#

of course that would need testing.. so let's see. Maybe as part of future training.

rocky vigil Nov 9, 2025, 3:22 PM

#

am hopeful that we get passing sprts soon enough, the real barrier seems to be LTC single thread

#

I suspect that cleanup work is quite nontrivial

#

but it still seems preemptive to start doing it now

violet badger Nov 9, 2025, 3:36 PM

#

LTC single thread was -0.5Elo in the previous test... this ought to be positive now?

rocky vigil Nov 9, 2025, 3:44 PM

#

Surely

#

1 elo maybe

foggy wind Nov 9, 2025, 3:44 PM

#

violet badger locally, for example, the branch is still much slower for me: ``` Result of 100 ...

single thread

Result of 200 runs
==================
base (...es/stockfish) =    2295252  +/- 4815
test (...stockfish.ti) =    2048251  +/- 5352
diff                   =    -247001  +/- 2504

speedup        = -0.1076
P(speedup > 0) =  0.0000

CPU: 16 x AMD Ryzen 9 9950X3D 16-Core Processor
Hyperthreading: on

32 threads speedtest

sf_base = 42662044 +/- 111193 (95%)
sf_test = 41059945 +/- 108936 (95%)
diff    = -1602099 +/- 112245 (95%)
speedup = -3.75533% +/- 0.263% (95%)

In multithreading, the difference becomes significantly smaller.
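
The speedup figures in these reports are just the relative nodes-per-second difference against the base binary. A small sketch reproducing them from the numbers posted above (the helper name `relative_speedup` is made up here, and the error bars are left out because how the tools derive them is not stated in the chat):

```python
def relative_speedup(base_nps: float, test_nps: float) -> float:
    """Relative change in nodes/second of test vs. base (negative means slower)."""
    return (test_nps - base_nps) / base_nps


ryzen_3950x = relative_speedup(1107456, 984159)       # violet badger, 100 runs
single_thread = relative_speedup(2295252, 2048251)    # foggy wind, 200 runs
smp_32_threads = relative_speedup(42662044, 41059945)  # foggy wind, speedtest

print(f"{ryzen_3950x:.4f} {single_thread:.4f} {smp_32_threads * 100:.5f}%")
```

The gap between the single-thread and 32-thread numbers on the same machine is the effect discussed next.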

rocky vigil Nov 9, 2025, 3:44 PM

#

Yep

#

SMP favors threat inputs

foggy wind Nov 9, 2025, 3:45 PM

#

I thought that this only meant that larger nets are better with longer TC. But it's great that speed is also a factor.

rocky vigil Nov 9, 2025, 3:47 PM

#

The suspected explanation is that smp searches similar positions and has better access patterns to the threat features

lapis parrot Nov 9, 2025, 4:34 PM

#

amicic carrying BlessRNG

prime mica Nov 9, 2025, 4:41 PM

#

https://tests.stockfishchess.org/tests/live_elo/69107881ec1d00d2c195c5c2
sss but emu doing ok... @foggy wind maybe you can benchmark?

#

pretty much every instruction saved off make_index is measurable locally lmao. It's kinda cooked

prime mica Nov 9, 2025, 4:42 PM

#

foggy wind single thread ``` Result of 200 runs ================== base (...es/stockfish) =...

this is interesting but yeah makes sense...

#

so in a way threat inputs (hopefully) scales well in two ways ^_^

prime mica Nov 9, 2025, 4:45 PM

#

rocky vigil btw the first != assume is redundant right

I found out that a fair # of compilers aren't inlining this which is not ideal

#

causes like ten extra instructions, converting Square int8_t to int for example (because the ABI says that if you pass an int8_t, the upper 56 bits are undefined)
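
A tiny emulation of that widening step, as a sketch. It takes the description above at face value (only the low 8 bits of the argument register are meaningful) and models what a `movsx`-style sign extension has to do before the value can be used as an index. The function name and the example register values are invented for illustration.

```python
def sign_extend_low8(register_value: int) -> int:
    """Emulate sign-extending the low byte of a 64-bit register to a full int."""
    low = register_value & 0xFF           # keep only the meaningful 8 bits
    return low - 0x100 if low & 0x80 else low


# Hypothetical register contents: garbage in the upper 56 bits, payload in the low byte.
square_7 = 0xDEADBEEFCAFE0007
minus_one = 0x12345678000000FF

print(sign_extend_low8(square_7), sign_extend_low8(minus_one))
```

Using the raw register without this step would read the garbage bits too, which is why a non-inlined callee has to pay for the extra instructions.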

#

so I tried putting up a test to force inlining it but it doesn't compile for some reason

#

will try again in a bit

rocky vigil Nov 9, 2025, 4:46 PM

#

lapis parrot amicic carrying BlessRNG

arm cores literally llr printers with their +15 elo

#

I'm pretty sure the actual llr is negative on x86

violet badger Nov 9, 2025, 4:47 PM

#

locally, my x86 is at -10Elo

prime mica Nov 9, 2025, 4:47 PM

#

ugh yeah

#

LOL what

#

why is it soooo bad

violet badger Nov 9, 2025, 4:47 PM

#

Results of master vs patch (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 10.98 +/- 5.65, nElo: 20.59 +/- 10.58
LOS: 99.99 %, DrawRatio: 48.67 %, PairsRatio: 1.27
Games: 4146, Wins: 1168, Losses: 1037, Draws: 1941, Points: 2138.5 (51.58 %)
Ptnml(0-2): [18, 451, 1009, 572, 23], WL/DD Ratio: 1.20
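
For anyone wanting to sanity check the secondary numbers in that report, a minimal sketch that recomputes nElo, PairsRatio and DrawRatio from the Ptnml line. It assumes the pentanomial order [LL, LD, DD or WL, WD, WW], the usual normalized Elo definition (mean score offset divided by the per-game standard deviation, scaled by 800 / ln 10), and that DrawRatio is the share of drawn pairs. The function name is made up and this is not the match runner's own code.

```python
import math


def pentanomial_summary(counts):
    """Return (nElo, pairs ratio, draw ratio) for [LL, LD, DD or WL, WD, WW]."""
    pair_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    n = sum(counts)
    mean = sum(c * x for c, x in zip(counts, pair_scores)) / n
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, pair_scores)) / n
    # sqrt(2 * var) converts the per-pair deviation to a per-game deviation.
    nelo = (mean - 0.5) / math.sqrt(2.0 * var) * 800.0 / math.log(10.0)
    pairs_ratio = (counts[3] + counts[4]) / (counts[0] + counts[1])
    draw_ratio = counts[2] / n
    return nelo, pairs_ratio, draw_ratio


nelo, pairs_ratio, draw_ratio = pentanomial_summary([18, 451, 1009, 572, 23])
print(f"nElo: {nelo:.2f}, PairsRatio: {pairs_ratio:.2f}, DrawRatio: {draw_ratio:.2%}")
```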

rocky vigil Nov 9, 2025, 4:47 PM

#

hmm

prime mica Nov 9, 2025, 4:47 PM

#

oh in a good way

violet badger Nov 9, 2025, 4:47 PM

#

no

prime mica Nov 9, 2025, 4:47 PM

#

wait what

violet badger Nov 9, 2025, 4:47 PM

#

well, in favor of master

prime mica Nov 9, 2025, 4:47 PM

#

oh I see

#

grotesque

rocky vigil Nov 9, 2025, 4:48 PM

#

fishtest x86 seems to be -2 or so

violet badger Nov 9, 2025, 4:48 PM

#

I think I have a particularly slow guy 😉

prime mica Nov 9, 2025, 4:48 PM

#

hmmm so how do we interpret this, net and speedups not being additive? speedups themselves not being additive?

#

maybe worth sanity checking the new net by itself without the extra changes?

lapis parrot Nov 9, 2025, 4:49 PM

#

raspberry pi moment

violet badger Nov 9, 2025, 4:50 PM

#

don't think the net by itself is to doubt. Played quite a few games.

prime mica Nov 9, 2025, 4:50 PM

#

👍

#

ok well I'll keep hunting

violet badger Nov 9, 2025, 4:51 PM

#

yeah, best approach ...

#

if emu adds up, that's another fair bit.

rocky vigil Nov 9, 2025, 4:52 PM

#

prime mica hmmm so how do we interpret this, net and speedups not being additive? speedups ...

generally speedups and other improvements are being only applied at like 1/2 rate against master

#

or at least that's what it feels like

violet badger Nov 9, 2025, 4:52 PM

#

I think actually, there is an argument for that.

#

at least Naphthalin has suggested before that self-play tends to overestimate Elo differences. That would be the case in the speedup vs reference, while threats vs master is no longer (or less) self-play.

prime mica Nov 9, 2025, 5:05 PM

#

that would make sense actually

#

what's this junk in the header of update_accumulator_incremental

#

oh I guess this is reading the feature index from the added/removed hm

#

ok let's try specializing update_accumulator_incremental for the most common (added.size(), removed.size())

#

at least finally snowy egret isn't insta-failing on fishtest

#

ok another reasonable interpretation is that the speedups aren't actually that big

#

most of them have drifted down quite a bit

#

on fishtest

lapis parrot Nov 9, 2025, 5:37 PM

#

this is a general rule of thumb

#

you can't add up fishtest sprt elo to estimate a pt

#

probably it's the same with speedups

prime mica Nov 9, 2025, 5:37 PM

#

right they're all overestimates

lapis parrot Nov 9, 2025, 5:38 PM

#

this also applies to net training

rocky vigil Nov 9, 2025, 5:38 PM

#

yeah the net was vondele local fitbit

lapis parrot Nov 9, 2025, 5:38 PM

#

since nets are tested constantly almost every +8 +/-5 ends up being like at best 2

rocky vigil Nov 9, 2025, 5:38 PM

#

3 +- 2

#

who knows how much of it translated to fishtest

lapis parrot Nov 9, 2025, 5:39 PM

#

because you only submit ones that are showing good results, but if you test for like hundreds of them you are getting flukes
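
A quick simulation of that selection effect, as a sketch with made-up numbers: every candidate is assumed to be truly worth +1 Elo, each measurement has a standard deviation of 2.5 Elo (roughly a ±5 error bar), and only candidates measuring at least +6 get submitted. None of these values come from fishtest; they just illustrate why the survivors look better than they are.

```python
import random


def simulate_selection(true_elo=1.0, sigma=2.5, candidates=1000, threshold=6.0, seed=0):
    """Measure many equally good candidates with noise and keep the lucky ones."""
    rng = random.Random(seed)
    measured = [rng.gauss(true_elo, sigma) for _ in range(candidates)]
    selected = [m for m in measured if m >= threshold]
    mean_selected = sum(selected) / len(selected) if selected else float("nan")
    return len(selected), mean_selected


n_selected, mean_selected = simulate_selection()
print(f"{n_selected} candidates passed the filter, mean measured Elo {mean_selected:.2f}")
```

Every selected candidate reports at least +6 even though each is only worth +1, so a retest would be expected to land much lower.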

rocky vigil Nov 9, 2025, 5:39 PM

#

eh it was only like 5-10 that we tested

lapis parrot Nov 9, 2025, 5:39 PM

#

well, there is NCM

#

which tests every sf commit

#

and one commit that was like +11 +/- 5 elo

#

was a comment change

#

which is pretty obviously non-functional kek

prime mica Nov 9, 2025, 5:39 PM

#

lolololol

#

comment optimization meta

lapis parrot Nov 9, 2025, 5:40 PM

#

I remember a HCE patch

#

passed fishtest

#

with double SPRT

#

only for me to notice that it can't even theoretically do anything outside FRC

prime mica Nov 9, 2025, 5:40 PM

#

lmaoo

lapis parrot Nov 9, 2025, 5:40 PM

#

and fishtest didn't run FRC book back then

#

so this can happen, even back in sf 10 times average elo / passer was like 0,5 elo

#

so definitely most of what you see is a big overshoot / lucky run, this is pretty normal and not smth you can really fix in general

#

one problem is that back then we didn't really know anything about scaling so we were like "+6 elo STC, 1,5 LTC, fine"

#

nowadays it's always a suspect of being a bad scaler since literally half of the search scales in a weird way duh

lapis parrot Nov 9, 2025, 6:01 PM

#

STC passed

prime mica Nov 9, 2025, 6:04 PM

#

@foggy wind maybe u could check the x86 distribution?

rocky vigil Nov 9, 2025, 6:04 PM

#

lapis parrot STC passed

wait when did it

#

ok so it went from 0 llr to pass in the span of 20000 games

#

👍

#

machine luck???

prime mica Nov 9, 2025, 6:05 PM

#

tfw

warm thistle Nov 9, 2025, 6:06 PM

#

👀

lapis parrot Nov 9, 2025, 6:07 PM

#

rocky vigil machine luck???

idk

#

seems like amicic did 2 runs and both were done by 0 llr

#

can be just luck

#

I recall some of my SPRT sitting at 2,5 LLR LTC for like 60k games

#

so from 30k to 90k

#

and at 120k it failed with -2,95

#

this stuff just happens sometimes

prime mica Nov 9, 2025, 6:13 PM

#

@rocky vigil if you want you can merge emu as I'm pretty confident it's good

#

but up to you

rocky vigil Nov 9, 2025, 6:14 PM

#

prime mica @rocky vigil if you want you can merge `emu` as I'm pretty confident it...

o

lapis parrot Nov 9, 2025, 6:15 PM

#

prime mica @rocky vigil if you want you can merge `emu` as I'm pretty confident it...

why would you hurt feelings of @jolly tangle so much though

foggy wind Nov 9, 2025, 6:37 PM

#

prime mica @foggy wind maybe u could check the x86 distribution?

https://tests.stockfishchess.org/tests/view/69105b3dec1d00d2c195c569

GROUPED BY ARCH

64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo:     1.66 ±    2.59 | LOS:  89.6% | LLR:  0.69 | [76, 2361, 4780, 2416, 95]
64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                | Elo:     0.35 ±    2.64 | LOS:  60.3% | LLR: -0.10 | [98, 2227, 4697, 2234, 104]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                       | Elo:     0.93 ±    3.24 | LOS:  71.2% | LLR:  0.15 | [52, 1499, 3067, 1512, 62]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT                            | Elo:     3.01 ±    3.94 | LOS:  93.3% | LLR:  0.65 | [28, 918, 2024, 985, 29]
64bit POPCNT NEON_DOTPROD                                     | Elo:    20.68 ±    6.52 | LOS: 100.0% | LLR:  1.89 | [7, 276, 744, 425, 20]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                  | Elo:    -4.27 ±    8.34 | LOS:  15.8% | LLR: -0.28 | [15, 241, 485, 223, 12]

prime mica Nov 9, 2025, 6:39 PM

#

hey we're positive on all of them but one

#

sss but still

#

can you try just grouping x86 into one pot and NEON into another
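
One way to do that pooling by hand, as a sketch: add up the five counts column by column over the x86 rows of the table above and convert the pooled score to logistic Elo. It assumes the counts are pentanomial in the order [LL, LD, DD or WL, WD, WW], and the helper names and shortened row labels are invented here, not part of the grouping script.

```python
import math

# Pentanomial counts per arch group, copied from the table above (labels shortened).
x86_rows = {
    "AVX512ICL VNNI": [76, 2361, 4780, 2416, 95],
    "AVX512": [98, 2227, 4697, 2234, 104],
    "BMI2": [52, 1499, 3067, 1512, 62],
    "AVX2": [28, 918, 2024, 985, 29],
    "VNNI": [15, 241, 485, 223, 12],
}
neon_counts = [7, 276, 744, 425, 20]


def pool(rows):
    """Column-wise sum of several pentanomial count lists."""
    return [sum(column) for column in zip(*rows)]


def elo_from_counts(counts):
    """Logistic Elo from pooled pentanomial counts."""
    pair_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    score = sum(c * x for c, x in zip(counts, pair_scores)) / sum(counts)
    return -400.0 * math.log10(1.0 / score - 1.0)


x86_counts = pool(x86_rows.values())
x86_elo = elo_from_counts(x86_counts)
neon_elo = elo_from_counts(neon_counts)
print(x86_counts, f"x86 Elo: {x86_elo:.2f}", f"NEON Elo: {neon_elo:.2f}")
```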

#

Result of 200 runs