#UE Threat Inputs for AB

prime mica
#

I don't get it lol

green moat
foggy wind
prime mica
#

lololol

#

fighting anti-emu bigotry one PR at a time

#

my friend has a pet emu

#

haven't yet met her tho

#

(the emu)

foggy wind
#
GROUPED BY x86

x86 | Elo:     1.09 ±    1.46 | LOS:  92.8% | LLR:  1.07 | [269, 7246, 15053, 7370, 302]
ARM | Elo:    20.68 ±    6.52 | LOS: 100.0% | LLR:  1.89 | [7, 276, 744, 425, 20]
prime mica
#

hm ok

rocky vigil
#

lmao

rocky vigil
#

well there you have the contributions

lapis parrot
#

haven't you seen the memes?

prime mica
#

ugh

violet badger
#

without vondele fleet LLR printers. .... I object. I print Elo, not LLR

lapis parrot
#

etc

prime mica
#

#NotAllEmus

lapis parrot
#

there is a lot of this stuff, you can google it yourself

rocky vigil
prime mica
#

superior architecture

#

^_^

#

CISCcels seething

rocky vigil
#

well everyone keeps saying x86 is dead

#

so we gotta look towards the future

prime mica
#

true

#

deprecate x86-64-*

violet badger
#

we'll support RISC-V only for the future

lapis parrot
#

even at my work we have a build that supports arm

#

but there are severe downsides though

prime mica
violet badger
#

EPI for the win

prime mica
lapis parrot
#

well, what is this function called

#

to calculate 1/x for x being a float

#

fast but not precise

prime mica
#

vrcpss

lapis parrot
#

nah

prime mica
#

lol

lapis parrot
#

well in general this function doesn't exist in library of arm cpus we use

#

but exists in dsp

violet badger
#

rsqrtss

lapis parrot
#

well you should understand that we use controllers etc

prime mica
#

what do you work on :o

#

that's very cool

lapis parrot
#

relay protection

#

recipf

#

ofc

#

at least in what we use you can't really use this on arm because the library doesn't exist; note that these are big production cycles, so you can't simply switch to newer stuff out of the blue

prime mica
#

for sure

lapis parrot
#

so in general I tend to exclude divisions unless absolutely necessary

prime mica
#

ideal

foggy wind
#

I would say non functional with avx512icl and gcc 15.2.1

Result of 200 runs
==================
base (...fish.ostrich) =    2055743  +/- 4626
test (...tockfish.emu) =    2053770  +/- 4626
diff                   =      -1973  +/- 2362

speedup        = -0.0010
P(speedup > 0) =  0.0510
prime mica
#

yikes

#

we'll see fishtest then

#

might be arch dependent

#

no bench change = non-functional
no bench change, slowdown = dysfunctional

foggy wind
#

There is a new warning for snowy-egret-2

prime mica
#

screenshot?

#

probably just some unused variable or smth

foggy wind
#
position.cpp: In member function 'Stockfish::Position& Stockfish::Position::set(const std::string&, bool, Stockfish::StateInfo*)':
position.cpp:204:16: warning: 'void* memset(void*, int, size_t)' clearing an object of type 'class Stockfish::Position' with no trivial copy-assignment; use value-initialization instead [-Wclass-memaccess]
  204 |     std::memset(this, 0, sizeof(Position));
      |     ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from position.cpp:19:
position.h:80:7: note: 'class Stockfish::Position' declared here
   80 | class Position {
      |       ^~~~~~~~
prime mica
#

eh that's fine

#

it's because I added a dummy DirtyThreats to Position

#

we can silence it by casting to char*

lapis parrot
prime mica
#

:)

#

technically not UB

#

because wherever I use DirtyThreats I use placement new before it
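A minimal sketch of the cast-based silencing mentioned above. GCC documents that explicitly casting the object pointer to `void*` (or to a character pointer) suppresses `-Wclass-memaccess`. `Board` and `Dummy` are hypothetical stand-ins, not the real `Position` or `DirtyThreats`; the placement new after the zeroing mirrors the "placement new before use" argument from the chat and is an assumption about how the member is kept valid.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical member with a user-provided copy assignment, which makes the
// enclosing class non-trivially copy-assignable (the trigger for the warning).
struct Dummy {
    int value = 7;
    Dummy() = default;
    Dummy& operator=(const Dummy& o) {
        value = o.value;
        return *this;
    }
};

// Hypothetical stand-in for a class that zeroes itself with memset.
struct Board {
    int   squares[64];
    Dummy dummy;

    void clear() {
        // The cast to void* marks the raw write as intentional, so GCC does
        // not emit -Wclass-memaccess for this call.
        std::memset(static_cast<void*>(this), 0, sizeof(Board));
        // Re-create the non-trivial member before anything reads it again.
        new (&dummy) Dummy();
    }
};
```

Calling `clear()` on a `Board` zeroes the plain data and leaves `dummy` freshly constructed. The cast only hides the diagnostic; whether the zeroing is safe still depends on every non-trivial member being re-created before use.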

lapis parrot
#

well google "sunday silence"

prime mica
#

Sunday Silence (March 25, 1986 – August 19, 2002) was an American-bred Thoroughbred racehorse and sire. In 1989, he won the Kentucky Derby and the Preakness Stakes but failed to complete the Triple Crown when he was defeated in the Belmont Stakes. Nevertheless, he won the Breeders' Cup Classic and was voted American Champion Three-Year-Old Col...

lapis parrot
#

indeed

#

would be a goated reference to the goat

prime mica
#

I still don't get it

#

does "char" or "cast" have a meaning in horse racing

lapis parrot
#

nah, just that it will be silencing at the sunday

#

nothing more than this

foggy wind
prime mica
#

whew

#

hopefully it'll finally pass fishtest

#

ur the best

foggy wind
# prime mica might be arch dependent

sss But it matches my gcc 15 result.

GROUPED BY COMPILER VERSION

g++ 13     | Elo:     2.32 ±    3.55 | LOS:  90.0% | LLR:  0.55 | [23, 899, 2438, 949, 27]
g++ 15     | Elo:    -0.39 ±    5.10 | LOS:  44.0% | LLR: -0.13 | [16, 500, 1190, 509, 9]
g++ 14     | Elo:     4.04 ±    5.30 | LOS:  93.3% | LLR:  0.48 | [11, 442, 1116, 478, 17]
g++ 11     | Elo:     1.24 ±    7.10 | LOS:  63.4% | LLR:  0.06 | [5, 243, 622, 239, 11]
clang++ 20 | Elo:     1.49 ±    8.25 | LOS:  63.8% | LLR:  0.06 | [1, 185, 440, 186, 4]
g++ 12     | Elo:   -10.32 ±   13.22 | LOS:   6.3% | LLR: -0.23 | [3, 77, 177, 62, 1]
clang++ 22 | Elo:   -43.66 ±   43.48 | LOS:   2.3% | LLR: -0.08 | [0, 13, 14, 5, 0]
prime mica
#

interesting what is this

violet badger
#

blackmail material

prime mica
#

I think this is too SSS

prime mica
#

clang developers shitting their pants rn

prime mica
#

gotcha

#

idk if it's only -0.1% on Zen 5 and decent on other architectures then I think it's an easy choice

#

but we'll see, might fail

#

I figured out a cool prefetch trick that seems to work ok...

foggy wind
#

Even if it is neutral on gcc 15 and works well with older versions, everything is fine.

prime mica
#

do the psqt accumulation first and in those loops, prefetch the first chunk of the weights accumulation

#

finicky tho

#

I think because you have the X3D (?) it'll be neutral-to-negative

#

because so much cache

#

but maybe better on fishtest
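A rough sketch of the ordering described above: run the cheap PSQT pass first and use it to request the first cache line of each weight row, then do the wide accumulation. The function name, the tiny `Dimensions` and `PSQTBuckets` values, and the flat weight layout are all assumptions made for illustration; this is not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
    #define SKETCH_PREFETCH(p) __builtin_prefetch(p)
#else
    #define SKETCH_PREFETCH(p) ((void) (p))  // no-op on other compilers
#endif

constexpr std::size_t Dimensions  = 4;  // hypothetical, tiny for illustration
constexpr std::size_t PSQTBuckets = 2;  // hypothetical

void add_features(const std::vector<std::size_t>& added,
                  const std::int8_t*              weights,
                  const std::int32_t*             psqtWeights,
                  std::int16_t*                   acc,
                  std::int32_t*                   psqtAcc) {
    // Pass 1: PSQT accumulation. While this short loop runs, ask the cache
    // for the start of the weight row that pass 2 will read.
    for (std::size_t index : added)
    {
        SKETCH_PREFETCH(weights + Dimensions * index);
        for (std::size_t k = 0; k < PSQTBuckets; ++k)
            psqtAcc[k] += psqtWeights[index * PSQTBuckets + k];
    }

    // Pass 2: the wide accumulation over the rows requested above.
    for (std::size_t index : added)
        for (std::size_t j = 0; j < Dimensions; ++j)
            acc[j] += weights[Dimensions * index + j];
}
```

The prefetch is only a hint, so the result is identical with or without it; any gain depends on cache size and timing, which fits the "finicky" remark.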

violet badger
#

meanwhile LTC works wonderfully?

#

this is a super bizarre patch..

lapis parrot
prime mica
#

if it goes well, thoughts on a VLTC test?

violet badger
#

rather SMP

prime mica
#

if this scales nicely then it should pass very quickly anyway

#

oh sure

violet badger
#

which I happen to run locally right now 😉

prime mica
#

lol

#

Grace Hopper or "fitbit"

violet badger
#

x86

prime mica
#

cool beans

#

if this scales indefinitely TC-wise that would be so legendary

violet badger
#

well, that never happens, but looking good SMP at 10+0.1

prime mica
#

Torch shaking in its boots 😩

lapis parrot
#

well in general it "does"

#

somewhat

prime mica
#

is it bc of elo compression

lapis parrot
#

at least 120+1.2 SPSA did scale way past 120+1.2

#

elo compression on uho books exists

violet badger
#

'indefinitely' is poorly defined 😉

lapis parrot
#

but it's not big

prime mica
violet badger
#

chess is O(1)

prime mica
#

true

lapis parrot
#

yeah at infinity it will play perfect chess anyway

prime mica
#

rare professor who cares about the big-O constant

prime mica
lapis parrot
#

just disable TT

violet badger
#

we already documented one during SF development

lapis parrot
prime mica
#

that's awesome

violet badger
#

let me find this..

violet badger
#

nice you had the tab still open 😉

green moat
#

A thumbs up by vondele.
My life is now complete
😭

#

😉

foggy wind
prime mica
#

yuck

#

all that cache...

violet badger
#

meanwhile:

   # PLAYER    :  RATING  ERROR   POINTS  PLAYED   (%)
   1 patch     :     8.2    3.2  11799.0   23138    51
   2 master    :     0.0   ----  11339.0   23138    49
prime mica
#

yoooo

#

time controls?

violet badger
#

10+0.1t16

#

on x86

prime mica
#

very promising

violet badger
#

let me see what I get singlethreaded on the same hardware.

#

same hardware, same TC, single threaded

   # PLAYER    :  RATING  ERROR   POINTS  PLAYED   (%)
   1 master    :     0.0   ----  16984.0   32768    52
   2 patch     :   -13.0    2.8  15784.0   32768    48
prime mica
#

yikes lmao

#

idk if we're ever gonna get a 7% speedup

foggy wind
#

Looks like excellent scaling 😄

violet badger
#

well, with that kind of scaling this might not be needed.

prime mica
#

do we just ignore STC

violet badger
#

not 'just ignore'

prime mica
#

and hope that search patches will bring it up

violet badger
#

we focus on LTC and SMP LTC (i.e. PT).

#

but STC is a great tool to get there...

prime mica
#

excellent

violet badger
#

especially for speedups 😉

prime mica
#

lol

#

one trick pony

violet badger
#

but that kind of difference between single threaded and multithreaded is kind of insane.

prime mica
#

indeed...

#

I'ma do a similar test locally to see how it looks on Zen 5

#

do you have a script you used?

violet badger
#

not really, but can share the fastchess commandline.

prime mica
#

yeah sure

#

that's what I meant

violet badger
#
threads=1
taskset --cpu-list $tasksetlow-$tasksethigh \
./fastchess -tournament roundrobin -concurrency $(($size/$threads)) -rounds 16 -games 2 -repeat -srand $RANDOM \
            -openings file=./UHO_Lichess_4852_v1.epd format=epd order=random\
            -engine name=master cmd=./stockfish.master.x86 tc=10+0.1\
            -engine name=patch cmd=./stockfish.patch.x86 tc=10+0.1\
            -config outname=config-foo\
            -pgnout file=games-foo.pgn\
            -each proto=uci option.Threads=$threads option.Hash=$((16*threads)) >& out-foo
prime mica
#

gotcha

#

did you use SMT or no

violet badger
#

yes.

prime mica
#

cool beans

#

why is it not printing anything

#

or is that expected

violet badger
#

look for a file named out-foo 😉

prime mica
#

I am blind thank you

#

huzzah it's working

prime mica
#

if you think it's worth the data, I'd try running the STC tournament with no SMT...

#

I'm suspecting that the i8->i16 conversion spam doesn't play well with SMT

#

(not that that's a solvable problem)

prime mica
#
Results of master vs patch (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: -3.04 +/- 4.16, nElo: -5.90 +/- 8.08
LOS: 7.63 %, DrawRatio: 51.06 %, PairsRatio: 0.95
Games: 7094, Wins: 1827, Losses: 1889, Draws: 3378, Points: 3516.0 (49.56 %)
Ptnml(0-2): [32, 859, 1811, 829, 16], WL/DD Ratio: 1.14

ST penalty not quite so bad over here so far
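For reference, the Elo figure in the result above is consistent with the standard logistic conversion from score fraction to Elo difference. The helper names below are made up for this sketch, and the formula is the usual one, not something taken from the fastchess source.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from wins/draws/losses, counting a draw as half a point.
double score_fraction(int wins, int draws, int losses) {
    return (wins + 0.5 * draws) / static_cast<double>(wins + draws + losses);
}

// Logistic Elo difference implied by a score fraction p in (0, 1).
double elo_from_score(double p) {
    return -400.0 * std::log10(1.0 / p - 1.0);
}
```

With the numbers from the run (1827 wins, 3378 draws, 1889 losses), the score fraction is 3516 / 7094, about 49.56 %, and the conversion gives roughly -3.04 Elo from master's point of view.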

violet badger
#

so that's quite good.

#

With more threads (10+0.1t256) still good..

   # PLAYER    :  RATING  ERROR  POINTS  PLAYED   (%)
   1 patch     :    13.8   11.0  1063.5    2048    52
   2 master    :     0.0   ----   984.5    2048    48
prime mica
#

numerous

violet badger
#

make -j ARCH=x86-64-sse41-popcnt profile-build errors out

prime mica
#

ugh

#

we never ported i8 to sse

#

just avx2+ and neon

#

should be easy enough

violet badger
#

ok, part of the cleanup effort..

prime mica
#

I'll do it rn, why not

violet badger
#

sure

prime mica
#

so much threat inputs progress in the past few weeks msheart_eyes

#

There are decades where nothing happens; and there are weeks where decades happen.

violet badger
#

So, the multithreaded cousin of this one looks like:

Results of master vs patch (10+0.1, 8t, 64MB, UHO_Lichess_4852_v1.epd):
Elo: -17.93 +/- 13.31, nElo: -36.33 +/- 26.92
LOS: 0.41 %, DrawRatio: 51.25 %, PairsRatio: 0.66
Games: 640, Wins: 140, Losses: 173, Draws: 327, Points: 303.5 (47.42 %)
Ptnml(0-2): [1, 93, 164, 62, 0], WL/DD Ratio: 0.91
prime mica
#

yikes what

#

I thought 16t was really good?

violet badger
#

mind the order (master vs patch)

#

so roughly 25 Elo difference on the same machine between 1t and 8t at STC

prime mica
#

I get my signs right 50% of the time

#

as in, threat inputs is good?

violet badger
#

yeah.

prime mica
#

powerful

violet badger
#

this is super bizarre.

#

but well.

#

good.

rocky vigil
prime mica
#

lol

#

Lenin is displeased

#

hmph how to polyfill _mm_cvtepi8_epi16 for < SSE 4.1

#

on SSSE3 you could pshufb + srai

prime mica
#

advanced

#

@stray reef is it ok if I copy this and would you like credit

rocky vigil
#

so there is some SSSE3 thing I think

#

idk what to do for generic tho

prime mica
#

the fallback looks SSE2 compatible

rocky vigil
#

oh interesting

prime mica
#

I'll write three implementations, one for SSE4.1, one for SSSE3, and one for SSE2

#

then we should be good to go
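One common SSE2-only way to emulate `_mm_cvtepi8_epi16` is sketched below; it is not necessarily the fallback referred to in the chat. Interleaving the vector with itself places each byte in the high half of a 16-bit lane, and an arithmetic right shift by 8 then sign-extends it. The function names are hypothetical, and a scalar path is included so the sketch also builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
    #include <emmintrin.h>  // SSE2 intrinsics

// Sign-extend the low 8 int8 lanes of v to 8 int16 lanes using SSE2 only.
inline __m128i cvtepi8_epi16_sse2(__m128i v) {
    // unpacklo(v, v) yields 16-bit lanes of the form (b << 8) | b;
    // the arithmetic shift keeps the sign of b and drops the low copy.
    return _mm_srai_epi16(_mm_unpacklo_epi8(v, v), 8);
}
#endif

// Widen 8 int8 values to int16, with a scalar loop when SSE2 is unavailable.
inline void widen8(const std::int8_t* in, std::int16_t* out) {
#if defined(__SSE2__)
    const __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(in));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), cvtepi8_epi16_sse2(v));
#else
    for (int i = 0; i < 8; ++i)
        out[i] = in[i];
#endif
}
```

On SSSE3 the same effect can be had with a byte shuffle followed by the shift, as suggested above, and on SSE4.1 the native `_mm_cvtepi8_epi16` does it in one instruction.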

torn lagoon
#

Sf doesn't support non-sse2?

prime mica
#

well we'll have a generic C fallback that's slow as molasses

#

not sure whether that's done yet

rocky vigil
#

doesn't it get implicitly casted

prime mica
#

lol if so then that's great

#

ok! all three versions have the right bench

stray reef
prime mica
#

nah it's beautiful

#

thank u

foggy wind
#

Wrong bench for general-64: Nodes searched : 3117291

prime mica
#

🀦

#

ok lemme fix

foggy wind
#

Does ARM already work without NEON? And 32-bit ARM?

prime mica
#

non-NEON ARM will probably use the fallback

#

ngl I don't see why it's wrong...

#

huh, it's correct locally...

#

make -j build ARCH=general-64 right?

foggy wind
#

yea

prime mica
#

maybe (after ur done benching) you can try my SSE port branch...

foggy wind
#

I did a profile-build, but it shouldn't matter

prime mica
#

but I don't see how that would change it tbh

#

ye

foggy wind
prime mica
#

ughh

#

oh ok I can reproduce it now

#

I think I just forgot to build lmao

stray reef
prime mica
#

oh lol I see

#

ok honestly I have no clue why general-64 is bugged

rocky vigil
#

That still uses vector

prime mica
#

what

rocky vigil
#

Only 32 bit uses generic fallback

#

Or smth

#

Idk

prime mica
#

idt so

#

when I make changes to the generic fallback it changes the bench

#
        for (const auto index : removed)
        {
            const IndexType offset = Dimensions * index;

            for (IndexType j = 0; j < Dimensions; ++j)
                toAcc[j] = fromAcc[j] - featureTransformer.threatWeights[offset + j];

            for (std::size_t k = 0; k < PSQTBuckets; ++k)
                toPsqtAcc[k] =
                  fromPsqtAcc[k] - featureTransformer.threatPsqtWeights[index * PSQTBuckets + k];
        }

        for (const auto index : added)
        {
            const IndexType offset = Dimensions * index;

            for (IndexType j = 0; j < Dimensions; ++j)
                toAcc[j] += featureTransformer.threatWeights[offset + j];

            for (std::size_t k = 0; k < PSQTBuckets; ++k)
                toPsqtAcc[k] += featureTransformer.threatPsqtWeights[index * PSQTBuckets + k];
        }
#

I have to be missing something really obvious

rocky vigil
prime mica
#

OH

#

-=

#

I am a dumbas

#

thx
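For readers following along, the `-=` remark points at the removal loop in the snippet above: assigning `fromAcc[j] - weight` on every iteration keeps only the last removed feature. A self-contained sketch of the cumulative form follows, with a tiny hypothetical `Dimensions`, a flat weight array instead of `featureTransformer`, and the PSQT part omitted. Copying `fromAcc` into `toAcc` first is an assumption about how the surrounding code sets things up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using IndexType = std::uint32_t;
constexpr IndexType Dimensions = 4;  // tiny hypothetical size for illustration

void update(const std::vector<IndexType>& removed,
            const std::vector<IndexType>& added,
            const std::int8_t*            weights,
            const std::int16_t*           fromAcc,
            std::int16_t*                 toAcc) {
    // Start from the previous accumulator exactly once.
    for (IndexType j = 0; j < Dimensions; ++j)
        toAcc[j] = fromAcc[j];

    // Every removed feature is subtracted cumulatively (note the -=).
    for (const auto index : removed)
    {
        const IndexType offset = Dimensions * index;
        for (IndexType j = 0; j < Dimensions; ++j)
            toAcc[j] -= weights[offset + j];
    }

    // Every added feature is added cumulatively.
    for (const auto index : added)
    {
        const IndexType offset = Dimensions * index;
        for (IndexType j = 0; j < Dimensions; ++j)
            toAcc[j] += weights[offset + j];
    }
}
```

With two removed features the plain-assignment version drops the first subtraction, which would explain a bench that differs only on the path using this fallback.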

foggy wind
prime mica
#

ok not bad

#

better than ostrich which is what matters

#

rare force_inline W

#

thx as always <3

#

OK

#

sse2 inefficiency fixed, general-64 works again

#

so we should be good to go

prime mica
foggy wind
prime mica
#

O nvm huh

#

it rly rips through the indexing on your computer lol

#

ok well we'll wait for fishtest then

warm thistle
#
Result of  20 runs
==================
base (./sf-old       ) =    1385104  +/- 7425
test (./stockfish    ) =    1420016  +/- 9039
diff                   =     +34912  +/- 3671

speedup        = +0.0252
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 7700X 8-Core Processor
Hyperthreading: on
prime mica
#

hmph ok

#

so similar to ostrich as well

violet badger
#

no surprise, but good:

Verify node counts: 
               g++-9 :    2324801
              g++-10 :    2324801
              g++-11 :    2324801
              g++-12 :    2324801
              g++-13 :    2324801
          clang++-11 :    2324801
          clang++-12 :    2324801
          clang++-13 :    2324801
          clang++-14 :    2324801
          clang++-15 :    2324801
          clang++-16 :    2324801
          clang++-17 :    2324801
          clang++-18 :    2324801
          clang++-19 :    2324801
          clang++-20 :    2324801
#

I should probably add a loop over our architectures..

#

time to call it a day. I suggest to start both SMP runs on fishtest once the LTC passes.

prime mica
#

gn!

frosty imp
#

are we preparing for the PR now?

lapis parrot
#

it's only at 2.6 LLR though

twilit oriole
#

It passed

warm thistle
#

🎉

shell breach
#

🎉🥳🎉🥳🎉🥳🥳🎊🎊🎊🍻🍻🍻

frosty imp
#

I'm assuming the branch is threat-i8-QA-255?

#

shouldn't the smallnet also be updated with the QA=255 quantization

#

@violet badger would it be possible to look into merging nnue-pytorch#370? I have some refactors planned that should make the feature system easier to work with

rocky vigil
rocky vigil
rocky vigil
frosty imp
#

same can be said for all of those though

rocky vigil
#

sure, how long will smallnet training take?

#

i think the threat-i8-QA-255 branch can also be used to train a smallnet

frosty imp
#

we can just requantize?

rocky vigil
#

just use HalfKAv2_hm^ feature set

frosty imp
rocky vigil
#

oh right

#

nvm

rocky vigil
frosty imp
#

ig just requantizing from nnue

rocky vigil
#

replace x with x * 255 / 127

#

actually no

#

just replace x with x * 2

prime mica
#

let's gooooo

#

so proud of everyone

dark stream
#

So, if this passes, then will it be merged? Or will you all try for more first?

prime mica
#

a lot of cleanup work to do first...

#

@frosty imp you've already done a lot of cleaning up right

frosty imp
#

somewhat

#

the later additions were not cleaned up at all

prime mica
#

gotcha

frosty imp
#

I think some of the nicer inference cleanups need trainer side coordination

prime mica
#

ah

frosty imp
#

but threat index calculation & co should be fine

prime mica
#

anything I can help with?

frosty imp
#

ofc

#

just clean up anything you see

prime mica
#

lol ok

#

and then PR to your fork?

frosty imp
#

I have i8 merged but not your speedup

prime mica
#

kk

frosty imp
#

the refactor branch is wip. need trainer side changes

prime mica
#

kk

frosty imp
#

with clang-format

prime mica
#

huzzah

#

will you apply the diff to the most recent SPRTs yourself or should I do that and PR it

frosty imp
#

PR plz

#

oh cool I see the PR

prime mica
#

I'll clean them up a bit though

frosty imp
#

oops I broke the compile by removing the friend struct Position thing

prime mica
#

friendship breakups suck

#

lmk when you've fixed

#

oh u did ok

frosty imp
#

fix already pushed

#

ye

prime mica
#

huh I get a segfault with sanitize=undefined,address

#

oh well we'll figure it out later

#

seems to be a misaligned struct

#

it's segfaulting on a memcpy that expanded to vmovdqa instructions

frosty imp
#

hmm

prime mica
frosty imp
#

merged

prime mica
#

danke

naive comet
#

Shawn have you clang formatted

prime mica
#

yes

frosty imp
#

yeah

prime mica
#

honestly the code isn't that bad

#

the only serious pain point imo is nnue_accumulator.cpp

#

which I gather u've been working on

violet badger
frosty imp
#

I would probably do that to be safe

#

although it's a simple reorganization. probably nothing will go wrong

violet badger
#

yeah, so will be a bit later.

rocky vigil
#

u've done smth wrong

prime mica
#

:P

violet badger
#

just work..

#

meanwhile, some results for 60+0.6t256.

   # PLAYER    :  RATING  ERROR  POINTS  PLAYED   (%)
   1 patch     :     6.3   10.7  1042.0    2048    51
   2 master    :     0.0   ----  1006.0    2048    49

A bit sss, but looks good.

rocky vigil
#

looks decent yeah

#

no horrible regression at tcec conditions

violet badger
#

well, likely quite reasonable progress

rocky vigil
#

@frosty imp how did you requantize the smallnet?

frosty imp
rocky vigil
#

try just multiplying every weight by 2 in the nnue

frosty imp
#

hmm isn't that 254 quant tho

naive comet
#

doesn't matter practically speaking

frosty imp
#

i'll try that later

frosty imp
#

wrong hash on this one oof

prime mica
#

is it an SSE

#

yeah it is

#

so that's why, the test doesn't have my SSE patch

frosty imp
#

oh I mean hash size

prime mica
#

ohhhh

#

what's it supposed to be?

#

128?

frosty imp
#

512

prime mica
#

O

#

big boi

rocky vigil
#

💀

#

eh

#

who cares

stray reef
#

a new era of nnue just started, great job everyone peepoHappy

prime mica
#

all thx to u

#

(and many others)

naive comet
#

LFG bois LFG

#

I reckon many speedups to come too

stray reef
#

so many good things coming from this at once. master net will be reproducible again, there will be new nets again after a long time, no spsa needed rn, probably some smart speedups incoming, SF 18 is coming 🚀

prime mica
#

and obviously others will find fruit

violet badger
#

I assume that is a step towards getting threats into the main branch, right?

frosty imp
#

should allow refactoring feature transformers in the next PR, which will make getting threats in main easy

lapis parrot
#

to set up VLTC+

#

you need to click ***

dark stream
#

If the running test ends the way it is looking like, then threat inputs does indeed scale very well.

prime mica
#

life is good

dark stream
foggy wind
#

The LTC looked much more x86 friendly.

GROUPED BY ARCH

64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo:     3.24 ±    2.91 | LOS:  98.6% | LLR:  1.11 | [9, 1429, 3509, 1528, 20]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                       | Elo:     2.63 ±    4.49 | LOS:  87.4% | LLR:  0.35 | [8, 590, 1469, 637, 5]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT                            | Elo:     2.47 ±    6.17 | LOS:  78.4% | LLR:  0.17 | [2, 304, 774, 320, 4]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                  | Elo:    10.76 ±    6.57 | LOS:  99.9% | LLR:  0.86 | [0, 242, 669, 314, 2]
64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                | Elo:    -1.66 ±    8.16 | LOS:  34.5% | LLR: -0.13 | [1, 200, 444, 190, 2]
64bit VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT           | Elo:    -2.00 ±    8.39 | LOS:  32.0% | LLR: -0.15 | [2, 183, 417, 178, 0]
64bit POPCNT NEON_DOTPROD                                     | Elo:    23.63 ±   10.79 | LOS: 100.0% | LLR:  0.71 | [1, 85, 248, 151, 1]
GROUPED BY x86

x86 | Elo:     3.11 ±    2.01 | LOS:  99.9% | LLR:  2.19 | [22, 2948, 7282, 3167, 33]
ARM | Elo:    23.63 ±   10.79 | LOS: 100.0% | LLR:  0.71 | [1, 85, 248, 151, 1]
green moat
#

eventually

naive comet
torn lagoon
vestal gale
#

sprt?

green moat
green moat
#

Are there any preliminary results on L2=31 TI nets?

torn lagoon
#

So this point will never come

finite wind
#

What is the current elo vs master?

lapis parrot
#

2 stc 3,5 ltc

#

at least SPRT elo wise

finite wind
#

Over 10k posts in this thread, long battle 👍

lapis parrot
#

so make a PR? test will pass soon I guess

twilit oriole
#

what branch is it again

#

like is the final branch going to be the shawn one or the one sscg13 has on the test

#

For the PR message need to decide how much detail to go in. Like do I discuss the alternate schemes that failed before this one (or things that were tried and failed in general) or stick to just explaining the final product

lapis parrot
#

I think you should write failed alternates in a separate doc

#

and link it in PR

#

otherwise it's too much text

twilit oriole
#

yep

#

if someone wants to make that doc feel free. and then i can just add the section talking about the other failed input schemes

#

it would be good to summarise the findings of this 10k messages i think

rocky vigil
rocky vigil
regal steeple
rocky vigil
#

I personally don't care, I think it would make most sense to have Shawn make the pr since his branch is being actively updated with cleanup work

#

There are also corresponding PRs Shawn and I need to make to nnue-PyTorch

twilit oriole
regal steeple
#

But you did not do any stuff to get it to work in sf (which is where this pr is made)?

formal smelt
#

i agree one of shawn/sscg should open the pr
is the pr not gonna have like 10 quadrillion coauthors anyway?

rocky vigil
#

Yeah

formal smelt
#

coolio dont forget me :p else i'll be briefly sad

prime mica
#

ofc

#

shall we create a Google Doc

rocky vigil
#

Don't worry viren claims to have a big list

formal smelt
#

its just his name in a very large font size

rocky vigil
#

Might be time to reveal :P

formal smelt
#

viren me lofty yoshie sscg shawn and then all the SF speedup gang?

#

in chronological order even

rocky vigil
#

~~ in chronological order I think I come before yoshie~~

#

Disservin, vondele, linrock also need to be credited

formal smelt
#

this will be the holiest PR in existence

rocky vigil
#

Tbf I think viren should just reveal the list

formal smelt
#

what are the current elo numbers at STC/LTC/SMP?

rocky vigil
twilit oriole
#

Yeah I will I'm on phone rn I don't have it on me lol

rocky vigil
twilit oriole
rocky vigil
#

I'm happy now that when I say I'm a sf dev it doesn't mean I just made a one line simp

rocky vigil
rocky vigil
stray gyro
#

Increased TP of the current VLTC test to 100%...

rocky vigil
#

I don't trust my personal acc with important stuff like this

#

That'll get referenced many times in the future

twilit oriole
#

It will not be referenced directly. It will be downloaded and attached through GitHub lol

#

It's only for the collab stage

rocky vigil
#

Oh cool

stray gyro
#

^

rocky vigil
#

Yeah just make one then

prime mica
#

kk

#

DM me your emails? (Or put them here)

rocky vigil
#

Mine is the same one that appears on my github

prime mica
#

🐻

twilit oriole
#

Just share it publicly

rocky vigil
#

That also works

#

I don't think anyone will grief

twilit oriole
#

Well it has version history anyways

prime mica
#

Finally snowy egret has a chance ugh

#

The memcpys were pissing me off

#

Maybe we should add a proper move assignment operator to ValueList

#

which won't fix the problem but at least it won't copy the whole thing

#

I think finding an upper bound for the threats list size no longer matters tho with egret

#

Speedups don't seem terribly important for the PR description right

rocky vigil
#

idk might as well highlight

#

the effort

prime mica
#

maybe we briefly describe the most important ones?

rocky vigil
#

like if we're gonna make a big doc

#

we have plenty of space for everything

prime mica
#

I'm planning to write an in-depth blog post about it (bc some of the techniques are interesting imo) so we can also link that

rocky vigil
#

ah nice

prime mica
rocky vigil
#

wow writing this stuff is harder than I thought

prime mica
#

just stream of consciousness it!

#

and then we can reorganize

#

works ok locally

#

trolled by a loongarch worker lmaoo

#

I should do a loong vsx port some time

foggy wind
prime mica
#

ugh

rocky vigil
#

oh yeah meanwhile

prime mica
#

I wonder if it gets inlined, are you using clang?

rocky vigil
#

the VVLTC with 1/2 hash for both sides

#

officially passed

prime mica
#

πŸ₯³

rocky vigil
#

so yep

#

need cleanups

#

and preparing of the PR

foggy wind
#

Congratulations to everyone who put in a lot of hard work 🙂

prime mica
#

thank u for all the help

rocky vigil
#

there is still a bit more to come

#

in terms of cleanup work

foggy wind
#

there is also still this warning:

position.cpp: In member function 'void Stockfish::Position::update_piece_threats(Stockfish::Piece, Stockfish::Square, Stockfish::DirtyThreats*)':
position.cpp:1104:18: warning: declaration of 'threatened' shadows a previous local [-Wshadow]
 1104 |         Bitboard threatened = ray & qAttacks & occupied;
      |                  ^~~~~~~~~~
position.cpp:1057:14: note: shadowed declaration is here
 1057 |     Bitboard threatened;
      |              ^~~~~~~~~~
prime mica
#

yeah that'll be fixed in cleanup

#

not a code error, just slightly sloppy code

rocky vigil
#

actually it probably wouldn't hurt to take a look at it now

green moat
rocky vigil
#

Btw on net format, let's try to print i8 verbatim and then leb128 for i16

#

And update the trainer accordingly

prime mica
#

is there a reason we use leb128 instead of something a bit simpler and more compact

#

the vast majority of weights are in [-127,127] so

#

the format can just be 0x80 + (2 bytes) for i16s that don't fit, and the literal value otherwise (which will be sign extended)

#

should be like 10% smaller
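A small sketch of the escape-byte format proposed above: values in [-127, 127] are written as one literal signed byte, anything else as the marker 0x80 followed by the two bytes of the i16. The function names and the little-endian byte order are assumptions for illustration, and the decoder assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t Escape = 0x80;  // bit pattern of int8 -128, reserved as marker

std::vector<std::uint8_t> encode(const std::vector<std::int16_t>& values) {
    std::vector<std::uint8_t> out;
    for (std::int16_t v : values)
    {
        if (v >= -127 && v <= 127)
        {
            // Fits in one byte; the decoder sign-extends it.
            out.push_back(static_cast<std::uint8_t>(static_cast<std::int8_t>(v)));
        }
        else
        {
            // Marker plus the full 16-bit value, low byte first (assumed order).
            const auto u = static_cast<std::uint16_t>(v);
            out.push_back(Escape);
            out.push_back(static_cast<std::uint8_t>(u & 0xFF));
            out.push_back(static_cast<std::uint8_t>(u >> 8));
        }
    }
    return out;
}

std::vector<std::int16_t> decode(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int16_t> out;
    for (std::size_t i = 0; i < bytes.size();)
    {
        if (bytes[i] != Escape)
        {
            out.push_back(static_cast<std::int8_t>(bytes[i]));  // sign-extend
            i += 1;
        }
        else
        {
            const auto u = static_cast<std::uint16_t>(bytes[i + 1] | (bytes[i + 2] << 8));
            out.push_back(static_cast<std::int16_t>(u));
            i += 3;
        }
    }
    return out;
}
```

Compared with LEB128, small values still cost one byte, while out-of-range values always cost three; how the total size compares depends on the actual weight distribution.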

rocky vigil
#

I mean we could unironically just write the weights verbatim

prime mica
#

based

rocky vigil
#

The sad part is because of packus preprocessing it doesn't simplify memory sharing

rocky vigil
prime mica
#

lol

green moat
rocky vigil
#

cool

#

more merges

violet badger
rocky vigil
#

i think there are still many cleanups to make

violet badger
#

yes sure

rocky vigil
#
  • we haven't prepared the PR message
frosty imp
#

is the doc supposed to be the PR message

rocky vigil
#

i think we could also include like the most important parts

rocky vigil
lapis parrot
#

idk just make PR message "new arch" with SPRT results

#

the end

rocky vigil
#

append it to the pr?

prime mica
#

we can also just merge it later into master

frosty imp
prime mica
#

personally I'd just link to an external doc or Wiki entry?

#

for a more extensive explanation

rocky vigil
#

I think in the actual message we just put the SPRT results, a brief description of threat inputs, and the contributors

#

we are also waiting on Viren's list of contributors over time

rocky vigil
#

so let's wait for that

#

i mean there's no rush

rocky vigil
green moat
#

guys, the PR description is not important. It can be modified/updated afterwards

frosty imp
rocky vigil
#

ok cool

#

i would also prefer that like we be able to make the prs to sf / nnue-pytorch at the same time

#

since they're companion prs

frosty imp
#

what if we create an issue first with the list of tasks

#

like this

rocky vigil
#

yeah that seems good

violet badger
rocky vigil
#

ok cool

#

so just the random shas then

violet badger
#

yes

rocky vigil
#

as for the net format itself?

#

i think LEB128 on i8 weights is a waste

violet badger
#

I would keep it, or we have to redo training?

frosty imp
#

I would say threat weights and psqt weights be stored separately

violet badger
#

which we can of course

frosty imp
#

yeah

rocky vigil
#

wouldn't need to redo training, is just a change in serializing net

#

i think

#

the weights themselves would remain the same

#

just formatted differently

violet badger
#

well, I would at least need to fix the script ... new trainer sha would need a rerun.

#

more after dinner 😉

green moat
#

Is there any work/experiment to do on smallnet?

rocky vigil
green moat
#

also, sscg13, don't forget snowy-egret 2
🙂

prime mica
#

meh

rocky vigil
#

i think anematode can just tack it onto shawn's branch

prime mica
#

We can do it later imo

#

Or that

#

it's a small change

rocky vigil
#

it's not functional

#

I think we should hold off on all functional changes (i.e. requantize smallnet) until after merge

prime mica
#

great

green moat
#

Also, tomorrow morning we will have Stage 5 net with L2=31.
Stage 4 net already available, should we test it?
🤔

frosty imp
#

I think it's fine to merge if tested

violet badger
#

we should now work towards a mergeable PR. There should probably still be a final test to verify the cleanup didn't introduce a regression or so. However, larger changes I would keep for afterwards and use the normal process to improve stuff.

prime mica
#

Agree

#

The flurry of improvements will last a while probably

#

So might as well get it in

twilit oriole
#

i think if possible try to avoid links in the text itself, i think take them outside to the top of the page. otherwise it is a bit much

#

like just list out monty, yukari, plentychess prs, nnue trainer prs etc. before going into the text

rocky vigil
#

ok

#

how to get a link to the original pdf in the first message

twilit oriole
#

hm i think we can actually put that in the PR message itself instead. and attached through github. It seems to be the easiest way to understand the general concept quickly

#

A graphical version of an earlier scheme (with less refinement) that illustrates the core concepts can be found at: <Link the initial monty inputs v2 pdf> I copy it here for later

rocky vigil
#

tbh I don't think the doc really needs to go into detail about speedups seeing as it's more to serve as an introduction into threat inputs

prime mica
#

agree

#

it's tangential

twilit oriole
#

hmm well if someone does write it it's still better even if not strictly necessary. otherwise those concepts will be lost to time

#

it's not like you can git blame the speedups itself

prime mica
#

I'm planning to write a blog post

#

would be happy to include the others' work as well

rocky vigil
#

and we can always link it later

prime mica
#

great

rocky vigil
#

what was the range of x86 speed loss again compared to master?

twilit oriole
#

For the PR message itself I think the PR links of the 3 engines (Monty, Yukari, Plentychess) being included is also probably important. Maybe that PR link section can just move there itself

#

I think its just going to be a bunch of links and short summary. Only way to have it condense

rocky vigil
#

yep

lapis parrot
#

I would recommend to not overdo a PR

rocky vigil
lapis parrot
#

since new arch and stuff is probably good to speedup development in other areas

#

so as soon as you make it the better it is imho

#

even if it's not complete, relevant info can be put on github after it

rocky vigil
#

i think we can get everything done within 1-2 days if we speed it

#

which should be fairly fast

lapis parrot
#

yeah just saying, my last project at work is more or less finished 2 weeks ago and I'm still making docs to close it

#

(:

rocky vigil
#

ah

#

I think the "full" doc looks solid now

#

so we can work on preparing PR message

#

and after that shawn and I need to lock in on the other things

twilit oriole
#

Well the PR message can be done in 1 hour, the branch being ready is the main thing lol

rocky vigil
#

@frosty imp what is remaining before threat inputs can be PR'd to nnue-pytorch

#

and I'll try to set up a tracking issue for the main sf PR

rocky vigil
#

so idk if this is actually the best approach now

rocky vigil
rocky vigil
#

most other stuff is optional and can be done after the initial PR

#

but this warning should definitely go

violet badger
#

you can also make a PR and use the first PR comment to keep a list of items?

rocky vigil
#

true, would need to wait for shawn to do that

violet badger
#

I think you could create that?

#

not that it matters too much.

rocky vigil
#

so pull shawn's branch and then create a pr with it

#

sure

violet badger
#

yes, creating a PR would have the advantage of CI running.

#

let's see what it uncovers 😉

prime mica
#

oh dear

violet badger
#

🧟

prime mica
#

like uncovering a rock w/ a bajillion roaches and worms underneath it

#

(maybe)

#

or hopefully it's a nicely mowed lawn

rocky vigil
#

alright give me a few minutes to set it up

lapis parrot
#

roaches are 2 supply

#

so maxing out on them is not good

rocky vigil
#

speaking of vvltc results

#

I do wonder which pair master got double killed in...

lapis parrot
#

with ctrl+f ,1

#

and downloading relevant pgn

#

but in general it's pretty meaningless

rocky vigil
#

they can be downloaded by machine?

lapis parrot
#

this pairs happen in dev vs sf 17 from both sides

rocky vigil
#

interesting

#

ah sure

lapis parrot
#

you can open the test

prime mica
#

do the positions tend to be very sharp or something

#

or does one side just make a blunder early on

#

(or both)

lapis parrot
#

click on any number below idx

#

and download all pgns played by the worker

rocky vigil
#

yeah PT has them

lapis parrot
#

usually neither

rocky vigil
#

actually in higher frequency

lapis parrot
#

just some time trouble

#

where one side shows 0,00 and other shows +2

#

or some tactical miss

#

where losing side lacked like 1-2 plies of search to see it

violet badger
#

first zombie identified 😉

rocky vigil
#

a lot of checks failing

#

besides that

#

what else should I add to task list

prime mica
#

merging in a couple speedups

#

and then snowy egret ofc

#

both are pretty straightforward and non functional

violet badger
#

task list, PR to nnue-repo

twilit oriole
#

Merge them after or redo the overall sprts I think

#

Otherwise the listed Elos will be wrong I guess

violet badger
#

Task: verify correct .yaml is mentioned/linked for the training recipe

rocky vigil
#

is that a task for this pr or for nnue-pytorch pr though

#

which I need to get confirmation from shawn

#

that everything is ready

violet badger
#

just somewhere we can keep track 😉

green moat
#

vondele, given how PRs work, wouldn't it be better to merge threats_input only when it will be difficult even for anematode to find some speedup?
My fear is that, if threats_input is merged soon, some next speedups might get lost in merge waves or could interfere with other gains....
Just saying...
🙂

prime mica
#

nahhh

rocky vigil
#

it is better to merge this and have it become master

prime mica
#

^

rocky vigil
#

well all other pending ones will need to get redone

#

that's unavoidable

#

but the sooner we do it the less time is wasted

green moat
rocky vigil
#

it does look like we got quite the bugs in the walls

#

that need to be cleansed

prime mica
#

mfw

rocky vigil
#

we also gotta add all the other contributors still unlisted here, once viren gets his list

lofty cedar
#

Relatively little for a patch that changes 30+ files I'd say.

rocky vigil
#

oh bruh nobody removed this ahhhh

#

somebody remove it

prime mica
violet badger
rocky vigil
#

well that gives a new entry to task list

#

"remove unused code"

#

and "fix write NNUE"

rocky vigil
#

though we could also hack it

#

it still irks me that read_leb128(a), read_leb128(b) is not equivalent to read_leb128(a+b)

lofty cedar
#

But it seems no UB right?

prime mica
#

that is funny

#

there's definitely something... bc it crashes for me locally with sanitizers on (but without a message)

rocky vigil
#

since nobody touched them

lofty cedar
#

Oh... yeah... but other than that...

#

Though I thought Stockfish was going to be slower to adopt threat input than this. It's pretty fast. Only Monty, Yukari, and Plentychess adopted it faster?

#

Impressive considering the baseline net is much better in Stockfish.

rocky vigil
#

so, the hack fix for this is to declare a combined array, write the threat weights and normal weights into the combined array, and then write_leb_128 the combined array
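
A minimal sketch of why two separate LEB128 writes need not be interchangeable with one combined write, assuming each compressed block is framed with its own header. The 4-byte size prefix and the names `leb128_append`, `write_block` and `read_block` are hypothetical, chosen for illustration; this is not Stockfish's serialization code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Signed LEB128: 7 payload bits per byte, high bit set on every byte but the last.
inline void leb128_append(std::vector<std::uint8_t>& out, std::int32_t value) {
    while (true) {
        std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
        value >>= 7;  // arithmetic shift on mainstream compilers (guaranteed since C++20)
        bool done = (value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40));
        out.push_back(done ? byte : static_cast<std::uint8_t>(byte | 0x80));
        if (done)
            return;
    }
}

// Hypothetical framing: 4-byte little-endian payload size, then the payload.
inline std::vector<std::uint8_t> write_block(const std::vector<std::int16_t>& values) {
    std::vector<std::uint8_t> payload;
    for (std::int16_t v : values)
        leb128_append(payload, v);
    std::vector<std::uint8_t> out;
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(n >> (8 * i)));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Reads one framed block starting at pos and advances pos past it.
inline std::vector<std::int16_t> read_block(const std::vector<std::uint8_t>& in,
                                            std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
    std::size_t end = pos + n;
    std::vector<std::int16_t> values;
    while (pos < end) {
        std::int32_t result = 0;
        int shift = 0;
        std::uint8_t byte;
        do {
            byte = in[pos++];
            result |= static_cast<std::int32_t>(byte & 0x7f) << shift;
            shift += 7;
        } while (byte & 0x80);
        if (byte & 0x40)
            result |= -(static_cast<std::int32_t>(1) << shift);  // sign-extend
        values.push_back(static_cast<std::int16_t>(result));
    }
    return values;
}
```

Under this framing, two consecutive blocks carry two headers and one combined block carries one, so the byte streams differ even though the decoded values are the same. That is why the combined-array approach has to be used consistently on both the read and the write side.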

prime mica
#

great

rocky vigil
#

but like

#

do we wanna do hack fix

#

or wait longer and attempt to do a better fix

twilit oriole
#

I am out of date on this, is the issue the i8 weights are too hard to compress or smth?

rocky vigil
#

no

#

the issue is that everything is still leb128

#

and it's screwing stuff

twilit oriole
#

What is needed instead

rocky vigil
#

basically I don't like the current format bc we have to declare these huge combined arrays

#

in order to read and write

twilit oriole
#

The hack fix doesn't sound too bad to me tbh

rocky vigil
#

yeah

#

ideally what I would prefer is:
(i8 threat weights) (leb128 psq weights) (rest of network)

#

instead of (leb128 combined weights) (rest of network)

twilit oriole
#

Why verbatim?

rocky vigil
#

ok confusing

#

just the bytes directly

green moat
#

Also, some poor guy (shawn_xu?) will have to update new SF NNUEv10 architecture scheme in nnue-pytorch....
🙂

prime mica
#

basically just memcpy

rocky vigil
#

when it shouldn't be necessary at all

#

yknow what I'll do hack fix for now

#

see if it fixes anything

violet badger
#

Right now, hack fix, and work on a new format for another round... yeah

#

if anybody asks about the architectures that SF supports most solidly, refer to this picture please

rocky vigil
#

btw @twilit oriole I've attached the PR links directly in the PR msg

#

so I think we can get rid of them in the other doc now

#

smallnet printing has been hacked correctly

#

still working on threatnet

#

alright

#

net printing fixed...

#

ok it seems like the next issue is the declaration shadows local variable

#

which turns into an error on some CI

violet badger
#

-Werror

rocky vigil
#

i think that can be easily fixed

#

let me try to do it as well...

#

what is this test checking?

#

I'll attempt to fix this as well if I understand what it wants

prime mica
#

it tells you if you're including a header that's not needed for it to compile

#

or if you're not including a header that you need, and the program only compiles because some other header happens to include it transitively
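
A small illustration of what such a check is after, as a sketch only: every name used in the file comes from a header the file includes directly. The function `sum_bytes` is hypothetical and exists just to give the includes a use.

```cpp
#include <cassert>  // assert
#include <cstdint>  // std::uint8_t
#include <numeric>  // std::accumulate
#include <vector>   // std::vector
// An extra `#include <string>` here would be reported as unneeded:
// nothing in this file names anything from it.

// Hypothetical function, only present so each include above has a direct use.
inline int sum_bytes(const std::vector<std::uint8_t>& bytes) {
    assert(!bytes.empty());
    return std::accumulate(bytes.begin(), bytes.end(), 0);
}
```

Dropping `<numeric>` might still compile on a toolchain where `<vector>` pulls it in indirectly, which is the transitive-include case the check complains about.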

rocky vigil
#

well if u wanna read the output

#

and lemme know what it wants

violet badger
#

last one to fix, don't worry

rocky vigil
#

ok

#

I'll push the declaration shadows local variable fix now

violet badger
rocky vigil
#

this cannot cause any performance regression right

prime mica
#

nah

rocky vigil
#

i'll chalk up the abnormally slow bench to my laptop being weird then

prime mica
#

yes

#

:P

violet badger
#

laptop regression confirmed by remote diagnosis.

#

The matetrack error is more interesting.

rocky vigil
#

yeah and matetrack

prime mica
#

lol wtf

#

invalid explicitly-specified argument for template parameter

violet badger
#

maybe Apple clang version 15.0.0 (clang-1500.3.9.4) issue?

#

or feature?

rocky vigil
#

at least nothing else erroring out so far

violet badger
#

the error was there before, so nothing random

prime mica
#

my apple clang is 17 unfortunately

#

maybe because it's declared inline for no reason?

rocky vigil
#

yeah a lot of compilers having issues with that

#

interesting

prime mica
#

ugh

#

_mm_cvtsi64x_si128 maybe

#

actually that's even worse

#

hm

#

we could do _mm_set_epi64x(0,x)

#

miserable

rocky vigil
#

idk compiler diffs are weird

prime mica
#

is there a way to check whether an identifier exists in the preprocessor

#

no right

rocky vigil
#

yeah prob not

#

tough

#

i mean I wouldn't know tho

#

somehow master has none of these compilation issues

#

sigh

prime mica
#

lmao

#

average non portability

rocky vigil
#

also i think someone who worked on the incremental threat can revisit this

prime mica
#

@rocky vigil ok I think try replacing _mm_cvtsi64_si128(x) with _mm_loadu_si64(&x)

#

OH WAIT

#

it's because it's building on 32-bit

#

ughhhhhh

#

ok yeah then _mm_loadu_si64 should work
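
A sketch of the idea, assuming the goal is just to get a 64-bit value into the low lane of an XMM register on both 32-bit and 64-bit x86 builds. The helper names `low64_to_xmm` and `low64_roundtrip` are hypothetical, and choosing the path by checking `__x86_64__` is one possible way to tell the targets apart, not necessarily what the branch ended up doing.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
    #include <emmintrin.h>

// Hypothetical helper: place a 64-bit integer in the low lane of an XMM register.
inline __m128i low64_to_xmm(std::uint64_t x) {
    #if defined(__x86_64__) || defined(_M_X64)
    // 64-bit targets can move straight from a general-purpose register.
    return _mm_cvtsi64_si128(static_cast<long long>(x));
    #else
    // 32-bit targets have no 64-bit general-purpose registers, so load from memory.
    return _mm_loadu_si64(&x);
    #endif
}
#endif

// Round trip used to check the helper; falls back to a plain copy without SSE2.
inline std::uint64_t low64_roundtrip(std::uint64_t x) {
#if defined(__SSE2__)
    std::uint64_t out = 0;
    _mm_storel_epi64(reinterpret_cast<__m128i*>(&out), low64_to_xmm(x));
    assert(out == x);  // the low lane must carry the value unchanged
    return out;
#else
    return x;
#endif
}
```

Using `_mm_loadu_si64` unconditionally would also work and avoids the preprocessor branch, at the cost of going through memory on 64-bit builds unless the compiler optimizes it away.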

frosty imp
#

@rocky vigil PR sent

warm thistle
#

i may have a speedup
```
Result of 20 runs

base (./sf-old ) = 1374133 +/- 10832
test (./stockfish ) = 1383722 +/- 11199
diff = +9589 +/- 2380

speedup = +0.0070
P(speedup > 0) = 1.0000

CPU: 8 x AMD Ryzen 7 7700X 8-Core Processor
Hyperthreading: on
```

it's also a simp so we'll see

prime mica
#

exciting

candid ivy
#

(if that hasn't been said yet)

frosty imp
#

huh @warm thistle it was added here for optimization

warm thistle
#

oh hm

#

idk removing it seems to do better on my machine..

#

ig we see what fishtest says

prime mica
#

seems reasonable to get rid of it if it doesn't help

#

maybe it should be an assert instead of an assume

#

but I don't see how it could be faster

frosty imp
#

@rocky vigil made a few more PRs. will be back for more once those get merged

lofty cedar
#

Great job everyone! This was a long journey, but now, after the cleanup finished, we could finally merge to master!

#

And then we can start search tune and so on to gain some more.

#

You're awesome!

prime mica
#

Says u

#

The sama

rocky vigil
#

aight they're all merged

frosty imp
#

can resolve this now

rocky vigil
#

yep

#

bruh IWYU came up with new compilation errors

frosty imp
#

pr made

jolly tangle
rocky vigil
#

matetrack issue seems to be related to tb

#

i have no idea how that happened

#

we ostensibly didn't touch any part of tb probing

#

btw @frosty imp how is progress on nnue-pytorch going

#

should I make a PR now for that

#

and you see what needs to be changed

frosty imp
#

I'm working on a feature set refactor so best to wait after that

#

since there's prolly going to be heavy merge conflicts

rocky vigil
#

yeah

#

the merge conflicts already there

#

cool

rocky vigil
#

manually

#

later

#

yeah i assume we are not gonna change the net format or anything

violet badger
#

For the matetrack, I'm trying to find the reproducer, haven't extracted it yet, but I've seen this error message, which is probably the reason:
stockfish: syzygy/tbprobe.cpp:1148: void Stockfish::{anonymous}::set(T&, uint8_t*) [with T = TBTable<Stockfish::<unnamed>::WDL>; uint8_t = unsigned char]: Assertion `e.hasPawns == bool(*data & HasPawns)' failed.

#

unless that rings an immediate bell, I'll try to extract the testcase

#

syzygy/tbprobe.cpp:1073: uint8_t* Stockfish::{anonymous}::set_sizes(PairsData*, uint8_t*): Assertion `d->base64[i] * 2 >= d->base64[i + 1]' failed.

#

something is fishy 🙂

stray reef
#

maybe salting the fish helps

rocky vigil
#

and this doesn't trigger with master?

#

very very strange

stray reef
#

i went to bed when sscg was fixing still, i woke up and sscg is still going kekgasm

rocky vigil
#

and woke up and started fixing more

stray reef
#

guess i just need more sleep than most people here

split warren
rocky vigil
#

btw does anyone else have cleanups they would like to propose to the pr

#

if so just pr it to my branch

stray reef
#

planned to read the diff in a lecture later (3-4h from now)

rocky vigil
#

fair

stray reef
#

one minor thing that comes to mind tho: we no longer require safe_destination() in bitboard.h, it can be moved back to make the diff simpler

rocky vigil
#

ah

#

btw on the x86-32-sse41-popcnt comp failure

#

LLM is suggesting to use _mm_cvtsi32_si128 instead

#

idk how trustworthy that is

#

i get the general issue of attempting to manipulate 64 bit stuff on 32 bit comp

#

but do we have a way to distinguish between 32 bit sse41 and 64 bit sse41

stray reef
#

could FullThreats::append_active_indices be simplified for pawns using one of the newly introduced attacks_bb() functions in bitboard.h?

#

just throwing some ideas here, not sure how much cleanup should be done now vs. afterwards

rocky vigil
#

it is faster for refreshing

#

though refreshing takes negligible amount of total time

violet badger
#
$ cat test3.inp 
setoption name syzygyPath value ../../syzygy/3-4-5/
position fen 8/8/8/8/6b1/1N1P4/5K1p/7k b - - 0 1
go nodes 100000
$ cat test3.inp - |  ../Stockfish/src/stockfish
Stockfish dev-20251110-b5a26a84 by the Stockfish developers (see AUTHORS file)
info string Found 145 WDL and 145 DTZ tablebase files (up to 5-man).
info string Available processors: 0-31
info string Using 1 thread
info string NNUE evaluation using nn-49c1193b131c.nnue (125MiB, (102384, 1024, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
info depth 1 seldepth 3 multipv 1 score cp -40 nodes 11 nps 11000 hashfull 0 tbhits 0 time 1 pv g4f3
info depth 2 seldepth 3 multipv 1 score cp -33 nodes 26 nps 26000 hashfull 0 tbhits 0 time 1 pv g4f3
info depth 3 seldepth 4 multipv 1 score cp -27 nodes 138 nps 138000 hashfull 0 tbhits 0 time 1 pv g4e2
info depth 4 seldepth 5 multipv 1 score cp -97 nodes 811 nps 811000 hashfull 0 tbhits 0 time 1 pv g4h5 b3d2
info depth 5 seldepth 6 multipv 1 score cp -93 nodes 983 nps 491500 hashfull 0 tbhits 0 time 2 pv g4e2 d3d4 e2f3 b3d2 f3g2
info depth 6 seldepth 7 multipv 1 score cp -87 nodes 1024 nps 512000 hashfull 0 tbhits 0 time 2 pv g4e2 d3d4 e2f3 b3d2 f3g2
info depth 7 seldepth 9 multipv 1 score cp -96 nodes 1268 nps 634000 hashfull 0 tbhits 0 time 2 pv g4e2 d3d4 e2f3 b3d2 f3g2 d2c4 g2f3
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
#

no idea what is going on there..

rocky vigil
#

huh it looks like it's in the TB code

rocky vigil
violet badger
#

educate me ....

rocky vigil
#

i.e. repeat "go nodes x/ucinewgame" with increasing values of x

#

until crash

violet badger
#

ah, I see.

#

I can probably just print the fen of the probing..

rocky vigil
#

yeah to get an fen

#

essentially

#

how updated is probing code?

violet badger
#

I think it is more some corruption that just happens to trigger that.

rocky vigil
#

in any case it's strange

violet badger
#

if you happen to have TB around, can you test if you can reproduce?

rocky vigil
#

i only have shatranj TB lol

violet badger
#

ok dw

rocky vigil
#

what endgame?

#

I can download the specific 5 man

violet badger
#
```
setoption name syzygyPath value ../../syzygy/3-4-5/
position fen 8/8/8/8/6b1/1N1P4/5K1p/7k b - - 0 1
go nodes 100000
```
#

but let me see if I get the fen

rocky vigil
#

yeah this is 6 piece (root pos)

violet badger
#

If I print out the fens it probes I get

...
Probe: 8/8/8/8/3P4/5KN1/8/6kr w - - 0 7
Probe: 8/8/8/8/3P4/5K2/8/6kN b - - 0 7
Probe: 8/8/8/8/3P4/5KN1/8/6kb w - - 0 7
==1805787== Thread 2:
==1805787== Invalid read of size 1
#

with that last fen triggering the error

rocky vigil
#

ah

#

bishop underpromotion is strange

#

so this is KNPkb

violet badger
#

but if I search that last fen nothing happens.

#

hmm that could be.

rocky vigil
#

what if you try a precursor position like 8/8/8/8/3P4/5KN1/7p/6k1 b - - 0 1

violet badger
#

no problem

rocky vigil
#

huh

#

doesn't seem to be a probing problem

violet badger
#

no something is strange..

rocky vigil
#

underpromotion, it being a check, captures available in the position are all edge cases of TB idk

violet badger
#

well, we've never had TB issues.

rocky vigil
#

but why would it only crash when root pos is far away

rocky vigil
violet badger
#

If I compile with sanitize=undefined I get:

Probe: 8/8/8/8/3P4/5KN1/8/6kb w - - 0 7
syzygy/tbprobe.cpp:1081:22: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
syzygy/tbprobe.cpp:1042:31: runtime error: shift exponent 151 is too large for 64-bit type 'long long unsigned int'
syzygy/tbprobe.cpp:1043:31: runtime error: shift exponent 209 is too large for 64-bit type 'long long unsigned int'
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
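
For readers unfamiliar with the sanitizer message: in C++, shifting a 64-bit value by 64 or more is undefined behaviour, so shift counts like 151 or 209 usually mean the count was computed from bad input. The sketch below only illustrates that rule; `shl_or_zero` and `shl_checked` are hypothetical helpers, not a proposed fix for `tbprobe.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Defines the out-of-range case as "all bits shifted out" instead of leaving it undefined.
inline std::uint64_t shl_or_zero(std::uint64_t value, unsigned count) {
    return count < 64 ? value << count : 0;
}

// Variant for code where an out-of-range count means the input data is corrupt.
inline std::uint64_t shl_checked(std::uint64_t value, unsigned count) {
    assert(count < 64 && "shift count out of range: input is probably corrupt");
    return value << count;
}
```

In a debug build the checked variant stops at the first bad count, which can be closer to the source of the corruption than the later `std::length_error`.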
rocky vigil
#

that is very strange

#

and I assume it doesn't occur when the position is probed directly?

violet badger
#

it doesn't trigger on master... let me check on the branch

#

no, the position searches fine as rootpos

rocky vigil
#

and no warnings on shift exponent

#

ok

violet badger
#

right

rocky vigil
#

i guess the issue must be in the internal data being passed somehow

violet badger
#

I think so, but have to stop debugging now.. later today I can look into it again.

frosty imp
#

Issue in make move maybe?

violet badger
#

it is something rare, I'm currently playing games with syzygy enabled, and it is not triggering after a few 100 games.

#

but does trigger on that testcase.

violet badger
#

OK, finally, have a setup where this triggers reliably while playing games (basically book of random 6men positions).

#
     34  0-1 {Black mates}
     12  0-1 {White disconnects}
      9  1-0 {Black disconnects}
     25  1-0 {White mates}
      3  1/2-1/2 {Draw by fifty moves rule}
      7  1/2-1/2 {Draw by insufficient mating material}
#

and it is specific to the branch, not happening for master.

violet badger
prime mica
prime mica
#

Is there a field of Position or StateInfo that only tbprobe reads

#

I'm surprised address sanitizer isn't catching anything

proper oxide
#

is there a reason there's double_inc_update for threats?

#

it seems to me like there's no optimization there?

#

this is a slight speedup for me

--- src/nnue/nnue_accumulator.cpp
+++ src/nnue/nnue_accumulator.cpp
@@ -212,17 +212,6 @@ void AccumulatorStack::forward_update_incremental(
             DirtyPiece& dp1 = psq_accumulators[next].diff;
             DirtyPiece& dp2 = psq_accumulators[next + 1].diff;
 
-            if (std::is_same_v<FeatureSet, ThreatFeatureSet> && dp2.remove_sq != SQ_NONE
-                && ((threat_accumulators[next].diff.threateningSqs & square_bb(dp2.remove_sq))
-                    || (threat_accumulators[next].diff.threatenedSqs & square_bb(dp2.remove_sq))))
-            {
-                double_inc_update<Perspective>(featureTransformer, ksq, threat_accumulators[next],
-                                               threat_accumulators[next + 1],
-                                               threat_accumulators[next - 1], dp2);
-                next++;
-                continue;
-            }
-
             if (std::is_same_v<FeatureSet, PSQFeatureSet> && dp1.to != SQ_NONE
                 && dp1.to == dp2.remove_sq)
             {
prime mica
#

lol

#

advanced

prime mica
#

so I think data is getting misaligned somehow in the TB read logic

#

huh but the only usage of a possibly-bad pos in mapped is constructing the file name...

#

is there a consistency check utility function for Position anywhere?

prime mica
#

just check whether __i386__ or __x86_64__ is defined

#

I don't think the LLM's suggestion makes much sense

amber fern
#

Is the threat inputs branch merged with the official SF branch yet? If not, when will that happen 🙂

prime mica
#

patience

amber fern
#

What this guy said

#

Also, you said the same thing to that guy 😂

prime mica
#

lol

amber fern
# prime mica lol

Yes I just read through 400+ messages on this thread, the entire history of the last ~4 days, y'all have a lot to say

prime mica
#

it's a complex change!

green moat
amber fern
#

Fish test time!

rocky vigil
#

we'll do it after the merge