#UE Threat Inputs for AB

rocky vigil
#

around half the speed of sf master net?

#

cannot read the numbers

stray reef
#

with my current arch (including the factoriser) training speeds are basically identical to my current master net

rocky vigil
#

at least one of the numbers was ~equal to sf master, one was 1/2 sf master, and one was 1/4

stray reef
#

yeah. no idea how to interpret the 80-90it/s either, but from what i've seen in the past it seems pretty good

violet badger
#

same speed as master net (l1=3072) training.

#

so fairly straightforward to train.

#

certainly if we train just 1 or 2 stages.

#

actually even a bit faster to train.

stray reef
#
Elo   | -3.77 +- 2.98 (95%)
SPRT  | 40.0+0.40s Threads=1 Hash=64MB
LLR   | -2.25 (-2.25, 2.89) [0.00, 2.50]
Games | N: 11898 W: 2877 L: 3006 D: 6015
Penta | [6, 1394, 3281, 1259, 9]

https://furybench.com/test/3100/
LTC vs main. slowly getting there. i'm hopeful that a factoriser is all that's needed now
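
For readers who want to check the numbers above, here is a minimal sketch of how a logistic Elo estimate and its ~95% error bar can be derived from a pentanomial line. The function and struct names are made up for illustration, and this is not necessarily how the test framework computes them internally; it assumes a normal approximation on the per-pair score.

```cpp
#include <array>
#include <cmath>

struct EloEstimate {
    double elo;    // point estimate in logistic Elo
    double error;  // half-width of the ~95% interval
};

// Logistic Elo for an expected score s in (0, 1).
inline double score_to_elo(double s) {
    return -400.0 * std::log10(1.0 / s - 1.0);
}

// penta[i] = number of game pairs that scored i/2 points out of 2 (i = 0..4).
inline EloEstimate elo_from_pentanomial(const std::array<long, 5>& penta) {
    double pairs = 0.0, mean = 0.0, meanSq = 0.0;
    for (int i = 0; i < 5; ++i) {
        const double x = i / 4.0;  // per-game score of this pair outcome
        pairs  += penta[i];
        mean   += penta[i] * x;
        meanSq += penta[i] * x * x;
    }
    mean   /= pairs;
    meanSq /= pairs;
    // Standard error of the mean score, treating each pair as one sample.
    const double seScore = std::sqrt((meanSq - mean * mean) / pairs);
    const double lo = score_to_elo(mean - 1.96 * seScore);
    const double hi = score_to_elo(mean + 1.96 * seScore);
    return {score_to_elo(mean), (hi - lo) / 2.0};
}
```

Feeding it `[6, 1394, 3281, 1259, 9]` should land close to the reported `-3.77 +- 2.98`.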

rocky vigil
#

mm

#

+2 LTC from previous but it doesn't say much

#

considering error bars

stray reef
#

First factoriser not looking good (loss also sucked)

--------------------------------------------------
Results of ThreatsFactorised vs Threats (20000 nodes, 1t, 16MB, UHO_4060_v2.epd):
Elo: -13.87 +/- 9.37, nElo: -21.63 +/- 14.58
LOS: 0.18 %, DrawRatio: 43.30 %, PairsRatio: 0.81
Games: 2180, Wins: 602, Losses: 689, Draws: 889, Points: 1046.5 (48.00 %)
Ptnml(0-2): [58, 284, 472, 239, 37], WL/DD Ratio: 1.58
--------------------------------------------------
Results of ThreatsFactorised vs Threats (5+0.05, 1t, 16MB, UHO_4060_v2.epd):
Elo: -13.70 +/- 11.09, nElo: -26.13 +/- 21.12
LOS: 0.76 %, DrawRatio: 48.46 %, PairsRatio: 0.75
Games: 1040, Wins: 245, Losses: 286, Draws: 509, Points: 499.5 (48.03 %)
Ptnml(0-2): [5, 148, 252, 113, 2], WL/DD Ratio: 1.03
--------------------------------------------------

I mean i've never experimented with factoriser schemes, there is the possibility that smth is still bugged ofc even though i double checked the things i could think of, but it also doesn't look bad enough for it to be bugged

#

i'll try coding up 768x4 next, when i have time

rocky vigil
#

yeah that looks cooked rip

stray reef
#

i tried to describe the information encoded in various threat schemes, in the hope of getting some collective opinion on what a factoriser might need most.

large threat inputs: [src][src_pc][src_pc_col][dest][dest_pc][dest_pc_rel_col]
small threat inputs: [src][src_pc][src_pc_col][dest][dest_pc_rel_col] -> leave out attacked piece type
what i tried:        [src][src_pc][src_pc_col][dest][dest_pc_worth_more_than_src_pc][dest_pc_rel_col]
768x4:               [dest][dest_pc][dest_pc_col][dest_attacked][dest_defended]
alternative idea 1:  [src_pc][src_pc_col][dest][dest_pc][dest_pc_rel_col] -> leave out source square
alternative idea 2:  [src][src_pc][src_pc_col][dest_pc][dest_pc_rel_col] -> leave out destination square

i'm actually thinking alternative idea 1 might be best, the source square should not be super important for the factoriser. but i wanna hear some opinions
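
To make "alternative idea 1" concrete, here is a rough sketch of a dense mixed-radix index over `[src_pc][src_pc_col][dest][dest_pc][dest_pc_rel_col]`. All names and the dimension order are assumptions for illustration. It does not prune impossible combinations (pawns on back ranks, squares a piece can never attack), so the size is only an upper bound on what a real encoding would use.

```cpp
// Hypothetical dimension sizes for the factoriser sketch.
constexpr int NUM_PIECE_TYPES = 6;   // pawn..king
constexpr int NUM_COLORS      = 2;
constexpr int NUM_SQUARES     = 64;

// Upper bound on the feature count when nothing is pruned.
constexpr int FACTORISER_UPPER_BOUND =
    NUM_PIECE_TYPES * NUM_COLORS * NUM_SQUARES * NUM_PIECE_TYPES * NUM_COLORS;

// srcPc, destPc in 0..5; srcCol, destRelCol in 0..1; dest in 0..63.
// The source square is deliberately not part of the index.
constexpr int factoriser_index(int srcPc, int srcCol, int dest,
                               int destPc, int destRelCol) {
    int idx = srcPc;
    idx = idx * NUM_COLORS      + srcCol;
    idx = idx * NUM_SQUARES     + dest;
    idx = idx * NUM_PIECE_TYPES + destPc;
    idx = idx * NUM_COLORS      + destRelCol;
    return idx;
}
```

Every full threat feature that differs only in its source square maps to the same factoriser index, which is the whole point of the scheme.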

rocky vigil
#

i think leaving out src square seems reasonable yea

twilit oriole
#

Why not use small threat inputs as the factoriser?

#

Because it is known to not be terrible even as standalone

rocky vigil
#

is that not what yoshie just tried

#

idk

twilit oriole
#

Oh I see lol

#

It is basically

stray reef
#

trying without encoding the source square now. for pawns, since source/destination are so closely tied, i encode source file+threat direction, but not rank. 6824 features

#

eta 17-22h from now, depends on when i'm home

stray reef
#

Still not great

--------------------------------------------------
Results of ThreatsFactorised vs Threats (20000 nodes, 1t, 16MB, UHO_4060_v2.epd):
Elo: -7.04 +/- 6.13, nElo: -11.26 +/- 9.79
LOS: 1.21 %, DrawRatio: 45.16 %, PairsRatio: 0.86
Games: 4836, Wins: 1329, Losses: 1427, Draws: 2080, Points: 2369.0 (48.99 %)
Ptnml(0-2): [95, 617, 1092, 519, 95], WL/DD Ratio: 1.31
--------------------------------------------------

I don't know, maybe it's not a thing that can be factorised well, at least in the ways i've tried so far? I.e. the weights of each "bucket" are too different, what i'm doing seems to be doing more harm than good

twilit oriole
#

Yeah

rocky vigil
#

well then

#

the debugging starts again

rocky vigil
#

ucinewgame position startpos eval (x2) gives two wildly different results

#

this is so bad

#

surprise surprise doing a no-ue inference hack on sf nnue vector code by treating the biases as accumulator caches breaks

#

because the biases themselves get updated

#

ok well this looks more like chess

#

idk how good this chess is

twilit oriole
#

This is the early checkpoint right

rocky vigil
#

yes

#
...      Frolic (stable) playing White: 0 - 46 - 4  [0.040] 50
...      Frolic (stable) playing Black: 0 - 48 - 2  [0.020] 50
...      White vs Black: 48 - 46 - 6  [0.510] 100
Elo difference: -603.9 +/- 183.4, LOS: 0.0 %, DrawRatio: 6.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
100 of 100 games finished.

well seeing it can still destroy Frolic (~3080 CCRL blitz) at stc without ue

#

i think there shouldn't be any major issues with training/inference at this stage

#
```
...      Stockfish TI-experimental playing White: 6 - 29 - 15  [0.270] 50
...      Stockfish TI-experimental playing Black: 6 - 34 - 10  [0.220] 50
...      White vs Black: 40 - 35 - 25  [0.525] 100
Elo difference: -195.5 +/- 65.7, LOS: 0.0 %, DrawRatio: 25.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
100 of 100 games finished.
```
10k nodes
#

idk how much the rest of training is worth

#

what should be the plan

#

start a full training run and compare fixed nodes?

#

200 superbatches was it?

#

i honestly have no idea how undertrained that is

#

besides "very"

rocky vigil
twilit oriole
#

well i think you should get rid of king buckets for the baseline lol

#

then we can compare to plenty results easier

#

Plenty has a L1 512 TI vs L1 1536 regular, SF would be L1 1024 TI vs L1 3072. So fixed nodes should be very similar

stray reef
#

hm -190 sounds almost like something's broken honestly, or the end LR is still extremely high

for my training setup there is no way any stage can be so bad, assuming a reasonable LR schedule

#

plenty L1 is 1792 btw

#

#nnue-dev message
given this, the elo diff seems fine

violet badger
#

Anyway, worthwhile training something stronger.

rocky vigil
#

Yeah still cannot guarantee everything is perfectly fine

#

But at least this is a lower bound

violet badger
#

right, but it is likely not outrageously wrong, which is good enough to put some more resources on this.

#

do you have some correlation plot, e.g. TI vs master net evals in a scatter plot?

rocky vigil
#

Ah I can make that later if you tell me how

violet badger
#

just take a random source of fens (e.g. a binpack), and evaluate once 1000 fens with your net and once with master net, and plot x,y..
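
Besides the scatter plot, a single correlation number over the same FEN sample is a quick first check. A minimal sketch, assuming the two eval lists have already been collected (one entry per FEN, same order); how the evals are extracted from each engine is not covered here, and the function name is made up.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equally long eval lists.
// Returns 0.0 for empty, mismatched, or constant inputs.
inline double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.empty() || a.size() != b.size()) return 0.0;
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;
    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        cov  += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0;
    return cov / std::sqrt(varA * varB);
}
```

A value far from 1 on quiet positions would point at an inference or indexing mismatch rather than at an undertrained net.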

rocky vigil
#

ok

stray reef
#

Btw @twilit oriole do you have any data on how much data and how many SBs/epochs a threat input net of a certain L1 size needs?

#

i'm wondering if mine is massively undertrained (not only wrt data, but also SBs)

twilit oriole
#

hm not really. we are using 12k SBs and 160B positions for an L1 8192

#

and that seems slightly undertrained but not by much

#

though mcts might have higher data requirements

stray reef
#

how many SBs would you do for L1 512, given enough data (whatever that may be)

twilit oriole
#

difficult to say because you can nearly always squeeze a few more elo out

#

probably something like 1k minimum, 2k to be sure

stray reef
#

hm ok

rocky vigil
#
```
...      Stockfish TI-experimental playing White: 19 - 141 - 90  [0.256] 250
...      Stockfish TI-experimental playing Black: 18 - 165 - 67  [0.206] 250
...      White vs Black: 184 - 159 - 157  [0.525] 500
Elo difference: -208.9 +/- 27.1, LOS: 0.0 %, DrawRatio: 31.4 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
500 of 500 games finished.
```
20k nodes, but idt there's really anything more substantial to learn atm
violet badger
#

I think the important check is to see if the inference is consistent with the trainer...

#

(though the script might need verifying it still works)

rocky vigil
#

actually my question is why isn't there a command that just returns the unnormalized eval

rocky vigil
#

btw @twilit oriole sparsity on the threat net L1 -> L2 seems trashed

#

have you measured this before

twilit oriole
#

wdym trashed

#

i found threat nets compress much better for us which would lead me to believe the opposite

rocky vigil
#

combined zeros here seems much lower

#

than in the halfka

#

(master arch at this checkpoint has like 78 which is double the amount)

#

oh right L1 issue

rocky vigil
#

@violet badger I'm measuring a large fixed nodes loss between the first checkpoint of the full run (nn-42b0b08a207a.nnue) and the net trained from the short run (nn-cc78fa7e0258.nnue) despite a lower validation loss (0.00405 vs 0.00425), is there a meaningful difference between the two in the first stage besides training time?

```
...      Stockfish TI-experimental playing White: 24 - 45 - 31  [0.395] 100
...      Stockfish TI-experimental playing Black: 20 - 48 - 32  [0.360] 100
...      White vs Black: 72 - 65 - 63  [0.517] 200
Elo difference: -86.9 +/- 40.7, LOS: 0.0 %, DrawRatio: 31.5 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
200 of 200 games finished.
```
violet badger
#

full first stage (nn-42b....) should be better than the previous (nn-cc78...); the only difference is that training increased from 200 to 800 epochs. In this sense nn-42b can now also be compared to similarly trained nets of the master arch (which are roughly -50 Elo compared to a fully trained master net).

#

do I understand your measurement as showing it is worse?

rocky vigil
#

yes

#

I'll look into both impls again

violet badger
#

yeah, I think this needs some checking from the implementation point of view.

#

note on the loss during training: we adjust lambda (the mix between eval and game outcome) during training (if lambda start and lambda end are not the same), so the loss doesn't mean the same thing at the same epoch if max_epoch is different.

rocky vigil
#

yeah i mean the 0.00405 vs 0.00425 comparison is from final epoch from both

#

but yeah something is strange

violet badger
#

possible the final epoch can indeed be compared.

#

still strange.

#

even if painful, I think the thing to do right now is to ensure trainer and SF have the same inference result.

rocky vigil
#

do all of the nnue-pytorch functions really need a gpu to run

#

i'll try to enlist a friend's help if that is the case

violet badger
#

most likely, at least I don't think non-gpu runs are still supported. It would add a new dimension to testing..

rocky vigil
#

i should have guessed something was sketchy

#

in stockfish piece enum color is msb, in nnue-pytorch it is lsb...
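
To spell out the mismatch: in the Stockfish `Piece` enum the colour sits in the high bit (`W_PAWN = 1`, `B_PAWN = 9`). The sketch below assumes the trainer side numbers pieces as `type * 2 + colour` with types starting at 0; that layout is taken from the message above, not verified against the trainer source, and the helper names are made up.

```cpp
// Colour-in-high-bit layout (Stockfish style): piece = colour * 8 + type, type in 1..6.
constexpr int msb_piece(int color, int type1to6) { return color * 8 + type1to6; }

// Colour-in-low-bit layout (assumed trainer style): piece = type * 2 + colour, type in 0..5.
constexpr int lsb_piece(int color, int type0to5) { return type0to5 * 2 + color; }

// Convert between the two numberings.
constexpr int msb_to_lsb(int piece) {
    const int color = piece >> 3;       // high bit holds the colour
    const int type  = (piece & 7) - 1;  // shift 1..6 down to 0..5
    return type * 2 + color;
}

constexpr int lsb_to_msb(int piece) {
    const int color = piece & 1;        // low bit holds the colour
    const int type  = (piece >> 1) + 1; // shift 0..5 up to 1..6
    return color * 8 + type;
}
```

Any table indexed by piece (such as the threat offsets) has to be built in the same numbering the trainer used, or be remapped through a conversion like this.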

rocky vigil
#

ok well it turns out changing the threat indexing does not affect bench at all

#

um

#

something is highly wrong in my inference then

rocky vigil
#
```cpp
void init_threat_offsets() {
    int pieceoffset = 0;
    for (int c = WHITE; c <= BLACK; c++) {
        for (int pt = PAWN; pt <= KING; pt++) {
            Piece piece = make_piece(Color(c), PieceType(pt));
            // Slot 65: where this piece's block starts in the global threat index.
            threatoffsets[piece][65] = pieceoffset;
            int squareoffset = 0;
            for (int from = SQ_A1; from <= SQ_H8; from++) {
                // Per-square offset inside this piece's block.
                threatoffsets[piece][from] = squareoffset;
                if (pt != PAWN) {
                    // Number of squares attacked on an empty board.
                    Bitboard attacks = attacks_bb(PieceType(pt), Square(from), 0ULL);
                    squareoffset += popcount(attacks);
                }
                else if (from >= SQ_A2 && from <= SQ_H7) {
                    // Pawns only exist on ranks 2-7; piece < 8 selects the white pieces.
                    Bitboard attacks = (piece < 8) ? pawn_attacks_bb<WHITE>(square_bb(Square(from)))
                                                   : pawn_attacks_bb<BLACK>(square_bb(Square(from)));
                    squareoffset += popcount(attacks);
                }
            }
            // Slot 64: total attack count for this piece over all squares.
            threatoffsets[piece][64] = squareoffset;
            pieceoffset += numvalidtargets[piece]*squareoffset;
        }
    }
}
```
no matter how I swap the order of the top for loops (either way), I get the same bench
#

idk what is going wrong...

rocky vigil
#

I legitimately do not know how changing the threat indexing does not affect bench at all

rocky vigil
#

the battle begins again

#

nvm this is it packing two ints and interpreting it as a u64

rocky vigil
#

more inclined to believe the issue is in the trainer now

#

so this is basically just a L1=1024 halfka net

#

no wonder it's -200 to master

#

doesn't explain how the 800 sb one is worse than the 200 sb one at fixed nodes

#

oh well

rocky vigil
#

@violet badger it looks like something is wrong right now so there isn't much point in continuing the run

#

I'll have to take a look into trainer again

violet badger
#

okay, just let me know if there are fixes to the trainer to test out and we can restart.

rocky vigil
#

yeah it's hard to work with nnue-pytorch w/o a gpu but hopefully my friend can help in the next few days

rocky vigil
#

@violet badger is it safe to rebase against master

stray reef
rocky vigil
#

oh nice

stray reef
#

@twilit oriole threat inputs don't allow duplicate encoding of the same interaction, e.g. two queens attacking each other. did you ever measure the elo of this?

#

i realised there's quite a few unused features due to this (only 73360 are used)

rocky vigil
#

i think it's possible to change the encoding itself

#

to reduce some of stuff like that

#

but it's annoying because you then have to treat it separately

#

not like indexing is the bottleneck anyways

#

it takes like < 1% of runtime

stray reef
twilit oriole
#

the elo was not measured no. you need to be careful about unused features, sometimes it is an illusion due to rare underpromos that would for example allow u to have two own bishops of same square complex etc

rocky vigil
#

huh that's strange

#

actually are you sure the lookup table is the right play here

stray reef
#

it was faster than the usual calculation

rocky vigil
#

hmmm

stray reef
#

though i'm 100% sure there must be a different encoding to make this faster

#

and also to figure out if a feature is unused or not

rocky vigil
#

back when diss ran profile the actual indexing portion was only 1% or so of runtime and generating the threats was like 20% over both sides

#

idk maybe stuff changes

stray reef
#

well maybe i did smth really stupid but i didn't really get very far with profiling

rocky vigil
#

do you actually have a profile of latest version

stray reef
#

the time taken in this loop is roughly 1/3 unpacking DirtyThreat and calculating relative squares, 1/3 table lookup, 1/3 adding into the arrays

rocky vigil
#

I can't do it bc windows sucks

rocky vigil
#

ok cool

naive comet
#

yoshie have you tried to split into 2 DirtyThreat lists, one with add and one with subtract, to remove branching in the loop? i think it could be a minor speedup
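
A rough sketch of the two-list idea, assuming the code that generates threat changes already knows at each call site whether a threat appears or disappears, so it can push into the right list without a runtime flag. The types below are simplified stand-ins (plain feature indices instead of the real `DirtyThreat` record) and all names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Separate lists for appearing and vanishing threats (feature indices already resolved).
struct DirtyThreatLists {
    std::vector<std::uint32_t> added;
    std::vector<std::uint32_t> removed;
    void on_threat_added(std::uint32_t feature)   { added.push_back(feature); }
    void on_threat_removed(std::uint32_t feature) { removed.push_back(feature); }
};

// weights is row-major: weights[feature * l1 + lane].
// Each loop does only adds or only subs, so there is no add/sub branch per entry.
inline void apply_deltas(const DirtyThreatLists& lists,
                         const std::vector<std::int16_t>& weights,
                         std::size_t l1, std::vector<std::int16_t>& acc) {
    for (std::uint32_t f : lists.added)
        for (std::size_t i = 0; i < l1; ++i)
            acc[i] = static_cast<std::int16_t>(acc[i] + weights[f * l1 + i]);
    for (std::uint32_t f : lists.removed)
        for (std::size_t i = 0; i < l1; ++i)
            acc[i] = static_cast<std::int16_t>(acc[i] - weights[f * l1 + i]);
}
```

Whether this is an actual speedup depends on how well the branch was predicted before, so it would need measuring.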

stray reef
#

i tried it in combination with smth else, can try it standalone as well

rocky vigil
stray reef
#

this loop basically takes more time than all of addsub

#

it's crazy

#

i think going back and forth between indexing the table and the dirty threat lists is awful for the cache, especially if there's like 10 threat updates to process

#

though i've not managed to find a way to improve it yet

rocky vigil
#

also random idea maybe don't use max capacity 128 indexlists

#

for add/remove

#

like 32 should do just fine

stray reef
#

tried that, was not a speedup

rocky vigil
#

oh well

#

yeah I tried once not to like create entirely new lists every time but that screwed with multithreading

stray reef
#

@naive comet maybe you have some idea on how to improve the cache situation? to not jump back and forth between dirtyThreats and the lookup table?

rocky vigil
#

how big is dirtythreats

stray reef
#
struct DirtyThreat {
  Piece piece;
  Piece attackedPiece;
  Square square;
  Square attackedSquare;
  Color pieceColor;
  Color attackedColor;
  bool add;
};

struct Accumulator {
  alignas(ALIGNMENT) int16_t threatState[2][L1_SIZE];
  alignas(ALIGNMENT) int16_t pieceState[2][L1_SIZE];

  DirtyPiece dirtyPieces[4];
  int numDirtyPieces;
  DirtyThreat dirtyThreats[256];
  int numDirtyThreats;

  KingBucketInfo kingBucketInfo[2];
  Board* board;
};

lmao the 256 can definitely be made smaller

but it's not like that's an issue here, we're staying in the same accumulator

twilit oriole
#

something else to try is measure threat activity per index over a long search. i think ultra rare threats could be combined

stray reef
#

oh god

twilit oriole
#

like the threats that only activate in underpromo situations etc

#

i expect the distribution has an extreme skew in general
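
One way to measure this is a per-index activation counter that is bumped whenever a threat feature is added during search. A minimal sketch, where the class name and the hook point are assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often each threat feature index is activated.
class ThreatActivity {
public:
    explicit ThreatActivity(std::size_t numFeatures) : counts_(numFeatures, 0) {}

    // Call once per feature activation (e.g. wherever a threat is added).
    void record(std::size_t featureIndex) { ++counts_[featureIndex]; }

    // Number of indices that were never activated.
    std::size_t unused() const {
        return static_cast<std::size_t>(
            std::count(counts_.begin(), counts_.end(), std::uint64_t{0}));
    }

    // Indices activated at most `threshold` times: candidates for merging.
    std::vector<std::size_t> rare(std::uint64_t threshold) const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < counts_.size(); ++i)
            if (counts_[i] <= threshold) out.push_back(i);
        return out;
    }

private:
    std::vector<std::uint64_t> counts_;
};
```

As noted earlier in the thread, an index that never fires in a sample is not necessarily impossible (underpromotions), so "unused" here only means "not seen in this run".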

rocky vigil
#

yeah i mean it looks small

#

idk about cache but i wouldn't see how it's a big issue

#

if anything the lookup table looks much larger of an issue

#

but if you measured that it gains over using less

stray reef
#

the lookup table is ofc way bigger than theoretically necessary

rocky vigil
#

then idk either

stray reef
#

but doing the calculations to reduce size (e.g. compressing the [64][64]) is more expensive apparently

naive comet
stray reef
#

yes, that would work

rocky vigil
#

if you're willing to do a bunch of mailbox lookups you only need the two squares

stray reef
#

i don't have colored pieces so it'd have to be bitboard lookups but yeah

rocky vigil
#

oh interesting

formal smelt
#

i wouldn't expect it to make a notable difference in the resulting net, though the training would be slightly different

stray reef
#

I tried a bunch more stuff to optimise the index calculation. Even tried unpacking the network like this

struct NetworkData {
  alignas(ALIGNMENT) int16_t inputWeightsPawn[ThreatInputs::LookupSizes::PAWN * L1_SIZE];
  alignas(ALIGNMENT) int16_t inputWeightsKnight[ThreatInputs::LookupSizes::KNIGHT * L1_SIZE];
  alignas(ALIGNMENT) int16_t inputWeightsBishop[ThreatInputs::LookupSizes::BISHOP * L1_SIZE];
  alignas(ALIGNMENT) int16_t inputWeightsRook[ThreatInputs::LookupSizes::ROOK * L1_SIZE];
  alignas(ALIGNMENT) int16_t inputWeightsQueen[ThreatInputs::LookupSizes::QUEEN * L1_SIZE];
  alignas(ALIGNMENT) int16_t inputWeightsKing[ThreatInputs::LookupSizes::KING* L1_SIZE];
  alignas(ALIGNMENT) int16_t inputWeightsPsq[768 * KING_BUCKETS * L1_SIZE];
  alignas(ALIGNMENT) int16_t inputBiases[L1_SIZE];
  alignas(ALIGNMENT) int8_t  l1Weights[OUTPUT_BUCKETS][L1_SIZE * L2_SIZE];
  alignas(ALIGNMENT) float   l1Biases[OUTPUT_BUCKETS][L2_SIZE];
  alignas(ALIGNMENT) float   l2Weights[OUTPUT_BUCKETS][2 * L2_SIZE * L3_SIZE];
  alignas(ALIGNMENT) float   l2Biases[OUTPUT_BUCKETS][L3_SIZE];
  alignas(ALIGNMENT) float   l3Weights[OUTPUT_BUCKETS][L3_SIZE + 2 * L2_SIZE];
  alignas(ALIGNMENT) float   l3Biases[OUTPUT_BUCKETS];
};

where the threat feature weights for each attacking piece are encoded as [64][64][6][2][2]. was equally fast. ofc there would be way too much unused space but i was hoping to at least achieve faster calculation, cache pressure was roughly similar still

#

i think i'll give up on speedups for now, and just generate some more data

twilit oriole
#

Hm. Something else to try is have the L1 for piece square inputs be larger than that of the threat inputs

rocky vigil
#

How are you inferencing that then

twilit oriole
#

?

#

In the usual way. Just stop early in the L1 for the threat inputs
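
A tiny sketch of what "stop early" could look like, assuming the threat weight rows are simply narrower than the piece-square rows and contribute only to the first lanes of a shared accumulator. The sizes are toy values and the names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t L1_PSQ    = 8;  // toy size for the piece-square part
constexpr std::size_t L1_THREAT = 4;  // toy size for the threat part, <= L1_PSQ

using Accumulator = std::array<std::int16_t, L1_PSQ>;

// Piece-square features touch every lane.
inline void add_psq_feature(Accumulator& acc, const std::array<std::int16_t, L1_PSQ>& row) {
    for (std::size_t i = 0; i < L1_PSQ; ++i)
        acc[i] = static_cast<std::int16_t>(acc[i] + row[i]);
}

// Threat features only touch the first L1_THREAT lanes; the loop just ends sooner.
inline void add_threat_feature(Accumulator& acc, const std::array<std::int16_t, L1_THREAT>& row) {
    for (std::size_t i = 0; i < L1_THREAT; ++i)
        acc[i] = static_cast<std::int16_t>(acc[i] + row[i]);
}
```

The layers after the accumulator see a normal `L1_PSQ`-wide vector, so only the feature transformer changes on the inference side.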

rocky vigil
#

ah I see

#

asymmetric like that requires more extensive trainer modifications and stuff

twilit oriole
#

In bullet should be easy

stray reef
rocky vigil
#

yeah the issue rn is probably more data

#

than intrinsic scaling of the arch

violet badger
#

have 160B positions on offer for the price of $0.0

twilit oriole
rocky vigil
#

what was the difference in 123rrr4 btw

#

this is cool bc it should hopefully mean ltc is neutral now

#

so very close

stray reef
#

Yeah wanted to post about this. 0123rrr4 is the last stage with 600M more positions (5ksn adversarial) compared to 0123rrr. Gained 2 elo at STC + LTC

The game plan is generate 600M more positions while I'm on holiday, and then train a 640 L1

rocky vigil
#

nice it's looking very promising

#

hopefully you are rewarded for all of the effort soon enough

rocky vigil
#

@violet badger we discovered an error in the threat offsets initializer not being run. That should be resolved now, so the threat features should actually train

#

let's try a short test run first, and I'll verify the fix works

prime mica
#

super exciting

violet badger
violet badger
#

@rocky vigil do you happen to have a repo + sha of an SF that can use your net already? If I have it, I should be able to add this to the training pipeline already. Not urgent.

#

ok, think I found it: threat-inputs-rebase, last commit.

prime mica
#

how long is it expected to take?

violet badger
#

short test only, 1h for 'a bit of a net'

#

Full training schedule would be about 4 days

prime mica
#

gotcha

#

do you happen to know what hardware is being used

violet badger
#

let me think..

prime mica
#

(wondering how doable it is to experiment at home)

violet badger
#

Needs experimenting, depends on your GPU.

#

that is on a H100 equivalent.

prime mica
#

fancy schmancy

violet badger
#

But I don't think this is way faster than some fancy home GPU.

stray reef
#

It is merged

violet badger
#

while threat inputs in SF won its first games against master...
[129, 722, 283, 29, 0]

#

Elo: -150.87 +/- 7.78, nElo: -309.70 +/- 14.12

#

So, time to further increase epochs.

rocky vigil
#

Lemme do that quickly

#

And update against master as well

#

But it’s already looking better

violet badger
#

oh, that's going to make a difference, but sure.

rocky vigil
#

Gimme a bit to sanity check run through lldb etc.

violet badger
#

sure...

#

the net won't run away

rocky vigil
#

Would have done this yesterday if I knew it would’ve been a very fast response

violet badger
#

it is exciting, so got bumped in priority 😉

rocky vigil
#

Wait actually the current inference already seems to use the right indexing

#

Ah it was always the bullet indexing

#

Still lemme sanity check

rocky vigil
violet badger
#

so that sounds promising..

#

it is still a very early net as well, I wouldn't expect a master net to be better than -100 Elo at this point.

rocky vigil
#

this is much better and looks proper

rocky vigil
# violet badger it is still a very early net as well, I wouldn't expect a master net to be bette...

quick sanity 20k nodes (on 8moves, balanced book)

```
...      Stockfish TI-experimental playing White: 27 - 25 - 48  [0.510] 100
...      Stockfish TI-experimental playing Black: 17 - 36 - 47  [0.405] 100
...      White vs Black: 63 - 42 - 95  [0.552] 200
Elo difference: -29.6 +/- 35.0, LOS: 4.9 %, DrawRatio: 47.5 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
200 of 200 games finished.
```
so unless some recent search development is extremely good at low node counts, the net is already quite good
violet badger
#

no magic search recently.

#

that's really rather strong already.

#

should definitely gain > 30 Elo from training.

#

There should be an updated net in like 12h or so, that should be equivalent to master -50Elo.

rocky vigil
#

the estimate of -20% speed with optimization still seems accurate

#

but am hopeful 50 elo more from the full training can be gotten

#

especially now that the threats seem to actually work properly

violet badger
#

pretty certain that 50Elo is still quite easy with training..

#

unless these nets train much faster

rocky vigil
#

i wouldn't expect it, due to parameter count

#

monty used a similar training schedule of 3000 * (100M pos) I think at L1=3072 or so, and plentychess was 1200 * (100M pos) at L1=512

naive comet
#

@stray reef do you have lofty's resource regarding incremental threat tracking?

violet badger
#

so, the quick practical conclusion is that the inference code is fine for testing right now. No need for me to change things urgently.

rocky vigil
#

yeah

#

switching to nnue-pytorch is good on my end

#

as I can just use the real inference code and it just "works"

violet badger
#

not sure I fully followed that remark, but yes, SF inference code is working, though will need the speedup work that we kind of know how to start.

#

If nnue-pytorch is working fine, we'll have a next net in about 12h

#

And could have a fully trained net in 3-4d

rocky vigil
#

needed to write the entire inference from scratch last attempt with bullet

violet badger
#

I understand now... one day would still be nice to have a bullet compatible setup, but that's a different story.

stray reef
#

though it is using bitlists instead of bitboards

#

Stefan Pohl is going to do some tests with the new net as well, against the latest release (net being the only diff to latest release). Will be interesting to have those results as well

rocky vigil
#

ah

#

fair enough

naive comet
#

how are the (expanded) threat inputs indexed btw

#

I know your current input set is just that but squished right?

stray reef
#

the current indexing setup stems from an old montytrain branch

naive comet
naive comet
#

thank you

sharp sail
rocky vigil
#

huh

#

i actually did hear smth like this from leela

sharp sail
#

The way I explained it for myself was because with each step you update more parameters than for a small NN

rocky vigil
#

like about the large nets being faster initially but much slower to squeeze out maximum performance from

sharp sail
#

So it has more potential to learn in a single step

#

But without systematic analysis I'll be careful about making a definite claim, it could also just be that the hyperparameters were optimized for large NNs

frosty imp
rocky vigil
#

ok

#

sample 0 looks correct by manual inspection

#

i mean the fact that it's so close fixed nodes means it hopefully works

sharp sail
#

how close?

#

one of the reasons why NNs are so hard to debug is because even when they're buggy, they often perform pretty well

rocky vigil
twilit oriole
#

What L1 size

#

@rocky vigil

rocky vigil
#

1024

#

well that was 100 sb

twilit oriole
#

Interesting. And that's like -20% speed?

rocky vigil
#

should be according to plenty data

twilit oriole
#

The plenty measurement didn't have pairwise?

#

I would have thought it's less than 20% slowdown

rocky vigil
#

ahhh

#

mm hmm

twilit oriole
#

I think anything above +30 Elo at fixed nodes should pass easily at SF VVLTC for around a 15% slowdown

rocky vigil
#

so we still need ~60 more ish

violet badger
#

should be quite straightforward to measure nps?

#

no need to speculate what it is right now?

twilit oriole
#

It isn't. Very position dependent

rocky vigil
#

right now the inference is not intended to optimize nps

violet badger
#

speedtest works?

#

you just get a number

rocky vigil
#

it is intended to optimize for correctness

violet badger
#

sure

twilit oriole
#

Not really. The elo dependence of speed is dependent on position

rocky vigil
#

non-ue on my laptop is like 1/3 - 1/2 the speed of master

violet badger
#

right so that's a number

rocky vigil
#

but i think my laptop is not representative

#

do stuff wrong and it'll send processes between the P / E cores etc.

#

average intel laptop experience

violet badger
#

let me measure

twilit oriole
rocky vigil
#

the target is +30 (fixed nodes), and the 100 SB one was -30 +- 30

violet badger
#

at STC the difference is 150 Elo

twilit oriole
#

Lol

rocky vigil
#

and yeah -150 elo or so seems reasonable

#

for being 2x slower rn

twilit oriole
#

Well I don't think we "need" 60 Elo kek. We need a measurement with lower error bars lol

rocky vigil
#

this is true

rocky vigil
#

threat-inputs-rebase

twilit oriole
#

Where's the net

twilit oriole
#

And I just set evalfile?

rocky vigil
#

my branch should name this net as default

#

so it should compile with it yea

violet badger
#

592817 nps vs 1084640 nps, so ~55%

#

(quick test via bench)

twilit oriole
#

How many SBs are there in total

rocky vigil
#

the full run should have 800 * num stages

twilit oriole
#

Oh early days then. Might as well wait till at least first stage concludes

rocky vigil
#

i think at some point we should have the first stage checkpoint

#

yeah

#

800 SB

twilit oriole
#

I assume there are some numbers on how close it is after first stage?

#

In a regular master run

violet badger
#

normal net would be -50Elo

#

First stage should be ready in a couple of hours.

rocky vigil
#

theoretically we surpassed -50 elo with 1/8 of the first stage so hopefully the good stuff continues 🙏

violet badger
#

I think it looks promising indeed.

#

56% of speed is a lot of Elo STC.

#

(consistent with your fixed nodes number and my STC number)

#

I think people should start looking at a faster inference now, full trained net will be there before the end of the week.

rocky vigil
#

yeah I'll start working with yoshie and let's see how we should approach the incremental threat tracking

violet badger
#

Pretty sure we'll get some more people to look at this as it makes progress.

prime mica
#

exciting

#

once there's something testable I'll take a look into improving NPS

violet badger
#

there is

rocky vigil
#

ah yeah your upstream optimizations have also made it here (:

prime mica
#

lol

rocky vigil
#

what we need to do next for improving NPS is like set up the foundation basically

#

our UE framework, etc.

rocky vigil
#

and after we do that it's minor optimizations go go

violet badger
#

I agree..

prime mica
#

for sure

violet badger
#

though some pondering can go in parallel 😉

stray reef
#

bestmove sleep ponder speedup_ideas

prime mica
#

I have a strange plot atm to fuse FC0 with add/sub

violet badger
#

bestmove do dishes

prime mica
#

also to fuse consecutive add/subs together

#

will try to keep it well-abstracted tho

rocky vigil
#

ooh interesting

#

working fusing would be quite good

#

since average threat update has multiple add/sub
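
A scalar sketch of what fusing consecutive add/subs means: instead of one full pass over the accumulator per changed feature, each lane is loaded once, all deltas are applied, and it is stored once. Names are hypothetical, and a real version would work on SIMD register tiles rather than single lanes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// weights is row-major: weights[feature * l1 + lane].
inline void fused_update(std::vector<std::int16_t>& acc,
                         const std::vector<std::int16_t>& weights, std::size_t l1,
                         const std::vector<std::size_t>& added,
                         const std::vector<std::size_t>& removed) {
    for (std::size_t lane = 0; lane < l1; ++lane) {
        int sum = acc[lane];  // one load per lane
        for (std::size_t f : added)   sum += weights[f * l1 + lane];
        for (std::size_t f : removed) sum -= weights[f * l1 + lane];
        acc[lane] = static_cast<std::int16_t>(sum);  // one store per lane
    }
}
```

The result is identical to applying the adds and subs one after another; only the number of accumulator loads and stores changes.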

prime mica
#

ye

#

my hope™ is that if add/sub is really memory bandwidth limited, then we should be able to do useful work (like the dot products) at the same time

#

but there are complications ofc

rocky vigil
stray reef
#

can probably do a lot by fusing threat updates if done right

rocky vigil
#

can't have good ue without dual accumulator, as otherwise every king move is suddenly gonna be 4x as expensive

#

so i guess that might be the priority

stray reef
#

i've been trying to come up with something similar to finny tables, that fuses threat updates on a per move basis (for frequent moves), but no good idea yet

stray reef
rocky vigil
#

yeah i expected as much

prime mica
#

what is a "dual accumulator"

rocky vigil
#

should track the contribution from threat features and psq features separately

prime mica
#

ohh

#

smort

rocky vigil
#

bc like, the refresh patterns are different

#

psq needs a full refresh every king move

#

but threats only need a full refresh when the king crosses d/e (due to horizontal mirroring)
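
The two refresh rules can be written down as predicates on a king move. A minimal sketch, assuming squares are numbered a1 = 0 .. h8 = 63 and that the mirrored half is files e-h (which half is mirrored is an assumption here); function names are made up.

```cpp
// File of a square: a = 0 .. h = 7.
constexpr int file_of(int sq) { return sq & 7; }

// True when the king stands on the half of the board that gets mirrored.
constexpr bool on_mirrored_half(int kingSq) { return file_of(kingSq) >= 4; }

// Piece-square part: every king square is its own bucket, so any king move refreshes.
constexpr bool psq_needs_refresh(int fromKing, int toKing) { return fromKing != toKing; }

// Threat part: only a change of mirroring (king crossing the d/e boundary) refreshes.
constexpr bool threats_need_refresh(int fromKing, int toKing) {
    return on_mirrored_half(fromKing) != on_mirrored_half(toKing);
}
```

For example, Ke1-f1 refreshes only the piece-square half, while Ke1-d1 refreshes both, which is why keeping the two contributions in separate accumulators pays off.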

prime mica
#

interesting

frosty imp
#

what's an up-to-date threat inputs branch/net

frosty imp
#

hmm the net is not on fishtest?

rocky vigil
#

stage 1

#

in ~ a few hours

#

hopefully that'll be equal fixed nodes to master

#

at least

frosty imp
#

is there a net I can use to just get it running

rocky vigil
#

this is also the one named in my branch

frosty imp
#

ah cool

#

uploaded it to fishtest btw

rocky vigil
#

oh lol

frosty imp
#

do threat inputs apply to psqt?

rocky vigil
#

yes

#

I think they do

#

why not

prime mica
#

noob question, why are there both piece square table and positional factors

frosty imp
#

I see

prime mica
#

like why not just the latter

frosty imp
#

(theoretically)

#

but practically it gains to use the difference between psqt and positional as information

lofty cedar
#

It seems that capturing simpler features first makes it much easier for the rest of the net to focus on the nonlinear ones.

prime mica
#

what is this difference intuitively

#

how sharp the position is?

frosty imp
#

some kind of complexity measurement

#

yeah

prime mica
#

interesting

frosty imp
lofty cedar
#

Well, for some reason... the psqt and the positional factors are 125/128 and 131/128... but actually, they were trained on both being 1.

#

And somehow it gained.

frosty imp
#

also are we planning to use psqt biases?

lofty cedar
#

I think the 125, 131 are tuned...

rocky vigil
rocky vigil
#
```
...      Stockfish TI-experimental playing White: 33 - 12 - 55  [0.605] 100
...      Stockfish TI-experimental playing Black: 19 - 30 - 51  [0.445] 100
...      White vs Black: 63 - 31 - 106  [0.580] 200
Elo difference: 17.4 +/- 33.1, LOS: 84.9 %, DrawRatio: 53.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
200 of 200 games finished.
```
#

now pushed a bench to my branch

#

corresponding stc:

```
Elo: -116.83 +/- 6.88, nElo: -237.30 +/- 12.98
LOS: 0.00 %, DrawRatio: 30.60 %, PairsRatio: 0.08
Games: 2752, Wins: 300, Losses: 1192, Draws: 1260, Points: 930.0 (33.79 %)
Ptnml(0-2): [81, 802, 421, 72, 0], WL/DD Ratio: 1.18
LLR: -2.95 (-100.0%) (-2.94, 2.94) [-101.00, -99.00]
```
rocky vigil
# rocky vigil ```Score of Stockfish TI-experimental vs Stockfish 10/07/25: 52 - 42 - 106 [0.52...

update: Score of Stockfish TI-experimental vs Stockfish 10/07/25: 255 - 251 - 494 [0.502] ... Stockfish TI-experimental playing White: 148 - 99 - 253 [0.549] 500 ... Stockfish TI-experimental playing Black: 107 - 152 - 241 [0.455] 500 ... White vs Black: 300 - 206 - 494 [0.547] 1000 Elo difference: 1.4 +/- 15.3, LOS: 57.1 %, DrawRatio: 49.4 % SPRT: llr 0 (0.0%), lbound -inf, ubound inf 1000 of 1000 games finished.

stray reef
#

(you may be talking about the integrated psq of the sf arch, not threats, in that case nvm)

rocky vigil
#

oh btw yoshie

#

we should probably also concurrently start working on setting up the ue

#

actually lemme start by figuring out how to do dual accumulator

stray reef
#

alright, i can start with incremental threat tracking today

rocky vigil
#

like add on to my branch?

#

that would be welcome yeah

#

how much do you estimate you'd have to overhaul sf stuff

rocky vigil
#

ostensibly you need to add stuff to the position structure etc.

stray reef
rocky vigil
#

well good luck with it

#

i sleep soon

#

i don't know if the rest of the training stages are happening but that would also be interesting to see

rocky vigil
#

But this is a lot of code duplication

frosty imp
#

i'm doing dual acc right now

#

it's done except for whatever bug in psqt

rocky vigil
#

Ohhhh

#

Very cool

frosty imp
#

cc @rocky vigil

#

updated branch with dual acc. Just FYI I refactored some stuff with the input features so it's probably best to write incremental threats on top of this

rocky vigil
#

Ok cool

#

There is also a new net (see above) just to note

#

Bench looks right

#

For the older net

violet badger
#

nice, that worked well, so roughly 30 Elo progress and parity at fixed nodes.

#

With some luck adding the other training stages adds another 30+ Elo. So I'll start those soon

frosty imp
#

I’m wondering if some pairwise multiplication-ish architecture is possible with threat inputs

#

Since dual accumulators is already a thing

rocky vigil
#

Lemme look more carefully

rocky vigil
#

Ah you updated it to exclude the psq parts

#

Fair enough

#

Does it achieve any speedup

#

Now that the halfkav2hm part is being ue’d normally

rocky vigil
#

Very strange

rocky vigil
# frosty imp updated branch with dual acc. Just FYI I refactored some stuff with the input fe...
```
Stockfish dev-20251012-536051bf by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251012-536051bf
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-bmi2
Compilation settings       : 64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 43, 25
    single game            : 732, 453
Total nodes searched       : 122156946
Total search time [s]      : 153.585
Nodes/second               : 795370
```
```
./stockfish speedtest 1
Stockfish dev-20251012-3a5c355e by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251012-3a5c355e
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-bmi2
Compilation settings       : 64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 47, 29
    single game            : 798, 525
Total nodes searched       : 137345907
Total search time [s]      : 153.564
Nodes/second               : 894388
```
rocky vigil
#

nice initial speedup

#

so now that takes care of the psq part

#

so we can focus on incremental threats

violet badger
# rocky vigil Unexpected EOF

restarted, some network issue can cause that (somewhere between the gitlab runner reading the output and the actual calculation).

rocky vigil
#

huh

#

is the progress lost or no

violet badger
#

no worries.

#

relatively transparent.

#

so, looks like we already made progress with the inference code... nice!

rocky vigil
stray reef
#

Some other stuff got in the way, should get somewhere tomorrow

rocky vigil
#

ah fair

rocky vigil
#

stage 2, (nn-a878500a97a8.nnue), 8moves_v3.epd, 20k nodes

```
...      Stockfish TI-experimental playing White: 151 - 72 - 277  [0.579] 500
...      Stockfish TI-experimental playing Black: 104 - 116 - 280  [0.488] 500
...      White vs Black: 267 - 176 - 557  [0.545] 1000
Elo difference: 23.3 +/- 14.3, LOS: 99.9 %, DrawRatio: 55.7 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
1000 of 1000 games finished.
```
#

🙏

prisma hatchBOT
frosty imp
#

threat net solves this while master can't 👀

#

significant static eval diff for 8/p6b/r1p1p3/P1p1P3/2P2P2/1P6/3Bk3/2K5 w - - 15 10

stray reef
#

alright i got something written up, getting to the debugging part now

#

(just incremental threat tracking, no UE yet)

stray reef
#

gonna start working on UE now (though i might get stuck in SF inference hell there, we'll see)

rocky vigil
#

ah

#

wait i think the net changed in the meanwhile

#

to stage 2 net

stray reef
#

yeah i rebased

rocky vigil
#

ah yeah i guess the bench

#

but whatever

stray reef
#

ah forgot to update the bench in the PR, but the commit has the right bench

rocky vigil
#

oh lol

#

very cool!

#

getting much farther than the previous attempt half a year ago

stray reef
#

it's definitely not a good way to just do what the branch currently has:

std::vector<AccumulatorState> accumulators;
std::vector<AccumulatorState> threat_accumulators;

since AccumulatorState has a dirty piece, but now also needs a dirty threats list, which we don't want to duplicate

#

nnue_accumulator.h/.cpp looks awful to work with lmao

rocky vigil
#

yeah we probably want to distinguish the two

#

but it's extra effort

desert tree
#

just not quickly

rocky vigil
rocky vigil
#

so maybe if that is done everything will still work nicely

stray reef
#

I think I won't produce anything reasonable here. Adding more abstraction is going to make this code even worse, making what's there fit is ugly, I'd want to simplify it if anything

rocky vigil
#

that would also be nice if it can be done in a good way

#

how would you want to simplify?

#

we should probably also get @frosty imp's opinion since he probably understands this code the best

stray reef
candid ivy
#

If i understand correctly the "problem" is that the AccumulatorState

    Accumulator<TransformedFeatureDimensionsBig>   accumulatorBig;
    Accumulator<TransformedFeatureDimensionsSmall> accumulatorSmall;
    DirtyPiece     

has this but it actually only needs one accumulator? and no dirty pieces?

stray reef
#

are we removing smallnet support?

rocky vigil
#

we could

#

i think shawn wanted to keep it in case it was still useful

#

but right now bool use_smallnet is just false

stray reef
#

imo this is a maintainer decision

#

it makes no sense to remove it now if we need to re-implement it in 2 weeks

rocky vigil
candid ivy
#

i'd be fine with removing it if the threat inputs itself is strong enough to compensate the loss obviously

stray reef
#

We should be able to remove it then

violet badger
#

I think if we remove it it will reappear... threat net doesn't solve what smallnet provides (i.e. speed at decided positions)

rocky vigil
#

it's probably worth testing later

#

if it turns out to be a big gain many small things can be masked underneath it

rocky vigil
#

@frosty imp would you mind setting a low throughput stc vs master as well

#

my guess is around the range of -50 to -40 elo

frosty imp
rocky vigil
#

would it really be better to template it

#

instead of just having two separate ones

rocky vigil
frosty imp
#

eh sounds like a lot of code duplication

#

that would add templates with accumulatorStack operations anyway

frosty imp
prime mica
#

honestly what would be ideal to me is some sort of simple DSL to describe the network layout

#

and a Python (or whatever) script to generate nice C++ code

#

that way you don't have to futz around with template metaprogramming

#

it'd also make performance improvements easier by allowing layers to be fused together

rocky vigil
#

nice

#

i think stage 4 should be the big gain (according to master results) but we'll see

violet badger
#

I think that's a bit unpredictable, I've seen it jump or not at that point or earlier.

rocky vigil
#

anyways this is probably beyond what I can run reasonably fast locally fixed nodes so I'll just put a reduced throughput stc up on fishtest vs stage 2

violet badger
#

at the end of the full training run there will be accurate results on the testing, so patience will also get us there.

#

i.e. will be clear which net to pick

rocky vigil
#

fair

rocky vigil
#

oh nice

violet badger
#

shawn impatient 😉

#

now, I'm much more curious to see the inference speedup patch being tested like that... seems like this was another good improvement though.

green moat
#

@violet badger
Did you check if removing those duplicated lines actually improves "master" nets?
#nnue-dev message

violet badger
frosty imp
violet badger
#

all steps needed to get to full speed inference 😉

#

but I meant the net test you did (with good improvement)

#

if I'm not mistaken that suggests another 10+ Elo from stage 2 to stage 3?

twilit oriole
#

You will want to try doubling length of each stage after this run completes. Convergence time goes up a lot because some threats are very rare

#

Also u can ditch small net, try later disabling threats for decided positions (will need a new training run as well obviously)

#

It should do a similar thing with benefit of that regular part of the accumulator always being up to date if it switches back to regular eval

frosty imp
#

incremental threats done

#

debug time nohope

frosty imp
#

@stray reef have you debugged the incremental threats calculation? my bench isn't matching and I'm not sure where the problem is

rocky vigil
#

Do you have a branch

#

I’ll also try looking through it

rocky vigil
frosty imp
#

suspecting something is wrong when capturing a piece

rocky vigil
#

this is not true actually

#

needs to be refreshed when mirroring changes

frosty imp
#

oh crap yeah

rocky vigil
#

i.e. smth like

let index = make_index(...)
if (index < Dimensions) { append(index) }
#

never really figured out a better way to handle deduplication

#

afaik plentychess does same thing

rocky vigil
rocky vigil
#

in short some threats imply the existence of the corresponding ones in the opposite direction

#

i.e. rook attacking queen implies queen attacking rook

#

so in that case we filter so that only one of the two is active
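A toy sketch of that filter, in the spirit of the `make_index` / `append` snippet above: the duplicate half of a mutual threat gets an out-of-range index and is skipped. The piece encoding, the index formula, the `mutual` flag and the "keep the lower attacker square" rule are all hypothetical; the real feature set decides these differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy piece types (hypothetical encoding):
// 0 pawn, 1 knight, 2 bishop, 3 rook, 4 queen, 5 king.
constexpr std::uint32_t Dimensions = 6 * 6 * 64;  // toy feature count

struct Threat {
    int  attackerType;
    int  attackerSq;
    int  victimType;
    int  victimSq;
    bool mutual;  // true when the victim attacks the attacker back
};

// Returns a feature index, or Dimensions when the threat is filtered out.
inline std::uint32_t make_index(const Threat& t) {
    assert(t.attackerType >= 0 && t.attackerType < 6);
    assert(t.victimType >= 0 && t.victimType < 6);
    assert(t.attackerSq >= 0 && t.attackerSq < 64);
    // Hypothetical tie-break: of two mutually implied threats,
    // keep only the one whose attacker stands on the lower square.
    if (t.mutual && t.attackerSq > t.victimSq)
        return Dimensions;
    return std::uint32_t((t.attackerType * 6 + t.victimType) * 64 + t.attackerSq);
}

inline std::vector<std::uint32_t> active_indices(const std::vector<Threat>& threats) {
    std::vector<std::uint32_t> out;
    for (const Threat& t : threats) {
        const std::uint32_t index = make_index(t);
        if (index < Dimensions)  // out-of-range index means "deduplicated away"
            out.push_back(index);
    }
    return out;
}
```

Feeding in rook-attacks-queen and queen-attacks-rook along the same line yields a single active feature here, which is the behaviour the filter is meant to give.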

frosty imp
#

ah I see

rocky vigil
#

yeah besides this most of the failure points would come from the incremental threat calculation

#

but yoshie claims he tested this thoroughly against a from-scratch computation

#

so I'm hoping it just works after these fixes

frosty imp
#

let's go

#

bench matches

rocky vigil
#

wait !!!!

#

let

#

's go?

#

how fast lol

frosty imp
#

around 20%

rocky vigil
#

ah

#

wait vs previous?

frosty imp
#

yeah

rocky vigil
#

hmm

frosty imp
#

vs threat tracking but no incr update

rocky vigil
#

ok ok i see

#

so that moves -52 to -15 or so?

#

this gonna be a close one at stc

#

but should scale

frosty imp
#

let's see

#

merged

rocky vigil
#

nice

rocky vigil
#

oh nice most of the compilation warnings also died

#

the small things xd

frosty imp
#

oh huh no difference on speedtest

rocky vigil
#
```
Stockfish dev-20251014-895f63de by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251014-895f63de
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 56, 31
    single game            : 821, 583
Total nodes searched       : 141977379
Total search time [s]      : 153.54
Nodes/second               : 924693
```
```
./stockfish speedtest 1
Stockfish dev-20251014-75edbee0 by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251014-75edbee0
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 78, 40
    single game            : 914, 712
Total nodes searched       : 190559772
Total search time [s]      : 153.52
Nodes/second               : 1241270
```
#

-25% maybe?
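For reference, the arithmetic behind that estimate, as a small sketch. Treating the 1241270 nps run as the reference and the 924693 nps run as the measured one is an assumption about which binary is which; `relative_change` is a hypothetical helper name.

```cpp
#include <cassert>

// Relative change of `measured` with respect to `reference`,
// e.g. -0.25 means 25% slower than the reference.
inline double relative_change(double reference, double measured) {
    assert(reference > 0.0);
    return measured / reference - 1.0;
}
```

Under that assumption, `relative_change(1241270.0, 924693.0)` comes out at roughly -0.255, so about 25% fewer nodes per second.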

frosty imp
#

rip

rocky vigil
#

@stray reef can you get similar numbers for plentychess L1=1024?

frosty imp
rocky vigil
#

this isn't too bad

#

15% overhead

twilit oriole
#

Well. It's -10 at STC ofc it's not too bad kek

#

Finish training + SPSA is already enough to just pass at higher TCs ig

frosty imp
#

really hoping we won't need net SPSA

#

search SPSA maybe though

twilit oriole
#

Well even if you don't "need" it is just a large gain eventually

#

8 Elo is a lot

rocky vigil
#

finish training already might be enough

#

assuming LTC scales by +5 or so

frosty imp
#

still SSS Kappa

rocky vigil
#

actually that one probably benefits us as well

#

since it's related to overhead of finny tables

rocky vigil
frosty imp
#

oof

rocky vigil
#

also this is so confusing

#

-52 elo

#

speedup 10 elo

#

net 6 elo

#

= -14 elo?

#

it does not add

#

btw does not fusing make it faster

#

yoshie also said that fusing the threat updates never worked well

#

oh fishtest

#

ig we'll see

frosty imp
#

speedtest looks the same, so I put it on fishtest

rocky vigil
frosty imp
#

ig just error bars

rocky vigil
#

hmm

#

really thought the speedup would be more though

#

like 20 elo

#

or even 30

rocky vigil
#

the next run should also have this once I figure out how to do it

#

@stray reef seems plentychess with 640 is typically 30-40% faster than current branch (based on manual inspection of nps in two LTC games), are these reasonable numbers?

violet badger
#

also removal of smallnet, let's say 10 Elo .. or even more in this case.

stray reef
violet badger
rocky vigil
#

everything was indeed good

#

it was not in the threat calculation

#

there are some tests up on fishtest rn

#

around -16 stc

stray reef
rocky vigil
#

is where we are at

stray reef
rocky vigil
#

prayer for scaling

#

i think we wait until stage 5 for that

#

since i strongly suspect lack of factorizer + threat inputs itself means it benefits more from more stages

violet badger
#

b7f553ee8b28a4abace6c1056dceb1d69169873a

frosty imp
#

yeah

rocky vigil
lofty cedar
#

To think that some obscure monty led to more than a thousand nontrivial LOCs, the rewrite of Stockfish training infrastructure, etc... that could finally be gaining.

naive comet
#

"some obscure monty"?

lofty cedar
#

~50 stars on github vs 14000.

naive comet
#

it's not obscure in the chess engine sphere

#

idk otherwise you could call practically any other engine obscure

lofty cedar
#

Ethereal is about 400. Even stormphrax is like 100.
Koivisto is 150.

And given that github stars are already skewed toward programmers who are familiar with chess engines, the popularity of monty compared to Stockfish in the wider chess world is probably even lower.

#

But let's look at a more objective metric: TCEC. Monty isn't even in TCEC.

naive comet
#

Monty should've been in tcec if not for some small issues

#

anyways that's beside the point

#

@frosty imp I might try some speed stuff later

#

do I speedtest or start a test on fishtest?

rocky vigil
#

speedtest probably fine

#

am very curious how uh

#

ue only managed to gain like 5% speed

#

or smth

frosty imp
naive comet
lofty cedar
#

Which is wild.

stray reef
#

i think you underestimate monty in many ways, including strength

lofty cedar
#

Isn't Monty like 700 elo behind?

rocky vigil
#

i guess lazy eval probably screws around with the threats

violet badger
rocky vigil
#

sad

violet badger
#

This is ongoing work.... let's not forget that something similar was tried years ago by sopel, and at that time it didn't gain either.

#

things have changed, not the least the amount of data available, improved trainer, etc etc... so worthwhile trying again.

#

as usual a lot of work has to come together to replace sota stuff..

rocky vigil
#

2 more hours until stage 4 or so i presume?

violet badger
#

something like that.

stray reef
#

and under tcec conditions a lot better than whatever ccrl or so would show

lofty cedar
rocky vigil
#

yeah tcec conditions a lot better

#

since gigantic net reduces contention

#

let's also not forget that PlentyChess is also #1 at ccrl 40/15 rn

violet badger
rocky vigil
#

i suppose like

#

200+2 5thread

#

is similar to CCRL Blitz 8CPU

lofty cedar
#

Though I often use the back-of-the-envelope calculation that if the fixed-node elo gain is more than two thirds of the elo loss from slowdown, it should gain at LTC.

naive comet
frosty imp
#

guys stop distracting cj from coming up with bangers Kappa

lofty cedar
#

Oh... welp... I guess yeah...

Though Stockfish is often a bit more conservative in adopting ideas than other engines because it often has to be done well to gain.

stray reef
#

I understand, I did not do it well in plenty Kappa

lofty cedar
#

Oh, not that... I meant that in Stockfish, since the baseline is higher, it's much harder to gain with new ideas.

stray reef
#

just joking ofc

frosty imp
#

@naive comet here's the profile if you haven't seen #1336647760388034610 message

stray reef
#

sf is a much bigger entity

frosty imp
#

maybe there's opportunities in incremental threat tracking? idk

#

the refresh scheme might also be improvable

rocky vigil
#

well threat specific stuff

#

would be in tracking i think

#

or like the actual accumulator updates

#

like we should see

#

if backwards updates are still worth it for threats

#

considering that refreshing from scratch is not as heavy as expected

frosty imp
rocky vigil
#

huh

#

strange

#

why are full refreshes so op

#

then

#

or like

#

how is it possible to come so close

#

with literally most basic strat

frosty imp
#

how huge is the diff usually

#

compared to a full refresh

rocky vigil
#

on average is what, 8 or so?

#

compared to a full refresh probably being like at least 20

frosty imp
#

well the percentage reduction from full refresh to incremental isn't as good as halfkav2

#

ig maybe that's where the problem is

#

could be interesting to try alternative update schemes based on that

naive comet
rocky vigil
lofty cedar
#

Maybe it takes a lot of work to compute what needs to get updated?

#

You can try byteboard technology if it helps.

stray reef
#

Yeah that's something worth investigating... lots of simd stuff possible

naive comet
#

how do I clone Shawn's branch and only that branch?

#

nvm I got it thanks to my friend chatgpt

rocky vigil
#

yeah smth like

#

set shawnxu as a remote

#

and then pull from it

desert tree
rocky vigil
# violet badger "fixed"

is there a potential reason why this might trigger at a much higher rate with the threat inputs?

#

since it has happened again

violet badger
#

no independent of what runs in CI, really just somehow timeout or dropped connection somewhere, needs some more robust polling mechanism in the CI infrastructure. Not our concern right now, just restart and wait a bit.

rocky vigil
#

ah i see

violet badger
#

for restart, I also updated the SF used in the final testing, so we'll get info on all steps with the current best inference in 24h or so.

#

step 5 should be running now.

#

I'm not expecting these training steps to gain miraculously, but we'll see.

rocky vigil
#

hmm

#

should still hopefully be decent gains at least

violet badger
#

fingers crossed, we'll see.

stray reef
#

i think it is possible that a fully trained net is a bit sparser, so maybe the "real" number for plenty would be a bit higher. but SF speeds are looking nice

naive comet
#

1014074 vs 907784 but idk my hardware is noisy

#

I used speedtest btw

#

what should I do? pr to your branch?

#

also ideally I'd need someone with stable hardware to test this maybe

violet badger
#

since when would 10% be small 😉

naive comet
#

I can rerun without and see I guess

violet badger
#

like at 1200 words per minute 😉

#

Anyway, I think PR to the branch of shawn, and he can integrate.. ?

#

Can be tested on fishtest, but I think this is not essential for speedups right now.

naive comet
naive comet
#

I bring good news: I might have found a further speedup

#

will wait and see

naive comet
#

ehh its within a %ish

#

but from an empirical pov it saves 1 instr

frosty imp
#

I see the new pipeline now

#
Version                    : Stockfish dev-20251014-b7f553ee
Compiled by                : g++ (GNUC) 14.2.0 on Linux
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 14.2.0
Large pages                : yes
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-3
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  : 
    single search          : 41, 22
    single game            : 668, 439
Total nodes searched       : 97642350
Total search time [s]      : 153.564
Nodes/second               : 635841
Version                    : Stockfish dev-20251015-40e85beb
Compiled by                : g++ (GNUC) 14.2.0 on Linux
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 14.2.0
Large pages                : yes
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-3
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  : 
    single search          : 42, 23
    single game            : 685, 459
Total nodes searched       : 100965370
Total search time [s]      : 153.551
Nodes/second               : 657536

local speedtest on cj speedup

naive comet
#

OK, so closer to 3% than 10%

#

oh I understand all the noise now

#

it was using all the threads for speedtest

#

I dropped speedtest down to single thread

#

should be more accurate now

rocky vigil
#
```
Stockfish dev-20251014-895f63de by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251014-895f63de
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 58, 31
    single game            : 832, 590
Total nodes searched       : 143169711
Total search time [s]      : 153.545
Nodes/second               : 932428
```
```
./stockfish speedtest 1
Stockfish dev-20251015-40e85beb by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251015-40e85beb
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 52, 31
    single game            : 820, 582
Total nodes searched       : 142798016
Total search time [s]      : 153.549
Nodes/second               : 929983
```
yeah neutral on my laptop
#

but my laptop might be noisy

#

I hope this ran on the P cores the entire time

violet badger
rocky vigil
#

yeah 900k+ suggests it used P core at least a majority of the time

twilit oriole
#

just do fishtest test?

violet badger
#

so, what are the speed-ups we could reasonably still expect?

#

(relative to shawn's nn-598188c9a702.nnue branch, which has most of it already).

frosty imp
#

maybe we need @prime mica's expertise 😛

violet badger
#

he'll make master faster faster than branch 😉

#

anyway, doing a quick test of your 598188c9a702 branch against master..

#

seems like we need another 30Elo or so..

rocky vigil
candid ivy
#

Maybe we can do a ralph wiggum approach with this

rocky vigil
#

How much does smallnet speed up master?

violet badger
#

I think that could be 5-10 Elo, but I don't know the exact number.

frosty imp
#

is it possible to start a smallnet run now?

violet badger
#

you mean a threat smallnet?

frosty imp
#

yeah

violet badger
#

sure.

#

like 128 ?

frosty imp
#

lgtm

#

threat vs master 10k games

violet badger
#

have it

#
--------------------------------------------------
Results of master vs patch (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 28.66 +/- 3.69, nElo: 53.68 +/- 6.88
LOS: 100.00 %, DrawRatio: 47.71 %, PairsRatio: 1.86
Games: 9806, Wins: 3046, Losses: 2239, Draws: 4521, Points: 5306.5 (54.11 %)
Ptnml(0-2): [39, 859, 2339, 1588, 78], WL/DD Ratio: 1.26
--------------------------------------------------
rocky vigil
#

Ouch

#

Worse than fishtest

violet badger
#

we don't have it on fishtest right?

rocky vigil
frosty imp
#

dunno

rocky vigil
#

From the sprt

frosty imp
#

nn-bf4519f857f4.nnue net

violet badger
#

I see.

#

might depend quite a bit on HW.

#

(i.e. different memory architecture and so on).

#

Though arguably, ranges still overlap more or less.

rocky vigil
#

Fair

rocky vigil
#

So it would be better to have it not be threats

violet badger
#

we might be mixing conversations, but yeah, if possible regular small net would be faster, unless there is something sharable between the two.

rocky vigil
#

I think we could get the existing small net to work first and give it a try

violet badger
#

I think that's probably better

rocky vigil
#

Maybe some template bool use_threats or whatever

violet badger
#

I will do a check at larger TC and more threads, just to have a reference.

rocky vigil
#

That would be good

#

Yeah

rocky vigil
#

Will start working on smallnet in ~2 hours

violet badger
#

The standard or the threats one? I did start a threats net optimization at 128 as well, just one stage, so probably ready in like 8h or so.

rocky vigil
#

Standard

twilit oriole
#

yeah dont use threats for small net

rocky vigil
#

It shouldn’t be too hard

#

At most more template bool use_threats nonsense

frosty imp
#

eh you need more than that

rocky vigil
#

Huh

frosty imp
#

because threat accumulators are a class field

rocky vigil
#

How I was gonna hack it in was just keep threat accumulators for smallnet but never touch them

frosty imp
#

I guess maybe for a temporary hack

rocky vigil
#

Yeah

frosty imp
#

then you can check with constexpr bool UseThreats = Dimensions == TransformedFeatureDimensionsBig

rocky vigil
#

Ohhh indeed

frosty imp
#

don't even need templates
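A minimal sketch of that idea, assuming threats are meant for the big net only and the smallnet stays a standard net (which is how the discussion above reads). The dimension values are placeholders, and `FeatureTransformerSketch` and `accumulators_touched` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Placeholder sizes; the real values live in the NNUE architecture headers.
constexpr std::size_t TransformedFeatureDimensionsBig   = 1024;
constexpr std::size_t TransformedFeatureDimensionsSmall = 128;

// The transformer is already parameterised on its width, so the flag can be
// derived from that width instead of adding another template parameter.
template<std::size_t Dimensions>
struct FeatureTransformerSketch {
    // Assumption: only the big net carries threat features.
    static constexpr bool UseThreats = (Dimensions == TransformedFeatureDimensionsBig);

    // Number of accumulators an update has to touch: psq only,
    // or psq plus threats.
    static int accumulators_touched() { return UseThreats ? 2 : 1; }
};
```

The threat accumulator can then stay a class field for both nets as a temporary hack; the small instantiation simply never touches it.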

rocky vigil
#

This works

#

Btw if I call eval

#

Will it use smallnet when applicable

#

Or always big net?

frosty imp
#

bignet always it seems

rocky vigil
#

bruh

#

If I give it like KQQk

#

Will that default to smallnet then

#

In a real search

#

Like 8/8/8/3k4/8/8/6K1/6QQ b - - 0 1 for instance

frosty imp
#

well just disable bignet in evaluate.cpp

rocky vigil
#

Oh ok

#

Replace use smallnet with true

#

Sure

frosty imp
#

also remove the re-eval

rocky vigil
#

mm

#

Well won’t be back for another hour and a bit

#

But yeah

rocky vigil
violet badger
#

60+0.6, 288t, 16000MB, UHO_Lichess_4852_v1.epd

rocky vigil
#

Crazy

violet badger
#

funny, 11 drawn game pairs in a row for now.

rocky vigil
#

Can that even get a meaningful sample size

#

I was expecting LTC smp at most

violet badger
#

you were the one talking about TCEC style dev 😉

rocky vigil
#

hehe

violet badger
#

but I must say that if it doesn't gain at LTC I would have quite strong reservations...

twilit oriole
#

Can you do an updated fixed nodes with smaller error bars

rocky vigil
twilit oriole
#

Yes but vondele already has it set up lol

rocky vigil
#

Oh

rocky vigil
#

LTC if possible

violet badger
#

when you set back the concurrency but forget to set back the hash ...

rocky vigil
#

@frosty imp which branch is preferable for me to test smallnet against

frosty imp
#

just the threat_inputs branch

rocky vigil
#

ok

#

so including the cj commit?

frosty imp
#

yep

rocky vigil
#

and stage 4 net or still stage 3

frosty imp
#

uh let's keep stage 3 net

rocky vigil
#

ok

#

prayer for stage 5 😔

#

stage 4 disappointing

violet badger
#
--------------------------------------------------
Results of master vs patch (20000 nodes, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: -29.60 +/- 2.28, nElo: -44.34 +/- 3.40
LOS: 0.00 %, DrawRatio: 41.82 %, PairsRatio: 0.64
Games: 40000, Wins: 10775, Losses: 14175, Draws: 15050, Points: 18300.0 (45.75 %)
Ptnml(0-2): [1551, 5531, 8363, 3877, 678], WL/DD Ratio: 1.96
--------------------------------------------------
#

so net is not bad per se, but speed matters
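For reference, the Elo figure in that result can be reproduced from the win/loss/draw counts with the usual logistic formula. This is a small sketch with hypothetical function names; it ignores the error bars, which need the pentanomial distribution.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from game counts: a win counts 1, a draw 0.5.
double scoreFromWLD(long wins, long losses, long draws) {
    const long games = wins + losses + draws;
    assert(games > 0);
    return (wins + 0.5 * draws) / static_cast<double>(games);
}

// Logistic Elo difference implied by a score strictly between 0 and 1.
double eloFromScore(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

Plugging in W=10775, L=14175, D=15050 gives a score of 45.75 % and roughly -29.6 Elo, matching the reported line.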

rocky vigil
twilit oriole
#

Cool

rocky vigil
#

actually @frosty imp how is the threat input branch able to read the smallnet without dying

violet badger
#

doesn't read it?

#

at least there are warnings related to that..

#

(compile time warnings that is)

#

meanwhile, master and patch battling it out at scale, and deciding to break the UHO_Lichess_4852_v1.epd while doing so.

rocky vigil
#

i kind of want a pgn of the games

#

if this is the 288 thread ltc

violet badger
#

not saved I'm afraid

rocky vigil
#

aww

#

shame

violet badger
#

nah.

violet badger
#

Ptnml(0-2): [0, 1, 18, 1, 0]

#

on the UHO book.

rocky vigil
#

ok

violet badger
#

that's pretty insane IMO.

rocky vigil
#

wait one each

#

insane

violet badger
#

but 90% drawn game pairs

#

like by construction the book is aiming for 50% of those.

rocky vigil
violet badger
#

I'd assume the format of the data structures in memory is changed?

frosty imp
#

I commented out the smallnet read + verification

rocky vigil
#

so.

#

ok

#

checks out

#

shouldn't be too hard

#

bignet works fine

#

it isn't too bad

#

just frankenstein master and threat input code together

violet badger
#

git checkout -b frankenstein ?

rocky vigil
#
```
  * frame #0: 0x00007fff7fd5b212 msvcrt.dll`memcpy + 146
    frame #1: 0x00007ff74b8d5216 stockfish.exe`Stockfish::Eval::NNUE::AccumulatorCaches::Cache<128u>::Entry::clear(this=0x0000015a4ed26ac0, biases=0x0000000000000000)
    frame #2: 0x00007ff74b8d52cf stockfish.exe`void Stockfish::Eval::NNUE::AccumulatorCaches::Cache<128u>::clear<Stockfish::Eval::NNUE::Network<Stockfish::Eval::NNUE::NetworkArchitecture<128u, 15, 32>, Stockfish::Eval::NNUE::FeatureTransformer<128u>>>(this=0x0000015a4ed26ac0, network=0x0000015a40626108)
    frame #3: 0x00007ff74b8d536a stockfish.exe`void Stockfish::Eval::NNUE::AccumulatorCaches::clear<Stockfish::Eval::NNUE::Networks>(this=0x0000015a4ece2ac0, networks=0x0000015a40626090)
    frame #4: 0x00007ff74b8d53a0 stockfish.exe`Stockfish::Eval::NNUE::AccumulatorCaches::AccumulatorCaches<Stockfish::Eval::NNUE::Networks>(this=0x0000015a4ece2ac0, networks=0x0000015a40626090)
```
we love to see it
#

how are the smallnet biases null pointer
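The trace shows `memcpy` being reached with `biases=0x0`, which fits a network whose parameters were never loaded. A defensive variant of such a clear step might look like the sketch below; `CacheEntry` and its members are hypothetical and not the engine's actual types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

template <std::size_t Size>
struct CacheEntry {
    std::array<std::int16_t, Size> accumulation{};

    // Resets the entry to the network biases.
    // Returns false, instead of crashing in memcpy, when no biases are loaded.
    bool clear(const std::int16_t* biases) {
        if (biases == nullptr)
            return false;
        std::memcpy(accumulation.data(), biases, Size * sizeof(std::int16_t));
        return true;
    }
};
```

A guard like this only turns the crash into a visible failure; the underlying fix is still to actually read the smallnet before the caches are built.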

frosty imp
#

branch?

rocky vigil
#

oh lemme push

frosty imp
#

hmm try uncomment networks->small.verify

#

might be hash verification issues

rocky vigil
#

oh

rocky vigil
#

vscode no autosave

rocky vigil
#

gg?

#

bruh this smallnet is 1.8M nps vs 2.3M in master

frosty imp
#

threat tracking maybe?

rocky vigil
#

real weakness SHOWEN

frosty imp
#

try benchmark

rocky vigil
#

right

#

forgot it still did that

#

oh threat tracking pretty fast ngl

frosty imp
rocky vigil
#
```
Stockfish dev-20251015-4c91a5c9 by the Stockfish developers (see AUTHORS file)
info string Using 1 thread
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20251015-4c91a5c9
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avxvnni
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest 1
Filled invocation          : speedtest 1 128 150
Available processors       : 0-15
Thread count               : 1
Thread binding             : none
TT size [MiB]              : 128
Hash max, avg [per mille]  :
    single search          : 56, 32
    single game            : 852, 602
Total nodes searched       : 150141109
Total search time [s]      : 153.543
Nodes/second               : 977844
```
#

(with smallnet)

rocky vigil
#

we'll see how much elo this is

prime mica
# frosty imp

update_piece_threats and append_changed_indices can definitely be optimized to a tiny fraction of the runtime

#

unless I'm misunderstanding what threats are

rocky vigil
#

aprs

#

shawn

#

approve my test

violet badger
#

just what was needed

rocky vigil
#

what the sprt gods grant they taketh away

rocky vigil
naive comet
rocky vigil
#

Ouch

#

No big improvement from stage 5 either

rocky vigil
#

@twilit oriole expecting that at fishtest conditions rn, at STC and without any major breakthroughs, we can get it to -15 +- 5 Elo or so