#UE Threat Inputs for AB

12630 messages · Page 13 of 13 (latest)

rocky vigil
#

i also gtg for an hour

#

so someone else's turn to stare

#

and see if they find the errors

lofty cedar
#

Do we spec LTC threat42 too?

#

Like... if threat 37 was an anti-scaler, there's no reason to assume the current net isn't also an anti-scaler compared to threat42 or something else.

#

But yeah... Spec LTC-ing everything is expensive.

rocky vigil
#

Realistically the only way we’ll be able to debug this is by comparing all of bullet layers to sf layers

#

Idk how feasible this is to do in bullet

#

@formal smelt ?

stray reef
#

what kind of transformations does SF do on startup?

rocky vigil
#

Not ones that change the overall eval

#

You can read the x32 code, that doesn’t have any transformations

stray reef
#

yeah ofc i'm talking about transposing, packus stuff, etc

rocky vigil
#

Ok

#

So when I get back

#

I’ll also attempt to do it on sf side

formal smelt
#

Return the relevant node and you can get it after calling forward

#

If it isn’t optimised out

lofty cedar
#

Now that we can train new nets... what do you think?

formal smelt
#

Random cross-post lol

#

why not put in #nnue-dev

lofty cedar
#

IDK... at this point the thread and #nnue-dev are now used interchangeably.

formal smelt
#

Well as always "what if we did <extremely vague thing>" isn't a great suggestion

rocky vigil
#

Btw @stray reef can you update branch on GitHub

#

So that I can look and see if I can statically find additional issues

stray reef
#

done

rocky vigil
#

Well

#

The architecture now looks correct

#

sigh

rocky vigil
#
```
L3 pre-activation: [757 -128007 889 1153 131982 66180 -63742 196852 65014 66823 132104 393984 135271 -63887 198538 -62482 131595 198409 65519 4239 132477 -62995 67830 66179 -61969 67456 66664 -260342 -63263 -63886 2152 67434 ]
L3 post-activation: [11 0 13 18 127 127 0 127 127 127 127 127 127 0 127 0 127 127 127 66 127 0 127 127 0 127 127 0 0 0 33 127 ]
```
(startpos)
#

@stray reef I am assuming this is not how it's supposed to go?

stray reef
#

nope

rocky vigil
#

can you get the L2, L3 from bullet

#

(pre-activation)

#

i am concerned about why everything in L2 is 127

#

or 0

#

for the sqrrelu
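A minimal sketch of the two activations as they appear in these dumps, assuming a Stockfish-style fixed-point convention with 6 weight-scale bits: CReLU(x) = clamp(x >> 6, 0, 127) and CReLU(x^2) = min(x² >> 19, 127). The constant and function names are illustrative, not taken from either codebase; the formulas do reproduce the 1c0000000000.nnue L2 dump in this thread. Under this reading, any pre-activation beyond roughly ±8,100 saturates, so six-digit L2 values pin everything to 0 or 127.

```rust
/// Assumed number of fractional bits in the layer weights (Stockfish-style).
const WEIGHT_SCALE_BITS: u32 = 6;

/// CReLU(x): shift out the weight scale, then clamp to [0, 127].
fn clipped_relu(x: i32) -> u8 {
    (x >> WEIGHT_SCALE_BITS).clamp(0, 127) as u8
}

/// CReLU(x^2): square in 64-bit to avoid overflow, shift by 2*6 + 7 = 19 bits,
/// then cap at 127. Negative inputs are squared, so they can saturate too.
fn sqr_clipped_relu(x: i32) -> u8 {
    let sq = i64::from(x) * i64::from(x);
    (sq >> (2 * WEIGHT_SCALE_BITS + 7)).min(127) as u8
}
```

With these, an L2 value like 131077 maps to 127 on both branches, which matches the broken dumps.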

stray reef
#

not rn unfortunately

rocky vigil
#
```
L3 pre-activation: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 post-activation: [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
```
for comparison, here's the old master network
stray reef
#

yeah that's more like it

rocky vigil
#
```
L2 CReLU(x^2): [127 127 127 127 127 127 127 127 127 0 127 127 127 127 127 ]
L2 CReLU(x): [0 127 0 127 127 127 127 127 0 8 0 0 0 127 127 ]
L3: [757 -128007 889 1153 131982 66180 -63742 196852 65014 66823 132104 393984 135271 -63887 198538 -62482 131595 198409 65519 4239 132477 -62995 67830 66179 -61969 67456 66664 -260342 -63263 -63886 2152 67434 ]
L3 CReLU(x): [11 0 13 18 127 127 0 127 127 127 127 127 127 0 127 0 127 127 127 66 127 0 127 127 0 127 127 0 0 0 33 127 ]
```
#

ah yes

#

of course

#

L2 was always supposed to be this massive

#

sigh sigh sigh

#
```
L2 CReLU(x^2): [2 0 2 24 37 4 3 1 0 45 5 39 5 0 18 ]
L2 CReLU(x): [0 10 19 0 69 0 0 11 0 75 0 70 0 0 0 ]
L3: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 CReLU(x): [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
```
(1c0000000000.nnue)
#

I think the L2 biases are off

#

@stray reef forgot since we are using old master arch the 255s here should be 127

stray reef
#

for everything?

rocky vigil
#

yeah

#

we'll need to change it back to get to new arch

#

but for now it's 127

stray reef
#

i see, i'll send you a new net when i'm back

rocky vigil
#

ok

stray gyro
#

What's the last test result of full threat small net?

rocky vigil
#

bad

#

like -3 at least

stray gyro
#

Is there data of speed difference?

rocky vigil
#

unsure

#

i don't think it's that good though

stray gyro
#

-3 Elo sounds like 2% slowdown

#

hmm

rocky vigil
#

considering the main purpose of smallnet is speed

#

and not necessarily evaluation accuracy

twilit oriole
#

I think now is the time to try a training run that disables threats and uses just psq for anything >400cp. I don't think threats have much value there; it's mostly just speed loss

twilit oriole
#

its a simple change i think. can be done on top of the latest vondele branch

rocky vigil
#

i know how this would be defined though

#

in the part that assigns active features, just compute simple eval first

prime mica
#

that would require different weights for the following layer, right?

twilit oriole
#

no need to modify the inference on the training side. just pretend threats don't exist above the threshold
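A minimal sketch of the gating idea, assuming a precomputed simple (material-only) eval in centipawns and reading the 400cp threshold as a bound on its absolute value. `active_features`, `THREAT_CUTOFF_CP` and the `u32` feature indices are hypothetical, not the trainer's or Stockfish's API. The point is only that the same gate has to run wherever active features are assigned, so the trainer's data loader and the engine agree on which inputs exist.

```rust
/// Hypothetical cutoff: above this |simple eval|, threat features are dropped.
const THREAT_CUTOFF_CP: i32 = 400;

/// Builds the active feature list for one perspective. PSQ features are
/// always kept; threat features are only added when the cheap material-only
/// eval says the position is within the cutoff.
fn active_features(psq: &[u32], threats: &[u32], simple_eval_cp: i32) -> Vec<u32> {
    let mut features = psq.to_vec();
    if simple_eval_cp.abs() <= THREAT_CUTOFF_CP {
        features.extend_from_slice(threats);
    }
    features
}
```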

rocky vigil
#

i did not notice any other errors

#

so if stuff still goes wrong then i really will need to manually compare the hidden layers

stray reef
#

maybe half an hour?

rocky vigil
#

ok cool

#

🙏

stray reef
#

@rocky vigil

rocky vigil
#

ok

twilit oriole
#

the same way?

stray reef
#

in the trainer you have the datapoint evaluation

twilit oriole
#

the threshold is computed based on simple eval

stray reef
#

oh ok

rocky vigil
#
```
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]


NNUE evaluation        -25.60 (white side)
Final evaluation       -16.01 (white side) [with scaled NNUE, ...]
```
#

sigh

#

how are we getting l2 values that are 6 digits

rocky vigil
#

the skip connection from L1 to output is -131056

#

which is responsible for the negative eval

twilit oriole
#

i wonder what happens if you remove it. a last resort thing to attempt lol

stray reef
#

there's probably some major issue still, like wrong weight layout in ft or l1

rocky vigil
# twilit oriole i wonder what happens if you remove it. a last resort thing to attempt lol

??????

```
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]


              [+0, -17]
NNUE evaluation        -0.04 (white side)
```
#

finally something that doesn't look total trash

stray reef
#

all the values in the arrays are still trash

#

it's just luck that it's close to 0

rocky vigil
#

probably

twilit oriole
#

well i expect that, because i assume it would have to be removed in the trainer as well. but the core question is whether it's some kind of instability or a mapping issue

#

what happens when you inspect the weights itself

rocky vigil
#

first of all I suspect the quantization is wrong

#

I think the l1 -> l2 values are way too large

stray reef
#

what's the weight clipping in nnue-pytorch?

stray reef
#

i mean the float clipping during training

#

in bullet the default is [-1.98, 1.98]

rocky vigil
#

there is some clipping

#

lemme check

rocky vigil
#

for l2 and l3

stray reef
#

and nothing for ft/l1?

rocky vigil
#

it's quantized to 127

#

no weight clipping

#

afaik

#

I suspect smth is wrong with the psqt

#

and the skip connection

stray reef
#

can you check the pairwise output?

rocky vigil
#

when I am ignoring those two the evals are actually reasonable

#

for eg

#

startpos

#

fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9

stray reef
#

sure but the l2/l3 data above is also not normal

#

i guess those two being wrong just have a much bigger influence on the output

rocky vigil
#

yeah

#

i suppose

stray reef
#

you can just print the first X i guess

#

just so we can check sparsity and if everything's clamped there too

rocky vigil
#

I can print first 16 cool

#
```
L2: [53 131077 9 42 65588 82 66 43 196643 131100 131119 0 65542 -65525 -65526 (-131056)]
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]
[normal, skip] = [-279 -154790]
```
#

(startpos)

stray reef
#

sss but looks fine at least

rocky vigil
#

I can actually go to 128

#

why not

#

L1 (first 128): [0 0 0 0 0 0 5 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

#

omega sparsity

#

lemme try master

#

now

stray reef
#

probably l1 has a problem then

rocky vigil
#
```
L2: [-1128 647 1222 -3620 4443 -1457 -1256 725 -705 4858 -1640 4542 -1743 -604 -3117 (-42)]
L2 CReLU(x^2): [2 0 2 24 37 4 3 1 0 45 5 39 5 0 18 ]
L2 CReLU(x): [0 10 19 0 69 0 0 11 0 75 0 70 0 0 0 ]
L3: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 CReLU(x): [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
[normal, skip] = [371 -49]


[psqt, positional] = [+0, +20]
NNUE evaluation        +0.05 (white side)
Final evaluation       +0.07 (white side) [with scaled NNUE, ...]
```
#

here's 1c0...
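The `[normal, skip]` and `[psqt, positional]` lines in these dumps fit a simple fixed-point recipe, sketched here as a reconstruction from the numbers rather than a copy of Stockfish's source: the 16th L2 neuron (the value in parentheses) is rescaled by 600 · 16 / (127 · 64), added to the main output, and the sum is divided by an output scale of 16, truncating toward zero; the raw eval is then psqt + positional. The constant and function names are assumptions. It reproduces -131056 → -154790 from the startpos dump, -42 → -49 for 1c0000000000.nnue, and (-4262 + 5065) / 16 = 50, which with psqt -2 gives a raw eval of 48, for a later net in this thread.

```rust
/// Assumed fixed-point constants (Stockfish-style).
const OUTPUT_SCALE: i64 = 16;
const WEIGHT_SCALE_BITS: u32 = 6;

/// Rescales the raw skip neuron (16th L2 output) into output units.
/// Integer division truncates toward zero, as in C++.
fn skip_contribution(skip_neuron: i64) -> i64 {
    skip_neuron * (600 * OUTPUT_SCALE) / (127 * (1_i64 << WEIGHT_SCALE_BITS))
}

/// Positional term as printed by `eval`: (normal + skip) / OUTPUT_SCALE.
fn positional(normal: i64, skip_neuron: i64) -> i64 {
    (normal + skip_contribution(skip_neuron)) / OUTPUT_SCALE
}

/// Raw eval in internal units: PSQT term plus positional term.
fn raw_eval(psqt: i64, normal: i64, skip_neuron: i64) -> i64 {
    psqt + positional(normal, skip_neuron)
}
```

Under this reading, a skip neuron around -131056 contributes roughly -9,700 to the positional term on its own, which would explain why it dominates the broken evals.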

#

i feel like the L2 biases

#

are screwed

#

somehow

#

or smth about the l2 scale

twilit oriole
#

hm. the scale for the bias and regular weights are handled differently right

stray reef
#

done

rocky vigil
#

x.quantise(Q) is just round(Q*x) right

#

in bullet

stray reef
#

.round().quantise(), yeah
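For reference, a minimal sketch of what `.round().quantise()` amounts to for an i8 layer as described above: multiply by the scale Q, round, and refuse values that do not fit. This is not bullet's implementation; `quantise_i8` and its error text are hypothetical. It also shows why the float clip range matters: assuming Q = 64, a clip of 1.98 still lands on 127 (1.98 · 64 ≈ 126.7), while anything noticeably larger overflows i8.

```rust
/// Hypothetical sketch: scale float weights by `q`, round to nearest, and
/// reject any value outside the i8 range instead of silently wrapping.
fn quantise_i8(weights: &[f32], q: f32) -> Result<Vec<i8>, String> {
    weights
        .iter()
        .map(|&w| {
            let scaled = (w * q).round();
            if scaled < f32::from(i8::MIN) || scaled > f32::from(i8::MAX) {
                Err(format!("{w} * {q} = {scaled} does not fit in i8"))
            } else {
                Ok(scaled as i8)
            }
        })
        .collect()
}
```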

twilit oriole
#

what if you zero everything except the psqt. surely that works right

rocky vigil
#

psqt alone would still give like +30 on this position

twilit oriole
#

yeah cos it is trained with the rest of the net right?

rocky vigil
#
```
NNUE evaluation        +56.21 (white side)
```
#

everything is cooked

twilit oriole
#

yeah but i mean if you literally only train the psqt and inference it

rocky vigil
#

ok we can try that

stray reef
#

sure

rocky vigil
#

so basically comment out lines 153-166 and return pst_out instead of out

#

i think

rocky vigil
#

i genuinely dunno

#

at this point

stray reef
#

bullet does not like this

```
thread 'main' (546672) panicked at /home/patrick/.cargo/git/checkouts/bullet-8a69ed9a26c6f599/e37db79/crates/acyclib/src/graph/builder.rs:132:30:
called `Result::unwrap()` on an `Err` value: ## Error Occurred ##
Message("MultipleRoots")
```
rocky vigil
#

oh i think it needs to be mut

#

if you return pst_out

#

maybe?

#

idk

#

actually this is strange

#

you can also try just doing the entire inference

#

and only returning pst_out

stray reef
#

ofc i tried both

#

gonna try multiplying the other two with 0 now so everything is "used" at least

rocky vigil
#

oh

#

also I think eval_scale is 600

#

not 400

#

that's purely cosmetic though

stray reef
#

```
out = out.linear_comb(0.0, pst_out, 0.5) + skip_neuron.linear_comb(0.0, pst_out, 0.5);
```
this works :P

rocky vigil
#

heh

#

tricked the compiler

#

but yeah

rocky vigil
#

and eval_scale 600

#

bc if the float weights are x

#

on sf side you just have (600 * 16 * sum x) / 16

#

and you get 600 * sum x
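The same arithmetic as a small sketch: if each float PSQT weight x is stored as round(600 · 16 · x), the active weights are summed, and the sum is divided by 16, the result is 600 · Σx up to rounding. The function name and sample weights are made up for illustration.

```rust
/// Hypothetical sketch of the PSQT scaling: store round(eval_scale * out_scale * x)
/// per weight, sum the active ones, then divide by out_scale (truncating).
fn psqt_eval(active_weights: &[f32], eval_scale: f32, out_scale: i32) -> i32 {
    let factor = eval_scale * out_scale as f32;
    let sum: i32 = active_weights
        .iter()
        .map(|&x| (factor * x).round() as i32)
        .sum();
    sum / out_scale
}
```

For example, weights 0.01, -0.005 and 0.002 quantise to 96, -48 and 19; the sum 67 divided by 16 gives 4, close to 600 · 0.007 = 4.2. This also shows why eval_scale matters here: using 400 instead of 600 would scale every PSQT contribution by 2/3.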

rocky vigil
#

can you also get r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9

#

startpos pst has always been 0 :P

rocky vigil
#

kk

#
```
eval
info string NNUE evaluation using nn-4e6276be8161.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2: [1 -1 0 -65535 -65536 65536 65535 65536 1 65536 0 -1 -65536 0 -65536 (65535)]
L2 CReLU(x^2): [0 0 0 127 127 127 127 127 0 127 0 0 127 0 127 ]
L2 CReLU(x): [0 0 0 0 0 127 127 127 0 127 0 0 0 0 0 ]
L3: [65663 -65536 -65409 254 65282 -889 -253 -889 -64900 -66171 -131325 -65536 65028 127 65409 65408 -127 131325 65409 -508 -127 65154 65662 -509 -65663 -65790 -65663 65409 65408 64900 -635 255 ]
L3 CReLU(x): [127 0 0 3 127 0 0 0 0 0 0 0 127 1 127 127 0 127 127 0 0 127 127 0 0 0 0 127 127 127 0 3 ]
[normal, skip] = [65277 77403]


[psqt, positional] = [-4095, +8917]
NNUE evaluation        +12.75 (white side)
```
#

no seriously

#

what the

#

the feature indices are correct right?

#

so purely parsing error or smth

stray reef
#

probably a parsing/layout error now yeah

rocky vigil
#

well there are strict checks on the layouts

#

in particular the l0b, l0w and pst must be in the correct order

#

so something is wrong in the pst section itself

#

is it possible for you to get the 8 bucket weights of specific indices or smth

#

grasping at straws here

stray reef
#

from the bullet checkpoint yes

#

not sure how the .nnue file works

rocky vigil
#

yeah sure

#

from bullet checkpoint

stray reef
#

my best guess would be to write a small script that tests PST inference for the bullet checkpoint

#

but i don't see how it could be wrong there

rocky vigil
#

yeah I don't either

#

ngl

stray reef
#

wait pst is output bucketed right, how does this work in SF inference, it's also UE'd right? so all buckets are technically always computed, even if not needed

rocky vigil
#

yes

stray reef
#

ok i thought for a second we forgot to transpose the weights

rocky vigil
#

they're stored as

#

[f0b0 f0b1 ... f1b0 f1b1 ... f22527b0 ... f22527b7]
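A minimal sketch of indexing into that layout, assuming 22528 features and 8 PSQT buckets stored feature-major as written above ([f][b], so the bucket index varies fastest). The helper names are hypothetical. The last helper shows the transpose from a bucket-major [b][f] buffer, the step that would be needed if an exporter produced the weights the other way round.

```rust
const NUM_FEATURES: usize = 22528;
const PSQT_BUCKETS: usize = 8;

/// Flat index of (feature, bucket) in the feature-major layout [f][b].
fn psqt_index(feature: usize, bucket: usize) -> usize {
    assert!(feature < NUM_FEATURES && bucket < PSQT_BUCKETS);
    feature * PSQT_BUCKETS + bucket
}

/// All eight bucket weights of one feature, e.g. feature 20931.
fn bucket_weights(psqt: &[i32], feature: usize) -> &[i32] {
    let start = psqt_index(feature, 0);
    &psqt[start..start + PSQT_BUCKETS]
}

/// Converts a bucket-major [b][f] buffer into the feature-major [f][b] layout.
fn bucket_major_to_feature_major(src: &[i32], features: usize, buckets: usize) -> Vec<i32> {
    assert_eq!(src.len(), features * buckets);
    let mut dst = vec![0; src.len()];
    for b in 0..buckets {
        for f in 0..features {
            dst[f * buckets + b] = src[b * features + f];
        }
    }
    dst
}
```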

stray reef
#

ok which weights do you want to see?

#

for which feature index

rocky vigil
#

uh all eight for 20931

#

how about

stray reef
#

all zeros

rocky vigil
#

bruh

#

what

#

ok

stray reef
#

hm no smth is wrong with my code

rocky vigil
#

[65537 1 65536 65535 65535 65536 65536 1 ]

#

this does

#

not seem right

#

but yeah we'll see

stray reef
#

i get the same numbers...

rocky vigil
#

welp

#

leb128 looking fine

#

as I suspected

#

so I have no idea why it's different

#

maybe lemme get some position with only few pieces

stray reef
#

i think since the input type is factorised, the pst weights have a factoriser too

#

even though afaik Factorised<> automatically merges that, i'm gonna try it non-factorised rq
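A minimal sketch of what merging a factoriser means here, assuming the bucketed weights are stored as consecutive bucket-sized blocks that share the factoriser's shape: the shared factoriser is added to every bucket before export. `merge_factoriser` is a hypothetical helper, not bullet's `Factorised<>` implementation. If the factoriser were instead written out as a separate extra block that the reader does not expect, every offset after it would shift, which is consistent with the checkpoint-size observation later in the thread.

```rust
/// Hypothetical sketch: add a shared factoriser to every bucket block.
/// `bucketed` must hold a whole number of factoriser-sized blocks.
fn merge_factoriser(bucketed: &[f32], factoriser: &[f32]) -> Vec<f32> {
    assert!(
        !factoriser.is_empty() && bucketed.len() % factoriser.len() == 0,
        "bucketed weights must be whole factoriser-sized blocks"
    );
    bucketed
        .iter()
        .zip(factoriser.iter().cycle())
        .map(|(w, f)| w + f)
        .collect()
}
```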

rocky vigil
#

oh

#

ok

#

yeah and maybe try 8/6k1/8/8/3P4/8/1K6/8 w - - 0 1

#

there's only 3 pieces

#

what could go wrong :clueless:

prime mica
#

When the Pawn

stray reef
#

ah the checkpoint is about 4.5MB smaller now, which matches perfectly what would happen if the factorised weights were previously included

#

152 -87 -266 -201 75 290 397 299
reasonable values!

rocky vigil
#

o

rocky vigil
#

my uni wifi wondering why I've downloaded 8 nnue files of 66 MB today :P

rocky vigil
#

let's go???

stray reef
#

startpos eval?

rocky vigil
#

0 :P

#

it's always been 0 for psqt

#

no matter what nnue

stray reef
#

oh right

stray reef
#

r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 should be about -6

rocky vigil
#

or the pawn endgame

rocky vigil
# stray reef `r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9` should be about -6
```
eval
info string NNUE evaluation using nn-a64da979b54f.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (0)]
L2 CReLU(x^2): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2 CReLU(x): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L3: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L3 CReLU(x): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[normal, skip] = [0 0]


[psqt, positional] = [-9, +0]
```
#

let's go??

stray reef
#

YES

prime mica
#

life is good

rocky vigil
#

minor quant error but to be expected

#

do u have any more test pos

stray reef
#

nah this is fine

#

gonna add the rest of the net again

rocky vigil
#

ok so factorizer is suspicious

#

cool

stray reef
#

once something works we can work backwards to add what's missing

rocky vigil
#

bc my debug info also has all 3 of them

stray reef
#

i can try, but let's first see how bad things are

rocky vigil
#

ok

#

wait why was the input still factorized

#

did you swap it to non-factorized

stray reef
#

i completely removed the factoriser now

rocky vigil
#

yay

#

yeah simple things first

#

and gradually work up

rocky vigil
#

random berlin position is actually a decent test position lmao

#

ok let's see

#

in a min or two

rocky vigil
# stray reef <https://1drv.ms/u/c/74d39b59afff2586/IQBsQgdxgVMgRbekt661j2IUAbWtg-9jF7zF67SAXa...
```
eval
info string NNUE evaluation using nn-2cc242fcab84.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 7 4 0 0 27 4 1 0 0 0 0 19 5 2 1 23 0 18 1 0 5 0 20 23 0 0 0 3 14 28 1 0 5 0 0 0 0 14 0 25 19 9 15 0 0 0 0 40 0 0 0 8 2 0 18 3 9 0 0 13 20 0 0 2 0 21 1 0 21 0 0 0 0 0 11 0 1 0 11 14 0 0 0 0 0 0 34 0 32 22 0 0 0 3 2 9 0 13 14 0 0 1 0 0 0 24 28 2 21 8 0 0 2 0 13 0 6 0 17 12 3 13 6 15 11 0 0 ]
L2: [-5890 4039 8313 4356 3438 5305 10575 6519 3957 1457 -2233 -2333 5295 7469 -4199 (4289)]
L2 CReLU(x^2): [66 31 127 36 22 53 127 81 29 4 9 10 53 106 33 ]
L2 CReLU(x): [0 63 127 68 53 82 127 101 61 22 0 0 82 116 0 ]
L3: [5564 4988 4871 -3625 -1894 -2410 -739 2902 3027 2338 -883 3304 -3766 -2235 -1100 2512 2216 -7469 467 2828 -833 -816 8457 -901 2398 2163 2244 11 -328 3479 4975 2129 ]
L3 CReLU(x): [86 77 76 0 0 0 0 45 47 36 0 51 0 0 0 39 34 0 7 44 0 0 127 0 37 33 35 0 0 54 77 33 ]
[normal, skip] = [-4262 5065]


[psqt, positional] = [-2, +50]
NNUE evaluation        +0.13 (white side)
```
maybe maybe
(raw eval is 48)
stray reef
#

that looks quite reasonable now

rocky vigil
#

psqt + positional

#

idk how big quant error was supposed to be

stray reef
#

48 is still alright given true eval is 16

rocky vigil
#

do you have startpos eval as well

stray reef
#

45

rocky vigil
#
```
eval
info string NNUE evaluation using nn-2cc242fcab84.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 22208 21953 22082 22339 22468 22085 21958 22215 21832 21833 21834 21835 21836 21837 21838 21839 21936 21937 21938 21939 21940 21941 21942 21943 22328 22073 22202 22459 22524 22205 22078 22335
removed:
BLACK added: 22328 22073 22202 22459 22524 22205 22078 22335 21936 21937 21938 21939 21940 21941 21942 21943 21832 21833 21834 21835 21836 21837 21838 21839 22208 21953 22082 22339 22468 22085 21958 22215
removed:
L1 (first 128): [0 7 0 0 0 30 0 6 2 0 0 2 16 7 0 5 6 0 1 2 0 1 5 49 0 0 4 5 0 19 0 0 4 0 9 9 0 1 5 3 54 9 3 5 0 0 4 0 54 3 0 12 2 1 0 5 6 8 0 0 24 10 1 0 9 33 0 3 0 6 0 0 0 8 0 0 0 0 0 6 9 4 0 0 0 1 0 24 0 18 21 0 1 0 0 0 6 0 3 8 5 0 0 1 4 0 22 30 0 30 37 0 0 0 0 31 0 0 1 7 27 2 22 0 5 0 0 0 ]
L2: [5706 4058 1665 3009 -7774 7520 7305 -8251 -14491 -8684 -5017 6628 7127 -4286 6504 (-850)]
L2 CReLU(x^2): [62 31 5 17 115 107 101 127 127 127 48 83 96 35 80 ]
L2 CReLU(x): [89 63 26 47 0 117 114 0 0 0 0 103 111 0 101 ]
L3: [-1294 6294 6987 5821 4174 -3564 3387 -3172 7346 1499 -3768 -1403 -4705 1062 4005 6366 3391 1700 -6845 3495 4704 -4917 1691 -4078 2004 -1570 1976 3884 473 -1350 -3770 -1540 ]
L3 CReLU(x): [0 98 109 90 65 0 52 0 114 23 0 0 0 16 62 99 52 26 0 54 73 0 26 0 31 0 30 60 7 0 0 0 ]
[normal, skip] = [1419 -1003]


[psqt, positional] = [+0, +26]
NNUE evaluation        +0.07 (white side)
```
#

26

#

raw

#

reasonable

stray reef
#

sick

rocky vigil
#

maybe I comment out debug info and see if the pv for startpos makes any sense?

#

lemme try that

#

not good chess

#

but still chess

stray reef
#

hell yeah!

#

i'll sleep now, and then we can try tomorrow or so to re-integrate the other features

rocky vigil
#

o

#

it beat a 2400 ish CCRL blitz engine

#

strangest game ever

sage stream
#

Wait where is the original

violet badger
#

I'm happy to try to run for a little longer. Two quick questions, how to provide multiple binpacks as input, and how to setup multiGPU training.

violet badger
#

I can quickly run 100SB right now, and we see where we stand? Would be faster multiGPU, but starting now is probably even faster 😉

rocky vigil
#

ok

#

i guess single gpu single binpack

#

should be relatively fast

violet badger
#

yeah

stray reef
#

I'll simultaneously try to fix the factoriser, until we see that it still produces reasonable results

rocky vigil
#

cool

#

maybe take this chance to see later if the l2 factoriser is useful at all

violet badger
#

you're faster than me installing rust ...

#

(ok, trying to figure out how to do it correctly in the container environment that I'm using, but well, no excuses)

rocky vigil
#

or do I need to multiply these by 1.5

stray reef
#

changed it now for future evals

rocky vigil
#
```
eval
info string NNUE evaluation using nn-81c082405712.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

L1 (first 128): [0 35 9 7 0 0 28 0 0 7 0 2 28 0 2 36 0 12 3 31 31 28 1 56 11 0 0 3 0 5 0 40 16 55 14 19 38 0 46 19 14 0 0 28 0 52 0 39 49 0 4 0 0 21 26 30 0 0 9 0 0 0 22 48 0 1 37 0 19 0 1 23 21 0 8 0 8 11 0 24 10 33 0 44 0 2 43 0 19 35 0 17 0 34 62 0 48 6 0 0 17 24 8 0 19 27 0 8 0 7 41 17 0 1 66 22 17 4 3 0 34 7 23 0 45 0 26 0 ]
L2: [-782 2913 5915 1645 -14149 24231 2753 5418 3241 -4086 3977 8455 -8888 4432 -2446 (-2630)]
L2 CReLU(x^2): [1 16 66 5 127 127 14 55 20 31 30 127 127 37 11 ]
L2 CReLU(x): [0 45 92 25 0 127 43 84 50 0 62 127 0 69 0 ]
L3: [3290 3317 4467 -5288 -1449 -531 2395 -2337 -1085 -4165 -4409 -4177 707 3713 2600 2935 5039 2861 3123 2263 -2477 3885 -8094 -4863 3442 -3560 4140 -3969 -574 2406 -1194 -2577 ]
L3 CReLU(x): [51 51 69 0 0 0 37 0 0 0 0 0 11 58 40 45 78 44 48 35 0 60 0 0 53 0 64 0 0 37 0 0 ]
[normal, skip] = [4210 -3106]


[psqt, positional] = [+0, +69]
NNUE evaluation        +0.18 (white side)
```
#

raw startpos is 69

#
```
position fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
eval
info string NNUE evaluation using nn-81c082405712.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

L1 (first 128): [0 34 7 5 0 0 21 0 0 8 0 6 12 0 0 33 0 5 3 50 14 13 0 32 6 0 0 0 0 0 0 28 2 41 15 23 17 0 38 15 20 0 0 30 0 62 0 32 34 0 0 0 0 20 27 29 1 0 12 3 0 0 23 23 0 0 42 0 19 0 2 16 0 6 13 0 4 8 0 19 5 36 0 25 0 0 23 0 13 26 0 14 0 36 55 0 45 8 0 0 19 33 3 0 10 8 0 9 0 3 46 24 1 0 67 19 22 4 0 0 25 0 18 0 35 0 16 0 ]
L2: [-3180 8434 -8349 -4061 8049 114 -1044 1497 -3472 -5565 459 4286 1974 -3644 2177 (-779)]
L2 CReLU(x^2): [19 127 127 31 123 0 2 4 22 59 0 35 7 25 9 ]
L2 CReLU(x): [0 127 0 0 125 1 0 23 0 0 7 66 30 0 34 ]
L3: [-4048 1024 -3380 1544 3979 -909 3072 452 5896 602 1618 -1936 2329 3109 -1840 -1814 -149 4150 -2820 1299 3614 2346 1136 4138 -419 -2696 2506 1320 -1162 -2180 5403 1449 ]
L3 CReLU(x): [0 16 0 24 62 0 48 7 92 9 25 0 36 48 0 0 0 64 0 20 56 36 17 64 0 0 39 20 0 0 84 22 ]
[normal, skip] = [2422 -920]


[psqt, positional] = [-11, +93]
NNUE evaluation        +0.22 (white side)
```
raw is 82
#

looks good

stray reef
#

amazing

violet badger
#

if looks good, please push, and I'll start from that.

rocky vigil
#

"remove factoriser" "add factoriser again" lol

stray reef
#

shall we try the l1 factoriser as well?

rocky vigil
#

perhaps

#

if rust installation is taking a while

#

might as well

violet badger
#
```
# test bullet
git clone https://github.com/Yoshie2000/sf-bullet-train.git
cd sf-bullet-train
git checkout fix-inputs
# edit src/main.rs file_path
cargo run --release .
```
#

that's the procedure right?

#

(like manual edit of main.rs needed)

rocky vigil
#

where are the datasets being loaded

rocky vigil
#

ah

#

i suppose that needs to be changed

#

other than that i think this is good

violet badger
#

sure that's the comment in the procedure above.

stray reef
#

also need to adjust SB count in line 185

violet badger
#

okay

rocky vigil
#

btw yoshie how is speed

#

of training

stray reef
#

hard to say right now, i'm training another net already :P but i remember roughly 800k pos/s from yesterday

stray reef
# violet badger okay

(some multiple of 60 would make sense, since that suits the LR schedule, but you can change the step there too of course)

violet badger
#

will make it 120
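Why multiples of 60 fit: the training preamble reports `start 0.001 gamma 0.3 drop every 60 superbatches`, i.e. a step schedule. A minimal sketch, assuming superbatches are numbered from 1 and the first drop takes effect at superbatch 61 (bullet's exact boundary convention may differ); `step_lr` is a hypothetical name. Under that assumption, a 120-superbatch run covers exactly two full steps, 0.001 and then 0.0003.

```rust
/// Hypothetical step schedule: multiply `start` by `gamma` once every `step`
/// superbatches. Superbatches are assumed to be numbered from 1.
fn step_lr(superbatch: u32, start: f64, gamma: f64, step: u32) -> f64 {
    let drops = superbatch.saturating_sub(1) / step;
    start * gamma.powi(drops as i32)
}
```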

stray reef
#

There is a speedup still when using a factoriser, though I won't implement it now as it does not work with threat inputs (at least I haven't found a way yet)

violet badger
#

annoyingly the install is still not correct... somehow it's being installed as root, while the container starts as non-root. So, I reinstall when entering the container right now. Should figure that out eventually.

#

That's one SB

```
Params: 72156296
Training Preamble
Net Name               : test
Batch Size             : 16384
Batches / Superbatch   : 1024
Positions / Superbatch : 16777216
Start Superbatch       : 1
End Superbatch         : 1
Eval Scale             : 600
Save Rate              : 150
WDL Scheduler          : constant 0
LR Scheduler           : start 0.001 gamma 0.3 drop every 60 superbatches
Threads                : 4
Output Path            : checkpoints
Beginning Training
superbatch 1 | time 9.6s | running loss 0.013042 | 1738602 pos/sec | total time 11.3s
Estimated time remaining in training: 0h 0m 0s
Saved [test-1]
Total Training Time: 0h 0m 13s
Eval: 44.568cp
Eval: -31.222cp
```
#

looks OK?

stray reef
#

seems fine yes

#

#engines-dev message #engines-dev message
some info from jw on multigpu

violet badger
#

ok, let me try that.

violet badger
#

yeah, though skipping and such is quite different, but it certainly looks good.

rocky vigil
#

why has

#

a superbatch been reduced

#

btw

#

to 1024 batches

#

and not the standard 6104

stray reef
#

ah good point, we should change that
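The numbers behind this: the preamble's 16384 × 1024 gives 16,777,216 positions per superbatch, whereas 6104 batches of 16384 give 100,007,936, just over 100M. The sketch assumes 6104 comes from rounding 100M / 16384 up; the function names are hypothetical.

```rust
/// Positions seen in one superbatch.
fn positions_per_superbatch(batch_size: u64, batches: u64) -> u64 {
    batch_size * batches
}

/// Smallest number of batches that covers `target_positions` (ceiling division).
fn batches_for(target_positions: u64, batch_size: u64) -> u64 {
    (target_positions + batch_size - 1) / batch_size
}
```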

rocky vigil
#

getting -91 and -28

#
```
eval
L1 (first 128): [22 0 0 0 2 0 0 0 0 0 0 6 0 0 0 0 5 0 0 0 2 0 2 1 1 3 1 0 0 0 18 0 4 0 14 0 13 0 0 9 0 5 0 0 0 6 0 3 0 0 2 0 0 2 0 0 2 8 0 0 5 0 0 0 0 0 0 7 7 13 3 8 17 3 0 0 0 0 3 1 7 0 0 0 0 0 11 0 0 3 0 6 0 7 0 20 0 15 0 0 0 0 8 0 4 0 4 9 2 1 0 0 4 0 11 1 0 0 0 0 0 0 1 0 0 0 0 0 ]
L2: [7211 -1613 4893 -1900 -12473 7297 9595 1191 14570 3115 -3398 6660 13664 5785 -8917 (-1121)]
L2 CReLU(x^2): [99 4 45 6 127 101 127 2 127 18 22 84 127 63 127 ]
L2 CReLU(x): [112 0 76 0 0 114 127 18 127 48 0 104 127 90 0 ]
L3: [3586 1415 5605 -6626 -5444 -371 3739 -3533 1561 4141 3812 -4530 -5525 -3268 4107 -4484 -3925 -5060 1606 -5402 -4427 -6424 -5318 7747 -10300 2550 -6464 1889 -7744 -1817 -1362 -1545 ]
L3 CReLU(x): [56 22 87 0 0 0 58 0 24 64 59 0 0 0 64 0 0 0 25 0 0 0 0 121 0 39 0 29 0 0 0 0 ]
[normal, skip] = [-132 -1324]
[psqt, positional] = [+0, -91]
NNUE evaluation        -0.24 (white side)

ucinewgame
position fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
eval
L1 (first 128): [12 0 0 0 1 0 0 1 0 0 0 6 0 1 0 0 11 0 10 0 6 0 11 0 0 6 15 0 0 11 0 0 7 0 4 0 16 0 0 15 0 2 0 0 0 8 0 0 0 1 3 0 0 4 0 8 8 0 0 5 3 0 0 0 0 3 1 8 2 10 2 4 12 4 0 13 0 0 3 0 5 0 0 0 0 0 0 0 0 0 0 14 0 4 0 2 0 0 0 0 0 0 3 0 0 3 1 7 0 1 2 3 9 0 6 2 0 0 0 0 0 0 5 0 0 0 2 0 ]
L2: [17849 -3002 -2770 3869 6234 -11711 -21902 15147 6015 -617 -3107 15343 2215 -3799 -9219 (226)]
L2 CReLU(x^2): [127 17 14 28 74 127 127 127 69 0 18 127 9 27 127 ]
L2 CReLU(x): [127 0 0 60 97 0 0 127 93 0 0 127 34 0 0 ]
L3: [-2937 1608 -1563 857 3589 3871 -6501 1365 -4169 2560 -5220 2119 -8509 -5111 2558 4084 -2520 -6030 2079 -7494 1958 617 2366 -633 -4361 3530 -171 -1299 3694 -4122 2150 -2276 ]
L3 CReLU(x): [0 25 0 13 56 60 0 21 0 40 0 33 0 0 39 63 0 0 32 0 30 9 36 0 0 55 0 0 57 0 33 0 ]
[normal, skip] = [-722 266]
[psqt, positional] = [+0, -28]
NNUE evaluation        -0.07 (white side)
```
stray reef
#

don't think it's going to work, i think i need to transpose before merging, which i'll try next

rocky vigil
#

-33 and +232

stray reef
#

ah ofc the default .transpose() does not work with this type of factorisation

rocky vigil
#

maybe need to define some custom stuff

#

ye

rocky vigil
#

14th sfnnv9 net attempt lel

stray reef
#

wait no this is bullshit. transposing worked without the factoriser

rocky vigil
#

startpos +360 yeah

violet badger
#

I have the output from the 120SB:

```
Estimated time remaining in training: 0h 0m 8s
superbatch 120 | time 8.6s | running loss 0.001450 | 1941046 pos/sec | total time 1046.2s
Estimated time remaining in training: 0h 0m 0s
Failed to write quantised network weights:
Failed quantisation from f32 to i8!
Saved [test-120]
Total Training Time: 0h 17m 28s
Eval: 73.723cp
Eval: 90.723cp
```
rocky vigil
#

huh

violet badger
#

(at re-add working ft factoriser)

rocky vigil
#

failed quantisation from f32 to i8

stray reef
#

```
Failed to write quantised network weights:
Failed quantisation from f32 to i8!
```
what!

rocky vigil
#

something exceed weight limit?

stray reef
#

looks like it

rocky vigil
#

must be in this one

#

1.98 doesn't work for this

#

needs to be 1.68
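A minimal sketch of how a clip bound can trip the "Failed quantisation from f32 to i8" error, assuming quantisation means multiply-by-scale, round, and store as i8. The scale of 64 is hypothetical (the thread does not state bullet's value). With that scale, the largest weight that still fits is 127/64 ≈ 1.98. The merged value 1.5 + 1.2 is also made up. It shows that clipping a shared factoriser term and a per-bucket term separately does not bound their sum.

```rust
/// Quantise f32 weights to i8 by scaling and rounding.
/// Returns Err(index) for the first weight that does not fit in i8.
fn quantise_i8(weights: &[f32], scale: f32) -> Result<Vec<i8>, usize> {
    weights
        .iter()
        .enumerate()
        .map(|(i, &w)| {
            let q = (w * scale).round();
            if q >= i8::MIN as f32 && q <= i8::MAX as f32 {
                Ok(q as i8)
            } else {
                Err(i)
            }
        })
        .collect()
}

fn main() {
    let scale: f32 = 64.0; // hypothetical quantisation scale, for illustration only
    println!("largest weight that fits: {:.3}", i8::MAX as f32 / scale);

    // Hypothetical factorised weight: shared term + per-bucket term.
    // Each term is below 1.98 on its own, but the merged sum is not.
    let merged = [1.5_f32 + 1.2, 0.25, -1.0];
    match quantise_i8(&merged, scale) {
        Ok(q) => println!("ok: {q:?}"),
        Err(i) => println!("failed quantisation from f32 to i8 at index {i}"),
    }
}
```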

stray reef
#

pushed a fix for that

#

are we sure nnue-pytorch has this l1 factoriser?

#

@violet badger if you integrate the latest commit, you should be able to retry the quantisation simply by doing

```diff
-trainer.run(&schedule, &settings, &data_loader);
+// trainer.run(&schedule, &settings, &data_loader);
+trainer.load_from_checkpoint("checkpoints/test-1");
+trainer.save_quantised("checkpoints/test-1/quantised.bin").unwrap();
```
rocky vigil
#

suspicious

#

i would like to test if removing it helps

#

eventually

stray reef
#

ah true. forget what i said

stray reef
#

yeah sorry, there was no way that'd work :P

#

gotta retrain unfortunately

violet badger
#

no problem.

rocky vigil
#

and it's already heavily quantized

stray reef
#

i don't think so, transposing puts it from [l1][ob][l2] to [ob][l1][l2], and the factoriser should have layout [l1][l2], so the standard fact.repeat(bucketcount) -> elementwise add should work. but it doesn't
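The layout argument above can be checked on a toy tensor. This is a plain-`Vec` sketch, not bullet's API. The sizes are made up, and `merge_factoriser` stands in for `fact.repeat(bucketcount)` followed by an elementwise add. Repeating an [l1][l2] factoriser produces an [ob][l1][l2] layout, so the add only lines up with the bucketed weights after they have been transposed. That is why the order of the two steps matters.

```rust
/// Reorder a flattened [l1][ob][l2] tensor into [ob][l1][l2].
fn transpose_l1_ob(w: &[f32], l1: usize, ob: usize, l2: usize) -> Vec<f32> {
    assert_eq!(w.len(), l1 * ob * l2);
    let mut out = vec![0.0; w.len()];
    for i in 0..l1 {
        for b in 0..ob {
            for j in 0..l2 {
                // source index uses [l1][ob][l2], destination uses [ob][l1][l2]
                out[(b * l1 + i) * l2 + j] = w[(i * ob + b) * l2 + j];
            }
        }
    }
    out
}

/// Add a shared [l1][l2] factoriser to every bucket of an [ob][l1][l2] tensor,
/// i.e. the equivalent of repeating it `ob` times and adding elementwise.
fn merge_factoriser(buckets: &mut [f32], fact: &[f32]) {
    assert_eq!(buckets.len() % fact.len(), 0);
    for bucket in buckets.chunks_mut(fact.len()) {
        for (w, f) in bucket.iter_mut().zip(fact) {
            *w += *f;
        }
    }
}

fn main() {
    let (l1, ob, l2) = (2, 2, 3);
    // Encode each weight as 100*i + 10*b + j so the layout is visible.
    let mut w = Vec::new();
    for i in 0..l1 {
        for b in 0..ob {
            for j in 0..l2 {
                w.push((100 * i + 10 * b + j) as f32);
            }
        }
    }
    let mut t = transpose_l1_ob(&w, l1, ob, l2);
    println!("[ob][l1][l2]: {t:?}");
    merge_factoriser(&mut t, &vec![1000.0; l1 * l2]);
    println!("merged:       {t:?}");
}
```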

rocky vigil
#

oh

#

right

rocky vigil
#

+5, +122

stray reef
#

i don't know then. let's skip it for now i guess

#

if anything, it'll be low single-digit elo anyway

rocky vigil
#

idk if it's even good

#

like l1 -> l2 is way smaller

#

than inputs -> l1

#

do we have a second run

#

with the fixed clipping

stray reef
#

but since quantisation did not fail, everything fits into i8

rocky vigil
#

oh

violet badger
#

My local run (but multiGPU) ended with

Saved [test-120]
Total Training Time: 0h 14m 59s

thread 'main' (27889) panicked at /users/vjoost/.cargo/git/checkouts/bullet-8a69ed9a26c6f599/e37db79/crates/bullet_lib/src/value.rs:245:18:
Invalid output size!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

probably better to not mix both things though.

formal smelt
#

It’s due to calling .eval

#

Not related to training

violet badger
#

in the multiGPU context?

formal smelt
#

Yeah

violet badger
#

okay.

#

but I guess that means that at least the checkpoint saved correctly, so the "failed to quantise" error went away.

rocky vigil
#

I guess just load the checkpoint and run eval using single gpu

#

(I cannot run this with the debug data right now, but you can check the normalized eval by loading the network into past master)

violet badger
#

I'll run afterwards with 1 GPU... first some testing multiGPU.

rocky vigil
#

How fast is it?

#

Are these 16384 * 1024 superbatches or 16384 * 6104

violet badger
#

I'm using 1024 right now

#

speed #1439214470529421384 message

rocky vigil
#

Ah ok

violet badger
#

so, started 120 SB of 6104 * 16384... should take about 2h

#

I used 1GPU, so that allows for speed comparison, ultimately not too different (about 50s bullet vs 67s pytorch per SB/epoch), assuming we're doing roughly the same thing now.

rocky vigil
#

Interesting

twilit oriole
#

That is probably because the default batch size is too low for threat inputs + high end GPU

violet badger
#

see #1439214470529421384 message

#

but yeah, I'll run an experiment now on nnue-pytorch to see what a changed batch size does to training.

formal smelt
#

Is this on old inputs?

violet badger
#

good question, the bullet is on old inputs, the pytorch number I quoted is probably on threats, though it wasn't too different.

formal smelt
#

But different HL size also then right

violet badger
#

yeah, old arch on pytorch is 72s instead of 67s

#

and I think this current bullet training is set up to match the old arch..

formal smelt
#

Yeah

#

Almost 50% seems pretty good
And also if both had the better factoriser code the gap would widen I think
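For reference, the superbatch sizes and timings in this exchange can be cross-checked with a bit of arithmetic. The helper names are made up. The inputs are the numbers from the thread:

- 16384 positions per batch, with 1024 or 6104 batches per superbatch.
- 1941046 pos/sec from the 120-SB log.
- About 50s per superbatch for bullet, against 67s (threats) and 72s (old arch) for nnue-pytorch.

The like-for-like comparison is 72s vs 50s, which comes out at about 44%.

```rust
/// Positions in one superbatch: positions per batch times batches per superbatch.
fn positions_per_superbatch(batch_size: u64, batches: u64) -> u64 {
    batch_size * batches
}

/// Seconds per superbatch at a given throughput in positions per second.
fn secs_per_superbatch(positions: u64, pos_per_sec: f64) -> f64 {
    positions as f64 / pos_per_sec
}

/// How much faster (in %) run A is than run B, given seconds per superbatch.
fn speedup_percent(secs_a: f64, secs_b: f64) -> f64 {
    (secs_b / secs_a - 1.0) * 100.0
}

fn main() {
    let short = positions_per_superbatch(16384, 1024);
    let long = positions_per_superbatch(16384, 6104);
    println!("1024 batches/SB: {short} positions, 6104 batches/SB: {long} positions");

    // 1941046 pos/sec from the 120-SB log (1024 batches per SB)
    println!("{:.1}s per SB", secs_per_superbatch(short, 1_941_046.0));

    // bullet ~50s/SB vs nnue-pytorch 67s (threats arch) and 72s (old arch)
    println!("vs 67s: {:.0}% faster", speedup_percent(50.0, 67.0));
    println!("vs 72s: {:.0}% faster", speedup_percent(50.0, 72.0));
}
```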

violet badger
#

absolutely ....

#

(I know how much effort one might put into just 5% for e.g. Megatron/LLM training).

formal smelt
#

I still think if someone was feeling cute they should just write a fused ft/l1 kernel for nnue-pytorch given the arch seems pretty much fixed

#

Would make a really big difference

violet badger
#

I hope someone will pick it up... @frosty imp was refactoring recently... so maybe

rocky vigil
#

Did we verify that the produced networks are reasonably strong

violet badger
#

I should have this 120SB trained network in a bit, that should give an idea.

#

I guess that could be within say 100-200 Elo of master?

rocky vigil
#

Yes

#

Should be around there

violet badger
#

ikr..

violet badger
#

so threat net with 64k batch trains significantly faster (about 35s per SB/epoch)

rocky vigil
#

with nnue-pytorch?

violet badger
#

yeah

rocky vigil
#

ah nice

#

let's see how it affects elo

violet badger
#

well, still need to check that this is Elo-neutral, right.

#

it also, for whatever reason, increases memory usage on the CPU side.

#

probably because each of the workers has a buffer proportional to the batch size.

#

OK, longer train ended.

rocky vigil
#

ah nice

violet badger
#

Total Training Time: 1h 34m 23s
Eval: 86.946cp
Eval: 79.768cp

rocky vigil
#

I can test

violet badger
#

what data do you need

rocky vigil
#

can you just load it into an old sf

#

and check the evals

#

in those two positions

#

(after running disservin converter script)

violet badger
#

from 'quantised.bin' ?

rocky vigil
#

yea

violet badger
#
$ python convert_quantised_to_pytorch.py checkpoints/test-120/quantised.bin test.nnue
Read checkpoints/test-120/quantised.bin successfully.
Organized data into 8 buckets.
Writing to test.nnue...
Ending position for bucket 0: 70487760
Bucket 0 size: 1152 bytes
Ending position for bucket 1: 70538168
Bucket 1 size: 1152 bytes
Ending position for bucket 2: 70588576
Bucket 2 size: 1152 bytes
Ending position for bucket 3: 70638984
Bucket 3 size: 1152 bytes
Ending position for bucket 4: 70689392
Bucket 4 size: 1152 bytes
Ending position for bucket 5: 70739800
Bucket 5 size: 1152 bytes
Ending position for bucket 6: 70790208
Bucket 6 size: 1152 bytes
Ending position for bucket 7: 70840616
Bucket 7 size: 1152 bytes
Integer value at position 69389475: 33686908
Conversion complete: checkpoints/test-120/quantised.bin -> test.nnue
#

now, let me build an SF in that container.

#
info depth 30 seldepth 45 multipv 1 score cp 20 nodes 16143880 nps 712691 hashfull 1000 tbhits 0 time 22652 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 d2d4 g8f6 g1f3 b8c6 d4c5 d5c5 b1a3 e7e5 a3b5 c5e7 d1a4 e7d8 f3e5 f8c5 e5c6 b7c6 b5d4 e8g8 f1e2 f8e8 c1e3 f6g4
#
NNUE evaluation        +0.23 (white side)
Final evaluation       +0.31 (white side) [with scaled NNUE, ...]

and

NNUE evaluation        +0.17 (white side)
Final evaluation       +0.22 (white side) [with scaled NNUE, ...]
#

main net is pretty similar

NNUE evaluation        +0.05 (white side)
Final evaluation       +0.07 (white side) [with scaled NNUE, ...]

and

NNUE evaluation        +0.24 (white side)
Final evaluation       +0.31 (white side) [with scaled NNUE, ...]
rocky vigil
#

pv and normalized evals look decent

#

normalization constant being around 3.5 or so

violet badger
#

let me see if I can start a short match.

violet badger
#

looks pretty good..

#
--------------------------------------------------
Results of master vs test (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 149.91 +/- 16.47, nElo: 338.50 +/- 32.92
LOS: 100.00 %, DrawRatio: 25.70 %, PairsRatio: 78.50
Games: 428, Wins: 205, Losses: 31, Draws: 192, Points: 301.0 (70.33 %)
Ptnml(0-2): [0, 2, 55, 138, 19], WL/DD Ratio: 1.12
LLR: 1.10 (37.5%) (-2.94, 2.94) [0.00, 2.00]
--------------------------------------------------
#

I think that works.

#

nice, another good result from this thread 🙂

rocky vigil
#

ok so looks like we got basic arch working

#

finally

#

which unlocks testing more things with bullet

stray reef
#

nice

candid ivy
#

is that a threat input network or normal network test?

stray reef
#

pre-threat input arch

#

but it should now be pretty straightforward to get threat inputs working as well

violet badger
#

to continue training, can I just 'load_from_checkpoint' and increase end_superbatch in schedule?

twilit oriole
#

Assuming no LR schedule?

#

Otherwise you need to change start super batch also

violet badger
#

okay

#

so that would be 121 (i.e. previous end + 1)

twilit oriole
#

Also note it will start from the beginning of the dataset again. So this is not ideal

#

For this reason, I always restart training from the beginning.
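A small sketch of why the start superbatch matters once an LR schedule is involved. `StepLr` is a hypothetical step-decay schedule written for illustration, not bullet's type, and its values are made up. Superbatches are numbered from 1, so resuming a 120-SB run means starting at 121. Continuing at 121 reproduces the LRs of an uninterrupted run, while starting again at 1 would repeat the high-LR phase. It does nothing about the data loader starting from the beginning of the dataset again, which is the other caveat above.

```rust
/// Hypothetical step-decay LR schedule: multiply by `gamma` every `step` superbatches.
/// Superbatches are numbered from 1.
struct StepLr {
    initial: f32,
    gamma: f32,
    step: usize,
}

impl StepLr {
    fn lr(&self, superbatch: usize) -> f32 {
        self.initial * self.gamma.powi(((superbatch - 1) / self.step) as i32)
    }
}

/// The LRs a run uses for superbatches `start..=end`.
fn lrs(s: &StepLr, start: usize, end: usize) -> Vec<f32> {
    (start..=end).map(|sb| s.lr(sb)).collect()
}

fn main() {
    let s = StepLr { initial: 0.001, gamma: 0.3, step: 60 };
    // First run covered 1..=120. Resume correctly at 121, or naively at 1 again.
    let resumed = lrs(&s, 121, 240);
    let restarted = lrs(&s, 1, 120);
    println!("resumed run starts at lr {}", resumed[0]);
    println!("naive restart starts at lr {}", restarted[0]);
}
```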

violet badger
#

ok, yeah, this is still very early experiment.

#

is there a way to provide multiple binpack and have it interleave them on the fly?

stray reef
#

not without a custom dataloader I think

#

though it shouldn't be too hard, one could mix and match existing code, e.g. interleaving exists for viri binpacks in bullet-utils
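A rough sketch of the interleaving idea, independent of bullet-utils (whose code is not shown here). Each next position is drawn from a source chosen with probability proportional to how much that source has left, so all sources run out at about the same time. In-memory vectors stand in for binpack readers. The xorshift RNG is only there to avoid external crates. A real streaming loader would have to estimate remaining counts, for example from file sizes; that part is an assumption.

```rust
/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Interleave several sources into one stream. Each item is drawn from a source
/// picked with probability proportional to the number of items it has left;
/// the order within each source is preserved.
fn interleave<T>(mut sources: Vec<std::vec::IntoIter<T>>, seed: u64) -> Vec<T> {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut out = Vec::new();
    loop {
        let remaining: Vec<usize> = sources.iter().map(|s| s.len()).collect();
        let total: usize = remaining.iter().sum();
        if total == 0 {
            return out;
        }
        let mut pick = (rng.next_u64() % total as u64) as usize;
        for (src, &n) in sources.iter_mut().zip(&remaining) {
            if pick < n {
                out.push(src.next().expect("source has items left"));
                break;
            }
            pick -= n;
        }
    }
}

fn main() {
    let a: Vec<u32> = (0..10).collect();
    let b: Vec<u32> = (100..105).collect();
    let mixed = interleave(vec![a.into_iter(), b.into_iter()], 42);
    println!("{mixed:?}");
}
```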

#

what's more important, that or threat inputs?

violet badger
#

I think threat inputs is more fun 🙂

#

(also more relevant on the longer run)

rocky vigil
#

and then just tack on the other stuff onto it

#

which is basically how I got it in nnue-pytorch as well

violet badger
#

240SB:

--------------------------------------------------
Results of master vs test240 (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 112.26 +/- 9.57, nElo: 235.96 +/- 18.77
LOS: 100.00 %, DrawRatio: 31.91 %, PairsRatio: 13.45
Games: 1316, Wins: 553, Losses: 142, Draws: 621, Points: 863.5 (65.62 %)
Ptnml(0-2): [2, 29, 210, 390, 27], WL/DD Ratio: 1.08
LLR: 2.95 (100.1%) (-2.94, 2.94) [0.00, 2.00]
--------------------------------------------------
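The Elo point estimates in these match summaries follow from the score fraction alone under the usual logistic model: Elo = -400 · log10(1/p - 1), with p = (W + D/2) / N. Below is a small check against the W/L/D counts quoted in the thread. The error bars, nElo and LLR need more than this and are not reproduced.

```rust
/// Logistic Elo difference implied by a match result.
/// p = (wins + draws / 2) / games, Elo = -400 * log10(1 / p - 1).
fn elo(wins: u32, losses: u32, draws: u32) -> f64 {
    let games = (wins + losses + draws) as f64;
    let p = (wins as f64 + draws as f64 / 2.0) / games;
    -400.0 * (1.0 / p - 1.0).log10()
}

fn main() {
    // W/L/D counts from the two result summaries above
    println!("test (120SB):    {:+.2}", elo(205, 31, 192));
    println!("test240 (240SB): {:+.2}", elo(553, 142, 621));
}
```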
#

so, definitely working.

violet badger
#

a network, just experiment with batch size.. don't worry.

stray reef
#

@formal smelt I need some help here, I'm working on integrating threat inputs with the SF PST right now. My idea was to have the input type have the layout
factoriser,halfkav2,threats
so I can modify PST inference like so

```diff
-let stm_pst = pst.matmul(stm).select(buckets)
+let pst_slice_end = ThreatInputsBucketsMirrored::FACTORISER_SIZE + ThreatInputsBucketsMirrored::HALFKA_V2_SIZE;
+let stm_pst = pst.matmul(stm.slice_rows(0, pst_slice_end)).select(buckets)
```

calling slice_rows() like this leads to an error that I'm not sure how to fix: Message("Op(IncorrectDataLayout)")
Any ideas?

#

It looks like this operation may not be allowed on sparse nodes, in which case this will be difficult, or training will be slow
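For a sparse one-hot input, multiplying by the first `pst_slice_end` rows of the PST is the same as summing the PST weights of the active indices below `pst_slice_end`. The slice can therefore also be expressed as a filter on the feature indices. The sketch below is a plain-Rust illustration of that equivalence, not bullet's graph API. `pst_eval` and the toy PST are hypothetical. The thread does not confirm whether bullet can take such a filtered index list as a separate sparse input.

```rust
/// Hypothetical PST lookup for a sparse input given as its active feature indices.
/// Feature layout (from the message above): [factoriser | halfkav2 | threats];
/// the PST only covers the first `pst_slice_end` features.
/// `pst[feature][bucket]` holds the PST weight of a feature for an output bucket.
fn pst_eval(active: &[usize], pst: &[Vec<i32>], pst_slice_end: usize, bucket: usize) -> i32 {
    active
        .iter()
        .filter(|&&f| f < pst_slice_end) // threat features have no PST row
        .map(|&f| pst[f][bucket])
        .sum()
}

fn main() {
    // 4 non-threat features, 2 output buckets; indices >= 4 are threat features.
    let pst = vec![vec![1, 10], vec![2, 20], vec![3, 30], vec![4, 40]];
    let active = [0, 2, 5, 9];
    // only features 0 and 2 contribute: 10 + 30 = 40
    println!("bucket 1 PST: {}", pst_eval(&active, &pst, 4, 1));
}
```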

rocky vigil
#

at least if only applied to one stage

#

could be different if done for all stages

violet badger
#

yeah, looks very good to my eyes (strength is equivalent/better). Will now start a full training to verify, which is a bit more tricky. It also means that making sure DDP in pytorch works would become very useful: it would mean a 5-stage net trained in a day.

dark stream
#

This entire effort was a godsend for the SF net training pipeline.

frosty imp
#

Bullet is using dp right?

violet badger
#

data parallelism, I assume

#

in that case yes

#

see also #1439214470529421384 message

lofty cedar
#

Is it time for a VVLTC search tune for threat input?

violet badger
#

restarted.

rocky vigil
#

Wasn’t able to fetch the data? That’s a new one

violet badger
#

yeah, could happen, some filesystem hiccup, I've seen it before.

#

not related to the patch.

stray reef
#

it's a bit hard to figure out due to the huge squash, who do i need to credit for coming up with the way SF currently creates threat indices? (Using the PiecePairData and other lookup tables)
I'm cleaning up some code right now, but an early & dirty version seemed to be a small speedup at VSTC
https://furybench.com/test/3709/

#

I'm thinking probably a mix of @rocky vigil and @prime mica, but potentially cj as well, I'm not sure

rocky vigil
#

oh interesting

#

you should credit me, anematode, cj i think

#

me when my legacy [piece][64] and [piece][65] code is making it into plentychess as well 💀

stray reef
#

i'll get rid of those :P

rocky vigil
#

uh

#

maybe shawn also

#

idk

prime mica
#

if it passes…

stray reef
#

yeah but easier to ask here than to dig through 73 commits

rocky vigil
#

i guess it really depends on how far you want the contribution range to reach

#

if it's just "everyone who has touched threat indexing at some point" then that's like 6 ppl

#

ofc anematode is the one who has put the most effort into optimizing the lookup table design

stray reef
#

i think it's mainly about the three lookup tables, but idm crediting the whole team kekw

rocky vigil
#

so that would be me, cj, shawn, aliceroselia, and rn5 as well

prime mica
#

Nah I don’t need to be credited

#

It was sscg’s long-held idea, I just implemented it

rocky vigil
#

(that's worth a lot)

#

(also tbh I only had this idea bc yoshie had lookup table from the beginning kekw)

stray reef
#

I guess I'll just mention the entire team in the PR

prime mica
#

In the end, it all boils down to Yoshie, who also invented the transistor and integrated circuit

stray reef
#

I would like to thank the floor without which I would not be standing here today

rocky vigil
#

yep, also don't forget to thank your parents

rocky vigil
rocky vigil
stray reef
#

A cool thing is, this is not only a slight speedup, but also removes 100 lines of code, and no longer needs 2MB for the lookup table kekw

rocky vigil
#

i thought it added 100 lines of code xD

#

but yeah it indeed removes 2MB of lookup table

#

ah I didn't see the other change

rocky vigil
#

but short answer is that it's not 2.6

#

but rather 2.442

#

or so

#

there are also some other knobs nnue-pytorch has

#

but i guess we should first get threat inputs to work

stray reef
#

Since I never tested it properly, and king buckets did not gain in monty, I wanted to get some numbers on how much they are worth.

No king buckets vs. 12 king buckets, fixed nodes

Elo   | -25.58 +- 5.36 (95%)
Conf  | N=20000 Threads=1 Hash=16MB
Games | N: 6462 W: 1661 L: 2136 D: 2665
Penta | [164, 953, 1410, 602, 102]

https://furybench.com/test/3740/

STC (Could be more optimised)

Elo   | -14.79 +- 5.76 (95%)
SPRT  | 8.0+0.08s Threads=1 Hash=16MB
LLR   | -2.26 (-2.25, 2.89) [0.00, 2.50]
Games | N: 3596 W: 838 L: 991 D: 1767
Penta | [18, 482, 939, 353, 6]

https://furybench.com/test/3739/

I guess not a huge surprise, king buckets are still worth it

green moat
#

@violet badger
net nn-a46c62f97ff9.nnue created

lofty cedar
#

Why is more correction history and more nodes searched suddenly good now?

#

Isn't a better net supposed to mean more aggressive pruning?

#

And less need for correction.

lapis parrot
#

we do more futility pruning though

#

in both moves loop and in general

lofty cedar
#

Yes... but the bench is still going up for some reason.

lapis parrot
#

more extensions

#

also jump of +/- 20% in bench is meaningless

lofty cedar
#

Oh... extensions are a good reason.

#

But yeah... 20% bench jump is quite meaningless. It's not just this though. With the old net, we sometimes went sub 2M.

lapis parrot
#

and before VVLTC tweaks we went below 1 million

#

because of IIR

#

so?

lofty cedar
#

Thought the trend was down and down and down.

#

Like... the graph suggested that the branching factor went down over time since like Stockfish 8.

lapis parrot
#

branching factor is measured at much bigger depth than bench

#

bench actually jumps all over the place

#

I recall some PT having a bench of almost 8M vs 2.5M at some point

#

since I personally did a lot of work to make IIR less aggressive to improve scaling

#

this work resulted in bench increasing 3x more or less

#

so it's a normal thing - especially for VVLTC oriented patches
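A 3x bench jump need not contradict a falling branching factor. The effective branching factor b satisfies b^d = N for N nodes at depth d. Multiplying the node count by k at a fixed depth therefore only scales b by k^(1/d). The depth of 13 below is an assumption chosen for illustration. At that depth, tripling the node count raises b by about 9%.

```rust
/// Effective branching factor: the b with b^depth = nodes.
fn ebf(nodes: f64, depth: u32) -> f64 {
    nodes.powf(1.0 / depth as f64)
}

fn main() {
    let depth = 13; // hypothetical bench depth, for illustration
    let before = ebf(2.5e6, depth);
    let after = ebf(7.5e6, depth); // 3x the nodes at the same depth
    println!("EBF {before:.3} -> {after:.3} (x{:.3})", after / before);
}
```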

naive comet
#

we should permanently move out of this thread shouldn't we

primal wraith
#

does the threat inputs net handle pinned pieces

primal wraith