#UE Threat Inputs for AB
12630 messages · Page 13 of 13 (latest)
Do we spec LTC threat42 too?
Like... if threat 37 was an anti-scaler, there's no reason to believe the current net can't be an anti-scaler compared to threat42 or something else.
But yeah... Spec LTC-ing everything is expensive.
Realistically the only way we’ll be able to debug this is by comparing all of bullet's layers to SF's layers
Idk how feasible this is to do in bullet
@formal smelt ?
what kind of transformations does SF do on startup?
Not ones that change the overall eval
You can read the x32 code, that doesn’t have any transformations
yeah ofc i'm talking about transposing, packus stuff, etc
It should be quite easy
Return the relevant node and you can get it after calling forward
If it isn’t optimised out
IDK... at this point the thread and #nnue-dev are now used interchangeably.
Well as always "what if we did <extremely vague thing>" isn't a great suggestion
Btw @stray reef can you update branch on GitHub
So that I can look and see if I can statically find additional issues
done
```
L3 pre-activation: [757 -128007 889 1153 131982 66180 -63742 196852 65014 66823 132104 393984 135271 -63887 198538 -62482 131595 198409 65519 4239 132477 -62995 67830 66179 -61969 67456 66664 -260342 -63263 -63886 2152 67434 ]
L3 post-activation: [11 0 13 18 127 127 0 127 127 127 127 127 127 0 127 0 127 127 127 66 127 0 127 127 0 127 127 0 0 0 33 127 ]
```
(startpos)
@stray reef I am assuming this is not how it's supposed to go?
nope
can you get the L2, L3 from bullet
(pre-activation)
i am concerned about why everything in L2 is 127
or 0
for the sqrrelu
not rn unfortunately
```
L3 pre-activation: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 post-activation: [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
```
for comparison, here's the old master network
yeah that's more like it
```
L2 CReLU(x^2): [127 127 127 127 127 127 127 127 127 0 127 127 127 127 127 ]
L2 CReLU(x): [0 127 0 127 127 127 127 127 0 8 0 0 0 127 127 ]
L3: [757 -128007 889 1153 131982 66180 -63742 196852 65014 66823 132104 393984 135271 -63887 198538 -62482 131595 198409 65519 4239 132477 -62995 67830 66179 -61969 67456 66664 -260342 -63263 -63886 2152 67434 ]
L3 CReLU(x): [11 0 13 18 127 127 0 127 127 127 127 127 127 0 127 0 127 127 127 66 127 0 127 127 0 127 127 0 0 0 33 127 ]
```
ah yes
of course
L2 was always supposed to be this massive
sigh sigh sigh
```
L2 CReLU(x^2): [2 0 2 24 37 4 3 1 0 45 5 39 5 0 18 ]
L2 CReLU(x): [0 10 19 0 69 0 0 11 0 75 0 70 0 0 0 ]
L3: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 CReLU(x): [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
```
(1c0000000000.nnue)
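The two L2 rows are two activations of the same pre-activation vector. The sketch below shows that step in Python. It assumes Stockfish's integer conventions: hidden weights are scaled by 2^6, so CReLU(x) is `clamp(x >> 6, 0, 127)` and CReLU(x^2) is `min(127, (x * x) >> 19)`. The constants are an assumption, but they reproduce the 1c0000000000.nnue values above from that net's first five L2 pre-activations (-1128, 647, 1222, -3620, 4443, printed further down the thread). They also explain why six-digit pre-activations pin almost everything at 0 or 127.

```python
# Assumed Stockfish conventions: hidden-layer weights carry a scale of 2^6,
# and activations are clamped to [0, 127].
WEIGHT_SCALE_BITS = 6


def clipped_relu(x: int) -> int:
    """CReLU(x): drop the weight scale, then clamp to [0, 127]."""
    return max(0, min(127, x >> WEIGHT_SCALE_BITS))


def sqr_clipped_relu(x: int) -> int:
    """CReLU(x^2): square, shift right by 2 * 6 + 7 = 19 bits, cap at 127."""
    return min(127, (x * x) >> (2 * WEIGHT_SCALE_BITS + 7))


# First five L2 pre-activations of 1c0000000000.nnue at startpos
old_master = [-1128, 647, 1222, -3620, 4443]
print([sqr_clipped_relu(v) for v in old_master])  # [2, 0, 2, 24, 37]
print([clipped_relu(v) for v in old_master])      # [0, 10, 19, 0, 69]

# Six-digit pre-activations saturate; the squared branch fires for either sign
print(clipped_relu(131077), sqr_clipped_relu(131077))  # 127 127
print(clipped_relu(-65525), sqr_clipped_relu(-65525))  # 0 127
```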
I think the L2 biases are off
@stray reef forgot since we are using old master arch the 255s here should be 127
for everything?
i see, i'll send you a new net when i'm back
ok
What's the last test result of full threat small net?
Is there data of speed difference?
considering the main purpose of smallnet is speed
and not necessarily evaluation accuracy
I think now is the time to try a training run that disables threats and uses just psq for anything >400cp. I don't think threats have much value there; it's mostly just a speed loss
wait for shawn to do stuff, we haven't even gotten threats merged in nnue-pytorch yet
it's a simple change i think. can be done on top of the latest vondele branch
i know how this would be defined though
in the part that assigns active features, just compute simple eval first
that would require different weights for the following layer, right?
no need to modify the inference on the training side. just pretend threats don't exist above the threshold
oh btw are you able to make this change
i did not notice any other errors
so if stuff still goes wrong then i really will need to manually compare the hidden layers
maybe half an hour?
ok
how would you know this on the engine side tho?
the same way?
in the trainer you have the datapoint evaluation
the threshold is computed based on simple eval
oh ok
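Here is a minimal sketch of the proposal. It assumes the trainer has a per-position hook where active features are collected, and that a material-only simple eval is available there. The function and parameter names are hypothetical, and the strict `>` is a guess at how the 400cp cutoff would be applied.

```python
from typing import List

THREAT_CUTOFF_CP = 400  # cutoff suggested in the thread


def active_features(psq: List[int], threats: List[int], simple_eval_cp: int,
                    cutoff: int = THREAT_CUTOFF_CP) -> List[int]:
    """Hypothetical trainer hook: pick the features to activate for one position.

    Beyond the cutoff (by absolute simple eval) the threat features are
    skipped and the net only sees the PSQ features. The engine would apply
    the same test to its own simple eval, so both sides agree on the inputs.
    """
    if abs(simple_eval_cp) > cutoff:
        return list(psq)
    return list(psq) + list(threats)


print(active_features([3, 17], [900, 901], simple_eval_cp=120))   # [3, 17, 900, 901]
print(active_features([3, 17], [900, 901], simple_eval_cp=-650))  # [3, 17]
```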
```
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]
NNUE evaluation -25.60 (white side)
Final evaluation -16.01 (white side) [with scaled NNUE, ...]
```
sigh
how are we getting l2 values that are 6 digits
ahhhhhhh
the skip connection from L1 to output is -131056
which is responsible for the negative eval
i wonder what happens if you remove it. a last resort thing to attempt lol
there's probably some major issue still, like wrong weight layout in ft or l1
??????
```
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]
[+0, -17]
NNUE evaluation -0.04 (white side)
```
finally something that doesn't look total trash
probably
well i expect that, since it would presumably have to be removed in the trainer too. but the core question is whether it's some kind of instability or a mapping issue
what happens when you inspect the weights itself
first of all I suspect the quantization is wrong
I think the l1 -> l2 values are way too large
what's the weight clipping in nnue-pytorch?
yeah it's +- 127 / 64
for l2 and l3
and nothing for ft/l1?
it's quantized to 127
no weight clipping
afaik
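The ±127/64 clip keeps l2/l3 weights representable. Multiplied by the hidden weight scale of 64, a clipped weight always lands in [-127, 127]. Below is a small sketch, assuming plain round-to-nearest quantisation; nnue-pytorch's actual code may differ in detail. Layers without a clip have no such guarantee, so they rely on the trained weights happening to stay in range.

```python
HIDDEN_WEIGHT_SCALE = 64
MAX_WEIGHT = 127 / HIDDEN_WEIGHT_SCALE  # 1.984375, the +-127/64 clip


def clip_weight(w: float) -> float:
    """Applied to the float l2/l3 weights during training."""
    return max(-MAX_WEIGHT, min(MAX_WEIGHT, w))


def quantise_weight(w: float) -> int:
    """Round onto the integer grid used at inference."""
    return round(w * HIDDEN_WEIGHT_SCALE)


weights = [-5.0, -1.0, 0.5, 1.984375, 3.0]
print([quantise_weight(clip_weight(w)) for w in weights])  # [-127, -64, 32, 127, 127]
print(quantise_weight(3.0))  # 192: without the clip this would not fit in int8
```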
I suspect smth is wrong with the psqt
and the skip connection
can you check the pairwise output?
when I am ignoring those two the evals are actually reasonable
for eg
startpos
fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
sure but the l2/l3 data above is also not normal
i guess those two being wrong just have a much bigger influence on the output
how do I do this w/o printing 3072 things
you can just print the first X i guess
just so we can check sparsity and if everything's clamped there too
I can print first 16 cool
```
L2: [53 131077 9 42 65588 82 66 43 196643 131100 131119 0 65542 -65525 -65526 (-131056)]
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]
[normal, skip] = [-279 -154790]
```
(startpos)
but looks fine at least
I can actually go to 128
why not
L1 (first 128): [0 0 0 0 0 0 5 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
omega sparsity
lemme try master
now
probably l1 has a problem then
```
L2: [-1128 647 1222 -3620 4443 -1457 -1256 725 -705 4858 -1640 4542 -1743 -604 -3117 (-42)]
L2 CReLU(x^2): [2 0 2 24 37 4 3 1 0 45 5 39 5 0 18 ]
L2 CReLU(x): [0 10 19 0 69 0 0 11 0 75 0 70 0 0 0 ]
L3: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 CReLU(x): [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
[normal, skip] = [371 -49]
[psqt, positional] = [+0, +20]
NNUE evaluation +0.05 (white side)
Final evaluation +0.07 (white side) [with scaled NNUE, ...]
```
here's 1c0...
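The `[normal, skip]` and `[psqt, positional]` lines can be reproduced from the L2 dump with integer arithmetic. This assumes Stockfish's `OutputScale = 16` and `WeightScaleBits = 6`, together with the 600 eval scale. The arithmetic, with C-style truncation throughout:

1. Multiply the extra 16th L2 value (the one in parentheses) by 600 * 16 / (127 * 64).
2. Add the result to the main path.
3. Divide the sum by 16.

The sketch checks this against the 1c0... numbers. It also makes the broken net concrete: a skip input of -131056 becomes -154790, dwarfing a main-path value of -279.

```python
EVAL_SCALE = 600
OUTPUT_SCALE = 16      # assumed: Stockfish's OutputScale
WEIGHT_SCALE_BITS = 6  # assumed: hidden weights scaled by 2^6


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, like C++ `/` on ints."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def skip_term(l2_extra: int) -> int:
    """Rescale the extra L2 neuron (the value in parentheses) to output units."""
    return trunc_div(l2_extra * EVAL_SCALE * OUTPUT_SCALE,
                     127 * (1 << WEIGHT_SCALE_BITS))


def positional(normal: int, skip: int) -> int:
    """Add the main path and the skip path, then divide out OutputScale."""
    return trunc_div(normal + skip, OUTPUT_SCALE)


skip = skip_term(-42)               # 1c0... at startpos
print(skip, positional(371, skip))  # -49 20
print(skip_term(-131056))           # -154790, the broken net's skip term
```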
i feel like the L2 biases
are screwed
somehow
or smth about the l2 scale
hm. the scale for the bias and regular weights are handled differently right
can you update github with the new bullet config?
done
.round().quantise(), yeah
so on paper, everything looks fine, except for https://github.com/Yoshie2000/sf-bullet-train/blob/fix-inputs/src/main.rs#L117 which is actually supposed to be 600 * 16 but the psqt are already blown up enough as is
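Weights and biases do get different scales. The sketch uses the scales we believe nnue-pytorch's serializer uses; treat them as assumptions:

- hidden weights: x64
- hidden biases: x64*127
- output weights: x600*16/127
- output bias: x600*16

A hidden bias carries the extra 127 because the layer's inputs are activations where 127 stands for 1.0. Each weight-times-input product is therefore already in units of 64 * 127, and the bias must be in the same units before it is added.

```python
# Scales as we understand nnue-pytorch's serializer (assumptions, not verified here).
HIDDEN_WEIGHT_SCALE = 64
HIDDEN_BIAS_SCALE = 64 * 127         # weight scale x activation scale
OUTPUT_BIAS_SCALE = 600 * 16         # eval scale x OutputScale
OUTPUT_WEIGHT_SCALE = 600 * 16 / 127


def quantise_layer(weights, biases, is_output=False):
    """Quantise one dense layer; weights and biases use different scales."""
    w_scale = OUTPUT_WEIGHT_SCALE if is_output else HIDDEN_WEIGHT_SCALE
    b_scale = OUTPUT_BIAS_SCALE if is_output else HIDDEN_BIAS_SCALE
    return ([round(w * w_scale) for w in weights],
            [round(b * b_scale) for b in biases])


print(quantise_layer([0.5, -1.25], [0.1]))            # ([32, -80], [813])
print(quantise_layer([1.0], [0.01], is_output=True))  # ([76], [96])
```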
what if you zero everything except the psqt. surely that works right
psqt alone would still give like +30 on this position
yeah cos it is trained with the rest of the net right?
yeah but i mean if you literally only train the psqt and inference it
ok we can try that
sure
also this but
i genuinely dunno
at this point
bullet does not like this
thread 'main' (546672) panicked at /home/patrick/.cargo/git/checkouts/bullet-8a69ed9a26c6f599/e37db79/crates/acyclib/src/graph/builder.rs:132:30:
called `Result::unwrap()` on an `Err` value: ## Error Occurred ##
Message("MultipleRoots")
oh i think it needs to be mut
if you return pst_out
maybe?
idk
actually this is strange
you can also try just doing the entire inference
and only returning pst_out
ofc i tried both
gonna try multiplying the other two with 0 now so everything is "used" at least
out = out.linear_comb(0.0, pst_out, 0.5) + skip_neuron.linear_comb(0.0, pst_out, 0.5);
this works :P
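The workaround yields the right output if bullet's `a.linear_comb(alpha, b, beta)` computes `alpha * a + beta * b` (our assumption about its semantics). Both terms keep `out` and `skip_neuron` connected to the single root, but with coefficient 0. The result is therefore exactly `pst_out`, and no gradient reaches the other two paths. A numeric sketch:

```python
def linear_comb(a: float, alpha: float, b: float, beta: float) -> float:
    """alpha * a + beta * b: our reading of `a.linear_comb(alpha, b, beta)`."""
    return alpha * a + beta * b


def workaround_output(out: float, skip: float, pst: float) -> float:
    """Mirror of the bullet line in the thread: all nodes feed the root, only pst survives."""
    return linear_comb(out, 0.0, pst, 0.5) + linear_comb(skip, 0.0, pst, 0.5)


print(workaround_output(out=123.0, skip=-456.0, pst=7.5))  # 7.5
```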
surely the pst should work with this change
and eval_scale 600
bc if the float weights are x
on sf side you just have (600 * 16 * sum x) / 16
and you get 600 * sum x
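The same arithmetic as a sketch, assuming round-to-nearest quantisation and truncating division on the engine side. Weights quantised by 600 * 16 are summed and divided by 16, which gives back 600 * sum(x).

```python
EVAL_SCALE = 600
OUTPUT_SCALE = 16
PSQT_SCALE = EVAL_SCALE * OUTPUT_SCALE  # 9600


def quantise_psqt(weights):
    """Trainer side: scale the float PSQT weights and round."""
    return [round(x * PSQT_SCALE) for x in weights]


def psqt_eval(quantised_active):
    """Engine side: sum the active entries, divide by OutputScale (toward zero)."""
    return int(sum(quantised_active) / OUTPUT_SCALE)


x = [0.01, 0.02, -0.005]            # float PSQT weights of the active features
print(psqt_eval(quantise_psqt(x)))  # 15, i.e. 600 * sum(x)
```

A mismatch between the trainer's eval scale and the one the engine assumes (say 400 vs 600) shows up as a constant factor of 1.5 between the two sides.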
applied all suggestions, startpos eval is 0 (as it should be!)
https://1drv.ms/u/c/74d39b59afff2586/IQCRA1uH1iBETr4miI4uK6YAAUlxKnOth1Zf5lMBdRA-FeY?e=8AJzWS
can you also get r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
startpos pst has always been 0 :P
12 internal units
kk
```
eval
info string NNUE evaluation using nn-4e6276be8161.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2: [1 -1 0 -65535 -65536 65536 65535 65536 1 65536 0 -1 -65536 0 -65536 (65535)]
L2 CReLU(x^2): [0 0 0 127 127 127 127 127 0 127 0 0 127 0 127 ]
L2 CReLU(x): [0 0 0 0 0 127 127 127 0 127 0 0 0 0 0 ]
L3: [65663 -65536 -65409 254 65282 -889 -253 -889 -64900 -66171 -131325 -65536 65028 127 65409 65408 -127 131325 65409 -508 -127 65154 65662 -509 -65663 -65790 -65663 65409 65408 64900 -635 255 ]
L3 CReLU(x): [127 0 0 3 127 0 0 0 0 0 0 0 127 1 127 127 0 127 127 0 0 127 127 0 0 0 0 127 127 127 0 3 ]
[normal, skip] = [65277 77403]
[psqt, positional] = [-4095, +8917]
NNUE evaluation +12.75 (white side)
```


no seriously
what the
the feature indices are correct right?
so purely parsing error or smth
probably a parsing/layout error now yeah
well there are strict checks on the layouts
in particular the l0b, l0w and pst must be in the correct order
so something is wrong in the pst section itself
is it possible for you to get the 8 bucket weights of specific indices or smth
grasping at straws here
my best guess would be to write a small script that tests PST inference for the bullet checkpoint
but i don't see how it could be wrong there
yeah but every section individually has the correct length and format, so parsing/layout errors are contained within each section
wait pst is output bucketed right, how does this work in SF inference, it's also UE'd right? so all buckets are technically always computed, even if not needed
yes
ok i thought for a second we forgot to transpose the weights
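Every feature contributes one value per output bucket. The engine accumulates all 8 sums, and only the bucket for the current piece count is read out. The sketch assumes Stockfish's bucket formula `(pieces - 1) / 4` and a [feature][bucket] weight layout; the function names are hypothetical.

```python
from typing import Dict, List

NUM_BUCKETS = 8


def bucket_for(piece_count: int) -> int:
    """Stockfish-style output bucket: 1..32 pieces -> 0..7."""
    return (piece_count - 1) // 4


def psqt_accumulate(psqt: Dict[int, List[int]], active: List[int]) -> List[int]:
    """Sum the 8-wide PSQT row of every active feature; all buckets get computed."""
    acc = [0] * NUM_BUCKETS
    for feature in active:
        for b, w in enumerate(psqt[feature]):
            acc[b] += w
    return acc


def psqt_for_position(psqt, active, piece_count):
    """Only one bucket's sum is used, chosen by the number of pieces on the board."""
    return psqt_accumulate(psqt, active)[bucket_for(piece_count)]


psqt = {0: [1, 2, 3, 4, 5, 6, 7, 8], 5: [10, 20, 30, 40, 50, 60, 70, 80]}
print(psqt_for_position(psqt, [0, 5], piece_count=32))  # 88 (bucket 7)
print(psqt_for_position(psqt, [0, 5], piece_count=3))   # 11 (bucket 0)
```

Suppose the exporter wrote the PSQT tensor as [bucket][feature] instead. Then each 8-wide row read back would mix entries from 8 different features, which is the transposition concern raised in the thread.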
all zeros
white queen on d1 is worthless when white king is on g1 !!!!
hm no smth is wrong with my code
[65537 1 65536 65535 65535 65536 65536 1 ]
this does
not seem right
but yeah we'll see
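Many of the broken values sit close to a multiple of 2^16 = 65536 (65537, 65535, and 131962 = 2 * 65536 + 890). The helper below is a throwaway aid for eyeballing dumps like these. It is hypothetical and not part of either codebase, and it only flags the pattern without identifying a cause.

```python
from typing import Tuple


def split_2_16(v: int) -> Tuple[int, int]:
    """Return (k, r) with v == k * 65536 + r and r as small as possible."""
    k = round(v / 65536)
    return k, v - k * 65536


for v in [65537, 1, 65536, 65535, 131962, -63742]:
    print(v, split_2_16(v))
```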
i get the same numbers...
welp
leb128 looking fine
as I suspected
so I have no idea why it's different
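Stockfish stores its feature-transformer tensors compressed as signed LEB128 (the section is tagged `COMPRESSED_LEB128`), so checking a few values at the byte level is cheap. Below is a minimal encoder/decoder pair for cross-checking by hand. It is a generic implementation of the encoding, not Stockfish's own code.

```python
from typing import Tuple


def sleb128_encode(value: int) -> bytes:
    """Signed LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift keeps the sign
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def sleb128_decode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the last group: extend negative
                result -= 1 << shift
            return result, pos


data = sleb128_encode(-200) + sleb128_encode(300)
v1, pos = sleb128_decode(data)
v2, _ = sleb128_decode(data, pos)
print(v1, v2)  # -200 300
```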
maybe lemme get some position with only few pieces
i think since the input type is factorised, the pst weights have a factoriser too
even though afaik Factorised<> automatically merges that, i'm gonna try it non-factorised rq
oh
ok
yeah and maybe try 8/6k1/8/8/3P4/8/1K6/8 w - - 0 1
there's only 3 pieces
what could go wrong :clueless:
When the Pawn
ah the checkpoint is about 4.5MB smaller now, which matches perfectly what would happen if the factorised weights were previously included
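A factoriser trains one extra weight set shared by all king buckets. At export it should be folded into every bucket and then dropped, so the net file only holds the merged tensor. A checkpoint that shrinks by roughly the factoriser's size is consistent with the factoriser having been written out alongside the bucketed weights before. Below is a sketch with hypothetical names; it is not bullet's `Factorised<>` implementation.

```python
from typing import List


def merge_factoriser(bucketed: List[List[float]], shared: List[float]) -> List[List[float]]:
    """Fold shared factoriser weights into every king bucket.

    bucketed[k][i] is feature i's weight in king bucket k; shared[i] is the
    factoriser weight for the same feature. Only the returned tensor should
    be exported; writing `shared` out as well would leave extra data in the file.
    """
    return [[w + s for w, s in zip(row, shared)] for row in bucketed]


print(merge_factoriser([[1, 2], [3, 4]], shared=[10, 20]))  # [[11, 22], [13, 24]]
```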
152 -87 -266 -201 75 290 397 299
reasonable values!
o
my uni wifi wondering why I've downloaded 8 nnue files of 66 MB today :P
[152 -87 -266 -201 75 290 397 299 ]
let's go???
startpos eval?
oh right
did you get this one
r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 should be about -6
or the pawn endgame
ayyy
```
eval
info string NNUE evaluation using nn-a64da979b54f.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (0)]
L2 CReLU(x^2): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2 CReLU(x): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L3: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L3 CReLU(x): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[normal, skip] = [0 0]
[psqt, positional] = [-9, +0]
```
let's go??
YES
life is good
once something works we can work backwards to add what's missing
btw when you do this can you get out, skip_out and pst_out
bc my debug info also has all 3 of them
i can try, but let's first see how bad things are
i completely removed the factoriser now
https://1drv.ms/u/c/74d39b59afff2586/IQBsQgdxgVMgRbekt661j2IUAbWtg-9jF7zF67SAXay11o8?e=WF2TVU
eval for r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 is 16
random berlin position is actually a decent test position lmao
ok let's see
in a min or two
```
eval
info string NNUE evaluation using nn-2cc242fcab84.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 7 4 0 0 27 4 1 0 0 0 0 19 5 2 1 23 0 18 1 0 5 0 20 23 0 0 0 3 14 28 1 0 5 0 0 0 0 14 0 25 19 9 15 0 0 0 0 40 0 0 0 8 2 0 18 3 9 0 0 13 20 0 0 2 0 21 1 0 21 0 0 0 0 0 11 0 1 0 11 14 0 0 0 0 0 0 34 0 32 22 0 0 0 3 2 9 0 13 14 0 0 1 0 0 0 24 28 2 21 8 0 0 2 0 13 0 6 0 17 12 3 13 6 15 11 0 0 ]
L2: [-5890 4039 8313 4356 3438 5305 10575 6519 3957 1457 -2233 -2333 5295 7469 -4199 (4289)]
L2 CReLU(x^2): [66 31 127 36 22 53 127 81 29 4 9 10 53 106 33 ]
L2 CReLU(x): [0 63 127 68 53 82 127 101 61 22 0 0 82 116 0 ]
L3: [5564 4988 4871 -3625 -1894 -2410 -739 2902 3027 2338 -883 3304 -3766 -2235 -1100 2512 2216 -7469 467 2828 -833 -816 8457 -901 2398 2163 2244 11 -328 3479 4975 2129 ]
L3 CReLU(x): [86 77 76 0 0 0 0 45 47 36 0 51 0 0 0 39 34 0 7 44 0 0 127 0 37 33 35 0 0 54 77 33 ]
[normal, skip] = [-4262 5065]
[psqt, positional] = [-2, +50]
NNUE evaluation +0.13 (white side)
```
maybe maybe
(raw eval is 48)
that looks quite reasonable now
48 is still alright given true eval is 16
do you have startpos eval as well
45
```
eval
info string NNUE evaluation using nn-2cc242fcab84.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
WHITE added: 22208 21953 22082 22339 22468 22085 21958 22215 21832 21833 21834 21835 21836 21837 21838 21839 21936 21937 21938 21939 21940 21941 21942 21943 22328 22073 22202 22459 22524 22205 22078 22335
removed:
BLACK added: 22328 22073 22202 22459 22524 22205 22078 22335 21936 21937 21938 21939 21940 21941 21942 21943 21832 21833 21834 21835 21836 21837 21838 21839 22208 21953 22082 22339 22468 22085 21958 22215
removed:
L1 (first 128): [0 7 0 0 0 30 0 6 2 0 0 2 16 7 0 5 6 0 1 2 0 1 5 49 0 0 4 5 0 19 0 0 4 0 9 9 0 1 5 3 54 9 3 5 0 0 4 0 54 3 0 12 2 1 0 5 6 8 0 0 24 10 1 0 9 33 0 3 0 6 0 0 0 8 0 0 0 0 0 6 9 4 0 0 0 1 0 24 0 18 21 0 1 0 0 0 6 0 3 8 5 0 0 1 4 0 22 30 0 30 37 0 0 0 0 31 0 0 1 7 27 2 22 0 5 0 0 0 ]
L2: [5706 4058 1665 3009 -7774 7520 7305 -8251 -14491 -8684 -5017 6628 7127 -4286 6504 (-850)]
L2 CReLU(x^2): [62 31 5 17 115 107 101 127 127 127 48 83 96 35 80 ]
L2 CReLU(x): [89 63 26 47 0 117 114 0 0 0 0 103 111 0 101 ]
L3: [-1294 6294 6987 5821 4174 -3564 3387 -3172 7346 1499 -3768 -1403 -4705 1062 4005 6366 3391 1700 -6845 3495 4704 -4917 1691 -4078 2004 -1570 1976 3884 473 -1350 -3770 -1540 ]
L3 CReLU(x): [0 98 109 90 65 0 52 0 114 23 0 0 0 16 62 99 52 26 0 54 73 0 26 0 31 0 30 60 7 0 0 0 ]
[normal, skip] = [1419 -1003]
[psqt, positional] = [+0, +26]
NNUE evaluation +0.07 (white side)
```
26
raw
reasonable
sick

maybe I comment out debug info and see if the pv for startpos makes any sense?
lemme try that
O YES IT LOOKS LIKE CHESS
not good chess
but still chess
hell yeah!
i'll sleep now, and then we can try tomorrow or so to re-integrate the other features
o
it beat a 2400 ish CCRL blitz engine
strangest game ever
@sage stream Test be like: NNUE trained on 1 SB and a firm handshake of selfgen games and the WDL is: 🤷♂️. We train HCEs for longer.
Holy pull
Wait where is the original
it's in #absolute-shashin
Btw I think if eg @violet badger wants to try a longer training test https://github.com/Yoshie2000/sf-bullet-train/tree/fix-inputs
also https://github.com/Yoshie2000/sf-bullet-train/blob/fix-inputs/src/main.rs#L221 I think these two should be 600 also
I'm happy to try to run for a little longer. Two quick questions, how to provide multiple binpacks as input, and how to setup multiGPU training.
sounds like for @formal smelt to answer
how should we proceed on this, try a 100 SB real run first to see if it becomes somewhat strong, or integrate everything first with 1SB sanity checks
I can quickly run 100SB right now, and we see where we stand? Would be faster multiGPU, but starting now is probably even faster 😉
yeah
I'll simultaneously try to fix the factoriser, until we see that it still produces reasonable results
Factorised l0w + pst, they should be merged correctly now
https://1drv.ms/u/c/74d39b59afff2586/IQCnZceEZ0E3Q7NguQ0l5oQRAStllnSKwUZkdtauv_n3l2o?e=NROEo7
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 -> 43
r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 -> 57
you're faster than me installing rust ...
(ok, trying to figure out how to do it correctly in the container environment that I'm using, but well, no excuses)
is this with 600.0 change?
or do I need to multiply these by 1.5
the good news for you is that i did not train 100 SBs in these 13 minutes xD
ah it's still 400 on my side
changed it now for future evals
```
eval
info string NNUE evaluation using nn-81c082405712.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
L1 (first 128): [0 35 9 7 0 0 28 0 0 7 0 2 28 0 2 36 0 12 3 31 31 28 1 56 11 0 0 3 0 5 0 40 16 55 14 19 38 0 46 19 14 0 0 28 0 52 0 39 49 0 4 0 0 21 26 30 0 0 9 0 0 0 22 48 0 1 37 0 19 0 1 23 21 0 8 0 8 11 0 24 10 33 0 44 0 2 43 0 19 35 0 17 0 34 62 0 48 6 0 0 17 24 8 0 19 27 0 8 0 7 41 17 0 1 66 22 17 4 3 0 34 7 23 0 45 0 26 0 ]
L2: [-782 2913 5915 1645 -14149 24231 2753 5418 3241 -4086 3977 8455 -8888 4432 -2446 (-2630)]
L2 CReLU(x^2): [1 16 66 5 127 127 14 55 20 31 30 127 127 37 11 ]
L2 CReLU(x): [0 45 92 25 0 127 43 84 50 0 62 127 0 69 0 ]
L3: [3290 3317 4467 -5288 -1449 -531 2395 -2337 -1085 -4165 -4409 -4177 707 3713 2600 2935 5039 2861 3123 2263 -2477 3885 -8094 -4863 3442 -3560 4140 -3969 -574 2406 -1194 -2577 ]
L3 CReLU(x): [51 51 69 0 0 0 37 0 0 0 0 0 11 58 40 45 78 44 48 35 0 60 0 0 53 0 64 0 0 37 0 0 ]
[normal, skip] = [4210 -3106]
[psqt, positional] = [+0, +69]
NNUE evaluation +0.18 (white side)
```
raw startpos is 69
```
position fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
eval
info string NNUE evaluation using nn-81c082405712.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
L1 (first 128): [0 34 7 5 0 0 21 0 0 8 0 6 12 0 0 33 0 5 3 50 14 13 0 32 6 0 0 0 0 0 0 28 2 41 15 23 17 0 38 15 20 0 0 30 0 62 0 32 34 0 0 0 0 20 27 29 1 0 12 3 0 0 23 23 0 0 42 0 19 0 2 16 0 6 13 0 4 8 0 19 5 36 0 25 0 0 23 0 13 26 0 14 0 36 55 0 45 8 0 0 19 33 3 0 10 8 0 9 0 3 46 24 1 0 67 19 22 4 0 0 25 0 18 0 35 0 16 0 ]
L2: [-3180 8434 -8349 -4061 8049 114 -1044 1497 -3472 -5565 459 4286 1974 -3644 2177 (-779)]
L2 CReLU(x^2): [19 127 127 31 123 0 2 4 22 59 0 35 7 25 9 ]
L2 CReLU(x): [0 127 0 0 125 1 0 23 0 0 7 66 30 0 34 ]
L3: [-4048 1024 -3380 1544 3979 -909 3072 452 5896 602 1618 -1936 2329 3109 -1840 -1814 -149 4150 -2820 1299 3614 2346 1136 4138 -419 -2696 2506 1320 -1162 -2180 5403 1449 ]
L3 CReLU(x): [0 16 0 24 62 0 48 7 92 9 25 0 36 48 0 0 0 64 0 20 56 36 17 64 0 0 39 20 0 0 84 22 ]
[normal, skip] = [2422 -920]
[psqt, positional] = [-11, +93]
NNUE evaluation +0.22 (white side)
```
raw is 82
looks good
amazing
if looks good, please push, and I'll start from that.
"remove factoriser" "add factoriser again" lol
done
shall we try the l1 factoriser as well?
# test bullet
git clone https://github.com/Yoshie2000/sf-bullet-train.git
cd sf-bullet-train
git checkout fix-inputs
# edit src/main.rs file_path
cargo run --release .
that's the procedure right?
(like manual edit of main.rs needed)
where are the datasets being loaded
sure that's the comment in the procedure above.
also need to adjust SB count in line 185
okay
hard to say right now, i'm training another net already :P but i remember roughly 800k pos/s from yesterday
(some multiple of 60 would make sense, since that suits the LR schedule, but you can change the step there too of course)
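The schedule in the preamble (start 0.001, gamma 0.3, drop every 60 superbatches) is a step decay. That is why a superbatch count that is a multiple of 60 lines up with it: 120 superbatches run two full stages, 0.001 then 0.0003. The sketch assumes 1-indexed superbatches; bullet's exact boundary convention may differ by one.

```python
def step_lr(superbatch: int, start: float = 0.001, gamma: float = 0.3,
            step: int = 60) -> float:
    """Step decay: multiply by gamma once every `step` superbatches."""
    return start * gamma ** ((superbatch - 1) // step)


for sb in (1, 60, 61, 120):
    print(sb, step_lr(sb))
```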
will make it 120
There is a speedup still when using a factoriser, though I won't implement it now as it does not work with threat inputs (at least I haven't found a way yet)
annoyingly the install is still not correct... somehow it's being installed as root, while the container is started as non-root. So I reinstall when entering the container right now. Should figure that out eventually.
That's one SB
Params: 72156296
Training Preamble
Net Name : test
Batch Size : 16384
Batches / Superbatch : 1024
Positions / Superbatch : 16777216
Start Superbatch : 1
End Superbatch : 1
Eval Scale : 600
Save Rate : 150
WDL Scheduler : constant 0
LR Scheduler : start 0.001 gamma 0.3 drop every 60 superbatches
Threads : 4
Output Path : checkpoints
Beginning Training
superbatch 1 | time 9.6s | running loss 0.013042 | 1738602 pos/sec | total time 11.3s
Estimated time remaining in training: 0h 0m 0s
Saved [test-1]
Total Training Time: 0h 0m 13s
Eval: 44.568cp
Eval: -31.222cp
looks OK?
seems fine yes
#engines-dev message #engines-dev message
some info from jw on multigpu
ok, let me try that.
seems to be slightly faster than nnue-pytorch? translates to 105 its/sec
yeah, though skipping and such is quite different, but certainly looks good.
why has
a superbatch been reduced
btw
to 1024 batches
and not the standard 6104
ah good point, we should change that
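The superbatch arithmetic, using figures from the logs in this thread. 1024 batches of 16384 gives the 16,777,216 positions in the preamble, while the standard 6104 batches gives about 100M. A rough time estimate follows if the 1,941,046 pos/s from the 1024-batch run carries over, which is an assumption.

```python
BATCH_SIZE = 16384


def positions_per_superbatch(batches: int, batch_size: int = BATCH_SIZE) -> int:
    return batches * batch_size


def estimated_hours(superbatches: int, batches: int, pos_per_sec: float,
                    batch_size: int = BATCH_SIZE) -> float:
    total = superbatches * positions_per_superbatch(batches, batch_size)
    return total / pos_per_sec / 3600


print(positions_per_superbatch(1024))  # 16777216, as in the preamble
print(positions_per_superbatch(6104))  # 100007936
print(round(estimated_hours(120, 6104, 1_941_046), 2))  # 1.72
```

The LR step is counted in superbatches. The smaller superbatch therefore also means each 60-superbatch LR stage covers roughly 6x fewer positions (6104 / 1024 ≈ 5.96).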
https://1drv.ms/u/c/74d39b59afff2586/IQAS-Qe0N5FqQpNQDZeaiMN8AXzVLXGiwHLXbQ_H34_XxBo?e=yuTeGP 1 SB with l1 factoriser, evals 66 and 112 (with 600 scale)
getting -91 and -28
```
eval
L1 (first 128): [22 0 0 0 2 0 0 0 0 0 0 6 0 0 0 0 5 0 0 0 2 0 2 1 1 3 1 0 0 0 18 0 4 0 14 0 13 0 0 9 0 5 0 0 0 6 0 3 0 0 2 0 0 2 0 0 2 8 0 0 5 0 0 0 0 0 0 7 7 13 3 8 17 3 0 0 0 0 3 1 7 0 0 0 0 0 11 0 0 3 0 6 0 7 0 20 0 15 0 0 0 0 8 0 4 0 4 9 2 1 0 0 4 0 11 1 0 0 0 0 0 0 1 0 0 0 0 0 ]
L2: [7211 -1613 4893 -1900 -12473 7297 9595 1191 14570 3115 -3398 6660 13664 5785 -8917 (-1121)]
L2 CReLU(x^2): [99 4 45 6 127 101 127 2 127 18 22 84 127 63 127 ]
L2 CReLU(x): [112 0 76 0 0 114 127 18 127 48 0 104 127 90 0 ]
L3: [3586 1415 5605 -6626 -5444 -371 3739 -3533 1561 4141 3812 -4530 -5525 -3268 4107 -4484 -3925 -5060 1606 -5402 -4427 -6424 -5318 7747 -10300 2550 -6464 1889 -7744 -1817 -1362 -1545 ]
L3 CReLU(x): [56 22 87 0 0 0 58 0 24 64 59 0 0 0 64 0 0 0 25 0 0 0 0 121 0 39 0 29 0 0 0 0 ]
[normal, skip] = [-132 -1324]
[psqt, positional] = [+0, -91]
NNUE evaluation -0.24 (white side)
ucinewgame
position fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
eval
L1 (first 128): [12 0 0 0 1 0 0 1 0 0 0 6 0 1 0 0 11 0 10 0 6 0 11 0 0 6 15 0 0 11 0 0 7 0 4 0 16 0 0 15 0 2 0 0 0 8 0 0 0 1 3 0 0 4 0 8 8 0 0 5 3 0 0 0 0 3 1 8 2 10 2 4 12 4 0 13 0 0 3 0 5 0 0 0 0 0 0 0 0 0 0 14 0 4 0 2 0 0 0 0 0 0 3 0 0 3 1 7 0 1 2 3 9 0 6 2 0 0 0 0 0 0 5 0 0 0 2 0 ]
L2: [17849 -3002 -2770 3869 6234 -11711 -21902 15147 6015 -617 -3107 15343 2215 -3799 -9219 (226)]
L2 CReLU(x^2): [127 17 14 28 74 127 127 127 69 0 18 127 9 27 127 ]
L2 CReLU(x): [127 0 0 60 97 0 0 127 93 0 0 127 34 0 0 ]
L3: [-2937 1608 -1563 857 3589 3871 -6501 1365 -4169 2560 -5220 2119 -8509 -5111 2558 4084 -2520 -6030 2079 -7494 1958 617 2366 -633 -4361 3530 -171 -1299 3694 -4122 2150 -2276 ]
L3 CReLU(x): [0 25 0 13 56 60 0 21 0 40 0 33 0 0 39 63 0 0 32 0 30 9 36 0 0 55 0 0 57 0 33 0 ]
[normal, skip] = [-722 266]
[psqt, positional] = [+0, -28]
NNUE evaluation -0.07 (white side)
```
this looks kinda far away now
Try this one, removing the unnecessary set of factorised biases
https://1drv.ms/u/c/74d39b59afff2586/IQDNu6vvsBtFQYC3VXUmTjdFAQPXJv4yRVVwykKmPJQSpjY?e=M08nTv 66, 82
don't think it's going to work, i think i need to transpose before merging, which i'll try next
yeah, it thinks raw evals are 305 and 160
-33 and +232
ah ofc the default .transpose() does not work with this type of factorisation
14th sfnnv9 net attempt lel
wait no this is bullshit. transposing worked without the factoriser
startpos +360 yeah
I have the output from the 120SB:
Estimated time remaining in training: 0h 0m 8s
superbatch 120 | time 8.6s | running loss 0.001450 | 1941046 pos/sec | total time 1046.2s
Estimated time remaining in training: 0h 0m 0s
Failed to write quantised network weights:
Failed quantisation from f32 to i8!
Saved [test-120]
Total Training Time: 0h 17m 28s
Eval: 73.723cp
Eval: 90.723cp
huh
(at re-add working ft factoriser)
failed quantisation from f32 to i8
Failed to write quantised network weights:
Failed quantisation from f32 to i8!
what!
something exceed weight limit?
looks like it
pushed a fix for that
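The error means some weight no longer fits in int8 once multiplied by its quantisation scale. The hypothetical helper below finds the culprits in a dumped float tensor. It does not reflect bullet's internals. The scale is whatever that layer uses, e.g. 64 for hidden weights in Stockfish's convention.

```python
from typing import List, Tuple


def find_i8_overflows(weights: List[float], scale: float) -> List[Tuple[int, float, int]]:
    """List (index, float weight, quantised value) for weights that don't fit int8."""
    bad = []
    for i, w in enumerate(weights):
        q = round(w * scale)
        if not -128 <= q <= 127:
            bad.append((i, w, q))
    return bad


print(find_i8_overflows([0.5, 1.99, -2.1, 1.984375], scale=64))  # [(2, -2.1, -134)]
```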
are we sure nnue-pytorch has this l1 factoriser?
@violet badger if you integrate the latest commit, you should be able to retry the quantisation simply by doing
-trainer.run(&schedule, &settings, &data_loader);
+// trainer.run(&schedule, &settings, &data_loader);
+trainer.load_from_checkpoint("checkpoints/test-1");
+trainer.save_quantised("checkpoints/test-1/quantised.bin").unwrap();
suspicious
i would like to test if removing it helps
eventually
won't the force clipping change the evals
ah true. forget what i said
If no pilot error:
Params: 72156296
thread 'main' (293488) panicked at src/main.rs:228:66:
called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: "Failed quantisation from f32 to i8!" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)
no problem.
like the l1 factorizer is strange, because then you need to halve the weight limit
and it's already heavily quantized
i think merging is supposed to be done before transposing anyways
i don't think so, transposing puts it from [l1][ob][l2] to [ob][l1][l2], and the factoriser should have layout [l1][l2], so the standard fact.repeat(bucketcount) -> elementwise add should work. but it doesn't
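The reasoning in that message can be checked on a small example. The l1 weights are in [l1][ob][l2] and the factoriser is in [l1][l2]. Merging before or after the transpose gives the same tensor. A flat `repeat(bucketcount)` of the factoriser lines up only with the transposed [ob][l1][l2] layout. This is a pure-Python sketch with hypothetical function names, and it says nothing about where bullet's actual merge goes wrong.

```python
from typing import List

Tensor2 = List[List[int]]
Tensor3 = List[List[List[int]]]


def transpose_l1(w: Tensor3) -> Tensor3:
    """Reorder l1 weights from [l1][ob][l2] to [ob][l1][l2]."""
    n_l1, n_ob = len(w), len(w[0])
    return [[w[i][b] for i in range(n_l1)] for b in range(n_ob)]


def flatten(t: Tensor3) -> List[int]:
    return [x for plane in t for row in plane for x in row]


def merge_bucketed_first(w: Tensor3, f: Tensor2) -> Tensor3:
    """Add factoriser row f[i] to every bucket of l1 neuron i, then transpose."""
    merged = [[[x + y for x, y in zip(row, f[i])] for row in w[i]]
              for i in range(len(w))]
    return transpose_l1(merged)


def merge_transposed_flat(w: Tensor3, f: Tensor2) -> List[int]:
    """Transpose first, then add the flat factoriser repeated once per bucket."""
    n_ob = len(w[0])
    flat_w = flatten(transpose_l1(w))
    flat_f = [x for row in f for x in row] * n_ob  # fact.repeat(bucketcount)
    return [x + y for x, y in zip(flat_w, flat_f)]


# l1 = 2 neurons, ob = 3 buckets, l2 = 2 outputs
w = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
f = [[100, 200], [300, 400]]
print(flatten(merge_bucketed_first(w, f)) == merge_transposed_flat(w, f))  # True
```

Applying the same flat repeat to the untransposed [l1][ob][l2] buffer instead would add f's rows to the wrong buckets. If the real merge disagrees with the reference, the layout the repeat is applied to is worth checking first.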
idk i kind of want to try again, maybe i did something wrong last time https://1drv.ms/u/c/74d39b59afff2586/IQBGzgPeP_bVQKONuJm-6Dn7ARiZjXqldoUFCTSplSw0Cvc?e=mIz1D4 evals 59, 43
+5, +122
i don't know then. let's skip it for now i guess
if anything, it'll be low single-digit elo anyway
or this
idk if it's even good
like l1 -> l2 is way smaller
than inputs -> l1
do we have a second run
with the fixed clipping
there's a better way by using clip_pass_through_grad on l1w+l1f
but since quantisation did not fail, everything fits into i8
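The two options for a factorised l1, side by side:

- Halving the limit bounds the sum by construction, but takes range from each part.
- Clipping the merged value (what a pass-through-gradient clip on l1w + l1f would target) only bites when the sum itself overflows.

The sketch uses a generic, hypothetical limit rather than the real l1 bound, and it does not model gradients.

```python
def clip(x: float, limit: float) -> float:
    return max(-limit, min(limit, x))


def merged_halved(w: float, f: float, limit: float) -> float:
    """Clip each part to limit/2: the sum always fits, but each part loses range."""
    return clip(w, limit / 2) + clip(f, limit / 2)


def merged_clipped_sum(w: float, f: float, limit: float) -> float:
    """Clip only the merged weight, the value that actually gets quantised."""
    return clip(w + f, limit)


print(merged_halved(1.5, 0.2, limit=2.0))       # 1.2: w was cut to 1.0
print(merged_clipped_sum(1.5, 0.2, limit=2.0))  # 1.7: fits, so nothing is cut
```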
oh
My local run (but multiGPU) ended with
Saved [test-120]
Total Training Time: 0h 14m 59s
thread 'main' (27889) panicked at /users/vjoost/.cargo/git/checkouts/bullet-8a69ed9a26c6f599/e37db79/crates/bullet_lib/src/value.rs:245:18:
Invalid output size!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
probably better to not mix both things though.
in the multiGPU context?
Yeah
okay.
but I guess that means that at least checkpoint saved correctly, so the failed to quantise error went away.
I guess just load the checkpoint and run eval using single gpu
(I cannot run this with the debug data right now, but you can check the normalized eval by loading the network into past master)
I'll run afterwards with 1 GPU... first some testing multiGPU.
Ah ok
so, started 120 SB of 6104 * 16384... should take about 2h
I used 1GPU, so that allows for speed comparison, ultimately not too different (about 50s bullet vs 67s pytorch per SB/epoch), assuming we're doing roughly the same thing now.
Interesting
That is probably because the default batch size is too low for threat inputs + high end GPU
see #1439214470529421384 message
but yeah, I'll run an experiment now on nnue-pytorch to see what a changed batch size does to training.
Is this on old inputs?
good question, the bullet is on old inputs, the pytorch number I quoted is probably on threats, though it wasn't too different.
But different HL size also then right
yeah, old arch on pytorch is 72s instead of 67s
and I think this current bullet training is setup to match the old arch..
Yeah
Almost 50% seems pretty good
And also if both had the better factoriser code the gap would widen I think
absolutely ....
(I know how much effort one might put in just 5% for e.g. megatron/LLM training).
I still think if someone was feeling cute they should just write a fused ft/l1 kernel for nnue-pytorch given the arch seems pretty much fixed
Would make a really big difference
I hope someone will pick up... @frosty imp was refactoring recently... so maybe
Did we verify that the produced networks are reasonably strong
I should have this 120SB trained network in a bit, that should give an idea.
I guess that could be within, say, 100-200 Elo of master?
@violet badger
Script failure?
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12100414422
🤨
ikr..
so threat net with 64k batch trains significantly faster (about 35s per SB/epoch)
with nnue-pytorch?
yeah
well, we still need to check that this is Elo-neutral, right?
for whatever reason, it also increases memory usage on the CPU side.
probably because each of the workers has a buffer proportional to the batch size.
OK, longer train ended.
ah nice
Total Training Time: 1h 34m 23s
Eval: 86.946cp
Eval: 79.768cp
I can test
what data do you need
can you just load it into an old sf
and check the evals
in those two positions
(after running disservin converter script)
from 'quantised.bin' ?
yea
$ python convert_quantised_to_pytorch.py checkpoints/test-120/quantised.bin test.nnue
Read checkpoints/test-120/quantised.bin successfully.
Organized data into 8 buckets.
Writing to test.nnue...
Ending position for bucket 0: 70487760
Bucket 0 size: 1152 bytes
Ending position for bucket 1: 70538168
Bucket 1 size: 1152 bytes
Ending position for bucket 2: 70588576
Bucket 2 size: 1152 bytes
Ending position for bucket 3: 70638984
Bucket 3 size: 1152 bytes
Ending position for bucket 4: 70689392
Bucket 4 size: 1152 bytes
Ending position for bucket 5: 70739800
Bucket 5 size: 1152 bytes
Ending position for bucket 6: 70790208
Bucket 6 size: 1152 bytes
Ending position for bucket 7: 70840616
Bucket 7 size: 1152 bytes
Integer value at position 69389475: 33686908
Conversion complete: checkpoints/test-120/quantised.bin -> test.nnue
now, let me build an SF in that container.
info depth 30 seldepth 45 multipv 1 score cp 20 nodes 16143880 nps 712691 hashfull 1000 tbhits 0 time 22652 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 d2d4 g8f6 g1f3 b8c6 d4c5 d5c5 b1a3 e7e5 a3b5 c5e7 d1a4 e7d8 f3e5 f8c5 e5c6 b7c6 b5d4 e8g8 f1e2 f8e8 c1e3 f6g4
NNUE evaluation +0.23 (white side)
Final evaluation +0.31 (white side) [with scaled NNUE, ...]
and
NNUE evaluation +0.17 (white side)
Final evaluation +0.22 (white side) [with scaled NNUE, ...]
main net is pretty similar
NNUE evaluation +0.05 (white side)
Final evaluation +0.07 (white side) [with scaled NNUE, ...]
and
NNUE evaluation +0.24 (white side)
Final evaluation +0.31 (white side) [with scaled NNUE, ...]
let me see if I can start a short match.
pv looks nice, real chess
looks pretty good..
--------------------------------------------------
Results of master vs test (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 149.91 +/- 16.47, nElo: 338.50 +/- 32.92
LOS: 100.00 %, DrawRatio: 25.70 %, PairsRatio: 78.50
Games: 428, Wins: 205, Losses: 31, Draws: 192, Points: 301.0 (70.33 %)
Ptnml(0-2): [0, 2, 55, 138, 19], WL/DD Ratio: 1.12
LLR: 1.10 (37.5%) (-2.94, 2.94) [0.00, 2.00]
--------------------------------------------------
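As a sanity check on match summaries like the one above: the headline Elo appears to be the standard logistic conversion of the score fraction s, Elo = -400 * log10(1/s - 1), with a draw counted as half a point. A minimal sketch, assuming that formula; the error bars, nElo and LLR need the pentanomial statistics and are not reproduced here. `score` and `elo_from_score` are hypothetical helper names.

```rust
/// Score fraction from a W/L/D record, counting a draw as half a point.
fn score(wins: u32, losses: u32, draws: u32) -> f64 {
    let games = (wins + losses + draws) as f64;
    (wins as f64 + 0.5 * draws as f64) / games
}

/// Logistic Elo difference implied by an expected score `s` in (0, 1).
fn elo_from_score(s: f64) -> f64 {
    -400.0 * (1.0 / s - 1.0).log10()
}
```

For the 120SB match, score(205, 31, 192) = 301/428 ≈ 0.7033, which gives ≈ 149.9 Elo; the later 240SB record (553, 142, 621) gives ≈ 112.3 Elo, matching the reported figures.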
I think that works.
nice, another good result from this thread 🙂
ok so looks like we got basic arch working
finally
which unlocks testing more things with bullet
nice
is that a threat input network or normal network test?
pre-threat input arch
but it should now be pretty straightforward to get threat inputs working as well
to continue training, can I just 'load_from_checkpoint' and increase end_superbatch in the schedule?
yes, though you also have to increase start_superbatch to whatever SB the checkpoint is from
Also note it will start from the beginning of the dataset again, so this is not ideal
For this reason I always restart training from the beginning
ok, yeah, this is still very early experiment.
is there a way to provide multiple binpacks and have it interleave them on the fly?
not without a custom dataloader I think
though it shouldn't be too hard, one could mix and match existing code, e.g. interleaving exists for viri binpacks in bullet-utils
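For the multi-binpack question, the core of an on-the-fly interleaver is small. A minimal sketch, assuming each binpack reader is already exposed as an iterator of positions; `Interleave` is a hypothetical name, not an existing bullet or bullet-utils type, and it uses plain round-robin rather than weighted or random mixing.

```rust
/// Round-robin interleaving of several position streams (e.g. one per binpack).
/// Exhausted streams are dropped; iteration ends once every stream is exhausted.
struct Interleave<I: Iterator> {
    sources: Vec<I>,
    next: usize,
}

impl<I: Iterator> Interleave<I> {
    fn new(sources: Vec<I>) -> Self {
        Self { sources, next: 0 }
    }
}

impl<I: Iterator> Iterator for Interleave<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        while !self.sources.is_empty() {
            // Wrap around to the first remaining stream.
            if self.next >= self.sources.len() {
                self.next = 0;
            }
            match self.sources[self.next].next() {
                Some(item) => {
                    self.next += 1;
                    return Some(item);
                }
                // This stream is exhausted: drop it and retry at the same index.
                None => {
                    self.sources.remove(self.next);
                }
            }
        }
        None
    }
}
```

Round-robin keeps each stream's internal order; a real loader would probably weight sources by size or sample them randomly, and still shuffle afterwards.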
what's more important, that or threat inputs?
is that python script no longer needed with that? nvm, it seems like it is, since there's no leb128
yeah, you can effectively copy the standard bullet threat input definition
and then just tack on the other stuff onto it
which is basically how I got it in nnue-pytorch as well
240SB:
--------------------------------------------------
Results of master vs test240 (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 112.26 +/- 9.57, nElo: 235.96 +/- 18.77
LOS: 100.00 %, DrawRatio: 31.91 %, PairsRatio: 13.45
Games: 1316, Wins: 553, Losses: 142, Draws: 621, Points: 863.5 (65.62 %)
Ptnml(0-2): [2, 29, 210, 390, 27], WL/DD Ratio: 1.08
LLR: 2.95 (100.1%) (-2.94, 2.94) [0.00, 2.00]
--------------------------------------------------
so, definitely working.
vondele, what is being trained here?
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12100493856
a network, just experimenting with batch size... don't worry.
@formal smelt I need some help here, I'm working on integrating threat inputs with the SF PST right now. My idea was to have the input type have the layout
factoriser,halfkav2,threats
so I can modify PST inference like so
-let stm_pst = pst.matmul(stm).select(buckets)
+let pst_slice_end = ThreatInputsBucketsMirrored::FACTORISER_SIZE + ThreatInputsBucketsMirrored::HALFKA_V2_SIZE;
+let stm_pst = pst.matmul(stm.slice_rows(0, pst_slice_end)).select(buckets)
calling slice_rows() like this leads to an error that I'm not sure how to fix: Message("Op(IncorrectDataLayout)")
Any ideas?
It looks like this operation may not be allowed on sparse nodes, in which case this will be difficult, or training will be slow
Time for dinner tho, WIP code is up on https://github.com/Yoshie2000/sf-bullet-train/tree/threat-inputs
Cursed way is to have the full weights and element mul them by a mask lol
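The mask workaround can be illustrated outside bullet's graph API. A minimal sketch, assuming the input layout factoriser, halfkav2, threats described above and a dense row-major PST weight matrix; `pst_row_mask` and `masked_sparse_matmul` are hypothetical helpers, not bullet functions. Scaling every threat row by zero gives the same output as slicing the input to the first `pst_slice_end` features, and in a graph where the mask multiplies the weights, those rows would also receive zero gradient.

```rust
/// Row mask for the input layout [factoriser | halfkav2 | threats]:
/// 1.0 for rows that should feed the PST, 0.0 for threat rows.
fn pst_row_mask(num_inputs: usize, pst_slice_end: usize) -> Vec<f32> {
    (0..num_inputs)
        .map(|i| if i < pst_slice_end { 1.0 } else { 0.0 })
        .collect()
}

/// Sparse matmul of active feature indices against a row-major weight matrix
/// `weights[row * outputs + col]`, with each row scaled by `mask[row]`.
fn masked_sparse_matmul(weights: &[f32], mask: &[f32], outputs: usize, active: &[usize]) -> Vec<f32> {
    let mut out = vec![0.0; outputs];
    for &row in active {
        let m = mask[row];
        for col in 0..outputs {
            out[col] += weights[row * outputs + col] * m;
        }
    }
    out
}
```

The downside is that the masked rows are still stored and multiplied, so this trades some wasted work for not needing `slice_rows` on a sparse node.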
the batch size increase doesn't seem to affect strength that much, that's good
at least if only applied to one stage
could be different if done for all stages
yeah, looks very good to my eyes (strength is equivalent or better). Will now start a full training run to verify; that's a bit more tricky. It also means that making sure DDP in pytorch works would become very useful: it would imply a 5-stage net trained in a day.
looks like 49c will be the shortest lived master net...
This entire effort was a godsend for the SF net training pipeline.
Bullet is using dp right?
dp?
data parallelism, I assume
in that case yes
see also #1439214470529421384 message
Is it time for a VVLTC search tune for threat inputs?
Pipeline failed, what's the reason?
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12103304941
😕
restarted.
Wasn’t able to fetch the data? That’s a new one
yeah, could happen, some filesystem hiccup, I've seen it before.
not related to the patch.
it's a bit hard to figure out due to the huge squash: who do I need to credit for coming up with the way SF currently creates threat indices (using the PiecePairData and other lookup tables)?
I'm cleaning up some code right now, but an early & dirty version seemed to be a small speedup at VSTC
https://furybench.com/test/3709/
I'm thinking probably a mix of @rocky vigil and @prime mica, but potentially cj as well, I'm not sure
oh interesting
you should credit me, anematode, cj i think
me when my legacy [piece][64] and [piece][65] code is making it into plentychess as well 💀
i'll get rid of those :P
Lol my latest thing will fix that
if it passes…
https://github.com/official-stockfish/Stockfish/pull/6406 has the non-squashed
yeah but easier to ask here than to dig through 73 commits
i guess it really depends on how far you want the contribution range to reach
if it's just "everyone who has touched threat indexing at some point" then that's like 6 ppl
ofc anematode is the one who has put the most effort into optimizing the lookup table design
i think it's mainly about the three lookup tables, but idm crediting the whole team 
yeah then this makes most sense, have him as co-contributor and just mention the rest of us in the pr
so that would be me, cj, shawn, aliceroselia, and rn5 as well
(that's worth a lot)
(also tbh I only had this idea because Yoshie had a lookup table from the beginning)
I guess I'll just mention the entire team in the PR
In the end, it all boils down to Yoshie, who also invented the transistor and integrated circuit
I would like to thank the floor without which I would not be standing here today
yep, also don't forget to thank your parents
see this more often in (non-code) writing
and also your computer for tirelessly compiling plenty over and over again 
A cool thing is that this is not only a slight speedup, but also removes 100 lines of code and no longer needs 2MB for the lookup table
i thought it added 100 lines of code xD
but yeah it indeed removes 2MB of lookup table
ah I didn't see the other change
regarding sf-bullet-train, https://github.com/official-stockfish/nnue-pytorch/blob/master/model/config.py#L21 seems to imply that the MPE loss should have a power of 2.5
actually https://github.com/vondele/nettest/blob/main/threats.yaml has more detailed info
but short answer is that it's not 2.6
but rather 2.442
or so
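For reference, the "power" in MPE is the exponent applied to the absolute error. A minimal sketch of just that term, assuming predictions and targets are already in the same [0, 1] space; it leaves out how nnue-pytorch maps evals into that space and blends eval and game-result targets. `MPE_POWER` is a hypothetical constant set to the ~2.442 mentioned above.

```rust
/// Hypothetical constant: the exponent quoted above ("2.442 or so").
const MPE_POWER: f32 = 2.442;

/// Mean power error: mean(|prediction - target|^power).
/// power = 2.0 is plain MSE; larger exponents penalise large errors relatively more.
fn mean_power_error(predictions: &[f32], targets: &[f32], power: f32) -> f32 {
    assert_eq!(predictions.len(), targets.len(), "length mismatch");
    assert!(!predictions.is_empty(), "empty batch");
    let total: f32 = predictions
        .iter()
        .zip(targets)
        .map(|(p, t)| (p - t).abs().powf(power))
        .sum();
    total / predictions.len() as f32
}
```

Since the errors are below 1, raising the exponent above 2 shrinks every term, but small errors shrink faster than large ones, so the loss concentrates more on the worst-predicted positions.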
there are also some other knobs nnue-pytorch has
but i guess we should first get threat inputs to work
I'm pretty busy rn, can't promise anything, but in theory all that's left to do is what jw suggested: #1336647760388034610 message
Since I never tested it properly, and king buckets did not gain in monty, I wanted to get some numbers on how much they are worth.
No king buckets vs. 12 king buckets, fixed nodes
Elo | -25.58 +- 5.36 (95%)
Conf | N=20000 Threads=1 Hash=16MB
Games | N: 6462 W: 1661 L: 2136 D: 2665
Penta | [164, 953, 1410, 602, 102]
https://furybench.com/test/3740/
STC (Could be more optimised)
Elo | -14.79 +- 5.76 (95%)
SPRT | 8.0+0.08s Threads=1 Hash=16MB
LLR | -2.26 (-2.25, 2.89) [0.00, 2.50]
Games | N: 3596 W: 838 L: 991 D: 1767
Penta | [18, 482, 939, 353, 6]
https://furybench.com/test/3739/
I guess not a huge surprise, king buckets are still worth it
@violet badger
net nn-a46c62f97ff9.nnue created
Why are more correction history and more nodes searched suddenly good now?
Isn't the idea supposedly that a better net = more aggressive pruning?
And less need for correction.
Yes... but the bench is still going up for some reasons.
Oh... extensions are good reasons.
But yeah... a 20% bench jump is quite meaningless. It's not just this though. With the old net, we sometimes went sub 2M.
Thought the trend was down and down and down.
Like... the graph suggested that the branching factor went down over time since like Stockfish 8.
branching factor is measured at much bigger depth than bench
bench actually jumps all over the place
I recall some PT having almost 8M bench vs 2.5M at some point
since I personally did a lot of work to make IIR less aggressive to improve scaling
this work resulted in bench increasing roughly 3x
so it's a normal thing - especially for VVLTC oriented patches
we should permanently move out of this thread shouldn't we
does the threat inputs net handle pinned pieces
oh my bad didn't see this