UE Threat Inputs for AB | Stockfish | Page 13

rocky vigil Nov 14, 2025, 2:08 PM

#

i also gtg for an hour

#

so someone else's turn to stare

#

and see if they find the errors

lofty cedar Nov 14, 2025, 2:10 PM

#

Do we spec LTC threat42 too?

#

Like... if threat 37 was an anti-scaler, there is no reason to believe the current net can't be an anti-scaler compared to threat42 or something else.

#

But yeah... Spec LTC-ing everything is expensive.

rocky vigil Nov 14, 2025, 2:13 PM

#

Realistically the only way we’ll be able to debug this is by comparing all of bullet layers to sf layers

#

Idk how feasible this is to do in bullet

#

@formal smelt ?

stray reef Nov 14, 2025, 2:15 PM

#

what kind of transformations does SF do on startup?

rocky vigil Nov 14, 2025, 2:16 PM

#

Not ones that change the overall eval

#

You can read the x32 code, that doesn’t have any transformations

stray reef Nov 14, 2025, 2:16 PM

#

yeah ofc i'm talking about transposing, packus stuff, etc

formal smelt Nov 14, 2025, 2:23 PM

#

rocky vigil @formal smelt ?

It should be quite easy

rocky vigil Nov 14, 2025, 2:23 PM

#

Ok

#

So when I get back

#

I’ll also attempt to do it on sf side

formal smelt Nov 14, 2025, 2:23 PM

#

Return the relevant node and you can get it after calling forward

#

If it isn’t optimised out
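A minimal sketch of the layer-by-layer comparison being discussed, assuming each side can dump its intermediate activations as plain integer lists keyed by layer name. The helper name `first_mismatch`, the layer names and the tolerance parameter are all hypothetical; neither bullet nor Stockfish provides this function.

```python
from typing import Optional


def first_mismatch(reference: dict[str, list[int]],
                   candidate: dict[str, list[int]],
                   tolerance: int = 0) -> Optional[tuple[str, int, int, int]]:
    """Return (layer, index, reference_value, candidate_value) for the first
    entry that differs by more than `tolerance`, or None if everything agrees.

    Layers are visited in the insertion order of `reference`, so listing them
    from input to output points at the earliest layer that diverges.
    """
    for layer, ref_values in reference.items():
        cand_values = candidate.get(layer)
        if cand_values is None or len(cand_values) != len(ref_values):
            # A missing layer or a length mismatch is reported at index -1,
            # with the two lengths in place of the two values.
            return (layer, -1, len(ref_values), len(cand_values or []))
        for i, (r, c) in enumerate(zip(ref_values, cand_values)):
            if abs(r - c) > tolerance:
                return (layer, i, r, c)
    return None
```

Feeding it the trainer dump as `reference` and the engine dump as `candidate` would narrow the search to one layer before anyone has to read weights by hand. A small non-zero `tolerance` leaves room for quantisation error.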

lofty cedar Nov 14, 2025, 2:25 PM

#

Now that we can train new nets... what do you think?

formal smelt Nov 14, 2025, 2:26 PM

#

Random cross-post lol

#

why not put in #nnue-dev

lofty cedar Nov 14, 2025, 2:27 PM

#

IDK... at this point the thread and #nnue-dev are now used interchangeably.

formal smelt Nov 14, 2025, 2:28 PM

#

Well as always "what if we did <extremely vague thing>" isn't a great suggestion

rocky vigil Nov 14, 2025, 2:28 PM

#

Btw @stray reef can you update branch on GitHub

#

So that I can look and see if I can statically find additional issues

stray reef Nov 14, 2025, 2:30 PM

#

done

rocky vigil Nov 14, 2025, 2:39 PM

#

Well

#

The architecture now looks correct

#

sigh

rocky vigil Nov 14, 2025, 3:04 PM

#

```
L3 pre-activation: [757 -128007 889 1153 131982 66180 -63742 196852 65014 66823 132104 393984 135271 -63887 198538 -62482 131595 198409 65519 4239 132477 -62995 67830 66179 -61969 67456 66664 -260342 -63263 -63886 2152 67434 ]
L3 post-activation: [11 0 13 18 127 127 0 127 127 127 127 127 127 0 127 0 127 127 127 66 127 0 127 127 0 127 127 0 0 0 33 127 ]
```
(startpos)

#

@stray reef I am assuming this is not how it's supposed to go?

stray reef Nov 14, 2025, 3:05 PM

#

nope

rocky vigil Nov 14, 2025, 3:05 PM

#

can you get the L2, L3 from bullet

#

(pre-activation)

#

i am concerned about why everything in L2 is 127

#

or 0

#

for the sqrrelu

stray reef Nov 14, 2025, 3:07 PM

#

not rn unfortunately

rocky vigil Nov 14, 2025, 3:07 PM

#

```
L3 pre-activation: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 post-activation: [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
```
for comparison here's old master network

stray reef Nov 14, 2025, 3:07 PM

#

yeah that's more like it

rocky vigil Nov 14, 2025, 3:15 PM

#

```
L2 CReLU(x^2): [127 127 127 127 127 127 127 127 127 0 127 127 127 127 127 ]
L2 CReLU(x): [0 127 0 127 127 127 127 127 0 8 0 0 0 127 127 ]
L3: [757 -128007 889 1153 131982 66180 -63742 196852 65014 66823 132104 393984 135271 -63887 198538 -62482 131595 198409 65519 4239 132477 -62995 67830 66179 -61969 67456 66664 -260342 -63263 -63886 2152 67434 ]
L3 CReLU(x): [11 0 13 18 127 127 0 127 127 127 127 127 127 0 127 0 127 127 127 66 127 0 127 127 0 127 127 0 0 0 33 127 ]
```

#

ah yes

#

of course

#

L2 was always supposed to be this massive

#

sigh sigh sigh

#

```
L2 CReLU(x^2): [2 0 2 24 37 4 3 1 0 45 5 39 5 0 18 ]
L2 CReLU(x): [0 10 19 0 69 0 0 11 0 75 0 70 0 0 0 ]
L3: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 CReLU(x): [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
```
(1c0000000000.nnue)

#

I think the L2 biases are off

#

@stray reef forgot since we are using old master arch the 255s here should be 127

stray reef Nov 14, 2025, 3:21 PM

#

for everything?

rocky vigil Nov 14, 2025, 3:21 PM

#

yeah

#

we'll need to change it back to get to new arch

#

but for now it's 127

stray reef Nov 14, 2025, 3:22 PM

#

i see, i'll send you a new net when i'm back

rocky vigil Nov 14, 2025, 3:22 PM

#

ok

stray gyro Nov 14, 2025, 7:49 PM

#

What's the last test result of full threat small net?

rocky vigil Nov 14, 2025, 7:50 PM

#

bad

#

like -3 at least

stray gyro Nov 14, 2025, 7:51 PM

#

Is there data of speed difference?

rocky vigil Nov 14, 2025, 7:51 PM

#

unsure

#

i don't think it's that good though

stray gyro Nov 14, 2025, 7:51 PM

#

-3 Elo sounds like 2% slowdown

#

hmm

rocky vigil Nov 14, 2025, 7:51 PM

#

considering the main purpose of smallnet is speed

#

and not necessarily evaluation accuracy

twilit oriole Nov 14, 2025, 7:51 PM

#

I think now is the time to try a training to disable threats and use just psq for anything >400cp. I dont think threats have much value for that, its mostly just speed loss

rocky vigil Nov 14, 2025, 7:52 PM

#

twilit oriole I think now is the time to try a training to disable threats and use just psq fo...

wait for shawn to do stuff, we haven't even gotten threats merged in nnue-pytorch yet

twilit oriole Nov 14, 2025, 7:52 PM

#

its a simple change i think. can be done on top of the latest vondele branch

rocky vigil Nov 14, 2025, 7:52 PM

#

i know how this would be defined though

#

in the part that assigns active features, just compute simple eval first

prime mica Nov 14, 2025, 7:54 PM

#

that would require different weights for the following layer, right?

twilit oriole Nov 14, 2025, 7:55 PM

#

no need to modify the inference on training side. just pretend threats dont exist for above threshold

rocky vigil Nov 14, 2025, 7:56 PM

#

stray reef i see, i'll send you a new net when i'm back

oh btw are you able to make this change

#

i did not notice any other errors

#

so if stuff still goes wrong then i really will need to manually compare the hidden layers

stray reef Nov 14, 2025, 7:57 PM

#

maybe half an hour?

rocky vigil Nov 14, 2025, 7:58 PM

#

ok cool

#

🙏

stray reef Nov 14, 2025, 8:32 PM

#

https://1drv.ms/u/c/74d39b59afff2586/IQAwwIGnYS9BR6KhKZnWQzA3AaPFm_VLPYgeVRlvDynxvN0?e=3AaFhP startpos eval 66

#

@rocky vigil

rocky vigil Nov 14, 2025, 8:32 PM

#

ok

stray reef Nov 14, 2025, 8:33 PM

#

twilit oriole no need to modify the inference on training side. just pretend threats dont exis...

how would you know this on the engine side tho?

twilit oriole Nov 14, 2025, 8:34 PM

#

the same way?

stray reef Nov 14, 2025, 8:34 PM

#

in the trainer you have the datapoint evaluation

twilit oriole Nov 14, 2025, 8:34 PM

#

the threshold is computed based on simple eval

stray reef Nov 14, 2025, 8:34 PM

#

oh ok
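A rough sketch of the idea as described: compute a material-only simple eval first, and when it is beyond the threshold emit only the psq features, using the same rule in the trainer and in the engine. The piece values, the threshold units, and the names `simple_eval` and `active_features` are assumptions for illustration, not taken from nnue-pytorch or Stockfish.

```python
# Hypothetical material values in internal units; not quoted from the thread.
PIECE_VALUES = {"p": 208, "n": 781, "b": 825, "r": 1276, "q": 2538, "k": 0}


def simple_eval(fen: str) -> int:
    """Material balance from White's point of view, read from the FEN board field."""
    board = fen.split()[0]
    score = 0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score


def active_features(fen, psq_features, threat_features, threshold=400):
    """Return psq features only when the position is lopsided, else psq + threats.

    `threshold` is compared against simple_eval in the same (assumed) units.
    """
    if abs(simple_eval(fen)) > threshold:
        return list(psq_features)
    return list(psq_features) + list(threat_features)
```

Because the decision depends only on the position, the engine can reproduce it without access to the training label, which is the point made above.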

rocky vigil Nov 14, 2025, 8:35 PM

#

```
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]


NNUE evaluation        -25.60 (white side)
Final evaluation       -16.01 (white side) [with scaled NNUE, ...]
```

#

sigh

#

how are we getting l2 values that are 6 digits

rocky vigil Nov 14, 2025, 8:37 PM

#

rocky vigil so if stuff still goes wrong then i really will need to manually compare the hid...

ahhhhhhh

#

the skip connection from L1 to output is -131056

#

which is responsible for the negative eval

twilit oriole Nov 14, 2025, 8:51 PM

#

i wonder what happens if you remove it. a last resort thing to attempt lol

stray reef Nov 14, 2025, 8:51 PM

#

there's probably some major issue still, like wrong weight layout in ft or l1

rocky vigil Nov 14, 2025, 8:52 PM

#

twilit oriole i wonder what happens if you remove it. a last resort thing to attempt lol

??????

```
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]


              [+0, -17]
NNUE evaluation        -0.04 (white side)
```

#

finally something that doesn't look total trash

stray reef Nov 14, 2025, 8:52 PM

#

all the values in the arrays are still trash

#

it's just luck that it's close to 0

rocky vigil Nov 14, 2025, 8:52 PM

#

probably

twilit oriole Nov 14, 2025, 8:53 PM

#

well i expect that. because it would have to be removed in trainer also i assume. but the core thing is if it is some kind of instability or a mapping issue

#

what happens when you inspect the weights itself

rocky vigil Nov 14, 2025, 8:55 PM

#

first of all I suspect the quantization is wrong

#

I think the l1 -> l2 values are way too large

stray reef Nov 14, 2025, 8:56 PM

#

what's the weight clipping in nnue-pytorch?

rocky vigil Nov 14, 2025, 8:56 PM

#

https://github.com/official-stockfish/nnue-pytorch/blob/master/model/quantize.py

stray reef Nov 14, 2025, 8:56 PM

#

i mean the float clipping during training

#

in bullet the default is [-1.98, 1.98]

rocky vigil Nov 14, 2025, 8:57 PM

#

there is some clipping

#

lemme check

rocky vigil Nov 14, 2025, 8:58 PM

#

stray reef in bullet the default is [-1.98, 1.98]

yeah it's +- 127 / 64

#

for l2 and l3
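The two numbers quoted here agree: 127 / 64 is 1.984375, which is the 1.98 bound mentioned for bullet. A small sketch of why that bound is natural, assuming a weight quantisation scale of 64 for these layers (the constant names and the helper are illustrative only):

```python
QUANT_SCALE = 64   # assumed weight scale for the l2/l3 layers
INT8_MAX = 127

# Largest float magnitude that still fits in int8 after scaling.
clip = INT8_MAX / QUANT_SCALE   # 1.984375


def clip_and_quantise(weight: float) -> int:
    """Clamp a float weight to [-clip, clip], then scale and round to an integer."""
    clamped = max(-clip, min(clip, weight))
    return round(clamped * QUANT_SCALE)
```

With the clamp applied during training, every exported weight lands in [-127, 127], so the int8 export cannot overflow.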

stray reef Nov 14, 2025, 8:59 PM

#

and nothing for ft/l1?

rocky vigil Nov 14, 2025, 9:00 PM

#

it's quantized to 127

#

no weight clipping

#

afaik

#

I suspect smth is wrong with the psqt

#

and the skip connection

stray reef Nov 14, 2025, 9:02 PM

#

can you check the pairwise output?

rocky vigil Nov 14, 2025, 9:02 PM

#

when I am ignoring those two the evals are actually reasonable

#

for eg

#

startpos

#

fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9

stray reef Nov 14, 2025, 9:02 PM

#

sure but the l2/l3 data above is also not normal

#

i guess those two being wrong just have a much bigger influence on the output

rocky vigil Nov 14, 2025, 9:03 PM

#

yeah

#

i suppose

rocky vigil Nov 14, 2025, 9:03 PM

#

stray reef can you check the pairwise output?

how do I do this w/o printing 3072 thing

stray reef Nov 14, 2025, 9:03 PM

#

you can just print the first X i guess

#

just so we can check sparsity and if everything's clamped there too
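A small sketch of that check: instead of eyeballing a long dump, count how many entries are zero and how many sit at the clamp ceiling. The helper name `summarise` is hypothetical; the two lists are the L3 CReLU(x) rows already posted earlier in the thread for the new net and for master.

```python
def summarise(values, clamp_max=127):
    """Count zero and saturated entries in a list of clamped activations."""
    n = len(values)
    zeros = sum(1 for v in values if v == 0)
    saturated = sum(1 for v in values if v >= clamp_max)
    return {
        "n": n,
        "zeros": zeros,
        "saturated": saturated,
        "zero_fraction": zeros / n if n else 0.0,
    }


# L3 CReLU(x) at startpos, copied from the dumps above.
suspect_net = [0, 2, 127, 127, 5, 127, 0, 127, 5, 0, 0, 127, 127, 0, 127, 127,
               127, 127, 0, 127, 9, 5, 127, 9, 4, 0, 0, 17, 127, 127, 127, 127]
master_net = [0, 0, 36, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              6, 0, 0, 10, 0, 0, 38, 0, 0, 0, 0, 19, 0, 0, 0, 0]

print(summarise(suspect_net))
print(summarise(master_net))
```

On these two rows the suspect net saturates 16 of 32 entries while master saturates none, which is the "everything is 127 or 0" pattern in numeric form.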

rocky vigil Nov 14, 2025, 9:04 PM

#

I can print first 16 cool

#

```
L2: [53 131077 9 42 65588 82 66 43 196643 131100 131119 0 65542 -65525 -65526 (-131056)]
L2 CReLU(x^2): [0 127 0 0 127 0 0 0 127 127 127 0 127 127 127 ]
L2 CReLU(x): [0 127 0 0 127 1 1 0 127 127 127 0 127 0 0 ]
L3: [-66045 128 131962 65793 382 131326 -196226 131328 381 -196607 -65660 327809 65410 -254 65537 131328 66046 65791 -64899 130943 632 380 131581 636 257 -65282 -65282 1145 65664 66680 130818 327046 ]
L3 CReLU(x): [0 2 127 127 5 127 0 127 5 0 0 127 127 0 127 127 127 127 0 127 9 5 127 9 4 0 0 17 127 127 127 127 ]
[normal, skip] = [-279 -154790]
```

#

(startpos)

stray reef Nov 14, 2025, 9:05 PM

#

sss but looks fine at least

rocky vigil Nov 14, 2025, 9:05 PM

#

I can actually go to 128

#

why not

#

L1 (first 128): [0 0 0 0 0 0 5 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

#

omega sparsity

#

lemme try master

#

now

stray reef Nov 14, 2025, 9:07 PM

#

probably l1 has a problem then

rocky vigil Nov 14, 2025, 9:09 PM

#

```
L2: [-1128 647 1222 -3620 4443 -1457 -1256 725 -705 4858 -1640 4542 -1743 -604 -3117 (-42)]
L2 CReLU(x^2): [2 0 2 24 37 4 3 1 0 45 5 39 5 0 18 ]
L2 CReLU(x): [0 10 19 0 69 0 0 11 0 75 0 70 0 0 0 ]
L3: [-840 -1436 2324 -3142 -1453 -5814 718 -2749 -6213 -1084 -5075 -15 -1638 -713 -2499 -6018 416 -3924 -2577 647 -1328 20 2479 -3501 -5318 -1800 -661 1223 -2003 -4210 -1722 -2615 ]
L3 CReLU(x): [0 0 36 0 0 0 11 0 0 0 0 0 0 0 0 0 6 0 0 10 0 0 38 0 0 0 0 19 0 0 0 0 ]
[normal, skip] = [371 -49]


[psqt, positional] = [+0, +20]
NNUE evaluation        +0.05 (white side)
Final evaluation       +0.07 (white side) [with scaled NNUE, ...]
```

#

here's 1c0...

#

i feel like the L2 biases

#

are screwed

#

somehow

#

or smth about the l2 scale
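For reference, the two L2 activation rows in the master dump above can be reproduced from the raw L2 row with a right shift of 6 bits for CReLU(x) and 19 bits for CReLU(x^2), both clamped at 127. The shift amounts are inferred from the numbers in the dump, not read from the Stockfish source, and the function names are illustrative.

```python
def crelu(x: int, shift: int = 6, hi: int = 127) -> int:
    """Clipped ReLU on an integer pre-activation: shift right, clamp to [0, hi]."""
    return max(0, min(hi, x >> shift))


def sqr_crelu(x: int, shift: int = 19, hi: int = 127) -> int:
    """Squared clipped ReLU: square first, shift right, clamp to at most hi."""
    return min(hi, (x * x) >> shift)


# Raw L2 row from the master dump (the bracketed skip value is left out).
l2_master = [-1128, 647, 1222, -3620, 4443, -1457, -1256, 725,
             -705, 4858, -1640, 4542, -1743, -604, -3117]

print([sqr_crelu(x) for x in l2_master])
print([crelu(x) for x in l2_master])
```

Under these assumed shifts, any raw value of about 8128 or more saturates CReLU(x), and a magnitude of about 8160 or more saturates the squared variant, so six-digit L2 values clamp everything to 127 regardless of what the biases are.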

twilit oriole Nov 14, 2025, 9:10 PM

#

hm. the scale for the bias and regular weights are handled differently right

rocky vigil Nov 14, 2025, 9:11 PM

#

stray reef probably l1 has a problem then

can you update github with the new bullet config?

stray reef Nov 14, 2025, 9:12 PM

#

done

rocky vigil Nov 14, 2025, 9:14 PM

#

x.quantise(Q) is just round(Q*x) right

#

in bullet

stray reef Nov 14, 2025, 9:15 PM

#

.round().quantise(), yeah

rocky vigil Nov 14, 2025, 9:16 PM

#

so on paper, everything looks fine, except for https://github.com/Yoshie2000/sf-bullet-train/blob/fix-inputs/src/main.rs#L117 which is actually supposed to be 600 * 16 but the psqt are already blown up enough as is

twilit oriole Nov 14, 2025, 9:19 PM

#

what if you zero everything except the psqt. surely that works right

rocky vigil Nov 14, 2025, 9:20 PM

#

psqt alone would still give like +30 on this position

twilit oriole Nov 14, 2025, 9:20 PM

#

yeah cos it is trained with the rest of the net right?

rocky vigil Nov 14, 2025, 9:20 PM

#

```
NNUE evaluation        +56.21 (white side)
```

#

everything is cooked

twilit oriole Nov 14, 2025, 9:21 PM

#

yeah but i mean if you literally only train the psqt and inference it

rocky vigil Nov 14, 2025, 9:21 PM

#

ok we can try that

stray reef Nov 14, 2025, 9:22 PM

#

sure

rocky vigil Nov 14, 2025, 9:22 PM

#

so basically comment out lines 153-166 and return pst_out instead of out

#

i think

rocky vigil Nov 14, 2025, 9:24 PM

#

rocky vigil so on paper, everything looks fine, except for <https://github.com/Yoshie2000/sf...

also this but

#

i genuinely dunno

#

at this point

stray reef Nov 14, 2025, 9:25 PM

#

bullet does not like this

thread 'main' (546672) panicked at /home/patrick/.cargo/git/checkouts/bullet-8a69ed9a26c6f599/e37db79/crates/acyclib/src/graph/builder.rs:132:30:
called `Result::unwrap()` on an `Err` value: ## Error Occurred ##
Message("MultipleRoots")

rocky vigil Nov 14, 2025, 9:25 PM

#

oh i think it needs to be mut

#

if you return pst_out

#

maybe?

#

idk

#

actually this is strange

#

you can also try just doing the entire inference

#

and only returning pst_out

stray reef Nov 14, 2025, 9:26 PM

#

ofc i tried both

#

gonna try multiplying the other two with 0 now so everything is "used" at least

rocky vigil Nov 14, 2025, 9:27 PM

#

oh

#

also I think eval_scale is 600

#

not 400

#

that's purely cosmetic though

stray reef Nov 14, 2025, 9:33 PM

#

out = out.linear_comb(0.0, pst_out, 0.5) + skip_neuron.linear_comb(0.0, pst_out, 0.5);
this works :P

rocky vigil Nov 14, 2025, 9:33 PM

#

heh

#

tricked the compiler

#

but yeah
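Why that expression still evaluates to `pst_out`: assuming `a.linear_comb(x, b, y)` means `x * a + y * b` (an assumption about bullet's API, not checked here), both halves contribute `0.5 * pst_out` and the other two nodes are multiplied by zero while still being referenced by the graph. A plain-float sketch of the arithmetic:

```python
def linear_comb(self_value: float, a: float, other: float, b: float) -> float:
    """Assumed semantics of node.linear_comb(a, other, b): a * self + b * other."""
    return a * self_value + b * other


def combined_output(out: float, skip_neuron: float, pst_out: float) -> float:
    """Mirror of the expression in the message above, on plain floats."""
    return (linear_comb(out, 0.0, pst_out, 0.5)
            + linear_comb(skip_neuron, 0.0, pst_out, 0.5))


# Whatever `out` and `skip_neuron` hold, only pst_out survives.
print(combined_output(123.0, -456.0, 12.0))
```

The zero coefficients keep every node connected to the single output, which would avoid the `MultipleRoots` error while making the result depend on the PST alone.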

rocky vigil Nov 14, 2025, 9:34 PM

#

rocky vigil so on paper, everything looks fine, except for <https://github.com/Yoshie2000/sf...

surely the pst should work with this change

#

and eval_scale 600

#

bc if the float weights are x

#

on sf side you just have (600 * 16 * sum x) / 16

#

and you get 600 * sum x
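The same arithmetic as a sketch: quantise each float PST weight with scale 600 * 16, sum the active ones, and divide by 16 on the engine side. The names are illustrative, and per-weight rounding is an assumption about how the export quantises.

```python
EVAL_SCALE = 600   # value proposed in the thread
PSQT_EXTRA = 16    # extra factor that the engine side divides out again


def engine_psqt(float_weights):
    """Quantise with EVAL_SCALE * PSQT_EXTRA, sum, then divide by PSQT_EXTRA."""
    quantised = [round(EVAL_SCALE * PSQT_EXTRA * x) for x in float_weights]
    return sum(quantised) / PSQT_EXTRA


def float_psqt(float_weights):
    """The value the trainer would report: EVAL_SCALE * sum(x)."""
    return EVAL_SCALE * sum(float_weights)


weights = [0.01, -0.02, 0.005]   # made-up active PST weights
print(engine_psqt(weights), float_psqt(weights))
```

The two results differ only by rounding, at most half a quantisation step per active weight before the division by 16.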

stray reef Nov 14, 2025, 9:35 PM

#

applied all suggestions, startpos eval is 0 (as it should be!)
https://1drv.ms/u/c/74d39b59afff2586/IQCRA1uH1iBETr4miI4uK6YAAUlxKnOth1Zf5lMBdRA-FeY?e=8AJzWS

rocky vigil Nov 14, 2025, 9:35 PM

#

can you also get r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9

#

startpos pst has always been 0 :P

stray reef Nov 14, 2025, 9:37 PM

#

rocky vigil can you also get `r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9`

12 internal units

rocky vigil Nov 14, 2025, 9:37 PM

#

kk

#

```
eval
info string NNUE evaluation using nn-4e6276be8161.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2: [1 -1 0 -65535 -65536 65536 65535 65536 1 65536 0 -1 -65536 0 -65536 (65535)]
L2 CReLU(x^2): [0 0 0 127 127 127 127 127 0 127 0 0 127 0 127 ]
L2 CReLU(x): [0 0 0 0 0 127 127 127 0 127 0 0 0 0 0 ]
L3: [65663 -65536 -65409 254 65282 -889 -253 -889 -64900 -66171 -131325 -65536 65028 127 65409 65408 -127 131325 65409 -508 -127 65154 65662 -509 -65663 -65790 -65663 65409 65408 64900 -635 255 ]
L3 CReLU(x): [127 0 0 3 127 0 0 0 0 0 0 0 127 1 127 127 0 127 127 0 0 127 127 0 0 0 0 127 127 127 0 3 ]
[normal, skip] = [65277 77403]


[psqt, positional] = [-4095, +8917]
NNUE evaluation        +12.75 (white side)
```

#

move_blunder

#

no seriously

#

what the

#

the feature indices are correct right?

#

so purely parsing error or smth

stray reef Nov 14, 2025, 9:39 PM

#

probably a parsing/layout error now yeah

rocky vigil Nov 14, 2025, 9:42 PM

#

well there are strict checks on the layouts

#

in particular the l0b l0w and pst must be in the correct order

#

so something is wrong in the pst section itself

#

is it possible for you to get the 8 bucket weights of specific indices or smth

#

grasping at straws here

stray reef Nov 14, 2025, 9:45 PM

#

from the bullet checkpoint yes

#

not sure how the .nnue file works

rocky vigil Nov 14, 2025, 9:46 PM

#

yeah sure

#

from bullet checkpoint

stray reef Nov 14, 2025, 9:46 PM

#

my best guess would be to write a small script that tests PST inference for the bullet checkpoint

#

but i don't see how it could be wrong there

rocky vigil Nov 14, 2025, 9:48 PM

#

yeah I don't either

#

ngl

rocky vigil Nov 14, 2025, 9:49 PM

#

stray reef probably a parsing/layout error now yeah

yeah but every section individually has the correct length and format, so parsing/layout errors are contained within each section

stray reef Nov 14, 2025, 9:50 PM

#

wait pst is output bucketed right, how does this work in SF inference, it's also UE'd right? so all buckets are technically always computed, even if not needed

rocky vigil Nov 14, 2025, 9:50 PM

#

yes

stray reef Nov 14, 2025, 9:50 PM

#

ok i thought for a second we forgot to transpose the weights

rocky vigil Nov 14, 2025, 9:51 PM

#

they're stored as

#

[f0b0 f0b1 ... f1b0 f1b1 ... f22527b0 ... f22527b7]
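A short sketch of indexing into that layout: with the weights stored feature-major, the 8 bucket weights of feature `f` are the contiguous slice starting at `f * 8`. The function name and the flat-list representation are illustrative only.

```python
NUM_BUCKETS = 8


def bucket_weights(flat_psqt, feature_index, num_buckets=NUM_BUCKETS):
    """Return the per-bucket PST weights of one feature from a feature-major array.

    Layout assumed: [f0b0, f0b1, ..., f0b7, f1b0, ..., f1b7, ...].
    """
    start = feature_index * num_buckets
    return flat_psqt[start:start + num_buckets]


# Synthetic table for 4 features where the weight encodes (feature, bucket).
flat = [f * 10 + b for f in range(4) for b in range(NUM_BUCKETS)]
print(bucket_weights(flat, 2))
```

If the file were written bucket-major instead, the same slice would return eight weights of neighbouring features for a single bucket, which is the transposition mistake ruled out above.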

stray reef Nov 14, 2025, 9:52 PM

#

ok which weights do you want to see?

#

for which feature index

rocky vigil Nov 14, 2025, 9:54 PM

#

uh all eight for 20931

#

how about

stray reef Nov 14, 2025, 9:57 PM

#

all zeros

rocky vigil Nov 14, 2025, 9:58 PM

#

bruh

#

what

#

ok

rocky vigil Nov 14, 2025, 9:58 PM

#

stray reef all zeros

white queen on d1 is worthless when white king is on g1 !!!!

stray reef Nov 14, 2025, 9:58 PM

#

hm no smth is wrong with my code

rocky vigil Nov 14, 2025, 9:59 PM

#

[65537 1 65536 65535 65535 65536 65536 1 ]

#

this does

#

not seem right

#

but yeah we'll see

stray reef Nov 14, 2025, 10:03 PM

#

i get the same numbers...

rocky vigil Nov 14, 2025, 10:07 PM

#

welp

#

leb128 looking fine

#

as I suspected

#

so I have no idea why it's different
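For anyone wanting to re-check the LEB128 step independently, here is a generic signed LEB128 encoder and decoder. It is the textbook algorithm, offered as a sketch; whether the .nnue section in question uses exactly this variant and framing is an assumption.

```python
def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7   # arithmetic shift, keeps the sign
        sign_bit = byte & 0x40
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)   # continuation bit: more bytes follow


def decode_sleb128(data: bytes, offset: int = 0):
    """Decode one signed LEB128 value; return (value, offset_after_value)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:   # sign-extend the final group
                result -= 1 << shift
            return result, offset
```

Round-tripping the suspicious values (65537, 65536, 65535, 1) through these two functions shows whether the oddity comes from the codec or from the data that was written.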

#

maybe lemme get some position with only few pieces

stray reef Nov 14, 2025, 10:08 PM

#

i think since the input type is factorised, the pst weights have a factoriser too

#

even though afaik Factorised<> automatically merges that, i'm gonna try it non-factorised rq

rocky vigil Nov 14, 2025, 10:08 PM

#

oh

#

ok

#

yeah and maybe try 8/6k1/8/8/3P4/8/1K6/8 w - - 0 1

#

there's only 3 pieces

#

what could go wrong :clueless:

prime mica Nov 14, 2025, 10:09 PM

#

When the Pawn

stray reef Nov 14, 2025, 10:11 PM

#

ah the checkpoint is about 4.5MB smaller now, which matches perfectly what would happen if the factorised weights were previously included

#

152 -87 -266 -201 75 290 397 299
reasonable values!
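A generic sketch of what merging a factoriser means, for readers unfamiliar with the term: during training the effective weight of a bucketed feature is its own weight plus a weight shared across buckets, and at export the shared part is added into every bucket and then dropped. This is an illustration with hypothetical names, not bullet's `Factorised` implementation.

```python
def merge_factoriser(bucketed, shared):
    """Fold shared (factoriser) weights into every bucket.

    bucketed[b][f] is the weight of feature f in bucket b;
    shared[f] is the factoriser weight of feature f, common to all buckets.
    Returns a new table where entry [b][f] = bucketed[b][f] + shared[f].
    """
    return [[w + shared[f] for f, w in enumerate(row)] for row in bucketed]


merged = merge_factoriser([[1, 2], [3, 4]], [10, 20])
print(merged)
```

An export that also writes the shared table grows by exactly the size of that table, which is consistent with the checkpoint shrinking once the factoriser was removed.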

rocky vigil Nov 14, 2025, 10:12 PM

#

o

stray reef Nov 14, 2025, 10:13 PM

#

https://1drv.ms/u/c/74d39b59afff2586/IQCB29TTo2UCT4bc_OUckGHsAfA67bizwdfJT-RfvrnufMk?e=zhBcdo

rocky vigil Nov 14, 2025, 10:13 PM

#

my uni wifi wondering why I've downloaded 8 nnue files of 66 MB today :P

rocky vigil Nov 14, 2025, 10:15 PM

#

stray reef `152 -87 -266 -201 75 290 397 299` reasonable values!

[152 -87 -266 -201 75 290 397 299 ]

#

let's go???

stray reef Nov 14, 2025, 10:15 PM

#

startpos eval?

rocky vigil Nov 14, 2025, 10:15 PM

#

0 :P

#

it's always been 0 for psqt

#

no matter what nnue

stray reef Nov 14, 2025, 10:16 PM

#

oh right

rocky vigil Nov 14, 2025, 10:16 PM

#

rocky vigil can you also get `r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9`

did you get this one

stray reef Nov 14, 2025, 10:16 PM

#

r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 should be about -6

rocky vigil Nov 14, 2025, 10:16 PM

#

or the pawn endgame

prime mica Nov 14, 2025, 10:16 PM

#

rocky vigil `[152 -87 -266 -201 75 290 397 299 ]`

ayyy

rocky vigil Nov 14, 2025, 10:16 PM

#

stray reef `r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9` should be about -6

```
eval
info string NNUE evaluation using nn-a64da979b54f.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (0)]
L2 CReLU(x^2): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L2 CReLU(x): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L3: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
L3 CReLU(x): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[normal, skip] = [0 0]


[psqt, positional] = [-9, +0]
```

#

let's go??

stray reef Nov 14, 2025, 10:16 PM

#

YES

prime mica Nov 14, 2025, 10:17 PM

#

life is good

rocky vigil Nov 14, 2025, 10:17 PM

#

minor quant error but to be expected

#

do u have any more test pos

stray reef Nov 14, 2025, 10:17 PM

#

nah this is fine

#

gonna add the rest of the net again

rocky vigil Nov 14, 2025, 10:17 PM

#

ok so factorizer is suspicious

#

cool

stray reef Nov 14, 2025, 10:18 PM

#

once something works we can work backwards to add what's missing

rocky vigil Nov 14, 2025, 10:18 PM

#

stray reef gonna add the rest of the net again

btw when you do this can you get out, skip_out and pst_out

#

bc my debug info also has all 3 of them

stray reef Nov 14, 2025, 10:18 PM

#

i can try, but let's first see how bad things are

rocky vigil Nov 14, 2025, 10:18 PM

#

ok

#

wait why was the input still factorized

#

did you swap it to non-factorized

stray reef Nov 14, 2025, 10:19 PM

#

i completely removed the factoriser now

rocky vigil Nov 14, 2025, 10:19 PM

#

yay

#

yeah simple things first

#

and gradually work up

stray reef Nov 14, 2025, 10:20 PM

#

https://1drv.ms/u/c/74d39b59afff2586/IQBsQgdxgVMgRbekt661j2IUAbWtg-9jF7zF67SAXay11o8?e=WF2TVU
eval for r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 is 16

rocky vigil Nov 14, 2025, 10:21 PM

#

random berlin position is actually a decent test position lmao

#

ok let's see

#

in a min or two

rocky vigil Nov 14, 2025, 10:23 PM

#

stray reef <https://1drv.ms/u/c/74d39b59afff2586/IQBsQgdxgVMgRbekt661j2IUAbWtg-9jF7zF67SAXa...

```
eval
info string NNUE evaluation using nn-2cc242fcab84.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 20800 20545 20674 20931 20677 21062 20424 20425 20426 20427 20429 20430 20431 20836 20651 20528 20529 20530 20531 20788 20533 20534 20535 20920 20794 21051 20925 21118
removed:
BLACK added: 20920 20665 20794 21051 20797 21118 20528 20529 20530 20531 20533 20534 20535 20892 20563 20424 20425 20426 20427 20684 20429 20430 20431 20800 20674 20931 20805 21062
removed:
L1 (first 128): [0 7 4 0 0 27 4 1 0 0 0 0 19 5 2 1 23 0 18 1 0 5 0 20 23 0 0 0 3 14 28 1 0 5 0 0 0 0 14 0 25 19 9 15 0 0 0 0 40 0 0 0 8 2 0 18 3 9 0 0 13 20 0 0 2 0 21 1 0 21 0 0 0 0 0 11 0 1 0 11 14 0 0 0 0 0 0 34 0 32 22 0 0 0 3 2 9 0 13 14 0 0 1 0 0 0 24 28 2 21 8 0 0 2 0 13 0 6 0 17 12 3 13 6 15 11 0 0 ]
L2: [-5890 4039 8313 4356 3438 5305 10575 6519 3957 1457 -2233 -2333 5295 7469 -4199 (4289)]
L2 CReLU(x^2): [66 31 127 36 22 53 127 81 29 4 9 10 53 106 33 ]
L2 CReLU(x): [0 63 127 68 53 82 127 101 61 22 0 0 82 116 0 ]
L3: [5564 4988 4871 -3625 -1894 -2410 -739 2902 3027 2338 -883 3304 -3766 -2235 -1100 2512 2216 -7469 467 2828 -833 -816 8457 -901 2398 2163 2244 11 -328 3479 4975 2129 ]
L3 CReLU(x): [86 77 76 0 0 0 0 45 47 36 0 51 0 0 0 39 34 0 7 44 0 0 127 0 37 33 35 0 0 54 77 33 ]
[normal, skip] = [-4262 5065]


[psqt, positional] = [-2, +50]
NNUE evaluation        +0.13 (white side)
```
maybe maybe
(raw eval is 48)

stray reef Nov 14, 2025, 10:23 PM

#

that looks quite reasonable now

rocky vigil Nov 14, 2025, 10:23 PM

#

psqt + positional

#

idk how big quant error was supposed to be

stray reef Nov 14, 2025, 10:24 PM

#

48 is still alright given true eval is 16

rocky vigil Nov 14, 2025, 10:24 PM

#

do you have startpos eval as well

stray reef Nov 14, 2025, 10:24 PM

#

45

rocky vigil Nov 14, 2025, 10:25 PM

#

```
eval
info string NNUE evaluation using nn-2cc242fcab84.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

WHITE added: 22208 21953 22082 22339 22468 22085 21958 22215 21832 21833 21834 21835 21836 21837 21838 21839 21936 21937 21938 21939 21940 21941 21942 21943 22328 22073 22202 22459 22524 22205 22078 22335
removed:
BLACK added: 22328 22073 22202 22459 22524 22205 22078 22335 21936 21937 21938 21939 21940 21941 21942 21943 21832 21833 21834 21835 21836 21837 21838 21839 22208 21953 22082 22339 22468 22085 21958 22215
removed:
L1 (first 128): [0 7 0 0 0 30 0 6 2 0 0 2 16 7 0 5 6 0 1 2 0 1 5 49 0 0 4 5 0 19 0 0 4 0 9 9 0 1 5 3 54 9 3 5 0 0 4 0 54 3 0 12 2 1 0 5 6 8 0 0 24 10 1 0 9 33 0 3 0 6 0 0 0 8 0 0 0 0 0 6 9 4 0 0 0 1 0 24 0 18 21 0 1 0 0 0 6 0 3 8 5 0 0 1 4 0 22 30 0 30 37 0 0 0 0 31 0 0 1 7 27 2 22 0 5 0 0 0 ]
L2: [5706 4058 1665 3009 -7774 7520 7305 -8251 -14491 -8684 -5017 6628 7127 -4286 6504 (-850)]
L2 CReLU(x^2): [62 31 5 17 115 107 101 127 127 127 48 83 96 35 80 ]
L2 CReLU(x): [89 63 26 47 0 117 114 0 0 0 0 103 111 0 101 ]
L3: [-1294 6294 6987 5821 4174 -3564 3387 -3172 7346 1499 -3768 -1403 -4705 1062 4005 6366 3391 1700 -6845 3495 4704 -4917 1691 -4078 2004 -1570 1976 3884 473 -1350 -3770 -1540 ]
L3 CReLU(x): [0 98 109 90 65 0 52 0 114 23 0 0 0 16 62 99 52 26 0 54 73 0 26 0 31 0 30 60 7 0 0 0 ]
[normal, skip] = [1419 -1003]


[psqt, positional] = [+0, +26]
NNUE evaluation        +0.07 (white side)
```

#

26

#

raw

#

reasonable

stray reef Nov 14, 2025, 10:25 PM

#

sick

rocky vigil Nov 14, 2025, 10:25 PM

#

sss

#

maybe I comment out debug info and see if the pv for startpos makes any sense?

#

lemme try that

#

O YES IT LOOKS LIKE CHESS

📎 message.txt

#

not good chess

#

but still chess

stray reef Nov 14, 2025, 10:33 PM

#

hell yeah!

#

i'll sleep now, and then we can try tomorrow or so to re-integrate the other features

rocky vigil Nov 14, 2025, 10:35 PM

#

o

#

it beat a 2400 ish CCRL blitz engine

#

https://lichess.org/lYD5225V

lichess.org

Classical Chess • Stockfish (1SB net) vs Wilted 3/18/25

Stockfish (1SB net) played Wilted 3/18/25 in a casual imported game of chess. Stockfish (1SB net) won after 47 moves.

#

strangest game ever

daring wren Nov 14, 2025, 10:40 PM

#

rocky vigil https://lichess.org/lYD5225V

@sage stream Test be like: NNUE trained on 1 SB and a firm handshake of selfgen games and the WDL is: 🤷‍♂️. We train HCEs for longer.

sage stream Nov 14, 2025, 10:40 PM

#

daring wren @sage stream Test be like: NNUE trained on 1 SB and a firm handshake of...

Holy pull

#

Wait where is the original

daring wren Nov 14, 2025, 10:42 PM

#

sage stream Wait where is the original

it's in #absolute-shashin

rocky vigil Nov 14, 2025, 11:11 PM

#

Btw I think if eg @violet badger wants to try a longer training test https://github.com/Yoshie2000/sf-bullet-train/tree/fix-inputs

rocky vigil Nov 15, 2025, 8:27 AM

#

also https://github.com/Yoshie2000/sf-bullet-train/blob/fix-inputs/src/main.rs#L221 I think these two should be 600 also

violet badger Nov 15, 2025, 8:31 AM

#

I'm happy to try to run for a little longer. Two quick questions, how to provide multiple binpacks as input, and how to setup multiGPU training.

rocky vigil Nov 15, 2025, 8:33 AM

#

violet badger I'm happy to try to run for a little longer. Two quick questions, how to provide...

sounds like for @formal smelt to answer

rocky vigil Nov 15, 2025, 10:12 AM

#

stray reef i'll sleep now, and then we can try tomorrow or so to re-integrate the other fea...

how should we proceed on this, try a 100 SB real run first to see if it becomes somewhat strong, or integrate everything first with 1SB sanity checks

violet badger Nov 15, 2025, 10:16 AM

#

I can quickly run 100SB right now, and we see where we stand? Would be faster multiGPU, but starting now is probably even faster 😉

rocky vigil Nov 15, 2025, 10:16 AM

#

ok

#

i guess single gpu single binpack

#

should be relatively fast

violet badger Nov 15, 2025, 10:16 AM

#

yeah

stray reef Nov 15, 2025, 10:21 AM

#

I'll simultaneously try to fix the factoriser, until we see that it still produces reasonable results

rocky vigil Nov 15, 2025, 10:21 AM

#

cool

#

maybe take this chance to see later if the l2 factoriser is useful at all

stray reef Nov 15, 2025, 10:34 AM

#

Factorised l0w + pst, they should be merged correctly now
https://1drv.ms/u/c/74d39b59afff2586/IQCnZceEZ0E3Q7NguQ0l5oQRAStllnSKwUZkdtauv_n3l2o?e=NROEo7

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 -> 43
r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9 -> 57

violet badger Nov 15, 2025, 10:38 AM

#

you're faster than me installing rust ...

#

(ok, trying to figure out how to do it correctly in the container environment that I'm using, but well, no excuses)

rocky vigil Nov 15, 2025, 10:40 AM

#

stray reef Factorised l0w + pst, they should be merged correctly now <https://1drv.ms/u/c/7...

is this with 600.0 change?

#

or do I need to multiply these by 1.5

stray reef Nov 15, 2025, 10:40 AM

#

violet badger (ok, trying to figure out how to do it correctly in the container environment th...

the good news for you is that i did not train 100 SBs in these 13 minutes xD

stray reef Nov 15, 2025, 10:40 AM

#

rocky vigil or do I need to multiply these by 1.5

ah it's still 400 on my side

#

changed it now for future evals
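
The 1.5 factor asked about above is just the ratio of the two eval scales (600 / 400). A minimal sketch, assuming the trainer's printed eval is linear in the scale constant; `rescale_eval` is a hypothetical helper for illustration and is not part of bullet or Stockfish:

```python
def rescale_eval(eval_cp: float, old_scale: float, new_scale: float) -> float:
    """Convert an eval printed with old_scale to its value under new_scale.

    Assumption: the printed value is (network output) * scale, so it is
    linear in the scale constant.
    """
    return eval_cp * new_scale / old_scale


# Evals reported in the thread at scale 400, converted to scale 600.
for reported in (43, 57):
    print(reported, "->", rescale_eval(reported, 400.0, 600.0))
```

Under that assumption, 43 and 57 at scale 400 correspond to 64.5 and 85.5 at scale 600.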

rocky vigil Nov 15, 2025, 10:45 AM

#

eval
info string NNUE evaluation using nn-81c082405712.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

L1 (first 128): [0 35 9 7 0 0 28 0 0 7 0 2 28 0 2 36 0 12 3 31 31 28 1 56 11 0 0 3 0 5 0 40 16 55 14 19 38 0 46 19 14 0 0 28 0 52 0 39 49 0 4 0 0 21 26 30 0 0 9 0 0 0 22 48 0 1 37 0 19 0 1 23 21 0 8 0 8 11 0 24 10 33 0 44 0 2 43 0 19 35 0 17 0 34 62 0 48 6 0 0 17 24 8 0 19 27 0 8 0 7 41 17 0 1 66 22 17 4 3 0 34 7 23 0 45 0 26 0 ]
L2: [-782 2913 5915 1645 -14149 24231 2753 5418 3241 -4086 3977 8455 -8888 4432 -2446 (-2630)]
L2 CReLU(x^2): [1 16 66 5 127 127 14 55 20 31 30 127 127 37 11 ]
L2 CReLU(x): [0 45 92 25 0 127 43 84 50 0 62 127 0 69 0 ]
L3: [3290 3317 4467 -5288 -1449 -531 2395 -2337 -1085 -4165 -4409 -4177 707 3713 2600 2935 5039 2861 3123 2263 -2477 3885 -8094 -4863 3442 -3560 4140 -3969 -574 2406 -1194 -2577 ]
L3 CReLU(x): [51 51 69 0 0 0 37 0 0 0 0 0 11 58 40 45 78 44 48 35 0 60 0 0 53 0 64 0 0 37 0 0 ]
[normal, skip] = [4210 -3106]


[psqt, positional] = [+0, +69]
NNUE evaluation        +0.18 (white side)

#

raw startpos is 69

#

position fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
eval
info string NNUE evaluation using nn-81c082405712.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.

L1 (first 128): [0 34 7 5 0 0 21 0 0 8 0 6 12 0 0 33 0 5 3 50 14 13 0 32 6 0 0 0 0 0 0 28 2 41 15 23 17 0 38 15 20 0 0 30 0 62 0 32 34 0 0 0 0 20 27 29 1 0 12 3 0 0 23 23 0 0 42 0 19 0 2 16 0 6 13 0 4 8 0 19 5 36 0 25 0 0 23 0 13 26 0 14 0 36 55 0 45 8 0 0 19 33 3 0 10 8 0 9 0 3 46 24 1 0 67 19 22 4 0 0 25 0 18 0 35 0 16 0 ]
L2: [-3180 8434 -8349 -4061 8049 114 -1044 1497 -3472 -5565 459 4286 1974 -3644 2177 (-779)]
L2 CReLU(x^2): [19 127 127 31 123 0 2 4 22 59 0 35 7 25 9 ]
L2 CReLU(x): [0 127 0 0 125 1 0 23 0 0 7 66 30 0 34 ]
L3: [-4048 1024 -3380 1544 3979 -909 3072 452 5896 602 1618 -1936 2329 3109 -1840 -1814 -149 4150 -2820 1299 3614 2346 1136 4138 -419 -2696 2506 1320 -1162 -2180 5403 1449 ]
L3 CReLU(x): [0 16 0 24 62 0 48 7 92 9 25 0 36 48 0 0 0 64 0 20 56 36 17 64 0 0 39 20 0 0 84 22 ]
[normal, skip] = [2422 -920]


[psqt, positional] = [-11, +93]
NNUE evaluation        +0.22 (white side)

raw is 82

#

looks good

stray reef Nov 15, 2025, 10:48 AM

#

amazing

violet badger Nov 15, 2025, 10:49 AM

#

if looks good, please push, and I'll start from that.

rocky vigil Nov 15, 2025, 10:49 AM

#

"remove factoriser" "add factoriser again" lol

stray reef Nov 15, 2025, 10:50 AM

#

violet badger if looks good, please push, and I'll start from that.

done

#

shall we try the l1 factoriser as well?

rocky vigil Nov 15, 2025, 10:52 AM

#

perhaps

#

if rust installation is taking a while

#

might as well

violet badger Nov 15, 2025, 10:53 AM

#

# test bullet
git clone https://github.com/Yoshie2000/sf-bullet-train.git
cd sf-bullet-train
git checkout fix-inputs
# edit src/main.rs file_path
cargo run --release .

#

that's the procedure right?

#

(like manual edit of main.rs needed)

rocky vigil Nov 15, 2025, 10:53 AM

#

where are the datasets being loaded

violet badger Nov 15, 2025, 10:53 AM

#

near line 202 in main.rs?

rocky vigil Nov 15, 2025, 10:54 AM

#

ah

#

i suppose that needs to be changed

#

other than that i think this is good

violet badger Nov 15, 2025, 10:54 AM

#

sure that's the comment in the procedure above.

stray reef Nov 15, 2025, 10:55 AM

#

also need to adjust SB count in line 185

violet badger Nov 15, 2025, 10:55 AM

#

okay

rocky vigil Nov 15, 2025, 10:55 AM

#

btw yoshie how is speed

#

of training

stray reef Nov 15, 2025, 10:56 AM

#

hard to say right now, i'm training another net already :P but i remember roughly 800k pos/s from yesterday

stray reef Nov 15, 2025, 10:56 AM

#

violet badger okay

(some multiple of 60 would make sense, since that suits the LR schedule, but you can change the step there too of course)
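
To see why a multiple of 60 suits the schedule, here is a small sketch of a step-decay learning rate using the values from the training preamble posted in this thread (start 0.001, gamma 0.3, drop every 60 superbatches). The exact superbatch at which the drop takes effect is an assumption here, and `step_lr` is a hypothetical helper, not bullet's scheduler:

```python
def step_lr(superbatch: int, start: float = 0.001, gamma: float = 0.3, step: int = 60) -> float:
    """Learning rate for a 1-indexed superbatch under step decay.

    Assumption: the rate is multiplied by gamma after every `step` completed
    superbatches, so superbatches 1..60 use `start` and 61..120 use
    start * gamma.
    """
    drops = (superbatch - 1) // step
    return start * gamma ** drops


# With 120 superbatches both stages run for a full 60 superbatches;
# with 100 the second stage would be cut short after 40.
for end in (100, 120):
    in_last_stage = (end - 1) % 60 + 1
    print(f"end={end}: final lr={step_lr(end):.6f}, superbatches in last stage={in_last_stage}")
```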

violet badger Nov 15, 2025, 10:57 AM

#

will make it 120

stray reef Nov 15, 2025, 10:58 AM

#

There is a speedup still when using a factoriser, though I won't implement it now as it does not work with threat inputs (at least I haven't found a way yet)

violet badger Nov 15, 2025, 11:04 AM

#

annoyingly the install is still not correct... somehow being installed as root, and starting the container as non-root. So, I reinstall when entering the container right now. Should figure that out eventually.

#

That's one SB

Params: 72156296
Training Preamble
Net Name               : test
Batch Size             : 16384
Batches / Superbatch   : 1024
Positions / Superbatch : 16777216
Start Superbatch       : 1
End Superbatch         : 1
Eval Scale             : 600
Save Rate              : 150
WDL Scheduler          : constant 0
LR Scheduler           : start 0.001 gamma 0.3 drop every 60 superbatches
Threads                : 4
Output Path            : checkpoints
Beginning Training
superbatch 1 | time 9.6s | running loss 0.013042 | 1738602 pos/sec | total time 11.3s
Estimated time remaining in training: 0h 0m 0s
Saved [test-1]
Total Training Time: 0h 0m 13s
Eval: 44.568cp
Eval: -31.222cp

#

looks OK?

stray reef Nov 15, 2025, 11:05 AM

#

seems fine yes

#

#engines-dev message
some info from jw on multigpu

violet badger Nov 15, 2025, 11:06 AM

#

ok, let me try that.

rocky vigil Nov 15, 2025, 11:06 AM

#

violet badger That's one SB ``` Params: 72156296 Training Preamble Net Name : te...

seems to be slightly faster than nnue-pytorch? translates to 105 its/sec
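
The its/sec figure follows from dividing the reported positions per second by the batch size. A quick sketch with the numbers from the one-superbatch log above (`iterations_per_second` is just an illustrative helper):

```python
def iterations_per_second(positions_per_second: float, batch_size: int) -> float:
    """Number of optimiser steps per second for a given throughput."""
    return positions_per_second / batch_size


# 1738602 pos/sec at batch size 16384, as reported in the log.
print(round(iterations_per_second(1_738_602, 16_384), 1))
```

This comes out at roughly 106 its/sec, in line with the ~105 quoted.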

violet badger Nov 15, 2025, 11:07 AM

#

yeah, though skipping and such is quite different, but certainly looks good.

rocky vigil Nov 15, 2025, 11:08 AM

#

why has

#

a superbatch been reduced

#

btw

#

to 1024 batches

#

and not the standard 6104
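
For scale, the two superbatch definitions differ by almost a factor of six in positions seen. A small sketch of the arithmetic, using the batch size from the training preamble (the helper name is illustrative only):

```python
def positions_per_superbatch(batch_size: int, batches: int) -> int:
    """Total training positions consumed by one superbatch."""
    return batch_size * batches


small = positions_per_superbatch(16_384, 1_024)
standard = positions_per_superbatch(16_384, 6_104)
print(small, standard, round(standard / small, 2))
```

The first value matches the "Positions / Superbatch" line in the log (16777216); the second is about 100 million positions.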

stray reef Nov 15, 2025, 11:09 AM

#

ah good point, we should change that

#

https://1drv.ms/u/c/74d39b59afff2586/IQAS-Qe0N5FqQpNQDZeaiMN8AXzVLXGiwHLXbQ_H34_XxBo?e=yuTeGP 1 SB with l1 factoriser, evals 66 and 112 (with 600 scale)

rocky vigil Nov 15, 2025, 11:19 AM

#

getting -91 and -28

#

eval
L1 (first 128): [22 0 0 0 2 0 0 0 0 0 0 6 0 0 0 0 5 0 0 0 2 0 2 1 1 3 1 0 0 0 18 0 4 0 14 0 13 0 0 9 0 5 0 0 0 6 0 3 0 0 2 0 0 2 0 0 2 8 0 0 5 0 0 0 0 0 0 7 7 13 3 8 17 3 0 0 0 0 3 1 7 0 0 0 0 0 11 0 0 3 0 6 0 7 0 20 0 15 0 0 0 0 8 0 4 0 4 9 2 1 0 0 4 0 11 1 0 0 0 0 0 0 1 0 0 0 0 0 ]
L2: [7211 -1613 4893 -1900 -12473 7297 9595 1191 14570 3115 -3398 6660 13664 5785 -8917 (-1121)]
L2 CReLU(x^2): [99 4 45 6 127 101 127 2 127 18 22 84 127 63 127 ]
L2 CReLU(x): [112 0 76 0 0 114 127 18 127 48 0 104 127 90 0 ]
L3: [3586 1415 5605 -6626 -5444 -371 3739 -3533 1561 4141 3812 -4530 -5525 -3268 4107 -4484 -3925 -5060 1606 -5402 -4427 -6424 -5318 7747 -10300 2550 -6464 1889 -7744 -1817 -1362 -1545 ]
L3 CReLU(x): [56 22 87 0 0 0 58 0 24 64 59 0 0 0 64 0 0 0 25 0 0 0 0 121 0 39 0 29 0 0 0 0 ]
[normal, skip] = [-132 -1324]
[psqt, positional] = [+0, -91]
NNUE evaluation        -0.24 (white side)

ucinewgame
position fen r1bq1rk1/ppppbppp/3n4/4R3/8/8/PPPP1PPP/RNBQ1BK1 w - - 1 9
eval
L1 (first 128): [12 0 0 0 1 0 0 1 0 0 0 6 0 1 0 0 11 0 10 0 6 0 11 0 0 6 15 0 0 11 0 0 7 0 4 0 16 0 0 15 0 2 0 0 0 8 0 0 0 1 3 0 0 4 0 8 8 0 0 5 3 0 0 0 0 3 1 8 2 10 2 4 12 4 0 13 0 0 3 0 5 0 0 0 0 0 0 0 0 0 0 14 0 4 0 2 0 0 0 0 0 0 3 0 0 3 1 7 0 1 2 3 9 0 6 2 0 0 0 0 0 0 5 0 0 0 2 0 ]
L2: [17849 -3002 -2770 3869 6234 -11711 -21902 15147 6015 -617 -3107 15343 2215 -3799 -9219 (226)]
L2 CReLU(x^2): [127 17 14 28 74 127 127 127 69 0 18 127 9 27 127 ]
L2 CReLU(x): [127 0 0 60 97 0 0 127 93 0 0 127 34 0 0 ]
L3: [-2937 1608 -1563 857 3589 3871 -6501 1365 -4169 2560 -5220 2119 -8509 -5111 2558 4084 -2520 -6030 2079 -7494 1958 617 2366 -633 -4361 3530 -171 -1299 3694 -4122 2150 -2276 ]
L3 CReLU(x): [0 25 0 13 56 60 0 21 0 40 0 33 0 0 39 63 0 0 32 0 30 9 36 0 0 55 0 0 57 0 33 0 ]
[normal, skip] = [-722 266]
[psqt, positional] = [+0, -28]
NNUE evaluation        -0.07 (white side)

rocky vigil Nov 15, 2025, 11:30 AM

#

stray reef <https://1drv.ms/u/c/74d39b59afff2586/IQAS-Qe0N5FqQpNQDZeaiMN8AXzVLXGiwHLXbQ_H34...

this looks kinda far away now

stray reef Nov 15, 2025, 11:34 AM

#

Try this one, removing the unnecessary set of factorised biases
https://1drv.ms/u/c/74d39b59afff2586/IQDNu6vvsBtFQYC3VXUmTjdFAQPXJv4yRVVwykKmPJQSpjY?e=M08nTv 66, 82

#

don't think it's going to work, i think i need to transpose before merging, which i'll try next

rocky vigil Nov 15, 2025, 11:37 AM

#

stray reef don't think it's going to work, i think i need to transpose before merging, whic...

yeah, it thinks raw evals are 305 and 160

stray reef Nov 15, 2025, 11:41 AM

#

https://1drv.ms/u/c/74d39b59afff2586/IQDryNfwSlpfR5R5wT8YNutwAYF1pzJi1DEYHAHLJUpYq5k?e=FMLZO9 90, 93

rocky vigil Nov 15, 2025, 11:43 AM

#

-33 and +232

stray reef Nov 15, 2025, 11:48 AM

#

ah ofc the default .transpose() does not work with this type of factorisation

rocky vigil Nov 15, 2025, 11:49 AM

#

maybe need to define some custom stuff

#

ye

stray reef Nov 15, 2025, 11:52 AM

#

https://1drv.ms/u/c/74d39b59afff2586/IQCYRgNuWSFgT4NhwCD_qVwUAdlVV2hlyHyD16-Sx2C_iUU?e=OUnVBH 23, 33

rocky vigil Nov 15, 2025, 11:53 AM

#

14th sfnnv9 net attempt lel

stray reef Nov 15, 2025, 11:53 AM

#

wait no this is bullshit. transposing worked without the factoriser

rocky vigil Nov 15, 2025, 11:54 AM

#

startpos +360 yeah

violet badger Nov 15, 2025, 11:55 AM

#

I have the output from the 120SB:

Estimated time remaining in training: 0h 0m 8s
superbatch 120 | time 8.6s | running loss 0.001450 | 1941046 pos/sec | total time 1046.2s
Estimated time remaining in training: 0h 0m 0s
Failed to write quantised network weights:
Failed quantisation from f32 to i8!
Saved [test-120]
Total Training Time: 0h 17m 28s
Eval: 73.723cp
Eval: 90.723cp

rocky vigil Nov 15, 2025, 11:55 AM

#

huh

violet badger Nov 15, 2025, 11:55 AM

#

(at re-add working ft factoriser)

rocky vigil Nov 15, 2025, 11:55 AM

#

failed quantisation from f32 to i8

stray reef Nov 15, 2025, 11:55 AM

#

Failed to write quantised network weights:
Failed quantisation from f32 to i8!
what!

rocky vigil Nov 15, 2025, 11:55 AM

#

something exceeds the weight limit?

stray reef Nov 15, 2025, 11:56 AM

#

looks like it

rocky vigil Nov 15, 2025, 11:57 AM

#

must be in this one

#

1.98 doesn't work for this

#

needs to be 1.68

stray reef Nov 15, 2025, 12:00 PM

#

pushed a fix for that

#

are we sure nnue-pytorch has this l1 factoriser?

#

@violet badger if you integrate the latest commit, you should be able to retry the quantisation simply by doing

-trainer.run(&schedule, &settings, &data_loader);
+// trainer.run(&schedule, &settings, &data_loader);
+trainer.load_from_checkpoint("checkpoints/test-1");
+trainer.save_quantised("checkpoints/test-1/quantised.bin").unwrap();

rocky vigil Nov 15, 2025, 12:03 PM

#

stray reef are we sure nnue-pytorch has this l1 factoriser?

#

suspicious

#

i would like to test if removing it helps

#

eventually

rocky vigil Nov 15, 2025, 12:06 PM

#

stray reef @violet badger if you integrate the latest commit, you should be able to ...

won't the force clipping change the evals

stray reef Nov 15, 2025, 12:06 PM

#

ah true. forget what i said

violet badger Nov 15, 2025, 12:09 PM

#

stray reef @violet badger if you integrate the latest commit, you should be able to ...

If no pilot error:

Params: 72156296

thread 'main' (293488) panicked at src/main.rs:228:66:
called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: "Failed quantisation from f32 to i8!" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)

stray reef Nov 15, 2025, 12:09 PM

#

yeah sorry, there was no way that'd work :P

#

gotta retrain unfortunately

violet badger Nov 15, 2025, 12:10 PM

#

no problem.

rocky vigil Nov 15, 2025, 12:13 PM

#

rocky vigil i would like to test if removing it helps

like the l1 factorizer is strange, because then you need to halve the weight limit

#

and it's already heavily quantized
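
The point about halving the limit can be illustrated with a small quantisation check. This is a sketch under an assumption: a quantisation factor of 64 for this layer, which is not stated in the thread (it is only suggested by the 1.98 clip being close to 127 / 64). The helper names are hypothetical, and the sketch does not attempt to derive the 1.68 figure mentioned earlier:

```python
I8_MIN, I8_MAX = -128, 127
QUANT_FACTOR = 64  # assumed quantisation factor, not confirmed by the thread


def max_representable_weight(quant_factor: int = QUANT_FACTOR) -> float:
    """Largest float weight that still maps to a valid i8 after scaling."""
    return I8_MAX / quant_factor


def fits_i8(weight: float, quant_factor: int = QUANT_FACTOR) -> bool:
    """True if round(weight * quant_factor) is representable as i8."""
    return I8_MIN <= round(weight * quant_factor) <= I8_MAX


clip = 1.98
print(max_representable_weight())    # upper bound for a single weight
print(fits_i8(clip))                 # one weight at the clip limit
print(fits_i8(clip + clip))          # weight + factoriser, both at the limit
print(fits_i8(clip / 2 + clip / 2))  # both clipped to half the limit
```

If the main weights and the factoriser are clipped independently and then summed at merge time, the sum can be twice the clip value, which is why each part would need half the limit for the merged weights to stay inside i8.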

rocky vigil Nov 15, 2025, 12:18 PM

#

stray reef wait no this is bullshit. transposing worked without the factoriser

i think merging is supposed to be done before transposing anyways

stray reef Nov 15, 2025, 12:26 PM

#

i don't think so, transposing puts it from [l1][ob][l2] to [ob][l1][l2], and the factoriser should have layout [l1][l2], so the standard fact.repeat(bucketcount) -> elementwise add should work. but it doesn't
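
The layout reasoning above can be written out on toy data. This is only an illustration of the intended transpose-then-merge order with plain nested lists; it is not bullet's code, the function names are hypothetical, and the thread reports that the real merge still did not produce matching evals:

```python
def transpose_to_bucket_major(weights):
    """Reorder [l1][ob][l2] weights to [ob][l1][l2]."""
    l1, ob = len(weights), len(weights[0])
    return [[weights[i][b] for i in range(l1)] for b in range(ob)]


def merge_factoriser(bucketed, factoriser):
    """Add a shared [l1][l2] factoriser to every bucket of [ob][l1][l2] weights."""
    return [
        [
            [w + f for w, f in zip(w_row, f_row)]
            for w_row, f_row in zip(bucket, factoriser)
        ]
        for bucket in bucketed
    ]


# Toy sizes: l1 = 2, ob = 2, l2 = 2.
weights = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # [l1][ob][l2]
factoriser = [[10, 20], [30, 40]]               # [l1][l2]

merged = merge_factoriser(transpose_to_bucket_major(weights), factoriser)
print(merged)
```

Each bucket receives the same factoriser values, which is the "repeat per bucket, then element-wise add" step described in the message.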

rocky vigil Nov 15, 2025, 12:29 PM

#

oh

#

right

stray reef Nov 15, 2025, 12:38 PM

#

idk i kind of want to try again, maybe i did something wrong last time https://1drv.ms/u/c/74d39b59afff2586/IQBGzgPeP_bVQKONuJm-6Dn7ARiZjXqldoUFCTSplSw0Cvc?e=mIz1D4 evals 59, 43

rocky vigil Nov 15, 2025, 12:42 PM

#

+5, +122

stray reef Nov 15, 2025, 12:45 PM

#

i don't know then. let's skip it for now i guess

#

if anything, it'll be low single-digit elo anyway

rocky vigil Nov 15, 2025, 12:46 PM

#

rocky vigil like the l1 factorizer is strange, because then you need to half the weight limi...

or this

#

idk if it's even good

#

like l1 -> l2 is way smaller

#

than inputs -> l1

#

do we have a second run

#

with the fixed clipping

stray reef Nov 15, 2025, 12:48 PM

#

rocky vigil or this

there's a better way by using clip_pass_through_grad on l1w+l1f

#

but since quantisation did not fail, everything fits into i8

rocky vigil Nov 15, 2025, 12:48 PM

#

oh

violet badger Nov 15, 2025, 2:20 PM

#

My local run (but multiGPU) ended with

Saved [test-120]
Total Training Time: 0h 14m 59s

thread 'main' (27889) panicked at /users/vjoost/.cargo/git/checkouts/bullet-8a69ed9a26c6f599/e37db79/crates/bullet_lib/src/value.rs:245:18:
Invalid output size!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

probably better to not mix both things though.

formal smelt Nov 15, 2025, 2:23 PM

#

It’s due to calling .eval

#

Not related to training

violet badger Nov 15, 2025, 2:24 PM

#

in the multiGPU context?

formal smelt Nov 15, 2025, 2:24 PM

#

Yeah

violet badger Nov 15, 2025, 2:24 PM

#

okay.

#

but I guess that means that at least checkpoint saved correctly, so the failed to quantise error went away.

rocky vigil Nov 15, 2025, 2:28 PM

#

I guess just load the checkpoint and run eval using single gpu

#

(I cannot run this with the debug data right now, but you can check the normalized eval by loading the network into past master)

violet badger Nov 15, 2025, 2:33 PM

#

I'll run afterwards with 1 GPU... first some testing multiGPU.

rocky vigil Nov 15, 2025, 2:33 PM

#

How fast is it?

#

Are these 16384 * 1024 superbatches or 16384 * 6104

violet badger Nov 15, 2025, 2:34 PM

#

I'm using 1024 right now

#

speed #1439214470529421384 message

rocky vigil Nov 15, 2025, 2:43 PM

#

Ah ok

violet badger Nov 15, 2025, 3:43 PM

#

so, started 120 SB of 6104 * 16384... should take about 2h

#

I used 1GPU, so that allows for speed comparison, ultimately not too different (about 50s bullet vs 67s pytorch per SB/epoch), assuming we're doing roughly the same thing now.

rocky vigil Nov 15, 2025, 3:47 PM

#

Interesting

twilit oriole Nov 15, 2025, 3:53 PM

#

That is probably because the default batch size is too low for threat inputs + high end GPU

violet badger Nov 15, 2025, 3:54 PM

#

see #1439214470529421384 message

#

but yeah, I'll run an experiment now on nnue-pytorch to see what a changed batch size does to training.

formal smelt Nov 15, 2025, 4:03 PM

#

Is this on old inputs?

violet badger Nov 15, 2025, 4:04 PM

#

good question, the bullet is on old inputs, the pytorch number I quoted is probably on threats, though it wasn't too different.

formal smelt Nov 15, 2025, 4:05 PM

#

But different HL size also then right

violet badger Nov 15, 2025, 4:06 PM

#

yeah, old arch on pytorch is 72s instead of 67s

#

and I think this current bullet training is setup to match the old arch..

formal smelt Nov 15, 2025, 4:07 PM

#

Yeah

#

Almost 50% seems pretty good
And also if both had the better factoriser code the gap would widen I think
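
One reading of the "almost 50%" figure is the ratio of the per-superbatch times quoted above (about 50s for bullet versus 72s for the old arch in nnue-pytorch). A small sketch of that arithmetic, with illustrative helper names:

```python
def speedup_percent(baseline_seconds: float, new_seconds: float) -> float:
    """Throughput gain of the faster run relative to the baseline, in percent."""
    return (baseline_seconds / new_seconds - 1.0) * 100.0


def total_time_hms(seconds_per_superbatch: float, superbatches: int):
    """Estimated wall time for a run as (hours, minutes, seconds)."""
    total = round(seconds_per_superbatch * superbatches)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return hours, minutes, seconds


print(round(speedup_percent(72, 50), 1))  # old arch on pytorch vs bullet
print(round(speedup_percent(67, 50), 1))  # threat arch on pytorch vs bullet
print(total_time_hms(50, 120))            # 120 superbatches at ~50s each
```

At about 50s per superbatch, 120 superbatches come to roughly 1h 40m, consistent with the "about 2h" estimate given for the long run.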

violet badger Nov 15, 2025, 4:08 PM

#

absolutely ....

#

(I know how much effort one might put in just 5% for e.g. megatron/LLM training).

formal smelt Nov 15, 2025, 4:09 PM

#

I still think if someone was feeling cute they should just write a fused ft/l1 kernel for nnue-pytorch given the arch seems pretty much fixed

#

Would make a really big difference

violet badger Nov 15, 2025, 4:10 PM

#

I hope someone will pick it up... @frosty imp was refactoring recently... so maybe

rocky vigil Nov 15, 2025, 4:19 PM

#

Did we verify that the produced networks are reasonably strong

violet badger Nov 15, 2025, 4:19 PM

#

I should have this 120SB trained network in a bit, that should give an idea.

#

I guess that could be within say 100-200Elo of master?

rocky vigil Nov 15, 2025, 4:20 PM

#

Yes

#

Should be around there

green moat Nov 15, 2025, 4:40 PM

#

@violet badger
Script failure?
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12100414422
🤨

violet badger Nov 15, 2025, 4:47 PM

#

ikr..

violet badger Nov 15, 2025, 5:03 PM

#

so threat net with 64k batch trains significantly faster (about 35s per SB/epoch)

rocky vigil Nov 15, 2025, 5:03 PM

#

with nnue-pytorch?

violet badger Nov 15, 2025, 5:03 PM

#

yeah

rocky vigil Nov 15, 2025, 5:03 PM

#

ah nice

#

let's see how it affects elo

violet badger Nov 15, 2025, 5:04 PM

#

well, we still need to see that this is Elo-impact free, right

#

it also increases for whatever reason memory usage on CPU side.

#

probably each of the workers having a buffer that is proportional.

#

OK, longer train ended.

rocky vigil Nov 15, 2025, 5:06 PM

#

ah nice

violet badger Nov 15, 2025, 5:06 PM

#

Total Training Time: 1h 34m 23s
Eval: 86.946cp
Eval: 79.768cp

rocky vigil Nov 15, 2025, 5:06 PM

#

I can test

violet badger Nov 15, 2025, 5:07 PM

#

what data do you need

rocky vigil Nov 15, 2025, 5:07 PM

#

can you just load it into an old sf

#

and check the evals

#

in those two positions

#

(after running disservin converter script)

violet badger Nov 15, 2025, 5:08 PM

#

from 'quantised.bin' ?

rocky vigil Nov 15, 2025, 5:08 PM

#

yea

violet badger Nov 15, 2025, 5:09 PM

#

$ python convert_quantised_to_pytorch.py checkpoints/test-120/quantised.bin test.nnue
Read checkpoints/test-120/quantised.bin successfully.
Organized data into 8 buckets.
Writing to test.nnue...
Ending position for bucket 0: 70487760
Bucket 0 size: 1152 bytes
Ending position for bucket 1: 70538168
Bucket 1 size: 1152 bytes
Ending position for bucket 2: 70588576
Bucket 2 size: 1152 bytes
Ending position for bucket 3: 70638984
Bucket 3 size: 1152 bytes
Ending position for bucket 4: 70689392
Bucket 4 size: 1152 bytes
Ending position for bucket 5: 70739800
Bucket 5 size: 1152 bytes
Ending position for bucket 6: 70790208
Bucket 6 size: 1152 bytes
Ending position for bucket 7: 70840616
Bucket 7 size: 1152 bytes
Integer value at position 69389475: 33686908
Conversion complete: checkpoints/test-120/quantised.bin -> test.nnue

#

now, let me build an SF in that container.

#

info depth 30 seldepth 45 multipv 1 score cp 20 nodes 16143880 nps 712691 hashfull 1000 tbhits 0 time 22652 pv e2e4 c7c5 c2c3 d7d5 e4d5 d8d5 d2d4 g8f6 g1f3 b8c6 d4c5 d5c5 b1a3 e7e5 a3b5 c5e7 d1a4 e7d8 f3e5 f8c5 e5c6 b7c6 b5d4 e8g8 f1e2 f8e8 c1e3 f6g4

#

NNUE evaluation        +0.23 (white side)
Final evaluation       +0.31 (white side) [with scaled NNUE, ...]

and

NNUE evaluation        +0.17 (white side)
Final evaluation       +0.22 (white side) [with scaled NNUE, ...]

#

main net is pretty similar

NNUE evaluation        +0.05 (white side)
Final evaluation       +0.07 (white side) [with scaled NNUE, ...]

and

NNUE evaluation        +0.24 (white side)
Final evaluation       +0.31 (white side) [with scaled NNUE, ...]

rocky vigil Nov 15, 2025, 5:17 PM

#

pv and normalized evals look decent

#

normalization constant being around 3.5 or so
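
As a rough cross-check of that constant, the earlier debug output in this thread pairs raw values with displayed evals (raw 69 shown as +0.18, raw 82 shown as +0.22). A sketch that backs out the implied ratio, assuming the displayed value is simply raw divided by a constant; since the display is rounded to 0.01 pawns, the result is only approximate:

```python
def implied_normalization(raw_eval: float, displayed_pawns: float) -> float:
    """Ratio between the raw internal eval and the displayed centipawn value."""
    displayed_cp = displayed_pawns * 100.0
    return raw_eval / displayed_cp


for raw, shown in ((69, 0.18), (82, 0.22)):
    print(raw, shown, round(implied_normalization(raw, shown), 2))
```

Both pairs give a ratio a little under 4, the same order as the figure quoted above.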

violet badger Nov 15, 2025, 5:17 PM

#

let me see if I can start a short match.

rocky vigil Nov 15, 2025, 5:17 PM

#

violet badger ``` info depth 30 seldepth 45 multipv 1 score cp 20 nodes 16143880 nps 712691 ha...

pv looks nice, real chess

violet badger Nov 15, 2025, 5:21 PM

#

looks pretty good..

#

--------------------------------------------------
Results of master vs test (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 149.91 +/- 16.47, nElo: 338.50 +/- 32.92
LOS: 100.00 %, DrawRatio: 25.70 %, PairsRatio: 78.50
Games: 428, Wins: 205, Losses: 31, Draws: 192, Points: 301.0 (70.33 %)
Ptnml(0-2): [0, 2, 55, 138, 19], WL/DD Ratio: 1.12
LLR: 1.10 (37.5%) (-2.94, 2.94) [0.00, 2.00]
--------------------------------------------------

#

I think that works.

#

nice, another good result from this thread 🙂

rocky vigil Nov 15, 2025, 5:24 PM

#

ok so looks like we got basic arch working

#

finally

#

which unlocks testing more things with bullet

stray reef Nov 15, 2025, 5:56 PM

#

nice

candid ivy Nov 15, 2025, 5:57 PM

#

is that a threat input network or normal network test?

stray reef Nov 15, 2025, 5:57 PM

#

pre-threat input arch

#

but it should now be pretty straightforward to get threat inputs working as well

violet badger Nov 15, 2025, 5:59 PM

#

to continue training, can I just 'load_from_checkpoint' and increase end_superbatch in schedule?

twilit oriole Nov 15, 2025, 6:00 PM

#

Assuming no LR schedule?

#

Otherwise you need to change start super batch also

stray reef Nov 15, 2025, 6:00 PM

#

violet badger to continue training, can I just 'load_from_checkpoint' and increase end_superba...

yes, though you also have to increase start_superbatch to whatever SB the checkpoint is from

violet badger Nov 15, 2025, 6:00 PM

#

okay

#

so that would be 121 (i.e. previous end + 1)

twilit oriole Nov 15, 2025, 6:01 PM

#

Also note it will start from the beginning of the dataset again. So this is not ideal

#

I always restart training for this reason. From the beginning

violet badger Nov 15, 2025, 6:02 PM

#

ok, yeah, this is still very early experiment.

#

is there a way to provide multiple binpack and have it interleave them on the fly?

stray reef Nov 15, 2025, 6:03 PM

#

not without a custom dataloader I think

#

though it shouldn't be too hard, one could mix and match existing code, e.g. interleaving exists for viri binpacks in bullet-utils

#

what's more important, that or threat inputs?

violet badger Nov 15, 2025, 6:04 PM

#

I think threat inputs is more fun 🙂

#

(also more relevant on the longer run)

candid ivy Nov 15, 2025, 6:30 PM

#

stray reef pre-threat input arch

~~that python script is no longer needed with that?~~ nvm it seems like it is since theres no leb128

rocky vigil Nov 15, 2025, 7:36 PM

#

stray reef but it should now be pretty straightforward to get threat inputs working as wel...

yeah, you can effectively copy the standard bullet threat input definition

#

and then just tack on the other stuff onto it

#

which is basically how I got it in nnue-pytorch as well

violet badger Nov 15, 2025, 7:48 PM

#

violet badger ``` -------------------------------------------------- Results of master vs test...

240SB:

--------------------------------------------------
Results of master vs test240 (10+0.1, 1t, 16MB, UHO_Lichess_4852_v1.epd):
Elo: 112.26 +/- 9.57, nElo: 235.96 +/- 18.77
LOS: 100.00 %, DrawRatio: 31.91 %, PairsRatio: 13.45
Games: 1316, Wins: 553, Losses: 142, Draws: 621, Points: 863.5 (65.62 %)
Ptnml(0-2): [2, 29, 210, 390, 27], WL/DD Ratio: 1.08
LLR: 2.95 (100.1%) (-2.94, 2.94) [0.00, 2.00]
--------------------------------------------------

#

so, definitely working.
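
The Elo figures in both match reports can be reproduced from the points and game counts with the standard logistic Elo formula. A minimal sketch (`elo_from_score` is an illustrative helper, not taken from the match tool):

```python
import math


def elo_from_score(points: float, games: int) -> float:
    """Elo difference implied by a match score, using the logistic model."""
    score = points / games
    return -400.0 * math.log10(1.0 / score - 1.0)


print(round(elo_from_score(301.0, 428), 2))   # 120SB net: master scored 301.0 / 428
print(round(elo_from_score(863.5, 1316), 2))  # 240SB net: master scored 863.5 / 1316
```

These agree with the reported 149.91 and 112.26 to within rounding, so the 240SB net closes the gap to master by a bit under 40 Elo at this time control.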

green moat Nov 15, 2025, 7:55 PM

#

vondele, what is being trained here?
https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/5137461961076608/2926829081096545/-/jobs/12100493856

violet badger Nov 15, 2025, 8:00 PM

#

a network, just experiment with batch size.. don't worry.

stray reef Nov 15, 2025, 8:03 PM

#

@formal smelt I need some help here, I'm working on integrating threat inputs with the SF PST right now. My idea was to have the input type have the layout
factoriser,halfkav2,threats
so I can modify PST inference like so

-let stm_pst = pst.matmul(stm).select(buckets)
+let pst_slice_end = ThreatInputsBucketsMirrored::FACTORISER_SIZE + ThreatInputsBucketsMirrored::HALFKA_V2_SIZE;
+let stm_pst = pst.matmul(stm.slice_rows(0, pst_slice_end)).select(buckets)

calling slice_rows() like this leads to an error that I'm not sure how to fix: Message("Op(IncorrectDataLayout)")
Any ideas?

#

It looks like this operation may not be allowed on sparse nodes. in which case this will be difficult, or training will be slow

#

Time for dinner tho, WIP code is up on https://github.com/Yoshie2000/sf-bullet-train/tree/threat-inputs

formal smelt Nov 15, 2025, 8:18 PM

#

stray reef It looks like this operation may not be allowed on sparse nodes. in which case t...

Cursed way is to have the full weights and element mul them by a mask lol
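
The mask idea can be shown on toy data: instead of slicing the sparse input, keep weight rows for all inputs and zero the rows that belong to the threat features. This is a plain-Python sketch with made-up sizes and hypothetical helper names; it does not use bullet's API:

```python
def sparse_matmul(weights, active):
    """Sum the weight rows of the active (one-hot) input features."""
    out = [0.0] * len(weights[0])
    for idx in active:
        for k, w in enumerate(weights[idx]):
            out[k] += w
    return out


def mask_rows(weights, keep_until):
    """Zero every weight row at index >= keep_until (an element-wise 0/1 mask)."""
    return [
        [w if i < keep_until else 0.0 for w in row]
        for i, row in enumerate(weights)
    ]


# Hypothetical toy layout: 4 PST-relevant inputs followed by 2 threat inputs.
weights = [[1.0], [2.0], [3.0], [4.0], [50.0], [60.0]]
active = [0, 3, 5]
pst_slice_end = 4

sliced = sparse_matmul(weights, [i for i in active if i < pst_slice_end])
masked = sparse_matmul(mask_rows(weights, pst_slice_end), active)
print(sliced, masked)
```

Both paths give the same output, so the masked full-size weights behave like the sliced version at the cost of storing and multiplying rows that are always zero.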

rocky vigil Nov 16, 2025, 4:38 AM

#

violet badger a network, just experiment with batch size.. don't worry.

doesn't seem like the batch size increase affects strength that much, that's good

#

at least if only applied to one stage

#

could be different if done for all stages

violet badger Nov 16, 2025, 7:49 AM

#

yeah, looks very good to my eyes (strength is equivalent/better). Will now start a full training to verify. That's a bit more tricky. It also means that making sure the DDP in pytorch is working would become very useful. It would imply a 5-stage net trained in a day.

rocky vigil Nov 16, 2025, 8:23 AM

#

violet badger yeah, looks very good to my eyes (strength is equivalent/better). Will now start...

looks like 49c will be the shortest lived master net...

dark stream Nov 16, 2025, 8:37 AM

#

This entire effort was a godsend for the SF net training pipeline.

frosty imp Nov 16, 2025, 8:44 PM

#

Bullet is using dp right?

daring wren Nov 16, 2025, 8:49 PM

#

frosty imp Bullet is using dp right?

dp?

violet badger Nov 16, 2025, 8:53 PM

#

data parallelism, I assume

#

in that case yes

#

#UE Threat Inputs for AB