#UE Threat Inputs for AB
1 messages Β· Page 12 of 1
yep doing that rn!
lol I can't test 32-bit builds locally
is there a way for me to run CI from my branch
ok lemme make sure it works on godbolt 32-bit compilers and then if it does we can just try it?
worst case we revert
except godbolt doesn't have them either 
Yes. When you name your branch github_ci the ci should run on your fork too, I think.
Hmm, at least that's how I understood it. Maybe it also needs to be activated in your repository settings?
Maybe I'm completely wrong, but something like this existed π
I think you need to install the right 32 bit runtime, should be possible on the side.
sudo apt-get install gcc-multilib
or similar
gotcha
ugh why is vmovl_high_s8 not defined on armv7-neon
oh it's 64-bit only?? why
good news tho, seems like my patch fixes x86-32...
will now try to fix armv7--neon
also matetrack... @violet badger any ideas what might be happening there?
what is matetrack anyway π©
matetrack is a script that tracks matefinding ability https://github.com/vondele/matetrack, but it is just the testcase above
gotcha
gotta run.
πββοΈ
oh yeah! Does anyone know how threat inputs performs on matetrack?? Im so curious!
it crashes :P
ok time to download TBs I guess
I was gonna do that anyway at some opint
Whoa! After all the patches and 20+ people working for like half a year or longer, the final code is only about 1000 lines?
Stockfish code is so compact.
quite a few months of the time "spent" on this project was me thinking that increasing L1 would not work
it's not that complex once everything works. though there's ofc some trainer changes as well
of nnue code? or other files?
But why is it only like 6 elo in Stockfish even at VVLTC?
All files combined.
In Yukari, it's like 25 elo.
what else should it be? :P
all files in stockfish total? Really?
Just the change.
20+?
I mean it's more like 15 elo when you take into account that one NN is SPSAed
also you'd expect gains to be smaller when modifying stronger engines right
ohhhh! lol
the engines are so different and so far apart, it's absolutely not comparable. maybe threat inputs work quite well with little data, weak data, or suboptimal training procedures. too many factors to say
also we don't know the speed diff (TI vs. pre-TI) in yukari
comparing a same-sized net in plenty vs. yukari, i found it's a rather slow engine, that might explain some of it
In Monty it's like 40.
Well, in Plentychess... it's less gain, but still.
But wow, Stockfish was fast.
monty is MCTS
info depth 7 seldepth 9 multipv 1 score cp -96 nodes 1268 nps 181142 hashfull 0 tbhits 0 time 7 pv g4e2 d3d4 e2f3 b3d2 f3g2 d2c4 g2f3
syzygy/tbprobe.cpp:1081:22: runtime error: shift exponent 64 is too large for 64-bit type 'value_type' (aka 'unsigned long long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior syzygy/tbprobe.cpp:1081:22
(base) cowpox@Cowpox src %
ok great, can reproduce locally
this is on ARM, so I doubt it's any of the vector code
https://github.com/official-stockfish/Stockfish/actions/runs/19263787581/job/55074630761 also fixed armv7-neon!
so only CI failure now is matetrack (and include what you need whcih should be trivial to fix)
It looks like you're on a zero to one of the valuable contributors speedrun.
Now, do we wait a bit longer for things to settle? When do we launch a search tune?
NNUE weight tune wouldn't be done until we've exhausted the training options, but search params are not trainable anyway, so they can be tuned.
ah can you update your PR with this
i have no idea how we have issue, we didn't touch tb code...
so I guess we're not decrementing pieceCount properly
maybe swap_piece?
but what's weird is the pos_is_ok check is passing...
lemme verify by calling pos_is_ok in tbprobe
oh bc fricking Fast = true
constexpr bool Fast = true; // Quick (default) or full check?
ok this is aura
maybe
that would make more sense actually
also explains why even with Fast = false there's no pos_is_ok fails
I don't rly understand how materialKey works tho ngl
maybe something here?
fk
oh did you use git bisect
no but lemme try reverting this
ok that fixes the TB crash
so there's some subtle issue here
help me find it?
evidently it's somewhere which changes how materialKey is computed...
ohhhh
lol
i think there is something sus
about calling swap though
without the material key updates
sure
put_piece and remove_piece don't update the material key either tho
maybe try printing material keys before and after revert
that might help us understand where they diverge
agree
brb
https://github.com/official-stockfish/Stockfish/pull/6408 you can also try this to make sure I'm not trolling
might have made a mistake
this indeed fixes matetrack
like ??
is going on here
we're so close bro
I'm trying to step through the code in my head for every class of move lol
I think ur right that it might have to do with (under)promotion that's also a capture...
it's just underpromo
i think
no underpromo capture
though that would probably be even worse
oh ok
but it is an underpromo followed by 2 captures
wait
I think pieceCount[captured] is not updated yet at line 785
if it's not an ep
(This is the fix diff)
it has to be that right
ohhh
fun times
ok now let's fix IWYU
maybe in a later PR I'll add a check that materialKey is correct to pos_is_ok...
didn't realize TB depended on its accuracy
yessir
anyways for now if this works I'll just accept it
and hopefully IWYU won't take so long
@rocky vigil u can do this also now the pr is opened. download the file and then edit PR description and there is attach file button there
how close are we on the cleanup/code quality side?
edited
btw I'm just gonna download the pr doc
the "everything" section
unless ppl still have stuff to add to it
yeah ok
so now the PR message looks half decent
we can complete it once everything else is done
great
Given that we have been extremely lucky that matetrack caught the bug, is there a case for including a specific test to the ci that would catch the original bug?
.
materialKey is only used for tablebases fwiw
so it makes sense that we never detected this until now...
yeah but it would've been really sad if matetrack didn't catch it
also additional debug checks in tbprobe.cpp make sense to me
and then we got had in a tcec game...
use pdf for the attachment
bruh is my clang-format outdated
bruh
ok
sorry lol
lol
hm the other problem is that even if we add the materialKey check to pos_is_ok, the checks in that function are hidden by default because of their enormous runtime cost
so likely no one would have found it during testing...
@lucid grove any ideas?
I guess asserts in tbprobe are enough given that tablebases are used in CI...
it would be nice to have 6-men checked in CI but eh
π
Would need to take a closer look. Naively any test that would discover the mismatch is fine. Doesn't necessarily need tb for it iiuc
I think assert in tb probe is a possibility. But apart from matetrack no other ci uses TB atm iirc
matetrack runs on every SF Pr
But we were lucky it triggered, right? It's not a given that it would trigger for any mismatch.
If we add an assert in probe it should catch any mismatch
Agreed
Honestly Iβm not why such an assert didnβt previously exist
Passing the wrong position to a TB seems easy to do lol
Skill issue ig
But even with assert, it would only trigger if search comes up with a position that has capture promotion or whatever triggered this bug
The bug was actually just a capture
any non en passant capture was buggy
but yeah an extremely subtle bug could escape detection
fuzzing with TBs could be nice
but I think only when updating do_move and such functions
yay CI passed
@rocky vigil I think you can hold off on merging the speedup(s)
That way the upcoming non regression test is more accurateβ¦
gg
I think there is another one, but it is fairly rare bug to trigger.
rare ones are the hardest to debug π
cool, great fixes, btw!
sure, but also hard to exhaustively cover.
But I'm all for adding some more tests.
Now, for completeness, our tests did catch it, which isn't bad at all.
yeah so now wait for shawn to get back to us on nnue-pytorch
do maintainers (vondele, disservin) have any comments on PR
I think Yoshie2000 said he would have read the whole diff.....

had to actually listen to the lecture unfortunately
mfw you're actually learning in school
yeah if I don't plan on paying attention to lecture I typically just don't go
what are you studying :o
compsci masters, the lecture was on statistical signal processing
super cool :)
During work on threat inputs, there was a recent hard-to-debug crash in tbprobe caused by incorrectly computing st->materialKey. This PR adds correctness checking to pos_is_ok and a faster c...
(not sure if this is ideal, but here's something)
@rocky vigil can you add to the commit message also that the net is the result of the following training recipe https://github.com/vondele/nettest/blob/7de71238e9b295e3f88ed7c9c5936af632c9b981/threats.yaml
ok
That documents exactly what needs to be done to 'reproduce'
when the net is reproducible π
unfortunately pytorch training is not bitwise-reproducible, but executing this recipe enough times should yield a net of similar strength. In practice the variation is actually fairly small.
Bitwise-reproducibility is probably not possible on a gpu with floating point unit.
Because GPUs might operate in different orders affecting the rounding.
demented
there is a simpler reason, and that would be the random skipping of positions
(and presumably, the shuffling)
which is of course pseudo-random..
the core issue is use of atomics in FT backprop
i think pretty much everything else could be resolved (e.g. by fixing seeds)
oh
Another thing moving forward is... do we let people submit net training runs?
making the position order deterministic is strange to me but in principle it could be done
i think this has already been open for a while?
It's doesn't actually look deterministic. It just means that the rng seed is fixed.
as stated it still doesn't matter :p
I mean that the training code is open, but the resource needed to train is not available for everyone.
Since training ideas only take like a few days or so, we should be able to test at least a hundred training ideas in a year.
the ability to PR to nettest has also been open for a while
I know you can PR or run such a thing... but many don't have the resource to run.
seems a bit tricky w/o volunteer resources for GPUs
i thought if you made a PR to nettest then a run for it could have been set up on gitlab pipelines
there is at least support for 4 concurrent runs
so, ideally we get DDP to work, would reduce the cost ....
DDP?
oh I see
some bug somewhere.
There could be. There are about 2000 CPUs on fishtest. There could certainly be a few GPUs.
maybe... but then you need people to download many gigabytes of data
etc. etc., it's a lot more requirements than fishtest...
many gigabytes is an understatement
about 1TB
some people's RAM ...
vondele any thoughts on this?
seems reasonable.
I've also asked co-pilot for a review, only consider what is reasonable...
Stockfish threat inputs PR summary.pdf I think some of anematode's speedups could be mentioned?
once the PR is in final stage, I'd like it to be squashed, with the commit message containing the appropriate co-author: ... designations and also the PR message as commit comment... please.
Ummn what do we do after the cleanup, after the merge? Do we launch a VVLTC search tune first?
have a medium-strength, cold beer, and enjoy the day?
Some speedups could already be applied, no? Then we have to test Stage 4-5 nets with L2=31, then quantize the smallnet.....
@long quest https://tests.stockfishchess.org/tests/live_elo/691343297ca8781852331452 lol this is genius, I think you can give this simplification bounds...
although hm let's just let it run as is
still PLENTY (pun intended π ) things to do....
what exactly do you mean by this sorry?
so, all PRs are usually squashed into one commit, easiest for maintainers if this is done by the author. By adding to he commit message something like
Co-authored-by: sscg13 <[email protected]>
Co-authored-by: Timothy Herchen <[email protected]>
Also a single commit can be attributed to multiple authors.
one kinda goofy idea I've been mulling over
to try to address the x86 pain
is to have an i16 expanded version of the threat weights as well... then use that for features which are being very frequently used
it would be quite complex to track tho
and the tracking would have to be extremely fast
but I'll probably try it at some point... if the distribution of threat feature incremental updates is good then it could be a significant speedup
also somehow I didn't realize that the L1 size is now 1024...
that changes the calculus on add/sub honestly
loop overhead and index generation will matter more
oh yeah talk about claude comments
I do have a determination that 64 is an upper bound
and I know why 32 is insufficient
and I suspect 48 is sufficient
once snowy-egret is merged there will be no perf difference
but yeah maybe nice to have a justification about the size
here the move Qxe3 modifies 48 threats
hate when this happens
that makes sense
the only way to involve 4 pieces in a move is castling
but I strongly suspect you cannot hit anywhere near 48 with castling
you'd make a good mathematician
did I mention I'm supposed to be a math major
sounds about right
but then it requires 15 minutes of explanation covering two chalkboards to explain why its' trivial
"trivial" means "i could come up with the reason on the spot"
ok well we can try 48 then...
maybe in a future PR for safety? unless you're 100% confident
great
to silence claude
π€
also maybe you saw but I made a small PR to clean up some redundant stuff in full_threats.cpp
in this context https://lichess.org/@/Tobs40/blog/why-a-reachable-position-can-have-at-most-218-playable-moves/a5xdxeqs is a beautiful blog post..
https://github.com/Tobs40/chess218 as starting point
Gurobi
one thing I just realized is that https://tests.stockfishchess.org/tests/live_elo/690e99ecec1d00d2c195c391 probably only helps because we memcpy the whole thing...
so we can probably simp it out after snowy-egret
I actually think there was another blog post on that... maybe tehre are two.
O where
nah, probably that one. Even though I thought it appeared on lichess recently. Seemingly is already a bit older
So, basically everyone is waiting for shawn_xu to do his change to nnue-pytorch, right?
"Remove unused code (Legacy PSQ stuff in Full_Threats.h looking at you)" : this is one of the checklist items. Was this done?
this is considered done when everyone agrees it is
although the thing I specifically called out indeed has been removed
I didnβt see anything unusedβ¦
So, do you think there is any possible regression from this cleanup?
no
but just to be sure
(it's maintainer decision)
(as a stc nonreg will take a while)
LTC might be the shortcut π
STC filtering has been done.
and yeah, I think we should do a final LTC run, maybe after the squash.
oh so like do a second LTC test after squashing?
yeah.
nah..
oh
let's move forward as is.
ok
Agre
do we continue waiting on shawn to do nnue-pytorch work or do the merges asynchronously
can be asynchronous.. in the end the net generation was with a specific sha which we know where to find. Obviously, I'd like to move forward with integrating that in the main branch, but I'm confident we'll do that.
Yeah if no one else has an opinion Iβll set up a new LTC soon
I just noticed that I forgot to remove some things from the FusedUpdateData struct, dp2from and dp2fromBoard can be removed I think
@long quest
base (...stockfish.ti) = 1515728 +/- 1787
test (./stockfish ) = 1542716 +/- 1503
diff = +26988 +/- 2310
speedup = +0.0178
P(speedup > 0) = 1.0000
your patch works extremely well locally
ah you can pr this (to my branch) if you want
or let me know what to remove
2dd15: 41 31 d8 xor r8d,ebx
2dd18: 31 de xor esi,ebx
2dd1a: 41 c1 ec 1c shr r12d,0x1c
2dd1e: 49 0f be d1 movsx rdx,r9b
2dd22: 49 39 f0 cmp r8,rsi
2dd25: 40 0f 92 c5 setb bpl
2dd29: 49 89 d1 mov r9,rdx
2dd2c: c1 e8 10 shr eax,0x10
2dd2f: 49 c1 e1 04 shl r9,0x4
2dd33: 83 e0 0f and eax,0xf
2dd36: 48 c1 e2 06 shl rdx,0x6
2dd3a: 4c 01 c8 add rax,r9
2dd3d: 48 8d 6c 45 00 lea rbp,[rbp+rax*2+0x0]
2dd42: 4a 8d 04 02 lea rax,[rdx+r8*1]
2dd46: 48 89 c2 mov rdx,rax
2dd49: 45 8b 0c af mov r9d,DWORD PTR [r15+rbp*4]
2dd4d: 48 c1 e2 06 shl rdx,0x6
2dd51: 4c 01 ea add rdx,r13
2dd54: 44 0f b6 04 32 movzx r8d,BYTE PTR [rdx+rsi*1]
2dd59: 45 03 04 86 add r8d,DWORD PTR [r14+rax*4]
2dd5d: 45 01 c1 add r9d,r8d
2dd60: 41 81 f9 ef 37 01 00 cmp r9d,0x137ef
2dd67: 77 22 ja 2dd8b
assembly after also applying bald-eagle, they seem to synergize rly well
"small"-speedup my ass
Result of 100 runs
==================
base (...stockfish.ti) = 1515621 +/- 1655
test (./stockfish ) = 1569287 +/- 2560
diff = +53666 +/- 2930
speedup = +0.0354
P(speedup > 0) = 1.0000
threat-inputs vs. small-speedup + bald-eagle
super exciting
I commented the lines on github
alright I took them out
if no new suggestions I'll make a new LTC soon
aight well I've made it
vondele wanted a new ltc vs master
so a new ltc it shall be
it unironically might finish faster
Really? That seems not right, it's opening it up to the machine lottery again.
at least well
I'll approve but a non reg wouldn't have that issue
Yeah I'm not seeing the Vs master. Because I don't see the logic behind that
wait what do we want then
LTC against master again?
LTC nonreg against passed version?
A LTC non reg to make sure you didn't regress anything?
Vs what has already passed
ok
I can set that u
aight
yeah
let's do that
it might take like a day or so idk
am I crazy or is the github diff different than the diff on fishtest...
the one on github looks wrong
but the commit hashes look right
no, it makes sense.. but well, I see something is already running.
Here's the result on the ARM nodes for an LTC test vs master, you might want to add that to the PR:
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 patch : 13.9 1.9 38296.5 73728 52
2 master : 0.0 ---- 35431.5 73728 48
How does it make sense to test for non reg by passing it yet again. There is a test made for that
it make sense to test the final version of the patch by the usual standards...
it would also have saved resources..
But it does not tell you if it is actually non reg
no, it tells if it gains against master..
why does it matter if it regressed slightl
anyway,
will be fixed in the normal course of development
Extend this argument to any patch
any patch that passes the normal fishtest development, is actually handled that way..
but it is not worth long discussion.
Anyways correct it is not a big deal. I just think a non reg is more appropriate here
let's instead bet on how long it takes till the first gainers pass on fishtest after merging..
My bet.. 1 day for code and .. 1 day for net?
I think you have some insider info 
I thought Swiss banks were good for that
There are already speedups ready that are not part of the PR, right?
3.5% will pass STC quickly π
k7/2n1n3/1nbNbN2/2NbRBN1/1nbRQR2/2NBRBN1/3N1N2/7K w - - 0 1Lichess Link | Image
This position with move e4d5 has 68 dirty threats if I didnt measure anything wrong
I think sf should just segfault there
assert(pos_is_ok());
assert(dp.pc != NO_PIECE);
assert(!(bool(captured) || m.type_of() == CASTLING) ^ (dp.remove_sq != SQ_NONE));
assert(dp.from != SQ_NONE);
assert(!(dp.add_sq != SQ_NONE) ^ (m.type_of() == PROMOTION || m.type_of() == CASTLING));
+ std::cout << dts.list.size() << std::endl;
return {dp, dts};
Stockfish dev-20251109-dbc3dcc3 by the Stockfish developers (see AUTHORS file)
position fen k7/2n1n3/1nbNbN2/2NbRBN1/1nbRQR2/2NBRBN1/3N1N2/7K w - - 0 1 moves e4d5
68
so people wouldn't feed it some garbage
Is that relevant for this list?
using DirtyThreatList = ValueList<DirtyThreat, 64>;
I think it should be
It is a legal position I think
Oh actually it isn't
Because there isn't enough pawns to promo into all the extra pieces
Triggers an assert indeed:
stockfish: misc.h:138: void Stockfish::ValueList<T, MaxSize>::push_back(const T&) [with T = Stockfish::DirtyThreat; long unsigned int MaxSize = 64]: Assertion `size_ < MaxSize' failed.
What to figure out is if any legal position triggers it?
it is not too far from legal though..
https://tests.stockfishchess.org/tests/view/69108025ec1d00d2c195c5d6 this patch avoids copying it around, so we can make it as large as necessary w/o perf impact (within reason obviously)
k7/2n1n3/1nbNbn2/2NbRBn1/1nbRQR2/2NBRBN1/3N1N2/7K w - - 0 1Lichess Link | Image
but obviously it'd be nice to have a good idea of the bound
this should be legal
hate when this happens
but yeah, still triggering the issue.
stockfish: misc.h:138: void Stockfish::ValueList<T, MaxSize>::push_back(const T&) [with T = Stockfish::DirtyThreat; long unsigned int MaxSize = 64]: Assertion `size_ < MaxSize' failed.
π©
for the chess clarity this should be illegal
so, we should probably fix the code... and probably include this position in CI somewhere.
oh shoot I forgot that deduplication was only processed after the list was constructed
i also forgot about x rays
lemme get a new upper bound then
K7/8/8/BNQNQNB1/N5N1/R1Q1q2r/n5n1/bnqnqnbk w - - 0 1Lichess Link | Image
Qxe3 should have 71
replace black bishop on d5 with queen, will add a few more
If itβs just raw threats before deduplication we can use 80 as an upper bound
Oof I hurt my wrist tendons yesterday which is making typing difficult
Basically what I wanted to do is to move features to under model/modules, and for each feature set, define its own module by inheriting from DoubleTransformerSlice
Then each module specific to the input feature can define its own coalesce weights function
FeatureSet stuff can prolly just be deleted. Combined features can be reimplemented if needed anyway
is this a competition xD
looks like the testβs gonna take a while since I didnβt set simplification bounds - what should I do?
if it's ok with you we can stack our two changes and it'll probably pass a bit faster...
but taking a while is also fine
sure, can you just make a combined patch?
yes!
cool, I'll stop my test
sounds good, I think I'll make the test targeting 0x539's test (https://tests.stockfishchess.org/tests/live_elo/69139f8e7ca87818523314dc) if it passes because they touch the same area of code...
Merging time??? π
Obsevation: The stockfish discord channel is either really busy, or completely quiet depending on the time of day it is xD
Eh, it was still relatively busy this time of day for the last few days.
I just think people are chilling after putting in this much hard work. Which is fair.
I think it becomes more active around 2 hours from now?
If I remember correctly
We'll find out. Anyway, threat inputs probably gets merged today.
I most curious for the progression tests before and after search tuning with the best net, just how much elo will be gained since sf 17/17.1!?
I'm even more eager for search modifications, tbh. For the last couple of months gains from that had stalled. The new net probably changes that for a while.
I've ticked the box on the second LTC run, and added one for this upper bound change https://github.com/official-stockfish/Stockfish/pull/6406#issuecomment-3513836954
So any idea what the upper bound should be?
.
(8 attacks + 16 attacked) * 3 pieces + 8 discovered attacks
Oh, right. Btw, will this change playing strength or something like that?
No
I guess it is just update this and the logic
And then it is in a mergeable state
sounds good...
can we get a countdown to merge? Like on new years? Very serious request.
ig some of the random newlines in the diff can be removed
for the squashed commit, would people prefer I use their public emails or noreply emails?
btw with the change, it can now run both positions without triggering an assert
i think we can try to find a similar position that is actually a mate, and then add that onto matetrack
I think we could just add that position to bench, it gets caught by the asserts.
ah sure
just what is already in the git log?
(in that way github should properly associate it with the accounts I think)
ok
where are all you guys from? Its 2am in the US right now, so I'm guessing not there?
let's try to keep this thread on development please..
@rocky vigil pr sent
threat*
kidding lol
also I forgot to include disservin as a coauthor
yeah disservin/linrock/vondele should also be listed as coauthors
do this now, or after merge?
now, as part of the correction
ok
with this change, bench becomes 2626086, can someone else confirm
(it has been pushed to the PR)
2626086
git log contains both noreply emails and public emails probably because of differences between local commits and those done from github
how should I do this
I have uh, never really done big PRs before
I think the public email makes most sense
Huzzah
Goated domain name
π€
huh that seems to not be my github email actulaly
Shawn Xu [email protected]
is what git logs have
he used his bench
nvm it is on my gh profile. just anything is fine I guess
i didn't find a public email for cj in this log
yeah it seems that cj prefers noreply
@rocky vigil PR sent
(I actually had almost finished making the squash commit before I saw this)
are we about to merge π
From my side, I think we're basically good to go, but can only merge later today, so still a couple of hours to fix small things, if needed.
I think you can squash it essentially now. Even if tiny stuff comes after, this can still be integrated.
ok
Btw, there are still some speedups and stuff ready for after the merge, right?
ok it looks like contributors came out fine
oh i still have a suggestion, gotta add yoshie2000 to AUTHORS 
true
The purple elimination plan is working well.
oh right
i can pr in 5mins
yeah I can redo the squash
I've already written up the PR / squash commit msgs so it's fast
cool
mergers when
when vondele gets back
ci failing 
you didn't include bench in the commit
ci says
signature mismatch: reference 2351426 obtained: 2626086 .
tho, where is it taking the reference from then? latest commit with bench?
yeah
yeah that's master
it searches backwards
alright, then it's an easy fix
lol
yeah
also it's funny
how the random people cannot seem to like
comprehend
L1 is 3x smaller
but the net is larger
who is "random people"
thread
surely threat smallnet could work now
;)
the cloners and other ppl who are infamous
ah ok
blegh
vondele can do it during merge tho
yea
btw what was 0132r
it was going really good in stc
but then antiscaled
it's an attempt to migrate to viriformat. but it was using a different teacher net since i haven't gotten around to writing an updated relabeler in c++ yet, that's probably the reason it antiscaled
next one with correct teacher net ready in 5-ish hours, but also wrong channel probably
ah yeah this channel will probably just be used to finalize nnue-pytorch merge
at which point it becomes mainstream
and we move to the general channels
i can see if I'd be able to work on this in a few days
but in the meantime if someone wants to try
Vondele would also like the PR text as commit message.
makes it easier to avoid mistakes on my part, if I copy and paste from the PR message formating links etc might be off.
we're super close though....
btw I edited
the PR text
into the commit message
i think the only remaining thing is clang-format
that's easy enough on my side.
other than that I think everything else is good now?
nice
huge thanks to all contributors
this has been very nice
particular thanks to @rocky vigil and @frosty imp for their huge effort and endurance!
sure, beginning of a journey, hopefully!
So, where will the conversation surrounding this move? SF-general and dev?
I propose that on-topic conversations can move there indeed.
Huge congrats everyone!
Bruh
No coauthor for doing most of the code for threat inputs until it got handed off?
You can literally see I wrote all the code in the Monty PRs
hm, oversight and unfortunately 1min late... we can't change the commit, we can add it to the PR.
https://github.com/official-monty/Monty/pull/87/commits/77e74a2f95da9b3f3e34519422dfb026a05178e1
the first threat input commit btwβ¦
at least that one is referenced
yeah I thought while we took inspiration from that code in early impl the current impl is quite far
if we widen the co-authoring range then lofty should also be credited for having the precursor UE impl that Yoshie took inspiration from
ok I'll add a comment to the PR
@formal smelt is this okay?
Sure
Iβd argue that facilitating the whole development of threat inputs in the first place (and note Viren original pdf implemented as-is actually sucks due to explicitly excluding PSQ features) had a lot more bearing on threat inputs passing than w/e 1 elo speedupβ¦
this was my thought, sorry for making you (+ lofty) feel uncredited
that was a misunderstanding on my part, I thought Viren did all of the architecture design work
meanwhile SF 17 sees this as an opportunity
SF 18.1 coming
Oh, wait. I just thought of one thing. There was a gap between th1 and th8 performance in the previous PT. That will probably be closed this time.
Yes unfortunate that you have to dig into every Monty PR to see that
in actual fact there was no gap
8 th PT has bigger gamepair ratio
just that it compresses because of bigger game pair draw %
it was discussed at some point in this thread, and the takeaway I got from that was to list direct code+net contributors as co-contributors and credit indirect contributions in PR comments, but i don't remember if you had any part in that discussion
I really cannot tell if this is meant to be sarcastic
It isnβt, you see Viren opened all of them even if (in the case of 2/3 of them) his role was cargo r -ring training and uploading the network
Fixed indexing: https://github.com/official-monty/montytrain/pull/16/commits/98ca72d6cb340767484f49820dc0adf648bd8cb8 (thanks to you also for part of this ofc)
I8 quantisation: https://github.com/official-monty/Monty/pull/116/commits/0f5a66438f77052b6de404ff5080ad35ba2c2943
yeah that was definitely part of it, i hope that last comment in the PR makes amends at least somewhat
in retrospect I should have taken some more care handling the crediting, I forgot that SF is a way larger project and that the vast majority of people don't know the "inside info" and just go by what is written
to start this off, can we test the l2=31 net?
sure
cause itβs an easy merge
and the biggest
with the highest possibility of conflicts
I see.
will yall be moving the neural net discusions here at some point? Since threat inputs is the real actual only net? https://discord.com/channels/435943710472011776/718853716266188890
no
π
yeah I remembered this one sorry
this is what happens when too many messages get sent
@formal smelt I was under the impression you were getting credited second from the prior list posted. On the linked document the credit I claim is for creating the feature set. On the issue of who created the PRs, you raised no issues at the time with being coauthored on all of them. I had no problem if you wanted to open some of them yourself
I'm not concerned about that, on the monty commits its coauthored
its only an "issue" now from SF perspective because there was effectively no credit to me unless a person happened to actually inspect the Monty PRs and commits within
if you can autoopen fishtest which seem to not work I guess
nothing changed there, should still be working unless some IP block? (not by us).
yes, and this is what I'm referring to (IP block)
since our net haven't changed in ages welp, it wasn't an issue
well it's not really an issue anyway
should I just put stage 5 on fishtest, or wait for a local test on gitlab first
just put stuff on fishtest
ok
yes, probably best... in this case, I couldn't run a local test, since the inference code wasn't immediately available. For model arch changes, it is nice if that's available, in that case the local test runs directly.
After all the efforts... we got about 1 elo PT π
All of that... for a drop of elo?
Like... if we knew this from the start, would we have done this at all?
Viren has a 4 elo patch?
Yeah... it was 6 elo VVLTC SPRT before.
Then the PT came and only 1 was real progress.
Yes, I know that.
The point is not that the 6 elo was accurate. The point was that...
We got 1 freaking elo from all these?
There were times where you got a few random patches and like 3 elo.
And this time, all these for a single elo drop?
Are you being serious here?
- We have no idea what effect the verbatim patch had, so we are not really getting good reference. Ideally, it should have improved elo in PT, but unless threat-inputs actually lost elo in PT, we don't really know.
- There is still "a lot of low hanging fruit" left as per Viren. So some elo gains can be expected from that, purely from speedups and things.
- There is much scope for things like search tunes, which should also gain a nice bit.
- Most importantly, search gains have stagnated for months at this point. This might change that.
There is the 8 Elo of net spsa also for later
Oh... yeah... I guess maybe we can gain 10 elo after these. That would be nice.
And the fact that now we aren't stagnant on net anymore too.
And honestly, the PTs haven't exactly matched sources like NCM or SPCC a lot lately. I'm curious what SPCC will show.
SPCC even has the verbatim patch, so...
I'm fine even if this was neutral at all, what important for me is that we now have simple training steps that don't require SPSA.
Yes, thank you. The reproduction of the net training pipeline was also one of the bigger boons.
Well eventually master net will have SPSA optimized on top of everything, but this is a good improvement.
also since threat inputs are really hardware dependent
you can get w/e the hell noise from PT
someone can actually run a script that separates results by archs on pt
would be interesting
SF18 already on ARM.
Locally on vondele fleet ltc pt is +50 so
you'll also get credited on the NNUE pytorch PR
eventually
The progression tests are about +2 elo at the moment for the last progression tests, so thats not too bad π
btw, I would also see the fact that we currently have clearly defined how to train the master net with nnue-pytorch as an opportunity to repeat this will bullet. I can only believe this is beneficial for both toolchains. I'm not religious about nnue-pytorch and wouldn't mind running some trainings with bullet, while showing equivalence (or reaching it) might be useful for all engines using bullet.
weren't there still some kinds of conversion issues to be resolved
so far it is not working, afaik.
certainly would take some effort, but could be worth it?
i strongly suspect this is because the default psq order in bullet is different, btw
in sf it goes like, white pawn, black pawn, white knight, black knight, ... , king
whereas in bullet it goes like white pawn, white knight, ... , black pawn, black knight, ...
idk, this was the branch no https://github.com/Disservin/sf-bullet-train
yep and this line https://github.com/Disservin/sf-bullet-train/blob/main/src/main.rs#L34
indicates it's using a bullet default
which runs into this issue
would be a good reason..
there was also the 127/128 stuff
but that wouldn't be cause for a major error
still good to fix
i can't remember where it is off the top of my head but couldn't it cause overflow or something
i thought it was so you could divide by 128 rather than 127 in inference
probably fine
well it would be nice to try again
also bullet_lib api has been stable for many months now, unlike when Disservin's branch was written
yoshie is the most likely person to ask to write
btw @stray reef would you want to look into attempting to write a bullet training config for sf net?
i believe there's already some (more or less working) training configs, but they just can't be loaded into SF?
see here
i think this is the main one to fix
that sounds very easy to fix then
yeah if the later layers quantization still works
then it's just a matter of modifying psq part of features to match sf
ofc the transposing still needs to be done
transposing seems to already be done via SavedFormat in disservins config
i thought it's always been ints
alright then
feature_set_hash needs to change, idk what else
i forgot what "version" was
but yeah it seems like it shouldn't be too much additional work
what's better, doing the pst shuffling in bullet or the python script?
i think it'll be easier for you if you do it in the python script
also beware of the input buckets and the mirroring as well
sf mirrors to efgh
and '
'e1' is bucket 31
h8 h7 ... g8 g7 ... e8 ... e1
or
h8 g8 ... h7 g7 ... h1 ... e1
?
h1 = 28 (so, the latter)
actually you might be better off just defining the psq part properly in bullet
so that you don't need extra shuffling
if you were implementing it properly you should probably write a custom dataloader and input type and then obviate the shuffling entirely
so many complications
its not really that hard
custom input type sure, but why the dataloader?
at least, what does it have to do with the pst order and mirroring
https://github.com/Disservin/sf-bullet-train/compare/main...Yoshie2000:sf-bullet-train:fix-inputs
Basically did it (hopefully) exactly like https://github.com/official-stockfish/Stockfish/blob/master/src/nnue/features/half_ka_v2_hm.cpp so it should work, but I can't try it right now.
One thing that confused me a bit was the merged kings stuff vs. halfkav2, i thought the stm king has no feature in halfkav2 because it's implicitly encoded in the bucket, but it doesn't seem like that's the case now
why are perspective kings not excluded in https://github.com/official-stockfish/Stockfish/blob/master/src/nnue/features/half_ka_v2_hm.cpp#L40-L48 then?
wdym
the king also has a feature
iirc there was an idea to remove the biases to merge them into the king feature
i thought the stm king has no feature in halfkav2 because it's implicitly encoded in the bucket
i guess i misinterpreted your response to this then
like for bucket 29, ([stm] king, g1) is also a feature
someone should test if 32 buckets vs 16 is really worth it
later
are we not adding threat inputs
they are the same as in bullet
i think it's much easier to first test this and then add threat inputs
ok
how about reduce the L1 size to 1024 anyways
first
to make the test run faster
only need to train 1 SB anyway to see if results are reasonable
Alright I updated bullet on my branch, and I am now able to train something that produces a reasonable startpos eval of 43 (internal units)
Gotta stop for today, and I have no idea if I broke something in the SavedFormat (i probably did), so here's the checkpoint in case someone with more knowledge about how to get things loaded into Stockfish wants to take a look
https://1drv.ms/u/c/74d39b59afff2586/IQAPLaqfHD1gQLry3utFOuxXAWs8OVfaZnq1X46PJacO_lw?e=egTwI3
Maybe it's worth trying having additional features for the locations of pawns on the same or adjacent files to improve pawn structure understanding
@stray reef can you run the disservin converter script and upload the .nnue file? I'll try to get it to work in sf by disabling threats on big net
Read checkpoints/test-1/quantised.bin successfully.
Organized data into 8 buckets.
Writing to checkpoints/test-1/pytorch.nnue...
Ending position for bucket 0: 69717832
Bucket 0 size: 1152 bytes
Ending position for bucket 1: 69768240
Bucket 1 size: 1152 bytes
Ending position for bucket 2: 69818648
Bucket 2 size: 1152 bytes
Ending position for bucket 3: 69869056
Bucket 3 size: 1152 bytes
Ending position for bucket 4: 69919464
Bucket 4 size: 1152 bytes
Ending position for bucket 5: 69969872
Bucket 5 size: 1152 bytes
Ending position for bucket 6: 70020280
Bucket 6 size: 1152 bytes
Ending position for bucket 7: 70070688
Bucket 7 size: 1152 bytes
Integer value at position 69389475: 2171895937
Conversion complete: checkpoints/test-1/quantised.bin -> checkpoints/test-1/pytorch.nnue
https://1drv.ms/u/c/74d39b59afff2586/IQCTyS0zPPq3Q7v_ywlBV3pKAbabde9n_4bMbBtd54dD5u0?e=8TTMZi
i have no idea
43 + quantisation error
b9535843d87e is hash?
hash of what?
the net
appears so
info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info string Network replica 1: Shared memory.
NNUE network contributions (White to move)
+------------+------------+------------+------------+
| Bucket | Material | Positional | Total |
| | (PSQT) | (Layers) | |
+------------+------------+------------+------------+
| 0 | 0.00 | + 1.98 | + 1.98 |
| 1 | 0.00 | + 34.08 | + 34.08 |
| 2 | 0.00 | + 21.72 | + 21.72 |
| 3 | 0.00 | + 17.79 | + 17.79 |
| 4 | 0.00 | + 17.84 | + 17.84 |
| 5 | 0.00 | + 32.42 | + 32.42 |
| 6 | 0.00 | - 11.03 | - 11.03 |
| 7 | 0.00 | + 32.33 | + 32.33 | <-- this bucket is used
+------------+------------+------------+------------+
NNUE evaluation +32.33 (white side)
Final evaluation +14.02 (white side) [with scaled NNUE, ...]```
appears wrong
(startpos)
ah, sha256sum of the entire file, yes
sigh
well it would have been too good to be true if it just worked
can you load startpos in to bullet
and get the active feature indices?
for white perspective
22272 22017 22146 22403 22468 22149 22022 22279 21896 21897 21898 21899 21900 21901 21902 21903 21872 21873 21874 21875 21876 21877 21878 21879 22264 22009 22138 22395 22524 22141 22014 22271
ok lemme see
added: 22208 21953 22082 22339 22468 22085 21958 22215 21832 21833 21834 21835 21836 21837 21838 21839 21936 21937 21938 21939 21940 21941 21942 21943 22328 22073 22202 22459 22524 22205 22078 22335
huh
ah so mine are completely wrong then
ah lol i just had all piece colors wrong
BLACK added: 22328 22073 22202 22459 22524 22205 22078 22335 21936 21937 21938 21939 21940 21941 21942 21943 21832 21833 21834 21835 21836 21837 21838 21839 22208 21953 22082 22339 22468 22085 21958 22215```
wait shoot why are these not the same
are they the same
do you want to try kiwipete
BLACK added: 22328 22524 22335 21936 21937 21938 22195 22196 21941 21942 21943 22058 22445 21871 21857 21924 21915 22044 22096 21969 21844 21973 21846 21832 21834 21835 22348 21837 22094 22208 22468 22215```
that looks like startpos
bruh sorry
ok looks good now. let's try again
loss is still fucked, let's try smth where stm and ntm have different buckets and mirroring, e.g. 8/2p5/8/2kPKp1p/2p4P/2P5/3P4/8 w - - 0 1
also correct, i'll try disabling the factoriser, maybe the issue is there
hm can't really find the reason now. fixing the feature index calculation increases loss by like 50%, and even before that, it was too high for 0 WDL in my opinion, and also barely drops in the next SBs, that doesn't seem right
we can still compare startpos eval tho
what was it supposed to be
tested the pre-TI plenty arch instead of the SF arch with this script. loss is similar, but it works fine in plenty. so i'll stop worrying about loss now :P
ok
psqt or skip neuron or both?
also idk bullet syntax but is this supposed to be forward(stm).crelu().pairwise_mul()
that's being done in the lines above
well
(in bullet)
ah fuck you're right
i think pairwise mul worked differently back when disservin wrote this, at least i just renamed the function, and forgot that it's done differently now
yeah the original one has like
pairwise_mul_with_affine
or smth
bruh why is L1 -> L2 also factorized
does sf really do this
apparently?
judging by what disservin wrote at least
+36(.00)
soo
i think let's just try to get the basic stuff working
so remove the l1 -> l2 factorization
this is not screlu
well it was called pairwise_mul_post_concat lol
ok i'll change
https://1drv.ms/u/c/74d39b59afff2586/IQDmpB0l9uUFRopEkoXs4Wh7AdIGX9C81iP9MdKqiDmxqrk?e=uHneav no l1 factoriser, crelu instead of screlu, startpos eval 80
this is the wrong concat order, it's sqr first and then standard
^^
bruh
like this, right?
out = out.abs_pow(2.0).concat(out);
out = out.crelu();
yeah
if this does what I think it does
it's concatting ac_0_out to ac_sqr_0_out
which means the squared stuff should be first