#UE Threat Inputs for AB

prime mica
#

huh how is that possible

rocky vigil
#

it probably compresses

#

the

#

i16 part

#

actually

#

💀

prime mica
#

lol

#

shapely

#

is this right?

rocky vigil
#

yep

#

it's also listed in the commit msg

prime mica
#

ok so try these two commits vs. each other

rocky vigil
#

yep

#

speedtest

#

1 thread and then n thread

#

1 thread is probably neutral

#

since no big mem pressure

#

but for stuff like concurrency or smp it should help

prime mica
#

ooh it seems considerably faster on bench actually

#

lemme run that first then I'll run some speedtest

#

@torn lagoon no this is an LSS

torn lagoon
#

Bench can be noisy

prime mica
#

we need an LSS emoji

torn lagoon
#

VVLSS when

prime mica
#

lololol

rocky vigil
#

sample size and tc are inversely proportional

prime mica
#

does anyone have a test script that uses speedtest

#

mine only uses bench

rocky vigil
#

simply run VVSTC to get VVLSS

prime mica
#

I guess I can just replace the command huh

frosty imp
#

Think you also need to change the regex

prime mica
#
==================
base (./stockfish    ) =    1461604  +/- 1364
test (...sh.after.gcc) =    1527947  +/- 1821
diff                   =     +66343  +/- 2014

speedup        = +0.0454
P(speedup > 0) =  1.0000

very promising

rocky vigil
#

👀👀

warm thistle
#

wtf

#

👀

rocky vigil
prime mica
#

oh no

#

this is bench

#

so single threaded

#

I'ma try speedtest now

rocky vigil
#

wait is it just 100 in a row

prime mica
#

yep

rocky vigil
#

or 100 distributed over N cores

prime mica
#

just need to figure out how to modify the trusty pyshbench...

rocky vigil
#

i think you would also need to set the speedtest invocation to be much less than 150 seconds

prime mica
#

yeah...

rocky vigil
#

since it's just

#

total number of threads running search

#

at any given point

#

i think

#

in terms of mem pressure

#

nevertheless it'll definitely show on fishtest

#

assuming the net doesn't die

#

in the meanwhile lemme set up old stage 1 net...

stray reef
prime mica
#

yes unless I bungled something

stray reef
#

when fishtest 👀

prime mica
#

I just make -j profile-builded consecutive commits from mr. sscg13

rocky vigil
stray reef
#

ah

rocky vigil
#

stage 1

prime mica
#

ok running speedtest 16 now, be back with results in ~10 minutes

#

have you tried this stuff locally?

rocky vigil
#

sanity check this and full training should be done in 2 days

rocky vigil
native lake
#

Should try bt4 data lol

rocky vigil
#

so my laptop is anything but reliable rn

prime mica
#

aura

rocky vigil
#

in fact the regular bench is 20% slower than with no load

#

and highly inconsistent ofc

#

i will get the stage 1 fixed games test up on fishtest tho

prime mica
#

movsxbw movsxbw movsxbw movsxbw

rocky vigil
#

average memory bandwidth bottleneck

prime mica
#

Gordon Moore shaking in his boots

rocky vigil
#

do the i8 -> i16 conversions show up anywhere

prime mica
#

I think that's what these are

#

unless you're referring to something else

rocky vigil
#

I am referring to cvtepi8_epi16

prime mica
#

ye that compiles to the venerable vpmovsxbw

rocky vigil
#

o

prime mica
#

move sign extend byte to word
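
For reference, the per-lane effect of a sign-extending widen such as `vpmovsxbw` can be sketched in plain Python. This is only an illustration of the arithmetic, not the engine's code; the function names are made up for this example.

```python
def sign_extend_i8_to_i16(byte):
    """Return the 16-bit two's-complement pattern for an 8-bit pattern."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit pattern in 0..255")
    # Copy bit 7 (the sign bit) into the upper eight bits of the 16-bit lane.
    return byte | 0xFF00 if byte & 0x80 else byte


def widen_bytes(raw):
    """Read each byte as a signed i8 and return the signed value that each
    i16 lane would hold after a sign-extending widen."""
    return [b - 0x100 if b & 0x80 else b for b in raw]


print(hex(sign_extend_i8_to_i16(0x80)))         # sign bit set, upper byte filled with ones
print(widen_bytes(bytes([0, 127, 128, 255])))   # signed lane values
```

The stored weights take one byte each, and the widen restores the 16-bit lanes that the accumulator arithmetic works on.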

#

first pair in, speedtest 16, +3.1%

#

so a bit more modest with more threads actually

rocky vigil
#

hmm

prime mica
#

but we'll see

naive comet
#

wait I have an idea

prime mica
#

pray tell

naive comet
#

I need to double check first

prime mica
#

yup then I'll move my king

#

I'm still hoping there's slightly more efficient ways to load the i8 to i16 than vpmovsxbw spam

#

my first idea didn't work when Yoshie tried it

#

but there might be some variant

#

will muck around later

rocky vigil
#

yeah

#

if i8 to i16 could be sped up it would be good

#

though definitely it seems memory -> i8 -> i16 is faster than memory -> i16

prime mica
#

ye

#

at least on some devices... we'll see on fishtest

#

make sure to turn off autopurge ^_^

rocky vigil
#

curious if anyone has the 256 MB L3 cpus lying around

#

with this one it'll actually fit in L3

prime mica
#

yum

rocky vigil
#

and that might be very big

rocky vigil
prime mica
#

my computer has 512 MB but sadly it's split up across the CoRE ComPlExeS

#

so I don't think it would have ur intended effect

#

not sure tho

rocky vigil
#

well i can dream

prime mica
#

lol

#

good

#
```
  1   21663457   22347071  +683614
  2   21772336   22595945  +823609
  3   21680315   22189962  +509647
  4   22011912   22642714  +630802

Result of   4 runs
==================
base (./stockfish    ) =   21782005  +/- 157128
test (...sh.after.gcc) =   22443923  +/- 208742
diff                   =    +661918  +/- 127303

speedup        = +0.0304
P(speedup > 0) =  1.0000
```
#

speedtest 16
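
The means and the speedup line in that table can be reproduced from the four run pairs. A minimal sketch, assuming the speedup is simply the mean paired difference divided by the base mean (which matches the printed numbers); the `+/-` columns are not reproduced here because the log does not say how they are computed.

```python
# (base nps, test nps) for each of the four runs in the table above.
runs = [
    (21663457, 22347071),
    (21772336, 22595945),
    (21680315, 22189962),
    (22011912, 22642714),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


base_mean = mean(b for b, _ in runs)
test_mean = mean(t for _, t in runs)
diff_mean = mean(t - b for b, t in runs)   # mean of the paired differences
speedup = diff_mean / base_mean            # relative gain over base

print(f"base = {base_mean:.0f}, test = {test_mean:.0f}, diff = {diff_mean:+.0f}")
print(f"speedup = {speedup:+.4f}")
```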

#

would you like me to try more/less threads and/or other parameters?

stray reef
rocky vigil
#

ah nice

rocky vigil
#

looks good

prime mica
#

yessir

naive comet
prime mica
#

yeah...

stray reef
#

i used mulhi instead of slli+srai when testing it, but yes that was the idea

prime mica
#

I'm still confused why it didn't work tbh

stray reef
#

maybe mulhi latency is too high, with small L1 there's really not a lot of iterations

naive comet
#

would slli+srai be better?

prime mica
#

perhaps

#

yeah could try that

rocky vigil
stray reef
#

can easily try that in about 45mins
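
The proposal being discussed is not shown in the log, so the following is only a guess at the kind of trick involved: assuming two i8 weights are loaded packed into one 16-bit lane, a left shift followed by an arithmetic right shift recovers the low byte sign-extended, and a single arithmetic right shift recovers the high byte. The sketch emulates 16-bit lanes in plain Python (all helper names are hypothetical) and also checks that a signed multiply-high by 256 gives the same result as an arithmetic shift right by 8.

```python
def to_signed16(x):
    """Wrap an integer to a signed 16-bit lane value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def slli16(x, n):
    """Logical left shift within a 16-bit lane."""
    return to_signed16(x << n)


def srai16(x, n):
    """Arithmetic right shift within a 16-bit lane (Python's >> keeps the sign)."""
    return to_signed16(x) >> n


def mulhi16(a, b):
    """High 16 bits of the signed 32-bit product of two 16-bit lanes."""
    return (to_signed16(a) * to_signed16(b)) >> 16


def split_lane(lane):
    """Split one 16-bit lane holding two packed i8 values into (low, high)."""
    low = srai16(slli16(lane, 8), 8)   # move the low byte up, then sign-extend it back down
    high = srai16(lane, 8)             # the high byte only needs the arithmetic shift
    return low, high


print(split_lane(0x05FD))   # low byte 0xFD, high byte 0x05
```

Whether either form beats a plain widening load depends on latency and port pressure on the target CPU, which this sketch says nothing about.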

rocky vigil
#

if validation loss is anything to go by it should be fine in terms of eval quality

#

the absolute best case would be a double whammy of the QA=255 change making it both faster and better now

prime mica
#

let us hope

#

u are doing god's work here

rocky vigil
#

nah

prime mica
#

new evaluation function is huge

rocky vigil
#

the only real work I have contributed is impl

#

and the random suggestion to only apply i8 to threats

#

which turned out to be 🔥

prime mica
#

that's like saying the people who built the Panama Canal only dug up 82 kilometers of soil, they didn't actually draw the line on a map

rocky vigil
#

fair

prime mica
#

😊

rocky vigil
prime mica
#

👻nn-1c000000000👻

rocky vigil
#

with this i hope ppl start exploring eval improvements again

prime mica
#

watershed moment

#

next chess move won't know what hit them

rocky vigil
#

ncm actually favorable to threat inputs

#

with the smp

prime mica
#

yep

stray reef
#

what is the L3 size on ncm? :P

prime mica
#

once things are cleaned up I'ma try porting some of my layer combining work to threat inputs

#

I actually think it could work even better

rocky vigil
#

"NCM uses Dell R7515 128-thread EPYC 7702 dedicated servers to perform its dev build tests. Each server plays 16 games concurrently with 30+0.3 time controls. Hash is set to 128MB, and Threads is set to 8."

#

256 MB shared

stray reef
#

256MB l3

prime mica
#

fancy schmancy

rocky vigil
#

ok but then why didn't it have any effect

#

(the memory sharing patch)

prime mica
#

I wondered that too

rocky vigil
#

bc with that the entire net could've fit in that 256 MB

prime mica
#

maybe they have IPC disabled

rocky vigil
#

maybe it's split up across

prime mica
naive comet
#

maybe shuf instead of left shift is better

#

idk free up ports? idk

#

concurrent execution smth smth

prime mica
#

even with a noobish implementation like that one ^^ it's a decent bump

rocky vigil
prime mica
#

lol

rocky vigil
#

random walk go

naive comet
prime mica
#

fancy schmancy

#

what is a finny table

#

oh is that the entryTile stuff

naive comet
#

basically a small cache to store old accumulators and positions so we can diff the positions and UE that instead of refreshing from scratch
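
A toy version of that cache, to make the idea concrete. This is a sketch under simplifying assumptions, not the engine's implementation: features are plain integers, weights are small Python lists, and the cache key stands in for whatever the real table is indexed by (for example a king bucket and perspective). All names are hypothetical.

```python
class AccumulatorCache:
    """Remember the last feature set and accumulator per key, and refresh
    by applying only the difference to the cached accumulator."""

    def __init__(self, weights, width):
        self.weights = weights   # feature index -> list of `width` ints
        self.width = width
        self.entries = {}        # key -> (frozenset of features, accumulator)

    def refresh(self, key, features):
        features = frozenset(features)
        cached_features, cached_acc = self.entries.get(
            key, (frozenset(), [0] * self.width)
        )
        acc = list(cached_acc)
        for f in features - cached_features:    # features that appeared
            for i, w in enumerate(self.weights[f]):
                acc[i] += w
        for f in cached_features - features:    # features that disappeared
            for i, w in enumerate(self.weights[f]):
                acc[i] -= w
        self.entries[key] = (features, acc)
        return list(acc)


def refresh_from_scratch(weights, width, features):
    """Reference result: sum the weight columns of every active feature."""
    acc = [0] * width
    for f in features:
        for i, w in enumerate(weights[f]):
            acc[i] += w
    return acc


weights = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [-7, 8]}
cache = AccumulatorCache(weights, width=2)
cache.refresh("bucket-0", {0, 1, 2})           # first call pays the full cost
print(cache.refresh("bucket-0", {1, 2, 3}))    # second call touches only features 0 and 3
```

The saving comes from the second kind of call: when most features are unchanged since the cached position, only a few columns are added or subtracted.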

naive comet
prime mica
#

lol

#

you anticipated the psychological effects quite precisely

#

debugging it took a couple hours

#
Result of   4 runs
==================
base (./stockfish    ) =   70245225  +/- 1147585
test (...sh.after.gcc) =   72228060  +/- 894349
diff                   =   +1982836  +/- 871094

speedup        = +0.0282
P(speedup > 0) =  1.0000

64 threads

#

so seems to scale nicely enough (super noisy at these thread counts tho)

rocky vigil
#

yea pretty consistent speedup

naive comet
#

why does this guy have 64 threads

split warren
rocky vigil
#

it has occurred to me that I maybe should've tested fixed nodes sanity check

stray reef
split warren
#

Though it'll probably be tomorrow morning as I'm not gonna do that on my phone lol

prime mica
#

just probing to see whether it's a potentially good idea. Honestly if in the end it's <4 ELO STC I don't think it's worth it bc you completely break encapsulation of the NN layers

naive comet
#

anematode you might want to combine this thing with refreshes like we do at alex

#

I think it gains more

#

and is less cancer to implement

#

but idk

prime mica
#

yeah I will def try that!

stray reef
#

that works?

naive comet
#

😎

prime mica
#

lol

#

'tis always fun to have convergent ideas

rocky vigil
#

bruh this nnue training for Prolix is not ideal rn

#

i can only use like 3 concurrency for fixed nodes

#

sanity check

#

instead of 12

prime mica
#

prolix

split warren
#

So just to make sure I understood, I gotta clone shawns ti branch and then do speedup between the two branches tomorrow?

prime mica
#

hm?

#

I think sscg13's branch actually

split warren
#

Sscg13 is the dev, what's the base?

prime mica
#

two consecutive commits in his branch

rocky vigil
#

yea

split warren
#

I'll post the results here tomorrow

stray reef
rocky vigil
#

that would also work

#

just different nets

rocky vigil
#

it shouldn't matter

#

holy sss in that test tho

split warren
#

For speedup wouldnt matter as long as I'm using same net in both branches so we cool

rocky vigil
#

right on time

#

lmao

split warren
#

Bruh I have been meaning to, I'll do a Prolix net train for you while you do this, i gotcha

#

I know you've sent me the info, I'll actually get to it

prime mica
#

kyoot

split warren
#

Ngl, looking at ur net the last time, it's actually a pretty quick job 😉

twilit oriole
#

If the benches between the branches are different, a speedup test is not valid

naive comet
#

^^^^^^

#

unless you do like fixed nodes or smth

split warren
#

Hash gate, thread gate n now benchgate?

prime mica
#

scandal central

twilit oriole
naive comet
rocky vigil
#

so yeah

#

just do two consecutive

#

commits

naive comet
rocky vigil
#

to avoid that

naive comet
#

and still a good approx imo

#

idk

#

at least in my experience

rocky vigil
#

i have those up

#

one i16

#

one i8

naive comet
#

ok

twilit oriole
rocky vigil
prime mica
#

@naive comet lol your neural network code is lowk 10x easier to read and understand than Stockfish's

rocky vigil
#

yeah sf nnue code is uh

#

ngl

#

it's not good

prime mica
#

I feel like it's just overabstracted...

#

maybe once threat inputs are in an overhaul is in order

rocky vigil
#

btw fixed nodes is concerning so lemme look into inference again first

```
...      Stockfish TI-i8 playing White: 144 - 105 - 251  [0.539] 500
...      Stockfish TI-i8 playing Black: 69 - 156 - 275  [0.413] 500
...      White vs Black: 300 - 174 - 526  [0.563] 1000
Elo difference: -16.7 +/- 14.8, LOS: 1.4 %, DrawRatio: 52.6 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf
1000 of 1000 games finished.
```
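
The Elo figure in that result follows from the combined win/loss/draw counts under the usual logistic model. A small sketch, assuming the three columns are wins - losses - draws from the TI-i8 side (which matches the bracketed scores); the function name is illustrative, and the error bar and LOS are not reproduced.

```python
import math


def elo_from_wdl(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400 * math.log10(1 / score - 1)


# TI-i8 totals: the White and Black rows of the table added together.
wins = 144 + 69
losses = 105 + 156
draws = 251 + 275

print(f"score      = {(wins + 0.5 * draws) / (wins + losses + draws):.3f}")
print(f"Elo        = {elo_from_wdl(wins, losses, draws):+.1f}")
print(f"draw ratio = {draws / (wins + losses + draws):.1%}")
```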
prime mica
#

O no

#

when the i8 😭

rocky vigil
#

no this must be net issue

#

or inference issue

prime mica
#

yeah ik

rocky vigil
#

@naive comet shouldn't this actually be 255 / 256

#

or am I throwing

naive comet
#

this should be 255/256

naive comet
rocky vigil
#

could that be worth 15 elo

#

or however much stage 1 is losing at fixed nodes

naive comet
#

for me at least it's like minimal

#

in my old experience

#

you can honestly retrain from this xd

stray reef
naive comet
#

I love the naming

rocky vigil
#

then where'd the 15 elo go

#

💀

naive comet
#

the remaining 5 stages?

rocky vigil
#

remaining 4*

#

maybe

#

gotta 🙏 hard for this one

naive comet
#

yeah 4 fucking stages bro

rocky vigil
#

tbf factorizer stage 1 was also 15 elo above non-factorizer stage 1

#

and then only 3.5 elo

#

in the end

naive comet
#

just update the branch and retrain from there

prime mica
#

The x86 ISA and its consequences have been a disaster for the human race

#

Ugh hopefully we can figure the regression out 🤞

rocky vigil
#

i think the smallnet just died

#

with this

#

actually shoot

#

yeah

#

i killed the smallnet

#

that might regain some portion of elo

rocky vigil
#

so

#

I have not been able to do i8 on the existing net

#

in a manner that doesn't lose 300 elo at fixed nodes

stray reef
#

i'll stop it so my tune finishes faster :P

rocky vigil
#

salvation

rocky vigil
lofty cedar
#

The current i8 net loads half the vector at once and casts up.

#

But was that optimal?

#

Should we instead load the vector in full then split in half?

violet badger
#

am I reading fishtest correctly that the i8 branch gains about 10Elo STC? That would be impressive of course.

twilit oriole
#

yoshie had similar results, it compresses a lot at LTC because of the slight fixed nodes loss also

violet badger
rocky vigil
#

Ah either way works

#

It would’ve been fine to just go through with the original run

#

But either way the net will be done in ~2-3 days

rocky vigil
lofty cedar
#

Apruvu sama.

#

I tried loading and then using vector extract instruction to split instead of loading individual vectors.

violet badger
lofty cedar
#

Oops... fixed it not working on AVX2. Apruvu sama.

split warren
# rocky vigil anyways just run <https://github.com/sscg13/Stockfish/commit/5a6633ad554f22ef1ad...
```
  Run  1: 73598785 nps
  Run  2: 73103840 nps
  Run  3: 72480310 nps
  Run  4: 72441752 nps
  Run  5: 73022612 nps
  Run  6: 72743447 nps
  Run  7: 73217047 nps
  Run  8: 73797795 nps
  Run  9: 73452183 nps
  Run 10: 73739312 nps
Benchmarking 83eb0e1...
  Run  1: 67691034 nps
  Run  2: 67220747 nps
  Run  3: 67977216 nps
  Run  4: 67682587 nps
  Run  5: 68257834 nps
  Run  6: 67943234 nps
  Run  7: 67116181 nps
  Run  8: 68663434 nps
  Run  9: 67257098 nps
  Run 10: 67305347 nps

Engine                        Average NPS   Failures
------------------------- ---------------   --------
5a6633ad                         73159708          0
83eb0e1                          67711471          0
```
rocky vigil
#

Hmm +8% pretty nice

#

All that’s left now is to wait for net

green moat
#

Is this correct as of now?
"Some speedup, i8 inference, i8 nets training, verbatim nets.....and eventually SPSA the net"

#

What would be next steps before merging TI?

torn lagoon
green moat
torn lagoon
#

It was agreed to not spsa

violet badger
#

(AMD Ryzen 9 3950X)

prime mica
#

We are off to the races!!!

#

Interesting that my computer saw the least speed up this time…

violet badger
#

really, this stuff is getting pretty HW dependent.

prime mica
#

😩

violet badger
#

in fact so HW dependent it is currently not compiling on ARM 😉

prime mica
#

LOL true

violet badger
#
nnue/nnue_accumulator.cpp:362:65: error: 'vec_convert_8_16' was not declared in this scope
  362 |                     acc[k] = vec_sub_16(acc[k], vec_convert_8_16(column[k]));
      |                                                 ~~~~~~~~~~~~~~~~^~~~~~~~~~~
nnue/layers/../simd.h:173:43: note: in definition of macro 'vec_sub_16'
  173 |     #define vec_sub_16(a, b) vsubq_s16(a, b)
      |                                           ^
prime mica
#

Ill do an investigation about the best approach for ARM

#

I suspect we should have some improvement from vldq4_u8 or whatever it’s called which loads four vectors in one instruction

#

oh also, should we maybe merge shared memory into this and run another fishtest to see if it modifies the situation?

amber fern
#

Other than it being stronger I mean.

prime mica
#

I mean it's just exciting to have a different evaluation scheme

#

shawn's thesis is that Elo stagnation is in large part due to unchanging evaluation and I'm inclined to agree

rare jacinth
#

@prime mica btw the cached updates I suggested would probably be even more performant for threat inputs

prime mica
#

yeeee I will try them soon

amber fern
#

Hoping for a new stockfish with thread inputs as master as my early christmas present 🙂

prime mica
#

XD

#

ok we'll try to get it in before Dec 25 :)

#

@violet badger got it working on ARM...

#

will do some apple silicon speed tests in a bit

#

wow, fantastic on ARM

#
==================
base (./stockfish    ) =    1070090  +/- 12943
test (./stockfish_i8 ) =    1196706  +/- 12269
diff                   =    +126615  +/- 6814

speedup        = +0.1183
P(speedup > 0) =  1.0000

CPU: 10 x arm
Hyperthreading: off 

#

(Apple M1)

#
==================
base (./stockfish    ) =    1389464  +/- 16853
test (./stockfish_i8 ) =    1539608  +/- 18562
diff                   =    +150144  +/- 2732

speedup        = +0.1081
P(speedup > 0) =  1.0000

CPU: 12 x arm

(Apple M4)

prime mica
#

hm the arm64 codegen still looks suboptimal on Apple clang

#

I'll see if I can squeeze out a bit more with vld1q_s8_x4

prime mica
#

@rocky vigil we no longer prescale weights?

rocky vigil
#

Nope

prime mica
#

gotcha

#

it's not even helpful anymore right

rocky vigil
#

It would force extra x2s elsewhere since i8 is restrictive

prime mica
#

or in theory (suppose you could double them for free in add/sub) would it be nice?

#

the reason I'm asking is bc

#

ARM's i8 -> i16 conversion instructions have a shift operand

prime mica
#

lol

rocky vigil
prime mica
#

yeah but does that even help

rocky vigil
#

Or smth

prime mica
#

for now I just have shift = 0

rocky vigil
#

Idk

#

Wait how does mulhi trick work on arm

prime mica
#

not sure

#

anyway we can flesh it out later, all that matters is it's already a huge win on ARM too

rocky vigil
#

Yeah

rocky vigil
#

Code cleanup also needs to happen

prime mica
#

what are your ideas for cleanup

rocky vigil
#

Like just a generic statement

prime mica
#

oh sure

rocky vigil
#

I think for one it’s currently hacky how I redefine vec ONE depending on smallnet

#

Like vec(254 + use_threats)

prime mica
#

couldn't we just get rid of use_threats

#

oh wait that's for small net only I see

rocky vigil
#

Also need to add non-avx2 back

#

There’s a way that Plentychess uses but it was a little too complicated for me to bother copying

prime mica
#

lol

#

non-avx2 mfs can upgrade

split warren
#

Ive wished this too many times to actually admit... But then there's always someone running a cpu from 2004 still

#

Most often it's these outdated xeon cores v2 or whatever pre avx2 was

prime mica
#

sigh

rocky vigil
prime mica
#

I mean we'll just write the straightforward translation and then it'll be good enough right

prime mica
#

somewhat crazy idea

#

could it potentially be profitable to use VNNI instructions with multipliers of ±1 to further improve threat input updates

#

the pain point is that it accumulates to 32 bits

#

I'll probably try it once it's merged

amber fern
stray reef
violet badger
rocky vigil
#

yeah I can do that

prime mica
#

I modified it inline so it won't compile on x86 anymore

rocky vigil
#

oh

#

uh

prime mica
#

but I can fix that

rocky vigil
#

just get rid of that and then pr

#

ig

prime mica
#

yep!

violet badger
#

also didn't check this compiles on old arm...

#

anyway, progress..

#
==== 4a97c2ba244790c41bff09968d93430966ac5d48 ====
1 Nodes/second : 290009447
2 Nodes/second : 291033770
Average (over 2):  290521608
==== 83eb0e1d835e138194237c33cc968c48f42a6a68 ====
1 Nodes/second : 267842604
2 Nodes/second : 266942817
Average (over 2):  267392710
#

good 8% speedup

prime mica
#

😎

#

@rocky vigil PRed

rocky vigil
#

cool

#

merged

#

another name joins the eventual pr

prime mica
#

big ball of moss

stray reef
#

nice. how long until the net is fully trained?

rocky vigil
#

stage one being +10 elo at stc as expected

#

maybe +11

#

idk how much that minor additional smallnet fix is

prime mica
#

vs. what?

rocky vigil
#

vs last run stage 1

prime mica
#

ah ok

#

so just measuring the effect of speed ups

violet badger
#

39h I would guess.

rocky vigil
#

for which run

#

i guess 127 / 128 vs 255 / 256 is minor

violet badger
#

(for the first one)

#

2000 epochs remaining.

#

gives us time to think about QAT..

prime mica
#

what is QAT

violet badger
#

quantization aware training

prime mica
#

ohhh

rocky vigil
#

so far the only real change is it knows about the i8 limits

stray reef
#

in my experiments it only helped when the quantisation was really tight (when all feature weights are i8)

#

right now quantisation isn't really different than before

#

but ofc maybe it's a way to squeeze another elo at the cost of training speed :P

prime mica
#

what was the fixed-nodes loss

rocky vigil
#

linrock claimed the quantization change is worth 1 elo or so

#

so it's likely we might not even see fixed nodes loss

violet badger
#

if we're only losing 1Elo we're not quantizing hard enough 😉

prime mica
#

lololol

violet badger
#

int4 SF when?

prime mica
#

ideal

prime mica
rocky vigil
#

oh i meant like

#

the 127 -> 255 QA change

#

maybe cancels out the slight loss

#

from i8

prime mica
#

ohh I see

rocky vigil
#

i guess it's a "good antiscaler"

#

big gain at stc, moderate gain at ltc, neutral at vvltc

stray reef
# prime mica oh you did try quantizin the main net?

ok i checked back what i actually tested. QA=63 + QAT was about as strong as QA=127 (both -10 fixed nodes to master). but with QA=127, QAT did not help (same result against master)

this is all full i8 (except master, that was still i16 back then)

prime mica
#

interesting ok

rocky vigil
#

i guess it's only a bad antiscaler if it goes negative

violet badger
#

but that's where I think QAT could help. See how much it reduces fixed node Elo loss.

rocky vigil
#

i thought the fixed node loss was only -2 or smth

violet badger
#

so, freelo 😉

rocky vigil
#

hopefully the good results continue up to stage 4/5

violet badger
#

I think there is also some loss on the other parts of the net.

#

we could probably soon test stage 3.

#

that's pretty close to a converged net.

rocky vigil
violet badger
#

I mean they are also quantized from float to int

rocky vigil
#

right yeah

#

it might benefit there slightly

violet badger
#

I'm wondering if part of the SPSA gains are just related to cleaning up quantizing..

prime mica
#

that would be cruel but hilarious

rocky vigil
#

i remember viren saying a while ago that the quantization in later layers has a large effect

#

i think it is worth revisiting

frosty imp
#

QAT should be like 10 lines max with pytorch

#

any branch to work on?

rocky vigil
#

though idt it's exclusive to threat inputs

#

(QAT on the later layers, that is)

frosty imp
#

can't test rn but I think it should work

#

maybe we can extend the quantization to weights later

violet badger
#

can try to run this branch as well. I'm just somewhat surprised that this is the way it is done. I would expect some term added to the loss, that drives weights to be close to quantized values.
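
The two approaches being contrasted here can be sketched side by side. This is a hedged illustration only, not what the bullet branch does: `fake_quantize` rounds a float weight to the i8 grid in the forward pass, and `quantization_penalty` is the kind of extra loss term that would pull weights towards quantized values. The scale of 64 and both function names are assumptions made for the example.

```python
def fake_quantize(w, scale, lo=-128, hi=127):
    """Round a float weight to the nearest representable i8 step and clamp it."""
    q = round(w * scale)        # note: Python rounds exact halves to the nearest even integer
    q = max(lo, min(hi, q))     # the i8 limits
    return q / scale


def quantization_penalty(weights, scale):
    """Sum of squared distances between each weight and its quantized value."""
    return sum((w - fake_quantize(w, scale)) ** 2 for w in weights)


SCALE = 64   # hypothetical quantization scale, chosen only for this example

print(fake_quantize(0.51, SCALE))                   # snapped to the nearest step of 1/64
print(fake_quantize(10.0, SCALE))                   # clamped to the top of the i8 range
print(quantization_penalty([0.5, 0.51], SCALE))     # only the off-grid weight contributes
```

In a real trainer the rounding step has no useful gradient, so the forward-pass variant is normally paired with a straight-through estimator; the penalty variant avoids that but adds a hyperparameter for its weight in the loss.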

frosty imp
#

hmm I'm not sure if the quantization is applied to the weights or activations just from the bullet commit alone

#

quantized weight version

prime mica
#

💦

violet badger
#

so the latter version is the thing to run, I assume?

frosty imp
#

probably

violet badger
#

let's start with that and see where we get.

#

started

violet badger
#

step 3 training finished..

rocky vigil
#

Hmm

#

Can give it a go

#

It should already gain at this stage

#

If +10 is real

prime mica
#

Yes

#

Plz

violet badger
#

test is up, but I'm not sure what is being tested against what 🙂

prime mica
#

I think against stage 3 of previous threat weights run

rocky vigil
#

nope

prime mica
#

O?

rocky vigil
#

just stage 3 against stage 5

prime mica
#

I see ok

violet badger
#

so best threats setup, against current stage 3 i8

rocky vigil
#

stages 4/5 are worth max like 3 elo anyways from what we've seen

violet badger
#

yeah.

rocky vigil
#

so it should already gain at this point

#

hopefully that will be confirmed shortly

prime mica
#

Against master or against previous threat inputs

#

I am giddy

#

Time to sip my morning coffee and watch the Elo while reading the news

violet badger
#

you'd better sip Elo while watching the news.

prime mica
#

lololol does it taste good

violet badger
#

only one little sip and you're hooked, i've heard

prime mica
#

Well shit I gotta find some then

violet badger
#

let me trigger it a little bit

prime mica
#

😍

rocky vigil
#

gg sf 18 is here

prime mica
#

what are we expecting Elo wise?

violet badger
#

having no expectations is the safest, but some speedup O(10Elo) and some quantization error O(1Elo). As long as prefactors are not 1/3 and 3, all good.

prime mica
#

Elo: 6.32 ± 3.6

#

comports with stage 4/5 being handful of points right?

#

big error bars tho

stray reef
#

test vs master now?

violet badger
#

I would wait for the stages 4/5 to finish.. we can then pick the best net

rocky vigil
#

A few stage 4/5 runs are finishing in the next few days

rocky vigil
#

Btw we can look into removing leb128 for the threat weights

#

It literally cannot perform better

#

And removing it would simplify the current parsing code

prime mica
#

It should just be a memcpy at this point right

#

Followed by a permutation ofc

rocky vigil
#

Yeah

#

Rn what is done is read the entire thing into a big array

#

And then move it into separate arrays

#

Because as it turns out our readleb128 also includes a length

#

So it actually cannot just be read directly

rocky vigil
#

Which maybe also guards against the machine being big endian somehow

#

Ah nvm single byte
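
Why LEB128 cannot help for i8 weights: a signed LEB128 value always takes at least one byte, and i8 values outside -64..63 take two. A self-contained sketch of the standard signed LEB128 encoding (the function names are illustrative, and the engine's own file format, including its length header, is not modelled):

```python
def encode_sleb128(value):
    """Encode a Python int as signed LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                       # arithmetic shift keeps the sign
        sign_bit = byte & 0x40
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)              # last byte: continuation bit clear
            return bytes(out)
        out.append(byte | 0x80)           # more bytes follow


def decode_sleb128(data):
    """Decode a single signed LEB128 value from bytes."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:               # sign bit of the final byte
                result -= 1 << shift
            return result
    raise ValueError("truncated LEB128 value")


sizes = {v: len(encode_sleb128(v)) for v in range(-128, 128)}
one_byte = [v for v, n in sizes.items() if n == 1]
print(min(one_byte), max(one_byte))       # range of i8 values that still fit in one byte
print(sum(sizes.values()), "bytes for all 256 i8 values, versus 256 raw")
```

So for weights that are already i8, the encoded form is never smaller than the raw bytes and is larger for any value of large magnitude.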

rocky vigil
prime mica
#

that's surprising..

#

unless you mean read_leb_128 is slow

#

which would make sense to me

rocky vigil
#

It’s like half a second

#

Or smth

#

Idk

#

At the very least if I open the exe and type uci right away it doesn’t process instantly

prime mica
#

ai ya

#

that's definitely something to fix before merging

amber fern
#

So the threat-inputs-i8 is the new optimisation that to my understanding reduces weight precision slightly to improve the speed of the network? Does that make it a slight antiscaler?

rocky vigil
#

+10 stc +3 ltc is our prediction

#

I guess that is an antiscaler

#

Technically

amber fern
#

what about vltc?

rocky vigil
#

Neutral most likely

#

Well

#

The fixed nodes loss is probably maximum 1-2 elo

#

So the speedup is well worth it

#

Lots of variables here

amber fern
#

has this been tried with the smallnet as well? I'm guessing it's also highly worth it

rocky vigil
#

Smallnet isn’t using threats at all

#

Still the old smallnet

frosty imp
naive comet
#

I will try to optimise the code maybe

#

is the i8 merged?

rocky vigil
#

although stage 3 is already much better at stc

naive comet
#

ugh

#

so I have to pull your branch instead of shawn

rocky vigil
#

for now yeah

naive comet
#

what is the latest branch?

#

for i8

prime mica
naive comet
#

wait hold up

#

why is the nnue code so nice now

violet badger
#

you have seen the light?

green moat
#

Also, does vondele have the recipe for smallnet cooking?
🤔

rocky vigil
green moat
#

Probably the Elo gain on TI paradigm for smallnet would be compensated by the speed loss, so in the end smallnet with TI might actually be neutral at best
😐

violet badger
#

we did train a smallnet with threats.. but it can't gain IMO.

rocky vigil
#

Viren also had an idea to use a single net but either psq inputs only or psq + threats, although experimenting with that can wait for after merge

dark stream
#

How long until the net is fully trained?

prime mica
#

ok quick question, aren't the threats which involve a piece attacking a king in a way that can't be blocked (e.g. slider directly adjacent, or knight attack) completely redundant?

#

because they are implied by the corresponding main net feature...

#

like random example

#

the threat "queen on d8 attacks king on c7" is active if and only if the main net feature "king on c7 and queen on d8" is active

#

if I'm not mistaken we could actually test this post-training... just need to add the threat weights to the right part of the main net weights then zero out the original

rocky vigil
#

Knight on e5 and knight on f7 isn’t the sum of their individual weights

stray reef
#

but only for the correct stm

prime mica
rocky vigil
prime mica
#

oh right...

#

ok that part negates any benefit of my idea

prime mica
#

do we ever train on positions in check? if not then those features will probably be driven to ~0 anyway...

violet badger
#

no.. skipped

prime mica
#

surely it would be good to skip add/sub for them though

#

I'll take another look at rn5's work

rocky vigil
#

Also some interesting thing is bullet initializes threats / psq separately according to their individual sparsities

#

Idk if it’s any good

#

But can certainly be tested

prime mica
#

interesting

#

so many avenues of exploration 😩

rocky vigil
#

In general I wonder how much weight initialization matters

violet badger
#

right now it is a bit harder to see, as I pushed an additional commit to 2 of 3 PRs, and github doesn't show the pipeline on the previous commit to be active, despite it being active.

rocky vigil
#

👀

violet badger
#

seems real.

rocky vigil
#

arm speedups for i8 done some big magic

#

as an aside, that is the strongest stage 2 I have ever seen

violet badger
#

also.

#

but see stage 4 😉

#

anyway, looks very sweet now.

prime mica
#

yw

#

what's "reference" in there, the previous threat inputs?

#

or stage 2 or what

violet badger
#

master

#

see yaml description 🙂

prime mica
#

no fucking way

violet badger
#

agree, that's sweet ...

prime mica
#

fpoaijpoiajewpofijwofeij

#

I am so excited

violet badger
#

not without reason.

prime mica
#

impressively calm response

violet badger
#

8640 messages later..

prime mica
#

how can I download the net?

#

I'm curious how it'll be on my computer given the lesser speedup

prime mica
#

Thxxxx

#

I think I’ll make another attempt at speedups this weekend on my train trip

violet badger
#

(you can get there via the artifacts of the proper training step)

prime mica
#

See if we can squeeze out a bit more..

violet badger
#

excellent

prime mica
#

I still can’t believe that’s against master…

#

Ok I’ll shut up

violet badger
#

I agree, though. It is quite spectacular.

#

let's triple check somehow 😉

#

-engine name=reference cmd=/workspace/scratch/packages/stockfish/69a01b88f35db2a5003d42116f573207ca5c275b-profile-build/Stockfish/src/stockfish

#

undeniable...

foggy wind
#

Maybe it doesn't scale at all and is therefore useless Kappa

violet badger
#

threats known antiscaler.

desert tree
#

is it happening👀

rocky vigil
#

looks safe enough to use stage 4 so I'll start a few progtests on fishtest

violet badger
#

might be sprt against master time?

rocky vigil
#

or that

#

the whole shebang?

#

stc / ltc / stc smp / ltc smp?

violet badger
#

well, one at a time..

rocky vigil
#

ok

prime mica
#

don't forget to turn off auto purge ;)

amber fern
#

Wait, no way we are about to get a fishtest of threat inputs vs master that is gaining?! 🥺

prime mica
#

threads

#

^_^

naive comet
#

threat inputs

prime mica
#

maybe I'll figure out a way to add thread count as a feature

#

then we'll have true thread inputs ;)

amber fern
#

So is it on fishtest yet? 🙂

prime mica
#

this is so hype

#

NOT A DRILL

amber fern
#

YOOOO!!!!

#

+14 guys xD

prime mica
#

let's take bets

#

I'm betting +4

amber fern
#

where do I check the error bars?

amber fern
#

okay maybe +5

plain flower
#

it'll slide down and settle at +1 </pessimism>

desert tree
#

im betting +7 because thats what the SPRT is currently saying

amber fern
#

Okay, guess ill middleground my guesses: 5.5 +-1 😂

warm thistle
#

i'm guessing 0 +/- 1199.99

split warren
#

imma bet 4, not 3, not 5, but 4

amber fern
#

Its gonna be 1 guys...

split warren
#

vvltc smp banger

prime mica
#

different net... but not sure if that can explain the delta

#

and different arch ofc but I don't think that's entirely it either

lofty cedar
#

All these for 1 elo?...

#

There must be more!

naive comet
#

#bet fail red -0.69

lofty cedar
#

Though I don't think threat input is an anti-scaler.

#

It's quite well-established that bigger neural networks are good scalers, not meaning that it would scale well to have a bloated net size, but that if the bigger net is already good at STC, it would probably be good at LTC and above.

#

However, I think it might have to do with search.

prime mica
#

I think it was a joke

lofty cedar
#

I think it may be an apparent anti-scaler if the search tuning was so heavily done on the old net that the search now adjusts for all the quirks of the old net.

#

And since the search-wide tuning was done at VVLTC, as you approach longer TC, you're fighting an increasingly uphill battle.

naive comet
#

maybe it becomes anti scaler if the speedup was too good

#

in that case we just increase L1

prime mica
#

(local SPRT, posting for future reference)

prime mica
#

it looks like a lot more stuff got inlined into evaluate in master than in threat inputs

#

I wonder if forcing inlining would be good or bad

#

partial_insertion_sort is still taking a disgusting amount of time 😩

amber fern
#

rn the fishtest isn't going great... but I believe that better nets will come! And tuning for it will help a ton

prime mica
#

vpmovsxbw my beloved

amber fern
#

Guys, which net is next? Like to be tested on fishtest, I assume the current threat-inputs-i8 (update net) isn't the strongest one coming?

prime mica
#

not sure

amber fern
#

Any good ideas that weren't put into that net?

prime mica
#

if it's really close between master and threat inputs that's pretty great bc I'm positive there are more speedups to be found

prime mica
#

oh you mean the log likelihood ratio

#

yeah we'll see

prime mica
#

oh ok

prime mica
#

especially with this where there's probably large inter-computer differences

#

actually I haven't looked at the residuals yet let's see

amber fern
#

it feels like a sport event watching the dials update live lol

#

nerdiest sporting event in history that is

prime mica
#

lololol

#

yeah just eyeballing the residuals, AVX2 machines are suffering while AVX512 machines are doing swell

#

probably the vpmovsxbw spam 😩

prime mica
#

lol

#

you can see the per-worker results

prime mica
#

the link I sent

#

there's a dropdown somewhere

amber fern
#

yeah is it here?

prime mica
#

yep, you see the big table

#

one thing I've been meaning to add to fishtest is an ability to aggregate by some property

amber fern
#

how do I read the avx2 vs 512 differences? The residuals?

prime mica
#

yes but if I'm not mistaken the residual doesn't actually tell you whether the worker is significantly lower or higher than the mean

#

so you have to look at the pentanomial

#

anyway dw about it

#

we'll see in 12 hours where we're at

amber fern
#

haha, yeah wait till it gets to 50k games ig

rare jacinth
#

why hasn't l2 been increased to 31 to compensate for the smaller accumulator?

prime mica
#
```
Results of New vs Base (30+0.3, 8t, 256MB, UHO_4060_v4.epd):
Elo: 9.17 +/- 10.71, nElo: 21.75 +/- 25.38
LOS: 95.35 %, DrawRatio: 64.17 %, PairsRatio: 1.35
Games: 720, Wins: 201, Losses: 182, Draws: 337, Points: 369.5 (51.32 %)
Ptnml(0-2): [0, 55, 231, 74, 0], WL/DD Ratio: 1.22
LLR: 0.25 (8.4%) (-2.94, 2.94) [0.00, 2.00]
```
results trickling in from a VVLTC run I'm doing
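
For anyone following the numbers: the Elo, error bar, and nElo in the result above can be reproduced from the `Ptnml(0-2)` pair counts alone. The sketch below is only an illustration, not the testing tool's own code. The function names and the plain normal-approximation error bar are assumptions.

```python
import math

# Per-game score of a game pair worth 0, 0.5, 1, 1.5 or 2 points.
PAIR_SCORES = (0.0, 0.25, 0.5, 0.75, 1.0)


def score_to_elo(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def pentanomial_summary(ptnml):
    """Summarise a pentanomial [pairs scoring 0, 0.5, 1, 1.5, 2]."""
    pairs = sum(ptnml)
    mean = sum(c * s for c, s in zip(ptnml, PAIR_SCORES)) / pairs
    var = sum(c * (s - mean) ** 2 for c, s in zip(ptnml, PAIR_SCORES)) / pairs
    # 95% interval on the mean score, mapped through the Elo curve.
    half_width = 1.96 * math.sqrt(var / pairs)
    error = (score_to_elo(mean + half_width) - score_to_elo(mean - half_width)) / 2.0
    # Normalized Elo: score advantage in units of per-game standard deviation.
    nelo = (mean - 0.5) / math.sqrt(2.0 * var) * 800.0 / math.log(10.0)
    return {"score": mean, "elo": score_to_elo(mean), "error": error, "nelo": nelo}


summary = pentanomial_summary([0, 55, 231, 74, 0])
print({name: round(value, 2) for name, value in summary.items()})
```

With the 360 pairs posted above this lands very close to the reported 9.17 +/- 10.71 Elo and 21.75 nElo, which is also why the per-worker pentanomial is more informative than a bare residual.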
#

probably equivalent to 80+0.8 8t or so on fishtest

#

not quite as dramatic as the STC on vondele's CI...

rocky vigil
#

We have a couple more training runs so far, might brute force a better net by sheer luck

#

Also like Daniel said could also test increasing L2 size

stray reef
#

is the SPRT that's running rn a stage 5 net?

rocky vigil
#

local testing had them ~~equal but the stage 5 one is performing better on fishtest so far

amber fern
#

yeah! so far: stage 4 net = -0.26 elo (30k games) stage 5 net = 2.25 elo (17k games)

rocky vigil
#

btw it's time to look into preparing the branch for PR

#

so, what would we like the format of the net to be

#

some minor notes I have

#
  • change the mirroring of threat inputs to efgh (this can be done by permuting the weights, e.g.)
  • change the net format to store the i8 weights verbatim
  • ensure compilation works with all architectures
  • clean up the code generally
  • (optionally) do a lil bitcoin mining to rename the net
#

on the net side there are still a couple other things to try

#

check if L1=1280 works again after the i8 speedup

#

or check if L2=31(+1) works with the general L1 reduction

amber fern
#

Looking forward to SF18! 🙂

rocky vigil
#

heh would need to be like 10 elo gain for that

#

although

#

+X vvltc and another +Y from the (search) spsa that will happen after gets us closer

#

to sf 18

violet badger
#

let's not confuse this thread with SF18, it is going to be complex enough without that aspect 😉

rocky vigil
#

need @violet badger to set it up

#

would also be helpful if someone had a profile of latest

#

branch

#

my general estimation is that l2=31 will end up being -5% speed or so

rocky vigil
#

difficult there to tell the relative runtime

#

I'll try

violet badger
#

I'll set up the training; ultimately, one needs to measure to get a real number

rocky vigil
#

yeah

#

it would appear that l2 etc. take up 8% of total runtime

#

so I think -5% speed from doubling l2 seems reasonable
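
A quick back-of-the-envelope check of that estimate, as a hedged sketch: it assumes the profiled 8% share is accurate and that only that share gets more expensive when L2 grows. The helper names are made up for illustration.

```python
def speed_change(fraction, cost_multiplier):
    """Relative change in nodes/second when `fraction` of the runtime
    becomes `cost_multiplier` times as expensive and the rest is unchanged."""
    new_runtime = (1.0 - fraction) + fraction * cost_multiplier
    return 1.0 / new_runtime - 1.0


def multiplier_for(fraction, target_change):
    """Cost multiplier on `fraction` that would produce `target_change`."""
    new_runtime = 1.0 / (1.0 + target_change)
    return 1.0 + (new_runtime - 1.0) / fraction


# Worst case: all of the 8% scales linearly with L2, so doubling L2 doubles it.
print(f"{speed_change(0.08, 2.0):+.1%}")
# Multiplier on that 8% that corresponds to the -5% guess.
print(f"{multiplier_for(0.08, -0.05):.2f}x")
```

Under these assumptions a full doubling of the 8% costs a bit over 7% speed, so the -5% guess corresponds to roughly two thirds of that share actually scaling with L2. As said in the chat, only a measurement settles it.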

violet badger
#

so you change both l1 and l2?

rocky vigil
#

no

#

i think the l1 setting there gets overridden by --l1 option

#

anyways

violet badger
#

ah, it is just the diff showing it that way.

#

yeah, l1 is set by option

rocky vigil
#

i just saw it and thought it should change to make it more accurate

#

for cosmetic purposes

violet badger
#

so started

twilit oriole
#

im going to predict that fails at TC lol

rocky vigil
#

i mean why not give it a try lol

#

we'll have to see

#

the actual slowdown

twilit oriole
#

Giving a prediction does not suggest not giving it a try (obviously)

#

The vibe I got was there was unusually high optimism for this

rocky vigil
#

nah I suspect 16 is optimal as well

#

in fact I am surprised 16 is better than 8 even

#

but indeed there is the possibility that since input -> l1 takes a larger portion of total time relative to l1 -> l2, l2 could be increased

naive comet
#

can I have some clarity - will threat-inputs-i8 eventually be merged into threat_inputs branch?

#

and if i want to write a patch that applies for both - which one do I base on/test against?

naive comet
#

ok

rocky vigil
#

rn anematode stage 5 branch performing slightly better on fishtest but still well within error bars

#

so idt it matters which of the i8 branches you use

lofty cedar
#

Maybe someone could make Finny table work with threat input?

#

Though not sure if it would gain.

rocky vigil
#

in what sense

#

full refreshes are basically inconsequential

naive comet
#

we dont have buckets on threat inputs

#

only HM

lofty cedar
#

Oh, I see.

#

I thought it had such a thing.

prime mica
#

So what explains the chasm between the run on CI and on fishtest

rocky vigil
#

hardware

#

x86 sees less benefit than arm from i8

#

apparently

prime mica
#

😩

rocky vigil
#

tbf why not start a 10k game ltc 1 thread

#

just to see what the scaling is like

stray reef
#

that's reasonable imo

rocky vigil
#

after i8 it should be neutral scaling

#

or so

prime mica
#

Yeah we should

rocky vigil
#

if it's antiscaling we have a slight issue

stray reef
#

just requires positive STC in that case

#

which is doable

prime mica
#

still sss but not looking too crazy at very long TC:

```
Results of New vs Base (30+0.3, 8t, 256MB, UHO_4060_v4.epd):
Elo: 3.43 +/- 3.73, nElo: 8.24 +/- 8.96
LOS: 96.43 %, DrawRatio: 65.57 %, PairsRatio: 1.12
Games: 5780, Wins: 1610, Losses: 1553, Draws: 2617, Points: 2918.5 (50.49 %)
Ptnml(0-2): [0, 470, 1895, 523, 2], WL/DD Ratio: 1.33
LLR: 0.69 (23.5%) (-2.94, 2.94) [0.00, 2.00]
```
#

(this is master vs. the stage 5 net)
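
On the LLR line: the `(-2.94, 2.94)` limits are what the standard SPRT thresholds give for alpha = beta = 0.05, and the reported LLR can be roughly reproduced from the pentanomial with a normal approximation in normalized-Elo units. This is a sketch under those assumptions, not the testing tool's actual implementation, and the function names are invented here.

```python
import math

PAIR_SCORES = (0.0, 0.25, 0.5, 0.75, 1.0)  # per-game score of each pair outcome


def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper stopping thresholds for the log-likelihood ratio."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def approx_llr(ptnml, nelo0, nelo1):
    """Normal-approximation LLR for H1 (nelo1) against H0 (nelo0)."""
    pairs = sum(ptnml)
    mean = sum(c * s for c, s in zip(ptnml, PAIR_SCORES)) / pairs
    var = sum(c * (s - mean) ** 2 for c, s in zip(ptnml, PAIR_SCORES)) / pairs
    # Convert normalized Elo into a per-pair standardized effect size.
    to_t = math.sqrt(2.0) * math.log(10.0) / 800.0
    t0, t1 = nelo0 * to_t, nelo1 * to_t
    t_hat = (mean - 0.5) / math.sqrt(var)
    return pairs * (t1 - t0) * (2.0 * t_hat - t0 - t1) / 2.0


lower, upper = sprt_bounds()
llr = approx_llr([0, 470, 1895, 523, 2], 0.0, 2.0)
print(f"bounds ({lower:.2f}, {upper:.2f}), LLR {llr:.2f} ({llr / upper:.1%} of the way)")
```

Fed the two local pentanomials posted in this thread with `[0.00, 2.00]` bounds, the approximation comes out near the reported 0.25 and 0.69.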

#

we'll see on fishtest ofc

#

also I'm still confused, I thought we benchmarked some very nice speed gains on x86

rocky vigil
#

speedup

#

natural that speedup matters less at higher time control

prime mica
#

but shouldn't that help quite a bit on the current SPRTs?

rocky vigil
#

at stc

prime mica
#

ohhh

foggy wind
#

https://tests.stockfishchess.org/tests/view/690d2514ec1d00d2c195beb5


GROUPED BY ARCH

64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                | Elo:    -2.45 ±    2.62 | LOS:   3.3% | LLR: -1.77 | [89, 2356, 4773, 2172, 114]
64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo:     2.85 ±    2.92 | LOS:  97.2% | LLR:  1.11 | [60, 1760, 3752, 1861, 71]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT                            | Elo:    -4.12 ±    3.68 | LOS:   1.4% | LLR: -1.40 | [60, 1200, 2437, 1097, 54]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                       | Elo:    -2.55 ±    3.97 | LOS:  10.4% | LLR: -0.80 | [51, 1061, 2141, 970, 65]
64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT                  | Elo:     4.68 ±    5.74 | LOS:  94.5% | LLR:  0.51 | [20, 445, 985, 498, 20]
64bit POPCNT NEON_DOTPROD                                     | Elo:    45.23 ±   23.07 | LOS: 100.0% | LLR:  0.33 | [0, 15, 55, 40, 2]
rocky vigil
#

lmao

#

arm be like

prime mica
#

NEON_DOTPROD enjoyer

#

ok yeah that's dispositive

#

arghh

rocky vigil
#

trust in scaling Kappa

#

smp is like +5 elo

prime mica
#

ok well pending scaling tests... does that mean that if we get like a 3% speedup across the board on x86 specific to threat inputs (not saying this is easy!) then we should be ok?

rocky vigil
#

1% is sufficient to make it pass stc sprt

#

anyways

#

it's like 1% -> 2 elo at stc

prime mica
#

right

#

tantalizing...

violet badger
#

Nice summary.. it basically is a question of having 'the right' HW on fishtest right now. VNNI seems to like this as well.
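
To make the hardware split easier to see, the per-arch rows of the table can be pooled further by summing their pentanomials. A minimal sketch: the counts are copied from the GROUPED BY ARCH table in this thread, while the three bucket labels and the helper names are assumptions made for illustration.

```python
import math
from collections import defaultdict

PAIR_SCORES = (0.0, 0.25, 0.5, 0.75, 1.0)

# (arch flags, pentanomial) copied from the GROUPED BY ARCH table.
ROWS = [
    ("AVX512 BMI2 AVX2", [89, 2356, 4773, 2172, 114]),
    ("AVX512ICL VNNI AVX512 BMI2 AVX2", [60, 1760, 3752, 1861, 71]),
    ("AVX2", [60, 1200, 2437, 1097, 54]),
    ("BMI2 AVX2", [51, 1061, 2141, 970, 65]),
    ("VNNI BMI2 AVX2", [20, 445, 985, 498, 20]),
    ("NEON_DOTPROD", [0, 15, 55, 40, 2]),
]


def bucket(arch):
    """Hypothetical grouping key: ARM, x86 with VNNI, or everything else."""
    flags = arch.split()
    if "NEON_DOTPROD" in flags:
        return "ARM NEON"
    return "x86 VNNI" if "VNNI" in flags else "x86 other"


def pool(rows, key):
    """Sum pentanomials of all rows that share the same key."""
    pooled = defaultdict(lambda: [0, 0, 0, 0, 0])
    for arch, ptnml in rows:
        target = pooled[key(arch)]
        for i, count in enumerate(ptnml):
            target[i] += count
    return dict(pooled)


def elo(ptnml):
    mean = sum(c * s for c, s in zip(ptnml, PAIR_SCORES)) / sum(ptnml)
    return 400.0 * math.log10(mean / (1.0 - mean))


for name, ptnml in pool(ROWS, bucket).items():
    print(f"{name:10s} {ptnml} Elo {elo(ptnml):+.2f}")
```

Pooling this way puts the VNNI machines and the ARM machines on the positive side and the remaining x86 machines on the negative side, consistent with the per-row picture. The NEON bucket is only 112 pairs, so its estimate is very noisy.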

foggy wind
#

With fleet it would pass in a second xD

violet badger
#

ik...

prime mica
#

LOL

#

> deploy ARM cores
> release SF 18

foggy wind
#

We simply have to market this as mobile first. People only use their cell phones for everything anyway. Kappa

rocky vigil
#

total neon penta over the 2 tests

stray reef
#

what LLR for [0, 2] bounds?

rocky vigil
#

0.5

rocky vigil
#

still need 5000 more neon games

violet badger
#

it is still somewhat surprising that neon does so well. Is there some particular instruction that works very well?

prime mica
#

I think the i8 -> i16 conversions are much cheaper

#

maybe not the only cause but you can do one load of 128-bits and then unpack it in two instructions into two 128-bit registers fit for accumulation

#

whereas on x86 we're doing one load + one conversion per accumulation

#

Yoshie tried a technique I suggested to avoid that but it performed worse

#



```
#ifdef USE_NEON
                for (IndexType k = 0; k < Tiling::NumRegs; k += 2) {
                    acc[k] = vec_sub_16(acc[k], vmovl_s8(vget_low_s8(column[k / 2])));
                    acc[k + 1] = vec_sub_16(acc[k + 1], vmovl_high_s8(column[k / 2]));
                }
#else
```
#

(vget_low_s8 is a no-op in assembly, just for casting porpoises)
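
What that NEON loop does per 16-byte load can be modeled in plain Python, purely as an illustration of the data flow. The sketch assumes `vec_sub_16` is a wrapping 16-bit subtraction. All function names are stand-ins for the intrinsics, not real APIs.

```python
def wrap_i16(value):
    """Wrap an integer into the signed 16-bit range, like a lane-wise i16 op."""
    return (value + 0x8000) % 0x10000 - 0x8000


def widen_low_s8(column):
    """Model of vmovl_s8(vget_low_s8(v)): lanes 0..7, sign-extended to i16.
    Python ints are already signed, so widening is just taking the lanes."""
    return list(column[:8])


def widen_high_s8(column):
    """Model of vmovl_high_s8(v): lanes 8..15, sign-extended to i16."""
    return list(column[8:])


def sub_16(acc, delta):
    """Assumed behaviour of vec_sub_16: lane-wise wrapping i16 subtraction."""
    return [wrap_i16(a - d) for a, d in zip(acc, delta)]


def sub_column(acc_lo, acc_hi, column):
    """One 16-lane i8 load feeds two 8-lane i16 accumulator registers."""
    assert len(column) == 16 and all(-128 <= w <= 127 for w in column)
    return sub_16(acc_lo, widen_low_s8(column)), sub_16(acc_hi, widen_high_s8(column))


weights = list(range(-8, 8))  # sixteen i8 weights from a single load
lo, hi = sub_column([0] * 8, [0] * 8, weights)
print(lo, hi)
```

The point of the model is the ratio: one load and two widening steps update two accumulator registers, whereas the x86 path described in the chat pays one load plus one conversion per accumulator register.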

#

I don't think that explains the full gap tho

stray reef
#

yeah there's no way this explains so much elo

#

especially since it's probably still mostly memory bound on x86

prime mica
#

honestly I'm not so sure ab that anymore...

#

I'll do some profiling later on my friend's older Intel box

violet badger
#

I think that big gap can almost only be explained by the used data now somehow fitting in some cache?

#

or by magic a better access pattern?

prime mica
#

hm yeah, maybe the VNNI trend is because of newer machines being more well-endowed

violet badger
#

possibly.

lofty cedar
#

I tried fusing load before. Didn't work.

prime mica
#

😩

stray reef
#

@violet badger someone really wanted this STC to pass huh? xD

violet badger
#

oh, it passed 😮

stray reef
#

i see your arm machines there :P

#

congrats!

formal smelt
#

fighting against the technologov cores lol

violet badger
#

basically just documenting how HW dependent this is ....

#

let's wait a bit for the second test to pass, and after that submit LTC (for which I will remove again the arm machines).

rocky vigil
#

cool

violet badger
#

well given the Elo numbers we had in the pipeline, it is clear they can easily do that

#

I would say go ahead and submit one LTC.

rocky vigil
#

ok

#

i suppose we just choose at random

#

since stage 4 and stage 5 are well within error bars at fishtest also

#

since I'm online rn i guess it'll just be stage 4 then

violet badger
#

either is fine.

#

eventually we sprt things against each other.

#

in principle 4 is nicer, since it would establish a shorter training baseline.

rocky vigil
#

ok

#

submitted

violet badger
#

I've also updated the reference in the pipelines to be your branch at the f3f net.

#

and switched it to sprt

#

so we will more easily see what is better in future tests.

rocky vigil
#

ah cool

#

actually the minor fix is finishing later today

#

so that'll be cool start

violet badger
#

in principle I'd have to stop and restart the pipeline for it to pick up that commit that changes the reference.

#

(right now I think the yaml it is testing is not yet with a suitable sf sha).

lapis parrot
#

anematode machine is like solo killing this test kek

#

smth like -27 -39 +1 -29 pairs

foggy wind
#

anematode-128cores-7b133829 | Elo: -6.29 ± 4.66 | LOS: 0.4% | LLR: -1.19 | [4, 608, 1351, 516, 5]

prime mica
#

Yaaaa the i8 speedup was so small on my computer

#

The break even point is probably even higher TC

prime mica
#

Silly suggestion, are there any things in search which are known to depend strongly on evaluation accuracy

#

or is the tuning far too diffuse

lapis parrot
#

well, any static eval based heuristic more or less

#

but it's like hmm

#

+90 elo from eval with major slowdown ~= +15-20 elo from search patches

#

at least this is what it was when the very first NNUE was introduced

#

also in general tuning should handle it nicely anyway

prime mica
#

right

#

like I'm wondering whether it makes sense given that we're within shooting distance of master to try some basic search tuning and see if we can exceed it

#

and that effort can be done in parallel with trying to speed up x86

lapis parrot
#

eh, not really imho

#

search tuning can and probably should be done on top of the net imho

prime mica
#

which net

lapis parrot
#

which passes

prime mica
#

SPSA?

daring wren
#

maybe you are just tuning the search to be better regardless of what net is being used

#

in that case, you're just hacking the net in and using the tune as an excuse to get a passing SPRT

prime mica
#

sure... but that can be validated by using the same parameters with the master net, no?

#

or running two SPSAs although that's expensive

lapis parrot
#

I would prefer to not touch search with new arch

#

I think maintainers will also be like this )

prime mica
#

kk

violet badger
#

yeah would be nice to break even with just net, and my expectation is that would lead to search tweaks afterwards.

#

ARMed riot against x86 right now

prime mica
#

lol

lapis parrot
#

hmph about search I would guess that since it has threat inputs

#

we can be more aggressive with capture pruning and qsearch

#

maybe

#

ofc it's all pretty vague

prime mica
#

sure

lapis parrot
#

maybe finally history adjustment will work for captures?

#

or correction history adjustments ?

prime mica
#
```
#ifdef USE_NEON
constexpr bool threat_inputs = true;
#else
constexpr bool threat_inputs = false;
#endif
```
lapis parrot
#

well this one can actually be a fluke

prime mica
#

I'm wondering whether speedups matter more at LTC than generally believed...

lapis parrot
#

they are like 70% of what they are at STC I think?

prime mica
#

oh what

#

I thought it was like 30%

#

hm ok

lapis parrot
#

elo wise? no

#

speedups are usually "mocked" for not scaling

foggy wind
# prime mica: would u mind running your Elo bucketing script on the PT h...
GROUPED BY ARCH

64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT   | Elo: 30.50 ± 1.84 | LOS: 100.0% | LLR: 29.71 | [2, 1204, 7486, 3301, 4]
64bit AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT | Elo: 25.27 ± 2.04 | LOS: 100.0% | LLR: 20.40 | [3, 1015, 6107, 2396, 4]
64bit BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT        | Elo: 24.71 ± 2.93 | LOS: 100.0% | LLR:  9.85 | [0, 515, 2989, 1178, 1]
64bit AVX2 SSE41 SSSE3 SSE2 POPCNT             | Elo: 19.43 ± 3.43 | LOS: 100.0% | LLR:  5.70 | [0, 388, 2176, 755, 2]
64bit SSE41 SSSE3 SSE2 POPCNT                  | Elo: 21.28 ± 9.48 | LOS: 100.0% | LLR:  0.80 | [1, 57, 300, 115, 1]
lapis parrot
#

because average patch is like ~ the same elo at LTC as it is at STC

#

and some even hyperscale

#

but it doesn't mean they are like 30% lol

prime mica
#

O wow, +5 delta between VNNI and normal AVX512

lapis parrot
#

210/142

prime mica
#

gotcha

lapis parrot
#

is like slightly less than 1,5

lapis parrot
#

so speedups should be slightly above 66% stc -> ltc
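
The arithmetic behind that, as a small sketch. The chat does not spell out what the 210 and 142 figures measure, so they are simply treated here as the STC and LTC values of the same speed advantage. The "1% -> 2 elo at stc" rule of thumb from earlier in the thread is extrapolated linearly, which is itself an assumption.

```python
STC_VALUE = 210.0  # figure quoted in the chat for STC
LTC_VALUE = 142.0  # figure quoted in the chat for LTC

stc_over_ltc = STC_VALUE / LTC_VALUE  # "slightly less than 1,5"
retention = LTC_VALUE / STC_VALUE     # share of a speedup's STC Elo kept at LTC


def speedup_elo(speedup_percent, stc_elo_per_percent=2.0, ltc_retention=retention):
    """Rule-of-thumb Elo from a speedup at STC and at LTC (linear extrapolation)."""
    stc = speedup_percent * stc_elo_per_percent
    return stc, stc * ltc_retention


print(f"STC/LTC = {stc_over_ltc:.2f}, retention = {retention:.1%}")
for percent in (1.0, 3.0):
    stc, ltc = speedup_elo(percent)
    print(f"{percent:.0f}% speedup: about {stc:.1f} Elo at STC, {ltc:.1f} at LTC")
```

Taken at face value, the ratio keeps about two thirds of a speedup's STC gain at LTC, in line with the "slightly above 66%" figure rather than the 30% guess.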

#

just that usual stuff for sf releases is having this elo being 1:1

#

so logical patches in general scale better than speedups

lapis parrot
#

even our last PT is 31 - 27

#

which is far above 1,5:1

violet badger
#

a bit old data, I think that with the current book it is even more similar.

lapis parrot
#

maybe

#

you can measure

#

🙂

prime mica
#

📏

lapis parrot
#

also yeah at some point compression hits you anyway

#

even with UHO books