#small speedup ideas

295 messages · Page 1 of 1 (latest)

narrow plinth
#

Template Thread based on main thread vs non main thread. Use std::array for Threads instead of vector.

tardy jasper
#

is it possible to resize arrays on the fly like vectors?

narrow plinth
#

no but it is limited at compile time anyway

#

ucioption.cpp it is limited to 1024 max right now

light cliff
#

why would an array help over a vector?

narrow plinth
#

faster to access

light cliff
#

no?

narrow plinth
#

wut?

light cliff
#

identical operation

narrow plinth
#

no it's not

#

vectors use heap memory and can move around

light cliff
#

heap memory is still memory

#

no difference to the stack

#

moving around is irrelevant

narrow plinth
#

yes, it means extra indirection....

light cliff
#

accessing both is still just effectively a pointer deref

#

what indirection

narrow plinth
#

im not gonna debate basics with you lol....

#

array is faster by a bit

light cliff
#

then don't be wrong? lmfao

narrow plinth
#

it is well known that accessing arrays is faster than vector

maiden shard
narrow plinth
#

nope

light cliff
#

yep

#

it's not "well known"

#

cuz it's not true

#

wtf are you on

narrow plinth
#

they are very different data types

light cliff
#

GEEKSFORGEEKS LMAO

#

you legit have to be trolling

narrow plinth
#

Vectors in c++:

Vectors are sequence containers which utilize continuous storage locations to store elements. They can manage storage and grow dynamically in an efficient way. These abilities come at a price: vectors consume more memory in exchange for the ability to handle storage and growing dynamically in size. vector<int> v; where v is the variable of type Vector store integer elements.

Advantages of Vector:

  • Size of the vector can be changed
  • Multiple objects can be stored
  • Elements can be deleted from a vector

Disadvantages of Vector:

  • A vector is an object, memory consumption is more.

Declare an array in C++:

An array stores a fixed-size sequential collection of elements of the same type. It is used to store a collection of data, but the array can be considered as a collection of variables of the same type stored at contiguous memory locations. All arrays consist of contiguous memory locations, with the lowest address corresponds to the first element and the highest address to the last element.

Advantages of Arrays:

  • Arrays supports efficient random access to the members.
  • It is easy to sort an array.
  • They are more appropriate for storing fixed number of elements
    Disadvantages of Arrays:
  • Elements can not be deleted
  • Dynamic creation of arrays is not possible
  • Multiple data types can not be stored
#

are you? wtf

#

no one with an iota of c++ knowledge thinks that vector and array have same cost to access

light cliff
#

yeah you're trolling

#

o/

narrow plinth
#

so your position is that a data structure that is variable sized is the same cost to index into as a fixed size one?

#

LMFAO

#

please stop, this is very embarrassing for you

tardy jasper
#

If you think this is a possible speedup then maybe you can submit a test on fishtest?

narrow plinth
#

i will soonish

#

might combine with 1 other thing

#

last one probably biggest gain, remove thread voting and no need for rootmoves for every thread

strange kettle
#

Ciekce has a compulsive urge to correct others, taking everything into "you're wrong lol" territory in order to feel intellectually dominant. I guess something went wrong at some point in its development

tardy jasper
#

wtf

#

you clearly haven't been in engine dev

#

ciekce is literally one of the most helpful for beginners

strange kettle
#

Yeah, he is very helpful

#

I dont doubt that

terse crag
#

incredible

strange kettle
#

He doesnt know

narrow plinth
#

do you? why do you think indexing into a fixed size array that has a fixed memory location has the same cost as a variable sized vector that can move?

hollow mortar
#

vectors do vary in size but accessing an existing element is the exact same operation iirc

#

you might as well submit a test to fishtest to see the results

narrow plinth
#

no, there are special instructions/addressing modes if you know base address and max size

#

it is certainly not the same cost

hollow mortar
#

could you provide an example

narrow plinth
#

can you do basic research ?

hollow mortar
#

ah yes my google research would be like "array special instruction/addressing modes"

#

If i ask you to provide an example it's because i know researching it would take longer

#

since it's very vague

narrow plinth
#

do you think that a constant base address that can be encoded into an immediate operand at compile time is the same as looking it up in a pointer at runtime?

#

please tell me

#

if you don't know basics like this, there is no helping you

#

it is very sad the complete lack of knowledge demonstrated by many people here

hollow mortar
#

well if you really wanna dive into this topic i can tell you that usually common base addresses of stack arrays are not compiled into immediate operands each time, but rather put into registers upon use since it's faster and takes up less space. Even so, looking it up in a pointer (??) at runtime doesn't really sound like it would speed up Stockfish significantly compared to any other evaluation or search feature's speedups.

narrow plinth
#

please just stop

hollow mortar
#

reading an immediate operand in ram is slower than using an already-used and cached value in a cpu register

narrow plinth
#

you literally don't even know what it is lol

#

an immediate is encoded into the instruction stream directly

#

there is no lookup

#

please stop before you embarrass yourself further

hollow mortar
#

so how would you estimate this very significant speedup as

narrow plinth
#

i didn't say it was significant, did i?

#

i said small

#

literally in the title

hollow mortar
#

well clearly you thought it was significant enough to make a whole post about it

narrow plinth
#

yeah, and i made many posts about removing tb

#

but i literally only estimated 0.5-1 elo

#

and it came in at 0.6

#

but next time, talk about what you understand instead of sprouting nonsense

hollow mortar
#

so how much elo do you think this would gain?

narrow plinth
#

just the switch to std array?

#

maybe 0.1 at most

#

obviously at multi threaded test

#

probably closer to 0.05

hollow mortar
#

now that i think about it, in order to compile a vector/array's base address as an immediate value, wouldn't that need to be a static address? I can't really figure how SF would ever use static addresses even with the use of arrays.

maiden shard
narrow plinth
#

no one said significant except other people

hollow mortar
#

i think what he means is how is the speedup ever going to compensate for all the noise that comes from everything else

narrow plinth
#

that is why i put multiple ideas

#

not just 1

#

i don't think only 1 is measurable

#

fixed size allocation, can go on stack, can know base address

maiden shard
hollow mortar
#

yep

narrow plinth
hollow mortar
#

compiler still won't use immediate values

narrow plinth
#

yes it will

#

you literally don't even know what an immediate is

#

and you still have the nerve to say shit

hollow mortar
#

an operand in the instruction itself

narrow plinth
#

lmfao

hollow mortar
#

in the form of a number

#

or address in this case

maiden shard
narrow plinth
#

then why did you think a register lookup is faster?

hollow mortar
#

the difference in speed between the two, in absence of bottlenecks should be close to none

narrow plinth
#

even if you were correct, which you are not, memory savings of not having to store and lookup base address would still be there

hollow mortar
#

considering that using immediates still adds bytes to the code, we're literally talking about a bunch of bytes as memory saving, which would be irrelevant since they would still most likely be within an allocated memory page

narrow plinth
#

so it's in a register, how did you load it into the register?

#

from memory, which requires another instruction or immediate

#

please stop this

hollow mortar
#

just once tho lol

#

in your case it would use immediates every time

narrow plinth
#

what do you mean just once, you allocate a whole register for it for the lifetime of the program? no shot

hollow mortar
#

of course i mean each time it's needed, but then that register is used for addressing multiple times before being assigned to something else

narrow plinth
#

yes, but that doesn't mean loading the register is free

#

the link has benchmarks near the bottom

hollow mortar
#

i just took a look at the benchmark

#

so it seems like there are no immediate values being used for addressing

#

except jumps and calls of course

#

but that's obvious

#

It looks like the bench_array code is just better instead

#

even though it even has a nop instruction despite the O3 flag which is kinda funny

#

the actual cause for this difference in speed seems to be a bad use of xmm registers in bench_pushBack and bench_prealloc

#

this probably would never occur in the Stockfish executables that use AVX2 however in such artificial situation with a huge number of writes over a large array

#

actually

#

it looks like the bench_pushBack doesn't use xmm registers at all

#

only the general purpose ones

narrow plinth
#

it doesn't because he made array at run time

#

with new

#

your theory would mean that compilers would just use register addressing and only use immediates to load the register, which doesn't happen

hollow mortar
narrow plinth
#

tell us what compiler turns an immediate into a register load to index?

#

you have no clue bro

hollow mortar
#

at least i know what a benchmark actually measures

strange kettle
#

Now this is intense

#

You guys should resolve your differences in an octagon

#

No rules

hollow mortar
#

he's so hostile he would probably agree

strange kettle
#

And you wouldnt?

hollow mortar
strange kettle
#

Who doesn't sign up for a few good elbows to the nasal septum?

#

Come on

narrow plinth
#

because you sure demonstrated that bro

#

very good benchmark you posted

#

i see why zuppa had you blocked now

hollow mortar
narrow plinth
#

better than nothing lmfao

#

blocked this fool until he shows da numbers

#

probably the heat death of the universe will come first

strange kettle
#

Zuppa blocked you?

narrow plinth
#

LMFAO

#

zuppa blocked yes a long time ago

strange kettle
#

So he is a veteran

#

In the field of being blocked by zuppa

#

Im a novice

narrow plinth
#

you can do it...just takes time

#

i really don't even know how you can even conceive of such shit

#

register indexing better than immediate lmfao

hollow mortar
#

he unblocked me lol

hollow mortar
narrow plinth
#

LOLOLOL

#

find any benchmarks yet?????

#

i see you posted 0

hollow mortar
#

you're the one who made the first claim you should find them

narrow plinth
#

your claim was that immediate is worse

#

post your benchmark

hollow mortar
#

your claim was that immediate is better

#

so much to create a 0.1 elo gain

narrow plinth
#

and?

hollow mortar
#

and you didn't prove it yet

narrow plinth
#

and?

#

gain is gain

#

if you stop trolling here

#

then maybe i could actually do it instead of replying to you

hollow mortar
#

sounds like excuses

narrow plinth
#

what productive thing have you done here?

#

where is your benchmark?

#

show me

hollow mortar
#

where is yours?

narrow plinth
#

name 1 single compiler which does it your way

#

because it is better

#

or are compiler writers all dumbasses?

hollow mortar
#

i said that in general compilers use register indexing when using arrays, and gcc does that on your famous benchmark

#

so my statement isn't false

#

you genuinely thought compilers use immediate addressing all the time with arrays so

#

i'd say the one making shit up is you

narrow plinth
#

lololol

#

nice benchmark you posted

#

in general, blah blah blah

#

cpu makers dumb af for even including it

#

if what you say is true

#

even if there are extra shadowed registers, that is still limited

#

and you still have to use one of the architectural registers when addressing it

#

read the synthesis os paper

#

regardless, an extra load is not free

#

unlike an immediate

#

whether you load it from memory or an immediate

#

and extra pointer still takes up space in data cache and memory regardless of if a tlb miss is caused or not

#

but sure, keep thinking that BS

#

with no evidence

#

it is quite unfortunate how much nonsense you believe

#

just say I, Yes. I believe that including extra code to grow and shrink a vector is free.

#

do it, lets see how smart you are.

#

even if your method was 'faster', which it isn't, the extra code would make a vector slower than an array overall for this use case

strange kettle
#

What does smp stand for

narrow plinth
#

simultaneous multi processing

strange kettle
#

Ohh

narrow plinth
#

which basically means threads for sf usually

strange kettle
#

Patches passing with the old master will be rescheduled ig

#

After the new mergins I mean

narrow plinth
#

sometimes i wonder about sf methodology

#

how many of these patches tested at same time interfere with each other

jovial juniper
#

In theory maintainer says rebase and retest if they are likely to interfere. Although I suppose you could argue on "likely"

strange kettle
#

If a given patch starts with a prev master and completes testing with an updated master, how you guys proceed

jovial juniper
#

if they are likely to interfere

strange kettle
#

Retest with the updated?

#

And how you measure that

hollow mortar
#

so while you were here yapping i actually did a benchmark

#

by recycling some of the code that your dude used

strange kettle
#

Like how you measure

hollow mortar
#

to enfore the use of immediate addressing i had to allocate a global fixed size array and just guess a big enough size

#

but it worked

strange kettle
#

If they are likely to interfere

hollow mortar
#

keep in mind that i did this in 5 minutes at midnight

strange kettle
#

I guess its entirely dependent on the patch

#

So the question is rather vague

jovial juniper
# strange kettle And how you measure that

If it modifies same part of code then definitely retest. And some things like net and big tunes probably get retested. But mostly intuition on what probably does AFAIK.

strange kettle
#

Still Im curious

hollow mortar
#

if you look at the disassembly you can see that bench_immediate does use immediate addressing

strange kettle
hollow mortar
#

it seems like you weren't that right after all

#

Now I'm going to sleep since i'm done talking to you

#

have a good day

strange kettle
#

Have a good night

#

I hope you have sweet dreams

narrow plinth
#

ah yes, testing array vs array when we were talking about array vs vector

#

what a genius

strange kettle
#

Can you test it yourself in that page?

narrow plinth
#

im not at keyboard right now

#

maybe later when I'm home

strange kettle
#

Oko

narrow plinth
#

but it looks like compiler is vectorizing everything, i will have to take closer look

#

at a quick glance, the compiler vectorized one loop but not the other

#

when i tried it

#

it got different results

#

from his run

#

his run immediate was slightly slower, i clear results and got a huge difference

#

it probably depends on what cpu is assigned on server side

#

sometimes its under 1% difference, then sometimes it is 1.7x

#

i will try it locally when i get home

#

yeah something weird is going on

narrow plinth
#

i will try changing it to an add or shift

#

using an expensive multiply seems wrong

#

to be clear, in sf all this is accessed is when creating or deleting thread and once per go command

#

and yes, chasing the pointer that is stored in it is probably more cost than the actual iteration

long glacier
ember canyon
#

arguing in ideas channel is the most useless thing to do

#

everything can be responded to by "write patch and test it on fishtest"

#

but you keep hearing how they will test it soon(TM)

#

while defending the idea

narrow plinth
#

no point if test won't get approved

long glacier
#

best thing bro can do atm is delete this post fr

narrow plinth
#

delete yourself

#

you contributed nothing LMFAO

viscid stag
#

it's better to contribute nothing than to contribute utter bs and expose your own problems 😔

narrow plinth
#

what problems?

narrow plinth
#

do you believe that a fixed size array that is rarely accessed would be faster as a vector?

#

are you stupid?

narrow plinth
#

I was replying to someone who said i was full of BS and exposing my own problems

#

so that means he thinks the opposite for some reason

#

he must be really stupid

#

or maybe he is actually stupid enough to think testing array vs array is equivalent to array vs vector? LOL

strange kettle
#

Calm down

maiden shard
#

A single pointer is a performance trap?

narrow plinth
#

what benefit does this give you exactly?

sick tiger
#

Did anything happen out of this ? As aggressive as he sounded I think he is correct though. Array that is allocated on stack with fixed size technically should be slight faster, don't know if it matters too much though

rugged turtle
#

the things are almost all premature optimizations

#

in none hot paths

narrow plinth
#

yes it's not a hot path, main impact would be less memory use not actual clock cycles saved directly

#

but still a small impact

#

i will probably implement it soon in my fork, been doing other stuff mostly