#small speedup ideas
295 messages · Page 1 of 1 (latest)
is it possible to resize arrays on the fly like vectors?
no but it is limited at compile time anyway
ucioption.cpp it is limited to 1024 max right now
why would an array help over a vector?
faster to access
no?
wut?
identical operation
heap memory is still memory
no difference to the stack
moving around is irrelevant
yes, it means extra indirection....
then don't be wrong? lmfao
it is well known that accessing arrays is faster than vector
Vectors are arrays
nope
Vectors in c++:
Vectors are sequence containers which utilize continuous storage locations to store elements. They can manage storage and grow dynamically in an efficient way. These abilities come at a price: vectors consume more memory in exchange for the ability to handle storage and growing dynamically in size. vector<int> v; where v is the variable of type Vector store integer elements.
Advantages of Vector:
- Size of the vector can be changed
- Multiple objects can be stored
- Elements can be deleted from a vector
Disadvantages of Vector:
- A vector is an object, memory consumption is more.
Declare an array in C++:
An array stores a fixed-size sequential collection of elements of the same type. It is used to store a collection of data, but the array can be considered as a collection of variables of the same type stored at contiguous memory locations. All arrays consist of contiguous memory locations, with the lowest address corresponds to the first element and the highest address to the last element.
Advantages of Arrays:
- Arrays supports efficient random access to the members.
- It is easy to sort an array.
- They are more appropriate for storing fixed number of elements
Disadvantages of Arrays: - Elements can not be deleted
- Dynamic creation of arrays is not possible
- Multiple data types can not be stored
are you? wtf
no one with an iota of c++ knowledge thinks that vector and array have same cost to access
so your position is that a data structure that is variable sized is the same cost to index into as a fixed size one?
LMFAO
please stop, this is very embarrassing for you
If you think this is a possible speedup then maybe you can submit a test on fishtest?
i will soonish
might combine with 1 other thing
last one probably biggest gain, remove thread voting and no need for rootmoves for every thread
Ciekce has a compulsive urge to correct others, taking everything into "you're wrong lol" territory in order to feel intellectually dominant. I guess something went wrong at some point in its development
wtf
you clearly haven't been in engine dev
ciekce is literally one of the most helpful for beginners
oh my god, do you actually not know how indexing works?
incredible
He doesnt know
do you? why do you think indexing into a fixed size array that has a fixed memory location has the same cost as a variable sized vector that can move?
vectors do vary in size but accessing an existing element is the exact same operation iirc
you might as well submit a test to fishtest to see the results
no, there are special instructions/addressing modes if you know base address and max size
it is certainly not the same cost
could you provide an example
can you do basic research ?
ah yes my google research would be like "array special instruction/addressing modes"
If i ask you to provide an example it's because i know researching it would take longer
since it's very vague
do you think that a constant base address that can be encoded into an immediate operand at compile time is the same as looking it up in a pointer at runtime?
please tell me
if you don't know basics like this, there is no helping you
it is very sad the complete lack of knowledge demonstrated by many people here
well if you really wanna dive into this topic i can tell you that usually common base addresses of stack arrays are not compiled into immediate operands each time, but rather put into registers upon use since it's faster and takes up less space. Even so, looking it up in a pointer (??) at runtime doesn't really sound like it would speed up Stockfish significantly compared to any other evaluation or search feature's speedups.
da hell? no shot you think that putting something into a register is faster than an immrdiate operand?
please just stop
reading an immediate operand in ram is slower than using an already-used and cached value in a cpu register
you literally don't even know what it is lol
an immediate is encoded into the instruction stream directly
there is no lookup
please stop before you embarrass yourself further
so how would you estimate this very significant speedup as
well clearly you thought it was significant enough to make a whole post about it
yeah, and i made many posts about removing tb
but i literally only estimated 0.5-1 elo
and it came in at 0.6
but next time, talk about what you understand instead of sprouting nonsense
so how much elo do you think this would gain?
just the switch to std array?
maybe 0.1 at most
obviously at multi threaded test
probably closer to 0.05
now that i think about it, in order to compile a vector/array's base address as an immediate value, wouldn't that need to be a static address? I can't really figure how SF would ever use static addresses even with the use of arrays.
You’re also addressing with the array index into the thread array, along with the offset into the thread class, along with a potential offset into an array in the thread class like history
How is a constant base(assuming it even is a constant base) going to yield any significant increase in performance
no one said significant except other people
i think what he means is how is the speedup ever going to compensate for all the noise that comes from everything else
that is why i put multiple ideas
not just 1
i don't think only 1 is measurable
fixed size allocation, can go on stack, can know base address
The stack is not at a constant address
yep
virtual memory means any process pretends it owns the whole address space, it can put stuff whereever it likes at compile time
compiler still won't use immediate values
yes it will
you literally don't even know what an immediate is
and you still have the nerve to say shit
an operand in the instruction itself
lmfao
Do you have a godbolt link or something to show us?
I’ve never checked for this before
then why did you think a register lookup is faster?
because:
- it makes the instruction shorter which means easier decoding
- it can easily be cached
- cpus have other physical registers besides the logical ones bytecode uses so it wouldn't cause any problems with registers availability
the difference in speed between the two, in absence of bottlenecks should be close to none
even if you were correct, which you are not, memory savings of not having to store and lookup base address would still be there
considering that using immediates still adds bytes to the code, we're literally talking about a bunch of bytes as memory saving, which would be irrelevant since they would still most likely be within an allocated memory page
so it's in a register, how did you load it into the register?
from memory, which requires another instruction or immediate
please stop this
what do you mean just once, you allocate a whole register for it for the lifetime of the program? no shot
of course i mean each time it's needed, but then that register is used for addressing multiple times before being assigned to something else
yes, but that doesn't mean loading the register is free
the link has benchmarks near the bottom
i just took a look at the benchmark
so it seems like there are no immediate values being used for addressing
except jumps and calls of course
but that's obvious
It looks like the bench_array code is just better instead
even though it even has a nop instruction despite the O3 flag which is kinda funny
the actual cause for this difference in speed seems to be a bad use of xmm registers in bench_pushBack and bench_prealloc
this probably would never occur in the Stockfish executables that use AVX2 however in such artificial situation with a huge number of writes over a large array
actually
it looks like the bench_pushBack doesn't use xmm registers at all
only the general purpose ones
it doesn't because he made array at run time
with new
your theory would mean that compilers would just use register addressing and only use immediates to load the register, which doesn't happen
yeah and that means the benchmark doesn't really fit with your proposal so there isn't any actual measurement
tell us what compiler turns an immediate into a register load to index?
you have no clue bro
at least i know what a benchmark actually measures
Now this is intense
You guys should resolve your differences in an octagon
No rules
he's so hostile he would probably agree
And you wouldnt?
I wouldn't agree to violence no, but i would win nonetheless
uh huh, that's why you think register indexing is faster
because you sure demonstrated that bro
very good benchmark you posted
i see why zuppa had you blocked now
at least i didn't use one that doesn't have anything to do with my claim
better than nothing lmfao
blocked this fool until he shows da numbers
probably the heat death of the universe will come first
Zuppa blocked you?
you can do it...just takes time
i really don't even know how you can even conceive of such shit
register indexing better than immediate lmfao
he unblocked me lol
indeed i also don't know how your parents conceived you
you're the one who made the first claim you should find them
and?
and you didn't prove it yet
and?
gain is gain
if you stop trolling here
then maybe i could actually do it instead of replying to you
sounds like excuses
where is yours?
name 1 single compiler which does it your way
because it is better
or are compiler writers all dumbasses?
i said that in general compilers use register indexing when using arrays, and gcc does that on your famous benchmark
so my statement isn't false
you genuinely thought compilers use immediate addressing all the time with arrays so
i'd say the one making shit up is you
lololol
nice benchmark you posted
in general, blah blah blah
cpu makers dumb af for even including it
if what you say is true
even if there are extra shadowed registers, that is still limited
and you still have to use one of the architectural registers when addressing it
read the synthesis os paper
regardless, an extra load is not free
unlike an immediate
whether you load it from memory or an immediate
and extra pointer still takes up space in data cache and memory regardless of if a tlb miss is caused or not
but sure, keep thinking that BS
with no evidence
it is quite unfortunate how much nonsense you believe
just say I, Yes. I believe that including extra code to grow and shrink a vector is free.
do it, lets see how smart you are.
even if your method was 'faster', which it isn't, the extra code would make a vector slower than an array overall for this use case
What does smp stand for
simultaneous multi processing
Ohh
which basically means threads for sf usually
Patches passing with the old master will be rescheduled ig
After the new mergins I mean
sometimes i wonder about sf methodology
how many of these patches tested at same time interfere with each other
In theory maintainer says rebase and retest if they are likely to interfere. Although I suppose you could argue on "likely"
If a given patch starts with a prev master and completes testing with an updated master, how you guys proceed
if they are likely to interfere
so while you were here yapping i actually did a benchmark
by recycling some of the code that your dude used
Like how you measure
to enfore the use of immediate addressing i had to allocate a global fixed size array and just guess a big enough size
but it worked
If they are likely to interfere
keep in mind that i did this in 5 minutes at midnight
If it modifies same part of code then definitely retest. And some things like net and big tunes probably get retested. But mostly intuition on what probably does AFAIK.
Still Im curious
if you look at the disassembly you can see that bench_immediate does use immediate addressing
Okay thank you for your explanation!
@narrow plinth have fun
it seems like you weren't that right after all
Now I'm going to sleep since i'm done talking to you
have a good day
ah yes, testing array vs array when we were talking about array vs vector
what a genius
Can you test it yourself in that page?
Oko
but it looks like compiler is vectorizing everything, i will have to take closer look
at a quick glance, the compiler vectorized one loop but not the other
when i tried it
it got different results
from his run
his run immediate was slightly slower, i clear results and got a huge difference
it probably depends on what cpu is assigned on server side
sometimes its under 1% difference, then sometimes it is 1.7x
i will try it locally when i get home
yeah something weird is going on
i will try changing it to an add or shift
using an expensive multiply seems wrong
to be clear, in sf all this is accessed is when creating or deleting thread and once per go command
and yes, chasing the pointer that is stored in it is probably more cost than the actual iteration
you're wrong bro
arguing in ideas channel is the most useless thing to do
everything can be responded to by "write patch and test it on fishtest"
but you keep hearing how they will test it soon(TM)
while defending the idea
no point if test won't get approved
best thing bro can do atm is delete this post fr
it's better to contribute nothing than to contribute utter bs and expose your own problems 😔
what problems?
do you believe that a fixed size array that is rarely accessed would be faster as a vector?
are you stupid?
I believe you mean the opposite here
I was replying to someone who said i was full of BS and exposing my own problems
so that means he thinks the opposite for some reason
he must be really stupid
or maybe he is actually stupid enough to think testing array vs array is equivalent to array vs vector? LOL
Calm down
A single pointer is a performance trap?
why have an extra pointer and allocate on the heap?
what benefit does this give you exactly?
Did anything happen out of this ? As aggressive as he sounded I think he is correct though. Array that is allocated on stack with fixed size technically should be slight faster, don't know if it matters too much though