#Stockfish dev vs Stockfish 11 60x more time


late sedge
#

cutechess-cli -tournament gauntlet -concurrency 12 -pgnout output_pgn_file.pgn -engine conf=stockfish_240603 option.Hash=16 option.Threads=1 tc=1+0.1 -engine conf=stockfish_11 option.Hash=16 option.Threads=1 tc=60+1 -draw movenumber=40 movecount=4 score=8 -resign movecount=4 score=500 -each proto=uci -openings file=G:/Users/ap/Documents/ChessBase/NoomenTopicalTestsuite2012.pgn policy=round -repeat -rounds 100 -games 1

Options for both engines: Threads 1, Hash 16 MB; openings from NoomenTopicalTestsuite2012.pgn
tc SF dev: 1s + 0.1s
tc SF 11: 60s + 1s

results
Score of stockfish_240603 vs stockfish_11: 19 - 20 - 61 [0.495] 100
... stockfish_240603 playing White: 14 - 7 - 29 [0.570] 50
... stockfish_240603 playing Black: 5 - 13 - 32 [0.420] 50
... White vs Black: 27 - 12 - 61 [0.575] 100
Elo difference: -3.5 +/- 42.7, LOS: 43.6 %, DrawRatio: 61.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf

comment:
SF 11 needs 60x more base time on the same hardware to roughly break even with SF dev.
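
The summary line in the results can be checked from the win/loss/draw counts alone. Below is a minimal Python sketch, assuming the usual logistic Elo model, a normal approximation for the 95% interval, and the common erf-based formula for LOS. These formulas are an assumption that happens to agree with the numbers posted here; they are not taken from cutechess-cli itself, and the function names are made up for illustration.

```python
import math
from statistics import NormalDist

def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins, losses, draws):
    """Return (elo, 95% margin, LOS) from the first engine's point of view."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n  # mean score per game
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1 - mu) ** 2 + losses * (0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    stderr = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    lo, hi = mu - z * stderr, mu + z * stderr
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2
    # Likelihood of superiority; draws do not enter this formula.
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return elo_from_score(mu), margin, los

elo, margin, los = match_stats(19, 20, 61)  # 1+0.1 vs 60+1 result
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {los * 100:.1f} %")
```

With only 100 games the margin is wide (around 40 Elo either way), so "break even" here means the two settings are not distinguishable at this sample size, not that they are proven equal.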

leaden jacinth
#

60x more time would have been 60s + 6s
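
The arithmetic behind that correction, as a small Python sketch: a `base+increment` time control is only "60x" if both parts are scaled. The helper names (`parse_tc`, `scale_tc`, `tc_ratio`) are hypothetical and exist only for this illustration; `Fraction` is used so that 0.1 x 60 comes out as exactly 6.

```python
from fractions import Fraction

def parse_tc(tc):
    """Split a 'base+increment' string (seconds) into exact fractions."""
    base, inc = tc.split("+")
    return Fraction(base), Fraction(inc)

def fmt(x):
    """Print whole numbers without a decimal point, others as decimals."""
    return str(x.numerator) if x.denominator == 1 else str(float(x))

def scale_tc(tc, factor):
    """Scale both base time and increment by the same factor."""
    base, inc = parse_tc(tc)
    return f"{fmt(base * factor)}+{fmt(inc * factor)}"

def tc_ratio(slow, fast):
    """Return (base ratio, increment ratio) between two time controls."""
    slow_base, slow_inc = parse_tc(slow)
    fast_base, fast_inc = parse_tc(fast)
    return slow_base / fast_base, slow_inc / fast_inc

print(scale_tc("1+0.1", 60))      # 60+6
print(tc_ratio("60+1", "1+0.1"))  # base is 60x, increment only 10x
```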

#

But pretty interesting anyways

#

👍

late sedge
#

OK, here's a good result: 1+0.1 vs 60+6

#

Score of stockfish_240603 vs stockfish_11: 21 - 28 - 51 [0.465] 100
... stockfish_240603 playing White: 15 - 13 - 22 [0.520] 50
... stockfish_240603 playing Black: 6 - 15 - 29 [0.410] 50
... White vs Black: 30 - 19 - 51 [0.555] 100
Elo difference: -24.4 +/- 47.9, LOS: 15.9 %, DrawRatio: 51.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf

turbid merlin
#

60x is quite impressive, I must say. It means your phone today outperforms something like Sesse from 4 years ago.

hot carbon
#

Also, dev only has 1 second. I bet if you give it more time and scale 11's tc accordingly, it would be better for dev.

turbid merlin
#

Yeah, that's super short for a dev game. Gets harder to test though 😉

leaden jacinth
#

The hash for 60s is too low, but otherwise a good test.

late sedge
#

Similar test: time x6, threads x10, hash x16

#

tc SF_11 60+0.6 (x6), hash 256 MB (x16), threads 10 (x10)
tc SF_dev 10+0.1, hash 16 MB, threads 1
same openings: NoomenTopicalTestsuite2012.pgn

Score of stockfish_240603 vs stockfish_11: 22 - 3 - 75 [0.595] 100
... stockfish_240603 playing White: 15 - 2 - 33 [0.630] 50
... stockfish_240603 playing Black: 7 - 1 - 42 [0.560] 50
... White vs Black: 16 - 9 - 75 [0.535] 100
Elo difference: 66.8 +/- 32.7, LOS: 100.0 %, DrawRatio: 75.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf

#

command:

#

cutechess-cli -tournament gauntlet -concurrency 1 -pgnout sf_test1.pgn -engine conf=stockfish_240603 option.Hash=16 option.Threads=1 tc=10+0.1 -engine conf=stockfish_11 option.Hash=256 option.Threads=10 tc=60+0.6 -each proto=uci -openings file=G:/Users/ap/Documents/ChessBase/NoomenTopicalTestsuite2012.pgn policy=round -draw movenumber=40 movecount=4 score=8 -resign movecount=4 score=500 -repeat -rounds 50 -games 2
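
For anyone who wants to rerun this with different handicaps, the same command can be assembled from parameters. This is only a sketch: it uses exactly the flags that appear in the command above and nothing else, and the helper names `engine_args` and `build_command` are hypothetical. The engine `conf=` names are assumed to exist in your own cutechess engine configuration.

```python
def engine_args(conf, hash_mb, threads, tc):
    """Arguments for one -engine block, using only options seen in the thread."""
    return ["-engine", f"conf={conf}", f"option.Hash={hash_mb}",
            f"option.Threads={threads}", f"tc={tc}"]

def build_command(dev, old, openings, pgn_out, rounds=50, games=2, concurrency=1):
    """Assemble the cutechess-cli argument list in the same order as the post."""
    args = ["cutechess-cli", "-tournament", "gauntlet",
            "-concurrency", str(concurrency), "-pgnout", pgn_out]
    args += engine_args(**dev) + engine_args(**old)
    args += ["-each", "proto=uci",
             "-openings", f"file={openings}", "policy=round",
             "-draw", "movenumber=40", "movecount=4", "score=8",
             "-resign", "movecount=4", "score=500",
             "-repeat", "-rounds", str(rounds), "-games", str(games)]
    return args

cmd = build_command(
    dev=dict(conf="stockfish_240603", hash_mb=16, threads=1, tc="10+0.1"),
    old=dict(conf="stockfish_11", hash_mb=256, threads=10, tc="60+0.6"),
    openings="G:/Users/ap/Documents/ChessBase/NoomenTopicalTestsuite2012.pgn",
    pgn_out="sf_test1.pgn",
)
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting issues with the openings path.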

hot carbon
#

Well yeah, this is kinda what I expected, because our extensions and other search tunes are optimized for 60+0.6 with 8 threads, so at 1+0.1 it should massively regress.