#fide-google-efficiency-chess-ai-challenge
Unless you want to SPSA the 4 million games we did to tune the search to your engine.
No. I added the net to Pere's engine, but every engine hot-swap just loses.
well, you could still try to see big gaps
I put Pere's net in my engine, I lose elo. he puts it in his engine, he loses elo.
can't decide much.
isn't Pere's net very barebones?
yeah.
the cursed C-centric non-templated version:
void add_sub_add_sub(int16_t* dest, const int16_t* src, int16_t* weights, size_t size,
bool ADD1, bool SUB1, bool ADD2, bool SUB2,
size_t add1, size_t sub1, size_t add2, size_t sub2) {
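The fragment above stops at the signature. Below is a minimal sketch of what such a fused accumulator update could look like. It assumes the usual NNUE convention that each index selects one row of `size` weights and that each bool decides whether that row is added or subtracted. The body is hypothetical and is not the original code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical body: dest = src (+ row add1) (- row sub1) (+ row add2) (- row sub2),
   where each row is `size` int16 weights and the bool flags enable that row.
   Indices are only dereferenced when their flag is set. */
void add_sub_add_sub(int16_t* dest, const int16_t* src, int16_t* weights, size_t size,
                     bool ADD1, bool SUB1, bool ADD2, bool SUB2,
                     size_t add1, size_t sub1, size_t add2, size_t sub2) {
    for (size_t i = 0; i < size; i++) {
        int32_t v = src[i];                         /* widen to avoid intermediate overflow */
        if (ADD1) v += weights[add1 * size + i];
        if (SUB1) v -= weights[sub1 * size + i];
        if (ADD2) v += weights[add2 * size + i];
        if (SUB2) v -= weights[sub2 * size + i];
        dest[i] = (int16_t)v;                       /* result assumed to fit in int16 */
    }
}
```

Passing the flags as runtime bools is what makes this the "non-templated" version: a templated variant would resolve the four branches at compile time.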
i'll open source the code if we somehow end up placing in a top 5 spot
it's particularly bad code that doesn't need to see the light of day otherwise
ultimately it's just C alex with stdlib stuff removed and worse uci support
gotta pay the openbench tax if top-5
ob tax, bullet tax, actual taxes
i hope i don't win
i might lose money on it at this point
I'll see if I can trade the 10k for 20k in compute credits
if it helps, i did my best to not SPRT because i think closed-source dev is pure villainy
but in the end i caved and i think we clocked in a couple dozen SPRTs
the test count is 100ish but it's highly misleading because most would just fail
OB got stuck in the awaiting artifacts stage
and the test needed to be remade
i think that happens if you create it before the workflow is actually done
old openbench version?
I will say, I have NO issues with that on Torchbench.
so a couple months old
I did not use the private system though for this :/
we had it for every single test so it's not a niche bug
Something sussy then.
there's a chance that waiting more would've fixed the issue
I just made a 2nd github account. And slapped a private token into the src code of OB.
but after a couple minutes it was easier to nuke the test and recreate it
you should be able to create the test before the workload finishes.
well that's not what we experienced
it might be user error
ngl i haven't super looked into it
gunicorn? native django? nginx reverse proxy?
since this worked just fine
refer to this for the technical portion
i would guess native django, but it's a guess
ah.
it's hosted on pythonanywhere, but that shouldn't be too relevant
That is very relevant actually.
pythonanywhere is a WSGI app, not native-django.
So this commit was needed: https://github.com/AndyGrant/OpenBench/commit/920b2cdfbde637662af6a69fc0fc53f88a3b5108
prior to that, pythonanywhere users could not use the PGN feature (nor the artifact watcher), since the threads would simply never be spawned.
And since OpenBench will attempt to find the artifacts ONCE upon creation; and otherwise defer to the watcher...
that explains all your problems.
is the server up?
we set it up at the start of january
i have messages of me trying to debug the token stuff dated 10/01
CLIENT_VERSION = 35 # Client version to send to the Server
I never wanted to create it
well as long as one regrets their own mistake, it's fine
congrats on the hardware btw
it really wasn't much, most others probably had access to similar if not better hardware, but it was nice to see it unfold
if i get some money i'll match the hw too lol
I hope you finish 4th as Andrew said
(but no higher, the other 2 being 33.33% chance each is enough for my heart)
and I can attest that the secret of that team's success is not my hw; I even questioned whether it was in spite of it xD u should see some of the stupid things I asked Andrew
ah dw, i wasn't commenting on the competition
i think you gave hw to other devs in the past and i'm vaguely aware of it
thanks for the service
this is very hard to believe, maybe the test is bugged..
it's less than 5 Elo here, and I remember it was so when SF introduced it, I'd imagine the default is even harder to pass with weaker engines
It's not
Just tweaking it easily gained 10 elo
And I mean a very minor tweak
well, weird things happen
In Stockfish, I discovered a hack long ago that if you skip rootDepths by 2 in iterative deepening instead of 1 you get instant STC gain
I used this trick and another trick of not resolving failhighs which are ++ers in this time control
This rootDepth tweaks alone are +20Elo or something
literally if you send go infinite
you'd get depth 2 then depth 4 printing
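A minimal sketch of the depth-stepping idea. It assumes a hypothetical `RootSearch` callback in place of the engine's real root search. With `depth_step = 2` the loop searches depths 2, 4, 6, ... instead of 1, 2, 3, ...:

```c
#include <stdio.h>

/* Hypothetical root-search callback: searches to `depth` and returns a score. */
typedef int (*RootSearch)(int depth);

/* Iterative deepening that advances the root depth by `depth_step` per iteration.
   Returns the number of iterations run and writes the searched depths to `visited`. */
static int iterative_deepening(int max_depth, int depth_step, RootSearch search, int* visited) {
    int iterations = 0;
    for (int depth = depth_step; depth <= max_depth; depth += depth_step) {
        int score = search(depth);
        printf("info depth %d score cp %d\n", depth, score);
        visited[iterations++] = depth;
        /* A real engine would also consult its time manager here and stop early. */
    }
    return iterations;
}
```

Time management, aspiration windows and fail-high handling are left out. The sketch only shows where the step size enters the loop.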
Yeah when I talk about tests note that it's at fucked time controls
Because we were expecting increment
And it just never came
Did you also tune with Inc or at sudden death?
We tuned search and eval with inc, only TM with sudden death
Ah a split tune, we just bundled stuff
Tunes are 700k games of the 1.2M we used lol
last tune on search we wanted to try sudden death after 100k games
it failed even passing
sudden death
Sudden death is such a mistake
We did a lot of SPSA
As a concept
We did 2 but the second one ended up not working so I just stopped
Tbf I didn't know anything about tuning
Since I had never tuned Alex before
And that's why I then tuned Alex for a month
It was interesting
I never tuned in SF before, I have 1 tune in fishtest
I also properly learned how to do it here
basically seen it as a doll thing
I always wanted to try ideas-patches in SF
but here I really wanted Elo only
not necessarily interesting stuff
how big btw is your network?
We ended up having a 45KB network
Idk why I assumed your net is much smaller
HL32 16bit?
hl80
(x2, with hm)
admittedly with some novel compression
since i figured LEB128 compression wouldn't be worth it (it might be, honestly i was lazy)
and 8 output buckets (quadratic)
is this before or after compression? the raw net is 150KB for us
raw net is 45KB
yeah ours is 3.5x the size then
alex is probably compact
compared to alex master it only lacks probcut because probcut was introduced later
and i didn't backport it
we also had 1kb to spare so you could add it i reckon
there's still some bloat (like the templated makemove crap) i didn't bother to rewrite
but yeah it's fairly compact
i lied, it's 123KB
i checked the 96x2 one by mistake (we never got it to fit, we needed another 6-7KB for it)
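A rough way to sanity-check these sizes. It assumes a plain (768 -> HL)x2 -> 1 network with 8 output buckets, no king buckets, and the same storage width for every parameter. The exact layout of either team's net is not stated, so the struct and function names are illustrative:

```c
#include <stddef.h>

/* Hypothetical layout: one feature transformer (inputs -> hidden) shared by both
   perspectives, then a (2 * hidden) -> 1 output layer per output bucket. */
typedef struct {
    size_t inputs;   /* 768 = 2 colors x 6 piece types x 64 squares (assumed) */
    size_t hidden;   /* HL size, e.g. 64, 80 or 96 */
    size_t buckets;  /* number of output buckets, e.g. 8 */
} NetShape;

static size_t param_count(NetShape s) {
    size_t ft_weights  = s.inputs * s.hidden;
    size_t ft_biases   = s.hidden;
    size_t out_weights = 2 * s.hidden * s.buckets; /* both perspectives concatenated */
    size_t out_biases  = s.buckets;
    return ft_weights + ft_biases + out_weights + out_biases;
}

/* Raw (uncompressed) size if every parameter uses the same width. */
static size_t raw_bytes(NetShape s, size_t bytes_per_param) {
    return param_count(s) * bytes_per_param;
}
```

Under these assumptions HL80 comes to 62,808 parameters. That is 125,616 bytes at 16 bits (about 123 KiB) and half that at 8 bits. The feature-transformer weights (768 x 80 = 61,440) dominate the total, which is why the hidden-layer width drives the size.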
I'd be very interested in testing that net if it gets open-sourced, hopefully you and gedas finish in top 5
note that it requires specific code to read it
aside from the arch itself
it's 8bit quantized with a block scheme
so you can't just read it as is
but if we open source it the requantizer itself and the init code are part of it ofc
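The block scheme itself isn't described here. As an illustration only, one common pattern is to requantize 16-bit weights to int8 with a per-block power-of-two shift, so blocks whose values already fit in 8 bits keep a shift of 0. The function names and block layout below are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest right shift s such that every |w| in the block, divided by 2^s and
   rounded to nearest, still fits in int8 (<= 127). */
static uint8_t block_shift(const int16_t* w, size_t len) {
    int32_t amax = 0;
    for (size_t i = 0; i < len; i++) {
        int32_t a = w[i] < 0 ? -(int32_t)w[i] : (int32_t)w[i];
        if (a > amax) amax = a;
    }
    uint8_t s = 0;
    while (((amax + ((1 << s) >> 1)) >> s) > 127) s++;
    return s;
}

/* Requantize n int16 weights to int8 codes with one shift per block of `block`
   values (the last block may be shorter). Magnitudes are rounded to nearest. */
static void requantize_blocks(const int16_t* w, size_t n, size_t block,
                              int8_t* codes, uint8_t* shifts) {
    for (size_t b = 0, start = 0; start < n; b++, start += block) {
        size_t len = (n - start < block) ? n - start : block;
        uint8_t s = block_shift(w + start, len);
        int32_t half = (1 << s) >> 1;
        shifts[b] = s;
        for (size_t i = 0; i < len; i++) {
            int32_t v = w[start + i];
            int32_t mag = ((v < 0 ? -v : v) + half) >> s;
            codes[start + i] = (int8_t)(v < 0 ? -mag : mag);
        }
    }
}

/* Approximate original weight: code * 2^shift. */
static int32_t dequantize(const int8_t* codes, const uint8_t* shifts,
                          size_t block, size_t i) {
    return (int32_t)codes[i] * (1 << shifts[i / block]);
}
```

Small weights inside a block with a large shift round to 0, so whether a scheme like this beats a single global scale depends on how the weight magnitudes are distributed.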
80x2 8 bit > 64x2 full size, but it's close
and in the same way 96x2 8bit > 80x2 full size, as far as i could tell
a thing i wanted to test but never got around to was yeeting static eval from tt and just calling eval on each node, since the nets are very small
with so low hash i think it's a good idea
interesting idea, never thought of that
in newbish engines storing static eval often doesn't gain until the net is big enough
I thought of making a separate, smaller static-eval table instead of storing it in the TT
and they all start with at least 64x2
and with nowhere near the hash pressure you get on kaggle
where you can have at most 1.5MB
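A minimal sketch of the separate, smaller static-eval table idea. It assumes a 64-bit Zobrist-style position hash and a hypothetical `EvaluateFn` standing in for the NNUE call. Entries are 8 bytes, so 4096 of them cost 32 KiB of the memory budget. The other option mentioned above, dropping stored evals entirely, amounts to calling the evaluator directly at every node:

```c
#include <stddef.h>
#include <stdint.h>

#define EVAL_CACHE_ENTRIES 4096u /* hypothetical size; must be a power of two */

/* 8 bytes per entry: upper 32 hash bits as a check key, the eval, an occupancy flag. */
typedef struct {
    uint32_t key;
    int16_t  eval;
    uint16_t used;
} EvalEntry;

typedef struct { EvalEntry entries[EVAL_CACHE_ENTRIES]; } EvalCache;

/* Hypothetical evaluator signature standing in for the engine's network call. */
typedef int (*EvaluateFn)(const void* position);

/* Direct-mapped lookup: low hash bits pick the slot, high bits verify it. */
static int cached_static_eval(EvalCache* cache, uint64_t hash,
                              const void* position, EvaluateFn evaluate) {
    EvalEntry* slot = &cache->entries[hash & (EVAL_CACHE_ENTRIES - 1)];
    uint32_t key = (uint32_t)(hash >> 32);
    if (slot->used && slot->key == key)
        return slot->eval;             /* hit: skip the network */
    int eval = evaluate(position);     /* miss: run the net and always replace */
    slot->key = key;
    slot->eval = (int16_t)eval;
    slot->used = 1;
    return eval;
}
```

Whether this pays off over re-evaluating depends on how cheap the net is. With a tiny net, the 32 KiB might be better spent on the TT.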
we have HL64 8bit with 8 buckets, we figured that most of the 16bit weights already fit in 8bits
a very vanilla arch and inference, but a lot of consecutive improvements of the same net based on finding the best data and filtering
pushing it to probably the absolute best it can be with this arch including SPSA sessions after training
did you SPSA your net?
no
we spsa'd the engine/tm twice, the second one did nothing
took around 600k games
that's all the spsaing that went on
SPSAing the net weights and biases gained us 3 Elo each time
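For readers unfamiliar with SPSA, here is a minimal sketch of one iteration on a toy objective. In engine tuning, the "objective difference" is a game result between the two perturbed engines, the goal is to maximize, and the step sizes `a` and `c` usually decay over iterations. Here both are fixed and a smooth function is minimized, purely for illustration. `SPSA_MAX_PARAMS` and the plain 64-bit LCG used for the signs are arbitrary choices:

```c
#include <stddef.h>
#include <stdint.h>

#define SPSA_MAX_PARAMS 64 /* hypothetical cap to keep the sketch allocation-free */

typedef double (*Objective)(const double* theta, size_t n);

/* Plain 64-bit LCG, used only to draw the +/-1 perturbation signs. */
static uint64_t spsa_rng = 0x9E3779B97F4A7C15ULL;
static int random_sign(void) {
    spsa_rng = spsa_rng * 6364136223846793005ULL + 1442695040888963407ULL;
    return (spsa_rng >> 63) ? 1 : -1;
}

/* One SPSA iteration that minimizes f: perturb all parameters at once by +/-c,
   evaluate f at theta + c*delta and theta - c*delta, then step each parameter
   against the gradient estimate g_i = diff / (2 c delta_i).
   Requires n <= SPSA_MAX_PARAMS. */
static void spsa_step(double* theta, size_t n, Objective f, double a, double c) {
    double plus[SPSA_MAX_PARAMS], minus[SPSA_MAX_PARAMS];
    int delta[SPSA_MAX_PARAMS];
    for (size_t i = 0; i < n; i++) {
        delta[i] = random_sign();
        plus[i]  = theta[i] + c * delta[i];
        minus[i] = theta[i] - c * delta[i];
    }
    double diff = f(plus, n) - f(minus, n);
    for (size_t i = 0; i < n; i++)
        theta[i] -= a * diff / (2.0 * c * delta[i]);
}
```

Only two evaluations are needed per iteration regardless of the number of parameters. This is why SPSA scales to tuning hundreds of search or network parameters from game results.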
tbqh i didn't want to waste hw resources i could divert to alex dev
so my worker was off limits for most of the time
we didn't have the code for it in the first place really
alex tuning code is very C++; since the switch to C, gedas wrote some basic stuff so we could tune some things
but there's no infrastructure to tune the net
you'd need ad-hoc code for it
it would have 5x'd the effort easily
at that point it's better to implement ponder
pondering is also interesting, we have no ponderhit logic, I wonder if telling the engine that opponent played the ponder move gains Elo somehow
It's in SF but we don't have it
I looked briefly at andrew code
it has to
and he has it
yes
it works surprisingly well for such a makeshift solution
but i think it's several tens of elo worse than the real one
(note that this is speculation, if i had the "real" one i would've used it, i clearly don't so i can't test)
Haha I guess so also
I assume it could be a game changer, idk
btw it's fascinating that our submission and linrock found that the same Leela data works best for these small nets: Test77
what data did you use?
probably for your bigger net
this Test77 data is not important
i used the same binpack i use for alex
which is pre transformer t80 stuff
i didn't try other leela binpacks
the only thing i tried was trying to get monty data to work
the idea of showing up with non sf + non leela data was funny to me
but it wasn't good enough
not really a choice dictated by strength, it was just already on my ssd
just a small note here: in the real world we disallow pondering in tournaments
I learned some general stuff about nnue, not that much of an nnue guy
but I believe making an nnue only tournament is also unfair
it could create some novelties
and it might be the most interesting
but still I like to emphasize that search is very interesting also
copy pasting sf isn't as interesting
AGE and Kimmy were well aware that they were pretty much going up against the best search with Pere's submission
perhaps if this was about search in poker, sure
nnue only was a (perhaps shortsighted) suggestion to severely limit people's ability to just clone and repackage someone else's code
you should not be able to get a top 10 spot by cloning and blindly maiming cfish
the state of the art of search is just too explored to allow for a fun, fair, interesting competition
exhibit A: places 1-3 and 5-10 all showing up with cfish
(allegedly)
this is ofc no one's fault, it was simply optimal to do so
I have Obsidian with a 768->128x2->16->32->1 nnue, and no pondering at all.
tbh that's still repackaging existing search, even we did it by using Alex
Yes ofc
the point of the state of the art being too explored and only allowing for copy pasting with some minor tweaking holds
congrats on fitting such a huge net tho, i assume you did some novel compression stuff
but it's also repacking the existing nnue archs and inferences and techniques and beginning from there
I don't see how that is different if everyone has the same cfish to begin with
well the space for small nets is vastly more unexplored than search
mostly because no one constrains themselves to such a small net, it just makes no sense to
while most good search is just "good", no matter what
i've seen at least 4-5 different net archs in the top 10
Virtually 100% of my time was spent on nets, and shrinking code.
Search patches are just uninteresting for the very reasons you've mentioned.
but ultimately chess is dead, you are right about that
To some of the above convos:
I never tried any pondering experimentation. I simply did not have the drive to code it up in OpenBench correctly, and local tests were a bit hard to run due to flagging on such fast games with high concurrency. So I just kept the existing ponderhit scheme and hoped for the best. The only thing I did is one submission (not posted on GitHub yet) that has this code, along with -march=broadwell.
if (strcmp(str, "ponderhit") == 0) {
    Threads.ponder = false;
    // When safe to do so, try to utilize the SlowDelay on Ponderhits
#ifdef KAGGLE
    if (Time.totalTime > 2000)
        Time.maximumTime = max_int(Time.maximumTime, time_elapsed() + 85);
#endif
}
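For clarity, the KAGGLE-only branch above can be read as the pure function below. It assumes the times are in milliseconds and uses a hypothetical name, so treat it as a paraphrase of the snippet rather than Ethereal's code:

```c
#include <stdint.h>

/* On a ponderhit, if the clock still holds more than 2000 ms, make sure the search
   may run at least 85 ms past the time already elapsed; otherwise keep the limit. */
static int64_t ponderhit_max_time(int64_t total_time, int64_t maximum_time, int64_t elapsed) {
    if (total_time > 2000) {
        int64_t extended = elapsed + 85;
        if (extended > maximum_time)
            return extended;
    }
    return maximum_time;
}
```

The limit only ever grows, and only when there is enough clock left that the extra delay cannot plausibly cause a flag.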
For data: 100% Ethereal data. All self-play or adversarial against SF, with the SF evals tossed in the trash. Tried Lc0 data for a few minutes, tried SF-relabeled data, tried Ethereal-relabeled data. Was unimpressed. Maybe it's better, but it's so hard to just insert a new net trained on a different data distribution that I did not go any further.
UHHHHHHHH
Go to the leaderboard...
Guess we can't adjust the K-factor without restarting...? Maybe it will be fixed lol.
Which makes the higher K-factor games obsolete.
AHAHAHAHAAHHA
Hilarious, but anxiety inducing.
Either flubbed K-factor change, or full restart for double-error fix? who knows.
(assuming it was fixed. We don't know that)
i just hope this doesn't postpone the end
^ me too
i really want to be done with this
Yeah I got bills to pay smh
One thing we do know is we will never know why or what Kaggle did, just won't happen
well idk maybe someone has stepped in, in the last 6 hrs.
you are doubting kaggle openness and ability to communicate? shame on you honestly
The ever glass half full perspective guy
Well it's still bad like.
If someone steps in now, and fixes all the memory problems, and the double errors, and swaps to increment....
cool........ but lost a week of time, and also now all the submissions are suboptimal again lol
No games yet played by my entries lol.
not a problem for me
it does look like they already lowered the k-factor. So I actually do think this was an accidental reset.
so essentially all the games run this far with higher k-factor are now useless xD
I'll feign optimism. They fixed all the bugs and did a restart.
haha
c-number wanted it to get postponed, but the difference is that submissions are locked
I am really tempted to set up a round-robin tournament of interested entries (provided they are willing to open-source their submissions and make them OpenBench-compatible), with proper UHO and proper increment
For the record, I reported this three weeks ago to staff running the event and the support team.
I've still not gotten a response to those emails.
Andrew u r gonna get 'warned for bad word usage' again soon
Easy solution, increase cores/machines running the leaderboard games, call it earlier :p
i'd lend the compute I had to the leaderboard if there were a way for me to do that
seems they are taking the errors more seriously than incremental time management and other issues
Well fingers crossed that the fixed version does not introduce more bugs -- and we still see the same top-3 as we've seen the entire event thus far.
guy wants to donate to google :Kappa:
I mean if google were really still involved things would be different is my opinion, kaggle i dunno what they're doing xD
true
I mean if there was a will to.. sure
i do get the point xD
relax
I got hw from them for compute xD so
imo increment doesn't make sense anymore, the engines do not support it when the submission is locked
My understanding: any change in environment after the final submission date would be detrimental to competitors, as none of them would've been able to test on 'the' environment, but it's already changed twice since, I think
What
yeah the bot is not very good.
xD
hahahahahaha
wait did he really get banned, what did he say?
Probably. I'm gonna ditch this discord, take conversations to SF kaggle so I can speak freely.
GL to all.
good one
What's SF Kaggle? (Sorry if this is off-topic, I just saw this and was curious.)
the kaggle channel on the sf discord
Could you send me an invite link? I've been looking for new datascience servers to join. Thanks!
stockfish isn't really a datascience server and im guessing sending an invite to people is against the TOS so just go on our github and get it yourself
Ah, didn't realize that's what SF stood for. I'll find it myself then, but thanks for the pointer.
Apologies for the auto-ban, we are constantly dealing with spammers and the autoban stops just a few words that spammers constantly use. Specifically @ everyone, gift, cryptocurrency, bitcoin, ethereum, and whatsapp - because the vast majority of spam is crypto scams
What if I use "voldemort"
Hi! I'm new here and to Kaggle competitions, and I'd like to know why late submissions aren't possible for this competition. Thanks!
Hi, any updates about the current state of the competition?
not entirely sure how things follow up, regarding medals, prizes, winners in general since the leaderboard has this warning
The competition has ended. The private leaderboard is preliminary and will be finalized after the results are verified.
I find it alarming that no one bothered to respond to one of the top 3 place holders of the said leaderboards. Unfortunate. Upgrading it to Disappointed.
@languid barn @south cave @twilit urchin @muted horizon ?
Should I simply consider this competition as abandoned, and assume there will be no finalized leader board, no medals awarded, and no prize money distributed? I'm told this duration from closure without notice or updates is considered to be highly unusual for Kaggle.
Please leave comments in the forum, the discord channels are not followed by Kaggle staff (as described in the discord rules).
It is not uncommon for competition finalization to take multiple weeks. Among other things, this period of time is used to investigate all cheating allegations. This is a normal process on Kaggle.
If I'm being honest, using the past as an indicator, I find it hard to believe the forum is frequented by staff. We never really got any answers on the forum either. A high-level update here, from time to time until the conclusion, would help...