#Signals pipeline failed (partially) again...


silk marlin
#

So this morning, my Signals pipeline submissions were a bit borked by the system...only 3 out of 5 were accepted (and I got two email confirmations for one of the ones that was accepted for some reason) and the others timed out or something (which it has been doing all the time lately, but never in the morning during the live submission window). But I also had all 5 with queued submissions as backup, but it only accepted 1 of 2 of those, i.e. it accepted 3 out of 5 live, leaving 2 queued which it should have taken, but it only took 1 of those. So one slot was just left with no submission at all. (Which I just uploaded again, so now it is "late".)

So obviously two annoyances here: why is the Signals (& crypto) submission pipeline so flaky lately? (And that flakiness is now bleeding into the main live window.) And why are the backup queued submissions also not being accepted reliably? (I have the email showing that I uploaded the queued version yesterday, so I know it wasn't me.)

analog bramble
#

Hey, sorry for not responding. Have you continued to see this issue or was this just on 5/2? I think we experienced some other pipeline issues that day

silk marlin
#

I still see those timeouts all the time on Signals & Crypto (on the first try uploading something during off-hours) -- been happening for a couple of months. (Using API or just uploading directly to website.) I did set up my pipeline with a retry function so I should be good if it does it again during the live window when I'm sleeping (although if it is going to skip my queued submissions I can't do much about that -- that's the only time that's happened).
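For anyone else hitting this, a retry wrapper along these lines is roughly what I mean. This is only a minimal sketch: `with_retries`, `attempts`, `base_delay`, and the `sleep` hook are illustrative names, not anything from the official API client, and `upload` stands in for whatever call actually sends the submission. It assumes the timeout surfaces as a Python `TimeoutError` or `ConnectionError`; a given client may raise something different.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    upload: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call upload() until it succeeds or the attempts run out.

    `upload` is a hypothetical zero-argument callable wrapping the real
    submission call. Waits base_delay, 2*base_delay, 4*base_delay, ...
    seconds between tries.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except (TimeoutError, ConnectionError):
            # Assumption: timeouts show up as one of these exception types.
            if attempt == attempts:
                raise  # out of tries, let the caller see the failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Usage would look like `with_retries(lambda: my_upload("predictions.csv"))`, where `my_upload` is your own function. Note this only covers the first-try timeouts; it can't help if a queued submission is skipped on the server side.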

analog bramble
#

okay, timeouts and retries are pretty normal, but if your queued submissions get skipped again please let me know. This topic is a bit stale so it'll be hard to track down why it happened this time (I think it had something to do with our backend pipelines being screwed that day), but if you let me know next time I'll be sure to track it down and fix the source of the problem

silk marlin
#

Well, it very much acts like it needs to "wake up", so in some sense it makes sense it doesn't happen during the live window when it is handling (presumably) lots of other requests also. (And I don't think this ever happens with the main tournament -- it is always Signals & Crypto.) But off-hours when I'm uploading stuff to be queued, I expect it to fail pretty much always on the first try, and then after that failure (once it has woken up), the next try it accepts immediately (like almost instantly -- faster than normal). But I get these failures every single day now, and that's a newish thing (like I said, been going on for a couple months at least, but it was very rare before that). And this happens with the API (which is how I do Signals) and also just uploading manually from the website (which is how I do Crypto).

analog bramble
#

Interesting, I did a db upgrade that eliminated most timeout errors we were seeing, so it's probably not related to load, it must be some other infra-related issue

silk marlin
#

And to be clear, it isn't a "per-submission" thing -- like on Signals I'll wanna upload to several slots. The first one fails, then I retry and it goes, and then all the rest also go smoothly after that. But if I do something again in a few hours it will fail again first try.

analog bramble
#

interesting, since it's intermittent and resolves after the first try, I have a feeling it's a cold start problem of some kind
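One way to collect evidence for (or against) the cold-start idea is to record how many tries each slot needs in a batch. The sketch below is hypothetical: `upload_all`, `upload_one`, `max_attempts`, and `pause` are made-up names, `upload_one(slot)` stands in for the real submission call, and it assumes a timeout is raised as `TimeoutError`. If the cold-start guess is right, the first slot after an idle period should need two tries and the rest one.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def upload_all(
    slots: Iterable[str],
    upload_one: Callable[[str], None],
    max_attempts: int = 3,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[int]]:
    """Upload every slot, returning the number of tries each one took.

    A value of None means the slot was never accepted within max_attempts.
    """
    attempts_used: Dict[str, Optional[int]] = {}
    for slot in slots:
        attempts_used[slot] = None
        for attempt in range(1, max_attempts + 1):
            try:
                upload_one(slot)  # hypothetical: the real submission call
            except TimeoutError:
                if attempt < max_attempts:
                    sleep(pause)  # brief wait before trying this slot again
                continue
            attempts_used[slot] = attempt
            break
    return attempts_used
```

A result like `{"slot_a": 2, "slot_b": 1, "slot_c": 1}` logged with a timestamp each day would match the pattern described in this thread and would be easy to attach to a report.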

silk marlin
#

I still had first-try failures yesterday...let's try right now....

Signals (via API): it worked!

Crypto (via website):