Getting 202s for cron monitor checkins but reported as missed | Sentry Community | Page 1

granite peak Apr 11, 2024, 5:45 PM

Our cron monitors this morning are reporting that we're consistently (almost 100% of the time) missing our check-ins across multiple cron monitors. When I check our server logs, I see that we're getting 202s for our check-ins, both when we set status=in_progress and later when we set status=ok for the same check-in ID.

Are there any issues currently that we should be aware of?

Example log entry that includes the URL we're hitting:

HTTP Request: GET https://o415829.ingest.sentry.io/api/XXXXXXXXXX/cron/d61e4e18-1b7e-405f-9581-50d7f3f43b6d/XXXXXXXXXX/?check_in_id=7475e911-8aeb-4b0f-8715-70f89129ac28&status=ok&environment=production "HTTP/1.1 202 Accepted"

hasty lion Apr 11, 2024, 6:01 PM

Hey Riley, it looks like all your check-in events are being rate limited. I'll have someone take a look.

granite peak Apr 11, 2024, 6:06 PM

Oh wow. That's unexpected. @indigo acorn said:

I saw your other post. It looks like you're using the ingest endpoints which are not rate limited in this way. Those rate limit headers will appear on any of our APIs that respond from the Django part of our application and are usually involved in various CRUD operations. Check-ins would not be part of this.

indigo acorn Apr 11, 2024, 6:08 PM

^ I may have jumped the gun on that statement. We haven't seen this one before on our team and it was unexpected. I would also expect the logs you shared to show a value associated with the rate limit header, but there doesn't seem to be one.

granite peak Apr 11, 2024, 6:15 PM

Ah okay, thanks @indigo acorn and @hasty lion for the info!

indigo acorn Apr 11, 2024, 7:39 PM

We are seeing a very large number of check-ins. Is there some kind of retry logic happening here? We did have an incident this morning that delayed check-ins from appearing in the UI.

here's what we see on our side:

300K check-ins in an hour seems off to me, which makes me wonder if you happened to be sending additional check-ins by accident?

I've just heard that internally we do have rate limits, but they're opaque, which is why those headers are not populated and also why we return a 202.

The example given was this:

If you have 10 monitor environments (counted across all your monitors, e.g. 5 monitors with 2 environments each), then you get 50 check-ins per minute.
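To put that example next to the 300K/hour figure, here is a rough back-of-the-envelope sketch. It assumes the limit scales linearly at 5 check-ins per minute per monitor environment, which is only inferred from the single "10 environments gives 50 per minute" example and is not a documented formula. The function names are made up for illustration.

```python
# Assumption: linear scaling, inferred from "10 monitor environments -> 50 per minute".
# This is not a documented Sentry formula.
ASSUMED_PER_ENV_PER_MINUTE = 50 / 10


def ingest_budget_per_minute(monitor_environments: int) -> float:
    """Estimated check-in budget per minute under the linear-scaling assumption."""
    return monitor_environments * ASSUMED_PER_ENV_PER_MINUTE


def observed_per_minute(checkins: int, window_minutes: int) -> float:
    """Average check-ins per minute over an observation window."""
    return checkins / window_minutes


if __name__ == "__main__":
    budget = ingest_budget_per_minute(10)        # the example from the thread
    observed = observed_per_minute(300_000, 60)  # roughly what the graph showed
    print(f"budget/min={budget:.0f} observed/min={observed:.0f} "
          f"ratio={observed / budget:.0f}x")
```

Under these assumptions, 300K check-ins in an hour averages about 5,000 per minute, far above a budget on the order of 50 per minute for 10 monitor environments.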

There is also an additional rate limit on our event consumers that you also hit.

granite peak Apr 11, 2024, 8:05 PM

Are the event consumer rate limits also opaque?

And confirming: that graph of 300K in an hour is specifically for monitor check-ins, not other events, right?

indigo acorn Apr 11, 2024, 8:18 PM

It's just monitor check-ins.

the consumer rate limit is 6 per minute per monitor

but again you wouldn't see that in your logs
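Since a dropped check-in still comes back as a 202, one way to stay under a limit like this is to throttle on the sending side. The following is a minimal sketch of a per-monitor sliding-window limiter, assuming 6 check-ins per 60 seconds per monitor. The class name and the sliding-window behaviour are illustrative assumptions: the thread does not say how the window is measured or whether in_progress and ok check-ins count separately.

```python
from collections import defaultdict, deque


class PerMonitorLimiter:
    """Hypothetical client-side guard: at most `max_per_window` sends per monitor per window."""

    def __init__(self, max_per_window: int = 6, window_seconds: float = 60.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # monitor id -> timestamps of recent sends

    def allow(self, monitor_id: str, now: float) -> bool:
        """Return True and record the send if this monitor is under its budget at `now`."""
        recent = self._sent[monitor_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = PerMonitorLimiter()
    for i in range(8):
        # The monitor id here is a placeholder, not a real monitor.
        print(i, limiter.allow("example-monitor", time.monotonic()))
```

Passing `now` explicitly keeps the limiter easy to test. A caller that gets `False` can log and skip the check-in rather than sending one that would be silently dropped.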

I guess my question is - do you have a record of 300K check-ins over that period?

granite peak Apr 11, 2024, 8:19 PM

Okay, yeah, that seems like way too many. I'll see if I can look into why.

Lemme check for that

Yyyuuuppp looks like 319K across environments from 9 AM to 10 AM this morning

Almost all of them are coming from one environment, so someone must have been doing something funky on our end in that environment
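For anyone wanting to run the same kind of tally, here is a rough sketch that counts check-in requests per environment from log lines shaped like the example entry at the top of the thread. It assumes each line contains the full check-in URL with an `environment` query parameter. The function name and the sample lines are made up for illustration, and the host and IDs in the samples are placeholders.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the first URL in a log line; stops at whitespace before the status text.
URL_RE = re.compile(r"https?://\S+")


def count_checkins_by_environment(lines):
    """Count cron check-in requests per `environment` query parameter."""
    counts = Counter()
    for line in lines:
        match = URL_RE.search(line)
        if not match:
            continue
        url = urlsplit(match.group(0))
        if "/cron/" not in url.path:
            continue  # not a cron check-in request
        env = parse_qs(url.query).get("environment", ["(none)"])[0]
        counts[env] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines following the shape of the log entry in the first post.
    sample = [
        'HTTP Request: GET https://example.invalid/api/1/cron/slug-a/key/?check_in_id=a&status=ok&environment=production "HTTP/1.1 202 Accepted"',
        'HTTP Request: GET https://example.invalid/api/1/cron/slug-a/key/?check_in_id=b&status=in_progress&environment=staging "HTTP/1.1 202 Accepted"',
        'HTTP Request: GET https://example.invalid/api/1/cron/slug-b/key/?check_in_id=c&status=ok&environment=staging "HTTP/1.1 202 Accepted"',
    ]
    for env, n in count_checkins_by_environment(sample).most_common():
        print(env, n)
```

Sorting with `most_common()` makes a single noisy environment stand out quickly.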

indigo acorn Apr 11, 2024, 8:27 PM

🔥 we're going to write down some of these rate limits in our help center but thanks for validating that 🙂

Sorry for the misinfo initially.

granite peak Apr 11, 2024, 8:32 PM

No worries, I really appreciate all the help here
