#jaked-retry-failed-webhook
Hi there, yes webhooks will automatically retry until retries expire or until a 200 is received, even if you manually resend.
so if we manually resend... how do we make sure we don't process future auto retries by stripe?
You will need to handle the event just like you would handle duplicates, which can occur.
So you would check if you already ingested the data and then you would return a 200 and not take any action.
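The duplicate handling described above could look roughly like the sketch below. It is only an illustration: `DedupingHandler`, the in-memory `set`, and the `process` callback are hypothetical names, and a real integration would likely record processed event IDs in a durable store rather than in memory.

```python
from typing import Callable, Set, Tuple


class DedupingHandler:
    """Hypothetical handler that processes each event ID at most once."""

    def __init__(self, process: Callable[[dict], None]) -> None:
        # Assumption: an in-memory set stands in for a durable store of event IDs.
        self._seen: Set[str] = set()
        self._process = process

    def handle(self, event: dict) -> Tuple[int, str]:
        event_id = event["id"]
        if event_id in self._seen:
            # Already ingested: acknowledge with a 200 and take no action.
            return 200, "DUPLICATE"
        self._process(event)
        # Record the ID only after processing succeeds, so a failure can be retried.
        self._seen.add(event_id)
        return 200, "OK"
```

With this shape, a manual resend and a later automatic retry of the same event would both get a 200, but only the first delivery to arrive would be processed.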
so on the auto retry we see the pending_webhooks as 0
when we get a webhook our approach is to then retrieve the event again from stripe... could we see if we have 0 pending webhooks and just bail at that point?
Once pending_webhooks is 0 then yes the event should not send to your webhook again as it indicates there was already a 200 returned.
That's not true, as evidenced by the example I am providing.
We manually resend an event, which has a pending_webhooks of 1. We retrieve the event immediately and see pending_webhooks is 0, but then later see another successful retry (done by Stripe) for the event, and even in that request payload (looking via the Stripe UI) it shows pending_webhooks as 0
so even though pending webhook was 0 ... stripe still sent the webhook
here is the pending webhook section of the auto retry that occurred after our manual one
We actually pulled the event via the CLI after we manually retried it and saw pending webhook immediately go to 0 so was expecting the previous failed attempts to just not do anything when their auto retry time came around
Looking
My understanding is that these are separate
But double checking on behavior here
So when looking at your manual resend, I can see that the payload has "pending_webhooks": 1,
But you are saying you retrieve the event immediately and it shows 0?
correct, and then later when it auto retried... the pending_webhooks is 0... so why did it resend it?
Well that makes sense for later on.... the pending_webhook is 0 at that point because you now handled the retry with a 200
I'm just confused how you are seeing pending_webhooks: 0 after the manual resend
no... the pending_webhook in the request payload from stripe is 0... that is before it comes to us... so that means an event with a pending_webhook of 0 was still sent
My screenshot from above shows the request payload of the auto retry... where it showed pending webhook was 0.
The request payload when you retrieve the event, not from the webhook request itself, right?
I'm not looking in the Dashboard... I'm looking at our internal logs. let me look at the Dashboard, one sec.
No the Dashboard shows 1 as well....
The second attempt from the top is the manual retry
Let me re-explain. We manually resent the webhook. We then immediately retrieved the event (via the Stripe CLI) and saw that pending_webhooks was 0. In the UI the previously failed attempts had a Next Retry value of 3 minutes. After 3 minutes we saw a new attempt in the UI, which we felt should not have occurred because there were 0 pending webhooks after our manual resend. We expanded the auto retry and saw the request payload also said there were 0 pending webhooks.
Yeah okay so I will need to check on my end whether I can reproduce seeing 0 pending_webhooks when you retrieve the event. That should say 1 as the auto retry still should happen. Everything else that you are stating is expected.
Why would it still auto retry... if the event has 0 pending webhooks, the previously failed attempts should not actually send.
I'll give you another example you can pull... b/c we had a bug and have thousands to fix... one sec
A resend is completely separate from the original event. It is essentially a new Webhook. We will still retry the original instance until we get a 200.
evt_1L3KjWGhx9M4dYTOwNvEokuA
This event I just manually resent.. It shows 0 pending webhooks if you pull the event. However there are some previous attempts with a retry in 3mins. Multiple actually. We will still see a new auto retry send when that time expires.
may need to find one with more time before auto retry as it already sent the auto retry
Can you share the response body from your event retrieval?
after the manual retry?
Yes when you retrieve the event via the CLI after you send the retry
Here is a new event: evt_1L39xhGhx9M4dYTOsPopKTJn
Just manually processed
{
  id: 'evt_1L39xhGhx9M4dYTOsPopKTJn',
  object: 'event',
  api_version: '2020-08-27',
  created: 1653445717,
  data: {
    object: {
      id: 'sub_1KWtVTGhx9M4dYTOQLGWFQ7B',
      object: 'subscription',
      application: null,
      application_fee_percent: null,
      automatic_tax: [Object],
      billing_cycle_anchor: 1645755967,
      billing_thresholds: null,
      cancel_at: null,
      cancel_at_period_end: false,
      canceled_at: null,
      collection_method: 'charge_automatically',
      created: 1645755967,
      current_period_end: 1656123967,
      current_period_start: 1653445567,
      customer: 'cus_LDK9xL3TNZPyY9',
      days_until_due: null,
      default_payment_method: null,
      default_source: null,
      default_tax_rates: [],
      description: null,
      discount: null,
      ended_at: null,
      items: [Object],
      latest_invoice: 'in_1L39xhGhx9M4dYTOqSBjLdvo',
      livemode: true,
      metadata: {},
      next_pending_invoice_item_invoice: null,
      pause_collection: null,
      payment_settings: [Object],
      pending_invoice_item_interval: null,
      pending_setup_intent: null,
      pending_update: null,
      plan: [Object],
      quantity: 1,
      schedule: null,
      start_date: 1645755967,
      status: 'active',
      test_clock: null,
      transfer_data: null,
      trial_end: null,
      trial_start: null
    },
    previous_attributes: {
      current_period_end: 1653445567,
      current_period_start: 1650853567,
      latest_invoice: 'in_1KsHd2Ghx9M4dYTO2wx25UMY'
    }
  },
  livemode: true,
  pending_webhooks: 0,
  request: { id: null, idempotency_key: null },
  type: 'customer.subscription.updated'
}
That's the response from retrieving the event after manually resending the previously failed attempt.
Currently it shows that a retry will occur in 6hrs even though we just resent the previously failed attempt. When 6 hrs passes we will see a new auto retry was done and if we expand the request payload via the stripe UI on that new auto retry the pending_webhooks will be 0 in the request as well which doesn't make sense.
Okay I'll need to test this on my end and see if I can reproduce
As noted, I do expect the retry to happen
What I would not expect is for the retrieved event to show no pending webhook when another retry is still scheduled
Why do you expect it to happen? I am retrying a failed attempt which is successful which means any failed attempt for that event should not retry again.
Basically what i am saying is that why is there a next retry at all..... if a successful retry has occurred (even if done with a manual resend) since all the retries are for the same failed webhook.
You are resending via the CLI and not the Dashboard, correct?
Now I'm seeing something weird. We manually resent another failed event's webhook attempt. And when the next retry time came around it did NOT resend the webhook which is what we expected.
We have tried both approaches.
The previous examples I gave you we retried the failed attempts... but then later saw the auto retry still happen.
Maybe only the ones done via the CLI still caused the auto retry to occur??
Yes I think that would be the case.
It's inconsistent. We just did another manual retry... but Stripe still sent the auto retry later, after a success was received
When you say "manual retry" you mean via the Dashboard?
Can you send me that event ID?
evt_1L34ICGhx9M4dYTOILxNXVob
Okay taking a look internally to see if this has been raised before
One sec
Okay
This is expected though undesirable
It was indeed raised relatively recently and talked through
The short of it is that a manual retry creates a new internal webhook token that doesn't match the original of the event. This means essentially the previous retry gets enqueued and does not get canceled.
It looks like there was discussion about improving this behavior, but it isn't being prioritized at the minute so I can't provide a timeline.
So overall, unfortunately, these retries will still occur even after a manual resend.
But that doesn't line up. Here is another event: evt_1L34IKGhx9M4dYTO2SxZ9mpr
That event we did a manual retry via the ui... then when the next retry time came... it did NOT resend the webhook
So event evt_1L34ICGhx9M4dYTOILxNXVob did do another auto retry but evt_1L34IKGhx9M4dYTO2SxZ9mpr did not
Note it says "2 mins ago", not "in 2 minutes"
Hmm that's right in the Dashboard. It shows me pending still internally... 🤔
We definitely see explicitly different behavior. In one case we manually resend a previously failed attempt and Stripe does NOT execute the auto retry, and in another case it does.
Okay let me confer with a colleague here to see if I'm missing something.
Is something now down.. I actually see other events like (evt_1L34JSGhx9M4dYTOTynpmkf9) which we have not manually retried saying it should have sent the auto retry 10+ mins ago
Not aware of anything being down, but looking into why it is showing this
I just looked at multiple failed events which should have already done their auto retry and they are showing something "X minutes ago"
another: evt_1L34TDGhx9M4dYTOGm9ZOfTB
evt_1L34JSGhx9M4dYTOTynpmkf9 now doesn't show 10+ min ago anymore... do you not show a pending one for it either?
Ah yeah that just retried
Okay it looks like we did just have a little bit of lag in our enqueuing
I think it just caught up
This stuff is not expected to happen synchronously, but it should happen within a few minutes.
So with the inconsistent behavior. Sry we derailed a second. On event evt_1L34ICGhx9M4dYTOILxNXVob we saw stripe do the retry after the manual attempt but event evt_1L34IKGhx9M4dYTO2SxZ9mpr looked like it did not retry...... but ultimately it did b/c there was a lag on the enqueue
Yes
K, I'm tracking now... on those second auto retries, we should be checking if pending_webhooks is 0 when we retrieve the event and just return a 200 in that case, so we don't double process the event?
Yep.
We just added a bit of code to do that... we return a 200 with a body of IGNORED
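A minimal sketch of the check described above, not the actual code from this thread. The names `handle_webhook`, `retrieve_event`, and `process` are hypothetical; `retrieve_event` is assumed to be a callable that re-fetches the event from Stripe by ID and returns it as a dict-like object.

```python
from typing import Callable, Mapping, Tuple


def handle_webhook(
    payload: Mapping,
    retrieve_event: Callable[[str], Mapping],
    process: Callable[[Mapping], None],
) -> Tuple[int, str]:
    """Re-fetch the event and skip it when no webhooks are pending."""
    # Assumption: retrieve_event wraps an API call that returns the current event.
    fresh = retrieve_event(payload["id"])
    if fresh.get("pending_webhooks", 0) == 0:
        # Some delivery of this event already got a 200, so acknowledge and bail.
        return 200, "IGNORED"
    process(fresh)
    return 200, "OK"
```

This relies on `pending_webhooks` reflecting only this endpoint's deliveries. If other endpoints receive the same event, the count may stay above 0 after the event has been processed here, so tracking processed event IDs on your side is likely the more robust guard.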
That should do it, and I am also seeing that response for those auto retries
Example auto retry we ignored with new code: evt_1L34IKGhx9M4dYTO2SxZ9mpr
So seems to be handling it correctly
Yep that looks good
That said... you are only running this code for events that you have initiated a manual retry, correct?
I know that still has a gap if we have another partner webhook endpoint which we do .... Rewardful... if their system is down for whatever reason it is possible we get a value of pending_webhook of 1 even though we actually processed the event
No, we added this for all incoming webhooks. Is that wrong?