#jaked-retry-failed-webhook

marsh plinth
#

Hi there, yes webhooks will automatically retry until retries expire or until a 200 is received, even if you manually resend.

wise ingot
#

so if we manually resend... how do we make sure we don't process future auto retries by stripe?

marsh plinth
#

You will need to handle the event just like you would handle duplicates, which can occur.

#

So you would check if you already ingested the data and then you would return a 200 and not take any action.
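A minimal sketch of the duplicate handling described above, assuming events are keyed by their `id`. The in-memory `Set`, the function name `handleWebhookEvent`, the `processEvent` callback, and the response bodies are all hypothetical, chosen only for illustration; a real handler would likely use durable storage.

```javascript
// Hypothetical in-memory record of event IDs we have already ingested.
// In production this would more likely be a database table with a unique
// constraint on the event ID, so the check survives restarts.
const processedEventIds = new Set();

// processEvent is a placeholder for your own business logic.
function handleWebhookEvent(event, processEvent) {
  if (processedEventIds.has(event.id)) {
    // Duplicate delivery or retry: acknowledge with a 200 and take no action.
    return { status: 200, body: "ALREADY_PROCESSED" };
  }

  processEvent(event);
  processedEventIds.add(event.id);
  return { status: 200, body: "OK" };
}
```

With this shape, a manual resend followed by a later automatic retry of the same event runs the business logic only once, and both deliveries get a 200.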

wise ingot
#

so on the auto retry we see the pending_webhooks as 0

#

when we get a webhook our approach is to then retrieve the event again from stripe... could we see if we have 0 pending webhooks and just bail at that point?

marsh plinth
#

Once pending_webhooks is 0 then yes the event should not send to your webhook again as it indicates there was already a 200 returned.

wise ingot
#

that's not true, as evidenced by the example I am providing

#

We manually resend an event, which has pending_webhooks of 1. We retrieve the event immediately and see pending_webhooks is 0, but then later see another successful retry (done by Stripe) for the event, and even in that request payload (looking via the Stripe UI) it shows pending_webhooks as 0

#

so even though pending_webhooks was 0, Stripe still sent the webhook

#

here is the pending webhook section of the auto retry that occurred after our manual one

#

We actually pulled the event via the CLI after we manually retried it and saw pending_webhooks immediately go to 0, so we were expecting the previously failed attempts to just not do anything when their auto retry time came around

marsh plinth
#

Looking

#

My understanding is that these are separate

#

But double checking on behavior here

#

So when looking at your manual resend, I can see that the payload has "pending_webhooks": 1,

#

But you are saying you retrieve the event immediately and it shows 0?

wise ingot
#

correct, and then later when it auto retried, pending_webhooks is 0... so why did it resend it?

marsh plinth
#

Well, that makes sense for later on... pending_webhooks is 0 at that point because you have now handled the retry with a 200

#

I'm just confused how you are seeing pending_webhooks: 0 after the manual resend

wise ingot
#

no... the pending_webhooks value in the request payload from Stripe is 0. That is before it comes to us, so that means an event with pending_webhooks of 0 was still sent

#

My screenshot from above shows the request payload of the auto retry... where it showed pending webhook was 0.

marsh plinth
#

The request payload when you retrieve the event, not from the webhook request itself, right?

#

I'm not looking in the Dashboard... I'm looking at our internal logs. let me look at the Dashboard, one sec.

#

No the Dashboard shows 1 as well....

#

The second attempt from the top is the manual retry

wise ingot
#

Let me re-explain. We manually resent the webhook. We then immediately retrieved the event (via the Stripe CLI) and saw that pending_webhooks was 0. In the UI the previously failed attempts had a Next Retry value of 3 minutes. After 3 minutes we saw a new attempt in the UI, which we felt should not have occurred b/c there were 0 pending webhooks after our manual resend. We expanded the auto retry and saw the request payload also said there were 0 pending webhooks.

marsh plinth
#

Yeah okay so I will need to check on my end whether I can reproduce seeing 0 pending_webhooks when you retrieve the event. That should say 1 as the auto retry still should happen. Everything else that you are stating is expected.

wise ingot
#

Why would it still auto retry if the event has 0 pending webhooks? The previously failed attempts should not actually send.

#

I'll give you another example you can pull... b/c we had a bug and have thousands to fix... one sec

marsh plinth
#

A resend is completely separate from the original event. It is essentially a new Webhook. We will still retry the original instance until we get a 200.

wise ingot
#

evt_1L3KjWGhx9M4dYTOwNvEokuA

#

This event I just manually resent. It shows 0 pending webhooks if you pull the event. However, there are some previous attempts with a retry in 3 mins. Multiple, actually. We will still see a new auto retry sent when that time expires.

#

may need to find one with more time before auto retry as it already sent the auto retry

marsh plinth
#

Can you share the response body from your event retrieval?

wise ingot
#

after the manual retry?

marsh plinth
#

Yes when you retrieve the event via the CLI after you send the retry

wise ingot
#

Here is a new event: evt_1L39xhGhx9M4dYTOsPopKTJn

#

Just manually processed

#

{
  id: 'evt_1L39xhGhx9M4dYTOsPopKTJn',
  object: 'event',
  api_version: '2020-08-27',
  created: 1653445717,
  data: {
    object: {
      id: 'sub_1KWtVTGhx9M4dYTOQLGWFQ7B',
      object: 'subscription',
      application: null,
      application_fee_percent: null,
      automatic_tax: [Object],
      billing_cycle_anchor: 1645755967,
      billing_thresholds: null,
      cancel_at: null,
      cancel_at_period_end: false,
      canceled_at: null,
      collection_method: 'charge_automatically',
      created: 1645755967,
      current_period_end: 1656123967,
      current_period_start: 1653445567,
      customer: 'cus_LDK9xL3TNZPyY9',
      days_until_due: null,
      default_payment_method: null,
      default_source: null,
      default_tax_rates: [],
      description: null,
      discount: null,
      ended_at: null,
      items: [Object],
      latest_invoice: 'in_1L39xhGhx9M4dYTOqSBjLdvo',
      livemode: true,
      metadata: {},
      next_pending_invoice_item_invoice: null,
      pause_collection: null,
      payment_settings: [Object],
      pending_invoice_item_interval: null,
      pending_setup_intent: null,
      pending_update: null,
      plan: [Object],
      quantity: 1,
      schedule: null,
      start_date: 1645755967,
      status: 'active',
      test_clock: null,
      transfer_data: null,
      trial_end: null,
      trial_start: null
    },
    previous_attributes: {
      current_period_end: 1653445567,
      current_period_start: 1650853567,
      latest_invoice: 'in_1KsHd2Ghx9M4dYTO2wx25UMY'
    }
  },
  livemode: true,
  pending_webhooks: 0,
  request: { id: null, idempotency_key: null },
  type: 'customer.subscription.updated'
}

#

That's the response from retrieving the event after manually processing the previously failed attempt.

#

Currently it shows that a retry will occur in 6 hrs, even though we just resent the previously failed attempt. When 6 hrs pass we will see that a new auto retry was done, and if we expand the request payload of that new auto retry via the Stripe UI, pending_webhooks will be 0 in the request as well, which doesn't make sense.

marsh plinth
#

Okay I'll need to test this on my end and see if I can reproduce

#

As noted, I do expect the retry to happen

#

What I would not expect is for the retrieved event to show no remaining retry when one is still scheduled

wise ingot
#

Why do you expect it to happen? I am retrying a failed attempt which is successful which means any failed attempt for that event should not retry again.

#

Basically what I am saying is: why is there a next retry at all if a successful retry has occurred (even if done with a manual resend), since all the retries are for the same failed webhook?

marsh plinth
#

You are resending via the CLI and not the Dashboard, correct?

wise ingot
#

Now I'm seeing something weird. We manually resent another failed event's webhook attempt. And when the next retry time came around it did NOT resend the webhook which is what we expected.

#

We have tried both approaches.

#

The previous examples I gave you we retried the failed attempts... but then later saw the auto retry still happen.

#

Maybe only the ones done via the CLI still caused the auto retry to occur??

marsh plinth
#

Yes I think that would be the case.

wise ingot
#

It's inconsistent. We just did another manual retry... but Stripe still sent the auto retry later, after a success was received

marsh plinth
#

When you say "manual retry" you mean via the Dashboard?

wise ingot
#

yes

marsh plinth
#

Can you send me that event ID?

wise ingot
#

evt_1L34ICGhx9M4dYTOILxNXVob

marsh plinth
#

Okay taking a look internally to see if this has been raised before

#

One sec

#

Okay

#

This is expected though undesirable

#

It was indeed raised relatively recently and talked through

#

The short of it is that a manual retry creates a new internal webhook token that doesn't match the original of the event. This means essentially the previous retry gets enqueued and does not get canceled.

#

It looks like there was discussion about improving this behavior, but it isn't being prioritized at the minute so I can't provide a timeline.

#

So overall, unfortunately, these retries will still occur even after a manual resend.

wise ingot
#

But that doesn't line up. Here is another event: evt_1L34IKGhx9M4dYTO2SxZ9mpr

#

That event we did a manual retry via the ui... then when the next retry time came... it did NOT resend the webhook

marsh plinth
#

It still is pending

#

It is going to retry

#

In 2 mins

wise ingot
#

So event evt_1L34ICGhx9M4dYTOILxNXVob did do another auto retry but evt_1L34IKGhx9M4dYTO2SxZ9mpr did not

#

no, it says "2 mins ago", not "in 2 minutes"

marsh plinth
#

Hmm that's right in the Dashboard. It shows me pending still internally... 🤔

wise ingot
#

We definitely see explicitly different behavior. In one case we manually resend a previously failed attempt and Stripe does NOT execute the auto retry, and in another case it does.

marsh plinth
#

Okay let me confer with a colleague here to see if I'm missing something.

wise ingot
#

Is something down now? I actually see other events (like evt_1L34JSGhx9M4dYTOTynpmkf9), which we have not manually retried, saying the auto retry should have been sent 10+ mins ago

marsh plinth
#

Not aware of anything being down, but looking into why it is showing this

wise ingot
#

I just looked at multiple failed events which should have already done their auto retry, and they are showing something like "X minutes ago"

#

another: evt_1L34TDGhx9M4dYTOGm9ZOfTB

#

evt_1L34JSGhx9M4dYTOTynpmkf9 now doesn't show 10+ min ago anymore... do you not show a pending one for it either?

marsh plinth
#

Ah yeah that just retried

#

Okay it looks like we did just have a little bit of lag in our enqueuing

#

I think it just caught up

#

This stuff is not expected to happen synchronously, but it should happen within a few minutes.

wise ingot
#

So with the inconsistent behavior. Sry we derailed a second. On event evt_1L34ICGhx9M4dYTOILxNXVob we saw stripe do the retry after the manual attempt but event evt_1L34IKGhx9M4dYTO2SxZ9mpr looked like it did not retry...... but ultimately it did b/c there was a lag on the enqueue

marsh plinth
#

Yes

wise ingot
#

K, I'm tracking now... on those second auto retries, we should be checking if pending_webhooks is 0 when we retrieve the event and just return a 200 in that case, so we don't double process the event?

marsh plinth
#

Yep.

wise ingot
#

We just added a bit of code to do that... we return a 200 with a body of IGNORED
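A rough sketch of what such a check could look like, assuming the handler re-fetches the event before acting on it. The function name `handleIncomingWebhook` and the injected `retrieveEvent` and `processEvent` callbacks are hypothetical; `retrieveEvent(id)` stands in for whatever call fetches the current Event object from Stripe.

```javascript
// retrieveEvent and processEvent are hypothetical callbacks supplied by the
// caller. retrieveEvent(id) should resolve to the current Event object
// (for example, fetched from the Stripe API); processEvent runs your logic.
async function handleIncomingWebhook(payload, { retrieveEvent, processEvent }) {
  // Re-fetch the event instead of trusting the delivered payload.
  const event = await retrieveEvent(payload.id);

  if (event.pending_webhooks === 0) {
    // No deliveries are still outstanding for this event, e.g. because a
    // manual resend already succeeded. Acknowledge and skip processing.
    return { status: 200, body: "IGNORED" };
  }

  await processEvent(event);
  return { status: 200, body: "OK" };
}
```

Injecting the two callbacks keeps the decision logic separate from the HTTP framework and the API client, so it can be exercised with stubbed events.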

marsh plinth
#

That should do it, and I am also seeing that response for those auto retries

wise ingot
#

Example auto retry we ignored with new code: evt_1L34IKGhx9M4dYTO2SxZ9mpr

marsh plinth
#

So seems to be handling it correctly

#

Yep that looks good

#

That said... you are only running this code for events that you have initiated a manual retry, correct?

wise ingot
#

I know that still has a gap if we have another partner webhook endpoint, which we do (Rewardful). If their system is down for whatever reason, it is possible we get a pending_webhooks value of 1 even though we actually processed the event

#

No, we added this for all incoming webhooks. Is that wrong?

marsh plinth
#

Never mind

#

You are correct

#

This should work well. If you haven't manually retried then it should be pending_webhooks: 1

#

So you process it

#

If 0, that indicates you have already processed via the manual retry.

#

So yeah, that should work well
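The gap raised earlier in the thread (a second endpoint such as Rewardful keeping pending_webhooks above 0 even after this endpoint has processed the event) could be narrowed by combining both checks discussed here. This is only a sketch under that assumption: `seenEventIds`, `shouldProcess`, and `markProcessed` are hypothetical names, and the `Set` stands in for durable storage.

```javascript
// Hypothetical record of event IDs this endpoint has already processed.
// A real system would persist this rather than keep it in memory.
const seenEventIds = new Set();

// Decide whether a retrieved event still needs processing by this endpoint.
function shouldProcess(event, seen = seenEventIds) {
  // We already handled this event ourselves, regardless of other endpoints.
  if (seen.has(event.id)) return false;
  // No deliveries are outstanding, so the event was already acknowledged.
  if (event.pending_webhooks === 0) return false;
  return true;
}

// Record an event after business logic has completed successfully.
function markProcessed(event, seen = seenEventIds) {
  seen.add(event.id);
}
```

The local ID check covers the case where another endpoint's failures keep pending_webhooks at 1, while the pending_webhooks check covers events that were acknowledged before anything was recorded locally.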