#ervan_webhook-handshake-errors

fleet haloBOT
#

👋 Welcome to your new thread!

⏲️ We'll be here soon! Typically we respond in a few minutes, but sometimes we might take a bit longer if the server is busy or if you have a particularly tricky question.

⏱️ We close idle threads, which makes them read-only. Once a thread is closed it won't be reopened, but you can always start a new thread if you have another question.

🔗 This thread will always be available, even after it's closed. You can find it again using Discord's search, or you can save this link: https://discord.com/channels/841573134531821608/1310638391490248857

📝 Have more to share? Add more details, code, screenshots, videos, etc. below.

idle rain
#

According to our logs, we haven't received any webhooks since Nov 20. We do get logs when we curl our API, but Stripe does not even seem to connect to our host.

fossil prairie
#

Hi there 👋 Is it safe to assume you're talking about the deliveries to this webhook endpoint?
we_1QP4UgCespT5BPv9NFfZuNDU

#

"Timed out connecting to remote host" errors typically mean we couldn't establish a connection to your server because we didn't receive a response to the handshake.

idle rain
#

The webhook you mentioned is the new one we just added to test a workaround. Could you take a look at this one instead: we_1Ox8QxCespT5BPv98ld8tSRQ?

#

Which handshake are you talking about? We do manually check the signature header against the body content, but that only happens once the request has been received. As a matter of fact, we don't receive any requests in our application (probably because delivery fails earlier; hard to know exactly where).
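
For context, the application-level check described here runs only after a request has arrived, so it cannot be what causes a connection timeout. A minimal sketch of such a check, assuming Stripe's documented scheme (a `Stripe-Signature` header of the form `t=<timestamp>,v1=<signature>`, signed with HMAC-SHA256 over `<timestamp>.<body>`). The function name, the 300-second tolerance, and the `now` parameter are illustrative choices, not the poster's actual code:

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(payload: bytes, header: str, secret: str,
                     tolerance: int = 300, now: Optional[float] = None) -> bool:
    """Return True if `header` carries a valid v1 signature for `payload`."""
    # Split "t=...,v1=...,v1=..." into key/value pairs, ignoring malformed items.
    parts = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((v for k, v in parts if k.strip() == "t"), None)
    candidates = [v for k, v in parts if k.strip() == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale timestamps to limit replay attacks (tolerance is an assumption).
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False

    # The signed payload is the timestamp, a dot, and the raw request body.
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()

    # Constant-time comparison against every v1 signature in the header.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

In practice the official Stripe library's `stripe.Webhook.construct_event` is the usual way to do this; the sketch only shows why the check needs the raw, unmodified request body.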

#

(PS: Hello! Thank you for your time)

fossil prairie
#

The handshake that happens as part of establishing the TLS connection. My understanding is that when this type of behavior is encountered, it's typically caused by the requests being blocked at the ingress layer of your network.

inner dagger
#

Hi Toby, and thanks again for helping us. I'm the client 😊

#

We tried curling the endpoint and had no issues

#

The TLS certificate is issued by Let's Encrypt and is valid

#

We host our application on Heroku, and thus have very few options for filtering requests

fossil prairie
#

Are you able to see all the incoming traffic that is hitting your network? My suspicion is that these requests are being dropped at the edge of your network, and never actually making it to your webhook endpoint.

idle rain
#

I agree with your suspicion. The only clue we have is a big silence. Stripe seems to make requests, but they never reach us.

What's curious is that we didn't change a thing recently.

#

We did have lots of request timeouts, leading to response times greater than 30 seconds; could this be the source of our problems?
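
If slow responses are a concern, one common pattern is to acknowledge the webhook right away and run the slow work in the background. The sketch below uses only the Python standard library; the in-process queue, the `processed` list, and `start_server` are hypothetical stand-ins for a real job queue and web framework, and signature verification is omitted for brevity:

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process queue; a real deployment would likely use a durable job queue.
events: queue.Queue = queue.Queue()
processed: list = []


def worker() -> None:
    """Drain the queue in the background so slow work never delays the HTTP response."""
    while True:
        event = events.get()
        processed.append(event.get("id"))  # placeholder for slow business logic
        events.task_done()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        if not isinstance(event, dict):
            self.send_response(400)
            self.end_headers()
            return
        events.put(event)        # defer the slow part
        self.send_response(200)  # acknowledge receipt immediately
        self.end_headers()

    def log_message(self, *args) -> None:  # keep the sketch quiet
        pass


def start_server(port: int = 0) -> HTTPServer:
    """Start the worker and the HTTP server on background threads."""
    threading.Thread(target=worker, daemon=True).start()
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

This only helps with slow responses once a request has been received; it does nothing for requests that never reach the application.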

#

We recently tried re-running events manually with sufficient bandwidth, though, and still got connection timeouts

fossil prairie
#

There are two different types of timeouts. The first is a connection timeout, which happens if we have trouble resolving a hostname or if we don't receive a response to our handshake within 10 seconds. The second is our global timeout for Events, where we expect a response indicating whether or not you received the Event within 20 seconds.
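
To see which stage a connection stalls at, a probe along these lines can help. It is a rough sketch using Python's standard `socket` and `ssl` modules; `probe_tls` and its result format are made up for illustration, and the 10-second default simply mirrors the figure quoted above. Running it from your own machine tests a different network path than Stripe's servers use, so a clean result does not rule out filtering of other source addresses:

```python
import socket
import ssl
import time


def probe_tls(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Report the first stage that fails ('dns', 'tcp', 'tls'), or 'ok'."""
    started = time.monotonic()

    def result(stage: str, detail: str = "") -> dict:
        return {"stage": stage, "detail": detail,
                "elapsed": round(time.monotonic() - started, 2)}

    # Stage 1: hostname resolution.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return result("dns", str(exc))

    # Stage 2: TCP connection.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, TimeoutError) as exc:
        return result("tcp", f"timed out: {exc}")
    except OSError as exc:
        return result("tcp", str(exc))

    # Stage 3: TLS handshake, subject to the same timeout as the socket.
    context = ssl.create_default_context()
    try:
        with sock, context.wrap_socket(sock, server_hostname=host) as tls:
            return result("ok", tls.version() or "")
    except (socket.timeout, TimeoutError) as exc:
        return result("tls", f"timed out: {exc}")
    except (ssl.SSLError, OSError) as exc:
        return result("tls", str(exc))
```

For example, `probe_tls("your-app.example.com")` (a placeholder hostname) returns a dict whose `stage` field shows where the attempt stopped.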

idle rain
#

So, given the error message "Timed out connecting to remote host", it seems our issue is the former type, right?

#

So Stripe can't connect to us, but browsers and CLIs can; what could explain that?

fossil prairie
#

Yeah, that's what I'm seeing. The pending Event for the webhook endpoint you shared, evt_3QP5EHCespT5BPv91ytT5hCq, failed to be delivered after 10 seconds.

Typically it means the request makes it to the edge of your network and is then blocked/dropped by some sort of access control.

I'm checking to see if I spot anything that would give me a different suspicion.

idle rain
#

Thank you for the details. We're slowly reaching the limit of my skills, as I know very little about networking and ops 😬
In your experience, what could lead a platform (here mostly Heroku) to cut off Stripe webhooks without any intervention on our part?

fossil prairie
#

I don't know, beyond the broad umbrella of access control. It can fluctuate based on what you're using in your network stack, and I'm not too familiar with the majority of the options there.

idle rain
#

Summarizing what we learned:

  • Our webhook is configured correctly (it processed events until Nov 19)
  • Suddenly, we have a 97% failure rate due to connection timeouts
  • The connection timeout happens outside our application, probably around our PaaS
  • Stripe can't change anything on their side to alter this behaviour

The explanation:
Stripe attempts a TLS handshake with our host and doesn't receive a response within 10 seconds

The solution to explore:
Eeeeh 👀 I guess it's in our hands?

Did I get that right, or did I misunderstand something?

fossil prairie
#

Yup, that's my read of what is happening here.

idle rain
#

Thank you for your time! I'd like to keep this thread open a bit longer, as @inner dagger may have more to ask soon (currently AFK).

Thank you again!

small notch
#

I wanted to flag that we close threads after ~45 minutes of inactivity

inner dagger
#

Thanks for your help

#

We'll try to reach Heroku (our PaaS)

#

May we reopen this thread later?

small notch
#

We do not re-open threads but you are welcome to post a new question to the main channel