#ervan_webhook-handshake-errors
Welcome to your new thread!
We'll be here soon! Typically we respond in a few minutes, but sometimes we might take a bit longer if the server is busy or if you have a particularly tricky question.
We close idle threads, which makes them read-only. Once a thread is closed it won't be reopened, but you can always start a new thread if you have another question.
This thread will always be available, even after it's closed. You can find it again using Discord's search, or you can save this link: https://discord.com/channels/841573134531821608/1310638391490248857
Have more to share? Add more details, code, screenshots, videos, etc. below.
According to our logs, we haven't received any webhooks since Nov 20. We do get log entries when we curl our API, but Stripe does not seem to even connect to our host.
Hi there, is it safe to assume you're talking about the deliveries to this webhook endpoint?
we_1QP4UgCespT5BPv9NFfZuNDU
"Timed out connecting to remote host" errors typically mean we couldn't establish a connection to your server because we didn't receive a response to the handshake.
The webhook you mentioned is the new one we just added to test a workaround. Is it possible to take a look at this one instead: we_1Ox8QxCespT5BPv98ld8tSRQ?
Which handshake are you talking about? We do manually check the header signature against the body content, but that happens once the request is received. As a matter of fact, we don't receive any request in our application (probably because it fails earlier; it's hard to know exactly where).
(PS: Hello! Thank you for your time.)
The handshake that happens as part of establishing the TLS connection. My understanding is that when this type of behavior is encountered, it's typically caused by the requests being blocked at the ingress layer of your network.
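To make the handshake discussion concrete, here is a minimal sketch that connects to a host and reports which stage fails: DNS resolution, the TCP connection, or the TLS handshake. The function name `probe_tls` and the result keys are hypothetical, and the 10-second default simply mirrors the connection timeout mentioned in this thread. It only shows what the server looks like from the machine running it, so a successful run does not prove that requests from Stripe's network get through.

```python
import socket
import ssl
import time


def probe_tls(host, port=443, timeout=10.0):
    """Report the stage (dns, tcp, tls) at which connecting to host:port fails."""
    result = {"ok": False, "stage": "dns", "error": None, "seconds": {}}
    start = time.monotonic()
    try:
        # Stage 1: resolve the hostname (not covered by the socket timeout).
        family, socktype, proto, _, addr = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM
        )[0]
        result["seconds"]["dns"] = time.monotonic() - start

        # Stage 2: open the TCP connection.
        result["stage"] = "tcp"
        start = time.monotonic()
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        with sock:
            sock.connect(addr)
            result["seconds"]["tcp"] = time.monotonic() - start

            # Stage 3: TLS handshake, including certificate validation.
            result["stage"] = "tls"
            start = time.monotonic()
            context = ssl.create_default_context()
            with context.wrap_socket(sock, server_hostname=host) as tls:
                result["seconds"]["tls"] = time.monotonic() - start
                result["tls_version"] = tls.version()
        result["ok"] = True
        result["stage"] = "done"
    except OSError as exc:
        # gaierror, timeouts, refused connections and SSLError are all OSError.
        result["error"] = repr(exc)
    return result
```

Calling `probe_tls("your-app.example.com")` (a placeholder hostname) and getting back `"stage": "tls"` with a timeout error would match the symptom described above: the connection opens but the handshake never gets an answer.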
Hi Toby, and thanks again for helping us. I'm the client.
We tried to curl the endpoint, without any issue.
The TLS certificates are issued by Let's Encrypt and are valid.
We host our application on Heroku, and thus have very few options for filtering requests.
Are you able to see all the incoming traffic that is hitting your network? My suspicion is that these requests are being dropped at the edge of your network, and never actually making it to your webhook endpoint.
I agree with your suspicion. All we have to go on is silence. Stripe seems to make the requests, but they never reach us.
What's curious is that we didn't change a thing recently.
We did have lots of request timeouts, with response times greater than 30 seconds; could this be the source of our problems?
We recently tried resending events manually with sufficient bandwidth, though, and still got connection timeouts.
There are two different types of timeouts. The first is a connection timeout, which happens if we have trouble resolving a hostname or if we don't receive a response to our handshake within 10 seconds. The second is our global timeout for Events, where we expect a response within 20 seconds indicating whether or not you received the Event.
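For the second kind of timeout only, a common pattern is to acknowledge the Event immediately and do the slow work afterwards. The sketch below uses Python's standard library; `WebhookHandler`, `pending_events`, `process_events` and `start_server` are hypothetical names, and signature verification is left out. It would not help with connection timeouts, since in that case the request never reaches the application at all.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Events wait here until a background worker picks them up.
pending_events = queue.Queue()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        # Signature verification would go here (omitted in this sketch).
        pending_events.put(event)
        # Acknowledge right away; slow work happens off the request path.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # Keep the sketch quiet.


def process_events(handle_event):
    """Drain the queue forever; handle_event is a caller-supplied callable."""
    while True:
        event = pending_events.get()
        try:
            handle_event(event)
        finally:
            pending_events.task_done()


def start_server(port=0):
    """Start the listener in a daemon thread; port 0 picks any free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A worker could be started with `threading.Thread(target=process_events, args=(print,), daemon=True).start()`. The in-memory queue is an assumption made for brevity: it loses events on restart, so a real deployment would more likely use a durable job queue.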
So, given the error message "Timed out connecting to remote host", it seems our issue is the former type, right?
So Stripe can't connect to us, but browsers and CLIs can; what could be the explanation?
Yeah, that's what I'm seeing. The pending Event for the webhook endpoint you shared, evt_3QP5EHCespT5BPv91ytT5hCq, failed to be delivered after 10 seconds.
Typically it means the request makes it to the edge of your network and is then blocked/dropped by some sort of access control.
I'm checking to see if I spot anything that would give me a different suspicion.
Thank you for the details. We're slowly reaching my skill cap, as I know very little about networking and ops.
In your experience, what could lead an application (here, mostly Heroku) to start dropping Stripe webhooks without any change on our side?
I don't know, beyond the broad umbrella of access control. It can fluctuate based on what you're using in your network stack, and I'm not too familiar with the majority of the options there.
Summarizing what we learned:
- Our webhook is correctly configured (it did process events until Nov 19)
- Suddenly, we have a 97% failure rate due to connection timeouts
- The connection timeout happens outside our application, probably around our PaaS
- Stripe can't change anything on their side to alter this behaviour
The explanation:
Stripe attempts a handshake with our host and fails to receive a response within 10 seconds
The solution to explore:
Eeeeh, I guess it's in our hands?
Did I get it right, or did I misunderstand something?
Yup, that's my read of what is happening here.
Thank you for your time! I'd like to leave this thread open a bit longer, as @inner dagger may have more to ask soon (currently AFK).
Thank you again!
I wanted to flag that we close threads after ~45 minutes of inactivity.
Thanks for your help.
We'll try to reach Heroku (our PaaS).
May we reopen this thread later?
We do not reopen threads, but you are welcome to post a new question in the main channel.