#Is it expected that Payload goes offline a lot?

modest stag
#

Hi, we're building an application that heavily relies on a hosted payload cloud. We've noticed intermittent outages pretty regularly, and it's affecting our users. Is it just us, or is this service-wide?

pale bolt
#

The service is hosted by DigitalOcean, and we have not been notified of any outages - so this typically means it's your application code.

I can provide some insight if you provide your project ID from Settings -> Billing

trim spear
#

hi @pale bolt, jumping in here. our project id is:
65c420e015aac56684a56746

#

as @modest stag said, we've been experiencing intermittent outages, typically lasting 5-10 mins. however, over the last 24 hours we've been down almost constantly: when it kicks back on we get about 30 seconds of uptime, then it goes offline again

pale bolt
#

I'd be curious if triggering a redeploy from the dashboard would cause any change in behavior.

strong iron
#

I host on DO, never had an issue (not saying that it's not possible)

trim spear
#

(in reply to pale bolt: "Any indication of issues in the logs?")

thanks for the responses.

i redeployed and we were back up for all of about 3 mins, then back down.

@pale bolt the only error i found in the logs was this:
(payload): MongoNetworkError: connection xxxx to xx.xxx.xxx.xxx:xxxx closed

upon searching for the issue, i found you opened this issue a while back:
https://github.com/payloadcms/payload/issues/3180

curious if the cause was ever identified?

strong iron
#

@trim spear I'm looking at possible causes. I see one post from the mongo community that may be relevant:

"If you experience network timeouts or socket errors in communication between clients and servers, or between members of a sharded cluster or replica set, check the TCP keepalive value for the affected systems."

#

try running sysctl net.ipv4.tcp_keepalive_time to check

#

Though it seems unlikely to me that the requests Payload makes to Atlas would exceed the defaults

trim spear
#

we're getting back net.inet.tcp.keepidle: 7200000

per that highlighted line, it would seem the value of 7200000 is being ignored?
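
A note on units for the value above: `net.inet.tcp.keepidle` is the macOS/BSD name for this setting and is reported in milliseconds, while the Linux key `net.ipv4.tcp_keepalive_time` is reported in seconds. So 7200000 here is the same two hours as the common Linux default of 7200, and it likely describes the machine the command was run on rather than the hosted server. A minimal sketch for normalizing both forms to seconds (the function names are hypothetical, not part of any tool):

```typescript
// Hypothetical helpers for comparing sysctl keepalive readings across platforms.
interface KeepaliveReading {
  key: string;
  value: number;
}

// Accepts "key: value" (macOS style) and "key = value" (Linux style) output.
function parseSysctlLine(line: string): KeepaliveReading {
  const match = line.trim().match(/^([\w.]+)\s*[:=]\s*(\d+)$/);
  if (!match) {
    throw new Error(`Unrecognized sysctl output: ${line}`);
  }
  return { key: match[1], value: Number(match[2]) };
}

// Converts a reading to seconds based on which platform's key it is.
function keepaliveSeconds(reading: KeepaliveReading): number {
  if (reading.key === "net.inet.tcp.keepidle") {
    return reading.value / 1000; // macOS/BSD reports milliseconds
  }
  if (reading.key === "net.ipv4.tcp_keepalive_time") {
    return reading.value; // Linux reports seconds
  }
  throw new Error(`Unknown keepalive key: ${reading.key}`);
}

const reported = keepaliveSeconds(parseSysctlLine("net.inet.tcp.keepidle: 7200000"));
// reported is 7200 (two hours)
```
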

strong iron
#

It would seem that way, though the time in both cases is longer than the duration of your testing period

#

so i'm not sure it's this

#

I'm curious if this is a pooling issue perhaps
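
If pooling is worth ruling out, the MongoDB driver exposes pool and timeout options (`maxPoolSize`, `minPoolSize`, `maxIdleTimeMS`, `serverSelectionTimeoutMS`, `socketTimeoutMS`). A rough sketch of a settings object with a sanity check follows; the numbers are illustrative assumptions, not recommendations, and the idea that they would be passed via the Mongoose adapter's `connectOptions` assumes a Payload 2.x setup:

```typescript
// Option names are real MongoDB driver options; the values are placeholders.
interface MongoPoolOptions {
  maxPoolSize: number;
  minPoolSize: number;
  maxIdleTimeMS: number;
  serverSelectionTimeoutMS: number;
  socketTimeoutMS: number;
}

const poolOptions: MongoPoolOptions = {
  maxPoolSize: 20, // upper bound on open connections per process
  minPoolSize: 2, // keep a couple of connections warm
  maxIdleTimeMS: 60000, // close idle connections before a network hop drops them
  serverSelectionTimeoutMS: 10000, // fail fast when no server is reachable
  socketTimeoutMS: 45000, // give up on sockets that stop responding
};

// Hypothetical helper: returns a list of obviously inconsistent settings.
function checkPoolOptions(options: MongoPoolOptions): string[] {
  const problems: string[] = [];
  if (options.minPoolSize > options.maxPoolSize) {
    problems.push("minPoolSize exceeds maxPoolSize");
  }
  if (options.maxPoolSize < 1) {
    problems.push("maxPoolSize must be at least 1");
  }
  if (options.serverSelectionTimeoutMS <= 0 || options.socketTimeoutMS <= 0) {
    problems.push("timeouts must be positive");
  }
  return problems;
}

const poolProblems = checkPoolOptions(poolOptions);
```

Changing one value at a time and watching whether the `MongoNetworkError` frequency moves would help tell a pooling problem apart from a network one.
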

trim spear
#

thank you for the guidance. we're going to keep troubleshooting.

one thing i did notice when using the API:

https://x.payloadcms.app/api/products/6629c476ace2ebaaa45428dc?locale=undefined&draft=true&depth=1

returns the error:

Too many requests, please try again later.
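
That message indicates the API is rate limiting the client (HTTP 429) rather than being offline. On the client side, one common mitigation is to retry with exponential backoff instead of immediately re-requesting. A minimal sketch, assuming the caller supplies its own fetch and sleep functions (all names here are hypothetical, and whether the server sends a `Retry-After` header is also an assumption):

```typescript
// Minimal response shape so the sketch is not tied to a specific fetch implementation.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Delay before retry number `attempt` (0-based). Uses Retry-After (in seconds)
// when it is present and numeric, otherwise doubles the base delay each attempt.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs: number = 500,
  maxMs: number = 30000,
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Repeats the request while the server answers 429, up to maxRetries retries.
async function fetchWithBackoff<T extends MinimalResponse>(
  doFetch: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxRetries: number = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await doFetch();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    await sleep(retryDelayMs(attempt, response.headers.get("Retry-After")));
  }
}
```

Backoff only smooths over bursts; if the limit is hit constantly, the request volume or the server's limit itself needs a look.
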

pale bolt
#

I don't see any mongo connection errors in your logs, though, which is odd. Is "back down" the same issue as before?

trim spear
#

yeah, same issue as before.

we have a lot of relationship fields throughout our collections. i'm wondering if they're just not optimized well & getting rate limited
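
One way to test that theory: relationship population is controlled per request by the `depth` query parameter, and `depth=0` returns related documents as IDs only. The URL above also carries a literal `locale=undefined`, which suggests an unset variable is being stringified into the query. A small sketch of a query builder that drops unset values (the helper name and the `locale` variable are hypothetical):

```typescript
type QueryValue = string | number | boolean | undefined | null;

// Builds a query string, skipping keys whose value is undefined or null
// so they never appear as the literal text "undefined".
function buildQuery(params: Record<string, QueryValue>): string {
  const parts: string[] = [];
  for (const key of Object.keys(params)) {
    const value = params[key];
    if (value === undefined || value === null) {
      continue;
    }
    parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  return parts.length > 0 ? `?${parts.join("&")}` : "";
}

// Hypothetical unset locale, as the URL above seems to have had.
const locale: string | undefined = undefined;
const query = buildQuery({ locale, draft: true, depth: 0 });
// query is "?draft=true&depth=0"
```

Comparing response times and error rates at `depth=0` against `depth=1` would show whether the relationship fields are a meaningful part of the load.
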

modest stag
#

yesterday we had over 8 hours of outage, and there are no indications as to why. are we the only ones that have reached out about this?

pale bolt
#

Just the ones in this thread.
