Is it expected that Payload goes offline a lot?

modest stag May 1, 2024, 10:14 PM

#

Hi, we're building an application that relies heavily on hosted Payload Cloud. We've noticed intermittent outages pretty regularly, and it's affecting our users. Is it just us, or is this service-wide?

pale bolt May 2, 2024, 12:19 AM

#

The service is hosted by DigitalOcean, and we have not been notified of any outages - so this typically means it's your application code.

I can provide some insight if you provide your project ID from Settings -> Billing

trim spear May 2, 2024, 12:25 PM

#

hi @pale bolt , jumping in here. our project id is:
65c420e015aac56684a56746

#

as @modest stag said, we've been experiencing intermittent outages typically lasting 5-10 mins. however, for the last 24 hours we've been down almost constantly: when it kicks back on we get about 30 seconds of uptime, then it goes offline again immediately

pale bolt May 2, 2024, 1:14 PM

#

trim spear hi @pale bolt , jumping in here. our project id is: 65c420e015aac5668...

Any indication of issues in the logs?

#

I'd be curious if triggering a redeploy from the dashboard would cause any change in behavior.

strong iron May 2, 2024, 1:38 PM

#

I host on DO, never had an issue (not saying that it's not possible)

trim spear May 2, 2024, 1:55 PM

#

pale bolt Any indication of issues in the logs?

thanks for the responses.

i redeployed and we were back up for all of about 3 mins, then back down.

@pale bolt the only error i found in the logs was this:
(payload): MongoNetworkError: connection xxxx to xx.xxx.xxx.xxx:xxxx closed

upon searching for the issue, i found you opened this issue a while back:
https://github.com/payloadcms/payload/issues/3180

curious if the cause was ever identified?

GitHub

Intermittent MongoNetworkError when running on MongoDB Atlas Server...

Link to reproduction . To Reproduce Run payload attached to MongoDB Atlas Serverless Run load test to try and get the error. Describe the Bug One user has reported getting a MongoNetworkError inter...

strong iron May 2, 2024, 1:58 PM

#

@trim spear I'm looking at possible causes, I see one post from the mongo community that may be relevant

If you experience network timeouts or socket errors in communication between clients and servers, or between members of a sharded cluster or replica set, check the TCP keepalive value for the affected systems.

#

try running sysctl net.ipv4.tcp_keepalive_time to check

#

https://www.mongodb.com/docs/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments-

FAQ: MongoDB Diagnostics

#

Though, it seems unlikely to me that the gaps between the requests Payload makes to Atlas would exceed the default keepalive time

trim spear May 2, 2024, 2:25 PM

#

we're getting back net.inet.tcp.keepidle: 7200000

per that highlighted line, it would seem the value of 7200000 is being ignored?

Screenshot_2024-05-02_at_10.24.29_AM.png
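
A note on reading that value: the net.inet. key name is the macOS/BSD form, so this output most likely describes the machine the command was run on rather than the Payload Cloud host. macOS reports the setting in milliseconds, while Linux reports net.ipv4.tcp_keepalive_time in seconds, so 7200000 ms and the Linux default of 7200 s are the same two hours. Below is a minimal sketch for normalizing either output and comparing it with the roughly 120-second keepalive the MongoDB FAQ suggests. That figure is my recollection of the FAQ and should be checked against the linked page; parseKeepalive and exceedsSuggestion are hypothetical helper names.

```typescript
// Normalize `sysctl` keepalive output to seconds.
// Linux reports seconds; macOS (net.inet.*) reports milliseconds.
type Keepalive = { key: string; seconds: number };

function parseKeepalive(line: string): Keepalive {
  // Accepts "key: value" (macOS style) and "key = value" (Linux style).
  const match = line.trim().match(/^([\w.]+)\s*[:=]\s*(\d+)$/);
  if (!match) {
    throw new Error(`Unrecognized sysctl output: ${line}`);
  }
  const key = match[1];
  const value = Number(match[2]);
  const isMac = key.startsWith("net.inet.");
  return { key, seconds: isMac ? value / 1000 : value };
}

// Assumption: the MongoDB FAQ suggests keepalive on the order of 120 seconds.
const SUGGESTED_MAX_SECONDS = 120;

function exceedsSuggestion(k: Keepalive): boolean {
  return k.seconds > SUGGESTED_MAX_SECONDS;
}

const reported = parseKeepalive("net.inet.tcp.keepidle: 7200000");
console.log(reported.seconds); // 7200
console.log(exceedsSuggestion(reported)); // true
```

To learn what the deployed app actually uses, the same check would have to run inside the hosting environment, which may not be possible on a managed platform.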

strong iron May 2, 2024, 2:38 PM

#

It would seem that way, though the time in both cases is longer than the duration of your testing period

#

so i'm not sure it's this

#

I'm curious if this is a pooling issue perhaps
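
If pooling is the suspect, one thing to try is making the pool settings explicit instead of relying on defaults. The sketch below only assumes the MongoDB Node.js driver option names (maxPoolSize, minPoolSize, maxIdleTimeMS, serverSelectionTimeoutMS, socketTimeoutMS). The values are illustrative guesses, not recommendations from this thread, and PoolOptions and validatePoolOptions are hypothetical names. How the object gets passed to Payload (I believe through the database adapter's connection options) should be confirmed in the docs for your Payload version.

```typescript
// Option names follow the MongoDB Node.js driver; values are illustrative only.
interface PoolOptions {
  maxPoolSize: number;
  minPoolSize: number;
  maxIdleTimeMS: number;
  serverSelectionTimeoutMS: number;
  socketTimeoutMS: number;
}

const connectOptions: PoolOptions = {
  maxPoolSize: 10, // upper bound on open connections per process
  minPoolSize: 1, // keep at least one connection warm
  maxIdleTimeMS: 60000, // retire idle connections before the network drops them
  serverSelectionTimeoutMS: 10000, // fail fast when no server is reachable
  socketTimeoutMS: 45000, // give up on sockets that stop responding
};

// Basic sanity checks before handing the options to the driver.
function validatePoolOptions(o: PoolOptions): string[] {
  const problems: string[] = [];
  if (o.minPoolSize > o.maxPoolSize) {
    problems.push("minPoolSize exceeds maxPoolSize");
  }
  for (const [name, value] of Object.entries(o)) {
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${name} must be a non-negative integer`);
    }
  }
  return problems;
}

console.log(validatePoolOptions(connectOptions)); // []
```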

trim spear May 2, 2024, 3:31 PM

#

thank you for the guidance. we're going to keep troubleshooting.

one thing i did notice when using the API:

https://x.payloadcms.app/api/products/6629c476ace2ebaaa45428dc?locale=undefined&draft=true&depth=1

returns the error:

Too many requests, please try again later.
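
That message is a rate-limit response, so any client that polls or retries aggressively can keep itself locked out. A minimal sketch of backing off on HTTP 429, assuming the response uses status 429 and may or may not carry a Retry-After header in seconds (neither is confirmed in this thread). backoffDelayMs and fetchWithRetry are hypothetical helpers, not part of Payload.

```typescript
// Decide how long to wait before retrying a rate-limited request.
// Prefers a numeric Retry-After header (seconds); otherwise doubles the delay per attempt.
function backoffDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs = 500,
  capMs = 30000
): number {
  const hasHeader = retryAfter !== null && retryAfter.trim() !== "";
  const seconds = hasHeader ? Number(retryAfter) : NaN;
  if (Number.isFinite(seconds) && seconds >= 0) {
    return Math.min(seconds * 1000, capMs);
  }
  // Header missing or not numeric (for example an HTTP date): exponential backoff.
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Retry only on 429, up to maxAttempts total requests.
async function fetchWithRetry(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429 || attempt + 1 >= maxAttempts) {
      return res;
    }
    const delay = backoffDelayMs(attempt, res.headers.get("retry-after"));
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Backing off only reduces pressure from the client side; it does not explain why the limit is being hit in the first place.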

pale bolt May 2, 2024, 3:33 PM

#

trim spear thanks for the responses. i redeployed and we were back up for all of about 3 m...

A root cause was not found, and we were unable to reproduce it.

#

I don't see any mongo connection errors in your logs, though, which is odd. Is "back down" the same issue as before?

trim spear May 2, 2024, 4:09 PM

#

yeah, same issue as before.

we have a lot of relationship fields throughout our collections. i'm wondering if they're just not optimized well & getting rate limited
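
Two things in the request URL quoted earlier are cheap to test: it sends locale=undefined as a literal string, and it asks for depth=1, which populates one level of relationships. A minimal sketch of a URL builder that drops unset parameters and requests depth=0 instead. buildApiUrl is a hypothetical helper, and whether relationship depth has anything to do with the outage is only a guess. Note that a lower depth shrinks each response but does not reduce the number of requests counted against a rate limit.

```typescript
type QueryValue = string | number | boolean | undefined | null;

// Build a REST URL, skipping parameters that are undefined or null
// so they are never serialized as the strings "undefined" or "null".
function buildApiUrl(
  base: string,
  path: string,
  params: Record<string, QueryValue>
): string {
  const url = new URL(path, base);
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined || value === null) continue;
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const productUrl = buildApiUrl(
  "https://x.payloadcms.app",
  "/api/products/6629c476ace2ebaaa45428dc",
  { locale: undefined, draft: true, depth: 0 }
);
console.log(productUrl);
// https://x.payloadcms.app/api/products/6629c476ace2ebaaa45428dc?draft=true&depth=0
```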

modest stag May 2, 2024, 5:47 PM

#

yesterday we had over 8 hours of outage, there are no indications as to why? are we the only ones that have reached out about this?

pale bolt May 2, 2024, 9:14 PM

#

Just the ones in this thread.

pale bolt May 2, 2024, 9:14 PM

#

modest stag yesterday we had over 8 hours of outage, there are no indications as to why? are...

Can you give more info on what you're seeing? 504s? Slow requests? Are your front-end and back-end both hosted on Payload Cloud?
