#Very slow response time


crimson fossil
#

I've had a website hosted on Railway for the past month or so. Load times have occasionally run a bit slow (up to 5 seconds) but are usually within 1-2 seconds, which is fine.
Today has been a totally different story... consistently taking 6+ seconds for the code to run (I have a run time print statement built in the code) and often the site takes far more than that to load. On top of that, sometimes the site has just gone unresponsive for several minutes at a time, and in the logs errors such as the following print:

[2023-01-25 01:22:02 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:10)
[2023-01-25 01:22:02 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-01-25 01:22:02 +0000] [288] [INFO] Booting worker with pid: 288
[2023-01-25 01:22:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:288)
[2023-01-25 01:22:33 +0000] [288] [INFO] Worker exiting (pid: 288)
[2023-01-25 01:22:33 +0000] [320] [INFO] Booting worker with pid: 320

When I run the site locally I encounter no issues and normal response times. What could be going on here?
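
The `WORKER TIMEOUT` lines above look like Gunicorn's master process killing a worker that stayed silent past its timeout. The second kill comes about 31 seconds after that worker booted, which is consistent with Gunicorn's default `timeout` of 30 seconds. As a stopgap while the slow request is tracked down, a config file along these lines might buy some headroom. This is only a sketch that assumes the app is served by Gunicorn; the values are illustrative guesses, not tuned for this project.

```python
# gunicorn.conf.py (hypothetical values, assuming Gunicorn serves the app)

# Seconds a worker may stay silent before the master kills and restarts it.
# Gunicorn's default is 30.
timeout = 60

# Number of worker processes, so one slow request does not block every visitor.
workers = 2

# Seconds to let in-flight requests finish during a restart (default is 30).
graceful_timeout = 30
```

Raising `timeout` only hides the symptom. A request that runs this long still needs to be made faster.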

olive breachBOT
#

Project ID: 52b32c2c-e65f-443a-b59c-ba6a9bccf24c

#

Samy mentioned that responses from the servers were always < 1 second, but now they take more than 2 seconds and sometimes even more than 5 seconds. He did a rollback to a month ago and it's still slow, so it's possible that something happened with the servers last week.

crimson fossil
#

52b32c2c-e65f-443a-b59c-ba6a9bccf24c

grand marsh
#

likely just database latency combined with Django's poor optimization of database queries. what kind of queries are you executing to cause this kind of timeout?

#

in most cases the latency to your local database is well under a millisecond, whereas the latency from a railway deployment to its database is more like 5-10 milliseconds

so each query takes much longer, and code that issues many unoptimized queries one after another gets dramatically slower as the latency increases, since every round trip pays it again
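
As a rough model of the point above, the time spent waiting on the network is the number of sequential round trips multiplied by the latency of each one. The sketch below uses made-up query counts and takes 10 ms as the upper end of the range quoted above; `round_trip_cost_ms` is a hypothetical helper.

```python
def round_trip_cost_ms(sequential_queries: int, latency_ms: float) -> float:
    """Time spent purely waiting on the network for queries issued one at a time."""
    return sequential_queries * latency_ms

# Made-up query counts, each paying 10 ms of latency.
for queries in (4, 50, 500):
    cost = round_trip_cost_ms(queries, 10)
    print(f"{queries:>3} queries at 10 ms each: {cost:.0f} ms")
```

On this model a handful of queries adds only tens of milliseconds, so multi-second delays need either hundreds of sequential queries or a different cause.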

crimson fossil
#

The only db-related thing I have is a redis instance, and each website load makes max 4 calls to it. The only other calls are to weather apis which occur when the called location is not in the redis cache. Also I'm using Flask.
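
The flow described here is a cache-aside lookup: check Redis first and only call the weather API on a miss. A minimal sketch, assuming a client with redis-py style `get(key)` and `set(key, value, ex=seconds)` methods. `get_forecast`, `fetch_forecast`, the key format, and the 15 minute expiry are all hypothetical and not taken from the actual app.

```python
import json
from typing import Any, Callable


def get_forecast(cache: Any, location: str,
                 fetch_forecast: Callable[[str], dict],
                 ttl_seconds: int = 900) -> dict:
    """Return the forecast for `location`, calling the API only on a cache miss."""
    key = f"forecast:{location}"            # hypothetical key format
    cached = cache.get(key)                 # one round trip to Redis
    if cached is not None:
        return json.loads(cached)
    forecast = fetch_forecast(location)     # slow path: external weather API
    cache.set(key, json.dumps(forecast), ex=ttl_seconds)
    return forecast
```

With this shape, a cached location costs one Redis round trip, and only an uncached location pays for the external API call.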

crimson fossil
#

Any ideas? This has been an on-and-off issue today. Is it possible if I upgrade to the teams plan it will mitigate the issue?

#

Also this issue began before that, but I launched a little promo yesterday and it pushed CPU to 124% (?). Is that also a reason to upgrade?

quick ginkgo
#

Where is your db located? Railway services are all on US-West.

crimson fossil
#

I created the Redis instance within the project, so it's linked within the project environment

quick ginkgo
#

Gotcha

#

Have you tested how fast the weather API responds and if it ever hangs? That could easily be the issue

#

Could be getting rate limited

karmic panther
#

Post it here and I can check the mem and metrics.

crimson fossil
#

I've just thrown in some timing print statements throughout the code, so I'm gonna try to get to the bottom of it that way
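
One way to do this without scattering manual prints is a small context manager around each stage. A minimal sketch using only the standard library; `timed` and the stage names are hypothetical, and the bodies are stand-ins for the real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record how long the wrapped block takes, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
        print(f"{label}: {results[label]:.3f}s")


timings = {}
with timed("weather_api", timings):
    time.sleep(0.05)                        # stand-in for the API call
with timed("formula", timings):
    sum(i * i for i in range(100_000))      # stand-in for the formula
```

Comparing the per-stage numbers locally and on the deployment shows which stage accounts for the difference.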

#

oh wait what do you mean by link

karmic panther
#

Yep!

#

This is it

crimson fossil
#

Ok I’ve tested it out and it definitely has to do with the formula and not the API calls.

#

Basically the formula works by taking the hourly forecast and tweaking the conditions to create combinations of possible actual hour-by-hour snow cover outcomes. For example, it takes the hour-by-hour temperature array and generates 7 arrays with all temps -3, -2, -1, 0, +1, +2, +3. It then combines each array with 7 different timings. This happens with 2 more variables to create 7 × 7 × 7 × 8, or 2,744 snow cover arrays.
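
The count can be checked by enumerating the combinations directly. This is a sketch: only the temperature offsets and the 7 timings come from the description above. The other two variables are not named in the thread, so `third_variable`, `fourth_variable`, and the sample temperatures are placeholders.

```python
from itertools import product

temp_offsets = range(-3, 4)     # -3 ... +3, 7 values
timings = range(7)              # 7 timing variants
third_variable = range(7)       # placeholder, 7 values
fourth_variable = range(8)      # placeholder, 8 values

scenarios = list(product(temp_offsets, timings, third_variable, fourth_variable))
print(len(scenarios))           # 2744

# Applying the temperature offsets to a made-up hourly forecast:
hourly_temps = [-1.0, 0.5, 2.0, 1.0]
shifted = [[t + offset for t in hourly_temps] for offset in temp_offsets]
print(len(shifted))             # 7
```

Because every variable multiplies the total, dropping one 7-value variable cuts the work to a seventh.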

#

The code is also set to only go through all 4 stages under certain weather forecast conditions which is why only some locations have this issue. Without one of the variables, the number of arrays stays in the low-mid hundreds or less with a max processing time of about 3 seconds.

#

These are numpy arrays and each is a total length of ~30.

#

My computer is able to process the code in full in less than 1.5 seconds, while it's taking about 7 seconds on Railway. My computer has 32GB of RAM while I see Railway goes up to 8GB, so I’m assuming what’s needed is more processing power.

quick ginkgo
#

Your memory metrics aren’t reaching anywhere close to that amount

grand marsh
#

ideally you do everything database related in 1 go, before you start post-processing data in memory

quick ginkgo
#

I wouldn’t recommend an upgrade to the teams plan here. The CPU power will be the same, just higher capacity. Same with memory. Have you tried multithreading?

#

Judging by how you described the process, this sounds like something you could multithread pretty easily

#

Doesn't look like you're doing it atm judging by your metrics

grand marsh
#

yes, i believe multiprocessing is the way to go here. multithreading would be better if this was I/O bound, but since this sounds like number crunching, multiprocessing is probably the move
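
A minimal sketch of that suggestion, assuming each scenario can be evaluated independently of the others. `simulate_one` is a toy stand-in for the real snow cover formula, and all names and sample temperatures are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

HOURLY_TEMPS = [-2.0, -1.0, 0.5, 1.5, 0.0, -0.5]    # made-up forecast


def simulate_one(scenario):
    """Toy CPU-bound stand-in: count hours at or below freezing for one scenario."""
    temp_offset, timing_shift = scenario
    shifted = HOURLY_TEMPS[timing_shift:] + HOURLY_TEMPS[:timing_shift]
    return sum(1 for t in shifted if t + temp_offset <= 0.0)


def run_serial(scenarios):
    """Baseline: evaluate every scenario in the current process."""
    return [simulate_one(s) for s in scenarios]


def run_parallel(scenarios, workers=None, chunksize=64):
    """Spread scenarios across worker processes; results keep input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_one, scenarios, chunksize=chunksize))


SCENARIOS = list(product(range(-3, 4), range(6)))   # 7 offsets x 6 shifts
```

Call `run_parallel(SCENARIOS)` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely. Each task here is tiny, so a generous `chunksize` matters: process start-up and pickling can otherwise cost more than the work itself, and any gain is capped by the number of CPU cores the container actually gets.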

crimson fossil
#

Good idea I'm gonna try that and I'll update, thanks

quick ginkgo
#

There’s a difference between multiprocessing and multithreading??

#

did not know that lol

grand marsh
#

multiprocessing is great for cpu intensive stuff
multithreading is great for IO stuff like requests or db queries
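
For the I/O side of that split, a thread pool lets several slow calls wait at the same time. A minimal sketch: `fetch_all` is a hypothetical helper, and `fake_fetch` simulates a network call with `time.sleep`, since the real weather API calls are not shown in the thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, locations, max_workers=8):
    """Run `fetch(location)` for every location concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, locations))


def fake_fetch(location):
    time.sleep(0.1)                     # stand-in for network latency
    return f"forecast for {location}"


start = time.perf_counter()
results = fetch_all(fake_fetch, ["denver", "tahoe", "stowe"])
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")       # the three sleeps overlap
```

Threads help here because the time is spent waiting, not computing. For the number crunching described earlier, threads in CPython would mostly take turns on one core, which is why separate processes are the usual choice there.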

#

watched a very handy video on it but those are the main takeaways