#Expected latency on railway network (both on metal and off)?

rapid verge
#

hello I have a simple service that responds very quickly (~1ms, definitely less than 10ms). Almost all of my traces on Sentry appear as 0.00ms. It is a simple Rust axum API. It can run a test with 2,000 concurrent requests in about 200ms, and that is with very small time.sleeps in the test case mentioned. However, when I deploy, my response times have been anywhere between 300-700ms. Is there something I can do to speed this up?

some info:

  • it runs in a docker container
  • it is hosted in US East on railway metal
  • my ping to US East is about 60ms
  • i am using the default railway provided domain
  • the average server load is extremely low (0.0vCPU and 5-8 MB ram)

If there is any more information needed please let me know. Any help would be greatly appreciated.

Thanks!
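
A rough back-of-the-envelope check on the numbers in the post above, assuming each request opens a new HTTPS connection: a fresh connection costs roughly one round trip for the TCP handshake, one (TLS 1.3) or two (TLS 1.2) for the TLS handshake, and one more for the HTTP request itself. The function name and parameters below are illustrative, and the sketch ignores DNS lookups, proxy hops, and server queueing.

```python
def estimate_request_ms(rtt_ms, server_ms, tls_round_trips=1, reuse_connection=False):
    """Estimate client-observed latency for one HTTPS request.

    Assumptions (not measured on Railway): 1 round trip for the TCP
    handshake, `tls_round_trips` for TLS (1 for TLS 1.3, 2 for TLS 1.2),
    and 1 round trip for the HTTP request and response.
    """
    handshake_trips = 0 if reuse_connection else 1 + tls_round_trips
    return rtt_ms * (handshake_trips + 1) + server_ms


# Figures from the post: ~60 ms ping to US East, ~1 ms handler time.
fresh_tls13 = estimate_request_ms(60, 1, tls_round_trips=1)
fresh_tls12 = estimate_request_ms(60, 1, tls_round_trips=2)
keep_alive = estimate_request_ms(60, 1, reuse_connection=True)

print(f"fresh connection, TLS 1.3: ~{fresh_tls13} ms")
print(f"fresh connection, TLS 1.2: ~{fresh_tls12} ms")
print(f"reused connection:         ~{keep_alive} ms")
```

Under these assumptions, a 60 ms ping alone would account for roughly 180-240 ms of every fresh request, so only the remainder of a 300-700 ms response would need another explanation.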

wild pathBOT
#

Project ID: N/A

novel fossil
#

When you're testing the app and getting <1ms response times, are you running it locally? If so, those response times should not be expected when running on a cloud provider such as Railway.

I'll need some more info to help you out here. First, please send your project ID so a member of the team can view your project if needed.

#

Where are you located?

#

Does your service communicate with any other services/databases?

rapid verge
#

thank you for the timely response!

novel fossil
#

Could you please try swapping your service off of metal?

#

Any region is fine.

rapid verge
#

sure thing, is it alright if i do it with a duplicate service? i don't want to muddy the waters, but i also want to keep this service running. if i have to i will

novel fossil
#

Do what you gotta do! That's not a problem

rapid verge
#

ok moved it to US west no metal. just running with a quick script

import time

import requests

# base_url and headers are defined earlier in the script (not shown)
num_requests = 10

# Measure each request's wall-clock time and average over the successful ones
total_time = 0.0
successes = 0
for i in range(num_requests):
    start_time = time.perf_counter()
    try:
        response = requests.get(base_url, headers=headers, timeout=70)
        response_time = time.perf_counter() - start_time
        print(f"Request {i + 1}: Took {response_time:.2f} seconds")
        total_time += response_time
        successes += 1
    except requests.exceptions.RequestException as e:
        print(f"Request {i + 1}: Failed with exception: {e}")

# Failed requests are excluded so they do not drag the average toward zero
if successes > 0:
    average_time = total_time / successes
    print(f"\nAverage time for {successes} requests: {average_time:.2f} seconds")

gets me

Request 1: Took 0.19 seconds
Request 2: Took 0.17 seconds
Request 3: Took 0.17 seconds
Request 4: Took 0.17 seconds
Request 5: Took 0.17 seconds
Request 6: Took 0.18 seconds
Request 7: Took 0.17 seconds
Request 8: Took 0.17 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.17 seconds

Average time for 10 requests: 0.17 seconds

so it is looking better for sure. maybe i should have done US East non metal since that is where the other deployments are
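
One thing worth ruling out with the script above: `requests.get` opens a new connection on every call, so each timing includes the TCP and TLS handshakes as well as the request itself. A minimal sketch of a comparison, assuming the same `base_url` and `headers` as in the script; `time_calls` and `compare_fresh_vs_reused` are hypothetical helper names, and `requests` is imported lazily so the timing helper can be used on its own.

```python
import time


def time_calls(send, n):
    """Call `send()` n times and return each call's duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        send()
        durations.append(time.perf_counter() - start)
    return durations


def compare_fresh_vs_reused(base_url, headers=None, n=10):
    """Time n requests with a new connection each vs. one pooled connection."""
    import requests  # third-party; imported here so time_calls has no dependency

    # requests.get() builds a throwaway session, so every call reconnects.
    fresh = time_calls(
        lambda: requests.get(base_url, headers=headers, timeout=70), n
    )

    # A Session keeps the connection alive between calls when the server allows it.
    with requests.Session() as session:
        reused = time_calls(
            lambda: session.get(base_url, headers=headers, timeout=70), n
        )

    for label, durations in (("fresh", fresh), ("reused", reused)):
        for i, seconds in enumerate(durations, start=1):
            print(f"{label} request {i}: took {seconds:.2f} seconds")
    return fresh, reused
```

The first "reused" request still pays for the handshake. If the later ones drop to roughly the ping time, most of the measured latency is likely connection setup rather than time spent on the hosting side.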

#

also here are some railway http logs for the old deployment

#

and the new one (wow)

rapid verge
#

on US East non metal VVVVVV:

```
Request 2: Took 0.29 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.29 seconds
Request 5: Took 0.23 seconds
Request 6: Took 0.22 seconds
Request 7: Took 0.29 seconds
Request 8: Took 0.22 seconds
Request 9: Took 0.29 seconds
Request 10: Took 0.22 seconds

Average time for 10 requests: 0.26 seconds
```

#

US East non metal (much larger than us west hmmm)

rapid verge
#

what do the response times in the http logs measure exactly?

novel fossil
#

Hmm not sure. I'm going to have to tag in the team here as there may be bugs I'm not aware of with metal

#

!t

hollow lodgeBOT
#

New reply sent from Help Station thread:

⬆️ This thread has been escalated to the Railway team.

You're seeing this because this thread has been automatically linked to the Help Station thread.

hollow ridge
#

it can run a test with 2,000 concurrent requests in about 200ms

where is this number from? your local computer?

do you have the metal edge enabled on the service you have deployed to metal?

hollow ridge
#

what computer do you have? It is very possible it has higher per-core performance than the CPUs we use on Railway, as we went for higher price efficiency for our customers

rapid verge
#

M4 pro 48GB ram. The endpoint just does one to a few hashmap lookups and returns. According to sentry it is still very fast on railway too (0.33ms duration avg)

hollow ridge
#

Yep, our CPUs on Metal do not have the same single-core performance as the M4, so it's just not a fair comparison. We chose 5th Gen Xeons for the cost efficiency that we pass on to our customers

rapid verge
#

That makes sense! I don't expect the same performance, again the 2000 burst requests is just an example to illustrate that the endpoint doesn't take a lot of time. The hosted service doesn't even get near that amount of traffic, hence the 0.0 vCPU avg usage on railway with 32 allocated cores. The main thing I am wanting to get to the bottom of is the expected network latency. Sentry is reporting that the request duration is indeed very low, but like I mentioned the east metal deployment was yielding between 300ms to 700ms response times, which I doubt would be caused by a 5th Gen Xeon doing a hashmap lookup.

hollow ridge
#

haha our routing had to be far more complex than a hashmap at our scale, but yes it's not the cause here.

Do you also have the Metal Edge enabled?

hollow ridge
#

Sentry is reporting function execution time. The times in the HTTP logs record the round-trip time from when your request hits our edge network to when your application finishes the request.
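
To make the distinction concrete, here is a small sketch of how the three measurements nest, assuming the client-side script time contains the edge-logged time, which in turn contains the Sentry handler time. The class and field names are made up for illustration, and the 5 ms edge figure is a placeholder, not a value from this thread.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    client_ms: float   # measured by the client script (full round trip)
    edge_ms: float     # HTTP log time: from hitting the edge until the app finishes
    handler_ms: float  # duration reported by Sentry (handler execution only)

    @property
    def client_to_edge_ms(self) -> float:
        """Time spent between the client and the edge network."""
        return self.client_ms - self.edge_ms

    @property
    def edge_to_app_ms(self) -> float:
        """Time spent between the edge and the app, excluding the handler."""
        return self.edge_ms - self.handler_ms


# client_ms is the low end of the 300-700 ms reported for the east metal
# deployment and handler_ms is the Sentry average; edge_ms is a placeholder.
sample = LatencyBreakdown(client_ms=300.0, edge_ms=5.0, handler_ms=0.33)
print(f"client <-> edge: {sample.client_to_edge_ms:.2f} ms")
print(f"edge <-> app:    {sample.edge_to_app_ms:.2f} ms")
print(f"handler:         {sample.handler_ms:.2f} ms")
```

Plugging in the real figure from the HTTP logs would show whether the time is going to the network path in front of the edge or to the hop behind it.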

#

So may I ask where you are located? Because if you aren't on the east coast, you will of course see increased latency due to distance

rapid verge
#

Utah, however our production environments are on the east coast which is why this is deployed there (however this is used by our dev environments too). I just thought the latency was unusually high so wanted to check in to see if this is expected for now or if there is anything I should do on my end

hollow ridge
#

please put the service on metal and switch on the edge network and test again

rapid verge
#

sounds good! I'll check back in a bit thanks!

#

Just to check, edge doesn't replicate your service, does it? This service is meant to be stateful (in-memory stateful) as a single global source of truth

rapid verge
#

aka it is serving as a glorified redis db haha, in other words there can only be a single instance of it for it to work/be useful

hollow ridge
#

The edge network does not replicate anything

rapid verge
#

ok thanks, it's back on east metal but I don't see where to enable edge

hollow ridge
#

hover your mouse over the domain

rapid verge
#

```
Request 2: Took 0.20 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.26 seconds
Request 5: Took 0.20 seconds
Request 6: Took 0.20 seconds
Request 7: Took 0.20 seconds
Request 8: Took 0.20 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.20 seconds

Average time for 10 requests: 0.23 seconds
```

#

wow that is much better, thank you!
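
For a side-by-side view, the timings posted in this thread can be summarized with a median, which is less sensitive to a slow first request than the mean. The lists below are copied from the messages above; the first request is missing from two of the runs, so those medians cover nine samples only. The variable names are illustrative.

```python
from statistics import median

# Per-request timings in seconds, as posted in the thread.
runs = {
    "US West, non-metal": [0.19, 0.17, 0.17, 0.17, 0.17, 0.18, 0.17, 0.17, 0.18, 0.17],
    # Request 1 was not included in the next two pastes.
    "US East, non-metal": [0.29, 0.28, 0.29, 0.23, 0.22, 0.29, 0.22, 0.29, 0.22],
    "US East, metal + edge": [0.20, 0.28, 0.26, 0.20, 0.20, 0.20, 0.20, 0.18, 0.20],
}

medians = {name: median(samples) for name, samples in runs.items()}
for name, value in medians.items():
    print(f"{name}: median {value:.2f} s over {len(runs[name])} requests")
```

With only ten requests per run, differences of a few hundredths of a second between regions should be read as indicative rather than conclusive.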

hollow ridge
#

no problem!

rapid verge
#

Re: "and the new one (wow)"

any idea how this one was doing so well? i know I'm closer but not THAT much closer. Could it have been a demand thing/low usage at that time?

hollow ridge
#

not sure tbh