V2 + App Sleep = first response always empty | Railway | Page 1

tropic grotto Jul 4, 2024, 6:10 AM

#

I recently deployed a new Bun API on V2 with App Sleep, and I've noticed that the first request to a sleeping app always returns an empty response. This hasn't happened on non-V2 Bun apps with App Sleep on.

The following are tests with curl using the same URL/endpoint.

Normal request (non-sleeping app, {"status": "OK"} is the response from my API):

HTTP/2 200
content-type: application/json;charset=utf-8
date: Thu, 04 Jul 2024 06:03:09 GMT
server: railway-edge
x-request-id: 56xHFuw3QlCDX-2Zclvhkw_3165824431
content-length: 15

{"status":"OK"}
real    0m0.269s
user    0m0.016s
sys     0m0.000s

First request on the same app but sleeping:

HTTP/2 200
server: railway-edge
x-request-id: Z4G6GgaAQziEbf20vOO_UQ_3165824431
content-length: 0
date: Thu, 04 Jul 2024 06:02:56 GMT


real    0m1.275s
user    0m0.000s
sys     0m0.000s

Project ID: 34304961-2ebf-4d0b-b2ae-3585cf6b9353

slender lodgeBOT Jul 4, 2024, 6:10 AM

#

Project ID: 34304961-2ebf-4d0b-b2ae-3585cf6b9353

dusky plaza Jul 4, 2024, 6:13 AM

#

can you also provide the same data for the same app running on the legacy runtime

tropic grotto Jul 4, 2024, 6:14 AM

#

you mean change the runtime for that app, right?

#

I tested a different app running on Legacy and the issue didn't happen

#

but I'll change the runtime

dusky plaza Jul 4, 2024, 6:15 AM

#

testing a different app is not conclusive, when testing you need to change only one variable at a time, a completely different app changes too many variables

tropic grotto Jul 4, 2024, 6:16 AM

#

it was another Bun app, but I can see the variables

#

I'm deploying the reported app on Legacy and have to wait for it to sleep 🙂

dusky plaza Jul 4, 2024, 6:16 AM

#

I'm not talking about environment variables

tropic grotto Jul 4, 2024, 6:17 AM

#

yeah I meant variables as not in environment variables but how the apps are different despite both being Bun apis

#

deploy is done, will report back in about 10 mins

tropic grotto Jul 4, 2024, 6:35 AM

#

Test done, request on sleeping app worked fine:

HTTP/2 200
content-type: application/json;charset=utf-8
date: Thu, 04 Jul 2024 06:34:47 GMT
server: railway-edge
x-request-id: gLwN8r6fSpSLChJuzIS30g_3165824431
content-length: 15

{"status":"OK"}
real    0m1.795s
user    0m0.000s
sys     0m0.016s

#

switching back to V2 to repeat the test

#

btw that cold boot time = 🏆

dusky plaza Jul 4, 2024, 6:38 AM

#

1.795s is good?

tropic grotto Jul 4, 2024, 6:38 AM

#

for a cold boot time? I'd say excellent

#

I have another Rails app running on Railway that cold-boots in about 10s, kinda bad

#

but that's mostly Rails to blame

dusky plaza Jul 4, 2024, 6:40 AM

#

similarly I have a feeling this is bun to blame

tropic grotto Jul 4, 2024, 6:40 AM

#

I'm running a compiled executable, so very likely

dusky plaza Jul 4, 2024, 6:40 AM

#

isn't that the recommended way to run in production though

tropic grotto Jul 4, 2024, 6:41 AM

#

it is, I'm following their guide on that

#

for comparison, this is normal request time

content-type: application/json;charset=utf-8
date: Thu, 04 Jul 2024 06:38:27 GMT
server: railway-edge
x-request-id: vAXlgIj9Rlq8binHs09mAQ_603524580
content-length: 15

{"status":"OK"}
real    0m0.221s
user    0m0.000s
sys     0m0.000s

#

the increased memory usage on V2 is a bit of a bummer though

dusky plaza Jul 4, 2024, 6:44 AM

#

how much of an increase.

tropic grotto Jul 4, 2024, 6:45 AM

#

Legacy was running ~36MB
V2 is running 53~61MB

dusky plaza Jul 4, 2024, 6:45 AM

#

is this the exact same app, or are you comparing different apps again lol

tropic grotto Jul 4, 2024, 6:45 AM

#

it's the same app

dusky plaza Jul 4, 2024, 6:46 AM

#

someone else reported higher memory usage on the v2 runtime, but I can't reproduce it just by purely allocating bytes

tropic grotto Jul 4, 2024, 6:49 AM

#

just by looking at its memory metrics
looking further back (the app has been up only for a couple hours) the lowest it got on V2 was 42MB, but I only had one run of it on Legacy, so probably needs more data

#

but the difference is quite noticeable, maybe not in the image because the chart ceiling is a bit too high

#

btw the app went to sleep again and curl returned an empty response

#

if it matters, the service has a volume

dusky plaza Jul 4, 2024, 6:52 AM

#

remove the volume and try again?

tropic grotto Jul 4, 2024, 6:52 AM

#

I'll try, that might break the app though

#

app broke, trying to fix it

#

alright it's back up, now waiting for sleep

#

Network graph also wild on Legacy

dusky plaza Jul 4, 2024, 6:58 AM

#

then it's a good thing the legacy runtime will be phased out

tropic grotto Jul 4, 2024, 7:03 AM

#

got empty response on sleeping app, so volume is not it

dusky plaza Jul 4, 2024, 7:04 AM

#

okay can you provide a minimal reproducible bun app that sends an empty response on the v2 runtime

tropic grotto Jul 4, 2024, 7:06 AM

#

I'll try

tropic grotto Jul 4, 2024, 7:32 AM

#

I deployed a smaller app, and could not replicate the issue

#

but here's the thing... the affected app also stops having the issue 👀

dusky plaza Jul 4, 2024, 7:33 AM

#

uh.. task failed successfully?

tropic grotto Jul 4, 2024, 7:33 AM

#

ugh, technology these days 😛

dusky plaza Jul 4, 2024, 7:33 AM

#

ugh, bun these days

tropic grotto Jul 4, 2024, 7:33 AM

#

now that I have both apps running, I'll try to replicate it again with the affected app

#

then try to replicate it with the smaller app

#

code btw: https://github.com/pauldps/bun-railway-v2-test

#

I'll try deploying to separate project in case of same project shenanigans

dusky plaza Jul 4, 2024, 7:39 AM

#

that's definitely minimal

tropic grotto Jul 4, 2024, 7:45 AM

#

I was able to reproduce the issue with the minimal app in a separate project

#

and the original app also started showing blank responses after I removed the minimal app from that project 👀

dusky plaza Jul 4, 2024, 7:49 AM

#

this is looking more like instabilities with bun

#

try the same code with node?

tropic grotto Jul 4, 2024, 8:09 AM

#

added a node branch to the minimal app and deployed it, now waiting for sleep

dusky plaza Jul 4, 2024, 8:11 AM

#

just a question, why do you have the healthcheck timeout set to a low value like 30 seconds?

tropic grotto Jul 4, 2024, 8:12 AM

#

because I want it to fail fast
usually if the first request fails, the deploy is likely busted, and I don't want to wait 5 minutes for the deployment to fail

dusky plaza Jul 4, 2024, 8:13 AM

#

makes sense

tropic grotto Jul 4, 2024, 8:13 AM

#

I think for Rails apps with slower boot times I set a higher value

#

got the empty response with the Node app as well

dusky plaza Jul 4, 2024, 8:17 AM

#

interesting

#

can you link the applicable deployment

tropic grotto Jul 4, 2024, 8:18 AM

#

two requests


$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: 3OBw1saOQ2WUAAJCR4VCGw_603524580
content-length: 0
date: Thu, 04 Jul 2024 08:17:05 GMT


real    0m1.322s
user    0m0.000s
sys     0m0.000s


$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json
date: Thu, 04 Jul 2024 08:17:29 GMT
server: railway-edge
x-request-id: 2SzMY7RyRVie9l27WpIy2Q_882434190
content-length: 19

{"status": "NODE"}

real    0m0.342s
user    0m0.000s
sys     0m0.000s

#

the deployment? or the project?

dusky plaza Jul 4, 2024, 8:19 AM

#

the deployment

tropic grotto Jul 4, 2024, 8:19 AM

#

oh, got it
7368d15e-ed13-4684-aab3-72e2b3bdaa74

dusky plaza Jul 4, 2024, 8:19 AM

#

full link please

tropic grotto Jul 4, 2024, 8:20 AM

#

the url in the browser when the deployment is open?

#

https://railway.app/project/46548220-e0ba-4a16-b80a-706a55133413/service/8da48915-af62-477c-b850-17671fee4e86?id=7368d15e-ed13-4684-aab3-72e2b3bdaa74

dusky plaza Jul 4, 2024, 8:26 AM

#

would it be too much to ask you to also do an express app?

tropic grotto Jul 4, 2024, 8:27 AM

#

lemme see if I can do it quickly, I never used express before lol

dusky plaza Jul 4, 2024, 8:28 AM

#

that's a crazy sentence, I had never imagined someone who uses bun and Elysia to say they've never used express

tropic grotto Jul 4, 2024, 8:31 AM

#

when express was a thing I was mostly working with Rails
when I moved to Node it was during a time where express was considered too slow compared to other libs, so I never touched it

#

express app is up

$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json; charset=utf-8
date: Thu, 04 Jul 2024 08:33:43 GMT
etag: W/"14-kjLmVQInBma0jJMTEoZwvPwAyY4"
server: railway-edge
x-powered-by: Express
x-request-id: F32SRDmRQFSyDPyKYfb06w_603524580
content-length: 20

{"status":"EXPRESS"}
real    0m0.388s
user    0m0.000s
sys     0m0.000s

waiting for sleep

#

code in the express branch

#

the express app responds correctly

dusky plaza Jul 4, 2024, 8:44 AM

#

you're still on the v2 runtime?

tropic grotto Jul 4, 2024, 8:45 AM

#

yes, I just changed the branch and nothing else

#

I wonder what's going on with Node's http server, which is probably what Bun servers are based off of

#

using express is not an option for me though

dusky plaza Jul 4, 2024, 8:52 AM

#

well that seems like this isn't an issue with railway then

tropic grotto Jul 4, 2024, 8:54 AM

#

hold on

#

got an empty response with express

#

I think my first test was too fast

#

$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: xSTDdCMbTrexjKz3i8FsOg_1654200396
content-length: 0
date: Thu, 04 Jul 2024 08:54:27 GMT


real    0m1.298s
user    0m0.000s
sys     0m0.000s


$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json; charset=utf-8
date: Thu, 04 Jul 2024 08:54:53 GMT
etag: W/"14-kjLmVQInBma0jJMTEoZwvPwAyY4"
server: railway-edge
x-powered-by: Express
x-request-id: 7r720LsRTEyZ8daihSwTQg_1654200396
content-length: 20

{"status":"EXPRESS"}
real    0m0.270s
user    0m0.000s
sys     0m0.000s

#

it seems like an issue with V2 to me

#

could potentially test with non-Javascript frameworks but that would be a bit too much for me to do atm

dusky plaza Jul 4, 2024, 8:58 AM

#

ill test with a go server

#

what happens if i dont experience the same issue?

tropic grotto Jul 4, 2024, 9:01 AM

#

I can test with Ruby later, but for now I need to go sleep myself lol

tropic grotto Jul 4, 2024, 9:03 AM

#

dusky plaza what happens if i dont experience the same issue?

good question, I have a theory, but want to test a slow language first

#

remember to deploy in a new project since it seems multiple services in a project can affect the results, I'd like to test more about that part too

dusky plaza Jul 4, 2024, 9:06 AM

#

i have indeed created a new project

tropic grotto Jul 4, 2024, 4:51 PM

#

I have deployed a Ruby/Sinatra app, and was not able to replicate the issue on the first cold boot. But I'm seeing a pattern in the logs that I want to investigate

#

these are the logs from the Express app. My first request did not trigger the problem, but my second did.

The second request was after the "container event container died" log entry that was absent from the first request. So I'm trying to get that log entry to show on the Sinatra app

#

the "Stopping Container" spam seems to indicate there's a problem somewhere with V2

#

was not able to replicate the issue with Ruby after 2 attempts. I'm going back to the main branch (Bun) to see if maybe the problem resolved itself

dusky plaza Jul 4, 2024, 5:05 PM

#

stopping container is it being put to sleep

tropic grotto Jul 4, 2024, 5:05 PM

#

does it show even if the app is already sleeping?

#

"Stopping Container" logs did not show up for the Ruby app 🤔

#

but it did go to sleep

#

(according to the dashboard)

dusky plaza Jul 4, 2024, 5:06 PM

#

maybe the ruby app is on the legacy runtime

tropic grotto Jul 4, 2024, 5:07 PM

#

I will doublecheck after I test one more time with the Bun branch

#

the problem is still there with the Bun app. The logs:

#

no "Stopping Container" tho

#

switching to the ruby branch for now to investigate more, made sure it's on V2

tropic grotto Jul 4, 2024, 6:18 PM

#

got to reproduce the issue with the Sinatra app. It was a little worse as two requests gave empty responses before the third one returned the correct response

$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: OPED8JR1TW6SpSjny6blUg_882434190
content-length: 0
date: Thu, 04 Jul 2024 18:16:25 GMT


real    0m1.567s
user    0m0.016s
sys     0m0.000s



$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: PbIYVlwJQ-qkE6uGJUtMsw_882434190
content-length: 0
date: Thu, 04 Jul 2024 18:16:29 GMT


real    0m0.211s
user    0m0.000s
sys     0m0.000s



$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json
date: Thu, 04 Jul 2024 18:16:35 GMT
server: railway-edge
server: WEBrick/1.8.1 (Ruby/3.2.4/2024-04-23)
x-content-type-options: nosniff
x-request-id: ZBY6eoJsS_qCym1oa23Yyg_3165824431
content-length: 20

{"status":"SINATRA"}
real    0m0.378s
user    0m0.000s
sys     0m0.000s

#

logs

#

(those are not errors btw, wtf Sinatra)

dusky plaza Jul 4, 2024, 6:20 PM

#

printed to stderr

#

I have requested my go app a few times when it has gone to sleep and was not able to get an empty response

tropic grotto Jul 4, 2024, 6:21 PM

#

were the logs like the above? I let my app sleep for about an hour or so before making requests

#

from observation the problem seems to be related to those "contained died" and "stopping container" errors

dusky plaza Jul 4, 2024, 6:23 PM

#

those are regular event logs, nothing to be concerned about

tropic grotto Jul 4, 2024, 6:23 PM

#

right, I meant logs, not errors 👍

dusky plaza Jul 4, 2024, 6:24 PM

#

yeah the container log stuff is perfectly normal

tropic grotto Jul 4, 2024, 6:26 PM

#

I do think they seem to indicate the container is going into a state where it fails to render responses on wakeup
so far V2 is the common denominator; I've changed projects and languages, and the problem doesn't happen on Legacy. What else could we try?

dusky plaza Jul 4, 2024, 6:28 PM

#

not sure, I'll report it to the team anyway

tropic grotto Jul 4, 2024, 6:28 PM

#

thanks, I'll keep the project up if the team wants to debug/investigate

fair geyser Jul 4, 2024, 6:35 PM

#

tropic grotto got to reproduce the issue with the Sinatra app. It was a little worse as two re...

So you can repro this on both bun and sinatra?

tropic grotto Jul 4, 2024, 6:37 PM

#

correct, also Node-http and express

fair geyser Jul 4, 2024, 6:50 PM

#

Ack and escalated

#

It should be triaged on Monday

dusky plaza Jul 8, 2024, 7:37 AM

#

fairly certain that the blank response should in fact be a 503 application failed to respond page, but railway is no longer sending that page at the moment due to what i believe to be a bug.

so lets assume your first response to a sleeping service is a 503 status code, meaning your app did not respond to the first request in time, that explains why a statically compiled go app did not exhibit this behavior.

when a request comes in for a slept app the container is started and a tcp connection attempt is done on a loop every 30ms, once that succeeds the request is forwarded to your app, but if your app is not ready to handle http traffic just yet you will get the 503 app failed to respond page, the apps health check is not taken into account.

theres definitely some room for improvement here on the railway side of things for waking sleeping services aside from fixing the blank page being sent instead of 503.

fair geyser Jul 8, 2024, 5:14 PM

#

dusky plaza fairly certain that the blank response should in fact be a 503 application faile...

Is this for the new proxy or?

dusky plaza Jul 8, 2024, 5:14 PM

#

yep all testing done with only the new proxy enabled

fair geyser Jul 8, 2024, 6:13 PM

#

dusky plaza yep all testing done with only the new proxy enabled

Great. Miguel merged a fix for this. Should be good to go

dusky plaza Jul 8, 2024, 6:14 PM

#

for clarity, the fix was for the blank response instead of the 503 application failed to respond page that should have been shown

tropic grotto Jul 8, 2024, 6:14 PM

#

I can handle the 503 response better than a blank page, can set my client to retry or something

#

(although it would be nice if the 503 didn't happen)

fair geyser Jul 8, 2024, 6:17 PM

#

dusky plaza for clarity, the fix was for the blank response instead of the 503 application f...

We now no longer return a 200

#

We should return a 500 as was the previous behavior

tropic grotto Jul 8, 2024, 6:19 PM

#

just ran a test now and got a 502 with a long HTML error page

fair geyser Jul 8, 2024, 6:20 PM

#

tropic grotto just ran a test now and got a 502 with a long HTML error page

That's correct ye?

tropic grotto Jul 8, 2024, 6:20 PM

#

$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 502
content-type: text/html
server: railway-edge
x-railway-fallback: true
x-request-id: Y6CCMwhzRo-9NqOjSW-vSw_882434190
content-length: 4689
date: Mon, 08 Jul 2024 18:19:26 GMT

... HTML ...

#

that's better than a 200 for sure

fair geyser Jul 8, 2024, 6:20 PM

#

What's "Best"

tropic grotto Jul 8, 2024, 6:20 PM

#

Best would be 200 with my app returning the correct response

fair geyser Jul 8, 2024, 6:20 PM

#

tropic grotto Best would be 200 with my app returning the correct response

Well yes

#

Is your app returning the correct response?

tropic grotto Jul 8, 2024, 6:21 PM

#

no, it's returning that huge HTML from Railway

fair geyser Jul 8, 2024, 6:21 PM

#

tropic grotto no, it's returning that huge HTML from Railway

No I mean like

tropic grotto Jul 8, 2024, 6:21 PM

#

that's just the first request though

fair geyser Jul 8, 2024, 6:21 PM

#

What happens is this intermittent is it always

tropic grotto Jul 8, 2024, 6:21 PM

#

the next requests work fine

#

it's always when the app is sleeping

#

"first request when app is sleeping"

fair geyser Jul 8, 2024, 6:23 PM

#

tropic grotto "first request when app is sleeping"

"First request when app sleeping results in 503"

#

Gotchu

#

Esclating again

#

This is only on the V2 runtime ye?

tropic grotto Jul 8, 2024, 6:23 PM

#

correct

#

does not happen on Legacy

#

I suspect it wasn't more widely noticed because it was returning a 200

fair geyser Jul 8, 2024, 6:24 PM

#

tropic grotto I suspect it wasn't more widely noticed because it was returning a 200

Yep I suspect so too. Bubbled up

severe lion Jul 8, 2024, 6:28 PM

#

@dusky plaza , is this application one of those app sleep healthcheck candidates ?

#

The fact the legacy proxy would work but new one does not makes me think the healthcheck feature we talked about wouldn't matter.

The new proxy is using the same timeouts as legacy and these edge proxies are not aware of application sleep logic. There must be a timeout setting that is different.

dusky plaza Jul 8, 2024, 6:30 PM

#

yep I feel like this could benefit from having the heath check checked on waking up the service

severe lion Jul 8, 2024, 6:37 PM

#

tropic grotto does not happen on Legacy

oh this is saying that this issue with app sleep waking only happens on v2 runtime but works on legacy runtime.

Correct ?

#

We just happen to be on the new proxy and saw those 200s (sorry about that...will work on ensuring issues that like that don't through)

#

I was able to reproduce and have a strong lead on the issue.

severe lion Jul 8, 2024, 9:42 PM

#

I'm continuing to work on this and will share updates here when I have some.

severe lion Jul 9, 2024, 6:22 PM

#

Hello, I'm still looking into this.

severe lion Jul 10, 2024, 7:12 PM

#

heads up, I saw a few reports of 502s and the people's containers were stopped

#

if you don't exit with non-0 the restart policy won't start it again.

#

unsure what stopped the people's applications but pointing out that checking the logs for container exits and restart the container resolved the issue.

dusky plaza Jul 10, 2024, 7:26 PM

#

noted, thanks for the heads up

severe lion Jul 11, 2024, 9:14 PM

#

I’m not sure if I’ll be digging in to the more since I found in 2 examples of this the container was stopped.

#

Next time someone brings this up I’ll just ask to restart their container to see if it comes back

#

I’ll have to look in to it if more people report of course

dusky plaza Jul 11, 2024, 9:18 PM

#

OP did say it was reproducible between multiple deployments

severe lion Jul 12, 2024, 6:39 PM

#

ok I don't think it's stopped containers in that case.

#

!remind me to reproduce this in 1 hour

rose craneBOT Jul 12, 2024, 6:39 PM

#

severe lion !remind me to reproduce this in 1 hour

Got it, I will remind you to reproduce this at Fri, 12 Jul 2024 19:39:21 GMT

rose craneBOT Jul 12, 2024, 7:39 PM

#

Hey @severe lion, remember to reproduce this - #1258304003930984531 message

fair geyser Jul 12, 2024, 7:39 PM

#

rose crane Hey <@181632237500891136>, remember to reproduce this - https://discord.com/chan...

BTW I don't think we have the cycles ATM to repro stuff for people

They'll have to come to us with reproductions that we can have a look at

severe lion Jul 12, 2024, 8:25 PM

#

In production I have an app with app sleep and v2 runtime. On my first request it says the app is unavailable. The container starts though. Second request it works.

So easy to reproduce.

fair geyser Jul 12, 2024, 8:34 PM

#

severe lion In production I have an app with app sleep and v2 runtime. On my first request i...

Okay then givr

tropic grotto Jul 13, 2024, 12:43 AM

#

severe lion I’m not sure if I’ll be digging in to the more since I found in 2 examples of th...

I don't understand this part: "the container was stopped". Do you mean the container was stopped manually? In my case the container stopped because it went to sleep. I didn't do anything to stop the container and my app is a long-running http server, so it doesn't stop on its own. The only reason it stops is the App Sleeping feature.

dusky plaza Jul 13, 2024, 12:52 AM

#

I can assure you applications can stop on their own

fair geyser Jul 13, 2024, 1:01 AM

#

dusky plaza I can assure you applications can stop on their own

I think like, even if it does stop on it's own, we should probably restart it?

#

Cause like if it crashes and another request or something comes in IDK

dusky plaza Jul 13, 2024, 1:02 AM

#

if it exits with an error code, yes, if it exits with a success code, maybe not? but yes if restart is set to always

severe lion Jul 13, 2024, 2:09 AM

#

Brody got it for the restart behaviour

#

The fact I could reproduce this though means we can disregard what I said about the container being stopped.

I think mentioning it might have been a mistake as I was jumping between a few threads debugging stuff.

I will dig into this more next week.

rose craneBOT Jul 16, 2024, 10:19 AM

#

New reply sent from Help Station thread:

I have same problem...With legacy runtime work well but with new V2 not.The first http(s) request via browser is allways 502, nexts working normally.Build with custom Dockerfile based on alpine+nginx+phpfpm

You're seeing this because this thread has been automatically linked to the Help Station thread.

severe lion Jul 19, 2024, 4:46 PM

#

hey folks, I spent some time on this and basically, the v2 runtime wakes and forwards http requests differently than v1 runtime. I have observed the success rate of starting and getting an HTTP response to be pretty flakey (sometimes it works, sometimes it does not). I believe this is something to do with how fast the container can start in v2 runtime before the request times out.

I can't spend more time on this right now because the number of reports for this has been small and have to prioritize some other issues.

If you need app sleep right now I advise just using the v1 (legacy) runtime.

rose craneBOT Jul 24, 2024, 8:27 AM

#

New reply sent from Help Station thread:

Thanks for the update, I just want to say that I also faced the same issue with my python instance. Mainly observing that the first request wakes up the instance (but the request does not go through) but any subsequent request s work. Will try v1 for the time being but would be nice if resolved.

You're seeing this because this thread has been automatically linked to the Help Station thread.

dusky plaza Jul 25, 2024, 3:29 PM

#

severe lion hey folks, I spent some time on this and basically, the v2 runtime wakes and for...

another report of it here - https://help.railway.app/feedback/app-sleeping-65af513a#solution

fair geyser Jul 25, 2024, 6:52 PM

#

We’re expressly not going to be able to prioritize this until the new proxy is out unfortunately

tropic grotto Oct 14, 2024, 8:53 PM

#

Has something changed with this issue? My Legacy apps are starting to show 502 errors after coming back from sleep
Most of my apps also no longer allow me to change between Legacy and V2

dusky plaza Oct 14, 2024, 8:54 PM

#

We have indeed removed the ability for users to switch back to legacy

tropic grotto Oct 14, 2024, 8:55 PM

#

Is Legacy going to be removed soon?

dusky plaza Oct 14, 2024, 8:56 PM

#

when we move to metal legacy will not be supported, thus in the intrest of moving to metal faster all deploys for all plan tiers use runtime v2

tropic grotto Oct 14, 2024, 8:56 PM

#

I really need App Sleep to work reliably 😦

dusky plaza Oct 14, 2024, 8:58 PM

#

well then you will be pleased to know that the new proxy is indeed fully rolled out and 100% of the nearly half a million domains used on our platform now have their traffic served via the new proxy, thus we should be able to take a look at picking back up the 502 app sleeping issue.

tropic grotto Oct 14, 2024, 8:59 PM

#

That would be great, V2 working with App Sleep would be ideal

dusky plaza Oct 14, 2024, 9:02 PM

#

i've also bumped the linear ticket on your behalf

tropic grotto Oct 14, 2024, 9:03 PM

#

Appreciate that, thanks!

fair geyser Oct 28, 2024, 4:28 PM

#

We will make sure this works reliably within the next 2 weeks

#

!remind me to circle back in 2 weeks

rose craneBOT Oct 28, 2024, 4:28 PM

#

fair geyser !remind me to circle back in 2 weeks

Got it, I will remind you to circle back at Mon, 11 Nov 2024 16:28:32 GMT

dusky plaza Nov 6, 2024, 4:47 PM

#

Just wanna say that we are actively working on a solution to this!

rose craneBOT Nov 11, 2024, 4:28 PM

#

Hey @fair geyser, remember to circle back - #1258304003930984531 message

dusky plaza Nov 11, 2024, 4:30 PM

#

dusky plaza Just wanna say that we are actively working on a solution to this!

Circling has been done

severe lion Nov 11, 2024, 5:19 PM

#

We have a solution and hoping to have it out this week.

severe lion Nov 14, 2024, 10:21 PM

#

Solution is being tested and trying to get it out tomorrow.

#

I'll be sure to comment here when it is.

severe lion Nov 16, 2024, 7:20 AM

#

A fix has been merged and in production now. @tropic grotto give it a try whenever you can and let me know. Current implementation allows your app to take up to 10 seconds to accept the incoming connection.

dusky plaza Nov 16, 2024, 7:21 AM

#

They do at least have to redeploy for the new chances to take effect right?

severe lion Nov 16, 2024, 7:22 AM

#

oh yes. Please trigger a redeploy. This action applies some settings for the network to be aware of your application has application sleep set.

#

since this issue impacted applications that started slower than 100 ms, making something that backfilled the applications did not seem worth it given a redeploy would fix.

My go application for example never has this issue because it starts up fast enough to accept the connection before the host rejects it thinking there's no app listening.

#

started slower than 100 ms

this number is a guess. I think it's roughly correct. Might be 30-100ms

tropic grotto Nov 16, 2024, 5:59 PM

#

So this is what I did:

Changed the app to V2
Triggered a redeploy (also made some code changes etc, it's a GraphQL Yoga API running in Bun)
Service starts fine (health check worked on first try) and runs fine in a browser
Service goes to sleep
I refresh the website
Got the error in the first image
Did I miss anything?

dusky plaza Nov 16, 2024, 6:00 PM

#

request id please

tropic grotto Nov 16, 2024, 6:00 PM

#

n6zQAxIuT5ysSFbJ-GY0nA_3118653284

dusky plaza Nov 16, 2024, 6:23 PM

#

ill let mig comment on this

#

though, might be worth trying a newer version of bun, you're on 1.1.18

tropic grotto Nov 16, 2024, 7:09 PM

#

I'll do that soon, but I don't think it will fix the issue
the app is booting in about ~1s

dusky plaza Nov 16, 2024, 7:11 PM

#

i was able to confirm app sleeping works with a node app that took 8 seconds to start, so this may just be bun being bun

tropic grotto Nov 16, 2024, 7:12 PM

#

maybe my project is stuck in some old/cached workflow?

dusky plaza Nov 16, 2024, 7:12 PM

#

what region?

tropic grotto Nov 16, 2024, 7:16 PM

#

us-west, the default one

dusky plaza Nov 16, 2024, 7:17 PM

#

same, we'll see if mig wants to work around bun's strange networking issues on monday

tropic grotto Nov 16, 2024, 7:28 PM

#

I can test with a Ruby/Sinatra app later

tropic grotto Nov 16, 2024, 8:07 PM

#

I tested it with another Bun app but in a different project and it worked
I think my project/service is borked somehow

#

the Sinatra app also worked fine on first try

#

just re-tested the project that had the issue and it still showed the error

#

This project works: 46548220-e0ba-4a16-b80a-706a55133413
This one does not: 34304961-2ebf-4d0b-b2ae-3585cf6b9353 (service: e2a687a5-9ce2-4694-81ae-12c6756b0bce)

atomic onyx Nov 16, 2024, 8:29 PM

#

maybe try with the same code but in a new project?

tropic grotto Nov 16, 2024, 8:50 PM

#

for the project that's not working it'll be a bit more difficult since it has other dependencies inside that project that I'd have to deploy too, but I will do it if time permits

atomic onyx Nov 16, 2024, 9:42 PM

#

tropic grotto for the project that's not working it'll be a bit more difficult since it has ot...

you could create a template from the project and then create a new project from it

#

its in project settings

tropic grotto Nov 16, 2024, 9:43 PM

#

oh, didn't know that. I'll give it a try

#

will the new project use the same env variables and stuff?

severe lion Nov 16, 2024, 10:55 PM

#

dusky plaza same, we'll see if mig wants to work around bun's strange networking issues on m...

If someone gives me the source code for reproducible bug I will check it out!

tropic grotto Nov 16, 2024, 11:18 PM

#

I copied my services to another project and it seems to be working without issues, no 502s on wakeup

#

so it seems my old project is somehow bugged

#

I'll make one last test, as my old service had a volume attached that I wasn't using. I've deleted the volume, redeployed the service, and will wait for it to sleep

tropic grotto Nov 16, 2024, 11:56 PM

#

just did ☝️ and it errored the same, so it doesn't seem to be volume-related

dusky plaza Nov 17, 2024, 1:06 AM

#

@tropic grotto - as mig said, we would need a reproducible example in order to look into it

tropic grotto Nov 17, 2024, 1:56 AM

#

I'm not sure that's reproducible
both projects are running the same code with the same env vars, one works, one does not

#

I have given the project IDs of both, feel free to look into them

dusky plaza Nov 17, 2024, 1:57 AM

#

unfortunately we wont be able to spend time doing that, we would need a reproducible example

tropic grotto Nov 17, 2024, 1:58 AM

#

you're probably going to get other people with old projects facing the issue but some of them won't be able to do what I did (copy/move everything to a new project)

dusky plaza Nov 17, 2024, 1:59 AM

#

i had a service i deployed 8 months ago, with the changes done it can now properly wake up from sleep

tropic grotto Nov 17, 2024, 2:00 AM

#

same-ish with my Sinatra project
that's why I think it's a problem with my project specifically
it's not something related to source code

#

something about my project, infrastructure/configuration-wise, not code-wise, might be causing the issue

#

but I can't look at infra/configs

dusky plaza Nov 17, 2024, 2:02 AM

#

if you think that is the case you are welcome to try duplicating the gesund-api service

tropic grotto Nov 17, 2024, 2:02 AM

#

I already did that

#

... ah, the service, into the same project?

dusky plaza Nov 17, 2024, 2:02 AM

#

yes

tropic grotto Nov 17, 2024, 2:02 AM

#

I duplicated the entire project

#

let me give that a try

#

done, will wait for it to sleep

#

also have an update:
the second project's first request failed on wake up, just tried now

#

that said, how do I actually send the code of this repo for reproduction?

#

Request ID: f5B_p5h0T5SopI4KU

dusky plaza Nov 17, 2024, 2:09 AM

#

tropic grotto that said, how do I actually send the code of this repo for reproduction?

https://stackoverflow.com/help/minimal-reproducible-example

dusky plaza Nov 17, 2024, 2:09 AM

#

tropic grotto also have an update: the second project's first request failed on wake up, just ...

bun too?

tropic grotto Nov 17, 2024, 2:11 AM

#

the exact same code as gesund-api. Bun as well

dusky plaza Nov 17, 2024, 2:11 AM

#

just since its easy, I'd still recommend trying the latest bun version

tropic grotto Nov 17, 2024, 2:14 AM

#

Bun upgraded to 1.1.34

dusky plaza Nov 17, 2024, 2:15 AM

#

and if this doesn't work, we would need that MRE

tropic grotto Nov 17, 2024, 2:16 AM

#

it's going to be very hard for me to have a MRE if my other Bun example (let's call that Project 3) isn't failing

#

I'll upgrade Bun there too and test it again

dusky plaza Nov 17, 2024, 2:17 AM

#

tropic grotto also have an update: the second project's first request failed on wake up, just ...

project id?

tropic grotto Nov 17, 2024, 2:19 AM

#

46548220-e0ba-4a16-b80a-706a55133413
this one is just a plain Bun server
it doesn't seem to be affected by the issue
I have other branches where I have other servers like Sinatra for testing purposes

dusky plaza Nov 17, 2024, 2:20 AM

#

also bun 1.1.18

tropic grotto Nov 17, 2024, 2:20 AM

#

as for that MRE I mean how i can make it actually minimal, it's a graphql api with multiple endpoints, I don't know what is making it fail if anything.
this testing takes time, I have to wait the app to sleep and test the first request, it doesn't seem like I can make several code changes removing stuff until I find the culprit if there's one

dusky plaza Nov 17, 2024, 2:21 AM

#

its up to you to remove as much code from the current app while still retaining the issue

tropic grotto Nov 17, 2024, 2:22 AM

#

that might take me a very long time

dusky plaza Nov 17, 2024, 2:23 AM

#

there is no rush

tropic grotto Nov 17, 2024, 2:23 AM

#

I'll see what I can do but I'm not happy about having to do this when the server is returning a 502 which seems to be out of my control

dusky plaza Nov 17, 2024, 2:24 AM

#

what your app is doing is out of our control too, and thus we need that MRE to reproduce and patch around it

tropic grotto Nov 17, 2024, 2:24 AM

#

don't you think it it was an error on my app we'd at least see it in the logs?

#

there are no logs for the failed request

dusky plaza Nov 17, 2024, 2:25 AM

#

no i dont think we would see logs for this

tropic grotto Nov 17, 2024, 2:26 AM

#

does Railway use the health check path during wakeup?

dusky plaza Nov 17, 2024, 2:26 AM

#

no

tropic grotto Nov 17, 2024, 2:26 AM

#

how does it know it is up and ready to serve requests?

dusky plaza Nov 17, 2024, 2:26 AM

#

we replay the incoming connection for up to 10 seconds

tropic grotto Nov 17, 2024, 3:14 AM

#

I have a MRE.
Instead of changing my project, I deployed a minimal Graphql-yoga+Bun server to Project 3.
Just tried the first request on sleep and it failed with a 502.
Here's the code: https://github.com/pauldps/bun-railway-v2-test/tree/graphql-yoga

dusky plaza Nov 17, 2024, 3:18 AM

#

so the newer bun version didnt help it seems

tropic grotto Nov 17, 2024, 3:19 AM

#

yup, the previous version (no graphql) running Bun 1.1.18 also didn't have the issue

dusky plaza Nov 17, 2024, 3:21 AM

#

alright, thank you

tropic grotto Nov 17, 2024, 3:22 AM

#

Strangely: I deployed the same app on Project 2 (a0aefb5f-15c4-49c6-a7ec-020b58d0cfc5)
same branch and everything. It's working fine!

#

I've checked it twice now and both requests worked fine. So I don't know what's going on

dusky plaza Nov 17, 2024, 3:23 AM

#

well i have your code deployed so ill let you know if i can reproduce it

tropic grotto Nov 17, 2024, 3:57 AM

#

I got an error on the app running on Project 2. The fact that it occasionally works seems to be a bit random.

dusky plaza Nov 17, 2024, 3:58 AM

#

I got your bun MRE to cause a 502 on wake

#

so yeah, some strange issue with bun

tropic grotto Nov 17, 2024, 3:59 AM

#

is Bun networking known to cause issues?

dusky plaza Nov 17, 2024, 3:59 AM

#

yeah, it's not the first time

severe lion Nov 17, 2024, 10:56 PM

#

since we got a MRE I could try it out tomorrow.

Thanks for the persistence on wanting to get this fixed. Sometimes it's a 50/50 effort for us to help when the issue is rare.

dusky plaza Nov 17, 2024, 10:57 PM

#

severe lion since we got a MRE I could try it out tomorrow. Thanks for the persistence on ...

here is their MRE repo in template form -

https://railway.com/template/lGBlqd

tropic grotto Nov 18, 2024, 3:39 AM

#

is it deploying the graphql-yoga branch? (can't tell)

dusky plaza Nov 18, 2024, 3:40 AM

#

yes

tropic grotto Dec 5, 2024, 2:37 PM

#

👋 Any updates on this? I'm seeing another post reporting similar issues with other languages (https://discord.com/channels/713503345364697088/1313496536650616852). Hopefully they can provide a MRE.
Can confirm the issue is still happening with my Bun apps. I have set up client retry mechanisms to work around it in the meanwhile.

#

One thing I noticed that makes this difficult to work around is that when the 502 is shown via XHR, it fails CORS (b/c CORS handling is in my app and the request doesn't reach it), so my client-retry mechanism has to be aware of that too

#

Also this is just a hunch but my app boots nearly instantly. Maybe there's a race condition somewhere where the app boots too fast that's not covered?

dusky plaza Dec 5, 2024, 3:41 PM

#

mig had to move his attention elsewhere for the time being.

dusky plaza Jan 9, 2025, 7:23 PM

#

@tropic grotto - we push an additional fix, can you redeploy and then let us know if your app can wake up properly?

tropic grotto Jan 9, 2025, 7:25 PM

#

will do, thanks! will report in a while

oak valley Jan 9, 2025, 8:09 PM

#

Not sure if that is related, but when app sleeping is used with some chunky apps, the first request still always fails

dusky plaza Jan 9, 2025, 8:16 PM

#

yeah that's what we hope we have just fixed, at least as long as the app responds within 10 seconds

oak valley Jan 9, 2025, 8:27 PM

#

Welp 10s seems not to be enough for some apps to come back after sleep

dusky plaza Jan 9, 2025, 8:33 PM

#

we talked about it internally and think 10 seconds is plently for most apps

#

its not like you'll have to run your migration commands as part of the start command for much longer 😉

tropic grotto Jan 9, 2025, 8:42 PM

#

Image 1:

Service ID ae4c8ca4-00b8-415b-ac10-4baa1612691f
🔴 Received a 502 on wakeup, 200 afterwards
Image 2 (similar stack, but a MRE):
Service ID da0153da-2b2d-4ea2-aed8-e6791380c74a
🟢 Received a 200 on wakeup and afterwards

#

The logs for both are without any errors

#

Previously, the MRE was also failing. It's succeeding now, so I think we have some progress

#

I'll keep testing to see if the MRE will randomly fail

dusky plaza Jan 9, 2025, 8:47 PM

#

failed to forward request to upstream: failed to read from request body

tropic grotto Jan 9, 2025, 8:48 PM

#

what would that mean? something like the app is up and listening, but the request ultimately failed?

dusky plaza Jan 9, 2025, 8:49 PM

#

something like that, an application (bun) level issue, out of your control though, i wonder if we retry on that error, given that it means the initial connection did work

atomic onyx Jan 9, 2025, 8:50 PM

#

maybe try a request logger on Bun to see what it actually responds?

tropic grotto Jan 9, 2025, 8:50 PM

#

sounds like there's a race condition somewhere, but it's weird that I don't get any errors in app logs

#

These are the logs from the failing service

#

There's a 9 second gap between Starting Container (Railway) and my shell script (The one starting with [Boot])

#

I use a shell script to start my app; once it starts the app boots in 1s.

#

Logs from the service that succeeded:

#

Much smaller gap (4s). What would explain it?

#

This is the failed request from Service 1:

#

The time matches the "Starting container" log.

severe lion Jan 9, 2025, 8:59 PM

#

Clarification, we pushed a fix today for an edge condition where we didn't detect sleeping apps. We have not changed the logic when a slept app is detected.

I think I can try to reproduce today to see if I can identify the issue. Seems to be something with Bun.

tropic grotto Jan 9, 2025, 9:00 PM

#

This is the successful request from Service 2:

#

This happens 1s after "Starting Container" but yet before the server is fully running (:36)

tropic grotto Jan 9, 2025, 9:02 PM

#

severe lion Clarification, we pushed a fix today for an edge condition where we didn't detec...

I reported that bug in another thread, but I haven't experienced it anymore with my apps after redeploying them.

severe lion Jan 9, 2025, 9:03 PM

#

yeah, we fixed an issue awhile ago but found this week an edge case where the code wouldn't know to apply that fix in some apps. This has been fixed today.

#

I should be able to reproduce the Bun issue today though. I will look for some Bun example projects to try. IF you have any info about how your project is setup so I can reproduce that 9 second start might help too

tropic grotto Jan 9, 2025, 9:05 PM

#

the MRE is a stripped-down version of my real app, which connects to a libsql database via a private URL. Adding that to the MRE would be a bit of work on my side that I can't commit at this time.
But if it helps, the MRE is here: https://github.com/pauldps/bun-railway-v2-test/tree/graphql-yoga
It's a Bun + GraphQL Yoga API

severe lion Jan 9, 2025, 9:08 PM

#

I'll see if I can reproduce with that alone.

tropic grotto Jan 9, 2025, 9:39 PM

#

ah, some relevant info
Service 1 (consistently failing) is running on Metal
Service 2 (consistently succeeding) is on non-Metal
So the fix might have positively affected non-Metal apps

#

I'll try moving Service 2 (the MRE) to Metal if I can. (Edit: done, will report later)

#

Test done, both services are now giving 502s on first request

#

so you should be able to reproduce the issue with the MRE running on Metal

severe lion Jan 9, 2025, 10:49 PM

#

I tried reproducing with that repo and the request worked.

#

it says that the server is running in 1 second after the container starts though

#

i'll try on metal

#

same results with metal.

tropic grotto Jan 9, 2025, 11:38 PM

#

I just tested the MRE again with Metal and it worked
but it did fail before (see timestamps)

#

The non-MRE service consistently fails (so far):

#

going to run a higher sample with the MRE to see how often it fails

tropic grotto Jan 10, 2025, 3:17 AM

#

the MRE is no longer failing it seems. Failed only once after I moved to Metal 🤷
the other is still failing 100% of the time.

severe lion Jan 10, 2025, 3:29 AM

#

What client are you using to make the requests ?

tropic grotto Jan 10, 2025, 4:05 AM

#

I'm using a browser. I go to the /graphql endpoint and prepare a query. When the app sleeps, I press the button to perform the query

#

I also deployed an exact copy of the affected service in the same project. Just tried once, and it woke up without errors.

#

the existing service must have something cached poisoning them or something.

#

that would explain why your MRE is not getting errors.

tropic grotto Jan 10, 2025, 5:27 AM

#

nevermind, the copy is also returning 502. 😦

dusky plaza Jan 10, 2025, 5:27 AM

#

okay now switch to node kekw /j

tropic grotto Jan 10, 2025, 5:27 AM

#

that ship has sailed lol

#

In Bun We Trust

#

I'll move one of the copies to non-Metal

tropic grotto Jan 10, 2025, 4:22 PM

#

The MRE and both copies of the affected service failed for me today
Seems like the MRE failure rate is random, it fails once then works for several hours.
The copies are still failing consistently, both in Metal and non-Metal.

severe lion Jan 11, 2025, 12:30 AM

#

you go to your site in browser, get html response, wait for app to sleep, send request via website action, request fails.

Is this correct ?

tropic grotto Jan 11, 2025, 1:29 AM

#

yes. Example:

#

I press the "Play" button after the app sleeps.

#

the response in the image is for a failed request. If I press the button again it returns succesfully

oak valley Jan 11, 2025, 2:07 AM

#

Ye because it takes your app longer than 10 seconds to resume from sleep , because of that railway gives you that generic "application failed to respond" ahh screen

#

I have the same with my grafana instance

#

One way to solve that is to put ur resource intensive app behind a proxy service that would wait longer than those 10s

tropic grotto Jan 11, 2025, 2:25 AM

#

My app boots in ~1s

#

also Railway gives me the error after a couple of seconds, not 10 seconds

#

I've made a change to the affected service, thanks to the preDeployCommand new feature ✨ the app boots even faster now.
I just did this so I'll be testing again over the next hours/days

dusky plaza Jan 11, 2025, 2:51 AM

#

I'm sure you already know, but I just want to avoid any uncertainty, the pre-deploy command isn't run when waking the service

tropic grotto Jan 11, 2025, 2:52 AM

#

yes I'm aware, and that's great

#

when waking the service the app doesn't need to run any migrations, hence my "boots even faster" comment

tropic grotto Jan 11, 2025, 3:15 AM

#

tropic grotto My app boots in ~1s

Connecting to database... 2025-01-11T03:12:01.773Z
Using database: libsql://... 2025-01-11T03:12:01.913Z
Creating yoga instance... 2025-01-11T03:12:01.925Z
Starting server... 2025-01-11T03:12:01.956Z
Server is running on http://localhost:8080/graphql 2025-01-11T03:12:01.964Z

Added logs to some strategic places to de-facto measure the boot time of my app: ~191ms

#

I have a hunch that because it boots too fast it may be causing the issues.

dusky plaza Jan 11, 2025, 3:17 AM

#

yolo, add a 2 second sleep haha

tropic grotto Jan 11, 2025, 3:24 AM

#

you know what? I'll actually try that lol

tropic grotto Jan 11, 2025, 3:43 AM

#

That didn't work 😦 (I was actually hopeful...)

Server is running on http://localhost:8080/graphql 2025-01-11T03:41:04.501Z

It's weird tho. It took longer to return the error (usually it's within ~2s but this one took 4), so it seems the infra is actually waiting for my server to come up, after the 2s sleep

dusky plaza Jan 11, 2025, 3:56 AM

#

was worth a shot

tropic grotto Jan 11, 2025, 4:05 PM

#

okay, new evidence:
GET requests all work fine waking up the apps and returning correct responses
only POST requests are returning 502s on first request
this is good because I'd imagine the majority of requests waking up apps will be GETs
that said, does the infra code handle non-GETs differently? or is it only running on GETs or something?

dusky plaza Jan 11, 2025, 4:56 PM

#

you really have not been able to get a 502 from waking with a get request?

tropic grotto Jan 11, 2025, 5:24 PM

#

I've tested a few times today and no 502s so far from GETs
POSTs are still failing tho

dusky plaza Jan 11, 2025, 5:24 PM

#

interesting

tropic grotto Jan 12, 2025, 1:28 AM

#

I got one 502 today from the MRE, using POST. One out of a dozen.
Been testing that one using POST only, but it works most of the time.

severe lion Jan 13, 2025, 8:39 PM

#

there's no difference with how we handle HTTP methods

oak valley Jan 13, 2025, 8:42 PM

#

GET requests could be cached by their client

#

So that they only get 502 on POSTs

severe lion Jan 13, 2025, 8:44 PM

#

I am wondering if the issue is..

your browser creates the connection to our edge network
our edge makes a connection to your app and returns the HTML response to you
you wait until your app goes to sleep
the connection between your app and our proxy is now closed
a. but the browser connection to our proxy is not because you remained active on your machine
you send POST to your app
the proxy has some bug with existing downstream connections and dead upstream connections. (I would be shocked).

What if the sleep command for your app prevents your application from closing the connection so our edge proxy thinks the connection to your app is actually active still.

You could do your test but wait 15 minutes just to be really safe that the connection between the proxy and your app is closed. Then retry. I assume you try the request as soon as the application goes to sleep.

#

the error failed to forward request to upstream: failed to read from request body means an issue between our proxy and your app.

This error mentions body which GET requests do not have, just headers.

the issue not occurring with GET requests would explain why I couldn't ever reproduce the issue.

severe lion Jan 13, 2025, 8:54 PM

#

severe lion I am wondering if the issue is.. 1. your browser creates the connection to our...

try this test and let me know. (wait longer after your app is slept before sending POST to ensure connection between proxy and your app is closed)

#

another test is, can you make the POST request without loading the HTML. I want you to go from no connection to the proxy to sending a POST which wakes up the app.

#

it seems this app sleep issue has just been with Bun projects too which aligns with a runtime issue not closing connections when it gets terminated, thus making the proxy think those connections are still active.

#

these connections between the proxy and your app have long timeouts to increase connection re-use across requests so if we set some keep alive / idle timeout to 5 seconds that would impact our connection re-use statistics which make for faster latency.

tropic grotto Jan 14, 2025, 5:37 AM

#

Yesterday I built a script (in Bun... the irony) that would use fetch on the graphQL URL with a POST (similar to how I made requests using the interface)
Note: this is with the MRE.
Most requests worked, but there were some failures.
Would this qualify as "make the POST without loading HTML"?

📎 message.txt

#

Whenever it returned a 502, the script would retry in 1s.

#

with the non-MRE service, 100% of requests returned a 502, so I'm not posting it here.
the MRE had a rather varying error rate.

#

ah yes, the interval between requests was set to 15 minutes.

severe lion Jan 15, 2025, 5:22 AM

#

thanks for doing this test. I think there's some testing I'll be able to perform also to see how the proxy handles upstream connections when the bun app is stopped.

tropic grotto Mar 28, 2025, 12:58 AM

#

I am now having this issue with a Crystal app that was automatically moved to Metal by Railway.
Problem only started after the move.
First request always 502 from sleeping, 100% of the time.
I've redeployed the service manually but issue persists.

Project ID: 3b1dbdc2-7f93-4dc1-a731-8f076713a724
Service ID: 9c2ae4bb-4e73-41f2-9be3-12d8c1a8a029
Deploy ID: aadcd8a6-bc3e-4340-9616-3c9a12eb286a

#

This is when the problem started happening. Those requests are all pings from cron-job.org that I set up to keep my app awake during certain hours of the day:

dusky plaza Mar 28, 2025, 1:41 AM

#

the app does wake in under 10 seconds right?

tropic grotto Mar 28, 2025, 2:23 AM

#

let me double check but it should boot very fast

#

the migration command is separate from the start command, so it should be booting in ~1s

#

(this one is still not using pre-deploy commands)

dusky plaza Mar 31, 2025, 5:30 PM

#

Do you have a MRE that could be made into a template?

tropic grotto Mar 31, 2025, 9:12 PM

#

Not at this time, I'm also inexperienced with template creation

dusky plaza Mar 31, 2025, 9:13 PM

#

Then an MRE repo will suffice

tropic grotto Apr 1, 2025, 12:33 AM

#

I think I managed to fix it. The migration command was apparently taking longer than 10s and it was part of the start command
I moved it to a preDeployCommand and the app seems to wake up correctly now
WIll test a few more times to confirm

#V2 + App Sleep = first response always empty