#Websocket disconnecting

149 messages

summer isle
#

Hey, I have an app that uses a websocket (hosted on Railway), but for some reason every once in a while (every 2 to 2.5h) the websocket just disconnects (and then reconnects, since I set up a reconnecting websocket). Since there is high traffic, in the second it's disconnected I can lose a lot of data.
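
A minimal sketch of one way to limit data loss during the reconnect window, assuming the loss is on the sending side: buffer outgoing messages while the socket is not open and flush them once the reconnecting socket opens again. `Outbox`, `Transport`, and `maxQueued` are hypothetical names, and the queue bound is an arbitrary choice. This does not recover messages the server sent while the client was offline (that would need server-side replay, e.g. sequence numbers), and a message handed to `send()` just before an abnormal close can still be lost without application-level acknowledgements.

```typescript
// Hypothetical minimal transport interface; a `ws` WebSocket should satisfy it
// structurally (it has readyState and send).
interface Transport {
  readyState: number;
  send(data: string): void;
}

const OPEN = 1; // readyState value for an open WebSocket

class Outbox {
  private queue: string[] = [];

  constructor(private maxQueued = 10_000) {}

  // Send immediately when the socket is open; otherwise buffer the message.
  send(transport: Transport | null, data: string): void {
    if (transport && transport.readyState === OPEN) {
      transport.send(data);
      return;
    }
    if (this.queue.length >= this.maxQueued) {
      this.queue.shift(); // drop the oldest message instead of growing without bound
    }
    this.queue.push(data);
  }

  // Call from the "open" handler of each new connection; returns how many were sent.
  flush(transport: Transport): number {
    let sent = 0;
    while (this.queue.length > 0 && transport.readyState === OPEN) {
      transport.send(this.queue.shift() as string);
      sent++;
    }
    return sent;
  }

  get pending(): number {
    return this.queue.length;
  }
}
```

Wire `flush()` to the `open` event of each new connection and route every outgoing message through `outbox.send(currentSocket, data)`.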

I saw another issue of this type, and you suggested using pings (which I'm sending roughly every 10 seconds, both from the websocket server and from the app connected to it) and using the .up.railway.app domain instead of a custom one, as written here https://help.railway.app/troubleshooting/hKqw9Dr1moxfDySTy98E6G/websocket-connections-disconnecting/sb6bfjV6UnuMwkRyAvQ3Sb and I'm doing that too.

I'll also attach some logs of the client reconnecting to the websocket every 2 to 2.5h

wanton basin [BOT]
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

dark summit [BOT]
#

Project ID: b16be651-2cf1-4464-91ef-3890dc9c63aa

summer isle
#

b16be651-2cf1-4464-91ef-3890dc9c63aa

#

(websocket service id)

visual eagle
#

this is a websocket connection between a client and your Railway service, right?

summer isle
visual eagle
#

are they in the same project?

summer isle
#

you mean service?

visual eagle
#

I mean project

summer isle
#

no they're not

visual eagle
#

should they be?

summer isle
#

hmm no

#

why?

visual eagle
#

because it seems like they should be, they talk to each other after all, and if you put them in the same project you can use the internal (private) networking

summer isle
#

but it's a websocket server, it doesn't need to be in the same project. It works fine, it just sometimes disconnects

visual eagle
#

also you misread the help page: it says to use a custom domain instead of the Railway domain, you have it the other way around

summer isle
#

oh god...😂

#

I feel so stupid ahah

visual eagle
#

either way I'm kinda leaning towards this not being an issue with railway

#

the times between your disconnects aren't fixed: some are 2 hours, some 4 hours, etc

#

so I think you should start logging the disconnect reason, and once you have those error logs, go from there

summer isle
#

will try with a custom domain, add some logs, and see then 😅

visual eagle
#

so yeah hold off on the custom domain for now

#

log both error and disconnect events on both ends of the websocket
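
A sketch of what logging both events on each end could record, assuming the useful extra detail is how long each connection lived before it dropped: note when the socket opened, then include the elapsed time in the close and error lines. `ConnectionLog` and its method names are hypothetical; wire `opened()` to the client's `open` event (or the server's `connection` event) and `closed()`/`errored()` to `close`/`error`. The clock is injected so the logic can be exercised without real sockets.

```typescript
type Clock = () => number;

// Hypothetical helper: records open/close/error with the connection's lifetime.
class ConnectionLog {
  private openedAt: number | null = null;
  readonly lines: string[] = [];

  constructor(private name: string, private now: Clock = Date.now) {}

  private uptime(): string {
    if (this.openedAt === null) return "never opened";
    const secs = Math.round((this.now() - this.openedAt) / 1000);
    const h = Math.floor(secs / 3600);
    const m = Math.floor((secs % 3600) / 60);
    return `${h}h${m}m after open`;
  }

  private log(line: string): void {
    const entry = `${new Date(this.now()).toISOString()} ${this.name} ${line}`;
    this.lines.push(entry);
    console.log(entry);
  }

  opened(): void {
    this.openedAt = this.now();
    this.log("open");
  }

  // reason is the decoded close reason (empty when the peer sent none).
  closed(code: number, reason: string): void {
    this.log(`close code=${code} reason=${JSON.stringify(reason)} ${this.uptime()}`);
    this.openedAt = null;
  }

  errored(err: Error): void {
    this.log(`error ${err.name}: ${err.message} ${this.uptime()}`);
  }
}
```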

summer isle
#

the only thing I can get from the websocket close event is the code

#

code 1006

      applications expecting a status code to indicate that the
      connection was closed abnormally, e.g., without sending or
      receiving a Close control frame.

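
For reference, 1006 is one of the codes RFC 6455 (§7.4.1) reserves for local use: it must never be sent in a Close frame, so when a client reports it the connection ended without a closing handshake, and there is no reason string from the peer to print. A small lookup can make close logs easier to scan; `describeCloseCode` is a hypothetical name, while the codes and meanings follow RFC 6455.

```typescript
// Close codes defined in RFC 6455 §7.4.1. Codes marked "reserved" must not be
// sent in a Close frame; implementations report them locally.
const CLOSE_CODES: Record<number, string> = {
  1000: "normal closure",
  1001: "going away (e.g. server shutting down, browser leaving the page)",
  1002: "protocol error",
  1003: "unsupported data type",
  1005: "no status code received (reserved)",
  1006: "abnormal closure, no Close frame (reserved)",
  1007: "invalid payload data (e.g. non-UTF-8 text)",
  1008: "policy violation",
  1009: "message too big",
  1010: "missing mandatory extension",
  1011: "unexpected condition (internal error)",
  1015: "TLS handshake failure (reserved)",
};

function describeCloseCode(code: number): string {
  if (code in CLOSE_CODES) return `${code} ${CLOSE_CODES[code]}`;
  if (code >= 3000 && code <= 3999) return `${code} registered for libraries/frameworks`;
  if (code >= 4000 && code <= 4999) return `${code} application-defined`;
  return `${code} unknown`;
}
```
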
visual eagle
#

there are error and close events on the websocket connection; you will want to print the reason for the error and the reason for the disconnect, respectively

summer isle
#

I'm listening to both of them

#

error is not firing, only close

visual eagle
summer isle
#

but how can I print the error if there is no error...

visual eagle
#

if there were no error, you wouldn't have a disconnect. There is an error, print it

summer isle
#


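    // Assuming this.ws is a `ws` WebSocket: "close" listeners receive
    // (code, reason), so closeMessage here is the numeric close code.
    this.ws.on("close", closeMessage => {
      console.log("WEBSOCKET CLOSE", closeMessage); // 1006 WAS LOGGED
      this.stopPingTimer(); // stop pinging the dead socket whether or not we reconnect
      if (this.shouldReconnect) {
        this.scheduleReconnect();
      }
    });

    this.ws.on("error", (error: any) => {
      // if (error.code === "ECONNREFUSED") return;
      console.log(`${this.nameIdentifier} WebSocket error`, error); // NOTHING WAS LOGGED
    });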


visual eagle
#

I just can't see how this issue is Railway's fault, given that the time between disconnects is sporadic. If there were timeouts for websocket connections in place, you would see a constant time between disconnects
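
This argument can be checked against the logs with a little arithmetic, assuming you have each connection's lifetime in seconds: a fixed cutoff makes lifetimes cluster tightly (a small coefficient of variation, i.e. standard deviation divided by mean), while sporadic drops spread them out. `lifetimeSpread` is a hypothetical helper, and where to draw the line between "tight" and "spread out" is a judgment call, not a rule.

```typescript
interface Spread {
  min: number;
  max: number;
  mean: number;
  cv: number; // coefficient of variation: population std dev / mean
}

function lifetimeSpread(lifetimesSec: number[]): Spread {
  if (lifetimesSec.length === 0) throw new Error("no samples");
  const n = lifetimesSec.length;
  const mean = lifetimesSec.reduce((a, b) => a + b, 0) / n;
  const variance = lifetimesSec.reduce((acc, x) => acc + (x - mean) ** 2, 0) / n;
  return {
    min: Math.min(...lifetimesSec),
    max: Math.max(...lifetimesSec),
    mean,
    cv: mean === 0 ? 0 : Math.sqrt(variance) / mean,
  };
}
```

For example, lifetimes of 2 h, 4 h, and 9 h (`[7200, 14400, 32400]`) give a coefficient of variation of about 0.59, whereas a hard timeout would keep it near 0.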

visual eagle
summer isle
#

ok then I will see if I can figure it out, thank you anyway

maybe the problem is that I'm using ws.send("ping") instead of ws.ping()
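
On that point: `ws.send("ping")` sends an ordinary data message that the other side's application code has to handle, whereas `ws.ping()` in the Node `ws` library sends a protocol-level Ping control frame that a compliant peer answers with a Pong automatically (the library then emits a `pong` event). The `ws` README describes a heartbeat that terminates connections which miss a pong; below is that idea reduced to a testable class. `Heartbeat` and `PingableSocket` are hypothetical names, and the wiring in the comments assumes the `ws` package. This detects dead connections; it does not by itself prevent a proxy from dropping them.

```typescript
// Minimal surface used here; a `ws` WebSocket has ping() and terminate()
// and emits "pong" when the peer answers.
interface PingableSocket {
  ping(): void;
  terminate(): void;
}

class Heartbeat {
  private awaitingPong = false;

  constructor(private socket: PingableSocket) {}

  // Wire to: socket.on("pong", () => heartbeat.onPong())
  onPong(): void {
    this.awaitingPong = false;
  }

  // Call on a fixed interval, e.g. every 10 s.
  // Returns false if the connection was terminated for missing a pong.
  tick(): boolean {
    if (this.awaitingPong) {
      this.socket.terminate(); // no pong since the previous tick: treat as dead
      return false;
    }
    this.awaitingPong = true;
    this.socket.ping();
    return true;
  }
}
```

Running this on both ends means each side notices a silently dead connection within roughly two intervals, and `terminate()` then triggers the normal `close` handling and reconnect.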

summer isle
visual eagle
summer isle
#

you mean I'm doing console.log(closeMessage) instead of console.log(closeMessage.reason)?

summer isle
#

I have read it

#

and I still don't understand what you are saying

visual eagle
#

I don't think my attempts to guide you in the right direction are bearing any fruit. I have other threads from other users to attend to and wish you the best as you debug your issue. Hopefully, it gets resolved! 🙂

summer isle
#

Hey brody, I'm pretty sure that the websocket disconnecting issue is a problem on Railway's side

I have deployed the same websocket server on AWS, and 11 hours later (for me it's 20:10) the Node client still hasn't disconnected from the websocket

visual eagle
#

okay try a custom domain next

summer isle
#

Yes I already tried it yesterday

visual eagle
#

try on fly?

summer isle
#

Wym?

visual eagle
#

you tried it on aws, now try it on fly.io

summer isle
#

Oh, it's a hosting platform?

visual eagle
#

yeah

summer isle
#

Ok, later I will try hosting on Fly, and tomorrow I'll see whether it disconnected

summer isle
#

I tried with Fly, and 10h 30min later it's still connected

#

I also retried the Railway websocket just to make sure, and less than 2h later it disconnected

visual eagle
#

okay, I'm gonna run a test myself too, if I can reproduce this, I will get the team involved for you

summer isle
#

Ok thanks, for the moment I deployed the websocket server on fly.io

visual eagle
#

what would you say the max time you have been able to keep a websocket connection open for on railway is?

summer isle
visual eagle
#

well I hope I don't have to get back to you in 9.5 hours

#

I also kind of hope I can reproduce this; I'm not sure what I'd say to you if I can't

summer isle
#

if you want, I can provide the code I'm using

visual eagle
#

all we will be waiting for is my websockets test to disconnect

summer isle
#

yeah

visual eagle
#

can reproduce

clever oak [BOT]
#

Thread has been flagged to Railway team by @visual eagle.

visual eagle
#

@fleet oriole - websocket disconnects

fleet oriole
#

Actively looking into this - have an idea of what it could be, so testing that hypothesis. Will update here when I have more.

summer isle
#

Hey any news?

acoustic jackal
#

Hey, not yet sorry. It may be related to some other network issues we're going to investigate soon

The team's heads down on getting regions out atm

summer isle
#

Still no updates on this? 😕

visual eagle
#

well as long as envoy has a memory leak lol

#

I've said this before, but are you sure you don't want to run this communication through the private network?

summer isle
#

Though I've never used it. I tried reading the docs but don't really understand how to get it working

visual eagle
#

put everything in the same project, then just use the internal domains and the port your app runs on when opening the ws connections
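
A sketch of picking the connection URL, assuming both services are in the same Railway project: over the private network you connect with plain `ws://` to the target service's internal hostname (of the form `<service>.railway.internal`) and the port the server listens on, falling back to the public `wss://` URL otherwise. The variable names `WS_PRIVATE_HOST`, `WS_PORT`, and `WS_PUBLIC_URL` are hypothetical; set them however your deployment prefers. Depending on when the environment was created, the private network may be IPv6-only, in which case the server needs to listen on `::` rather than `0.0.0.0`.

```typescript
type Env = Record<string, string | undefined>;

// Prefer the private network when both hypothetical variables are set;
// otherwise fall back to the public URL, which goes through the edge proxy.
function websocketUrl(env: Env): string {
  const host = env["WS_PRIVATE_HOST"]; // e.g. "my-ws-server.railway.internal"
  const port = env["WS_PORT"];
  if (host && port) {
    // IPv6 literals need brackets inside a URL; hostnames do not.
    const h = host.includes(":") ? `[${host}]` : host;
    return `ws://${h}:${port}`;
  }
  const publicUrl = env["WS_PUBLIC_URL"];
  if (!publicUrl) throw new Error("no WebSocket URL configured");
  return publicUrl;
}
```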

summer isle
#

but is it guaranteed that with private networking it won't disconnect?

visual eagle
#

well, there's no proxy on the private network

summer isle
#

ok then for the moment I will use private networking, thank you brody

acoustic jackal
#

This happens as a side effect of our edge proxy. No short-term fix for now 😦

Sorry for the late follow-up, and for the bad news. We're reworking a major layer of all of this soon

silk blade
#

Might just need a keepalive?

visual eagle
#

both ray and char8 have confirmed it's not a code/config issue

silk blade
#

I meant a keepalive message to avoid inactivity

#

But 9 hours feels lengthy

visual eagle
#

both fedev's app and my app have constant activity; my test sent a message every second, so it's Railway

silk blade
#

Alright, sorry I'm caught up now

visual eagle
#

no worries at all

fleet oriole
#

A few weeks ago we reduced Envoy restarts on the edge proxies to once a week. The routing ones still restart once a day; we're looking into extending that as well (these eat up a fair bit more RAM). Hopefully that brings some improvement 🤞

lavish thistle
#

yo @visual eagle is this still an issue? my Interval template suffers from the same thing, and their reconnection logic is not that good, so sometimes it breaks out of nowhere (totally the code's fault).

#

I'm using a custom domain and all that jazz, hope it's not a problem with Cloudflare

visual eagle
#

it's no longer an issue

#

if there's still issues, they wouldn't be platform related

lavish thistle
#

some context:
I hosted that application on Railway for a long time, using their cloudy thingy, and never had a problem
now the cloudy thingy is on Railway too and it's breaking

visual eagle
#

cloudy thing?

lavish thistle
#

then they released an open-source version, so I can host the SaaS myself

#

but my code where it interacts with the SaaS product was always in Railway

#

hmm this is documented

#

maybe the TCP proxy doesn't have that issue? I'm fine with no SSL, it's better than no application at all

visual eagle
#

your railway service is connecting out to interval websocket server?

lavish thistle
#

through public

visual eagle
#

that's not quite how it works; the TCP proxy and the domains on the service have absolutely nothing to do with connecting out to 3rd-party services

lavish thistle
#

? the interval websocket server is hosted on Railway

#

im basically connecting from a Railway service to a Railway service through websocket

visual eagle
#

are you using the private network?

lavish thistle
#

no

#

the service is in a separate project

visual eagle
#

why not in the same project?

lavish thistle
#

it's a general-purpose service

#

I could move it, yeah, but then I would have to put all of our future services into that same project

visual eagle
#

fair

#

well we have a completely different proxy from when the issue was originally opened

lavish thistle
#

yeah, I guess it was Envoy? I remember someone saying that Envoy was kinda limiting Railway

visual eagle
#

yes it used to be envoy

#

envoy does not serve any more traffic at this point

lavish thistle
#

but yeah, idk what to do. I'll try the TCP proxy to see if it works
wish there was some kind of guide for connecting different projects through Tailscale, maybe that's possible

visual eagle
#

TCP proxy and http proxy are the same btw

#

the HTTP proxy is a wrapper on top of the TCP proxy

lavish thistle
visual eagle
#

docs are outdated

lavish thistle
#

and the TCP proxy is supposed to be more reliable IMO, as most clients don't handle TCP reconnections that well

lavish thistle
visual eagle
#

how often are you seeing disconnections?

lavish thistle
#

let me see

#

seems to be pretty random tbh

visual eagle
#

yikes

#

if our proxy were that unstable, with just under half a million domains, we would know about it

lavish thistle
#

sometimes it seems to be 5-6 hours

#

and then when a disconnection happens, it seems to reconnect a few times until it stabilizes
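
Reconnecting several times in a row before things settle is where reconnect timing matters. A common approach (not necessarily what the Interval client does) is exponential backoff with jitter, so repeated attempts spread out instead of retrying in lockstep. Below is a sketch of the "full jitter" variant; `reconnectDelay` and its default base and cap are hypothetical choices, and the random source is injectable for testing.

```typescript
// Full-jitter exponential backoff: a uniform delay in [0, min(cap, base * 2^attempt)).
function reconnectDelay(
  attempt: number, // 0 for the first retry after a disconnect
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Reset `attempt` to 0 once a connection has stayed open for a while, so a single later drop starts again from a short delay.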

#

but again, I could try hosting the SaaS thingy somewhere else to see if it drops the connection constantly

visual eagle
#

have you checked the http logs to see what they say?

lavish thistle
#

yep, nothing that caught attention

#

I'll disable Cloudflare to see if that helps

lavish thistle
#

it was cloudflare 💀

#

I could probably enable Cloudflare and use a secret Railway domain for the public communication

#

security through obscurity but what can i do

visual eagle