#Bot is taken offline after about an hour of activity.

1 messages · Page 1 of 1 (latest)

warm sinew
#

Bun v1.1.38
Discordeno v20.0.0

I've attached logs from Discordeno's gateway, rest and bot processes.

#

I've extracted the logs from a much longer logfile, I think I got the range of messages correlated with the start and the end of the process correct.

#

Line 1 onwards shows the bot starting and making requests as normal.

#

Line 2303 shows the last message involving the Shard:

{"level":20,"time":1733668700153,"pid":1487756,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 closed with code 3024, and reason: Reconnecting the shard, closing old connection.."}```
#

Line 2308 shows the last message involving the LeakyBucket:

{"level":20,"time":1733668726565,"pid":1487756,"hostname":"logos","name":"gateway","msg":"[LeakyBucket] Timeout for leaky bucket requests executed. Refilling bucket."}
#

From lines 2309 onwards, there are no more messages involving either the LeakyBucket or the Shard.

#

After that, the bot is completely dead; there are no events coming in, the bot is shown to be offline.

#

Any help would be really appreciated given as the bot currently cannot stay up unless manually restarted.

#

Here is the file that sets up the bot and connects it, if it's of any help.

sudden phoenix
#

The last close code (and only one i see) is a 3024 (aka a call to resume() was done on a open shard)

however the logs we can deduce it is stuck in await this.connect() as:

  • [Shard] Resuming open Shard #0, closing the connection is logged (log right before the .close())
  • [Shard] Trying to resume Shard #0 without the session id. Identifying the shard instead. does not get logged so the sessionId is (inteory) present
  • [Shard] Resuming Shard #0 connected. Session id: ${this.sessionId} | Sequence: ${this.previousSequenceNumber} does not get logged (after the await this.connect()

on to why this.connect stales that is a very good question:
only thing i can think of is somehow it never reaches the onopen but that does not make sense given the absence of other errors
edit: ... or thinking about it, i could be never resolving the .close call

-# what is a .os file btw?

#

rn i can't but i will try looking a bit more in the connect method & the code you sent to see if i find anything weird

warm sinew
#

The .os file extension is just a play on words; the bot is called logos ('word' in Greek), the main log file is called log.os.

#

The other files are called debug-log.os and discordeno-log.os. 😛

warm sinew
#

I tried to improve that situation by first of all logging everything coming in from Discordeno, so at least the logs are there.

#

If there's any other information to provide, I'll provide it if I can.

#

For now I'll try and work around it by restarting the bot at an interval.

opal cradle
#

maybe dont abuse gateway

warm sinew
opal cradle
#

which you shouldnt doing

warm sinew
opal cradle
#

because it's API abuse

warm sinew
#

This is 6 requests per minute at best.

#

The REST API alone allows 50 requests per second.

#

In either case, Discordeno should ensure that a bot cannot violate rate limits by queueing requests.

#

According to Discord's API:

Apps can send 120 gateway events per connection every 60 seconds, meaning an average of 2 commands per second. Apps that surpass the limit are immediately disconnected from the Gateway. Similar to other rate limits, repeat offenders will have their API access revoked.

warm sinew
#

If Discord thinks that showing users of my bot what commands there are available 6 times a minute are too much for their API to handle, they can message me.

#

Other than that, I'd like to stay on topic please.

sudden phoenix
#

ok so i was looking a bit

on a bit of an unrelated note

desiredProperties: constants.properties as unknown as DesiredProperties,
breaks the TS desired props thing (ig bc it gave you too many errors?)

going back on the gw topic:
is the file you sent the only one it interacts with gw related code?

#

you said

after about an hour of activity
so is this something you can reproduce fairly reliably?

#

just for context:
my current suspection is the .close() being the issue, however i'm not entirerly sure how

warm sinew
#

Just standard.

sudden phoenix
#

ig for now if you could add these 3 log statements so we can understand if it is actually close being stuck

this.logger.debug(`[Shard] Waiting for Shard #${this.id} to close the socket.`)

this.logger.debug(`[Shard] Resuming open Shard #${this.id}, closed connection`)

this.logger.debug(`[Shard] Received a Reconnect for Shard #${this.id}`)
``` so you can copy-paste the debug logs
(if you prefer i can send the entire built .js file)
warm sinew
warm sinew
sudden phoenix
warm sinew
#

As expected, the bot went down.

#

Retrieving logs...

#

@sudden phoenix Last messages before the loop of doom:

{"level":20,"time":1733690945029,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Received a Reconnect for Shard #0"}
{"level":20,"time":1733690945029,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming Shard #0"}
{"level":20,"time":1733690945029,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming open Shard #0, closing the connection"}
{"level":20,"time":1733690945030,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 closed with code 3024, and reason: Reconnecting the shard, closing old connection.."}
{"level":20,"time":1733690945030,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Waiting for Shard #0 to close the socket."}
sudden phoenix
#

ok so the close promise does not resolve... wow

#

@vague zinc any idea?

#

you are using bun right? (don't think it matters but just in case)

sudden phoenix
sudden phoenix
#

....

#

it is bun fault?!

#

ok so i will explain myself

Bun is doing weird shi* with the event loop

i have a script that is opening a conn to echo.websocket.org, after a second it closes and waits for the close event to be fired

in node i get this logs:

Open
Request served by 1781505b56ee58
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
Close: 3000 
The ws is not closed and we can go on!

in deno i get:

Open
Request served by 1781505b56ee58
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
Close: 1005 
The ws is not closed and we can go on!
``` (lets ignore the 1005)

however in bun i get:

Open
Request served by 7811941c69e658
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
Close: 3000

#
let resolve;

const webSocket = new WebSocket("wss://echo.websocket.org");

const { promise: openPromise, resolve: openResolve } = Promise.withResolvers();

webSocket.onopen = () => {
    console.log("Open");
    openResolve();
}
webSocket.onmessage = (msg) => console.log(msg.data)
webSocket.onerror = (err) => console.error("Error!", err)
webSocket.onclose = (close) => {
    console.log("Close:", close.code, close.reason)
    resolve?.();
}

await openPromise;

const interval = setInterval(() => {
    webSocket.send("hello world");
}, 100)

const { promise: timeoutPromise, resolve: timeoutResolve } = Promise.withResolvers();
setTimeout(timeoutResolve, 1000);
await timeoutPromise;

clearInterval(interval);
webSocket.close(3000);

await new Promise(r => resolve = r);

// EDIT: this should say now closed, lol, not editing it as it will mess the logs above
console.log("The ws is not closed and we can go on!")
``` (code for reference)
#

so now i have to figure out how to make bun happy (...yay)

#

i wonder why bun calls onclose immediatly as it should wait for the TCP handshake to drop the connection (or whatever are the spec terms for closing a TCP conn & websocket conn)

#

can you try applying these changes and see if it changes anything?

warm sinew
#

Oh, right, so this is an issue caused by runtime differences?

warm sinew
sudden phoenix
#

and if this (screen above) fixes it, i am blaming this issue on bun

#
// This has to be created before the actual call to socket.close as Bun calls socket.onclose immediately on the .close() call instead of waiting for the connection to end
const promise = new Promise((resolve) => {
  this.resolveAfterClose = resolve
})

this.socket.close(code, reason)

this.logger.debug(`[Shard] Waiting for Shard #${this.id} to close the socket.`)

// We need to wait for the socket to be fully closed, otherwise there'll be race condition issues if we try to connect again, resulting in unexpected behavior.
await promise
#

if you want something to copy-paste

warm sinew
#

Right, I've updated it.

warm sinew
#
{"level":20,"time":1733702434890,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Received a Reconnect for Shard #0"}
{"level":20,"time":1733702434893,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming Shard #0"}
{"level":20,"time":1733702434893,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming open Shard #0, closing the connection"}
{"level":20,"time":1733702434895,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 closed with code 3024, and reason: Reconnecting the shard, closing old connection.."}
{"level":20,"time":1733702434895,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Waiting for Shard #0 to close the socket."}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 received Hello"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Start heartbeating on Shard #0"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming Shard #0 connected. Session id: 433e4ba54d0a0123460cb04d3fe53f30 | Sequence: 1129"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[LeakyBucket] Processing queue"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[LeakyBucket] Processing queue. Remaining: 111 Length: 1"}
{"level":20,"time":1733702435157,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 received RESUMED"}
opal cradle
#

Blame Bun

vague zinc
#

top reasons why i wouldnt use bun for the next 3 years

warm sinew
#

Update: The bot is still up after more than 12 hours. 🥳

#

Seems to be all good.

#

(I did update to v20 so I was worried there'd be other issues)

#

but it seems both the old v19 issue and this new one have been fixed now.

vague zinc
#

Did you have the issue with constant reconnecting in v19? think

vague zinc
#

I’m talking about v19.0.0 though

#

Cuz it introduced the bug of .resume() constantly closing, failing heartbeat, trying to resume loop, but I thought that shouldn’t have been an issue with bun

#

But maybe it could be

warm sinew
twin osprey