#Bot is taken offline after about an hour of activity.
1 messages · Page 1 of 1 (latest)
I've extracted the logs from a much longer logfile, I think I got the range of messages correlated with the start and the end of the process correct.
Line 1 onwards shows the bot starting and making requests as normal.
Line 2303 shows the last message involving the Shard:
{"level":20,"time":1733668700153,"pid":1487756,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 closed with code 3024, and reason: Reconnecting the shard, closing old connection.."}```
Line 2308 shows the last message involving the LeakyBucket:
{"level":20,"time":1733668726565,"pid":1487756,"hostname":"logos","name":"gateway","msg":"[LeakyBucket] Timeout for leaky bucket requests executed. Refilling bucket."}
From lines 2309 onwards, there are no more messages involving either the LeakyBucket or the Shard.
After that, the bot is completely dead; there are no events coming in, the bot is shown to be offline.
Any help would be really appreciated given as the bot currently cannot stay up unless manually restarted.
Here is the file that sets up the bot and connects it, if it's of any help.
The last close code (and only one i see) is a 3024 (aka a call to resume() was done on a open shard)
however the logs we can deduce it is stuck in await this.connect() as:
[Shard] Resuming open Shard #0, closing the connectionis logged (log right before the.close())[Shard] Trying to resume Shard #0 without the session id. Identifying the shard instead.does not get logged so the sessionId is (inteory) present[Shard] Resuming Shard #0 connected. Session id: ${this.sessionId} | Sequence: ${this.previousSequenceNumber}does not get logged (after theawait this.connect()
on to why this.connect stales that is a very good question:
only thing i can think of is somehow it never reaches the onopen but that does not make sense given the absence of other errors
edit: ... or thinking about it, i could be never resolving the .close call
-# what is a .os file btw?
rn i can't but i will try looking a bit more in the connect method & the code you sent to see if i find anything weird
The .os file extension is just a play on words; the bot is called logos ('word' in Greek), the main log file is called log.os.
The other files are called debug-log.os and discordeno-log.os. 😛
Thank you Fleny, I appreciate the help you provided in the past as well even though I know I wasn't providing the most useful of information at all.
I tried to improve that situation by first of all logging everything coming in from Discordeno, so at least the logs are there.
If there's any other information to provide, I'll provide it if I can.
For now I'll try and work around it by restarting the bot at an interval.
maybe dont abuse gateway
Yes, the bot updates its status every 10 seconds.
which you shouldnt doing
Why not?
because it's API abuse
On what basis?
This is 6 requests per minute at best.
The REST API alone allows 50 requests per second.
In either case, Discordeno should ensure that a bot cannot violate rate limits by queueing requests.
According to Discord's API:
Apps can send 120 gateway events per connection every 60 seconds, meaning an average of 2 commands per second. Apps that surpass the limit are immediately disconnected from the Gateway. Similar to other rate limits, repeat offenders will have their API access revoked.
If Discord thinks that showing users of my bot what commands there are available 6 times a minute are too much for their API to handle, they can message me.
Other than that, I'd like to stay on topic please.
ok so i was looking a bit
on a bit of an unrelated note
desiredProperties: constants.properties as unknown as DesiredProperties,
breaks the TS desired props thing (ig bc it gave you too many errors?)
going back on the gw topic:
is the file you sent the only one it interacts with gw related code?
you said
after about an hour of activity
so is this something you can reproduce fairly reliably?
just for context:
my current suspection is the .close() being the issue, however i'm not entirerly sure how
Yes.
Yeah, I just migrated from some pre-v19 commit to v20.
The desired properties are stored here: https://github.com/vxern/logos/blob/main/source/constants/properties.ts
Just standard.
ig for now if you could add these 3 log statements so we can understand if it is actually close being stuck
this.logger.debug(`[Shard] Waiting for Shard #${this.id} to close the socket.`)
this.logger.debug(`[Shard] Resuming open Shard #${this.id}, closed connection`)
this.logger.debug(`[Shard] Received a Reconnect for Shard #${this.id}`)
``` so you can copy-paste the debug logs
(if you prefer i can send the entire built .js file)
Yes, it has happened twice or thrice already.
I'll do that and come back with some logs.
I'd expect something like this in the logs but we will see:
As expected, the bot went down.
Retrieving logs...
@sudden phoenix Last messages before the loop of doom:
{"level":20,"time":1733690945029,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Received a Reconnect for Shard #0"}
{"level":20,"time":1733690945029,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming Shard #0"}
{"level":20,"time":1733690945029,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming open Shard #0, closing the connection"}
{"level":20,"time":1733690945030,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 closed with code 3024, and reason: Reconnecting the shard, closing old connection.."}
{"level":20,"time":1733690945030,"pid":1514348,"hostname":"logos","name":"gateway","msg":"[Shard] Waiting for Shard #0 to close the socket."}
ok so the close promise does not resolve... wow
@vague zinc any idea?
you are using bun right? (don't think it matters but just in case)
.. wait a second
how is it doing the close handler before the log ("Waiting for shard 0 to close")
Yes.
v1.1.38 (latest)
....
it is bun fault?!
ok so i will explain myself
Bun is doing weird shi* with the event loop
i have a script that is opening a conn to echo.websocket.org, after a second it closes and waits for the close event to be fired
in node i get this logs:
Open
Request served by 1781505b56ee58
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
Close: 3000
The ws is not closed and we can go on!
in deno i get:
Open
Request served by 1781505b56ee58
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
Close: 1005
The ws is not closed and we can go on!
``` (lets ignore the 1005)
however in bun i get:
Open
Request served by 7811941c69e658
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
Close: 3000
let resolve;
const webSocket = new WebSocket("wss://echo.websocket.org");
const { promise: openPromise, resolve: openResolve } = Promise.withResolvers();
webSocket.onopen = () => {
console.log("Open");
openResolve();
}
webSocket.onmessage = (msg) => console.log(msg.data)
webSocket.onerror = (err) => console.error("Error!", err)
webSocket.onclose = (close) => {
console.log("Close:", close.code, close.reason)
resolve?.();
}
await openPromise;
const interval = setInterval(() => {
webSocket.send("hello world");
}, 100)
const { promise: timeoutPromise, resolve: timeoutResolve } = Promise.withResolvers();
setTimeout(timeoutResolve, 1000);
await timeoutPromise;
clearInterval(interval);
webSocket.close(3000);
await new Promise(r => resolve = r);
// EDIT: this should say now closed, lol, not editing it as it will mess the logs above
console.log("The ws is not closed and we can go on!")
``` (code for reference)
so now i have to figure out how to make bun happy (...yay)
i wonder why bun calls onclose immediatly as it should wait for the TCP handshake to drop the connection (or whatever are the spec terms for closing a TCP conn & websocket conn)
can you try applying these changes and see if it changes anything?
Oh, right, so this is an issue caused by runtime differences?
i think yes
I shall do that immediately.
and if this (screen above) fixes it, i am blaming this issue on bun
// This has to be created before the actual call to socket.close as Bun calls socket.onclose immediately on the .close() call instead of waiting for the connection to end
const promise = new Promise((resolve) => {
this.resolveAfterClose = resolve
})
this.socket.close(code, reason)
this.logger.debug(`[Shard] Waiting for Shard #${this.id} to close the socket.`)
// We need to wait for the socket to be fully closed, otherwise there'll be race condition issues if we try to connect again, resulting in unexpected behavior.
await promise
if you want something to copy-paste
Right, I've updated it.
Congrats, @sudden phoenix, the bot survived with the help of your changes. 🙂
{"level":20,"time":1733702434890,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Received a Reconnect for Shard #0"}
{"level":20,"time":1733702434893,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming Shard #0"}
{"level":20,"time":1733702434893,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming open Shard #0, closing the connection"}
{"level":20,"time":1733702434895,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 closed with code 3024, and reason: Reconnecting the shard, closing old connection.."}
{"level":20,"time":1733702434895,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Waiting for Shard #0 to close the socket."}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 received Hello"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Start heartbeating on Shard #0"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Resuming Shard #0 connected. Session id: 433e4ba54d0a0123460cb04d3fe53f30 | Sequence: 1129"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[LeakyBucket] Processing queue"}
{"level":20,"time":1733702435045,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[LeakyBucket] Processing queue. Remaining: 111 Length: 1"}
{"level":20,"time":1733702435157,"pid":1523761,"hostname":"logos","name":"gateway","msg":"[Shard] Shard #0 received RESUMED"}
Blame Bun
top reasons why i wouldnt use bun for the next 3 years
Update: The bot is still up after more than 12 hours. 🥳
Seems to be all good.
(I did update to v20 so I was worried there'd be other issues)
but it seems both the old v19 issue and this new one have been fixed now.
Did you have the issue with constant reconnecting in v19? 
Yep
I had it for months
I’m talking about v19.0.0 though
Cuz it introduced the bug of .resume() constantly closing, failing heartbeat, trying to resume loop, but I thought that shouldn’t have been an issue with bun
But maybe it could be
Yeah, I was on some v19 commit.
so beta