discordjs/ws big bot memes (old) | discord.js - imagine an app | Page 1

stable hatch Mar 17, 2023, 11:37 AM

#

@dusty dove

#

discordjs/ws issue

dim oracle Mar 17, 2023, 11:38 AM

#

@sullen snow for any follow up questions

stable hatch Mar 17, 2023, 11:39 AM

#

Also was the version not injected??

dim oracle Mar 17, 2023, 11:39 AM

#

doesn't seem like it

stable hatch Mar 17, 2023, 11:39 AM

#

Tfffff

#

Pain

sullen snow Mar 17, 2023, 11:40 AM

#

what I did is mostly just inject the erlpack since d.js dont support it yet so connection flow was untouched on my version of discordjs/ws port to v14

stable hatch Mar 17, 2023, 11:40 AM

#

Wait

#

This is injected ws into djs?

sullen snow Mar 17, 2023, 11:40 AM

#

yes I have my own impl.

#

I wanted to port it myself but someone already did it so I didnt

dusty dove Mar 17, 2023, 11:41 AM

#

i cant debug off that lol

#

i need an actual repro sample

#

either way that looks like its heartbeating a closed connection

#

which doesnt sound all that related to the bubbling pr compared to some other stuff i changed

stable hatch Mar 17, 2023, 11:42 AM

#

Tbf it looks like a long lasting conn issue that you can't easily repro in quick prs

sullen snow Mar 17, 2023, 11:42 AM

#

I feel like this could be caused by a missed interval clean

#

though I'm not sure how it happens

#

cause seems like it doesn't happen on all the bots I did

dusty dove Mar 17, 2023, 11:42 AM

#

doesnt help you dont run with --enable-source-maps

#

so i have no idea what index.js line 795 is

sullen snow Mar 17, 2023, 11:43 AM

#

oh that is disabled by default?

dusty dove Mar 17, 2023, 11:43 AM

#

i feel like this could be tied to the jitter pr somehow

#

yes

stable hatch Mar 17, 2023, 11:43 AM

#

sullen snow oh that is disabled by default?

Ye

stable hatch Mar 17, 2023, 11:43 AM

#

dusty dove i feel like this could be tied to the jitter pr somehow

Did we release that?

#

I don't remember

dusty dove Mar 17, 2023, 11:43 AM

#

yea

#

i need to know what 795 is

#

its either in the interval

#

or its the awaited send call

sullen snow Mar 17, 2023, 11:43 AM

#

also on prior discordjs versions

#

like my prod rn

#

is running on 0.6.x

#

has all the shards intact

#

so it could be isolated as 0.7.x issue

dusty dove Mar 17, 2023, 11:44 AM

#

nvm its def interval now that i paid more attention

stable hatch Mar 17, 2023, 11:44 AM

#

sullen snow so it could be isolated as 0.7.x issue

That definitely helps us debug

dusty dove Mar 17, 2023, 11:44 AM

#

are you running into this consistently

dim oracle Mar 17, 2023, 11:44 AM

#

795 - 798

    await this.send({
      op: import_v102.GatewayOpcodes.Heartbeat,
      d: this.session?.sequence ?? null
    });

dusty dove Mar 17, 2023, 11:44 AM

#

or was this a one off earlier

sullen snow Mar 17, 2023, 11:44 AM

#

right now yes

dusty dove Mar 17, 2023, 11:44 AM

#

interesting

dim oracle Mar 17, 2023, 11:44 AM

#

second time it has happened now

dusty dove Mar 17, 2023, 11:45 AM

#

is there anything that seems to lead there

#

like a resume or reconnect

dim oracle Mar 17, 2023, 11:45 AM

#

sec

stable hatch Mar 17, 2023, 11:45 AM

#

You'd need debug logs for that

dusty dove Mar 17, 2023, 11:46 AM

#

because the interval is cleared in destroy()

#

which should be called the moment we get a close event

#

or a payload telling us to resume

dim oracle Mar 17, 2023, 11:47 AM

#

Exported the logs to a file so ignore the markdown but we get a few reconnects/resumes before (Ignore the no close code, that's on us)

INFO [Wed,03/15/23,12:04:47] (Cluster Process [ID: 27]): [EventHandler]: Shard Reconnecting => Shard: 888 | Close Code: none
INFO [Wed,03/15/23,12:04:47] (Cluster Process [ID: 27]): [EventHandler]: Shard Resumed => Shard: 888 => Replayed Events: 1
INFO [Wed,03/15/23,12:04:47] (Cluster Process [ID: 28]): [EventHandler]: Shard Reconnecting => Shard: 914 | Close Code: none
ERROR [Wed,03/15/23,12:04:47] (Cluster Process [ID: 28]): WebSocket is not open: readyState 0 (CONNECTING)
    err: {
      "type": "Error",
      "message": "WebSocket is not open: readyState 0 (CONNECTING)",
      "stack":
          Error: WebSocket is not open: readyState 0 (CONNECTING)
              at WebSocket.send (/main/node_modules/ws/lib/websocket.js:442:13)
              at WebsocketShard.send (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:187:25)
              at async WebsocketShard.heartbeat (/main/node_modules/@discordjs/ws/dist/index.js:795:5)
    }

dusty dove Mar 17, 2023, 11:48 AM

#

lmao nvm

#

i think i found it

#

nop just chrome search being busted

sullen snow Mar 17, 2023, 11:49 AM

#

we could enable debug logs and filter things out but that would require us some time

dim oracle Mar 17, 2023, 11:49 AM

#

Best case would be like a week or two lol, so not really an option I think

sullen snow Mar 17, 2023, 11:50 AM

#

@dusty dove also could I request a way to inject a custom identify manager specially for multi process bots so we could implement our own identify throttler? ~~also pass the shardId that requests for identify~~

dusty dove Mar 17, 2023, 11:51 AM

#

the strategy handles identify throttling

#

so u can just write your own

#

the identifythrottler class is just used for our built in strategies

sullen snow Mar 17, 2023, 11:51 AM

#

yes but the websocket shard dont pass the shardId

#

I would appreciate if it would also pass the shardid

#

since its needed to calculate for buckets

dusty dove Mar 17, 2023, 11:52 AM

#

wut

#

where

sullen snow Mar 17, 2023, 11:52 AM

#

a sec

#

https://github.com/discordjs/discord.js/blob/51edba78bc4d4cb44b4dd2b79e4bbc515dc46f5b/packages/ws/src/strategies/context/WorkerContextFetchingStrategy.ts#L57

#

this one

dusty dove Mar 17, 2023, 11:54 AM

#

so why do you need the shardId there

sullen snow Mar 17, 2023, 11:54 AM

#

like just send the shardId as well on the payload

#

of the shard that asks for identify

#

because on big bots

#

its needed to properly calculate buckets of shard to identify

stable hatch Mar 17, 2023, 11:54 AM

#

If this is about max concurrency, that's not how it works

#

You don't really need shard id to calculate it

dusty dove Mar 17, 2023, 11:55 AM

#

Thonk yeah you dont need it per api docs afaik

sullen snow Mar 17, 2023, 11:55 AM

#

are you sure? seems like all the devs I asked that has big bots calculates it using the shard id and concurrency

stable hatch Mar 17, 2023, 11:56 AM

#

That's bc they don't realize buckets are already just n shards based on max concurrency

#

If you need 64 shards and max concurrency 16, its shard 0-15, 16-31, 32-47, etc

sullen snow Mar 17, 2023, 11:56 AM

#

const bucket = shardId % concurrency; like I calculate if the shards are able to login by using this formula

#

and if the redis lock is on this bucket it will let the shard login
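The bucket check described here can be sketched roughly as follows. This is only an illustration, not sullen snow's actual code: `identifyBucket` and `IdentifyBucketLock` are hypothetical names, the lock is an in-memory stand-in for the Redis lock mentioned above, and the 5 second hold time is an assumption taken from the rate limit discussed in this thread.

```typescript
// Hypothetical helper: the bucket a shard falls into for identify rate limiting.
function identifyBucket(shardId: number, maxConcurrency: number): number {
  return shardId % maxConcurrency;
}

// Hypothetical in-memory stand-in for a shared lock (such as one Redis key per bucket).
class IdentifyBucketLock {
  private readonly lockedUntil = new Map<number, number>();
  private readonly maxConcurrency: number;
  private readonly holdMs: number;

  constructor(maxConcurrency: number, holdMs: number = 5000) {
    this.maxConcurrency = maxConcurrency;
    this.holdMs = holdMs;
  }

  // Takes the bucket lock and returns true if the shard's bucket is free at `nowMs`.
  tryAcquire(shardId: number, nowMs: number): boolean {
    const bucket = identifyBucket(shardId, this.maxConcurrency);
    const until = this.lockedUntil.get(bucket) ?? 0;
    if (nowMs < until) return false;
    this.lockedUntil.set(bucket, nowMs + this.holdMs);
    return true;
  }
}

const demoLock = new IdentifyBucketLock(16);
console.log(demoLock.tryAcquire(0, 0));     // true: bucket 0 is free
console.log(demoLock.tryAcquire(16, 1000)); // false: shard 16 shares bucket 0
console.log(demoLock.tryAcquire(1, 1000));  // true: bucket 1 is free
console.log(demoLock.tryAcquire(16, 5000)); // true: the hold on bucket 0 has expired
```

A lock like this needs the shard ID of the requester, which is why the request to pass it along matters for this kind of setup.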

stable hatch Mar 17, 2023, 11:57 AM

#

We can't easily test max concurrency

sullen snow Mar 17, 2023, 11:57 AM

#

yes im not asking for d.js to implement the max concurrency, but rather just pass the shardId for this to be possible

stable hatch Mar 17, 2023, 11:57 AM

#

So for all intents and purposes you could be right, but the docs example just shows batches of max_concurrency identifies

dusty dove Mar 17, 2023, 11:57 AM

#

unless this is some big bot thing only and its secret and not in the api docs no, thats not the case

#

theres no buckets, it just says you can identify max_concurrency shards per 5 seconds

#

it doesnt matter what id they have

stable hatch Mar 17, 2023, 11:58 AM

#

dusty dove it doesnt matter what id they have

Yknow it might, we don't know for sure

dusty dove Mar 17, 2023, 11:58 AM

#

well idk

stable hatch Mar 17, 2023, 11:58 AM

#

I doubt you can identify the same shard for n amount of times

dusty dove Mar 17, 2023, 11:58 AM

#

the idea pisses me off

#

that theyd not put it in the docs

#

for lib devs to actually handle

#

so i almost dont want to in protest

#

but anyway ill look into it

sullen snow Mar 17, 2023, 11:59 AM

#

thanks also let me know if what we are doing is wrong or right

#

but it was stable for like

#

months now

dim oracle Mar 17, 2023, 11:59 AM

#

5s after shard 0 identifies you can do 16

sullen snow Mar 17, 2023, 11:59 AM

#

we dont have any reidents

stable hatch Mar 17, 2023, 11:59 AM

#

There's definitely a bug in 0.7

#

From what you said

dim oracle Mar 17, 2023, 11:59 AM

#

you need 5s between identifies for shards in the same bucket, where bucket = shard_id % max_concurrency

#

other than that you can do whatever order you want
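To see how the `shard_id % max_concurrency` rule relates to the sequential batches mentioned earlier, here is a small sketch. It assumes shards identify in ascending ID order and that each bucket allows one identify per 5 seconds, as stated above; `planIdentifyOffsets` is an illustrative name, not a discord.js API.

```typescript
const IDENTIFY_WINDOW_MS = 5000;

// Earliest offset (ms from start) at which each shard may identify,
// assuming ascending order and one identify per bucket per window.
function planIdentifyOffsets(shardCount: number, maxConcurrency: number): number[] {
  const offsets: number[] = [];
  for (let shardId = 0; shardId < shardCount; shardId++) {
    // Shards sharing a bucket (shardId % maxConcurrency) queue up one window apart;
    // in ascending order, a shard's position in its bucket is floor(shardId / maxConcurrency).
    const positionInBucket = Math.floor(shardId / maxConcurrency);
    offsets.push(positionInBucket * IDENTIFY_WINDOW_MS);
  }
  return offsets;
}

const demoOffsets = planIdentifyOffsets(64, 16);
console.log(demoOffsets[0], demoOffsets[15]);  // 0 0
console.log(demoOffsets[16], demoOffsets[31]); // 5000 5000
console.log(demoOffsets[63]);                  // 15000
```

Under these assumptions the two descriptions in the thread line up: each sequential batch (0-15, 16-31, and so on) touches every bucket exactly once. They would only differ if shards identify out of order, which is where knowing the shard ID becomes relevant.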

stable hatch Mar 17, 2023, 11:59 AM

#

Thats unrelated to max concurrency

sullen snow Mar 17, 2023, 11:59 AM

#

yes I thought I just want to bring that up

#

since we are already talking about the ws

stable hatch Mar 17, 2023, 11:59 AM

#

I'll poke some people about it later

sullen snow Mar 17, 2023, 11:59 AM

#

thank you

#

cause personally myself

#

concurrency is also something I'm not sure of

#

if its just idc just login the shards as long as the max concurrency allows

dim oracle Mar 17, 2023, 12:00 PM

#

also, kinda weird they don't allow you to have 16x to test stuff

stable hatch Mar 17, 2023, 12:00 PM

#

But even the docs show that its batches of max-concurrency shards that are sequential

sullen snow Mar 17, 2023, 12:00 PM

#

or there is buckets

#

the buckets implementation is stable so far but then again without knowing what they mean about it I'm unsure as well

dusty dove Mar 17, 2023, 12:01 PM

#

re the heartbeats

#

looks like it can happen after reconnects

#

will see whats up

stable hatch Mar 17, 2023, 12:01 PM

#

Ty

#

I'll look into version injection too

#

Seems like it broke

dusty dove Mar 17, 2023, 12:01 PM

#

is that broken for every pkg

stable hatch Mar 17, 2023, 12:01 PM

#

Possible

#

But idk

#

I'll need to check

#

When home

sullen snow Mar 17, 2023, 12:02 PM

#

let us know if we could help somehow it would depend on my timezone though

#

CheshireXD

dim oracle Mar 17, 2023, 12:02 PM

#

dim oracle also, kinda weird they don't allow you to have 16x to test stuff

re this, I assume you've asked before if you could?

dusty dove Mar 17, 2023, 12:03 PM

#

no we dont rly have comms

#

well

#

vlad does and a few other ppl on the team

#

but lib devs r treated like nobodies mostly lol

dim oracle Mar 17, 2023, 12:03 PM

#

same in the bot space unless you're 10m+

sullen snow Mar 17, 2023, 12:03 PM

#

I wish I have 16x cause its a pain to login 112 shards with 1x concurrency

dim oracle Mar 17, 2023, 12:04 PM

#

for what it's worth, if you need to test something I'm down to run it on my token

#

finally, next time we have an issue do I just DM you again Vladdy?

stable hatch Mar 17, 2023, 12:05 PM

#

dim oracle finally, next time we have an issue do I just DM you again Vladdy?

You don't need to ask to ping, feel free to, or dm, ideally just make a thread and ping me (and I'll ping dd if needed)

dim oracle Mar 17, 2023, 12:05 PM

#

Sure that works

#

Thank you both for the quick replies ^^

dusty dove Mar 17, 2023, 8:12 PM

#

dim oracle Exported the logs to a file so ignore the markdown but we get a few reconnects/r...

wth is actually happening here

#

so this is you hacking it into discord.js right

#

that makes it a bit hard to track which of those events come from /ws and what is patched up by you

#

because i do not have a reconnect event

#

yeah sorry I can't get too far on this without you listening to my debug event

dim oracle Mar 17, 2023, 8:39 PM

#

reconnect is just https://discord.js.org/#/docs/discord.js/main/class/Client?scrollTo=e-shardReconnecting

#

iirc

#

we don't use it for anything besides logging/graphing @dusty dove

#

Hopefully this can be done without the debug listener, I'd rather not restart prod

dusty dove Mar 17, 2023, 8:47 PM

#

unless i can repro it with a super clean minimal sample to get debug logs myself, probably not

dim oracle Mar 17, 2023, 8:48 PM

#

dusty dove unless i can repro it with a super clean minimal sample to get debug logs myself...

I don't know too much about our ws, but the custom stuff we use is just OS (by Saya) -> https://github.com/Deivu/Vanguard

#

Not sure if that would help

#

Saya could probably answer most of your questions, but its midnight for him

#

cc @sullen snow

dim oracle Mar 17, 2023, 10:22 PM

#

@dusty dove alright so it is happening quite often, with a few occurrences the past hour, here's what I've found so far

Here are two examples, each representing one shard having this issue, for the first one (shard 394) we try to reconnect and get the error/issue the same second, for the second shard (866) we try to reconnect but get the issue after 3 seconds.

[Fri,03/17/23,14:54:02] [EventHandler]: Shard Reconnecting => Shard: 394 | Close Code: none
[Fri,03/17/23,14:54:02] [ERROR] WebSocket is not open: readyState 0 (CONNECTING)
    err: {
      "type": "Error",
      "message": "WebSocket is not open: readyState 0 (CONNECTING)",
      "stack":
          Error: WebSocket is not open: readyState 0 (CONNECTING)
              at WebSocket.send (/main/node_modules/ws/lib/websocket.js:442:13)
              at WebsocketShard.send (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:187:25)
              at async WebsocketShard.heartbeat (/main/node_modules/@discordjs/ws/dist/index.js:795:5)
    }

[Fri,03/17/23,14:54:02] WebSocket is not open: readyState 0 (CONNECTING)
    err: {
      "type": "Error",
      "message": "WebSocket is not open: readyState 0 (CONNECTING)",
      "stack":
          Error: WebSocket is not open: readyState 0 (CONNECTING)
              at WebSocket.send (/main/node_modules/ws/lib/websocket.js:442:13)
              at WebsocketShard.send (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:187:25)
              at async WebsocketShard.heartbeat (/main/node_modules/@discordjs/ws/dist/index.js:795:5)
    }

[Fri,03/17/23,20:17:18] [EventHandler]: Shard Reconnecting => Shard: 866 | Close Code: none
[Fri,03/17/23,20:17:21] WebSocket is not open: readyState 0 (CONNECTING)
    err: {
      "type": "Error",
      "message": "WebSocket is not open: readyState 0 (CONNECTING)",
      "stack":
          Error: WebSocket is not open: readyState 0 (CONNECTING)
              at WebSocket.send (/main/node_modules/ws/lib/websocket.js:442:13)
              at WebsocketShard.send (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:187:25)
              at async WebsocketShard.heartbeat (/main/node_modules/@discordjs/ws/dist/index.js:795:5)
    }
[Fri,03/17/23,20:17:21] WebSocket is not open: readyState 0 (CONNECTING)
    err: {
      "type": "Error",
      "message": "WebSocket is not open: readyState 0 (CONNECTING)",
      "stack":
          Error: WebSocket is not open: readyState 0 (CONNECTING)
              at WebSocket.send (/main/node_modules/ws/lib/websocket.js:442:13)
              at WebsocketShard.send (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:187:25)
              at async WebsocketShard.heartbeat (/main/node_modules/@discordjs/ws/dist/index.js:795:5)
    }

The error originates from here + the error showing a readyState of CONNECTING just seems like we're trying to send a heartbeat while the shard is still connecting (not sure if this could cause issues). But we get the error twice per shard Thonkang

// djs/ws index.js line 795
    await this.send({
      op: import_v102.GatewayOpcodes.Heartbeat,
      d: this.session?.sequence ?? null
    });

Then finally it gets moved to Vanguard's send function https://github.com/Deivu/Vanguard/blob/master/src/ws/WebsocketShard.ts#L157

#

I do want to move this to high priority since it's happening often, without enabling debug on prod (for now), how can I enable you to get debug logs on your own? I'm really out of touch with Djs so I'd need you to point me to the right parts.

If needed I can even stream and we can go through this in vc or something

sullen snow Mar 17, 2023, 11:38 PM

#

i didnt really hack anything, i just included erlpack on the ws build, and didnt touch any connection flow of ws package

#

i left that as is

#

i tried to mimic the old ws manager, so its mostly just discord.js package who is modified to fit into ws, not ws modified to fit into discord.js

dusty dove Mar 18, 2023, 6:44 AM

#

dim oracle @dusty dove alright so it is happening quite often, with a few occurre...

can you.. update that send function

#

ive heavily changed it in ws because of certain bugs

#

nevermind, yours is up to date

#

sigh

#

ill see what i can do today

sullen snow Mar 18, 2023, 11:38 AM

#

@dusty dove is it possible to lock into a specific commit in the current repo setup?

dusty dove Mar 18, 2023, 11:50 AM

#

sullen snow @dusty dove is it possible to lock into a specific commit in the curre...

not easily I don't think

#

also, if you're around

#

how often were you saying you run into this

#

since I think I figured it out

sullen snow Mar 18, 2023, 11:51 AM

#

very frequent

#

based on the latest news I know

#

50 shards are down in less than a day

dusty dove Mar 18, 2023, 11:51 AM

#

mmm

#

no but im asking more like

#

time frequency on a per-shard basis

#

like this issue will hit all shards regardless

#

does it take a while before it happens the first time on a given shard

#

and then it keeps happening?

#

because if so I think I have it

sullen snow Mar 18, 2023, 11:52 AM

#

once the shard goes in this state

#

you cant revive it

dusty dove Mar 18, 2023, 11:52 AM

#

no idea then lol

#

I'd probably figure it out instantly with debug logs but /shrug

#

I can't quite get it into your state locally

#

I just see something wrong with the code that vaguely resembles your issue

sullen snow Mar 18, 2023, 11:53 AM

#

what do you think it is?

#

so I have an idea and see

#

actually I have the commit that is known to be the last stable

#

if you want I could give you the files

#

and try to compare it to latest master

#

"version": "0.6.1-dev.1675904160-0e4224b.0", this is the current version on my prod bot and it doesnt crash

dusty dove Mar 18, 2023, 11:55 AM

#

            case GatewayOpcodes.Hello: {
                this.emit(WebSocketShardEvents.Hello);
                const jitter = Math.random();
                const firstWait = Math.floor(payload.d.heartbeat_interval * jitter);
                this.debug([`Preparing first heartbeat of the connection with a jitter of ${jitter}; waiting ${firstWait}ms`]);

                await sleep(firstWait);
                await this.heartbeat();

                this.debug([`First heartbeat sent, starting to beat every ${payload.d.heartbeat_interval}ms`]);
                this.heartbeatInterval = setInterval(() => void this.heartbeat(), payload.d.heartbeat_interval);
                break;
            }

as it is, that `await sleep` call isn't cancelled in any circumstances, so if your shard goes through a reconnect very soon after another one (or after the initial connect), you end up in a state where:
- you end up sending 2 heartbeats once the connection is finally fully re-established
- the old `heartbeatInterval` is never cleared and instead gets lost since there's no reference to it anymore as its overwritten by the new one
- the part that confuses me - only with very specific bad timing would you end up sending a heartbeat before the conn is actually open sometime in the future, since well, now there's just 2 or more heartbeatIntervals running, not in sync
- if that initial condition keeps occurring, the issue adds up, with more and more loose unbound intervals
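The race described above comes from an initial wait that nothing can cancel. A minimal sketch of one way to close that gap follows: keep a handle for the first wait so that `destroy()` can cancel it together with the interval. This is an illustration only; `HeartbeatTimers` and its members are hypothetical names, and it is not the @discordjs/ws implementation.

```typescript
// Hypothetical names throughout; an illustration, not the @discordjs/ws code.
class HeartbeatTimers {
  private initialWait: ReturnType<typeof setTimeout> | null = null;
  private interval: ReturnType<typeof setInterval> | null = null;
  private readonly sendHeartbeat: () => void;

  constructor(sendHeartbeat: () => void) {
    this.sendHeartbeat = sendHeartbeat;
  }

  // Called on Hello: one jittered first beat, then a steady interval.
  onHello(heartbeatIntervalMs: number): void {
    this.destroy(); // drop any timers left over from a previous connection
    const firstWait = Math.floor(heartbeatIntervalMs * Math.random());
    this.initialWait = setTimeout(() => {
      this.initialWait = null;
      this.sendHeartbeat();
      this.interval = setInterval(() => this.sendHeartbeat(), heartbeatIntervalMs);
    }, firstWait);
  }

  // Called on close or reconnect: cancels the pending first beat and the interval.
  destroy(): void {
    if (this.initialWait !== null) {
      clearTimeout(this.initialWait);
      this.initialWait = null;
    }
    if (this.interval !== null) {
      clearInterval(this.interval);
      this.interval = null;
    }
  }

  // Number of timers currently owned by this instance (0, or 1 in this design).
  get activeTimers(): number {
    return (this.initialWait !== null ? 1 : 0) + (this.interval !== null ? 1 : 0);
  }
}

const demoTimers = new HeartbeatTimers(() => console.log("heartbeat"));
demoTimers.onHello(45000); // first connection (45 seconds, as on the test bot)
demoTimers.onHello(45000); // reconnect right after: the earlier wait is cancelled
console.log(demoTimers.activeTimers); // 1
demoTimers.destroy();
console.log(demoTimers.activeTimers); // 0
```

Because every timer is reachable from the instance, a second Hello cannot leave an orphaned interval behind, which is the accumulation the list above describes.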

#

so it would actually be the heartbeat jitter PR that's guilty

sullen snow Mar 18, 2023, 11:55 AM

#

actually

#

that is probably the issue

dusty dove Mar 18, 2023, 11:55 AM

#

yeah most def.

sullen snow Mar 18, 2023, 11:56 AM

#

cause I also encounter 2 reconnects on my current prod bot

dusty dove Mar 18, 2023, 11:56 AM

#

I guess since your bot is big maybe your heartbeat interval is much smaller than mine

#

(my test bot has 45 seconds)

#

so I can't really trigger it

#

even manually

sullen snow Mar 18, 2023, 11:56 AM

#

this is on a 1.7mil bot or 1.5m

#

the bot I own is around 105k

dusty dove Mar 18, 2023, 11:56 AM

#

and you also just have more shards so you're more likely to run into it lol

#

but yeah I'll PR a fix now

sullen snow Mar 18, 2023, 11:59 AM

#

BlushThanks

gloomy skyBOT Mar 18, 2023, 12:02 PM

#

pr_open #9244 in discordjs/discord.js by didinele opened <t:1679140961:R> (review required)
fix(WebSocketShard): cancel initial heartbeat in destroy

dusty dove Mar 18, 2023, 12:02 PM

#

@stable hatch ^

#

yeah I know one of those = null assignments is redundant in some cases, I just want to be extra safe

stable hatch Mar 18, 2023, 12:03 PM

#

that is...jankkkk

dusty dove Mar 18, 2023, 12:03 PM

#

a little

#

but its ok

stable hatch Mar 18, 2023, 12:03 PM

#

couldn't you have done like

#

what someone else suggested somewhere

#

in your jitter pr actually

#

https://github.com/discordjs/discord.js/pull/9223#discussion_r1137021606

dusty dove Mar 18, 2023, 12:04 PM

#

notLikeCat

stable hatch Mar 18, 2023, 12:04 PM

#

KEKW

dusty dove Mar 18, 2023, 12:04 PM

#

well they mean setTimeout on that first call, first off

stable hatch Mar 18, 2023, 12:04 PM

#

well ye

dusty dove Mar 18, 2023, 12:05 PM

#

i dont know if i like that more

#

i do

stable hatch Mar 18, 2023, 12:05 PM

#

ignore me

dusty dove Mar 18, 2023, 12:05 PM

#

nice delete

stable hatch Mar 18, 2023, 12:05 PM

#

gh didn't show the method name dead

dusty dove Mar 18, 2023, 12:05 PM

#

ugh

#

this is honestly just JS sucking here lmao

#

I took this super async/await approach everywhere when it came to events and waiting for things

#

and what I did with the controller is consistent with that

stable hatch Mar 18, 2023, 12:06 PM

#

dusty dove i dont know if i like that more

both methods are dead

#

Also

dusty dove Mar 18, 2023, 12:06 PM

#

but the other pattern does look nicer here i guess

stable hatch Mar 18, 2023, 12:06 PM

#

you could've / should've used a try/catch/finally

dusty dove Mar 18, 2023, 12:06 PM

#

beeproll

stable hatch Mar 18, 2023, 12:06 PM

#

LISTEN

dusty dove Mar 18, 2023, 12:06 PM

#

will keep this approach but will do try/catch i guess after lunch

dim oracle Mar 18, 2023, 12:57 PM

#

Do you perhaps know when this will be merged?

dusty dove Mar 18, 2023, 12:59 PM

#

whenever space and kyra review it

dim oracle Mar 18, 2023, 1:07 PM

#

Also (unrelated to the rest), what is now the official/best way to support Djs, I'm still subscribed to https://patreon.com/discordjs but I haven't seen Amish in eons, and iirc he's also no longer part of Djs

dusty dove Mar 18, 2023, 1:07 PM

#

yeah he only really runs /voice

#

https://opencollective.com/discordjs

#

you can donate there

#

it's all transparent and you can also see where the money is going in the Expenses tab

#

every so often we (contribs) can bill if we've had any significant work on the library ^^

dim oracle Mar 18, 2023, 1:18 PM

#

Alright, I'll look into that

stable hatch Mar 18, 2023, 3:45 PM

#

dim oracle Also (unrelated to the rest), what is now the official/best way to support Djs, ...

Either github sponsors, or directly on open collective, whichever is simpler for you

dim oracle Mar 18, 2023, 4:49 PM

#

stable hatch Either github sponsors, or directly on open collective, whichever is simpler for...

Seems like the GitHub sponsors are more directly to a person itself rather than the whole project, and I'd have no idea who does what

stable hatch Mar 18, 2023, 4:49 PM

#

https://github.com/sponsors/discordjs

#

I mean either way it goes to/through open collective

dusty dove Mar 18, 2023, 7:33 PM

#

@dim oracle @sullen snow it's been merged

#

just wait for the next @dev release and let me know if it helps

#

oh, sorry about that ping

#

dont know if that's helpful either kek

dusty dove Mar 18, 2023, 7:34 PM

#

dusty dove just wait for the next `@dev` release and let me know if it helps

we make those automatically every 12hrs

dim oracle Mar 18, 2023, 7:37 PM

#

dusty dove @dim oracle @sullen snow it's been merged

Great thanks a ton again, we'll wait for the dev release then ^^

dusty dove Mar 18, 2023, 7:38 PM

#

while you're at it, since this is gonna take a prod deploy anyway, enable debug logs so I can actually figure out what's going on if this doesn't fix it

dim oracle Mar 18, 2023, 7:39 PM

#

Yeah will do that for sure, will probably release it on prod this Monday

dim oracle Mar 20, 2023, 1:29 PM

#

@stable hatch @dusty dove alright so it appears the issue has not yet been resolved. Surprisingly it took over 24 hours before the first shard got hit. Here's what I can find in the debug logs for shard 873 (Once again ignore markdown)

Seems like there's... a lot going on

📎 message.txt

#

You can ignore the INFO lines here

stable hatch Mar 20, 2023, 1:30 PM

#

Well we haven't released the fix, unless you put it in yourself in the code?

dim oracle Mar 20, 2023, 1:30 PM

#

We used the dev release

stable hatch Mar 20, 2023, 1:30 PM

#

Hmm

#

Can you npm ls it?

#

Just to be sure 🙏

#

Or yarn why, whichever package manager you use

#

Also HOLY did you just say shard 873

#

😵‍💫😵‍💫😵‍💫

sullen snow Mar 20, 2023, 1:31 PM

#

yeah we have 1.5k shards

stable hatch Mar 20, 2023, 1:31 PM

#

Damnnn

sullen snow Mar 20, 2023, 1:31 PM

#

i just did some magic to make d.js scale at this number

#

CheshireXD

dim oracle Mar 20, 2023, 1:31 PM

#

We don't have the files on disk, we run npm install in the docker file

stable hatch Mar 20, 2023, 1:32 PM

#

You can access the container!

sullen snow Mar 20, 2023, 1:32 PM

#

we are on 0.7.1-dev

dim oracle Mar 20, 2023, 1:32 PM

#

I'd normally just eval the version but that isn't injected

#

Yeah I exec'd the container

stable hatch Mar 20, 2023, 1:32 PM

#

docker exec -it name /bin/bash

dim oracle Mar 20, 2023, 1:32 PM

#

package.json shows "version": "0.7.1-dev.1679184639-9842082.0"

sullen snow Mar 20, 2023, 1:32 PM

#

"version": "0.7.1-dev.1679184639-9842082.0" to be precise

dusty dove Mar 20, 2023, 1:32 PM

#

ill look at those logs when i have a sec

dim oracle Mar 20, 2023, 1:32 PM

#

And yeah we run 1440 shards

dusty dove Mar 20, 2023, 1:32 PM

#

probably will be a while

stable hatch Mar 20, 2023, 1:32 PM

#

Well its definitely latest version at least
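One quick sanity check for a dev build is to read the timestamp embedded in its version string. The sketch below assumes the format is `<base>-dev.<unix seconds>-<short commit>.<n>`, which matches the strings quoted in this thread but is not confirmed here as a documented contract; `parseDevVersion` is a hypothetical helper.

```typescript
interface DevVersion {
  base: string;    // e.g. "0.7.1"
  builtAt: number; // unix seconds embedded in the version (assumed meaning)
  commit: string;  // short commit hash (assumed meaning)
}

// Returns null when the string does not look like a dev version.
function parseDevVersion(version: string): DevVersion | null {
  const match = /^(\d+\.\d+\.\d+)-dev\.(\d+)-([0-9a-f]+)\.\d+$/.exec(version);
  if (match === null) return null;
  return { base: match[1], builtAt: Number(match[2]), commit: match[3] };
}

const installed = parseDevVersion("0.7.1-dev.1679184639-9842082.0");
const prOpenedAt = 1679140961; // from the bot's <t:1679140961:R> timestamp on the fix PR

if (installed !== null) {
  console.log(installed.commit);               // 9842082
  console.log(installed.builtAt > prOpenedAt); // true
  console.log(new Date(installed.builtAt * 1000).toISOString());
}
```

A build stamped later than the PR does not by itself prove the fix is included; checking that the commit in the version string descends from the merge commit would be the stricter test.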

sullen snow Mar 20, 2023, 1:32 PM

#

seems like an error occurred before the send error happened

stable hatch Mar 20, 2023, 1:32 PM

#

Can you provide us the strategy you use too?

sullen snow Mar 20, 2023, 1:33 PM

#

worker

stable hatch Mar 20, 2023, 1:33 PM

#

Shards per worker?

sullen snow Mar 20, 2023, 1:33 PM

#

1

stable hatch Mar 20, 2023, 1:33 PM

#

Mkayyyy

#

Ty

sullen snow Mar 20, 2023, 1:33 PM

#

my guess is when ws.on error happened

#

the interval was not cleaned for some reason

stable hatch Mar 20, 2023, 1:33 PM

#

I'll also take a look to see if anything jumps out but I doubt I'll figure it out as quick as dd might

sullen snow Mar 20, 2023, 1:34 PM

#

thank you ayase_smile

dusty dove Mar 20, 2023, 1:38 PM

#

dim oracle @stable hatch @dusty dove alright so it appears the issue has ...

this is new though isnt it

#

theres no send calls erroring anymore

dim oracle Mar 20, 2023, 1:38 PM

#

Yeah the previous error is not present which kinda surprised me

dusty dove Mar 20, 2023, 1:38 PM

#

either way that looks like a good clue

#

outside of that SSL error

#

which ???

dim oracle Mar 20, 2023, 1:39 PM

#

But it results in the same thing

#

Yeah no clue

#

cleaner version without markdown and INFO logs

📎 message.txt

dusty dove Mar 20, 2023, 1:40 PM

#

i need lunch im running on 0 calories for the past 18 hours or so kek

dim oracle Mar 20, 2023, 1:40 PM

#

np ^^

#

@sullen snow do you know what the SSL error is about

sullen snow Mar 20, 2023, 1:41 PM

#

i dont touch anything about ssl

#

nor how this ws open a connection

#

but the error happened after an error

#

also it looks like ws is trying to close something that isnt established?

#

could be a good clue since it means we may have been missing some checks here

dusty dove Mar 20, 2023, 1:46 PM

#

yeah that's what I eye'd as well

sullen snow Mar 20, 2023, 1:51 PM

#

though

#

why it happened

#

after several "error"

#

it could happen after the first error

#

but it decided not to happen on that and happened after several errors

#

dusty dove Mar 20, 2023, 3:45 PM

#

@dim oracle yeah so, it looks like the WS docs just straight up betrayed me

#

Prevent the server from accepting new connections and close the HTTP server if created internally. If an external HTTP server is used via the server or noServer constructor options, it must be closed manually. Existing connections are not closed automatically. The server emits a 'close' event when all connections are closed unless an external HTTP server is used and client tracking is disabled. In this case the 'close' event is emitted in the next tick. The optional callback is called when the 'close' event occurs and receives an Error if the server is already closed.

#

nothing here says I can't call .close() if it's CONNECTING

#

but that's what happened to you

#

DEBUG [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): [EventHandler](Discord.JS): [WS => Shard 873 => Worker] Connection status during destroy
    Needs closing: true
    Ready state: 0
ERROR [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): [EventHandler]: Shard Errored => Shard: 873
    err: {
      "type": "Error",
      "message": "WebSocket was closed before the connection was established",
      "stack":
          Error: WebSocket was closed before the connection was established
              at WebSocket.close (/main/node_modules/ws/lib/websocket.js:285:7)
              at WebsocketShard.destroy (/main/node_modules/@discordjs/ws/dist/index.js:638:25)
              at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
              at async WebsocketShard.bubbleWaitForEventError (/main/node_modules/@discordjs/ws/dist/index.js:693:7)
              at async WebsocketShard.connect (/main/node_modules/@discordjs/ws/dist/index.js:587:20)
              at async WebsocketShard.connect (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:79:9)
    }

stable hatch Mar 20, 2023, 3:46 PM

#

dim oracle cleaner version without markdown and INFO logs

SSL errors = internet going dumb btw

dusty dove Mar 20, 2023, 3:46 PM

#

dusty dove DEBUG [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): [EventHandler](Dis...

¯_(ツ)_/¯

#

fricking insane

#

hate how poorly documented this package is

stable hatch Mar 20, 2023, 3:46 PM

#

And for what it matters it seems that we handle those gracefully

dusty dove Mar 20, 2023, 3:46 PM

#

yeah it looks like it was handled fine

stable hatch Mar 20, 2023, 3:46 PM

#

dusty dove > Prevent the server from accepting new connections and close the HTTP server if...

this seems to have nothing about ws clients

#

only the server

dusty dove Mar 20, 2023, 3:47 PM

#

lmao yeah i think you're right

#

yeah

#

https://sucks-to-b.eu/d8I4SE.png

#

there's the client one

#

thanks, very informative

#

either way, I guess like

#

https://sucks-to-b.eu/ylGc5s.png

stable hatch Mar 20, 2023, 3:47 PM

#

I guess try catch connection close, assume its fine and carry on?

dusty dove Mar 20, 2023, 3:47 PM

#

nuh-uh

#

I feel like that could leak

#

somehow

#

leaves an open connection or smth

#

I was thinking we make that if (this.connection.readyState === WebSocket.OPEN)

#

and proceed as we currently do

#

and else if (this.connection.readyState === WebSocket.CONNECTING)

stable hatch Mar 20, 2023, 3:48 PM

#

dusty dove leaves an open connection or smth

discord will close it for us

#

its like

dusty dove Mar 20, 2023, 3:48 PM

#

use connection.terminate() instead

stable hatch Mar 20, 2023, 3:48 PM

#

not a big deal

dusty dove Mar 20, 2023, 3:48 PM

#

oh yeah I guess that's fair

#

much simpler for us to handle too then

stable hatch Mar 20, 2023, 3:48 PM

#

connection.terminate is nodejs ws only
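
A minimal sketch of the readyState guard discussed above, assuming a hypothetical `SocketLike` shape rather than the real `ws` typings. The numeric states follow the WebSocket standard (0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED); `closeIfOpen` and `fakeSocket` are illustrative names, not part of @discordjs/ws.

```ts
// Ready states as numbered by the WebSocket standard.
const CONNECTING = 0;
const OPEN = 1;

// Hypothetical minimal socket shape for illustration; not the real `ws` typings.
interface SocketLike {
	readyState: number;
	close(code?: number, reason?: string): void;
}

// Requests a close only when the socket is OPEN and reports whether it did.
// Any other state (CONNECTING, CLOSING, CLOSED) is left alone.
function closeIfOpen(socket: SocketLike, code: number, reason: string): boolean {
	if (socket.readyState !== OPEN) {
		return false;
	}

	socket.close(code, reason);
	return true;
}

// Fake socket that surfaces the error from the logs when closed while CONNECTING
// (modeled as a synchronous throw here for simplicity).
function fakeSocket(readyState: number): SocketLike & { closed: boolean } {
	return {
		readyState,
		closed: false,
		close() {
			if (this.readyState === CONNECTING) {
				throw new Error('WebSocket was closed before the connection was established');
			}

			this.closed = true;
		},
	};
}
```

With this guard a socket stuck in CONNECTING is simply skipped, which matches the "it'll eventually die on its own" trade-off rather than forcing it shut.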

dusty dove Mar 20, 2023, 3:49 PM

#

great

#

anyway

#

this seems like the only issue kyoso ran into

#

DEBUG [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): [EventHandler](Discord.JS): [WS => Shard 873 => Worker] Destroying shard
    Reason: Something timed out
    Code: 1000
    Recover: Reconnect
DEBUG [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): [EventHandler](Discord.JS): [WS => Shard 873 => Worker] Connection status during destroy
    Needs closing: false
    Ready state: 3
ERROR [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): WebSocket was closed before the connection was established
    err: {
      "type": "Error",
      "message": "WebSocket was closed before the connection was established",
      "stack":
          Error: WebSocket was closed before the connection was established
              at WebSocket.close (/main/node_modules/ws/lib/websocket.js:285:7)
              at WebsocketShard.destroy (/main/node_modules/@discordjs/ws/dist/index.js:638:25)
              at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
              at async WebsocketShard.bubbleWaitForEventError (/main/node_modules/@discordjs/ws/dist/index.js:693:7)
              at async WebsocketShard.connect (/main/node_modules/@discordjs/ws/dist/index.js:587:20)
              at async WebsocketShard.connect (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:79:9)
    }
ERROR [Mon,03/20/23,11:55:22] (Cluster Process [ID: 27]): WebSocket was closed before the connection was established
    err: {
      "type": "Error",
      "message": "WebSocket was closed before the connection was established",
      "stack":
          Error: WebSocket was closed before the connection was established
              at WebSocket.close (/main/node_modules/ws/lib/websocket.js:285:7)
              at WebsocketShard.destroy (/main/node_modules/@discordjs/ws/dist/index.js:638:25)
              at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
              at async WebsocketShard.bubbleWaitForEventError (/main/node_modules/@discordjs/ws/dist/index.js:693:7)
              at async WebsocketShard.connect (/main/node_modules/@discordjs/ws/dist/index.js:587:20)
              at async WebsocketShard.connect (/main/node_modules/vanguard/dist/src/ws/WebsocketShard.js:79:9)
    }

#

well this looks weird

#

but they got into a broken state by that point already

#

sooo

#

we'll pretend we don't see that part

gloomy skyBOT Mar 20, 2023, 3:56 PM

#

pr_open #9254 in discordjs/discord.js by didinele opened <t:1679327757:R> (review required)
fix(WebSocketShard): don't close in #destroy when status is connecting

dusty dove Mar 20, 2023, 3:56 PM

#

@stable hatch ^

#

also updated that debug log since it was a bit misleading after some refactors

stable hatch Mar 20, 2023, 3:57 PM

#

hmmm, you should remove all listeners from the conn if its not in a should close state

#

👀

#

oh wait i see

#

misread

dusty dove Mar 20, 2023, 3:57 PM

#

ye

#

its not done there either way

stable hatch Mar 20, 2023, 3:57 PM

#

maybe add a debug message in the else for shouldClose to log that "shit broke, oh well"

dusty dove Mar 20, 2023, 3:58 PM

#

the debug log there is already clear enough though if it'll enter the if or not

#

Needs closing: false, will mean it didn't

stable hatch Mar 20, 2023, 3:58 PM

#

ugh its still pain that we cannot close while its connecting

dusty dove Mar 20, 2023, 3:59 PM

#

yeah, shrug

#

u're right though it'll eventually just die anyway if it does end up staying open with no refs to it

#

WS connections do that in the first place if they don't send any payloads for a while

#

discord might be even faster if they see we aren't heartbeating or anything

stable hatch Mar 20, 2023, 4:00 PM

#

its not IDEAL

#

but yea

dusty dove Mar 20, 2023, 4:00 PM

#

its p rare it'll happen anyway

#

and leaks basically nothing realistically

#

it's just a tcp handle

stable hatch Mar 20, 2023, 4:03 PM

#

still hate it but its the best we can do I guess because ws fucking throws on attempting to close when connecting

dusty dove Mar 20, 2023, 4:03 PM

#

rofl

#

though I must ask

#

@dim oracle did the shard eventually recover w/o your intervention or

dim oracle Mar 20, 2023, 4:12 PM

#

dusty dove @dim oracle did the shard eventually recover w/o your intervention or

Not yet, I haven't tried doing anything with it yet either

dusty dove Mar 20, 2023, 4:13 PM

#

so like.. what is it doing

#

is it just destroy-looping

dim oracle Mar 20, 2023, 4:13 PM

#

I don't think its doing anything, after those logs the shard ID never showed up again

dusty dove Mar 20, 2023, 4:13 PM

#

lmao

#

not the most intended of behaviors

#

is that the only shard that broke?

#

if so it sounds like we did squash the original bug and you just ran into something new that's much less likely to happen

#

(which seems to be trying to destroy a shard that hasn't fully connected yet)

dim oracle Mar 20, 2023, 4:19 PM

#

I think it is, I don't really have a good way to check since my PC cries when I open the 8GB log file

dusty dove Mar 20, 2023, 4:19 PM

#

yeah good, cool

stable hatch Mar 20, 2023, 4:21 PM

#

For confirmation sake, can you patch-package it with the PR fix?

dim oracle Mar 20, 2023, 4:22 PM

#

I don't really want to restart prod during peak hours tbh

#

Only one shard down atm after 30+ hours, so not really worth restarting prod for atm

stable hatch Mar 20, 2023, 4:23 PM

#

Well you dont have to do it now

dim oracle Mar 20, 2023, 4:23 PM

#

I can do it, but probably in a few days

dusty dove Mar 20, 2023, 7:31 PM

#

pr was merged anyways

dim oracle Mar 23, 2023, 6:31 PM

#

Just thought I'd keep you posted~ surpassed 48 hours of no issues so far

dusty dove Mar 23, 2023, 6:31 PM

#

have u pulled the last fix we merged

#

or is this w/o

dim oracle Mar 23, 2023, 7:01 PM

#

Running 0.7.1-dev.1679400254-950fc47.0 still

dim oracle Mar 23, 2023, 7:01 PM

#

gloomy sky pr_open [#9254 in discordjs/discord.js](<https://github.co...

this one iirc

dusty dove Mar 23, 2023, 7:09 PM

#

yeah so i was probs right your last issue was just a super rare thing

dusty dove Mar 23, 2023, 7:09 PM

#

gloomy sky pr_open [#9254 in discordjs/discord.js](<https://github.co...

this should've patched that as well

#

either way good to know u're stable now

#

cc @stable hatch, looks like I nailed it the first try

stable hatch Mar 23, 2023, 7:10 PM

#

dusty dove cc @stable hatch, looks like I nailed it the first try

Don't jinx it

dusty dove Mar 23, 2023, 7:10 PM

#

too late

dusty dove Mar 23, 2023, 7:11 PM

#

dusty dove this should've patched that as well

oh u replied to the same one

#

yes ok

#

so everything should just be good

dim oracle Mar 24, 2023, 10:19 PM

#

or is it

#

Let me actually confirm if this is related at all

dim oracle Mar 24, 2023, 10:44 PM

#

@stable hatch @dusty dove it seems like there's another issue? Here's an example for one of the shards affected. After this it is completely radio silence and I never see that shard again.

📎 logs.txt

stable hatch Mar 24, 2023, 11:08 PM

#

ohlawd

dim oracle Mar 24, 2023, 11:39 PM

#

cc @sullen snow as well I guess

dusty dove Mar 25, 2023, 11:25 AM

#

what 1k shards does to an mf

#

dude ur ass got a 520 from discord

#

wth

#

yeah ok

#

that is insane

#

i have no idea what im looking at

dusty dove Mar 25, 2023, 11:31 AM

#

dim oracle @stable hatch @dusty dove it seems like there's another issue?...

ur logs are failing to capture smth fwiw

#

or not actually

#

nvm

#

i dont know why theres just radio silence after

#

either way that shard just completely broke

#

it managed to get into a state where it identified after resuming?

#

there is one reference to this.identify in the whole file

#

and its in connect()

#

you had 2 concurrent connect calls running

#

one with a session and one without

#

WAYTOODANK

dim oracle Mar 25, 2023, 11:36 AM

#

I can check if the same happened for the other 18 shards I suppose

#

Weird thing is, this happened on all 3 bots at the exact same time

#

KannaDetective

dusty dove Mar 25, 2023, 11:37 AM

#

yeah that makes a lot of sense

#

because you got a 520 from discord

#

which means this is specific breakage from them im not handling appropriately

#

you just keep running into the most absurd edge cases

dim oracle Mar 25, 2023, 11:37 AM

#

what 1440 shards does to a mf ig

dusty dove Mar 25, 2023, 11:38 AM

#

DEBUG [Fri,03/24/23,22:20:18] (Cluster Process [ID: 42]): [EventHandler](Discord.JS): [WS => Shard 1370 => Worker] Resuming session
DEBUG [Fri,03/24/23,22:20:18] (Cluster Process [ID: 42]): [EventHandler](Discord.JS): [WS => Shard 1370 => Worker] Identifying
    shard id: 1370
    shard count: 1440
    intents: 131
    compression: none

#

this is nonsense

#

kek

#

nope

#

I think I get it

dim oracle Mar 25, 2023, 11:39 AM

#

time for a third pr I guess

#

Though, whatever changed from 0.6.x -> 0.7.0 seems to have added a lot of edge cases

#

Things were running fine for weeks before we upgraded to 0.7.0

dusty dove Mar 25, 2023, 11:41 AM

#

omg

#

that's such a cool bug

#

look at this

#

    /**
     * Does special error handling for waitForEvent calls, depending on the current state of the connection lifecycle
     * (i.e. whether or not the original connect() call has resolved or if the user has an error listener)
     */
    private async bubbleWaitForEventError(
        promise: Promise<unknown>,
    ): Promise<{ error: unknown; ok: false } | { ok: true }> {
        try {
            await promise;
            return { ok: true };
        } catch (error) {
            // Any error that isn't an abort error would have been caused by us emitting an error event in the first place
            // See https://nodejs.org/api/events.html#eventsonceemitter-name-options for `once()` behavior
            if (error instanceof Error && error.name === 'AbortError') {
                this.emit(WebSocketShardEvents.Error, { error });
            }

            // As stated previously, any other error would have been caused by us emitting the error event, which looks
            // like { error: unknown }
            // eslint-disable-next-line no-ex-assign
            error = (error as { error: unknown }).error;

            // If the user has no handling on their end (error event) simply throw.
            // We also want to throw if we're still in the initial `connect()` call, since that's the only time
            // the user can catch the error "normally"
            if (this.listenerCount(WebSocketShardEvents.Error) === 0 || !this.initialConnectResolved) {
                throw error;
            }

            // If the error is handled, we can just try to reconnect
            await this.destroy({
                code: CloseCodes.Normal,
                reason: 'Something timed out or went wrong while waiting for an event',
                recover: WebSocketShardDestroyRecovery.Reconnect,
            });

            return { ok: false, error };
        }
    }

#

so there's this method, right

#

in your case, what happened is

#

the shard was trying to resume from the await this.resume() call in connect()

#

and it timed out for whatever reason

#

but since this isn't your first connection

#

it goes to those last 2 statements

#

            await this.destroy({
                code: CloseCodes.Normal,
                reason: 'Something timed out or went wrong while waiting for an event',
                recover: WebSocketShardDestroyRecovery.Reconnect,
            });

            return { ok: false, error };

#

and it awaits the destroy call

#

which initiates a fresh reconnect

#

before returning { ok: false }

#

so back in connect

#

        const { ok } = await this.bubbleWaitForEventError(
            this.waitForEvent(WebSocketShardEvents.Hello, this.strategy.options.helloTimeout),
        );
        if (!ok) {
            return;
        }

        if (session?.shardCount === this.strategy.options.shardCount) {
            this.session = session;
            await this.resume(session);
        } else {
            await this.identify();
        }

#

the waitForEvent call for hello fails in a connect() where resume would be called because of the state of session

#

and it initiates another connect because of that inner destroy call - before hitting the if (!ok) return;

#

so the 2nd wait for Hello on the non-resume connect goes through

#

and both of them end up going through, I guess?

#

kek

#

though I still don't quite understand how, since we still eventually return ok: false

#

but it's def something racing there

dim oracle Mar 25, 2023, 11:44 AM

#

I'm surprised you found this already

dusty dove Mar 25, 2023, 11:44 AM

#

lol

#

my instinct for async races is crazy good since its all ive been working with in my time programming

dim oracle Mar 25, 2023, 11:46 AM

#

Welp, it sure worked this time I suppose

dusty dove Mar 25, 2023, 11:46 AM

#

still super odd

#

but yeah it's def smth to do with this given both a resume and identify fired right after a Hello came through

dim oracle Mar 25, 2023, 11:47 AM

#

also, 520 is from cloudflare mmLol

dusty dove Mar 25, 2023, 11:47 AM

#

lol

#

classic

dim oracle Mar 25, 2023, 11:48 AM

#

tbh, this direct line has helped a ton the past few days so glad this came to be 🙏

dusty dove Mar 25, 2023, 11:49 AM

#

yeah i mean

#

its good you're running it in prod

#

because I get to iron it out like this

#

yea im not 100% this will do it but its a one-line diff that should improve the behavior either way

#

while im at it

#

i wanna append something to the readme

dim oracle Mar 25, 2023, 11:57 AM

#

Seems like this only happens rarely so I don't really need to rush any of this

dusty dove Mar 25, 2023, 11:57 AM

#

yeah

#

its the v specific "shard timed out on hello while resuming"

stable hatch Mar 25, 2023, 11:59 AM

#

dusty dove dude ur ass got a 520 from discord

I got some of these too!

#

..just for 2 shards

dusty dove Mar 25, 2023, 11:59 AM

#

@stable hatch https://github.com/discordjs/discord.js/pull/9276

stable hatch Mar 25, 2023, 12:00 PM

#

OMEGALUL

#

thats the fix?

#

Deadge

dusty dove Mar 25, 2023, 12:00 PM

#

rofl

#

unclear 100%

#

but it's def. more correct now

#

since before it made connect calls just kinda hang about if something timed out

#

until the shard fully reconnected

#

and that could start nesting up
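
A toy model of the nesting described above, not the actual @discordjs/ws code: `ToyShard`, its fields, and `run` are hypothetical stand-ins. It only illustrates that when a failed `connect()` awaits a recovery that itself calls `connect()`, the outer call stays pending until the inner one finishes, whereas not awaiting lets the outer call return first.

```ts
// Toy model only: every name here is a hypothetical stand-in.
class ToyShard {
	public readonly log: string[] = [];
	public recovery: Promise<void> | null = null;
	private attempts = 0;
	private readonly awaitRecovery: boolean;

	public constructor(awaitRecovery: boolean) {
		this.awaitRecovery = awaitRecovery;
	}

	public async connect(): Promise<void> {
		const attempt = ++this.attempts;
		this.log.push(`connect#${attempt} start`);

		if (attempt === 1) {
			// Pretend the first attempt timed out; recovery starts a fresh connect().
			this.recovery = this.connect();
			if (this.awaitRecovery) {
				// The outer call now stays pending until the inner connect() finishes.
				await this.recovery;
			}

			this.log.push(`connect#${attempt} returns not ok`);
			return;
		}

		// Stand-in for a successful wait on Hello.
		await Promise.resolve();
		this.log.push(`connect#${attempt} ready`);
	}
}

// Runs one scenario and returns the order in which things happened.
async function run(awaitRecovery: boolean): Promise<string[]> {
	const shard = new ToyShard(awaitRecovery);
	await shard.connect();
	await shard.recovery;
	return shard.log;
}
```

In the awaited variant the first call only returns after the second attempt is ready; each further timeout would add another pending layer on top.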

stable hatch Mar 25, 2023, 12:00 PM

#

so heres

#

a wild shot

#

can you yolo unit test this?

dusty dove Mar 25, 2023, 12:01 PM

#

lmao

#

not really

#

mocking WS even enough just for this is insane

#

well its not insane its just completely unmaintainable if I just hack it up like this

stable hatch Mar 25, 2023, 12:01 PM

#

hrm right, you'd need to make some mock ws that just sends the 3 payloads

dusty dove Mar 25, 2023, 12:01 PM

#

yeah

dim oracle Mar 25, 2023, 12:03 PM

#

are versions injecting correctly yet?

stable hatch Mar 25, 2023, 12:03 PM

#

in dev, yea

#

they should be

stable hatch Mar 25, 2023, 12:04 PM

#

dusty dove ```ts /** * Does special error handling for waitForEvent calls, dependi...

btw DD

#

wont this throw undefined too?

dusty dove Mar 25, 2023, 12:04 PM

#

shouldn't?

#

when would it

stable hatch Mar 25, 2023, 12:04 PM

#

if (this.listenerCount(WebSocketShardEvents.Error) === 0 || !this.initialConnectResolved) {
  throw error;
}

dusty dove Mar 25, 2023, 12:04 PM

#

yeah, but when would it be undefined

stable hatch Mar 25, 2023, 12:04 PM

#

I mean

#

when an aborterror is thrown

#

ALSO emitting error will throw if theres no error listener
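
For reference, a small sketch of the behavior being pointed out, using Node's built-in `EventEmitter`. The emitter used by the shard is assumed to treat the `error` event the same way; `emitError` is an illustrative helper name.

```ts
import { EventEmitter } from 'node:events';

// Returns whatever emit('error', payload) threw, or null if nothing was thrown.
function emitError(emitter: EventEmitter, payload: unknown): unknown {
	try {
		emitter.emit('error', payload);
		return null;
	} catch (thrown) {
		return thrown;
	}
}

const bare = new EventEmitter();
const boom = new Error('boom');
// With no 'error' listener, Node throws the emitted Error itself.
const thrownWithoutListener = emitError(bare, boom);

const handled = new EventEmitter();
const seen: unknown[] = [];
handled.on('error', (payload) => seen.push(payload));
// With a listener attached, emit() just delivers the payload.
const thrownWithListener = emitError(handled, { error: boom });
```

So an `emit` of the error event inside a catch block can itself throw when the user never attached a listener.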

stable hatch Mar 25, 2023, 12:05 PM

#

dim oracle are versions injecting correctly yet?

they are btw, just checked

dusty dove Mar 25, 2023, 12:05 PM

#

thinking

dim oracle Mar 25, 2023, 12:05 PM

#

cool

dusty dove Mar 25, 2023, 12:06 PM

#

stable hatch ALSO emitting error will throw if theres no error listener

#

crap

dusty dove Mar 25, 2023, 12:06 PM

#

stable hatch ALSO emitting error will throw if theres no error listener

yeah, the point of the throw there is to act as control flow, mostly

#

in the !this.initialConnectResolved case its so the user can try..catch connect()

stable hatch Mar 25, 2023, 12:07 PM

#

it is still technically able to throw undefined, right?

dusty dove Mar 25, 2023, 12:07 PM

#

oh lmao

#

yes there is a missing conditional

#

kek

stable hatch Mar 25, 2023, 12:08 PM

#

if an aborterror is thrown

dusty dove Mar 25, 2023, 12:08 PM

#

I always destructure even if its an aborterror

#

Pepega

#

ill push a fix for that into this PR as well

stable hatch Mar 25, 2023, 12:08 PM

#

also why do you even emit error if its an aborterror

#

thats really not useful, right?

#

if you get an aborterror you just wanna return false, or?

dusty dove Mar 25, 2023, 12:09 PM

#

stable hatch also why do you even emit error if its an aborterror

well i actually dont have a good answer to this my brain was just like "anything else is an error event in the first place so i guess i should make sure abort errors also go there"

stable hatch Mar 25, 2023, 12:10 PM

#

maybe rethink the whole catch block then?

dusty dove Mar 25, 2023, 12:10 PM

#

stable hatch if you get an aborterror you just wanna return false, or?

no, I still want to throw

#

because of control flow

stable hatch Mar 25, 2023, 12:10 PM

#

Thonk

dusty dove Mar 25, 2023, 12:10 PM

#

like

#

you call .connect()

#

the first time

#

I need it to throw so connect() throws

#

and so the user can catch it

stable hatch Mar 25, 2023, 12:10 PM

#

I guess from the main ws.connect call

dusty dove Mar 25, 2023, 12:10 PM

#

yeah

#

in any other case i dont really care

stable hatch Mar 25, 2023, 12:14 PM

#

@dusty dove honestly tho

#

error = isAbortError ? error : (error as { error: unknown }).error; should be error = error instanceof Error ? error : theCast
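
A sketch of that conditional as a standalone helper, to show why the unconditional cast could end up throwing `undefined`. `unwrapWaitError` is a hypothetical name, and the `{ error }` payload shape is taken from the code comment quoted earlier in the thread.

```ts
// Hypothetical helper name; the chat only gives the one-line conditional.
function unwrapWaitError(caught: unknown): unknown {
	// Real Error instances (including AbortError) are kept as-is.
	if (caught instanceof Error) {
		return caught;
	}

	// Otherwise assume the { error } payload shape of an emitted error event.
	if (typeof caught === 'object' && caught !== null && 'error' in caught) {
		return (caught as { error: unknown }).error;
	}

	return caught;
}

const abort = new Error('The operation was aborted');
abort.name = 'AbortError';
const inner = new Error('socket exploded');

// What the unconditional cast did to an AbortError: it has no `.error` property.
const oldResult = (abort as unknown as { error: unknown }).error;
```

Checking `instanceof Error` rather than only the abort case also covers any other real Error that reaches the catch block.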

dusty dove Mar 25, 2023, 12:14 PM

#

oh

#

sure

dim oracle Mar 25, 2023, 1:28 PM

#

merged already sheesh? Will just wait for dev release then

dusty dove Mar 26, 2023, 11:54 AM

#

@dim oracle @sullen snow do u guys keep perf metrics

#

we're getting rid of zlib-sync in favor of node:zlib

#

and i wanna see what the impact is

sullen snow Mar 26, 2023, 11:55 AM

#

we dont compress

#

CheshireXD

dusty dove Mar 26, 2023, 11:55 AM

#

oh

#

wth why do you use etf then

#

it's literally just bigger payloads in a bunch of cases if you don't zlib

sullen snow Mar 26, 2023, 11:55 AM

#

used to be for performance

#

but nowdays

#

due to threaded ws

#

might just remove it to reduce maintenance for me or wait for your implementation

stable hatch Mar 26, 2023, 11:56 AM

#

Please tell me you've installed bufferutil and utf-8-validate

dusty dove Mar 26, 2023, 11:56 AM

#

discordjs/ws big bot memes

sullen snow Mar 26, 2023, 11:56 AM

#

both are installed, but even with those, cpu usage without etf encoding is high

dusty dove Mar 26, 2023, 11:56 AM

#

thats nuts

stable hatch Mar 26, 2023, 11:56 AM

#

That's why you use zlib-stream

dusty dove Mar 26, 2023, 11:56 AM

#

yeah lol

sullen snow Mar 26, 2023, 11:56 AM

#

zlib has leaks on node

stable hatch Mar 26, 2023, 11:56 AM

#

It wat

dusty dove Mar 26, 2023, 11:56 AM

#

yeah it actually does

#

I've seen that before

#

but i dont know why/how

sullen snow Mar 26, 2023, 11:57 AM

#

wait a second

stable hatch Mar 26, 2023, 11:57 AM

#

Deadge

dusty dove Mar 26, 2023, 11:57 AM

#

I'd imagine its because abal doesn't maintain zlib-sync

sullen snow Mar 26, 2023, 11:57 AM

#

ill show you a really old

#

experimentation I have

dusty dove Mar 26, 2023, 11:57 AM

#

so there's just no updates to zlib being pulled in

#

but node:zlib shouldn't leak

dim oracle Mar 26, 2023, 11:58 AM

#

I don't really know the compression stuff but we basically have no CPU usage on prod

sullen snow Mar 26, 2023, 11:58 AM

#

with etf ^

dim oracle Mar 26, 2023, 11:58 AM

#

Like <5%

sullen snow Mar 26, 2023, 11:58 AM

#

how do you search for

#

messages that I sent with a specific keyword again

#

#archive-offtopic message

#

@stable hatch @dusty dove

#

surely you dont want those pauses

#

KEKW

dusty dove Mar 26, 2023, 12:00 PM

#

yea dunno

#

once we merge this try it i guess

#

and see if it breaks ur stuff

sullen snow Mar 26, 2023, 12:00 PM

#

I'll see if I can add garbage collection pauses again

#

as that was my 2020 code

#

who knows where I even put it

#

LUL

stable hatch Mar 26, 2023, 12:01 PM

#

@dusty dove where pr

dusty dove Mar 26, 2023, 12:01 PM

#

making now

#

just doing some cleanup

sullen snow Mar 26, 2023, 12:01 PM

#

thats why personally I used etf over compression

#

but then again, once this zlib issue is fixed

#

we may want to try it but this is prod so we don't want anything breaking like the 0.7.0 commit CheshireXD

dim oracle Mar 26, 2023, 12:06 PM

#

"big bot memes" Deadge

stable hatch Mar 26, 2023, 12:06 PM

#

dusty dove just doing some cleanup

Btw don't remove zlib-sync yet

#

Lets make a (dev) release with both

dim oracle Mar 26, 2023, 12:06 PM

#

dim oracle Like <5%

damn I was wrong, we hit 8% during peak hours

stable hatch Mar 26, 2023, 12:07 PM

#

sullen snow thats why personally I used etf over compression

Discord isn't supporting etf for much longer apparently

#

So

dusty dove Mar 26, 2023, 12:07 PM

#

stable hatch Lets make a (dev) release with both

how do u want me to support that

stable hatch Mar 26, 2023, 12:07 PM

#

...add a new compression method called nativezlib?

dusty dove Mar 26, 2023, 12:07 PM

#

awful

#

fine

stable hatch Mar 26, 2023, 12:07 PM

#

dusty dove awful

Suffer

dusty dove Mar 26, 2023, 12:08 PM

#

yeah im not doing this today anymore

#

dont feel like it

#

OMEGALUL

#

too much conditional work

stable hatch Mar 26, 2023, 12:08 PM

#

I'll do it if you push code to a branch

#

Lol

dusty dove Mar 26, 2023, 12:08 PM

#

ok

dim oracle Mar 26, 2023, 12:08 PM

#

anything else you want to test on a big bot before I pull the latest dev release somewhere tomorrow

dusty dove Mar 26, 2023, 12:09 PM

#

we have one too now actually

#

0 guilds but it has big bot sharding toggled on

stable hatch Mar 26, 2023, 12:09 PM

#

dusty dove we have one too now actually

If you mean big bot that's a stretch

dim oracle Mar 26, 2023, 12:09 PM

#

oh that's nice

stable hatch Mar 26, 2023, 12:09 PM

#

Yeah LOL so we can test max concurrency

dusty dove Mar 26, 2023, 12:09 PM

#

well we can't actually test stress and stuff

#

but its still useful

dim oracle Mar 26, 2023, 12:09 PM

#

yeah its not gonna be anything like the 'real' thing though

dusty dove Mar 26, 2023, 12:09 PM

#

ye

dim oracle Mar 26, 2023, 12:10 PM

#

16x or higher?

dusty dove Mar 26, 2023, 12:10 PM

#

https://github.com/discordjs/discord.js/pull/9279 do w/e u want w it @stable hatch

stable hatch Mar 26, 2023, 12:10 PM

#

Pls tell me you let maintainers push to it

dusty dove Mar 26, 2023, 12:10 PM

#

i also split useIdentifyCompression into its own option but thats probably not how i should have done it

#

yes

stable hatch Mar 26, 2023, 12:10 PM

#

Otherwise i will ping you every minute for the next day

dusty dove Mar 26, 2023, 12:10 PM

#

i never tick that box off

dusty dove Mar 26, 2023, 12:11 PM

#

dim oracle 16x or higher?

16x yeah

#

looks so goofy https://sucks-to-b.eu/13cg2W.png

dim oracle Mar 26, 2023, 12:11 PM

#

do you actually know how many bots with big bot sharding use djs

stable hatch Mar 26, 2023, 12:12 PM

#

dim oracle do you actually know how many bots with big bot sharding use djs

No

stable hatch Mar 26, 2023, 12:12 PM

#

dusty dove <https://github.com/discordjs/discord.js/pull/9279> do w/e u want w it <@1398369...

Why the hell did you split the options

dusty dove Mar 26, 2023, 12:12 PM

#

mostly cus of how I deal with the compression enum

stable hatch Mar 26, 2023, 12:12 PM

#

You can't mix identify compression with zlib-stream

dusty dove Mar 26, 2023, 12:12 PM

#

yes, that's handled elsewhere

#

join them back if u want to

stable hatch Mar 26, 2023, 12:12 PM

#

THEN WHY'D YOU SPLIT

#

Reeeee

dusty dove Mar 26, 2023, 12:12 PM

#

dusty dove mostly cus of how I deal with the compression enum

^

stable hatch Mar 26, 2023, 12:12 PM

#

Zoomers i s2g

dim oracle Mar 26, 2023, 12:12 PM

#

dusty dove looks so goofy https://sucks-to-b.eu/13cg2W.png

I'm glad its an automated system now tbh, back in the day you had to contact Discord to move you, and if you weren't running a multiple of 16 you'd have to reboot before you can access it

dusty dove Mar 26, 2023, 12:12 PM

#

I pass the value as-is into the query params of the WS url

dusty dove Mar 26, 2023, 12:13 PM

#

dusty dove I pass the value as-is into the query params of the WS url

            params.append('compress', compression);

stable hatch Mar 26, 2023, 12:13 PM

#

dusty dove ```ts params.append('compress', compression); ```

Horrible

#

I'll look into it Deadge

stable hatch Mar 26, 2023, 12:13 PM

#

dim oracle I'm glad its an automated system now tbh, back in the day you had to contact Dis...

Is it automated lol

dim oracle Mar 26, 2023, 12:13 PM

#

it is now, if you hit 150k and have a multiple of 16 as shard count you're just moved automatically to 16x

#

same for 32x iirc

#

not sure about 64x or 128x though

stable hatch Mar 26, 2023, 12:14 PM

#

Are there any bots in that range o.o

dim oracle Mar 26, 2023, 12:14 PM

#

but only like 4 bots have access to that anyways

#

iirc Mee6 is the only one on 128x

dusty dove Mar 26, 2023, 12:14 PM

#

p sure rythm still is as well

dim oracle Mar 26, 2023, 12:14 PM

#

yeah but dead bot so shrug

dusty dove Mar 26, 2023, 12:14 PM

#

well

#

we still keep a gateway connection going

#

so they def. care lol

#

esp. considering the last time we did anything spicy we took the platform down

#

PepeLaugh

stable hatch Mar 26, 2023, 12:15 PM

#

dusty dove we still keep a gateway connection going

Why

dusty dove Mar 26, 2023, 12:15 PM

#

to display presence, marketing things i guess meguFace

stable hatch Mar 26, 2023, 12:15 PM

#

Discord rolling in their grave

#

If you ever disconnect and the sessions fully shut down and you reconnect you'll be yelled at 😂

dusty dove Mar 26, 2023, 12:16 PM

#

PepeLaugh

#

idk tho i had nothing to do w bot eng and still dont have anything to do with eng there

dim oracle Mar 26, 2023, 12:16 PM

#

I do wonder what you'll come up with to support 16x if that's gonna be a thing now

stable hatch Mar 26, 2023, 12:16 PM

#

dim oracle I do wonder what you'll come up with to support 16x if that's gonna be a thing n...

Wat

#

We already support max concurrency

dusty dove Mar 26, 2023, 12:16 PM

#

yeah i had vlad double check and we're still waiting for a response but

#

p sure you were wrong abt the max concurrency formula thing

#

there's just no such thing as bucketing on them, you can just identify max_concurrency shards at any given time regardless of their id

#

so we already fully support it lol

stable hatch Mar 26, 2023, 12:17 PM

#

We'll see when I get an answer

dim oracle Mar 26, 2023, 12:18 PM

#

Hmm, tbf I haven't actually used djs's sharding manager in a while

stable hatch Mar 26, 2023, 12:18 PM

#

Good

#

Save yourself

dim oracle Mar 26, 2023, 12:18 PM

#

💀

stable hatch Mar 26, 2023, 12:18 PM

#

The amount of issues it has is insane

#

Anyways this is what i asked discord

We do actually have a question about max_concurrency!

Is there any formula to what shards can connect concurrently? Does max_concurrency only alter the identify limit (so we could technically identify shard id 0/16 16 times), or does it alter the sequence too (so only shard 0-15 first, then 16-31)

#

We shall wait and see

#

The docs suggest the latter

#

(which wouldn't affect us anyways bc it works already)
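
A sketch of "the latter" reading, assuming the bucket key formula from Discord's sharding docs (`shard_id % max_concurrency`). Whether Discord actually enforces it is exactly the open question here; `identifyBucket` and `identifyWaves` are illustrative names.

```ts
// Bucket key as described in Discord's sharding docs: shard_id % max_concurrency.
function identifyBucket(shardId: number, maxConcurrency: number): number {
	return shardId % maxConcurrency;
}

// Groups sequential shard ids into waves so that no two shards in a wave share a bucket key.
function identifyWaves(shardCount: number, maxConcurrency: number): number[][] {
	const waves: number[][] = [];
	for (let shardId = 0; shardId < shardCount; shardId++) {
		const waveIndex = Math.floor(shardId / maxConcurrency);
		if (waves[waveIndex] === undefined) {
			waves[waveIndex] = [];
		}

		waves[waveIndex].push(shardId);
	}

	return waves;
}
```

Under this reading shard 0 and shard 16 share a bucket at concurrency 16, so 32 shards would identify as 0-15 first, then 16-31.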

dim oracle Mar 26, 2023, 12:22 PM

#

We'll see, every bot dev I have contact with does say the bucketing is a thing

#

how many sessions does the test bot have actually? Since it scales off guild count I guess just the normal 1000

stable hatch Mar 26, 2023, 12:58 PM

#

dim oracle how many sessions does the test bot have actually? Since it scales of guild coun...

2000 total, concurrency 16

dim oracle Mar 26, 2023, 8:06 PM

#

@dusty dove @stable hatch I'm not sure if this is caused by Djs, but again a shard is showing some weird things before never recovering again. Still on ws 0.7.1-dev.1679400254-950fc47.0

📎 logs.txt

#

Again after these logs shard 162 is never seen again

dusty dove Mar 26, 2023, 8:07 PM

#

aaaa

stable hatch Mar 26, 2023, 8:07 PM

#

likely bc of outdated ws

dusty dove Mar 26, 2023, 8:07 PM

#

why do you not run with --enable-source-maps

stable hatch Mar 26, 2023, 8:07 PM

#

I mean @dusty dove

#

for all we know

#

your awaiting destroy bug fix could've solved this

#

Best bet is wait for them to update, and check after

dusty dove Mar 26, 2023, 8:07 PM

#

if u look at the later logs

#

thats a network issue anyway

#

opening handshake timeout

stable hatch Mar 26, 2023, 8:08 PM

#

which could be what you solved in https://github.com/discordjs/discord.js/commit/519825a651fe22042a73046824d12f03f56ca9e2

#

dead

dusty dove Mar 26, 2023, 8:08 PM

#

well

#

that specifically patched a race caused by uh

#

waitForEvent timing out while resuming

#

but yes, thatd typically happen from network issues

stable hatch Mar 26, 2023, 8:09 PM

#

I stand by what i said

#

@dim oracle

1. update ws
2. run with --enable-source-maps
3. followup if it dies again
~~4. contemplate why you made a discord bot when networking is so reliable~~

dim oracle Mar 26, 2023, 8:10 PM

#

Just making sure this isn't missed before I update ws again

stable hatch Mar 26, 2023, 8:10 PM

#

i mean dd knows that better than me but its not impossible its already handled

#

this will end up being a cat and mouse against async race conditions

#

~~still more stable than djs's current ws which has at least 1-2 dead locks LMAO~~

dim oracle Mar 26, 2023, 8:11 PM

#

no weird network shenanigans around that time though

stable hatch Mar 26, 2023, 8:12 PM

#

your logs show networking issues

#

@dusty dove fyi the error seems to have been thrown in waitForEvent

dusty dove Mar 26, 2023, 8:16 PM

#

yeah makes sense

stable hatch Mar 26, 2023, 8:16 PM

#

probably waiting for hello?

dusty dove Mar 26, 2023, 8:16 PM

#

but the handshake timeouts are def. outside of my control

stable hatch Mar 26, 2023, 8:16 PM

#

ye

dusty dove Mar 26, 2023, 8:16 PM

#

if its not your network its discord

dusty dove Mar 26, 2023, 8:16 PM

#

stable hatch probably waiting for hello?

that should have a debug call tied to it if it was hello

stable hatch Mar 26, 2023, 8:17 PM

#

```
DEBUG [Sun,03/26/23,18:55:51] (Cluster Process [ID: 5]): [EventHandler](Discord.JS): [WS => Shard 162 => Worker] Waiting for event hello for 60000ms
ERROR [Sun,03/26/23,18:56:51] (Cluster Process [ID: 5]): [EventHandler]: Shard Errored => Shard: 162
    err: {
      "type": "Error",
      "message": "The operation was aborted",
      "stack":
          AbortError: The operation was aborted
              at EventTarget.abortListener (node:events:958:14)
              at [nodejs.internal.kHybridDispatch] (node:internal/event_target:735:20)
              at EventTarget.dispatchEvent (node:internal/event_target:677:26)
              at abortSignal (node:internal/abort_controller:308:10)
              at AbortController.abort (node:internal/abort_controller:338:5)
              at Timeout.<anonymous> (/main/node_modules/@discordjs/ws/dist/index.js:650:91)
              at listOnTimeout (node:internal/timers:569:17)
              at process.processTimers (node:internal/timers:512:7)
    }
```

#

ya dont say

dusty dove Mar 26, 2023, 8:18 PM

#

no but

#

nvm

#

i missed the destroy log

#

thats what i was looking for

#

yeah it timed out on hello
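
For readers following the stack trace above: the `AbortError` comes from a timer aborting an `AbortSignal` that an `events.once` wait is bound to. Below is a minimal sketch of that pattern, assuming a plain Node.js `EventEmitter`. The `waitForEvent` helper here is hypothetical and is not the actual `@discordjs/ws` implementation.

```typescript
import { EventEmitter, once } from 'node:events';

// Hypothetical helper: resolves with the event arguments, or rejects with an
// AbortError (code 'ABORT_ERR') if the event does not arrive within timeoutMs.
async function waitForEvent(emitter: EventEmitter, event: string, timeoutMs: number): Promise<unknown[]> {
	const controller = new AbortController();
	// Abort the pending wait once the timeout elapses.
	const timer = setTimeout(() => controller.abort(), timeoutMs);
	try {
		return await once(emitter, event, { signal: controller.signal });
	} finally {
		// Always clear the timer so it cannot fire after the wait has settled.
		clearTimeout(timer);
	}
}

// Usage: nothing ever emits 'hello' here, so the wait rejects after 50ms.
waitForEvent(new EventEmitter(), 'hello', 50).catch((error) => console.log(error.code)); // logs 'ABORT_ERR'
```

Whoever calls such a helper has to decide what a rejection means (destroy and reconnect, or give up), which is the part being debated in this thread.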

stable hatch Mar 26, 2023, 8:19 PM

#

Yknow, I do wonder if we can deadlock shards like that

dusty dove Mar 26, 2023, 8:19 PM

#

then subsequent reconnects timed out on the TCP handshake

dim oracle Mar 26, 2023, 8:19 PM

#

dusty dove why do you not run with --enable-source-maps

committed this now so I don't forget

dusty dove Mar 26, 2023, 8:19 PM

#

sooo discord was just dying

stable hatch Mar 26, 2023, 8:19 PM

#

imma test if shards deadlock when ws never responds with a hello

dusty dove Mar 26, 2023, 8:20 PM

#

wut why would it

#

it just times out n starts over

dim oracle Mar 26, 2023, 8:20 PM

#

Just to make sure, ^0.8.0-dev.1679789487-b8b852e.0 would be the latest right

stable hatch Mar 26, 2023, 8:20 PM

#

dusty dove wut why would it

god rays who tf knows

#

theres enough async to kill an elephant

stable hatch Mar 26, 2023, 8:20 PM

#

dim oracle Just to make sure, `^0.8.0-dev.1679789487-b8b852e.0` would be the latest right

sounds correct

#

why did we bump major again Thonk

#

eh w/e

#

yes, its correct

dim oracle Mar 26, 2023, 8:21 PM

#

Will update prod somewhere tomorrow

stable hatch Mar 26, 2023, 8:27 PM

#

@dusty dove so this is fun...

#

```
{
  message: 'Connecting to ws://localhost:8080?v=10&encoding=json',
  shardId: 0
}
{ message: 'Waiting for event hello for 60000ms', shardId: 0 }
[WSS] Connected
Exception in PromiseRejectCallback:
file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:305
      });
      ^

RangeError: Maximum call stack size exceeded
node:events:958
      reject(new AbortError(undefined, { cause: signal?.reason }));
             ^

AbortError: The operation was aborted
    at EventTarget.abortListener (node:events:958:14)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:735:20)
    at EventTarget.dispatchEvent (node:internal/event_target:677:26)
    at abortSignal (node:internal/abort_controller:308:10)
    at AbortController.abort (node:internal/abort_controller:338:5)
    at Timeout.<anonymous> (/Users/vlad/Development/Discord/discord.js/packages/ws/src/ws/WebSocketShard.ts:258:65)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7) {
  code: 'ABORT_ERR',
  [cause]: DOMException [AbortError]: This operation was aborted
      at new DOMException (node:internal/per_context/domexception:53:5)
      at AbortController.abort (node:internal/abort_controller:336:18)
      at Timeout.<anonymous> (/Users/vlad/Development/Discord/discord.js/packages/ws/src/ws/WebSocketShard.ts:258:65)
      at listOnTimeout (node:internal/timers:569:17)
      at process.processTimers (node:internal/timers:512:7)
}
Node.js v18.14.0
```

#

i.. dont know why or how theres a rangeerror from async ee

dusty dove Mar 26, 2023, 8:30 PM

#

this looks like not my problem

#

kek

stable hatch Mar 26, 2023, 8:30 PM

#

well

#

the process crashed

#

you said it should resume

dusty dove Mar 26, 2023, 8:31 PM

#

so true, how could i forget to catch the stack overflow error

stable hatch Mar 26, 2023, 8:32 PM

#

what in the name of sweet lord

#

I logged the error that kept getting thrown

#

```
AEE ERROR Error: Unhandled 'error' event emitted, received [object Object]
    at WebSocketManager.emit (file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:213:19)
    at WebSocketShard.<anonymous> (file:///Users/vlad/Development/Discord/discord.js/packages/ws/dist/index.mjs:977:51)
    at file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:297:34
    at new Promise (<anonymous>)
    at Object.wrappedFn [as wrappedFunc] (file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:292:23)
    at WebSocketShard.emit (file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:224:25)
    at file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:301:30
    at new Promise (<anonymous>)
    at Object.wrappedFn [as wrappedFunc] (file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:292:23)
    at WebSocketShard.emit (file:///Users/vlad/Development/Discord/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:224:25) {
  context: { context: { context: [Object], shardId: 0 }, shardId: 0 }
}
```

#

I'll have to take a look at that

#

that might be my bad

#

dead

#

ok so that aside, @dusty dove didnt u say that uhhh...if the hello times out it should retry?

#

bc it sure doesnt do that

#

It DOES work if the conn is closed

#

but not if it times out

dusty dove Mar 26, 2023, 8:46 PM

#

stable hatch ok so that aside, @dusty dove didnt u say that uhhh...if the hello tim...

oh no sorry

#

this is only if its not the initial connect, actually

#

im not sure why its that way

#

anymore

#

but it is on purpose

stable hatch Mar 26, 2023, 8:46 PM

#

meguFace

#

sounds..stupid

dusty dove Mar 26, 2023, 8:47 PM

#

maybe, i think its this way bcus like

stable hatch Mar 26, 2023, 8:47 PM

#

First off how are you sure its only on initial connect

#

secondly it shouldn't even be like this ever

dusty dove Mar 26, 2023, 8:47 PM

#

the idea is that connect only resolves once its ready right

stable hatch Mar 26, 2023, 8:47 PM

#

hello timeouts can happen bc internet is just dead

dusty dove Mar 26, 2023, 8:47 PM

#

and to accomplish that itd just

#

recurse down on every failure

#

so in practice the retry logic would need to be handled outside the shard class
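
One way to picture "retry logic handled outside the shard class" is a small loop owned by the caller. This is only a sketch under assumptions: `connectWithRetry`, `connectOnce`, and the fixed delay are hypothetical names and choices, not how `@discordjs/ws` does it.

```typescript
// Hypothetical caller-side retry loop. `connectOnce` stands in for a single
// connect attempt that rejects on failure (for example a hello timeout).
async function connectWithRetry(
	connectOnce: () => Promise<void>,
	maxAttempts: number,
	delayMs: number,
): Promise<number> {
	let lastError: unknown;
	for (let attempt = 1; attempt <= maxAttempts; attempt++) {
		try {
			await connectOnce();
			return attempt; // number of attempts it took to connect
		} catch (error) {
			lastError = error;
			// Wait before the next attempt, but not after the final one.
			if (attempt < maxAttempts) {
				await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
			}
		}
	}
	throw lastError;
}

// Usage: a fake connect that fails twice, then succeeds on the third call.
let calls = 0;
const flakyConnect = async (): Promise<void> => {
	calls++;
	if (calls < 3) throw new Error('hello timed out');
};
connectWithRetry(flakyConnect, 5, 10).then((attempts) => console.log(`connected after ${attempts} attempts`));
```

A loop like this avoids recursing on every failure, and the promise still only resolves once an attempt actually succeeds.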

stable hatch Mar 26, 2023, 8:48 PM

#

Right, then explain why connect fails on waiting for hello but not when conn dies

#

meguFace

dusty dove Mar 26, 2023, 8:48 PM

#

stable hatch First off how are you sure its only on initial connect

the field literally being set and checked just for this

dusty dove Mar 26, 2023, 8:48 PM

#

stable hatch Right, then explain why connect fails on waiting for hello but not when conn die...

elaborate

#

what does "conn dies" mean

stable hatch Mar 26, 2023, 8:50 PM

#

Consider a wss local server
Consider ws = the connection to the wss

When the manager spawns the shard that connects to the local wss

if the timeout is reached, process exits with the abort error
if the ws is closed in the wss, it tries to reconnect constantly, as it should

dusty dove Mar 26, 2023, 8:50 PM

#

mmmh

#

that actually sounds bad

stable hatch Mar 26, 2023, 8:50 PM

#

oh god I've found bugs in AEE too brb crying

dusty dove Mar 26, 2023, 8:51 PM

#

does the promise ever resolve in that latter case

#

i have a vague feeling it doesnt

stable hatch Mar 26, 2023, 8:51 PM

#

dusty dove does the promise ever resolve in that latter case

I mean hard to test that locally

dusty dove Mar 26, 2023, 8:51 PM

#

the more i think about this the more absurd i realize it is to make it so connect is catchable

#

so many edge cases

#

its why all the error handling is so dank

stable hatch Mar 26, 2023, 8:53 PM

#

HAH

#

uhhh

#

Check dms

dusty dove Mar 26, 2023, 8:53 PM

#

oh i just had an epiphany @stable hatch

#

i know how to fix all of this

stable hatch Mar 26, 2023, 8:53 PM

#

i broke shit so hard

#

check dms

#

KEKW

dusty dove Mar 26, 2023, 8:54 PM

#

what version is this

stable hatch Mar 26, 2023, 8:54 PM

#

latest main

#

and uh

#

ok so tbf it could be my test script

#

B u t uhhhhh it like breaks breaks

dusty dove Mar 26, 2023, 8:55 PM

#

either way

#

heres what ill do

stable hatch Mar 26, 2023, 8:55 PM

#

handle this not at midnight

#

meguFace

dusty dove Mar 26, 2023, 8:55 PM

#

ill scrap the thing that makes connect throw if things timeout during the initial connects

#

anddd i also know a way to still guarantee it only resolves on ready

dusty dove Mar 27, 2023, 2:37 PM

#

@dim oracle I tracked down a new bug related to what you ran into a couple of days ago

#

it seems waitForEvent calls were never cancelled by the WS shard closing regularly

#

I just did a massive refactor to address this and some of the janker error handling

dim oracle Mar 27, 2023, 2:39 PM

#

hmm, prod is currently on the dev release I mentioned yesterday

dusty dove Mar 27, 2023, 2:40 PM

#

yeah nws

#

I haven't even opened the PR

#

vlad has been toying with things and edge cases using a fancy script

#

```ts
import { WebSocketManager } from '@discordjs/ws';
import { REST } from '@discordjs/rest';
import { WebSocketServer } from 'ws';

// True until the local server has seen its first disconnect.
let initial = true;
// Local stand-in for the gateway.
const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
    console.log('[WSS] Connected');

    ws.on('close', () => {
        console.log('[WSS] Disconnected');
        initial = false;
    });

    // Close right away to simulate a gateway that drops the connection instantly.
    ws.close();
});

const rest = new REST({}).setToken('');

const manager = new WebSocketManager({
    intents: 0,
    rest,
    token: '',
    retrieveSessionInfo(shardId) {
        // Hand out a fake session at first so the shard "resumes" against the local server.
        if (initial) {
            return {
                shardId,
                shardCount: 1,
                sequence: 1337,
                resumeURL: 'ws://localhost:8080',
                sessionId: 'owo',
            };
        }
        // No stored session after the first disconnect, so the shard connects normally.
        return null;
    },
    shardCount: 1,
    shardIds: [0],
    helloTimeout: 10000,
});

manager.on('debug', console.log);
manager.on('heartbeat', console.log);
manager.on('ready', console.log);

await manager.connect();
console.log('Connected');
```

#

we hijack the WS server it connects to using the resumeURL, lol

#

to test some weirder stuff like if it insta closes

dim oracle Mar 27, 2023, 2:41 PM

#

interesting

dusty dove Mar 27, 2023, 2:41 PM

#

@stable hatch you around to mess w my branch

#

im pushing now

stable hatch Mar 27, 2023, 2:41 PM

#

dusty dove vlad has been toying with things and edge cases using a fancy script

if thats fancy I should be paid 7 digits

dusty dove Mar 27, 2023, 2:41 PM

#

lmao

stable hatch Mar 27, 2023, 2:43 PM

#

dusty dove @stable hatch you around to mess w my branch

haven't touched ur branch btw

#

but i mean

dusty dove Mar 27, 2023, 2:43 PM

#

the zlib one?

#

dw this is probs more important

stable hatch Mar 27, 2023, 2:43 PM

#

ye

#

aite

stable hatch Mar 27, 2023, 2:43 PM

#

dusty dove import { WebSocketManager } from '@discordjs/ws'; import { REST } from '@d...

can u try this without ws.close too

dusty dove Mar 27, 2023, 2:43 PM

#

oh yea

#

sure

#

like, just let it time out?

stable hatch Mar 27, 2023, 2:43 PM

#

yes

dusty dove Mar 27, 2023, 2:44 PM

#

worked

stable hatch Mar 27, 2023, 2:44 PM

#

i'm more interested in the behavior in that condition

dusty dove Mar 27, 2023, 2:44 PM

#

mostly

stable hatch Mar 27, 2023, 2:44 PM

#

Deadge

dusty dove Mar 27, 2023, 2:44 PM

#

just a range error from AEE

#

lmao

#

```
➜ node --enable-source-maps vlad.mjs
{
  message: 'Connecting to ws://localhost:8080?v=10&encoding=json',
  shardId: 0
}
{ message: 'Waiting for event hello for 10000ms', shardId: 0 }
[WSS] Connected
Exception in PromiseRejectCallback:
file:///home/didinele/Documents/Code/didinele/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:308
    }, "wrappedFn");
    ^

RangeError: Maximum call stack size exceeded

Exception in PromiseRejectCallback:
file:///home/didinele/Documents/Code/didinele/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:308
    }, "wrappedFn");
    ^

RangeError: Maximum call stack size exceeded

Exception in PromiseRejectCallback:
file:///home/didinele/Documents/Code/didinele/discord.js/node_modules/@vladfrangu/async_event_emitter/dist/index.mjs:308
    }, "wrappedFn");
    ^

RangeError: Maximum call stack size exceeded

{
  message: 'Destroying shard\n' +
    '\tReason: Something timed out or went wrong while waiting for an event\n' +
    '\tCode: 1000\n' +
    '\tRecover: Reconnect',
  shardId: 0
}
{
  message: 'Connection status during destroy\n\tNeeds closing: true\n\tReady state: 1',
  shardId: 0
}
[WSS] Disconnected
{
  message: 'Connecting to wss://gateway.discord.gg?v=10&encoding=json',
  shardId: 0
}
{ message: 'Waiting for event hello for 10000ms', shardId: 0 }
{
  message: 'Preparing first heartbeat of the connection with a jitter of 0.6635785282947928; waiting 27372ms',
  shardId: 0
}
{ message: 'Waiting for identify throttle', shardId: 0 }
{
  message: 'Identifying\n\tshard id: 0\n\tshard count: 1\n\tintents: 0\n\tcompression: none',
  shardId: 0
}
{ message: 'Waiting for event ready for 15000ms', shardId: 0 }
{
  data: {
    v: 10,
    user_settings: {},
    user: {
      verified: true,
     ----------- [snip] ----------
  },
  shardId: 0
}
Connected
```

#

huh, verified: true

#

what bot token have I been using

#

kek

stable hatch Mar 27, 2023, 2:45 PM

#

dusty dove huh, `verified: true`

All bots have this

dusty dove Mar 27, 2023, 2:45 PM

#

do they

stable hatch Mar 27, 2023, 2:45 PM

#

Its not the verified bot flag

dusty dove Mar 27, 2023, 2:45 PM

#

oh its email verified

#

lol

#

yes that makes a lot of sense

stable hatch Mar 27, 2023, 2:46 PM

#

dusty dove ➜ node --enable-source-maps vlad.mjs { message: 'Connecting to ws://localh...

Oh it did work

#

Past the aee errors

#

Nice

gloomy skyBOT Mar 27, 2023, 2:47 PM

#

pr_draft #9282 in discordjs/discord.js by didinele created <t:1679928436:R> (review required)
refactor(WebSocketShard): waitForEvent and its error handling

dusty dove Mar 27, 2023, 2:47 PM

#

have fun

stable hatch Mar 27, 2023, 2:47 PM

#

I sure love getting 5 notifications whenever a pr is open

dusty dove Mar 27, 2023, 2:48 PM

#

this should really iron things out

dim oracle Mar 27, 2023, 2:50 PM

#

watch there be many edge cases after this anyways

stable hatch Mar 27, 2023, 2:50 PM

#

dim oracle watch there be many edge cases after this anyways

Shhh

dusty dove Mar 27, 2023, 2:50 PM

#

well of course

#

but i've addressed a p fundamental flaw

#

lmao

stable hatch Mar 27, 2023, 2:50 PM

#

We're just glad you're reporting these and that, overall, ws has been more stable than djs ws

dusty dove Mar 27, 2023, 2:51 PM

#

yeah honestly

#

its been better out the gate

#

any version past 0.3

stable hatch Mar 27, 2023, 2:51 PM

#

Gives me more confidence for 14.10

dusty dove Mar 27, 2023, 2:51 PM

#

that didn't have the send bug that caused all shards to eventually reconn loop

dim oracle Mar 27, 2023, 2:53 PM

#

Running debug logs for a week straight taught me I need to set a size limit for the logfile

stable hatch Mar 27, 2023, 3:46 PM

#

@dusty dove heres the thing with your emitting of error events in waitForEvent

#

it will ALWAYS throw the error

#

because just like in node, emitting an error event when theres no error listeners will throw

#

so the destroy call will never happen

#

and same with the return

dusty dove Mar 27, 2023, 3:47 PM

#

wait what

#

I thought the throw was async on the next tick or something

#

not on the .emit call

stable hatch Mar 27, 2023, 3:47 PM

#

its on the emit call

dusty dove Mar 27, 2023, 3:47 PM

#

at least that's how native EE behaves IME

#

huh really

#

I must be misremembering

stable hatch Mar 27, 2023, 3:47 PM

#

no, native ee also does it

#

no listeners on error event = throw error
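
The synchronous throw is easy to confirm with Node's built-in `EventEmitter`. A minimal sketch follows; `emitThenCleanup` is a hypothetical stand-in for "emit the error, then destroy and return", not the real shard code.

```typescript
import { EventEmitter } from 'node:events';

// Records which steps run around an 'error' emit.
function emitThenCleanup(emitter: EventEmitter): string[] {
	const steps: string[] = [];
	try {
		steps.push('emit');
		// With no 'error' listener attached, this call itself throws.
		emitter.emit('error', new Error('The operation was aborted'));
		// Only reached when some listener handled the error.
		steps.push('destroy');
	} catch {
		steps.push('threw');
	}
	return steps;
}

const bare = emitThenCleanup(new EventEmitter());

const guarded = new EventEmitter();
guarded.on('error', () => {});
const handled = emitThenCleanup(guarded);

console.log(bare); // [ 'emit', 'threw' ]
console.log(handled); // [ 'emit', 'destroy' ]
```

So anything placed after the emit (the destroy call, the return) is skipped unless an `error` listener is bound somewhere.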

dusty dove Mar 27, 2023, 3:48 PM

#

what's interesting is if you run simple strategy there'll always be a bound error event anyway

#

so the throw ends up coming from the manager

dusty dove Mar 27, 2023, 3:49 PM

#

stable hatch @dusty dove heres the thing with your emitting of error events in wait...

I guess I just get rid of it?

stable hatch Mar 27, 2023, 3:49 PM

#

i mean u only emit it for abort errors

#

which

#

as i've said before

dusty dove Mar 27, 2023, 3:49 PM

#

yeah, was just a consistency thing

stable hatch Mar 27, 2023, 3:49 PM

#

is kinda useless

dusty dove Mar 27, 2023, 3:50 PM

#

fair

stable hatch Mar 27, 2023, 3:50 PM

#

I mean an abort error emitted in error events is kinda...useless?

#

especially since the method can reject instead

dusty dove Mar 27, 2023, 3:50 PM

#

ye, done

#

either way

#

I like this solution

#

I'm finally actually pleased with waitForEvent

sullen snow Mar 28, 2023, 11:42 AM

#

```
    at [kNewListener] (node:internal/event_target:514:17)
    at eventEmitter.<computed> (node:internal/worker/io:307:12)
    at MessagePort.addEventListener (node:internal/event_target:623:23)
    at MessagePort.on (node:internal/event_target:873:10)
    at VanguardBootstrap.setupThreadEvents (/main/node_modules/@discordjs/ws/src/utils/WorkerBootstrapper.ts:83:5)
    at VanguardBootstrap.bootstrap (/main/node_modules/vanguard/src/worker/VanguardBootstrap.ts:51:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
```
@dusty dove there is a memory leak issue from event emitter on thread ws

#

not sure how it started, but I doubt this is from my code

dusty dove Mar 28, 2023, 11:49 AM

#

sullen snow (node:41) MaxListenersExceededWarning: Possible EventTarget memory leak detec...

can u show me ur worker src

sullen snow Mar 28, 2023, 11:49 AM

#

dusty dove can u show me ur worker src

https://github.com/Deivu/Vanguard/blob/master/src/worker/VanguardBootstrap.ts

#

there is some unneeded code there that I dont use like the extendederrordata since d.js supports it already

dusty dove Mar 28, 2023, 11:52 AM

#

that implies the setup method is called a bunch of times? hm

sullen snow Mar 28, 2023, 11:52 AM

#

yes

#

not sure why as well

#

you can probably prepatch it by just checking if the method was initialized once, but then again that would mean this would be a patch fix rather than fixing it from the base

dusty dove Mar 28, 2023, 11:53 AM

#

id rather you hacked in a trace that figures out where its being called from

sullen snow Mar 28, 2023, 11:53 AM

#

not sure how I can make a trace for that

dusty dove Mar 28, 2023, 11:54 AM

#

like so:

```ts
// Capture a stack trace at this point to see where the method is being called from.
const err = new Error();
Error.captureStackTrace(err);

console.log(err);
```

dusty dove Mar 28, 2023, 11:54 AM

#

used this trick a bunch to debug shard

sullen snow Mar 28, 2023, 11:54 AM

#

is trace warnings not complete in this regard?

dusty dove Mar 28, 2023, 11:54 AM

#

doesnt seem to be, since it only goes to bootstrap

#

but i guess an error trace wont help more either then

#

idk, if setup is only called in bootstrap that'd imply bootstrap is called multiple times

#

wait

#

is that the only warning u got

#

setup makes multiple listeners

#

u shouldve gotten one for each event i feel

#

ohhh wait

#

@sullen snow whats your shardsPerWorker

#

this might actually just not be a leak

#

because https://github.com/discordjs/discord.js/blob/main/packages/ws/src/strategies/context/WorkerContextFetchingStrategy.ts#L22

#

we do actually just bind that many listeners

#

one per shard

sullen snow Mar 28, 2023, 11:58 AM

#

dusty dove @sullen snow whats your shardsPerWorker

1

dusty dove Mar 28, 2023, 11:58 AM

#

nvm then

#

lol

sullen snow Mar 28, 2023, 11:58 AM

#

thats why theoretically it should not emit

dusty dove Mar 28, 2023, 11:59 AM

#

does this repro every startup

sullen snow Mar 28, 2023, 11:59 AM

#

nope usually it emits after some time

dim oracle Mar 28, 2023, 11:59 AM

#

ehh this does occur for every cluster we spawn

#

so if we spawn 60 clusters during startup we get the message 60 times

dusty dove Mar 28, 2023, 12:00 PM

#

so like.. only after some time huh

#

v odd

sullen snow Mar 28, 2023, 12:00 PM

#

oh

dim oracle Mar 28, 2023, 12:00 PM

#

nah directly after we spawn the cluster iirc

sullen snow Mar 28, 2023, 12:00 PM

#

my bad then

#

CheshireXD

dim oracle Mar 28, 2023, 12:01 PM

#

but only past a certain amount of shards per cluster, so e.g. for like 2 shards per cluster we don't get the message

#

did some testing this morning, didn't mention that to saya yet ^^

dusty dove Mar 28, 2023, 12:06 PM

#

but for 3 you do?

dim oracle Mar 28, 2023, 12:06 PM

#

didn't test at what point we got the message, can do in a bit though

dusty dove Mar 28, 2023, 12:07 PM

#

ye would be helpful

dim oracle Mar 28, 2023, 12:36 PM

#

dusty dove ye would be helpful

So we get it at 10 shards per cluster, anything below that does not trigger it

#

And we get it instantly after we launch the cluster

dusty dove Mar 28, 2023, 12:45 PM

#

dim oracle So we get it at `10` shards per cluster, anything below that does not trigger it

lol ok then its not a leak
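
The threshold of 10 lines up with Node's default listener limit. Here is a rough sketch of the arithmetic, assuming one `message` listener per shard plus one more bound by the bootstrapper. That extra listener is an assumption read off the stack trace, and a plain `EventEmitter` stands in for the worker's `MessagePort`.

```typescript
import { EventEmitter } from 'node:events';

// Counts 'message' listeners on a stand-in port for a given shardsPerWorker.
function countMessageListeners(shardsPerWorker: number): number {
	const port = new EventEmitter();
	// Silence the real warning; this sketch only reports the count.
	port.setMaxListeners(Infinity);
	for (let shard = 0; shard < shardsPerWorker; shard++) {
		port.on('message', () => {}); // one listener per shard
	}
	port.on('message', () => {}); // assumed: one listener from the bootstrapper
	return port.listenerCount('message');
}

// The default limit is 10 unless something changed it.
const limit = EventEmitter.defaultMaxListeners;

console.log(countMessageListeners(9) > limit); // false: 10 listeners, no warning
console.log(countMessageListeners(10) > limit); // true: 11 listeners, warning
```

The warning is only a heuristic: it fires once the listener count for one event goes over the limit, whether or not anything is actually leaking.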

dusty dove Mar 28, 2023, 12:45 PM

#

dusty dove because <https://github.com/discordjs/discord.js/blob/main/packages/ws/src/strat...

^

dim oracle Mar 28, 2023, 12:47 PM

#

cc @sullen snow I guess

sullen snow Mar 28, 2023, 12:48 PM

#

we usually run around

#

1 shard per worker

#

@dim oracle are you changing how many shards we run

#

worker !== cluster

#

@dusty dove we run the same amount of threads for our websocket

#

so basically if we run 32 shards

#

thats 32 threads

dim oracle Mar 28, 2023, 12:49 PM

#

Used to be 32 per cluster, now its 24 per cluster to match the core count of the dedi

dusty dove Mar 28, 2023, 12:49 PM

#

make up your minds derpsnail

sullen snow Mar 28, 2023, 12:49 PM

#

probably its just confusing but

#

the structure is like this

#

master process -> cluster process -> thread for websocket

#

where we run 24 threads in that cluster

#

each thread handles 1 websocket

#

if you ask me why I do that, it's because I want each ws thread to have its own dedicated event loop, and its not that expensive to spawn them once, its not like they're being spawned every time

#

this way heartbeats are as accurate as they can be
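
The layout described above (cluster process, then one thread per websocket) can be sketched as a simple assignment of shard ids to threads. `planThreads` and `ThreadPlan` are hypothetical names for illustration; only the `shardsPerWorker` idea comes from the discussion.

```typescript
// One entry per worker thread inside a single cluster process.
interface ThreadPlan {
	threadId: number;
	shardIds: number[];
}

// Splits a cluster's shard ids into groups of `shardsPerWorker`.
function planThreads(shardIds: number[], shardsPerWorker: number): ThreadPlan[] {
	if (!Number.isInteger(shardsPerWorker) || shardsPerWorker < 1) {
		throw new RangeError('shardsPerWorker must be a positive integer');
	}
	const plans: ThreadPlan[] = [];
	for (let index = 0; index < shardIds.length; index += shardsPerWorker) {
		plans.push({ threadId: plans.length, shardIds: shardIds.slice(index, index + shardsPerWorker) });
	}
	return plans;
}

// A cluster holding 24 shards with 1 shard per worker ends up with 24 threads.
const clusterShards = Array.from({ length: 24 }, (_, id) => id);
const plan = planThreads(clusterShards, 1);
console.log(plan.length); // 24
```

With one shard per thread, each websocket gets its own event loop, which is the stated reason for the setup.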
