My bot is at a point where it's too big for its own good and im on the verge of shutting it down. Bot works completely fine while testing locally, soon as it goes to prod it breaks and its because of my caching/presence event being blasted a million times a second. I was testing on prod yesterday and got hit with the "you are out of shards for the day, resets at blah blah"
#Anyone know of any tools to test a bot at scale?
So you want to just send a fuck ton of events locally right?
Hmm, could call the state.py method with example payloads
how would i go about doing that tho and at scale
Can you find out roughly how many events your bot is getting per cluster?
i can try when my discord decides to reset my shards
latest count of just pure members, was around 11 million
Christ that'd do it
Try reset your token
might work, not sure if it does anymore
Do you know exactly what the issue is? Cos id suggest make a couple example payloads and just throw some random ids in em and asyncio gather them to your state.py event method
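A minimal sketch of that idea, assuming the handler is an async callable that takes the raw payload dict. `make_presence_payload`, `flood`, and `fake_handler` are hypothetical names, and the payload fields are only loosely modeled on a gateway PRESENCE_UPDATE. The real state.py method may expect more fields, and may be synchronous, in which case you would call it in a plain loop instead of gathering.

```python
import asyncio
import random


def make_presence_payload(rng: random.Random) -> dict:
    """Build a fake PRESENCE_UPDATE-style payload with random snowflake-like ids."""
    return {
        "user": {"id": str(rng.getrandbits(60))},
        "guild_id": str(rng.getrandbits(60)),
        "status": rng.choice(["online", "idle", "dnd", "offline"]),
        "client_status": {"desktop": "online"},
        "activities": [
            {"type": 4, "name": "Custom Status", "id": "custom", "state": "discord.gg/example"}
        ],
    }


async def flood(handler, count: int, seed: int = 0) -> int:
    """Send `count` fake payloads to `handler` concurrently; return how many completed."""
    rng = random.Random(seed)
    payloads = [make_presence_payload(rng) for _ in range(count)]
    results = await asyncio.gather(*(handler(p) for p in payloads))
    return len(results)


async def fake_handler(payload: dict) -> None:
    # Hypothetical stand-in for the bot's real presence handling.
    await asyncio.sleep(0)
```

Something like `asyncio.run(flood(fake_handler, 100_000))` then gives a local burst to profile against, with the real handler swapped in for `fake_handler`.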
sooo i just realized im making a db call, caching the data all in the presence update event.
it would prob help if i cache all the data before hand then enabled the event
cause after about 3 mins of the event being active it nukes itself.
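A rough sketch of caching up front, assuming settings fit in memory and can be loaded per guild id. `SettingsCache` and the `loader` callable are hypothetical; the real `get_or_create_settings` may behave differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional


class SettingsCache:
    """In-memory guild settings cache, filled once before the listener is attached."""

    def __init__(self, loader: Callable[[int], Awaitable[Optional[dict]]]) -> None:
        self._loader = loader  # hypothetical async DB lookup
        self._data: Dict[int, Optional[dict]] = {}

    async def preload(self, guild_ids: Iterable[int]) -> None:
        # Pay the DB cost once at startup instead of once per presence event.
        for guild_id in guild_ids:
            self._data[guild_id] = await self._loader(guild_id)

    def get(self, guild_id: int) -> Optional[dict]:
        # Plain dict lookup: no awaiting and no DB round trip in the hot path.
        return self._data.get(guild_id)
```

The event handler would then only call `cache.get(after.guild.id)`, and anything that changes settings would also have to update the cache.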
Hahaha yes
oh shoot forgot about this, i believe it does work.
Orrrrr I have something for you to reduce load, but first. How up to date do you need presence data?
Is there a difference between having it within 30s of starting and 5mins
Like I spose slower boot but being fine would also be nice right
soo pretty much anytime someone adds a specific custom status to their status the bot adds a role to them or removes it if they remove it.
nah startup idc it can take however long it needs to tbh
Then something like this maybe?
Whenever you get a presence event come in your throw it into a asyncio.Queue and call it done.
Then somewhere else you define a method to handle a presence event with your logic n caching and whatever. Then when you start your bot you simply run a few of these methods as asyncio.Task's in the background forever consuming from the queue
It'd take some playing to figure out how many consumers to have at once without causing a massive bottleneck / backpressure issues but it could be a nice way to introduce managed processing to something which currently seems like it struggles to handle the bursts of events from things like startup
I'm not 100% sure itd work but based on how you described your issue its what Id guess at
what exactly would i put into the queue itself
cause my presence event looks like
async def on_presence_update(before: disnake.Member, after: disnake.Member):
    settings = await get_or_create_settings(after.guild)
    _embed = await get_or_create_embed(before.guild)
    if settings is None:
        return
    role1 = after.guild.get_role(settings.role1)
    role2 = after.guild.get_role(settings.role2) if settings.role2 else None
    if role1 is None:
        return
    if not after.guild.me.guild_permissions.manage_roles:
        return
    if settings_should_skip(after, settings):
        return
    if should_handle_vanity(after, settings):
        return await handle_vanity(before, after, settings, role1, role2, _embed)
    if should_handle_removed_vanity(after, settings):
        await handle_removed_vanity(before, after, settings, role1, role2, _embed)
so i would assume implementing the queue wouldn't be too bad
queue.put_nowait((before, after))
Then just move everything you currently have to a method which runs as a task with the information being fetched from the queue
data = await queue.get()
while True:
    # Your stuff here
    queue.task_done()
    data = await queue.get()
how often should i run the task
You could also refactor your code so that the if ... return statements are right at the start to avoid un-needed processing in some cases
The task would run forever
You start X tasks that all handle shit from the queue for the lifetime of your bot
yeee definitely needed
You'll need multiple consumers as well likely, otherwise the queue will forever grow and never empty. But that'll take some tinkering
I.e. (average queue insertions per X) * (average runtime of your method) + 1 or some shit like that
Could be worth a shot
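As a worked version of that estimate: the number of workers busy at once is roughly arrival rate times time per event. `estimate_workers` and its `headroom` factor are hypothetical, and this only holds if the time per event is mostly spent awaiting I/O. CPU-bound work in one event loop will not speed up with more tasks.

```python
import math


def estimate_workers(events_per_second: float, seconds_per_event: float, headroom: float = 1.5) -> int:
    """Rough consumer count: arrival rate * processing time, padded for bursts."""
    busy = events_per_second * seconds_per_event  # workers occupied on average
    return max(1, math.ceil(busy * headroom))
```

For example, 800 events per second at 0.125 s each keeps about 100 workers busy, so with the default headroom of 1.5 this suggests 150.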
yeee im giving it a go rn.
i learned python an unconventional way so this is somewhat new to me lmao.
this, where would i start/create the task that calls the function for all the q stuff?
I do something like this https://github.com/suggestionsbot/suggestions-bot/blob/master/suggestions/bot.py#L578
https://github.com/suggestionsbot/suggestions-bot/blob/master/main.py#L57
But give me a bit and I'll make something for you
suggestions/bot.py line 578
async def load(self):
main.py line 57
await bot.load()
async def on_presence_update(before: disnake.Member, after: disnake.Member):
    queue.put_nowait((before, after))

async def process():
    while True:
        before, after = await queue.get()
        queue.task_done()
        # Cheapest check first, before any settings lookup
        if not after.guild.me.guild_permissions.manage_roles:
            continue
        settings = await get_or_create_settings(after.guild)
        if settings is None:
            continue
        # settings has to be fetched before it can be passed to this check
        if settings_should_skip(after, settings):
            continue
        role1 = after.guild.get_role(settings.role1)
        role2 = after.guild.get_role(settings.role2) if settings.role2 else None
        if role1 is None:
            continue
        _embed = await get_or_create_embed(before.guild)
        if should_handle_vanity(after, settings):
            # continue, not return: a return here would end this worker for good
            await handle_vanity(before, after, settings, role1, role2, _embed)
            continue
        if should_handle_removed_vanity(after, settings):
            await handle_removed_vanity(before, after, settings, role1, role2, _embed)

async def load():
    # Five consumers to start with; tune the count later
    for _ in range(5):
        asyncio.create_task(process())
Something like that
Otherwise go profile your prod bot and figure out exactly what your bot's dying from and what needs speeding up
Prolly that tho
I assume you also already have speedups installed?
Yea give it a go, see what happens
You should also log the queue size every so often to see if you're keeping up or need to start more tasks to consume from it
Should help alleviate the pressure from burst updates
Other than that you'll likely still have other issues from stuff
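A small sketch of that periodic queue-size logging, assuming the queue is a plain `asyncio.Queue`. The logger name and the `report_queue_size` helper are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("presence_queue")  # hypothetical logger name


async def report_queue_size(queue: asyncio.Queue, interval: float = 30.0) -> None:
    """Log the queue depth every `interval` seconds until the task is cancelled."""
    while True:
        log.info("presence queue size: %d", queue.qsize())
        await asyncio.sleep(interval)
```

Started once next to the consumers with `asyncio.create_task(report_queue_size(queue))`, a number that keeps climbing between log lines means the consumers are not keeping up.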
with this, beforehand i was
@plugin.load_hook(post=True)
async def wait_until_ready():
    await plugin.bot.wait_until_ready()
    #some other shit
    await asyncio.sleep(120)
    plugin.bot.add_listener(on_presence_update, "on_presence_update")
should i still do or let it rip now that it's using queues.
Shouldddd be fine to let it rip I reckon
Essentially your workers will consume at a constant rate, and as long as that rate is higher than the input rate she'll be all good. It should handle bursts a lot better because the only effect of a burst is that events sit in the queue for a while. And given you don't need it to be realtime it can work on it as it pleases
That's the theory at least
while i have you here, i noticed you made a post about this before but thing is im not using a proxy lmao
legit spamming
until it finally connects
im assuming its because all my shards within the clusters are all trying to connect at once?
ima let the bot run for like an hour and see what happens, right now im not getting any logging from the presence event. but all cmds work fine, the functions within the presence event aren't being called so we will see what happens.
Yea idk tbh without looking into it further I cant remember
safe to say need more workers? lmao
thats just one cluster also
oh my god
How many do you have lmao
May uh, may want more yes
If you want lil more complicated you could have a set amount of 'forever' workers and then scale as appropriate lol
Can you try do some maths to find out how many events are added per second? And how long it takes for one loop to complete
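One way to get those two numbers, sketched with a hypothetical `QueueStats` helper: call `record_enqueue()` where the event is put on the queue, and time each loop iteration with `time.monotonic()` before passing the duration to `record_processed()`.

```python
import time
from dataclasses import dataclass, field


@dataclass
class QueueStats:
    """Counts enqueued events and accumulates per-event processing time."""

    started_at: float = field(default_factory=time.monotonic)
    enqueued: int = 0
    processed: int = 0
    busy_seconds: float = 0.0

    def record_enqueue(self) -> None:
        self.enqueued += 1

    def record_processed(self, seconds: float) -> None:
        self.processed += 1
        self.busy_seconds += seconds

    def events_per_second(self) -> float:
        elapsed = time.monotonic() - self.started_at
        return self.enqueued / elapsed if elapsed > 0 else 0.0

    def average_seconds_per_event(self) -> float:
        return self.busy_seconds / self.processed if self.processed else 0.0
```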
Like im not surprised its climbing lol
well i had 5 bumped it to 15, bot was working fine in my server, queue was down to 0, but some servers werent working at all(as in roles not being added for custom status)? changed some things reset bot now its not working but queue is at 0?
im so lost because all cmds work fine
but anything inside the presence update event is a hit or miss
I mean tasks suppress errors so maybe thats it
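A sketch of making worker failures visible, assuming the workers are started with `asyncio.create_task`. An exception inside a task is only reported when the task is awaited or garbage collected, so a done-callback can log it as soon as the task dies. `log_task_result`, `spawn_logged`, and the logger name are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("presence_workers")  # hypothetical logger name


def log_task_result(task: asyncio.Task) -> None:
    """Done-callback that logs a task's exception instead of letting it go unnoticed."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        log.error("worker %s crashed", task.get_name(), exc_info=exc)


def spawn_logged(coro, name: str) -> asyncio.Task:
    """Create a named task and attach the logging callback."""
    task = asyncio.create_task(coro, name=name)
    task.add_done_callback(log_task_result)
    return task
```

Replacing `asyncio.create_task(process())` with `spawn_logged(process(), "presence-0")` would show whether consumers are dying quietly.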
do i dare enable debug in logger
wtf
why is it dispatching events already?
this is the second i start the bot it floods logs
and it shouldn't be
Try deleting pycache
@plugin.load_hook(post=True)
async def wait_until_ready():
    await plugin.bot.wait_until_ready()
    #bunch of junk
    logging.info("Cached guilds. Starting presence updates. Took %.2fs", time.monotonic() - started_at)
    plugin.bot.add_listener(on_presence_update, "on_presence_update")
the listener gets added here
[2024-03-04 09:06:29,483] DEBUG [disnake.client.dispatch:750] Dispatching event socket_event_type
[2024-03-04 09:06:29,483] DEBUG [disnake.client.dispatch:750] Dispatching event raw_presence_update
[2024-03-04 09:06:29,484] DEBUG [disnake.client.dispatch:750] Dispatching event presence_update
[2024-03-04 09:06:29,489] DEBUG [disnake.gateway.received_message:553] For Shard ID 43: WebSocket Event: {'t': 'PRESENCE_UPDATE', 's': 5129, 'op': 0, 'd': {'user': {'id': '1129948038643843084'}, 'status': 'dnd', 'guild_id': '1071267739580252170', 'client_status': {'desktop': 'dnd'}, 'broadcast': None, 'activities': [{'type': 0, 'timestamps': {'start': 1709536768000}, 'state': ' Speeding on Pillbox Hill', 'session_id': '9fa180939190229eadcf98c578d6a896', 'name': 'TPLA', 'id': '22a55f840028e879', 'details': 'Players: 164/260 | Queue: 0 Players', 'created_at': 1709543189426, 'buttons': ['Tebex', 'Discord'], 'assets': {'large_text': 'TPLA', 'large_image': '1178868371194904627'}, 'application_id': '1144568003195850803'}]}}
[2024-03-04 09:06:29,489] DEBUG [disnake.client.dispatch:750] Dispatching event socket_event_type
[2024-03-04 09:06:29,489] DEBUG [disnake.client.dispatch:750] Dispatching event raw_presence_update
[2024-03-04 09:06:29,490] DEBUG [disnake.client.dispatch:750] Dispatching event presence_update
[2024-03-04 09:06:29,490] DEBUG [disnake.gateway.received_message:553] For Shard ID 43: WebSocket Event: {'t': 'PRESENCE_UPDATE', 's': 5130, 'op': 0, 'd': {'user': {'id': '743956688926801960'}, 'status': 'dnd', 'guild_id': '778438158605615115', 'client_status': {'desktop': 'dnd'}, 'broadcast': None, 'activities': [{'type': 0, 'timestamps': {'start': 1709542604000}, 'state': 'In A Squad', 'party': {'size': [4, 4], 'id': '549744c92c72032bbd5fd4fedab33f6a'}, 'name': 'Fortnite', 'id': 'baa5df061b353164', 'details': 'Battle Royale - 25 Remaining', 'created_at': 1709543189444, 'assets': {'small_text': 'Tier 100', 'small_image': '443127519386927104'}, 'application_id': '432980957394370572'}, {'type': 0, 'session_id': 'd0f00b9e0f7910626de37c1d622eddde', 'name': 'Rainbow Six Siege', 'id': '9ba7c6776a719ec4', 'flags': 1, 'details': 'in MENU', 'created_at': 1709539055002, 'assets': {'large_image': '446301881636225042'}, 'application_id': '445956193924546560'}]}}
[2024-03-04 09:06:29,490] DEBUG [disnake.client.dispatch:750] Dispatching event socket_event_type
[2024-03-04 09:06:29,490] DEBUG [disnake.client.dispatch:750] Dispatching event raw_presence_update
[2024-03-04 09:06:29,492] DEBUG [disnake.client.dispatch:750] Dispatching event presence_update
[2024-03-04 09:06:29,539] DEBUG [disnake.gateway.received_message:553] For Shard ID 43: WebSocket Event: {'t': 'PRESENCE_UPDATE', 's': 5131, 'op': 0, 'd': {'user': {'id': '1069411509698035713'}, 'status': 'online', 'guild_id': '1058361754016546878', 'client_status': {'web': 'online'}, 'broadcast': None, 'activities': [{'type': 4, 'state': 'discord.gg/member-service', 'name': 'Custom Status', 'id': 'custom', 'created_at': 1709543189459}]}}
[2024-03-04 09:06:29,540] DEBUG [disnake.client.dispatch:750] Dispatching event socket_event_type
[2024-03-04 09:06:29,540] DEBUG [disnake.client.dispatch:750] Dispatching event raw_presence_update
[2024-03-04 09:06:29,540] DEBUG [disnake.client.dispatch:750] Dispatching event presence_update
There are default listeners for things iirc
What does this aim to solve?
so the
[2024-03-04 09:06:29,540] DEBUG [disnake.client.dispatch:750] Dispatching event raw_presence_update
[2024-03-04 09:06:29,540] DEBUG [disnake.client.dispatch:750] Dispatching event presence_update
are just fired soon as the bot starts?
cause its blowing up logs lol
disnake/state.py line 971
def parse_presence_update(self, data: gateway.PresenceUpdateEvent) -> None:
disnake/client.py line 749
def dispatch(self, event: str, *args: Any, **kwargs: Any) -> None:
It will attempt to dispatch the event, and thus log, regardless of if you have listeners or not
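If the goal is just to stop the flood while keeping debug output for your own code, one option is raising the level on the library loggers only. The logger names here are an assumption read off the `disnake.client` and `disnake.gateway` prefixes in the log lines above.

```python
import logging

# Keep DEBUG for your own modules...
logging.basicConfig(level=logging.DEBUG)

# ...but drop the per-event dispatch and websocket lines (assumed logger names).
for noisy in ("disnake.client", "disnake.gateway"):
    logging.getLogger(noisy).setLevel(logging.INFO)
```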
Indeed
could i be getting rate limited?
addings roles/sending a message to a channel
like global?
Aight gl
appreciate the help, def gonna keep the q implementation
Any idea what could cause the events to not work on some clusters but work on others? im using ur cluster/shard setup so all clusters are exact same code etc. but only some are giving roles/sending notification when a user changes their custom status and other clusters just don't at all. but all clusters, the commands/interacting with the bot works fine?
queue size on the non-working clusters is active, as it's constantly consuming the queue so the event/function itself is working.
figured it out. now it's just a matter of processing the queue faster cause look at this shit.
{
    "cluster7": {
        "size": 284302
    },
    "cluster5": {
        "size": 337271
    },
    "cluster4": {
        "size": 273814
    },
    "cluster9": {
        "size": 252700
    },
    "cluster3": {
        "size": 239151
    },
    "cluster6": {
        "size": 274352
    },
    "cluster2": {
        "size": 219467
    },
    "cluster8": {
        "size": 268732
    },
    "cluster1": {
        "size": 0
    }
}
thats w 50 consumers
a cluster
and its just rapidly growing
yeee so it boils down to consumers. i upped clusters and spawned 100 consumers and more clusters are working and most are 0 qsize but some are still constantly climbing. assuming its just more active users on some clusters than others
ive got alot of power soooo just a matter of optimizing it all
Yea, you could build something to auto scale consumers
If it's not keeping up, spawn more for a bit
That kinda vibe
yeee im just do sumn like
for i in range(plugin.bot.queue.qsize() // 2):
    asyncio.create_task(process())
thats as far as my brain can math tbf
how would I cancel/delete em after the queue is brought down some
how shit is something like this?
plugin.bot.queue.put_nowait((before, after))
if plugin.bot.queue.qsize() > 5000:
    for _ in range(plugin.bot.queue.qsize() // 2):
        asyncio.create_task(process())
    return
else:
    loop = asyncio.get_event_loop()
    tasks = list(asyncio.all_tasks(loop))
    for t in tasks[100:]:
        t.cancel()
Well the current ones, at least as I provided, are forever loops which means it won't work
You need a way to signal to the task to return for cancel to actually cancel
fuck me
wdym by this, like storing the task object i created then task.cancel like that?? or is it something like adding a check in the task itself to see if it's been requested to cancel?
Hmm actually given you await a lot you could likely get away with a simple check
async def load():
    for i in range(size):
        create_task(process(i, cancel_set))
        queues.add(i)
...
async def process(_id, cancel: set[int]):
    while True:
        if _id in cancel:
            return
        ... process stuff
And just add ids to cancel set when you want em to die on next loop
what's the queues.add(i) referring to? i understand the rest.
or is that just another set/list of tasks that's been created
Its just a list of all tasks
So if you wanna kill half the consumers, iterate over half the list
It could also be
queue: list[tuple[int, Task]]
queue.append((i, create_task(process(i, cancel))))
...
i, task = queue.pop()
task.cancel()
cancel.add(i)
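A sketch of a resizable pool built on stored task handles. Note that `task.cancel()` raises `CancelledError` inside the task at its next `await`, so a worker blocked on `queue.get()` does stop when cancelled; the id set is then optional. `WorkerPool` and `handle` are hypothetical names. Cancelling only tasks you created yourself also avoids touching the library's own tasks, which `asyncio.all_tasks()` would include.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

log = logging.getLogger("presence_workers")  # hypothetical logger name


class WorkerPool:
    """Keeps handles to consumer tasks so the pool can grow and shrink."""

    def __init__(self, queue: asyncio.Queue, handle: Callable[[object], Awaitable[None]]) -> None:
        self.queue = queue
        self.handle = handle  # hypothetical per-event coroutine
        self.tasks: List[asyncio.Task] = []

    async def _worker(self) -> None:
        while True:
            item = await self.queue.get()  # a cancel lands on an await like this one
            try:
                await self.handle(item)
            except Exception:
                # Log and keep going so one bad event does not kill the worker.
                log.exception("presence handler failed")
            finally:
                self.queue.task_done()

    def resize(self, target: int) -> None:
        """Spawn or cancel workers until exactly `target` are tracked."""
        while len(self.tasks) < target:
            self.tasks.append(asyncio.create_task(self._worker()))
        while len(self.tasks) > target:
            self.tasks.pop().cancel()
```

A periodic task could call `pool.resize(...)` based on `queue.qsize()`. One caveat: a worker cancelled in the middle of `handle` drops that event half-processed, so the flag-check approach is safer if that matters.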
@old lagoon im ab to glaze but dude you have helped me so much from code help and your libs. my bot lowkey be fucked without your help. its all working now, and this actually gave me an idea for premium version of my bot too with the queue system.
Haha no worries, love solving this kinda stuff (software at scale)
Let me guess, it runs zonis and function cooldowns
Possibly alaric as well if you use mongo as a backend