# Koyamie's caching hell
318 messages · Page 1 of 1 (latest)
hi
so basically here are my cache sizes
Don't disable the PermissionOverwriteManager, there's a reason altering it is unsupported
yeah was joking sorry lmao
purging users & messages are the key ones for me
bans also seems sensible to nuke to 0
yeah I use bans cache a lot
Ah
hmm
thinking about it, it doesn't seem that huge for a 1.4M-guild bot
What's your total usage again?
453GB right now
I'm at 3.5gb w/ 14k, so that roughly scales up
but in fact my issue is more that since the v13 update my RAM just steadily climbs until it hits the memory limit
and it does that in a few hours
instead of a week like before
my bot could run for 2 weeks straight without memory issues (it still got close to the limit at some moments, but never enough to kill the database and thereby the bot)
What caches seem to be climbing?
I'll need more time to see which, I just added cache sizes to my metrics
- I can notice some shards suddenly growing in RAM: average usage is 500MB, but sometimes I see a shard using 2GB
like right now
Hey guys
what
made a quick graph so you see the difference
I have to reboot approx 2 times per day right now
it takes 8 hours to reach ram limit
it was taking ~9 days before v13 and it was even working fine (see the 14-day graph), it was just playing with the limits but never to the point of making the db crash (which is the case right now)
there is definitely something wrong
this is past 2 days
this is prior to the real 13 release:
each line is a shard
now:
and it will keep increasing until death
here are my relevant client options:
```js
intents:
  Intents.FLAGS.GUILDS |
  Intents.FLAGS.GUILD_MEMBERS |
  Intents.FLAGS.GUILD_BANS |
  Intents.FLAGS.GUILD_EMOJIS_AND_STICKERS |
  Intents.FLAGS.GUILD_MESSAGES |
  Intents.FLAGS.GUILD_MESSAGE_REACTIONS |
  Intents.FLAGS.DIRECT_MESSAGES,
makeCache: Options.cacheWithLimits({
  ...Options.defaultMakeCacheSettings,
  ApplicationCommandManager: 0,
  GuildInviteManager: 0,
  GuildMemberManager: {
    maxSize: 2_000,
    sweepInterval: 900,
  },
  PresenceManager: 0,
  StageInstanceManager: 0,
  UserManager: {
    maxSize: 2_000,
    sweepInterval: 900,
  },
  VoiceStateManager: 0,
}),
```
this is interesting
im holding pretty steady now @ 14k guilds
looking back 56 days ago, so roughly 2 months, i wasn't much different @ 12k guilds
Koyamie, have you got any plots of collection sizes that correlate to that increase in mem usage?
This sounds super odd, smells of leak
Similar to koyamie, please log collection sizes so we can see if anything is growing rapidly that might be the cause
Fwiw, sweepInterval w/o a sweepFilter ain't gonna do anything
you're right, i thought it had a default filter for some reason
will do
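For reference, a lifetime-based sweep filter works roughly like this (a stand-alone plain-JS sketch; `filterByLifetime` and `SweepableCache` here are illustrative, though v13's LimitedCollection ships a similar helper):

```javascript
// A lifetime-based sweep filter, sketched as stand-alone plain JS.
// Everything here is illustrative, not the actual djs API.
function filterByLifetime(lifetimeSeconds) {
  // The factory is called once per sweep; it returns the predicate
  // deciding which entries get removed.
  return () => {
    const now = Date.now();
    return (entry) => now - entry.createdTimestamp > lifetimeSeconds * 1000;
  };
}

class SweepableCache extends Map {
  sweep(filterFactory) {
    const shouldRemove = filterFactory();
    let removed = 0;
    for (const [key, value] of this) {
      if (shouldRemove(value)) {
        this.delete(key);
        removed++;
      }
    }
    return removed;
  }
}

const cache = new SweepableCache();
cache.set('old', { createdTimestamp: Date.now() - 3_600_000 }); // 1 hour old
cache.set('new', { createdTimestamp: Date.now() });
cache.sweep(filterByLifetime(1800)); // remove entries older than 30 minutes
console.log([...cache.keys()]); // ['new']
```

Without a filter like this, an interval alone has nothing to apply, which is why the sweep never removes anything.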
not much, it just grows as bot gets new users/guild members in cache
guild members cache goes from 2M on boot to 6M when bot reach its limits
for users, 1M to 6M
heads up about this. I have made a workaround to run my shards in a "hybrid" or "clustering" mode, and this has (for now) fixed my memory issues. I'll continue monitoring, but it's heading in the right direction
cluster mode on the shardingmanager?
why does the up arrow put a sticker in the message
stupid discord
yes, I have implemented something that seems to be working: I run 41 sharding managers that each spawn 16 shards
but doing that means I have to handle identify concurrency somewhere else, otherwise you can say hello to a Cloudflare ban
I've got some news about this: using Node.js v16 seems to be the reason for the increased memory usage.
After talking to another dev, testing his d.js v12 code on Node.js v16 also significantly increases his memory usage.
We cannot try d.js v13 code on Node.js v14 because v16 is required, but I guess it could be possible if we just removed the v16 features from v13?
We'll eventually have to change either way, sadly :/
It's not like we can stay in v12/v14 forever
ikr
I would say, as a temporary solution
on a fork
my question is more "is it technically possible with v13"
It's really not
@queen chasm did you measure the memory impact of having potentially hundreds of thousands of WeakRef's?
(One MessageManager for each TextBasedChannel)
what's the reason behind that? do we use so many v16 features that it would break everything to stop using them?
Yeah, WeakRef is v16+
ah
alright thanks
We use it to cancel the sweep interval when it's removed from cache
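For context, the mechanism being described (clearing a sweep interval once its cache has been garbage-collected) can be sketched like this; it's an illustrative stand-alone version, not djs's actual code:

```javascript
// A FinalizationRegistry clears a cache's sweep interval some time after
// the cache itself is garbage-collected. The timer callback only holds a
// WeakRef so it doesn't keep the cache alive. GC timing is
// non-deterministic, so cleanup may lag minutes behind the cache's removal.
const registry = new FinalizationRegistry((interval) => {
  clearInterval(interval); // runs at some point after the cache is GCed
});

function makeSweptCache(sweepFn, sweepIntervalMs) {
  const cache = new Map();
  const ref = new WeakRef(cache); // avoid the closure pinning the cache
  const interval = setInterval(() => {
    const target = ref.deref(); // undefined once the cache has been GCed
    if (target) sweepFn(target);
  }, sweepIntervalMs).unref(); // unref() so the timer can't keep node alive
  registry.register(cache, interval);
  return cache;
}

const cache = makeSweptCache((c) => c.clear(), 60_000);
cache.set('k', 'v');
console.log(cache.get('k')); // 'v'
```

Note the caveat discussed later in the thread: each registered callback and held value costs memory, which adds up with hundreds of thousands of managers.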
I see
I only had abort controller in mind
I... think, maybe it was added in v12?
yes, but it worked via an npm package
now its native
AbortController and ??= (used first time in the https.Agent PR) are the only v15+ features
Guides use v16+'s packages (timers/promises) to enormously simplify a lot of things
when did weakrefs get added
With SweptCollection (now renamed, I think), Ckohen added it after I suggested it
Oh, so as part of v13 original
is it a v13 feature? because as said, d.js v12 was tested with Node 16 and the memory issue still appears
Yeah, WeakRef was used first time in djs v13 for sweeping archived text channels
But the feature itself is available since Node.js v12 iirc
If this is reproducible on v12 code, that might make it easier to compare two dumps?
Sorry for the confusion there 
I suppose you could check the memory usage before and after the SweptCollection, mind you it's also very outdated by now, and was long before the option helpers
Nah, if this is happening on v12, SweptCollection isn't the issue
But the truth is, SweptCollection has a very large memory impact
There are sort-of memory leaks in v12, namely when a channel's type changes between GUILD_TEXT and GUILD_NEWS: a lot of messages end up with a TextChannel instance instead of a NewsChannel one. But that happens very rarely, as the feature is rarely used
Wait, where does SweptCollection (LimitedCollection) use WeakRefs?
It's in the manager
Uh, GitHub doesn't give results
bruh

yeah i see no weakrefs anywhere now
Koyamie, if you can repro this on v12 code, maybe do two heap dumps on diff node versions and see if anything stands out?
Mind you, that bind creates a new function for each manager, which has a large impact
It's not WeakRef, but it's the feature created alongside it
Hm yeah that could be part of the mem diff between v12 and v13
but it sounds like this is more just a node v14 vs v16 issue, if its also happening on v12
Let's wait for the results then
I personally didn't get a memory increase when going from node v14 to v16, although I did when going from djs v12 to v13, badly
My bot has always been sitting at 1860 MB RAM
d.js v12 with nodejs v16
d.js v12 with nodejs v14
more details
more details
Having the same issue: node 16 has a higher RAM impact in comparison to node v14. Using 35% more RAM.
I never actually used WeakRef
I saw that later
It's more about FinalizationRegistry having potentially hundreds of thousands of callbacks and stuff
(Which should in theory use more memory than Map itself, thus using more memory than it helps reducing)
I mean, an unreffed timeout I would hope just gets discarded during GC, but I never tested that after our conversations
It doesn't get discarded
It's just not waited on for completion when the Node.js process is closing
so it could be the issue?
No
17 is the newest nightly
Yeah lol, rip brain
wym?
Meister said "try on v18"
oh lmao, I see
node v14 | Same Code | 2.1 gb | v13 | 21 Shards | 41k Guilds
node v16.6 | Same Code | 3.5 gb | v13 | 21 Shards | 41k Guilds
I will send mine today, downgraded to v14 and rebooted tonight
Have you thought of making an issue in Node.js's repo?
Not a big difference here, my ram usage looks similar to node v16
Same, it's nearly the same for me
not the same code for me tho 
you have summarized the situation well, RAM usage is increasing very fast
Okay I have a thought that kaira sparked, that 12% RAM increase on v13 is probably caused by Finalization Registry.
Here is a list of things that may have impacted memory usage from v12 to v13:
- timeout caches in client
- removal of typing caches
+ application command managers
+ Guild bans managers
+ Guild Invite managers
+ Guild Sticker Managers
+ Stage Instance managers
+ Thread managers
+ Finalization Registry callbacks
The guild managers are small in comparison to channels, which have a MessageManager, most likely with a finalization registry callback
I bet the manager changes have negligible impact compared to that
It was just a list of the possible things; bans, invites, stickers, and especially stages probably don't contribute much
but they were things that weren't cached at all before
I'm trying to figure out whether we can find a way to not use FinalizationRegistry and still stop the interval. Because in my testing I never even saw a mark-and-sweep happen, I had to trigger it manually
i have the ApplicationCommandManager, GuildInviteManager, and StageInstanceManager limits set to 0
and I'm using the defaultMakeCacheSettings thing as well
so, +1 to this
oh, you can watch the memory consumption of Client#_cleanups (it's a Set). A reference to every one of those bound functions is stored there
Also wait, am I dumb, can we just WeakRef the reference to "this" in the setInterval, so the object gets GCed, and then if the sweep function no longer exists, cancel the interval? Get rid of all of the FinalizationRegistry stuff
(would be opt in behavior cause semver minor)
~~It's literally the example in the mdn docs
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakRef#examples~~ Ignore me, that would be if we stored references to the manager in the collection
You also forgot to include the removal of Client#_timeouts, Client#_intervals, and Client#_immediates, which decreased memory usage significantly as well
In the list you wrote earlier
At this point we're gonna need C++ Ckohen, lol, C++'s destructors have next to no memory impact
It's just a number that goes up by one when referenced, and down by one when dereferenced
I wish 
Come think of it
In my first year of C++, my professor refused to teach us std::shared_ptr
So we had to ++ and -- the reference count manually

We... could achieve something similar
oh
oh
I see
yes
I like
Make Map#delete do entry.decreaseReferenceCount()
As well as Map#clear
(The latter becomes O(n) but... we need to do so to clear intervals either way)
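The manual ref-counting idea could look something like this in plain JS (the `decreaseReferenceCount` hook is hypothetical, not an existing djs API):

```javascript
// A Map that notifies entries when they are removed, so an entry holding
// a timer could clear it. The decreaseReferenceCount hook is hypothetical.
class RefCountedMap extends Map {
  delete(key) {
    const entry = this.get(key);
    entry?.decreaseReferenceCount?.(); // optional-chained: plain values are fine
    return super.delete(key);
  }
  clear() {
    // O(n), but required so every entry gets a chance to release resources.
    for (const entry of this.values()) entry?.decreaseReferenceCount?.();
    super.clear();
  }
}

let released = 0;
const map = new RefCountedMap();
map.set('a', { decreaseReferenceCount: () => released++ });
map.set('b', { decreaseReferenceCount: () => released++ });
map.delete('a'); // releases 'a'
map.clear();     // releases the remaining 'b'
console.log(released); // 2
```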
but hold on, we don't clear the caches of managers that just disappear
we relied on GC for that
Uhmm
It's more for handling an edge case
aka the user calling clear() directly
oh, but also the interval is on the map, not the entry
Uhmmm
We can optional chain the decr ref call
An alternative is to just don't do ref counters
And instead, as soon as we stop referencing it from Client (aka its holder has been removed), we cancel its timeouts if any
but...how
Although that'd require Base#destroy
So when Guild#destroy is called, all channels, roles, members, etc, are destroyed
(obv we can fast-track some parts so they don't have to do a full scan)
hmmm, okay. We probably don't need destroy() on base, the majority of structures don't have managers on them right?
That's right
So entry.destroy?.(), perhaps?
yea
I was thinking of having it in base so it's a NOP
fair, abstract (but not quite)
Which is less expensive than a conditional
By fast-tracking some stuff I mean destroying each channel individually only when it has a timer
Same for message reactions
what about if the guild channel manager doesn't have an interval, but each channel has a thread manager with a timer on it
ugh, lol
okay, let me propose something rq
This would be so much better with -next's design
yes
TypeScript, global access to client via DI, and also the option to memoise client options
So we can resolve and globally reference a memoised dependency map that tells whether or not something should recursively destroy
Okay
```js
let interval;
// WeakRef must be constructed with `new`; deref() (not unref()) returns the
// target, or undefined once it has been garbage-collected.
const maybeThis = new WeakRef(this);
/**
 * The id of the interval being used to sweep.
 * @type {?Timeout}
 */
this.interval =
  sweepInterval > 0 && sweepInterval !== Infinity && sweepFilter
    ? setInterval(() => {
        const reffedThis = maybeThis.deref();
        if (!reffedThis) {
          // The collection was GCed, so the interval clears itself.
          clearInterval(interval);
          return;
        }
        const sweepFn = reffedThis.sweepFilter(reffedThis);
        if (sweepFn === null) return;
        if (typeof sweepFn !== 'function') throw new TypeError('SWEEP_FILTER_RETURN');
        reffedThis.sweep(sweepFn);
      }, sweepInterval * 1000).unref()
    : null;
interval = this.interval;
```
so this is on LimitedCollection
that way the collection (cache) itself can get GCed
and if it does, interval magically clears itself
Does the WeakRef even get invalidated, with the interval being accessible somewhere in Node.js's internals and having access to this in its scope?
I don't have a good way to test that; I have no idea if having those variables within the constructor makes the entire class count as referenced
We have been over this before, although admittedly we didn't test because I didn't see the logic/sense if it did
Not really on that last bit
The GC wants to run fast
It doesn't have time for smart tricks
right
Ugh, if we could do something like ECS here, lol
Because in a way, Discord's structures are similar to the patterns used with ECS
Why can't they just give us a GC reference counter
There are two alternatives that pop up in my mind, @queen chasm
One is to make all caches global, so there's only 1 instance for all managers
Comes with the upside that cleaning them up is VERY fast, similar to ECS, and very reliable, also no need to do memory traversal
But the downside that we'd need to be careful when removing structures from cache, and *Remove events (such as guildRemove) would return empty caches unless ofc we do that in the following tick
The other is to do deep memory traversal just like we did with messages, but for all other caches
It doesn't need magic nor advanced features, no GC tricks, no more 2+ properties for each limited collection (admittedly, LimitedCollection takes more memory for me than it achieves to clear), no LimitedCollection at all (simpler cache, -1 inheritance level thus faster), is highly reactive to changes (if something becomes inaccessible from client, it's not swept)
Only downside is deep memory traversal being slow, but all of them have that issue anyways, just at 1 level instead of ~5 levels of nesting
So, sweeping would then fall on the client?
We could also look into building dependency graphs, so that a sweeper that depends on another runs together with it, saving a lot of traversals
Yeah
It's how we did it for many years and always worked fine
You see, one of the things that annoy me the most, is that we're sweeping dead caches with the current system too
by dead you mean empty right
Because the GC takes up to 5 minutes since the cache removal to call the finalizer
oh, yeah
mark-sweep doesn't run often at all
So it's going to sweep a few times if it's below 5 mins
(For those who run sweepers more aggressively, they're customisable after all)
So, client#sweepers for deep transversal of every cache type
I know a lot of people who had their message sweep interval set at 2 minutes, so I wouldn't be surprised to see that case
Yeah
not a bad idea, how would we do global caches though, cause I like the sound of that, just don't see the way
global caches?
this thing
That's a separate idea
yeah
I wanted to hear more about that
I guess they're not technically mutually exclusive
It's basically client.caches.messages, client.caches.channels, client.caches.roles, etc
One of its upsides is that we don't need to rely on guild_id too
But local cache management is pain
oooh
and those caches would have the entire cache?
It's extremely complicated at the end of the day, unless we go @klasa/core's way and implement... ProxyCache
Which was basically a Cache (similar to djs Collection) but holding keys only
oh, so basically our datamanagers
except storing the keys so filter isn't slow, using get instead
And when you queried it, it'd check if the key existed in it and grab the entry from the cache it references to if it does
Yeah
global caches sound like a good idea for -next
@klasa/core uses ProxyCache for member.roles
I like client#sweepers as a replacement to LimitedCollection (do we keep it for max size? I assume yes)
And a few more places too
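A minimal sketch of the ProxyCache idea described above (plain JS; the class here is illustrative, not @klasa/core's actual implementation):

```javascript
// A cache that stores only keys, resolving entries from a single shared
// backing cache on access. Illustrative stand-in for @klasa/core's ProxyCache.
class ProxyCache {
  constructor(backing, keys = []) {
    this.backing = backing; // the global cache holding the real objects
    this.keys = new Set(keys);
  }
  get(key) {
    // Only resolve keys this proxy actually owns.
    return this.keys.has(key) ? this.backing.get(key) : undefined;
  }
  add(key) {
    this.keys.add(key);
    return this;
  }
  *values() {
    for (const key of this.keys) yield this.backing.get(key);
  }
}

const globalRoles = new Map([
  ['1', { id: '1', name: 'admin' }],
  ['2', { id: '2', name: 'mod' }],
]);
const memberRoles = new ProxyCache(globalRoles, ['2']);
console.log(memberRoles.get('2')?.name); // 'mod'
console.log(memberRoles.get('1'));       // undefined (not in this member)
```

The upside is that filter and iteration stay fast without duplicating the role objects per member; the downside, as noted, is that local cache management becomes painful.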
We keep it for maxSize, yes
luckily that's a super easy semver-minor change, deprecate the sweep properties in LimitedCollection
and the deprecation on message sweeping still applies
Yeah
Cool, I'll add that to my todo list, hopefully to get in before 13.2
Hopefully that gets Koyamie's caching hell solved
v13 was supposed to use far less memory than v12
yeah
I could try removing my clustering thing to see whether it's better after the changes, but honestly I think I'll keep my workaround, because having 656 processes is way too much
@queen chasm are you going to write an issue regarding all the issues and ideas we discussed here?
I'd write one if it weren't for the fact I'm going to be phone only until late Sunday
I can try tomorrow, I don't know if I'll remember everything, I'll be scrolling here for a bit 
Otherwise I can write it on Sunday
how do you make these graphs?
datadog 🗿
never heard of it 😶🌫️
it's cool and easy to set up
but I'd prefer using Grafana
@queen chasm @weak kraken writing that document now, I'll share it here before posting it in the repo's issue tracker so I can add your feedback on it. Should take 15-30 minutes to complete 
bet
Since you replied before, and ckohen will check later, @weak kraken ^
Enjoy the read, 7852 characters
I'd also like your input @autumn gazelle, since you have dealt with caches a lot
I like it
seems fine to me
Personally I vote for 2 (global sweepers) first as this is likely semver patch/minor, and then work to also do 1 (global cache) if folks feel it's viable down the road
2 is semver minor as long as we stick to using the utility over LimitedCollection
Which is the most ideal solution as well, due to its 0 memory overhead
Added some notes
You can check https://gist.github.com/kyranet/fbb3a523844ffa80860faaacfff94531/revisions for the changes
Looks good. I'll note that we cannot prevent people from sweeping channels, because threads are stored in both the channel manager and the guild channel manager, meaning those need to be swept too when trying to sweep threads. I don't like that paradigm, but with global sweepers we don't even need to do deep traversal for channels: sweeping can simply happen ever so slightly differently on the channel manager, using the _remove method on ChannelManager, which already does that for us.
```js
_remove(id) {
  const channel = this.cache.get(id);
  channel?.guild?.channels.cache.delete(id);
  channel?.parent?.threads?.cache.delete(id);
  this.cache.delete(id);
}
```
Sorry, but I'm having trouble understanding your message in there
You say we can prevent sweeping channels, we can't
guilds yes
separate from that
The global sweepers (without the global caches) require us to write the code for how they must traverse the structures to find the relevant objects
We just don't write the code for the channels, but for threads
we don't need to traverse for channels because we do have a global cache for all channels, with a helpful existing method
We do for messages, reactions, etc
yeah
It's the same system, Ckohen
you can't not write the code for channels if you want to sweep threads
it can be, but I don't think it should be: if you want to sweep threads, presumably you want to sweep them everywhere, which requires you to set up sweeping for channels, guild channels, and threads, unless you are saying to isolate threads specifically in the sweepers
Anything I should note in the doc? I don't know what exactly to write
I just explained to myself how we can prevent sweeping channels while allowing sweeping threads, so I think you're fine
although idk if you want to allow sweeping dm channels maybe?
Good point
Feel free to write something for me to add in the doc, so it's included in the issue I'll create tomorrow or so
I suppose we could do threads, dms, and even voice??? not much should rely on a voice channel except itself now unless the voice adapter uses it, which you probably know way more about
The voice adapter only relies on the ID
so we could allow that too
I can write a little something in a bit
Alrighty 
Something like this probably
- And because we need to write this code manually, we can hard-forbid specific managers from being swept, such as guilds and non-thread channels.
+ And because we need to write this code manually, we can hard-forbid specific managers from being swept, such as guilds and permission overwrites. It also provides additional flexibility so that we can handle channels significantly better in two ways.
+ 1. Make it much easier for users to set up sweeping for channels by providing one channel sweeper (or one per type) instead of users having to remember to sweep up to 3 levels of caching (i.e. no more duplication between ChannelManager, GuildChannelManager, and ThreadManager). We can do this because every channel is stored in the client channel manager, which already has a way to traverse guild channel managers and thread managers to remove the desired channel.
+ 2. We can hard code a filter that runs before sweeping of channels to only allow sweeping of channels we determine can be swept.
Added
If there are no further requests, I'll proceed to posting the Gist's contents in a couple of hours
Just asking a couple of things in internal, and then I'll post if nobody has anything to add
Dropped the nuke https://github.com/discordjs/discord.js/issues/6539
blessed
👍
Sorry for opening this thread again, but I have been doing some modifications to the library, with support for nearly all events.
I came to the result that with raw-cache storage, which is planned for v14/v15, you can decrease your RAM usage by about 50%
@queen chasm @lofty mist look at this
I built it as a layer on Discord.js with just one Collection, which holds the raw API data (and removes properties when a rule has been provided).
When you call .get, it transforms the modified raw API data and passes it to the transformer class (-> the d.js class)
The raw object is modified as mentioned above: firstly for the property removal, and secondly for changing the arrays of role/channel/thread/emoji objects into arrays of ids, whereby the objects are cached in their appropriate collections.
So on a transform of a guild object, it fetches the values from the collections via the ids in the array.
Modify Guild - roles: [{}] -> roles: [id] + roles.set
Transform Guild Modified Data: roles: [id] -> roles.get -> [{}]
This allows caching channels etc. just once, instead of again inside the guild itself.
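The modify/transform round trip above can be sketched in plain JS (the names `modifyGuild`/`transformGuild` and the data shapes are illustrative, not the actual implementation):

```javascript
// Store raw guild data with nested role objects replaced by ids,
// normalised into one shared role cache; re-inflate on read.
const roleCache = new Map();

function modifyGuild(rawGuild) {
  const roles = rawGuild.roles.map((role) => {
    roleCache.set(role.id, role); // cache each role object once, globally
    return role.id;               // keep only the id on the guild
  });
  return { ...rawGuild, roles };
}

function transformGuild(modifiedGuild) {
  // Re-inflate ids into role objects when a consumer reads the guild.
  return {
    ...modifiedGuild,
    roles: modifiedGuild.roles.map((id) => roleCache.get(id)),
  };
}

const stored = modifyGuild({ id: 'g1', roles: [{ id: 'r1', name: 'admin' }] });
console.log(stored.roles);                 // ['r1']
console.log(transformGuild(stored).roles); // [{ id: 'r1', name: 'admin' }]
```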
https://media.discordapp.net/attachments/861124237221560320/934830814598934618/unknown.png
Would be interesting to check the usage with redis.
To sum up, here are the advantages:
- 1-level cache, instead of duplicated caches
- Raw objects use less memory in comparison to nested classes
- Raw data can be modified, very interesting for large bot devs
- Compatibility with redis and easier handling of hot reloads
- Easy handling of sending data between processes via IPC
- Classes do not have to be converted back when doing REST requests
- Fewer breaking changes on the user side when they use the raw API data, since Discord makes fewer breaking changes
- Avoiding memory leaks
And the disadvantages:
- Lots of changes required
- Degraded performance, because of the transformation on every .get? Maybe a temporary class cache to solve this.
the whole "raw data" storage thing is planned, probably for the ts rewrite though (looking like it might be possible on v15, but more likely v16
as for duplicate caches: the duplicate caches don't take up nearly as much memory as the original since they store by reference. Obviously it's still >1 map with a complete list of duplicates (+threads have 3), so memory still goes up
The question becomes the trade-off between memory and performance at some point, and it seems performance was chosen in the past (filtering 5000+ channels down to what would be in the current guild cache, or compiling the client cache from the guild caches, etc...). I think the performance hit on that isn't super terrible though, and given a proper investigation it's likely the benefits of the memory decrease far outweigh the performance improvement
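The "compiling the client cache from the guild caches" trade-off can be sketched like this (plain JS, illustrative names only): the duplicate client-level Map disappears, at the cost of an O(guilds) walk per access.

```javascript
// Instead of keeping a duplicated client-level channel cache, compile it
// on demand from the per-guild caches. Saves the duplicate Map but costs
// a full walk over all guilds on every access.
function collectChannels(guilds) {
  const channels = new Map();
  for (const guild of guilds.values()) {
    for (const [id, channel] of guild.channels) channels.set(id, channel);
  }
  return channels;
}

const guilds = new Map([
  ['g1', { channels: new Map([['c1', { id: 'c1' }]]) }],
  ['g2', { channels: new Map([['c2', { id: 'c2' }]]) }],
]);
console.log(collectChannels(guilds).size); // 2
```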
v16
I thought that this had a higher priority, with Discord adding ever more features.
But I can understand that this requires a lot of changes and is better done in the TS rewrite.
V16 isn't that far out either. It's been said a couple of times that we want to have majors every few months instead of once a year now