#Memory leak / GC appears to stop in stale app (no dependencies)

keen urchin
#

update: opened an issue on gh with a lot more details https://github.com/oven-sh/bun/issues/18265

I have been noticing an unusual increase in memory consumption with my app and I can't figure out if it's my code. Last night I started measuring process.memoryUsage() and this screenshot is how it went overnight. The app is stale and not doing anything really... that's on purpose to see how that would affect memory.

It's really hard to figure out what's going on because it starts out okay and you can see GC happening every now and then, but then suddenly, a few hours later, GC just gives up. Again, the app is not doing anything for the entirety of this period.

I don't have any dependencies. I even removed most of the functionality to see if that would make a difference but it didn't. I tried --inspect but it hasn't been helpful as there isn't an immediate increase, it takes hours for this to happen. Any tips on how to get to the bottom of this? I need to know where/what to look for... has anyone had luck dealing with this stuff?

GitHub

What version of Bun is running? 1.2.6-canary.74+74768449b What platform is your computer? Linux 6.1.62 x86_64 / Darwin 24.3.0 arm64 What steps can reproduce the bug? I have been noticing an unusual...

#

Here's a close up look at the first few hours before the usage increases for no apparent reason (the heap total is growing steadily though, is this normal?)

#

Then ~12 hours later this happens! The app didn't restart or crash; it's still alive. If it hadn't dropped at all I would think maybe it's my code, but because it eventually did, I don't think that's the case.

fringe crown
#

TBH, I experience similar behavior in more complex app, but without any Bun.spawn.

After my app reached some point of memory usage: the CPU went up high but it looks like GC is unable to set free any memory 🤔

My app is way more complicated, to just post it here... I'm waiting for the results of your problem... Maybe it will be somehow connected

keen urchin
#

I feel you! my app is somewhat complex but luckily i didn't have any dependencies so I knew it was either my code or something else in the runtime. It took me about a week of just debugging and profiling every code path in my codebase separately (and a lot of patience 😅 ) until i was able to narrow it down to Bun.spawn. It was a lot of work so hopefully it will be worth it haha.

btw, Jarred might have hinted yesterday that this could be related to the stdio streams (that's my interpretation at least) because when i disabled them everything was normal, so I wonder if it's just an issue with streams in general.

fringe crown
#

Hopefully he will find something 🙂
BTW: did you find any practical way of profiling? Profiler via bun web always hangs for me if I try to record sth like 5s of my app during request...
Probably the app is too big for profiling 🤔

keen urchin
#

I haven't had that exact problem, but bun's web inspector kept crashing whenever i hover over things, so i couldn't reliably use it. What i ended up doing is only record a snapshot, then export it and import back in Safari's devtools which was very stable in comparison, but the downside is you lose references to your code (no "source maps"/"stack trace" so to speak).

Also, another thing that was very helpful is to generate v8 heap snapshots from within your code (you can also do it via the web inspector console; see option 2) and then use Chrome's devtools to import them. Keep in mind, these are just heap snapshots, not a memory timeline, but you can still compare two snapshots in Chrome devtools, which is really nice.

// Option 1:

import { writeHeapSnapshot } from 'node:v8';

writeHeapSnapshot() // optionally, set a path to the output file

// Option 2 (you can call this from Bun's web inspector console):

Bun.write('path/to/file.heapsnapshot', Bun.generateHeapSnapshot('v8'))

More on this here:

https://nodejs.org/api/v8.html#v8writeheapsnapshotfilenameoptions

https://bun.sh/guides/runtime/heap-snapshot

tbh, memory profiling for this specific issue didn't help much but what I described above has definitely helped me catch memory leaks in my code in the past.

And lastly, for this specific issue, i had prometheus reading process.memoryUsage() every 5 seconds.
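A minimal sketch of how that kind of scraping could be wired up, assuming a /metrics endpoint that Prometheus pulls every 5 seconds. The metric names and the `renderMemoryMetrics` helper are hypothetical, not what was actually used here; the function only turns one `process.memoryUsage()` reading into Prometheus text-format gauges.

```typescript
// The fields of process.memoryUsage() that this sketch exports.
type MemoryReading = {
  rss: number;
  heapTotal: number;
  heapUsed: number;
  external: number;
};

// Render one reading as Prometheus text-format gauges.
// The metric names below are made up for illustration.
function renderMemoryMetrics(reading: MemoryReading): string {
  const gauges: Array<[string, number]> = [
    ["app_memory_rss_bytes", reading.rss],
    ["app_memory_heap_total_bytes", reading.heapTotal],
    ["app_memory_heap_used_bytes", reading.heapUsed],
    ["app_memory_external_bytes", reading.external],
  ];
  let body = "";
  for (const [name, value] of gauges) {
    body += `# TYPE ${name} gauge\n${name} ${value}\n`;
  }
  return body;
}
```

In a Bun or Node process you would call `renderMemoryMetrics(process.memoryUsage())` from the /metrics handler, so every scrape gets a fresh reading without a timer in the app itself.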

fringe crown
#

Lots of advice 🙂 Thank you so much 🙂

young glade
#

This guy is definitely not lazy

#

I get annoyed after few lines of text

fringe crown
keen urchin
iron pier
#

Hi! I'm working with @fringe crown on the same project. I'd like to shed some fresh light on what I've noticed so far.

  1. When Bun is given a memory limit, it eagerly eats almost all of it, up to 85-95% of the limit. After that, the HeapHelper subprocess in Bun constantly tries to release memory.
  2. While it's trying to release memory, it consumes almost 100% of the available CPU.
  3. Last night I saw OOM for the first time, on two pods.
  4. Somehow Bun is able to keep working for a long time with little memory available.

Possible causes of the memory leak in our application that I want to investigate today:
Buffer (concatenation)
File access via 3rd party ffi (tailwindcss & lightningcss)
React 18.3 renderToPipeableStream

This only bothers us on x64. Tested locally on Docker arm64 (MacBook M1 Pro), it puts pressure on memory, but HeapHelper never spins the CPU like that.

iron pier
#

We are going to build a healthcheck that checks whether the CPU is throttled in relation to the throughput we handle. But first, we want to check if the problem occurs when we use an Alpine distro with Bun.

iron pier
#

Unfortunately the Alpine distros didn't help, so the memory leak seems to be somewhere in Bun itself. Haven't tried 1.2.6 or even 1.2.7 yet; will probably do that next week.

keen urchin
#

yeah it's pretty bad tbh, it's def in Bun itself... after they merged the memleak fix i kept that test script i posted on my gh issue running for 2 days on a debian bookworm distro and it reached 1.2 GB, let alone a large app. I'm also noticing a very high memory usage locally on macos whenever i use the HMR devserver for a minimal react+tailwind app, it quickly reaches >2.5 GB within a couple of hours 😕 i'll probably post something about it in #bake

fringe crown
#

One thing that I can add, which seems to be important @scarlet cairn :

  • it doesn't matter how much traffic there is.

Originally our app handled 1 req/s. The new Bun-based app started at 5% of that traffic and is now at 75% of it, and the memory seems to grow on exactly the same schedule 🤔

iron pier
#

@fringe crown that's true, but in Bun's defense, it would have to be consistent somehow. For example, our test instance does not grow the same way as production, where the main difference is traffic.

naive kettle
#

Facing a similar issue. Didn't notice any problems in dev but reached an OOM error in prod after two days. It's a small site built with React Router v7 and Elysia (+ the bridging package elysia-react-router). No database, just reading some Markdown files using the Bun I/O API and fetch requests to other web APIs. It's running in k8s on Bun 1.2.5 with the alpine image.

No clue what is causing it or how to even debug it. If there's anything else I can contribute to this thread to help resolve the issue, please @ ping me. I'll probably switch to the Node runtime until it's resolved.

scarlet cairn
#

@naive kettle what specific I/O APIs are in use

naive kettle
# scarlet cairn @naive kettle what specific I/O APIs are in use

Mainly reading (Bun.file()), checking if they exist (using .exists() method) and outputting them as a string (using .text() method). For example, I would do:

const filePath = path.join(import.meta.dirname, 'content', `${name}.md`);
const contentFile = Bun.file(filePath);
if (!(await contentFile.exists())) {
  throw new Error(`File '${filePath}' doesn't exist.`);
}

return await contentFile.text();
scarlet cairn
#

ah, doubt the leak is there - we have tests for that

iron pier
#

@scarlet cairn for me, the most frustrating behaviour is the significant CPU usage when idle on a swollen Bun process (where almost all accessible memory is in use). It's due to the internal HeapAnalyzer threads that are trying to release memory. Maybe this gives a clue about what may cause the issue, but it definitely won't resolve the memory leak itself. Today we'll test Bun 1.2.8 against the memory leak.

Added: this situation causes our HPA (Horizontal Pod Autoscaler) to add more instances, because it looks as if the traffic is heavy enough to need them.

naive kettle
fringe crown
naive kettle
#

Ended up switching to Node, changed Elysia with Hono together with the bridging React Router package because the Elysia one was using Bun specific code (elysia-react-router -> react-router-hono-server), swapped out the Bun file calls with Node fs and in the entry.server.tsx file (used by React Router), I'm now using renderToPipeableStream from react-dom/server instead of renderToReadableStream from react-dom/server.bun. All other code is the same as previously deployed. And now the memory usage is stable with Node.

I tried using renderToReadableStream from react-dom/server.browser yesterday instead but the result was still the same. I'll keep running it on Node for now until a fix is available. Happy to test canary versions or help with further debugging if needed, just ping me 🙂

hallow sedge
#

I migrated a large app to bun in order to easily use a mixed ts / js repo while we migrate to ts
The memory leaks with files are killing us
Moving back to node is tempting but I am hopeful the bun team can fix...

hallow sedge
#

For context: we are constantly at 30 pods even when theres little traffic due to huge ram usage. This never happened with node 🙃

naive kettle
#

The newest Node version can run TS natively with some caveats (like doesn't use tsconfig files): https://nodejs.org/en/learn/typescript/run-natively
Not sure if it works for your use case and how many Bun APIs you're using, but might be worth looking into if it's a big issue. Since I moved to Node earlier today, memory usage is stable, k8s isn't going crazy and the alerting is quiet. It was fairly painless as the app is pretty new and small. There's a handy docker image with Bun and Node, allowing you to still use Bun as a package manager so you can keep your lockfile intact and use the Node runtime: https://github.com/ImBIOS/bun-node

keen urchin
#

I'm going to add that even development with the fullstack server makes my computer sluggish after a few hours of work, and that's for a very small react app!

For example, this is the memory usage after about 100 saves/reloads (x113 hot reloads to be precise) for a clean slate react app (literally the entire thing is just App.tsx with tailwind, a few radix-ui components, and tanstack/react-query).

Vite on the other hand hovers around 80–100mb.

I am also hopeful the bun team can address these issues soon...

lapis flicker
#

dev server currently doesn't release memory source map information because it needs to hold it as long as any tab is open, including any reachable hot updates.

scarlet cairn
#

if we used inline source maps would we be able to avoid that?

#

(since devtools will usually be able to read the original sources)

lapis flicker
#

the server needs to read them

#

unless the logic for the error overlay can be ported to run entirely in the client. this would have to be able to extract, parse, and lookup the source map in the browser

scarlet cairn
fringe crown
fringe crown
#

Well, I'm looking at initial snapshots from test enviro and I can see:

#

however next snapshot shows that it is able to handle some of those:

#

when the traffic is low - looks like GC is able to handle such increases quite well 🤔

scarlet cairn
#

That looks more like high memory usage than a leak. Leaks go up without going down
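One rough way to apply that distinction to recorded samples, as a sketch only: split the series into fixed windows, take the minimum of each window as the "floor" GC was able to reach, and check whether the floor ever comes back down. The helper names, the window size, and the 10% growth cutoff are all assumptions for illustration, not anything Bun provides.

```typescript
// Minimum of each full window of `windowSize` consecutive samples.
function windowMinima(values: number[], windowSize: number): number[] {
  if (windowSize < 1) throw new Error("windowSize must be at least 1");
  const minima: number[] = [];
  for (let start = 0; start + windowSize <= values.length; start += windowSize) {
    minima.push(Math.min(...values.slice(start, start + windowSize)));
  }
  return minima;
}

// Leak-like: the floor never drops from one window to the next and ends
// more than `minGrowth` (as a fraction) above where it started.
function floorKeepsRising(values: number[], windowSize: number, minGrowth = 0.1): boolean {
  const floors = windowMinima(values, windowSize);
  if (floors.length < 2) return false; // not enough data to judge
  for (let i = 1; i < floors.length; i++) {
    if (floors[i] < floors[i - 1]) return false; // memory came back down
  }
  return floors[floors.length - 1] > floors[0] * (1 + minGrowth);
}
```

A sawtooth that keeps returning to the same baseline reads as high usage, while a series whose post-GC baseline climbs window after window reads as leak-like. The window should span at least a few GC cycles for the result to mean anything.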

fringe crown
#

As I mentioned before: looks like the CPU works hard to free some memory, but (maybe because of high traffic and constant allocating memory during new requests) it fails, so memory is constantly rising and the CPU goes weird as well...

fringe crown
#

Actually there is a lot of empty (?) strings 🤔 There is no "one big" string or something, but a lot of small ones 🤔

fringe crown
young glade
#

has this been resolved?

fringe crown
hallow sedge
#

try 1.2.10 im seeing improvements

fringe crown
fringe crown
#

Nope... memory still rising...

hallow sedge
#

rising vs leaking?
this is the difference in mine top vs bottom

fringe crown
young glade
#

😭

scarlet cairn
young glade
#

shouldnt something like this be top priority

fringe crown
#

this measures as a ~40% memory reduction locally
TBH I would rather use more memory, but constantly rather than have low usage which continuously grows 🤔

We developed an ugly workaround to just restart k8s pods whenever they reach certain level of memory usage - but it is still an issue.
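For anyone wanting a similar stopgap, here is a minimal sketch of an in-process watchdog. It assumes the pod gets restarted by Kubernetes once the process exits or fails its health check; the real workaround described above may work differently. `createMemoryWatchdog` and its option names are hypothetical.

```typescript
type WatchdogOptions = {
  readRss: () => number; // e.g. () => process.memoryUsage().rss
  limitBytes: number; // the container memory limit
  threshold: number; // fraction of the limit, e.g. 0.8
  strikesNeeded: number; // consecutive checks over the threshold before acting
  onExceeded: (rss: number) => void; // e.g. drain connections, then exit
};

// Returns a check function; call it on a timer or from the health check.
// It returns true on the check where onExceeded fired.
function createMemoryWatchdog(opts: WatchdogOptions): () => boolean {
  let strikes = 0;
  return function check(): boolean {
    const rss = opts.readRss();
    // A single reading below the threshold resets the strike count.
    strikes = rss > opts.limitBytes * opts.threshold ? strikes + 1 : 0;
    if (strikes >= opts.strikesNeeded) {
      strikes = 0;
      opts.onExceeded(rss);
      return true;
    }
    return false;
  };
}
```

Requiring several consecutive readings over the threshold keeps a short spike from triggering a restart. It hides the growth rather than fixing it, so it is only a way to buy time.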

@scarlet cairn do you need more specific feedback for this? How can we be more helpful? This PR: https://github.com/oven-sh/WebKit/pull/84 looks like it is frozen. So I'm not sure: is there something that we (as users) can do?

GitHub

This makes GC trigger based on allocation rate (which is updated on every allocation) more than relying on timers (which are infrequently run)

This makes it so we can delete our GarbageCollectionC...

primal mulch
hallow sedge
#

Our app in prod saw a reduction in ram going to 1.2.10 but our api services that aren't updated for a week at a time went back to the same old
So we also need to restart on a schedule...

hallow sedge
#

our api servers use websockets a lot so maybe its that...

primal silo
#

We run our NestJS server in a k8s environment. We have a lot of memory leaks in a specific server, so we restart it regularly. That server communicates a lot with external servers (fetch, axios).

light gale
hallow sedge
#

this is what overnight looks like with only a health check being hit...

hallow sedge
#

im finally starting to dig into this because its really killing my costing
i have an express app with NO middleware between app and healthcheck
i hit it 1k times and here are the areas blowing up

Headers: 2,680 (+2,401, +860% increase!)
NodeHTTPResponse: 2,675 (+2,401, +876% increase!)
Arguments: 2,754 (+2,407, +693% increase!)

hallow sedge
violet coral
hallow sedge
#

ive had leaks since i started at the 1.2.2 announcement

#

some were resolved with file uploads in 1.2.10 but this remains in latest

violet coral
#

So before the rewrite of node:http in 1.2.5?

#

Curious if you run things with the last 1.1.x version and get same or different results

fringe crown
hallow sedge
#

Yeah 1.1 is a non starter for my prod app too

violet coral
#

Oh, I didn’t mean for production, I meant for your test harness. Would be great to know if there is a specific version that introduced this behavior.

I’ll see if I can try tonight.

#

BTW: I saw some stuff for web sockets. Is that required to cause the issue in the test harness?

fringe crown
#

Actually, I'm unable to reproduce this on my test instances 🤔

fringe crown
violet coral
# hallow sedge https://github.com/oven-sh/bun/issues/19930

Alpine does not work:

#10 [6/7] RUN bun install
#10 0.103 Error loading shared library libstdc++.so.6: No such file or directory (needed by /root/.bun/bin/bun)
#10 0.103 Error loading shared library libgcc_s.so.1: No such file or directory (needed by /root/.bun/bin/bun)
#10 0.103 Error relocating /root/.bun/bin/bun: _ZSt20__throw_length_errorPKc: symbol not found
#10 0.103 Error relocating /root/.bun/bin/bun: __cxa_thread_atexit: symbol not found
#10 0.103 Error relocating /root/.bun/bin/bun: _ZSt9terminatev: symbol not found
#10 0.103 Error relocating /root/.bun/bin/bun: _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE: symbol not found
#10 0.103 Error relocating /root/.bun/bin/bun: _ZdlPvm: symbol not found
#10 0.103 Error relocating /root/.bun/bin/bun: _ZSt13set_terminatePFvvE: symbol not found

scarlet cairn
#

alpine needs libstdc++ installed (node does too), it's something like apk add libstdc++ libgcc

violet coral
#

I don't see the 8000% numbers in any of the cases though

#

I see some moderate things, but they went away when I added:

setInterval(() => Bun.gc(true), 1_000); // force a synchronous GC every second

I see websocket code in there that is not utilized, so wondering if what was published was only part of it

scarlet cairn
#

going to have some good news for this thread

scarlet cairn
#

TLDR: bun upgrade --canary and give it a go. it probably won't help with websockets. but it will help with reading from streaming files. and with spawned processes. and maybe file uploads in certain cases.

young glade
scarlet cairn
#

63 should be a little bit better than 57

young glade
hallow sedge
#

can anyone tell when i migrated my project back to node? 😭
hope to be back to bun soon 🙏

winged harness
hallow sedge
#

bun docker
my app is an api server

50% rest calls
45% ws
5% file upload

fringe crown
#

Tested for a short period of time, but it doesn’t seem to help much:

#

I'll be back after next couple of hours

#

Just for clarity: in my case, there are no problems with Bun.spawn, must be sth else then...

hallow sedge
#

aww man

#

im under so much fire
ive cost my company thousands of dollars over the past couple months moving to bun 😭

#

was hoping this was it

winged harness
#

lol how does restarting your pods every hour cost thousands?

hallow sedge
#

we cant restart during peak hours - we get pinned at 40 high resource pods

#

someone else is doing the regular restarts

fringe crown
#

Generally speaking we are restarting pods, but still: it is more like a workaround... Sth rather not suitable for a prod-ready runtime. I understand: Bun is young, but still, it's sth that needs to be fixed 🤞

#

Chart update:

#

Slowly, but rising 🤔

scarlet cairn
hallow sedge
#

This change will have to go through qa team etc
I can maybe test on a dev instance though
Currently working on some high priority features though
But I can give a response next week either way

fringe crown
scarlet cairn
fringe crown
scarlet cairn
#

yeah

#

did you already file an issue some time ago?

fringe crown
grim sequoia
#

Do you have an uncapped loop somewhere in your code? I had this problem and that turned out to be the cause. Throttling every while loop, and even the actual tick logic, fixed it for me

#

await Bun.sleep(sleepTime); in loops

winged harness
winged harness
grim sequoia
# winged harness what kind of code has uncapped loops?

while(true) { doStuff(); }

or

const TICK_RATE = 20; // ticks per second
const interval = 1000 / TICK_RATE; // ms between ticks
let nextTick = performance.now();
while (true) {
  const now = performance.now();
  if (now >= nextTick) {
    doStuff();
    nextTick += interval;
  }
  // needs the below: sleep until the next tick so the loop yields
  const sleepTime = Math.max(0, nextTick - performance.now());
  await Bun.sleep(sleepTime);
}

just my 2c: it could be as simple as an uncapped loop. It will not yield, and that can look like a memory leak, even though the logic is gated

fringe crown
grim sequoia
# fringe crown Nothing I'm aware of :/ I would describe everything as precisely as I can in the...

Worth checking, just search for Bun.sleep in VS Code. If you don’t see any and you know you have loops running, that could very likely be the cause.
Even gated loops (based on time or logic) will spin at full speed unless you explicitly yield. Without await Bun.sleep(...), Bun will hammer the CPU, block GC, and prevent async tasks from running cleanly (cascading) which can cause timeouts, memory pressure, or 500s under load.

fringe crown
grim sequoia
fringe crown
#

I'll do my best with coworkers and be back with the feedback 🙂 Thanks for those ❤️

hallow sedge
#

finally rolled out 1.2.17 - no change to ram
still pinned at 40 pods on gcp 🙃

scarlet cairn
#

we're going to need more of a reproduction for us to be able to help you

hallow sedge
#

our app is fairly large. itd be difficult to track down where exactly things are going wrong
i can throw in a memory dump onto prod and get a readout periodically as a start

keen urchin
hallow sedge
#

I was assigned a task to migrate us back to node. I put it on pause to wait for 1.3
Literally praying lmao

hallow sedge
#

this week (7/21)
pls dont play with my heart this way 😭

scarlet cairn
#

if you run bun upgrade --canary do you notice a difference?

hallow sedge
#

before pic - a dev instance crashes every few hours reaching 4gb ram and no traffic

hallow sedge
hallow sedge
#

the green is canary
theres literally no traffic other than healthcheck because we hang on any actual endpoint due to the grpc issue (so no one is using this dev instance rn)
definitely see that gc is able to happen so we arent crashing with oom but it still shouldnt rise like this
node is a flat 500mb

#

significant cpu difference (likely due to not needing to panic gc as we leak to oom)

hallow sedge
#

the new vm spinning up at the end is due to a bun panic

scarlet cairn
scarlet cairn
#

it might also end up fixing the gRPC regression but i'm not sure

hallow sedge
#

ok. sending this out and will test in a moment

hallow sedge
scarlet cairn
#

is the rss still climbing?

hallow sedge
scarlet cairn
#

Yeah that looks like a memory leak. For us to figure out where we will need some more info. If you could run heapStats() from bun:jsc that would be the easiest starting point - is there any non-default type that has a large number in either objectTypeCounts or in protectedObjectTypeCounts? The ideal would be a v8 heap snapshot but you may not want to post that publicly as it could have credentials in it
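A sketch of how two such readings could be compared, assuming you capture `heapStats().objectTypeCounts` (or `protectedObjectTypeCounts`) once early and once after the memory has grown. `diffTypeCounts` is a hypothetical helper, not part of bun:jsc; it only works on plain name-to-count objects.

```typescript
type TypeCounts = Record<string, number>;
type Growth = { name: string; before: number; after: number; delta: number };

// List the object types whose count grew by at least `minDelta`,
// largest growth first. Types missing from `before` count as 0.
function diffTypeCounts(before: TypeCounts, after: TypeCounts, minDelta = 1): Growth[] {
  const rows: Growth[] = [];
  for (const name of Object.keys(after)) {
    const prev = before[name] || 0;
    const delta = after[name] - prev;
    if (delta >= minDelta) {
      rows.push({ name, before: prev, after: after[name], delta });
    }
  }
  return rows.sort((a, b) => b.delta - a.delta);
}

// In Bun the inputs would come from something like:
//   import { heapStats } from "bun:jsc";
//   const before = heapStats().objectTypeCounts;
//   ...let the app run for a while...
//   console.table(diffTypeCounts(before, heapStats().objectTypeCounts, 100));
```

Types that keep climbing across several comparisons are the ones worth chasing in a full heap snapshot. The counts are just type names and numbers, so they are easier to share publicly than a snapshot that may contain credentials.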

hallow sedge
#

heap stats ^

#

snapshot. i can take another one after like an hour to see whats changed

scarlet cairn
#

its strange none of this really sticks out. i do think it's likely related to instrumentation though.

hallow sedge
#

we are loading sentry via import
"start": "bun run --trace-warnings --import src/instrument.ts src/index.ts",

scarlet cairn
#

also those 2000 maps - thats kind of a large number

scarlet cairn
#

basically I’m wondering if sentry is keeping the requests alive forever

hallow sedge
#

sure, I'll try without

hallow sedge
#

with sentry on i waited like 30 mins and did comparison view of the snapshot

#

going to send out the sentry disabled version

scarlet cairn
#

what did it look like

hallow sedge
#

I went scorched earth and created a new file that is just barebones express + bodyparser + health check
gonna slowly add things back
this is the comparison view after 8 hours and the ram view
its definitely not skyrocketing to 4gb in 2hrs, but it has an interesting pattern within a 5mb range
ill add sentry next

hallow sedge
#

I tried getting the 'without sentry' one but the server crashes
I need to bump its ram in order to get it
Here is the heap stats before I ran the attempt
Strings rose from 400k to 1.6m

#

blue is without sentry, the green is the bare express

hallow sedge
#

For barebones
I added back our react app that we serve with express static, no api routes so accessing it gives 404 to apis
I mounted our websocket server as well (green)

hallow sedge
#

damn, guess it really is sentry @scarlet cairn
my previous attempt just marked it disabled rather than not import it at all

scarlet cairn
#

the blue one is without sentry?

hallow sedge
#

yeah

#

ive iterated on an empty express app a few times and its been mostly flat until i finally added sentry in this last update and got the slope again

#

the bump at the end of the blue is running a memory dump before closing
likewise the green spike is doing a starting memory dump

light gale
#

is this with the node or bun sentry? (wait does bun even work with the node one?)

hallow sedge
#

its with bun, and i think that just wraps the node one with some extras to support bun servers
we were using the node one a month or two ago and still had the issue

#

yeah, we dont have sentry node in our package.json (anymore) but its in the bun.lock as a dependency of sentry/bun

hallow sedge
#

so took a diff for sentry start and end, im seeing strings rising, and it contains request input and output information
see the bearer tokens and stuff... 😬

#

the associated object

hallow sedge
#

left it for a full day to be super sure

#

start script:
"start": "bun run --import ./src/instrument.ts ./src/index.ts" where instrument.ts initializes sentry

hallow sedge
slow grove
#

Just jumping into this thread because we also had memory issues 😓

Still some small leaks left (which is probably Sentry as well)

BUT the biggest culprit was Bun.s3, with a peak of 83GB RAM across 3 replicas haha. After the fix it's pretty stable-ish with about 300-700MB RAM/instance.

See https://github.com/oven-sh/bun/issues/20487#issuecomment-3153362860

GitHub

What version of Bun is running? 1.2.16+631e67484 What platform is your computer? Darwin 24.5.0 arm64 arm What steps can reproduce the bug? After downloading a file using the @google-cloud/storage S...

#

If you need more information, just tell me and I'll try to provide it 🫶 Love bun buncat

hallow sedge
#

I am solving our issue by disabling the sentry default integrations which hook into http and fetch as well as removing tracing

defaultIntegrations: false

Let me know if/when there is a patch to try Prayge

hallow sedge
#

damn. on my testing dev instance with low traffic this change fixed everything
however testing on the full app with this applied and a bit of traffic and we're still exploding
ahh my personal was using newer version on sentry and express think_spin
updating sentry didnt fix it :c
OK! looks like sentry was a leak fix as confirmed on my smaller instance but for the other dev instance there was an additional issue where it was retrying a connection constantly and that caused an oom too. fixing that and we are flat ram on both!

hallow sedge
#

looks like we're still leaking, just slower on 1.2.20 with sentry tracing + integrations disabled 🙁

#

(green is 1.2.20)

hallow sedge
#

here's prod. need to figure out the cause of the spikes but otherwise still slowly increasing with just the sentry changes (currently on 1.2.17 with 1.2.20 under test)

scarlet cairn
#

mimalloc v3 upgrade intended to reduce memory usage further did not make it to the release due to causing crashes