#Container creation failure due to PID limits is undocumented

frail robin
#

Heyyo 😄 I'm posting because this took me a while to track down, since no Google searches turned anything up for it. I'm running into failures when using Dagger with Podman because Podman sets a default PID limit where Docker does not. An example trace where you can see a PID-limit-related error is dagger trace cd572db34091693da11e533ec8730162. After thinking about this for a while, I don't think this is something Dagger should "fix". It would be nice if the engine were PID-limit aware, but ultimately the user has full control over resource limits, so I believe this is a fair failure scenario. I wanted to get other thoughts on this before raising a contribution to add this to the Dagger Podman docs.

Possible Engine failure due to this too
I admit I need to go back and reproduce and document this scenario more concretely, but I actually first began tracking this down because dagger call was "randomly" failing with "POST /query: Unexpected EOF" errors. I've only concretely observed the engine crash once and am planning to set up a proper repro once I get a chance, but for now I'll attach the engine logs and an Opus analysis of the logs, which points to the PID limit as the source of the engine crash. Here is the trace that corresponds to the engine crash logs: dagger trace d99108af299d26d317564fcee8eb3e88

frail robin
#

Here's a screenshot where you can see PIDs for the engine container exceeding the podman default of 2048. Note, I manually increased the PID limit of the engine container to be able to avoid the crash and capture this screenshot. This is just to show that a large number of PIDs are being created within the engine container.
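
For anyone who wants the same numbers without a screenshot, here is a minimal Python sketch that reads the cgroup v2 pids controller files. It assumes it runs inside the engine container on a cgroup v2 host, where `pids.current` and `pids.max` are normally visible under `/sys/fs/cgroup`. The function names are made up for illustration and are not part of Dagger or Podman.

```python
from pathlib import Path


def read_pid_usage(cgroup_dir="/sys/fs/cgroup"):
    """Return (current, limit) from the cgroup v2 pids controller.

    limit is None when pids.max contains the literal string "max" (unlimited).
    """
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    raw_max = (base / "pids.max").read_text().strip()
    limit = None if raw_max == "max" else int(raw_max)
    return current, limit


def describe(current, limit):
    """Format the usage as a short human-readable line."""
    if limit is None:
        return f"{current} tasks, no pids limit"
    return f"{current}/{limit} tasks ({100 * current / limit:.0f}% of pids.max)"


if __name__ == "__main__":
    try:
        print(describe(*read_pid_usage()))
    except OSError:
        # Assumption: the files are missing outside a cgroup v2 container.
        print("pids controller files not readable here")
```

If the limit turns out to be the constraint, `podman run` accepts a `--pids-limit` flag. How that gets passed to the engine container depends on how the engine is started, so treat that part as something to verify for your setup.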

harsh turtle
#

thx for this @frail robin! I'll add a note about it to the Podman docs page

frail robin
#

That was fast 😄

#

Have you had a chance to look at the engine logs and the associated Post "http://dagger/query": unexpected EOF error? I'm not familiar enough with the Dagger code base yet to quickly track down how PID limits could cause the engine itself to crash. Mostly just curious to know what others think about it. If it's not the PID issue then I'll keep working to replicate it, because it could be something else to do with Podman.

#

Oh, forgot to mention that the Post "http://dagger/query": unexpected EOF error occurs for me on both Linux and Mac. The only common denominator is that I use Podman as the runtime on both systems.

harsh turtle
# frail robin Have you had a chance to look at the engine logs and the associated `Post "http:...

yes, the error is here in the logs you have attached

time="2026-05-10T04:52:24Z" level=debug msg="handling http request" client_hostname=dagger client_id=qqzajnq1kj9lha6qo98x01vsb contentType=application/json method=POST path=/query session_id=1yjbkuj055pnnoftv6zlgt9sp span=d8ff7b26fbc79ddf spanID=d8ff7b26fbc79ddf trace=d99108af299d26d317564fcee8eb3e88 traceID=d99108af299d26d317564fcee8eb3e88 upgradeHeader=
time="2026-05-10T04:52:24Z" level=debug msg="handling http request" client_hostname=dagger client_id=4u77tgqrcz117h23daf4uh0v5 contentType=application/json method=POST path=/query session_id=1yjbkuj055pnnoftv6zlgt9sp span=00953f136133c196 spanID=00953f136133c196 trace=d99108af299d26d317564fcee8eb3e88 traceID=d99108af299d26d317564fcee8eb3e88 upgradeHeader=
runtime: failed to create new OS thread (have 97 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

runtime stack:
runtime.throw({0x3ddf1ed?, 0xbb3d829fe58?})
    /usr/lib/go/src/runtime/panic.go:1229 +0x48 fp=0xbb3d829fe30 sp=0xbb3d829fe00 pc=0x48e208
runtime.newosproc(0xbb3ee0b4808)
    /usr/lib/go/src/runtime/os_linux.go:199 +0x165 fp=0xbb3d829fea0 sp=0xbb3d829fe30 pc=0x450bc5
runtime.newm1(0xbb3ee0b4808)
    /usr/lib/go/src/runtime/proc.go:2927 +0xbf fp=0xbb3d829fee0 sp=0xbb3d829fea0 pc=0x45babf
runtime.newm(0x20?, 0xbb3cbd30008, 0x0?)
    /usr/lib/go/src/runtime/proc.go:2902 +0x125 fp=0xbb3d829ff10 sp=0xbb3d829fee0 pc=0x45b985
runtime.startTheWorldWithSema(0x0, {0xe0?, 0xbb3ec43a008?, 0xbb3cd146d20?, 0xbb3edc8a690?})

frail robin
#

Do you want me to raise an issue on github for this engine crash bug or are you already doing that?

harsh turtle
#

it's the same as if the engine container runs out of memory for example

#

there's pretty much nothing you can do at this stage but let the engine crash

frail robin
#

ah ok that makes sense. So my assumption was correct: the user controls the limits, and the engine crashes if it exceeds them, like any other container.

frail robin
#

One thing I was curious about: why does it create so many processes? I wonder if there's a potential optimization there. Naively I'd expect it to create one process per "end" container/service (i.e. per .Stdout() or .Up()), but based on the stats it's clear that more than that are being created. I'm assuming some sort of intermediate processes get created.

#

Are there any "under the hood" docs by chance? Would love to learn more about the internals of dagger!

harsh turtle
#

take into account that both processes and threads take a slot in the pids.max table, so it's not only processes
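
To make the processes-versus-threads point concrete, here is a rough Python sketch that counts both by walking `/proc`. Each entry under `/proc/<pid>/task` is one thread, and each thread counts as one task against `pids.max`. It assumes a Linux `/proc` and only sees what the current PID namespace exposes, so it approximates the cgroup's own count rather than replacing it. `count_tasks` is a hypothetical helper name.

```python
import os


def count_tasks(proc_root="/proc"):
    """Return (processes, threads) visible under proc_root."""
    processes = 0
    threads = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            tids = os.listdir(os.path.join(proc_root, entry, "task"))
        except OSError:
            continue  # the process exited between listing and reading
        processes += 1
        threads += len(tids)
    return processes, threads


if __name__ == "__main__" and os.path.isdir("/proc"):
    procs, tasks = count_tasks()
    print(f"{procs} processes, {tasks} threads (tasks counted by the pids controller)")
```

A single multi-threaded process can take many slots on its own: the crash log above shows the Go runtime reporting 97 OS threads already in the engine process when thread creation failed.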

harsh turtle
#

so whatever you run in your Dagger pipelines will ultimately have a very important effect on this

frail robin
#

And I'm running all of my module tests with the parallel package, so lots of containers and lots of threads get created... yeah, not much to optimize other than running things sequentially instead of in parallel.
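
One possible middle ground between fully parallel and fully sequential is to cap how many jobs are in flight at once. The sketch below shows the general idea in Python with an `asyncio.Semaphore`. It is not the parallel package mentioned above and not a Dagger API: `run_bounded` and the job callables are hypothetical stand-ins for whatever launches a module test.

```python
import asyncio


async def run_bounded(jobs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight.

    Results are returned in the same order as `jobs`.
    """
    gate = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each job waits here until one of the `limit` slots is free.
        async with gate:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With something like `asyncio.run(run_bounded(test_jobs, limit=4))`, the peak number of concurrent jobs stays at 4, and the limit can be tuned against the container's PID headroom.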

#

When does the engine clean containers up?