#ah... i think this [edge merging](https
it's definitely possible, do you have repro instructions? I could give it a try off that PR
my repro lowkey sucks, it takes 20+ minutes on a fresh engine:
- set up claude with container-use (optional: i'm using single tenant mode): run cu config agent claude, then edit claude.json like this:
"mcpServers": {
  "container-use": {
    "type": "stdio",
    "command": "container-use",
    "args": [
      "stdio",
      "--single-tenant"
    ],
    "env": {
      "_EXPERIMENTAL_DAGGER_RUNNER_HOST": "docker-container://dagger-engine.dev"
    }
  }
},
- build dev engine, enable debug (optional, this happens without debug)
- start up a ton of claudes in terminal splits, give them a hard task so they run for a long time, mine is inside container-use:
I want to add LSP integration to container-use.
For inspiration as to the shape of the feature, i want it to work like zed agent's "diagnostics" tool or OpenCode's LSP servers. Search the web to understand what that might look like.
It should not expose tools for starting and stopping LSP servers, but should allow the user to configure LSPs that start on environment creation that will then be exposed to the agent via an lsp_diagnostics MCP tool.
Carefully research the existing code to find a safe place to root the LSP process. If it runs in the container, we'll likely need to utilize dagger's CacheVolumes. If it runs on the host, it MUST NOT be rooted in the user's git repository, but can probably be rooted in the container-use managed worktree associated with a given environment. USE LIBRARIES AS APPROPRIATE - do not reinvent the LSP client wheel if you don't have to.
DO NOT write extraneous developer-facing technical documentation (user-facing docs in /docs are fine).
DO NOT return an incomplete implementation full of placeholders. Use test-driven development practices to build the complete feature incrementally, and then demonstrate that it works with gopls using manual test methods.
but i'd be happy to try out your PR, gimme a SHA and like 2 hours and i'll report back if it's any different
@timid sinew is https://github.com/dagger/dagger/pull/10761/commits/2a9924bb3b5266c1da1c2730ac231e00a8ea09ec in a state where it's reasonably likely to work?
I think my last push triggered some extra failures, lemme fix them quick and i'll push to a separate stable branch
yeah just lmk, imma run an errand real quick and maybe grab lunch
hm ironically I seem to have hit the engine hanging while loading modules in the most recent run
may be different though
kk well im done with my errand, should i try it with https://github.com/dagger/dagger/pull/10761/commits/5afb0d75d1089825dcf9577f5694df72a91a8d9e?
Yeah use 5afb0d75d1089825dcf9577f5694df72a91a8d9e
I pushed that one to another branch so shouldn't disappear
(ftr this was something else entirely, shouldn't be an issue)
engine exited 137 partway through my stress test, gonna kick it off again and watch htop
a bunch of this in the logs:
alongside some other known red herrings
watching the progression of memory utilization, looks like it'll get oomkilled in 10 minutes lol
so prior to it getting oomkilled, the time it takes to run dagger functions in container use slowly crawls up from ~5s to ~30s. on 18.16, also with a warm cache, it's 2.5s.
gonna stress it one more time and hook you up with a heap dump.
the goroutine dump no longer showed the isDep chain, which indicates to me that the cu blocking is at least tangentially related to edge merging
at the time of these dumps, the time to call dagger functions has crawled up to 15s
(dumps are uploading)
a couple dumps for ya
Thanks! looks like it has a GB of data related to content-hashing... probably doing that too often or something
happy to help, this is the top bug in CU i really really want fixed both for myself and because you're almost guaranteed to hit it if you start using cu "for real"
we've had 2 or 3 people report in discord and each time i can make the consequences of the blocking a lil less bad, but im pretty sure at this point the core issue is what you're tryna fix
also like, idk if it's helpful, but the shape of container-use graphs is kinda weird and i have to wonder if that contributes... we've basically got shared base layers, a layer that's a NoCache hostdir turned into a repo, asRef'd, and then turned into a dir, and then every environment is layering per-tool-call in such a way that they're, naively, non-overlapping branches of a tree with the same shared base... i think maybe there are actually edges to merge, like in the case where you have the same file written twice in the same exact way in 2 different envs, producing 2 identical-content layers even though their parents are different
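To make the "identical-content layers with different parents" case concrete, here is a toy Go sketch. It assumes a simplified model where a definition digest hashes the parent plus the operation and a content digest hashes only the resulting content; `Layer`, `DefinitionDigest` and `ContentDigest` are hypothetical names and not how dagger or buildkit actually compute digests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Layer is a hypothetical stand-in for one per-tool-call filesystem layer.
type Layer struct {
	Parent  string // digest of the parent layer
	Op      string // description of the operation that produced the layer
	Content string // resulting file content, simplified to a single string
}

// hash digests the parts with a separator so ("ab","c") != ("a","bc").
func hash(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

// DefinitionDigest depends on ancestry, so sibling branches stay distinct.
func DefinitionDigest(l Layer) string { return hash(l.Parent, l.Op) }

// ContentDigest ignores ancestry, so identical writes in different envs collide.
func ContentDigest(l Layer) string { return hash(l.Content) }

func main() {
	base := hash("shared-base")
	// Two environments branch off the same base and write the same file the same way.
	envA := Layer{Parent: hash(base, "env-a"), Op: "write main.go", Content: "package main\n"}
	envB := Layer{Parent: hash(base, "env-b"), Op: "write main.go", Content: "package main\n"}

	fmt.Println("definition digests equal:", DefinitionDigest(envA) == DefinitionDigest(envB))
	fmt.Println("content digests equal:", ContentDigest(envA) == ContentDigest(envB))
}
```

Under this toy model the two branches look unrelated by definition but identical by content, which is the kind of situation where a scheduler that merges on content could decide the two edges are the same.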
There's a new commit e2f669d964602326ccbce8676ff676747ea77092 with the memory usage fixed if you happen to have any time to try again (no biggie if not, we'll obviously see once it's merged)
sick i can prolly test tomorrow
That could definitely explain it, when it comes to content-based caching it's definitely possible to end up in weird situations where operations with a given digest end up depending on other operations that end up with the same digest, which is made an order of magnitude worse by edge merging...
stressing it as we speak, so far so good, seems stable at 3.5g ram and a 5-8s warm-cache dagger functions invocation
and for once on the agent end, it looks like claude is the bottleneck instead of the tool calls
although i did add an extra variable on that side here https://github.com/dagger/container-use/commit/a10f3387586377b695b8dbdacd6fce0dfb0c2171
im gonna walk off and let it run to the end, will report back.
got 1 random 15s dagger functions
oh and 26s
ah shit:
5 : load module: .
6 :   finding module configuration
6 :   finding module configuration DONE [1.3s]
7 :   initializing module
8 :   inspecting module metadata
7 :   initializing module DONE [22m46s]
8 :   inspecting module metadata DONE [0.3s]
9 :   loading type definitions
9 :   loading type definitions DONE [1.2s]
5 : load module: . DONE [22m49s]
Name                   Description
build                  Build creates a binary for the current platform
build-multi-platform   BuildMultiPlatform builds binaries for multiple platforms using GoReleaser
lint                   Test runs the linter
release                Release creates a release using GoReleaser
source                 -
test                   Test runs the test suite
test-nix-hash          TestNixHash tests if nix-hash binary is available in our custom container
1369.71 real 6.80 user 2.95 sys
all container-use tools are blocked during this too.
cpu looks sane during this, so does memory -- nothing is strained
dumps uploading
heap was taken first, but this is the engine-wide block happening again
here's another goroutine one from a while later
they do all have the isDep thing going on still
claude seems fairly convinced that there's a cycle here that bk is not detecting:
**Dump 1, 2, original**: Checking if various edges depend on `0x40924d8900`
**Dump 3**: After going through 100+ levels of recursion, we're now checking if `0x40924d8900` depends on `0x40a8bb4e80`
This proves there's a **true circular dependency**:
- Edge `0x40924d8900` depends (transitively) on Edge `0x40a8bb4e80`
- Edge `0x40a8bb4e80` depends (transitively) on Edge `0x40924d8900`
Yeah I'd agree w/ claude here. I wouldn't be surprised if edge merging makes it easier to hit but I bet we're still somehow submitting LLB w/ a circular dep here...
Thanks again though! I'll treat this as a separate issue from the edge merging rm'ing
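For reference, checking a dependency graph for this kind of cycle before it gets submitted can be sketched in Go with a plain depth-first search. This is an illustration under assumptions, not buildkit's scheduler code: `Graph` and `FindCycle` are hypothetical, and the two intermediate node names in `main` are made up to stand in for the 100+ levels between the two edges named in the dumps.

```go
package main

import "fmt"

// Graph maps an edge ID to the IDs it directly depends on.
type Graph map[string][]string

// FindCycle returns one dependency cycle (first node repeated at the end),
// or nil if the graph is acyclic.
func FindCycle(g Graph) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var stack []string

	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = inProgress
		stack = append(stack, n)
		for _, dep := range g[n] {
			switch state[dep] {
			case inProgress:
				// dep is already on the current path: slice from it to close the loop.
				for i, s := range stack {
					if s == dep {
						return append(append([]string{}, stack[i:]...), dep)
					}
				}
			case unvisited:
				if c := visit(dep); c != nil {
					return c
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[n] = done
		return nil
	}

	for n := range g {
		if state[n] == unvisited {
			if c := visit(n); c != nil {
				return c
			}
		}
	}
	return nil
}

func main() {
	g := Graph{
		"0x40924d8900": {"mid-1"}, // mid-1 and mid-2 are placeholder nodes
		"mid-1":        {"0x40a8bb4e80"},
		"0x40a8bb4e80": {"mid-2"},
		"mid-2":        {"0x40924d8900"},
	}
	fmt.Println("cycle:", FindCycle(g))
}
```

Because nodes are marked as done once fully explored, each node is visited at most once, so the check stays linear in the size of the graph even when many branches share the same base.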
Are there any cloud traces for this run where you hit the hang? Even if I need to download the raw trace and write some parser/grapher for it, that might be useful
I just need to extract out what corner case triggers this
https://dagger.cloud/dagger/traces/092a0540c62bf418e39f2f7b82d31f89 the traces are a pain to read, and i have enough separate IDs that it's hard to say if any specific one is what's triggering
on vibes alone, it also does seem to be significantly harder to make the issue occur post edge-merging removal...
anyways with the trace ids, trust that there's literally 100s if not 1000s of trace ids that look very similar to those, but no way for me to pull them from my logs and categorize them meaningfully. no guarantee that those specific ones are pre, post, or during hang, either, but the hang does recur periodically while the stress test continues to run. there is one "special" code path that cu can trigger but doesn't always where the agent re-configures its own environment, and if that's in play here, finding a trace with that would be like finding a needle in a haystack
https://github.com/dagger/container-use/blob/f59a1b4fd487be904ad2fbbdd0bde98e3e66e52f/environment/environment.go#L229 is the line that makes env config special... we basically take a directory from one built-up env container and plop it down near the base of a new one. in my head, that's not cyclical, but it definitely produces a more complicated graph than the "normal" case