#ah... i think this [edge merging](https


timid sinew
#

it's definitely possible, do you have repro instructions? I could give it a try off that PR

rough olive
#

my repro lowkey sucks, it takes 20+ minutes on a fresh engine:

  1. set up claude with container-use (optionally in single-tenant mode, which is what i'm using): cu config agent claude, then edit claude.json like this:
      "mcpServers": {
        "container-use": {
          "type": "stdio",
          "command": "container-use",
          "args": [
            "stdio",
            "--single-tenant"
          ],
          "env": {
            "_EXPERIMENTAL_DAGGER_RUNNER_HOST": "docker-container://dagger-engine.dev"
          }
        }
      },

  2. build dev engine, enable debug (optional, this happens without debug)
  3. start up a ton of claudes in terminal splits, give them a hard task so they run for a long time, mine is inside container-use:
I want to add LSP integration to container-use. 

For inspiration as to the shape of the feature, i want it to work like zed agent's "diagnostics" tool or OpenCode's LSP servers. Search the web to understand what that might look like.

It should not expose tools for starting and stopping LSP servers, but should allow the user to configure LSPs that start on environment creation that will then be exposed to the agent via an lsp_diagnostics MCP tool.

Carefully research the existing code to find a safe place to root the LSP process. If it runs in the container, we'll likely need to utilize dagger's CacheVolumes. If it runs on the host, it MUST NOT be rooted in the user's git repository, but can probably be rooted in the container-use managed worktree associated with a given environment. USE LIBRARIES AS APPROPRIATE - do not reinvent the LSP client wheel if you don't have to.

DO NOT write extraneous developer-facing technical documentation (user-facing docs in /docs are fine).

DO NOT return an incomplete implementation full of placeholders. Use test-driven development practices to build the complete feature incrementally, and then demonstrate that it works with gopls using manual test methods.
#

but i'd be happy to try out your PR, gimme a SHA and like 2 hours and i'll report back if it's any different

timid sinew
rough olive
#

yeah just lmk, imma run an errand real quick and maybe grab lunch

timid sinew
#

hm ironically I seem to have hit the engine hanging while loading modules in the most recent run

#

may be different though

rough olive
timid sinew
#

I pushed that one to another branch so shouldn't disappear

timid sinew
rough olive
#

engine exited 137 partway through my stress test, gonna kick it off again and watch htop
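For reference, exit status 137 follows the usual shell/Docker convention of 128 plus a signal number, so 137 is signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal sketch of that arithmetic; `decode_exit_status` is a hypothetical helper name, and the signal name lookup assumes a POSIX system:

```python
import signal


def decode_exit_status(status: int) -> str:
    """Describe a shell/Docker-style exit status.

    Values above 128 conventionally mean "killed by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(decode_exit_status(137))
```

SIGKILL alone doesn't prove an OOM kill, which is why watching memory in htop is still the useful next step.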

#

alongside some other known red herrings

#

watching the progression of memory utilization, looks like it'll get oomkilled in 10 minutes lol

#

so prior to it getting oomkilled, the time it takes to run dagger functions in container use slowly crawls up from ~5s to ~30s. on 18.16, also with a warm cache, it's 2.5s.

gonna stress it one more time and hook you up with a heap dump.
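A rough sketch of how that latency creep could be tracked instead of eyeballed: time the same command repeatedly and compare early samples with late ones. `time_command` and `is_creeping` are hypothetical helper names, and the 2x threshold is an arbitrary assumption, not something from the stress test itself.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5, pause_s=0.0):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
        time.sleep(pause_s)
    return durations


def is_creeping(durations, factor=2.0):
    """Heuristic: True if the median of the later half of the samples is
    at least `factor` times the median of the earlier half."""
    half = len(durations) // 2
    if half == 0:
        return False
    early = statistics.median(durations[:half])
    late = statistics.median(durations[half:])
    return late >= factor * early
```

With the dagger CLI on PATH, something like `time_command(["dagger", "functions"], runs=10, pause_s=30)` run next to the stress test would give the samples to feed into `is_creeping`.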

#

the goroutine dump no longer showed the isDep chain, which indicates to me that the cu blocking is at least tangentially related to edge merging

#

at the time of these dumps, the time to call dagger functions has crawled up to 15s

#

(dumps are uploading)

timid sinew
#

Thanks! looks like it has a GB of data related to content-hashing... probably doing that too often or something

rough olive
#

happy to help 🙂 this is the top bug in CU i really really want fixed both for myself and because you're almost guaranteed to hit it if you start using cu "for real"

#

we've had 2 or 3 people report in discord and each time i can make the consequences of the blocking a lil less bad, but im pretty sure at this point the core issue is what you're tryna fix

#

also like, idk if it's helpful, but the shape of container-use graphs is kinda weird and i have to wonder if that contributes... we've basically got shared base layers, a layer that's a NoCache hostdir turned into a repo, asRef'd, and then turned into a dir, and then every environment is layering per-tool-call in such a way that they're, naively, non-overlapping branches of a tree with the same shared base... i think maybe there are actually edges to merge, like in the case where you have the same file written twice in the same exact way in 2 different envs, producing 2 identical-content layers even though their parents are different
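To make the "identical-content layers with different parents" case concrete, here's a toy sketch. It is NOT how dagger or buildkit actually compute digests; `content_digest` and `chained_digest` are hypothetical names, and the file contents are made up. It only shows why a digest keyed on the parent chain keeps two envs apart while a content-only digest makes them look mergeable.

```python
import hashlib


def content_digest(files: dict) -> str:
    """Digest of a layer's contents only (paths + bytes), ignoring its parent."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


def chained_digest(parent: str, files: dict) -> str:
    """Digest that also covers the parent, so the same write on different
    parents yields different identities."""
    return hashlib.sha256((parent + ":" + content_digest(files)).encode()).hexdigest()


# Shared base, then two environments that diverge.
base = content_digest({"go.mod": b"module example\n"})
env_a = chained_digest(base, {"a.txt": b"only in env A\n"})
env_b = chained_digest(base, {"b.txt": b"only in env B\n"})

# Both environments later write the same file in exactly the same way.
write_in_a = {"README.md": b"hello\n"}
write_in_b = {"README.md": b"hello\n"}

by_parent_chain_equal = chained_digest(env_a, write_in_a) == chained_digest(env_b, write_in_b)
by_content_equal = content_digest(write_in_a) == content_digest(write_in_b)

print(by_parent_chain_equal)  # False: different parents, different identity
print(by_content_equal)       # True: same content, a candidate for merging
```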

timid sinew
rough olive
#

sick i can prolly test tomorrow

timid sinew
rough olive
#

stressing it as we speak, so far so good, seems stable at 3.5g ram and a 5-8s warm-cache dagger functions invocation

#

and for once on the agent end, it looks like claude is the bottleneck instead of the tool calls

#

im gonna walk off and let it run to the end, will report back.

#

got 1 random 15s dagger functions

#

oh and 26s

rough olive
#

ah shit:

5   : load module: .
6   : ┆ finding module configuration
6   : ┆ finding module configuration DONE [1.3s]
7   : ┆ initializing module
8   : ┆ inspecting module metadata
7   : ┆ initializing module DONE [22m46s]
8   : ┆ inspecting module metadata DONE [0.3s]
9   : ┆ loading type definitions
9   : ┆ loading type definitions DONE [1.2s]
5   : load module: . DONE [22m49s]


Name                   Description
build                  Build creates a binary for the current platform
build-multi-platform   BuildMultiPlatform builds binaries for multiple platforms using GoReleaser
lint                   Test runs the linter
release                Release creates a release using GoReleaser
source                 -
test                   Test runs the test suite
test-nix-hash          TestNixHash tests if nix-hash binary is available in our custom container
     1369.71 real         6.80 user         2.95 sys

all container-use tools are blocked during this too.

#

cpu looks sane during this, so does memory -- nothing is strained

#

dumps uploading

#

they do all have the isDep thing going on still

#

claude seems fairly convinced that there's a cycle here that bk is not detecting:

**Dump 1, 2, original**: Checking if various edges depend on `0x40924d8900`  
**Dump 3**: After going through 100+ levels of recursion, we're now checking if `0x40924d8900` depends on `0x40a8bb4e80`

This proves there's a **true circular dependency**:
- Edge `0x40924d8900` depends (transitively) on Edge `0x40a8bb4e80`
- Edge `0x40a8bb4e80` depends (transitively) on Edge `0x40924d8900`
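
Whether that cycle is real is unconfirmed; it's claude's reading of the dumps. If the edge dependencies could be exported as a plain mapping, a check along these lines would settle it. This is a generic depth-first cycle search, not buildkit's own logic; `find_cycle` and the `"hypothetical-mid-edge"` node are made up for illustration, and only the two addresses come from the dumps. It's written iteratively because the dumps already show 100+ levels of recursion.

```python
def find_cycle(deps):
    """Return a list of nodes forming a cycle in `deps`
    (node -> iterable of direct dependencies), or None if there is none."""
    done = set()  # nodes whose whole subtree has been explored
    for root in deps:
        if root in done:
            continue
        path = [root]      # current DFS path, root first
        on_path = {root}   # same nodes as `path`, for O(1) membership checks
        stack = [iter(deps.get(root, ()))]
        while stack:
            for dep in stack[-1]:
                if dep in on_path:
                    # Back edge: slice out the loop and close it.
                    return path[path.index(dep):] + [dep]
                if dep not in done:
                    path.append(dep)
                    on_path.add(dep)
                    stack.append(iter(deps.get(dep, ())))
                    break
            else:
                # All dependencies of the current node are explored.
                finished = path.pop()
                on_path.discard(finished)
                done.add(finished)
                stack.pop()
    return None


# Illustrative graph only: the middle node is invented.
example = {
    "0x40924d8900": ["hypothetical-mid-edge"],
    "hypothetical-mid-edge": ["0x40a8bb4e80"],
    "0x40a8bb4e80": ["0x40924d8900"],
}
cycle = find_cycle(example)
print(cycle)
```

A `None` result on the real graph would suggest the isDep walk is just very deep or very repetitive, not actually circular.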
timid sinew
#

Thanks again though! I'll treat this as a separate issue from the edge merging rm'ing

#

Are there any cloud traces for this run where you hit the hang? Even if I need to download the raw trace and write some parser/grapher for it, that might be useful

#

I just need to extract out what corner case triggers this

rough olive
#

anyways with the trace ids, trust that there's literally 100s if not 1000s of trace ids that look very similar to those, but no way for me to pull them from my logs and categorize them meaningfully. no guarantee that those specific ones are pre, post, or during hang, either, but the hang does recur periodically while the stress test continues to run. there is one "special" code path that cu can trigger but doesn't always where the agent re-configures its own environment, and if that's in play here, finding a trace with that would be like finding a needle in a haystack

#

https://github.com/dagger/container-use/blob/f59a1b4fd487be904ad2fbbdd0bde98e3e66e52f/environment/environment.go#L229 is the line that makes env config special... we basically take a directory from one built-up env container and plop it down near the base of a new one. in my head, that's not cyclical, but it definitely produces a more complicated graph than the "normal" case
