# I've been working on "migrating" my Nx


vocal kelp
#

The problem I had is that this is a monorepo, where the whole point/benefit is that I configure my dependencies/targets in one place and let Nx do all the heavy lifting of figuring out what to run in what order. Potentially that means installing and building everything from scratch (you should be able to go from cloning the git repo to a full, passing test suite in fewer than 5 commands).

I spun up a little library locally that basically maps my Nx targets into chunks that either perform a "build" (e.g. installing dependencies or compiling files) whose output is used in subsequent steps, or a "check" (e.g. tests/linters) that simply passes or errors and is theoretically parallelizable.
Then it auto-generates two Go Dagger modules:

  • One that runs at the top level and runs the packages "in order" (it is generally up to Dagger to actually manage execution order, via a DAG inferred from each package's dependency inputs, which are in turn based on Nx's DAG)
  • One that runs per package, which uses the Nx configuration to choose which targets to run for that package and exports a "built" directory to be used by dependents

These call out to custom Dagger modules to perform whatever the task at hand is.

For instance, this is my file for running eslint (per package): eslint
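To make the "build" vs "check" split and the dependency ordering concrete, here is a minimal Go sketch. It is not the author's library: the Project and TargetKind types and the buildOrder/splitTargets helpers are hypothetical, and buildOrder is a plain Kahn's-algorithm topological sort standing in for the ordering that Nx's project graph provides (and that Dagger infers from each package's inputs).

```go
package main

import (
	"fmt"
	"sort"
)

// TargetKind classifies a target (hypothetical; not an Nx or Dagger type).
type TargetKind int

const (
	Build TargetKind = iota // produces output used by later steps (install, compile)
	Check                   // only passes or fails (tests, linters)
)

// Project is a hypothetical, simplified view of one Nx project.
type Project struct {
	Name    string
	Deps    []string              // workspace projects this one depends on
	Targets map[string]TargetKind // target name -> kind
}

// buildOrder topologically sorts projects (Kahn's algorithm) so every project
// comes after its dependencies. Ready projects are taken alphabetically so the
// result is deterministic. Every dep must itself appear in projects.
func buildOrder(projects []Project) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for _, p := range projects {
		indegree[p.Name] += len(p.Deps)
		for _, d := range p.Deps {
			dependents[d] = append(dependents[d], p.Name)
		}
	}
	var ready, order []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	for len(ready) > 0 {
		sort.Strings(ready)
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, dep := range dependents[next] {
			indegree[dep]--
			if indegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("could not order %d projects: dependency cycle", len(indegree)-len(order))
	}
	return order, nil
}

// splitTargets separates a project's targets into build and check steps,
// each sorted by name so the generated code is stable between runs.
func splitTargets(p Project) (builds, checks []string) {
	for name, kind := range p.Targets {
		if kind == Build {
			builds = append(builds, name)
		} else {
			checks = append(checks, name)
		}
	}
	sort.Strings(builds)
	sort.Strings(checks)
	return builds, checks
}

func main() {
	projects := []Project{
		{Name: "app", Deps: []string{"utils", "types"}, Targets: map[string]TargetKind{"build": Build, "lint": Check, "test": Check}},
		{Name: "types", Targets: map[string]TargetKind{"build": Build}},
		{Name: "utils", Deps: []string{"types"}, Targets: map[string]TargetKind{"build": Build, "test": Check}},
	}
	order, err := buildOrder(projects)
	if err != nil {
		panic(err)
	}
	fmt.Println("order:", order) // order: [types utils app]
	for _, p := range projects {
		builds, checks := splitTargets(p)
		fmt.Println(p.Name, "builds:", builds, "checks:", checks)
	}
}
```

In this framing, "build" outputs are what a package exports to its dependents, while "check" targets only consume the built inputs and can run side by side.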

#

For a while I was hitting a wall where Dagger was simply not able to execute this workflow. Unfortunately the failures were mostly opaque: the local terminal would just freeze up and the online traces would simply stop updating. I would only know Dagger was "done" because my CPU would stop spiking (it would go as high as 1200%).

But as of 0.16.2 it seems I can finally get far enough to have a successful build! 0.16.3 appears to have brought some minor improvements on that front, especially in the local terminal.

#

So, getting to the data. My monorepo is honestly pretty small, with only ~25 projects. They're all "utilities", nothing like a full-fledged server that gets built into a web app or something (although I intend to work on some soon).
But each requires its own set of dependencies, building the code, linting, formatting, and unit testing.

My workspace directory is a total of 1.2GB (based on du -hs .), but:

  • ~400MB of that is node_modules
  • ~500MB of that is the pnpm store
  • ~150MB of that is my Nx cache

So those (and a few others) get ignored by the dagger engine (via +ignore)

Running normal Nx "from scratch" (no pnpm or nx cache) (pnpm i && nx run-many -t test): ~65 seconds
Running again with both pnpm and Nx cached (which, as noted above, costs ~650MB): ~8 seconds

Lots of caching wins for pnpm + Nx, even if they come at the cost of non-deterministic behavior.

Running my Dagger module took 10 minutes and 25 seconds!
For full transparency, I have never actually seen an uncached success. Usually I have to break execution into two runs. The first gets pretty far but eventually fails. A common failure is Error: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.3 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr= (am I DDOS-ing my own machine?). But the second execution can pick up where the first left off.

#

Based on dagger core engine local-cache entry-set, it ends up caching 4004 distinct entries, with a total of 4.69GB of data!
When looking at my OrbStack dashboard, though, the associated volume is actually 10GB. So I'm not entirely sure which to trust...

The good news is that caching gets lots of wins here, and re-running after a success takes only ~2 minutes.
IMHO we could do significantly better, though. I would expect the high-level module function to detect:

  1. It is the exact same dagger code being executed as before
  2. It is the exact same directory as before

So I would have expected the top-level module itself to just return a cached result and skip literally everything internal. If that were the case, spinning up the module would be the most expensive part. Actual "execution" could probably be sub-second!
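The two conditions above amount to a content-addressed cache key for the whole top-level call. A minimal Go sketch of that idea, assuming the module source and the input directory can each be reduced to bytes; cacheKey and memo are hypothetical names, and this is not how Dagger's engine actually computes its cache keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cacheKey hashes the module source together with every file in the input
// directory. Paths are sorted so the key does not depend on map iteration order.
func cacheKey(moduleSource string, dir map[string][]byte) string {
	h := sha256.New()
	writeField := func(b []byte) {
		// Length-prefix each field so ("ab","c") and ("a","bc") hash differently.
		fmt.Fprintf(h, "%d:", len(b))
		h.Write(b)
	}
	writeField([]byte(moduleSource))
	paths := make([]string, 0, len(dir))
	for p := range dir {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		writeField([]byte(p))
		writeField(dir[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// memo is a hypothetical top-level result cache keyed by cacheKey.
type memo map[string]string

// run returns the stored result when the same code ran on the same directory
// before, and only otherwise executes the (expensive) pipeline.
func (m memo) run(moduleSource string, dir map[string][]byte, pipeline func() string) (string, bool) {
	key := cacheKey(moduleSource, dir)
	if r, ok := m[key]; ok {
		return r, true
	}
	r := pipeline()
	m[key] = r
	return r, false
}

func main() {
	src := map[string][]byte{"package.json": []byte(`{"name":"demo"}`), "src/index.ts": []byte("export {}")}
	m := memo{}
	pipeline := func() string { return "all checks passed" } // stands in for the full build/test run
	_, hit := m.run("module v1", src, pipeline)
	fmt.Println("first run cache hit:", hit) // false
	_, hit = m.run("module v1", src, pipeline)
	fmt.Println("second run cache hit:", hit) // true
}
```

With a key like this, a repeat run costs only the hashing plus module startup, which is the "sub-second" behavior described above.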

#

I'm not surprised that Dagger takes both more time and more space than Nx (the cache has to include things like the Node runtime itself), but I think it is important to acknowledge the order-of-magnitude difference. I'm also running on a fairly new M3 MacBook with 36GB of RAM, so CI machines could perform even worse!

For comparison, my current GitHub Actions CI workflow (which basically just runs the pnpm + Nx workflow above with minimal caching) takes about 5 minutes.
The equivalent job using Dagger can't even complete successfully. It fails at ~27 minutes after running out of storage space. From what I can tell, it got about 75% of the way through.

I've opened two issues that I hope could help narrow the gap: Limit parallelism and defer caching

I also believe storage drivers in general, and theoretically the remote execution to follow, could massively improve workflow performance as well.

#

Overall I just wanted to say that the functionality of Dagger is incredibly promising and I'm very happy with what I was able to achieve. That said, I'm still wary about its "scalability". Dagger really struggled to scale with my codebase, and I'm just a single person. Imagine trying to run a similar process on a real company's codebase with hundreds of projects and a lot more data.

I'm sure there are options like splitting the build up into multiple parts and multiple Dagger executions. But then it feels like I'm back to spending time tweaking CI logic that Dagger was supposed to handle for me. And for local dependencies, how can I access the successful builds without running the jobs on the same engine (same instance), which comes back to limiting parallelization?

The reality is I'm probably using Dagger "wrong", but it just seems so close to my dream CI tool: one where I can build my entire application stack "from scratch" in a single monorepo. Of course a true "from scratch" build is going to take a while, and that's fine. But odds are someone else (or some other CI process) has already built it, and I should be able to fetch a cached version really quickly (with all the deterministic guarantees that Dagger comes with).

Sure, updating a low-level package is going to result in a lot of cache busts, but the dependents have to get rebuilt sometime. Might as well do it now during the "test" phase of CI and guarantee correctness; then the actual deployment will be speedy fast because it just loads from cache!

I also think this workflow might be a decent candidate for unofficially judging Dagger performance and benchmarking any regressions going forward. I've noticed incredible performance improvements just from 0.14->0.16, and am excited to see what comes next.

#

The caveat on all of this is that I am just doing it on my own personal repo for a couple of minor npm packages.

This isn't part of any paying company's codebase, will probably only have me as its consumer, and doesn't have strict performance requirements.

Heaven knows I don't need something as robust as Dagger for my workflows, but I see it as a chance to prove Dagger can work on a "simple" use case, before attempting to endorse it for anything with higher stakes.

I'm 100% aware that I am overkilling my dev process; that is the whole fun of it. There is no PM or CTO holding me accountable for results. The consequences of the issues I've had above are mostly educational, and a chance to highlight here (in my opinion) the good, the bad, and the potential of Dagger.

frozen osprey
#

This is a really great write-up with data and I am very interested in the final outcome. We have very similar (and even larger) Nx-based monorepos which I'd love to see being used with Dagger. However, as you pointed out, with a performance difference that large it's not going to be viable. 👀 intently.

uncut pasture
#

Hey, thanks for this very, very detailed feedback -- will read it more thoroughly later today 🙏

soft brook
#

cc @plucky crown, thought you might find it interesting

abstract vortex
#

Agreed, this is as helpful as it gets in terms of feedback, thank you!

I've noticed incredible performance improvements just from 0.14->0.16, and am excited to see what comes next.
That is great to hear, and yeah, quite a few of the performance improvements across those versions were actually just setup for even more performance improvements that are being worked on now (I mention a few below).

So I would have expected the top-level module itself to just return a cached result and skip literally everything internal. If that were the case, spinning up the module would be the most expensive part. Actual "execution" could probably be sub-second!
Yeah, this is what we refer to as "global function caching" and it's being actively worked towards now. Basically, at the moment we explicitly disable caching of function calls across sessions (a "session" encapsulates a dagger CLI invocation and all functions called directly or indirectly by it).

What we're working towards will:

  1. enable caching of function calls across sessions by default
  2. enable optional cache control of function calls, so you can do things like mark a function as never cached, cached but with a TTL, etc.

It should make a pretty massive difference like you said. In fact, if you use dagger shell today and invoke the same function multiple times in one shell session, you actually do get function caching (since it's all in the same session) and it indeed is pretty "instantaneous" feeling. Once global function caching is enabled, you'll be able to get pretty close to that same latency but across sessions (modulo some session setup overhead, which shouldn't be that much).

Limit parallelism
Yeah 100%, we absolutely need that. We're focusing on caching for a while, but after that's settled this is probably the next best thing we can do to improve performance. The implementation work needed for caching will actually set up parallelism control pretty nicely too.

defer caching
...
I also believe the general functionality of storage drivers and theoretically remote execution to follow could massively improve the workflow performance as well.
Yes, a lot of this should end up falling under the umbrella of "remote caching", which is what we're going to work on after global function caching ships.

Full scope and details are more TBD here, but we do want to support finer-grained cache control as part of this, possibly down to individual calls (which is what the "defer caching" issue you opened is getting at, IIUC). I think that level of cache control could easily apply not only to the remote cache but also to the local cache, so it all kind of blends together in my mind right now.

The equivalent job using Dagger can't even successfully complete. It fails at ~27 minutes after running out of storage space.. From what I can tell, it got about ~75% of the way through.
Is the module code you're invoking there something anyone could run (i.e. it doesn't rely on secrets or other data only you/your repo has access to)? I'd be curious to try it myself and see if something is obviously going haywire and consuming way too much disk space. If not, it may be painful, but if you were able to repro and get some debugging info out after it fails, that could be helpful. Something as simple as du -h -d1 /var/lib or similar could help track down where the disk space is getting consumed. Another possibility would be to stop it after ~20m (before it fails) and then get the output of dagger core engine local-cache ... to see what's in there.

cyan cove
#

Thank you @vocal kelp ! I'm very curious to dive into why your particular codebase seemed to hurt Dagger performance so much. Dagger is by no means perfectly scalable, but it is used in production by teams of hundreds of engineers, and it's been through many scaling gauntlets... I really want to understand what particular dimension of scale caused it to struggle so much here

vocal kelp
#

Is the module code you're invoking there something anyone could run

Yup, the only "secret" on github actions is my Dagger CI token, which is easily swappable with your own.

Ideally you should be able to:

  1. Clone the GitHub repo
  2. Open in DevContainer (it installs dagger for you)
  3. dagger login
  4. ./scripts/dagger-test-and-build.sh (just runs the correct dagger function)

Technically if you have dagger installed globally on your computer and are already logged in, you could even skip the middle two steps.

cyan cove
#

For ideal performance, my naive approach would be to create a Dagger module for each component of your monorepo, and map dependencies within your monorepo to dependencies between their respective Dagger modules. Then you can optimally tune each directory filter to only load the files you need. My guess is that you have a similar setup with Nx (one Nx component per monorepo component), so there would be overlap between the Nx "DAG" and the Dagger "DAG". But if you want an apples-to-apples comparison of their performance, that would be the way to do it.

vocal kelp
#

dive into why your particular codebase seemed to hurt Dagger performance
I only have guesses, but basically I have ~25 projects, which all independently:

  • Load a Debian container
  • Install Node
  • Install PNPM dependencies (they all use the same pnpm-lock.yaml so ideally caching is heavily usable here)
  • Execute typescript builds (potentially memory heavy on later packages)
  • Execute eslint (potentially memory heavy on later packages)

My guess is that even if the cache is being hit, as discussed above it isn't actually caching the top-level function, so it still goes through each step, for each module, for each package. Which apparently is a lot.

#

don't think you need this Exclude
Agreed, it isn't a technical requirement, but I'm trying to be consistent, since I currently duplicate +ignore due to this issue: https://github.com/dagger/dagger/issues/9490

Then you can optimally tune each directory filter to only load the files you need
Yeah, so my (not heavily optimized) approach here was to exclude a ton of things at the top level (basically equivalent to .gitignore plus a couple of other files that have no impact on builds, like READMEs).
Then my dagger/monorepo/main.go file builds all the packages by passing in only each package + the built files of its dependencies (leaving it up to Dagger to actually figure out execution order, which it does).

So the files seen by every run of .BuildProject() in /dagger/monorepo/monorepo-builder/main.go should ideally be minimized to the relevant context of that project.

I could probably go even further; for example, my Biome (a JS formatter) module could strip out all the type files (.d.ts). I can experiment with that a bit more next.

plucky crown
#

Very interesting. I still have some more research pending on integrating Nx and Dagger, but I do remember having issues with Dagger not knowing when a task had finished (though this was a long time ago).

I think my approach to the integration was different, but I hear what you are saying. Nx inputs can be a pain to configure properly.

cyan cove
#

@vocal kelp re: the difference between engine local-cache entry-set and the size of the Docker volume... That's probably due to cache volumes. Dagger distinguishes between 1) execution cache and 2) cache volumes. local-cache entry-set only counts the former, while the Docker volume stores both --> wrong

abstract vortex
#

There are some things that wouldn't be included in local-cache entry-set, though: the containerd content store, our otel client DBs, and the boltdbs used for metadata in buildkit/containerd come to mind.

Gonna try running locally in a bit to see what's going on

abstract vortex
#

@vocal kelp started taking a look and did manage to finish it successfully (trace) on my laptop, but it does indeed take up a lot of disk space; I saw 32GB in the engine cache after it finished. The memory usage of the engine also got excessively high, over 10GB RSS, while it was running (though it did go back down to a normal value once done, so at least it's not a leak).

I'm gonna take a deeper look and try some stuff, but a few things I noticed so far:

  1. There are a ton of cache entries for copies of directories, many of which are 300+MB. I think these come from heavy usage of WithDirectory, which in most cases requires copying files on disk. I'll look more at your code and get back to you on some ways of optimizing it (there are a few options depending on the specific situation)
  2. I see in a few places you are iterating over maps and calling WithDirectory in the loop (e.g. here). Since Go map iteration order is randomized, the order in which you call WithDirectory can change, which can impact caching depending on the situation. It's worth sorting or otherwise making the order of operations like that deterministic. That may improve caching on re-runs.
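A minimal sketch of the fix suggested in item 2: collect the map's keys, sort them, and iterate in that order. The Go spec leaves map iteration order unspecified and the runtime deliberately randomizes it. The mounts map and the orderedKeys/applyInOrder helpers are hypothetical; in a real Dagger module the loop body would be something like ctr = ctr.WithDirectory(path, dirs[path]), where ctr and dirs are also illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedKeys returns the keys of m in sorted order. Ranging over m directly
// can yield a different sequence on every run.
func orderedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// applyInOrder stands in for a loop of WithDirectory calls: it records the
// (path, source) operations in a deterministic order.
func applyInOrder(mounts map[string]string) []string {
	var ops []string
	for _, path := range orderedKeys(mounts) {
		ops = append(ops, path+"<-"+mounts[path])
	}
	return ops
}

func main() {
	// Hypothetical mapping of container paths to source directories.
	mounts := map[string]string{"node_modules/b": "dirB", "dist": "dirD", "node_modules/a": "dirA"}
	for _, op := range applyInOrder(mounts) {
		fmt.Println(op) // dist<-dirD, then node_modules/a<-dirA, then node_modules/b<-dirB
	}
}
```

On Go 1.23+, slices.Sorted(maps.Keys(m)) produces the same sorted key slice.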
#

Also, this whole thing you built is honestly super cool, I'm sort of tempted to see if we can add it to our CI's benchmark test suite since it's a great real-world stress test that doesn't have a ton of special requirements (cc @sage osprey, just an intriguing idea)

vocal kelp
#

Thanks for taking a look!

I wonder if I may have naively over-optimized parts that are resulting in a lot of these copies.

In general I've tried taking the approach that the inputs of a function should just be directories, and any containers should be spun up internally. This does make container setup very efficient and cacheable, and I never get unexpected behavior from whatever runtimes or paths are set in a container. But it means I need to do a lot of copying between containers and directories to make it work.

A good example would be my pnpm install, whose inputs and outputs are only directories, while the node/pnpm container only exists internally.

Another good example of over-optimizing for parallelizable compute is my tsc step: https://github.com/JacobLey/leyman/blob/main/dagger/executors/tsc/main.go#L69

In theory there are 4 different builds that can run in parallel (tsc itself for type checking + .d.ts generation, then swc for .js, .mjs, and .cjs).

I split that up into 4 commands for maximum parallelism, but that means I then need to copy the files afterwards to merge them back together.

What I'm "winning" by decoupling any runtimes/dependencies/environment variables from the actual source/deployable code, I'm probably losing in large cache costs.
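The fan-out/merge shape of that tsc step can be sketched in plain Go: run the independent builds concurrently, then merge their outputs into one directory. This is a simplified stand-in under stated assumptions, not the author's module: Output models a directory as a path-to-contents map, runParallel is a hypothetical helper, and the build names and file contents are illustrative. In the real Dagger version, that final merge is the directory copy that costs cache space.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Output is a toy stand-in for a build's output directory: path -> contents.
type Output map[string]string

// runParallel runs every build concurrently and merges the outputs into one
// directory. A path produced by two builds is reported as a conflict rather
// than silently overwritten.
func runParallel(builds map[string]func() Output) (Output, error) {
	names := make([]string, 0, len(builds))
	for name := range builds {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic merge order

	results := make([]Output, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, build func() Output) {
			defer wg.Done()
			results[i] = build() // each goroutine writes only its own slot
		}(i, builds[name])
	}
	wg.Wait()

	merged := Output{}
	owner := map[string]string{}
	for i, out := range results {
		for path, content := range out {
			if prev, ok := owner[path]; ok {
				return nil, fmt.Errorf("%s produced by both %s and %s", path, prev, names[i])
			}
			owner[path] = names[i]
			merged[path] = content
		}
	}
	return merged, nil
}

func main() {
	// Hypothetical stand-ins for the four tsc/swc builds described above.
	builds := map[string]func() Output{
		"tsc":     func() Output { return Output{"index.d.ts": "types"} },
		"swc-js":  func() Output { return Output{"index.js": "js"} },
		"swc-mjs": func() Output { return Output{"index.mjs": "esm"} },
		"swc-cjs": func() Output { return Output{"index.cjs": "cjs"} },
	}
	merged, err := runParallel(builds)
	if err != nil {
		panic(err)
	}
	paths := make([]string, 0, len(merged))
	for p := range merged {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // [index.cjs index.d.ts index.js index.mjs]
}
```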

go map iteration is randomized
Good to know! I've also been learning Go as part of this project.

#

Obviously I haven't fully investigated the cost of each step, but that is generally what I had in mind with the feature request for "deferred caching".

If it were possible to do a bunch of steps (like copies) and "merge" them into a single request to the Dagger engine's cache, the really expensive copies in the middle could be removed, and all we'd get are the lightweight directories at the end.

abstract vortex
#

Ah okay, I understand better now. Your instinct there is actually spot on. There is an optimization (literally called MergeOp 😄) we pick up from buildkit where certain copy operations can be done much more efficiently by hardlinking files rather than copying them, but unfortunately right now in Dagger it can only be triggered under pretty specific circumstances. One of the current requirements is that include/exclude isn't used, which unfortunately rules it out for much of your usage.

  • Fortunately, all the caching work I mentioned earlier in this thread also entails dagger implementing operations in our own code rather than relying on buildkit. Once that's done we'll be free to lift restrictions like that so this can all "just work" a lot more magically.

I have found a few places so far where you are copying directories that can instead be mounted via WithMountedDirectory, which saves some copies. I think there are probably also some pretty easy improvements from using WithMountedCache in the right places.

Real life is beckoning atm, so I have to sign off for the night, but I'll pick this back up tomorrow. I'll send a PR to your repo with whatever improvements I find and some guidance on anything else I notice.

abstract vortex
#

@vocal kelp sorry for the delayed response, got bogged down in other work but took a look at this again finally today. Was able to make some small adjustments in a few places and cut both the disk space and runtime by over half for the "fully uncached" case.

Before (trace)

  • 25m57s
  • ~50GB disk space (remeasured more accurately than last week, was actually worse than I originally thought)

After (trace)

  • 12m35s
  • ~20GB disk space

The diff is here. Happy to send it as a PR if you want, but feel free to copy-paste it, whatever you prefer.

(writing up explanations of the fixes now too)

#

Some of the simplest improvements are from just replacing WithDirectory/WithFile with WithMountedDirectory/WithMountedFile. WithMounted* skips the requirement for the engine to copy files on disk. The only thing to look out for is that WithMounted*(<path>, source) will entirely replace the contents at <path> with source (whereas With* is an additive copy)
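The difference in what ends up visible at <path> can be shown with a toy Go model. This is only an illustration of the semantics described above under stated assumptions: FS, withDirectory, withMountedDirectory and sortedPaths are hypothetical lowercase stand-ins (not the Dagger SDK methods), and the model says nothing about the on-disk copying cost, which is the actual reason mounts are cheaper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FS is a toy filesystem: absolute file path -> contents.
type FS map[string]string

// withDirectory models an additive copy: src's files are written under dest,
// and anything already under dest that src doesn't overwrite is kept.
func withDirectory(fs FS, dest string, src FS) FS {
	out := FS{}
	for p, c := range fs {
		out[p] = c
	}
	for p, c := range src {
		out[dest+"/"+p] = c
	}
	return out
}

// withMountedDirectory models a mount: everything previously under dest is
// hidden, and only src's files are visible there.
func withMountedDirectory(fs FS, dest string, src FS) FS {
	out := FS{}
	for p, c := range fs {
		if !strings.HasPrefix(p, dest+"/") {
			out[p] = c
		}
	}
	for p, c := range src {
		out[dest+"/"+p] = c
	}
	return out
}

// sortedPaths lists the visible paths in a stable order for printing.
func sortedPaths(fs FS) []string {
	paths := make([]string, 0, len(fs))
	for p := range fs {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	base := FS{"/app/dist/stale.js": "old", "/app/package.json": "{}"}
	built := FS{"index.js": "new"}
	fmt.Println(sortedPaths(withDirectory(base, "/app/dist", built)))
	// [/app/dist/index.js /app/dist/stale.js /app/package.json]
	fmt.Println(sortedPaths(withMountedDirectory(base, "/app/dist", built)))
	// [/app/dist/index.js /app/package.json]
}
```

So swapping WithDirectory for WithMountedDirectory is safe when nothing already at <path> needs to survive.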

#

<to be continued after dinner>

abstract vortex
#

The diff here is an example of that MergeOp optimization I mentioned earlier. By itself it saved ~7m of runtime and 10GB of disk space.

Like I said, it's unfortunately very specific when that optimization can get triggered right now w/out any copying involved. It requires:

  1. The destination directory and source directory are at their "original root"
    • In other words, the Directories can't be the result of obtaining a subdir from another Directory (i.e. calling .Directory(<subdir>)), unless the subdirectory you obtain is actually just the root of another mount (i.e. one created via WithMounted*) in a Container
    • Obviously incredibly confusing... which is a large part of why it's just an internal best-effort optimization until we can generalize it some more
  2. No include/exclude patterns can be used

This was a particular situation where it was possible to meet those requirements, though. A subtle but important part is a few lines above, where I changed WithDirectory("dist", dag.Directory()) on nodeContainer to WithMountedDirectory("dist", dag.Directory()).

Because of that change, the dist Directories from typedContainer, jsContainer, etc. meet the first condition. If I didn't create those directories as a mount they wouldn't have met it.

#

This diff improves caching by moving a "common" step, which always produces more or less the same result, earlier in the chain, so it no longer gets invalidated by the steps that change more often and used to run before it.
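A toy model of why that reordering helps, assuming each step's cache key is derived from the previous step's key plus its own input (roughly how layered build caches behave; this is not Dagger's actual key derivation). Step, chainKeys and reused are hypothetical names, and the step names and inputs are illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Step is a toy pipeline step: a name plus the content it depends on.
type Step struct {
	Name  string
	Input string
}

// chainKeys gives each step a key derived from the previous key and its own
// input, so changing one step changes its key and every key after it.
func chainKeys(steps []Step) []string {
	keys := make([]string, len(steps))
	prev := ""
	for i, s := range steps {
		sum := sha256.Sum256([]byte(prev + "\x00" + s.Name + "\x00" + s.Input))
		prev = hex.EncodeToString(sum[:])
		keys[i] = prev
	}
	return keys
}

// reused counts the leading steps whose keys match between two runs, i.e. the
// steps that would be cache hits on the second run.
func reused(before, after []Step) int {
	a, b := chainKeys(before), chainKeys(after)
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	// "install-node" is the stable, common step; "copy-src" changes on every edit.
	before := []Step{{"copy-src", "v1"}, {"install-node", "node"}}
	after := []Step{{"copy-src", "v2"}, {"install-node", "node"}}
	fmt.Println("cache hits, stable step last:", reused(before, after)) // 0
	before = []Step{{"install-node", "node"}, {"copy-src", "v1"}}
	after = []Step{{"install-node", "node"}, {"copy-src", "v2"}}
	fmt.Println("cache hits, stable step first:", reused(before, after)) // 1
}
```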

#

Here the diff switches away from using Exclude in a WithDirectory and instead just uses WithoutDirectory. Part of the motivation is to be able to switch over to WithMountedDirectory, but it's also more efficient generally speaking: WithDirectory requires copying even if it's just applying an exclude, whereas WithoutDirectory just deletes files.

  • I feel like we could probably make this a transparent internal optimization
#

Overall I suspect more improvements similar to the above are possible. I also looked briefly into whether WithMountedCache (the equivalent of Dockerfile's --mount=type=cache) would be able to help here, but I hit the limits of my knowledge of pnpm, npm, etc. Might be worth considering if you haven't already.

#

Also, I very much realize a lot of the stuff above is deeply unobvious (it leaks a ton of technical weeds around linux overlayfs and mounts, essentially) and ideally over time we'll be able to hide more and more of it by just automatically optimizing behind the scenes