# Is it possible to query Dagger for a list of copied-in files for some target?
Hey there! Dagger doesn't currently let you query which inputs have changed between runs (cc @brave quiver). That said, it handles this for you automatically if you run the same pipeline multiple times.
Yup, Dagger should handle that automatically (either with the local cache or with Dagger Cloud's cache).
@woven crag out of curiosity, what would a tool that lists dependencies look like? How would it get called? The problem is that such a tool would need access to a full Dagger engine (which would need to be started) so it could actually inspect what the inputs to the run are. Otherwise we'd need entirely separate implementations of git fetching, HTTP downloads, local directory analysis, etc.
So I don't even need to know what has changed between runs, just what the file inputs are.
The use case is that we have a large monorepo with lots of expensive tests that legitimately need to run fairly often, so we need to parallelize runs to get decent performance in CI.
Right now we use a custom dependency tracking system to sort of partition the monorepo to make these decisions. It would be good, I think, if we could instead just query the relevant targets to see if they've changed before trying to run them.
> The problem is that such a tool would need access to a full dagger engine (which would need to be started) so you could actually inspect what those inputs to the run are
This seems OK? The tool would be generating a Buildkite pipeline to run steps in parallel.
It would run on a machine with a Dagger engine available.
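A minimal sketch of the pipeline-generation step described above, assuming the "has this target changed?" answer is already available from somewhere (today, the custom dependency tracker). The function name `build_pipeline`, the target names, and the per-target commands are all hypothetical; the only external fact relied on is that Buildkite accepts a pipeline definition as JSON with a top-level `steps` list of command steps.

```python
import json


def build_pipeline(changed: dict[str, bool], commands: dict[str, str]) -> str:
    """Return a Buildkite pipeline (as JSON) with one command step per changed target.

    `changed` maps a target name to whether its inputs changed.
    `commands` maps a target name to the shell command that tests it.
    Both mappings are assumptions of this sketch, not something Dagger provides.
    """
    steps = [
        {"label": f"test {target}", "command": commands[target]}
        # Sort for a stable, reviewable pipeline regardless of dict order.
        for target, is_changed in sorted(changed.items())
        if is_changed
    ]
    return json.dumps({"steps": steps}, indent=2)


if __name__ == "__main__":
    # Placeholder targets and commands, for illustration only.
    print(build_pipeline(
        {"api": True, "web": False},
        {"api": "run-api-tests", "web": "run-web-tests"},
    ))
```

The printed JSON could then be handed to `buildkite-agent pipeline upload`. Command steps with no `wait` or `depends_on` between them are eligible to run in parallel, which is the behaviour wanted here.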
> Right now we use a custom dependency tracking system to sort of partition the monorepo to make these decisions. It would be good, i think, if we could instead just query relevant targets to see if they've changed before trying to run them.
Since the DAG is lazily evaluated, I don't think this is currently possible given the architecture of the engine. AFAIK (@brave quiver can probably confirm), operations are only evaluated as they're executed, so there's no way to know whether a build pipeline is cached before actually starting it.
@brave quiver maybe there's a way to run an op in Dagger at the beginning of the pipeline to at least validate whether the inputs are cached, and return that via stdout so @woven crag can make the decision he's after? Let me know if you get the idea I'm thinking of here.
I think I get the idea, and I think it's potentially technically possible. Essentially we'd just query the cache keys of all the source ops to determine what's changed.
That said... it's not a particularly easy change to make atm, maybe at some point after we've done other cache related improvements it'll be easier.
I was thinking of something more basic, for example a WithExec at the beginning of the pipeline that returns true or false depending on whether the inputs are unchanged, based on a hash of the files' mtimes.
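A rough sketch of the mtime-hash idea above, written as a plain Python script that such a step could run. This is not a Dagger API: `fingerprint_inputs`, `inputs_changed`, and the state file are hypothetical names, and the sketch assumes the previous fingerprint is persisted somewhere outside the input directory between runs.

```python
import hashlib
from pathlib import Path


def fingerprint_inputs(root: str) -> str:
    """Return a hex digest over the relative path, mtime, and size of every file under root."""
    digest = hashlib.sha256()
    root_path = Path(root)
    # Sort so the digest does not depend on directory traversal order.
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        stat = path.stat()
        rel = path.relative_to(root_path).as_posix()
        digest.update(f"{rel}\0{stat.st_mtime_ns}\0{stat.st_size}\n".encode())
    return digest.hexdigest()


def inputs_changed(root: str, state_file: str) -> bool:
    """Compare the current fingerprint with the one saved by the previous run, then save it.

    The state file must live outside `root`, otherwise writing it would
    itself alter the fingerprint of the next run.
    """
    current = fingerprint_inputs(root)
    state = Path(state_file)
    previous = state.read_text().strip() if state.exists() else None
    state.write_text(current)
    return current != previous
```

One caveat with this design: mtimes are not preserved by a fresh `git clone`, so on CI machines that check out from scratch every file would look new. Hashing file contents instead of mtimes avoids that, at the cost of reading every file.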
I'd like to be able to implement something like this for Dagger, basically.