Okay, I've been trying to use dagger for | Dagger

dark valve May 8, 2024, 1:52 PM

#

It works quite well for small projects. However, I currently have a project repository that's ~1GB in size. Starting a dagger build in a CI means I have to send all this code to a docker container, which is both time- and space-consuming. Admittedly, I'm not sure whether dagger actually sends the referred *Directory to the dagger engine, but it surely feels like it does just that. Why not provide an ability to mount such *Directory and *File parameters if the build is local anyway...? Sure, the state will be shared, but that can at least be worked around by making copies of source code manually, or a separate data type could be introduced: *Directory and *DirectoryLink, *File and *FileLink, where the *Link objects instruct dagger to just link/mount the entity rather than make a copy of it.
docker volumes. Oooooh boy. The said project, once built, requires 16GB of disk space. And I've noticed that every time I issue some kind of a dagger command referring to the source directory, it copies the referred directory to the dagger engine. Not only that, but it does not seem to be removing previously made copies. So if I run ~10 such builds locally, I'm fucked, because that means I'll use up at least 160GB of disk space. That's 160GB out of which I only need the most recent 16GB.... IDK, it just does not make sense to bloat that volume, especially w/o providing any easy way to clean it up (not that I vote for making it cleanable... I vote for making it either self-cleaning or reusing artifacts already present in that volume... I mean, in a CI I'll be dealing with the same directory multiple times, so why not just reuse it upon subsequent dagger invocations...?)
In my case, I barely can get 20GB of free space on my laptop (please leave the "storage is cheap" argument aside! I should not have to buy a SAN to be able to use a CI tool)

#

docker. Why.... Why.....? Why docker? Why is dagger glued to the docker executable so much? Why not allow for alternatives? Yes, I know you can rename nerdctl to docker, but come on... That just feels hacky and doesn't really say "we support standard tools OOTB". It says "do this, do that, try that other thing... it should work".
Dagger seems to be rigid in terms of integrations. docker is one of them. Another one is github.com, as I can only load modules from github. Come on!! I was so happy to build a reusable module and push it to BitBucket, hoping I could reuse it so easily for all my CI needs, and then.... I'm greeted with a message saying that only GH is supported. That's a bummer! Why not allow for a variety of git repositories/sources? Now I can't really upload my module to GH, because it contains business-related keywords neither I nor the client want to be visible publicly. I'm forced to either git-clone a separate repo during my CI solely for that single module, or copy-paste it to all my repositories and maintain the same copy everywhere.

#

Images. Look, I really like what you have going on with all those images: pipelining, *Container versatility, interactivity, etc. But I just happen to come across some shortcomings that just, for lack of a better word, suck. IDK, maybe I'm expecting too much, maybe my intuition is different than others'... But I'd really like to be able to use an OCI image already available locally (docker image ls). I haven't found a way to do that; if I use From(), I get an error thrown in my face saying that dagger can't find such an image in docker.io. Why.... Why...? You're already integrating with docker. Why not leverage what it's got to offer then? Another nuance is an ability to publish a *Container locally. I mean, to a local docker's image registry (i.e. docker image ls). And so on.
Integration with CIs. Look, I really appreciate that dagger can run in this nice progress mode locally where it prints status in a ci-like fashion. But what the hell is happening if it's run in an actual CI...? How am I supposed to trace back which step it was that failed and at which point? The verbosity seems to be cranked up to the max and everything is printed everywhere and one must be quite familiar with dagger's internals to understand what more than half of it means. It's noisy. It's messy. And dare I say - it's not human-readable. Most CI tools offer a very nice way to separate jobs visually: lint, build, test, sec-test, oci-build, publish, deploy,... You only need a single glance at the dashboard/output and you immediately know which phase, which task failed and sometimes you even know at what point. In dagger -- I only see this if I run it locally. Yes, I agree that not having an interactive TTY or libcurses might play a part in this. But that doesn't mean it's OK to make a mess out of all of it...

#

Speaking of integrations - let's talk about secrets. In CI they are normally passed to the CI as environment variables. There is a doc saying dagger supports them (https://docs.dagger.io/manuals/user/180138/host-env/), but I just can't make it work. If I do dagger -m mod1 call fn1 --pass env:my_pass and in the function I just print the pass value, it prints env:my_pass, i.e. literally what I passed to the command. And I thought the dagger client would source an env variable called my_pass, get its value, replace env:my_pass with the actual value and then use it as the --pass argument for the function call. Was I expecting too much? Am I using it wrong? IDK. But I feel like I'm doing everything according to docs and I'm not getting the expected results.
I'd really like to see an ability to pass *Container to multiple function calls. I mean, through the dagger CLI. Yes, I can chain method calls, but AFAIK I can only do the actual chain, like you do in code, i.e. only invoke functions of a returned object. I can't, say, using a CLI, call a function1 in module1, get a *Container object as a returned result, pass it to a second call of function2 as an argument (or one of the arguments), and so on. I.e. to script a whole pipeline out of functions through CLI, not through code/custom modules.

#

docs. Oh my, they are messy! Some valuable info is in the archives, but using only CUE. Other bits are in the official docs, but trimmed down and missing some details. Also, docs would greatly benefit from cross-referencing, i.e. ref-linking terms in one article to another article where I could find more info about it. But yea, docs are... messy 🙂

On the other hand, I really love the idea, the general approach and I appreciate what's already there. I'd really like to use dagger, but I just don't have projects small enough for it to be a good fit.

#

oh, and 10. I couldn't figure out remote builds. I think I saw somewhere that it supports remote builds (a server dedicated for dagger engine, and dagger clients just send commands and data over the network to the server, the server does its thing and returns the result to the client). I didn't find info about this in the official docs (only something in READMEs). I tried using DOCKER_HOST (again docker specifics....), it kind of worked, but it was just oh so slow that I'm not sure that's a good fit for any CI. I feel like I'm doing it wrong and I'd like to see how to do it right in the docs.

obsidian panther May 8, 2024, 2:11 PM

#

hey @dark valve! thx for the honest feedback, and we're sorry that you're having that many frustrations when trying to adopt Dagger in larger projects. That said, it seems to me that most of the topics you brought up above already have a solution, and it's very likely that the reason you don't know how to overcome them is that our docs and product need refinement, which is where we're currently putting all our focus.

Having said that, I'll let @lofty kestrel address all these concerns specifically since he has a better vision of the roadmap and more accurate data on each of the expressed items. I know for sure that several of the items above already have corresponding GH issues that are being worked on.

On a side note, I understand there are many paper cuts along the way, but is there something specific that's currently a blocker, and whose resolution would allow you to move forward while we address the other concerns?

#

I'm asking so we can work together to understand whether your most pressing issue is something that might already have a solution that we haven't had the time to communicate properly

dark valve May 8, 2024, 2:43 PM

#

@obsidian panther hey! And thank you for your reply. I appreciate you taking time to read through this whole blanket.

Well, the filesystem bloat (docker volumes) and slowness of transferring data to/from the engine are the main blockers currently. If/when I manage to get past them, the env:my_secret will be the next one in the list, with not being able to access docker's locally cached images being a painful inconvenience (again bloating filesystem, as I'd have to docker-save all the 10+ images I am working on).

The rest are mostly mid-minor inconveniences or intuition misses, adding up to the overall experience.

#

*I'd like to mention ties to github.com being the major inconvenience (not a blocker.. probably)

lofty aurora May 8, 2024, 3:06 PM

#

Hey, just a few quick comments:

The env var thing should definitely work, we need to follow up on what you're trying to do to help you debug. It kind of seems to me that you may be using a normal string as a function argument type instead of the dagger.Secret type.
From works by pulling from a registry, and the images listed by docker images aren't in a registry. Even if you have a registry running locally, the limitation here is that From and Publish currently lack the ability to access dagger services, so it's a networking limitation.
The restriction on github.com is temporary and it's being worked on. The main problem is that some registries like GitLab don't have a predictable URL structure (groups in groups) and dagger needs to know which part of the URL is the repo and which part is a subdirectory. GitHub makes this simple since it's always /<user>/<repo>, so it allowed us to move faster but we're actively trying to solve it now.
Re: bloat and cache, it's possible you're not excluding/ignoring enough, but we'd have to know more details about your setup to help.
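
To illustrate the URL point above, here is a minimal Go sketch of why a fixed /<user>/<repo> layout is easy to split into repository root and subdirectory. The function name splitGitHubRef and the example reference are made up for illustration; this is not Dagger's actual resolver.

```go
package main

import (
	"fmt"
	"strings"
)

// splitGitHubRef (hypothetical helper) splits a module reference such as
// "github.com/user/repo/sub/dir" into the repository root and the
// subdirectory inside it. It relies on the fixed /<user>/<repo> layout;
// hosts with nested groups cannot be split this way without extra information.
func splitGitHubRef(ref string) (root, subdir string, err error) {
	parts := strings.Split(strings.Trim(ref, "/"), "/")
	if len(parts) < 3 || parts[0] != "github.com" || parts[1] == "" || parts[2] == "" {
		return "", "", fmt.Errorf("not a github.com/<user>/<repo> reference: %q", ref)
	}
	root = strings.Join(parts[:3], "/")
	subdir = strings.Join(parts[3:], "/")
	return root, subdir, nil
}

func main() {
	// "acme/ci-modules" is an invented repository used only for the demo.
	root, subdir, err := splitGitHubRef("github.com/acme/ci-modules/go/lint")
	fmt.Println(root, subdir, err)
}
```

With a reference like gitlab.com/group/subgroup/repo/dir there is no way to tell from the string alone where the repository ends, which is the ambiguity described above.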

dark valve May 8, 2024, 3:41 PM

#

thank you @lofty aurora !

dagger.Secret -- that was spot on! My mistake. Thank you for pointing this out.
so would it be possible to update From() to also look up images cached locally? Or perhaps delegate this image fetch to the underlying CRI (docker/containerd/etc.)?
are you (the dagger team) implementing fetchers to adhere to each service separately? Would it perhaps be possible to design it to accept URI links pointing to the 'directory' containing the dagger.json (dagger would append dagger.json to the provided link and find required directories from that point)? This way I could even host my own file server (nginx?) serving dagger modules, a completely vendor-agnostic approach. And combined with a feature where the protocol is respected (git://, ssh://, https://, ftp://, file://, etc.), it would be a rather flexible and, I imagine, intuitive UX. What do you think?
I'm sorry, I must have missed something about exclusions. What can I exclude and where?

grave salmon May 8, 2024, 4:57 PM

#

Hey! Thanks again for all of your great and honest feedback, it's going to help us make the best product we can.

For exclusions, it's not your fault, I don't think this is documented (to your previous point about docs)

The closest thing I can find is we have some in-line docs in the module that implements the config file. This describes all the valid options in dagger.json https://github.com/dagger/dagger/blob/b8409fb11a163168860259f26fa27cb9e2e7b48b/core/modules/config.go#L9

I added an issue to add this to our docs https://github.com/dagger/dagger/issues/7324

If you're open to it, I would be happy to pair with you for a bit on your existing project to work through some of the friction points and maybe show a couple of common traps.

lofty aurora May 8, 2024, 5:07 PM

#

dark valve > thank you @lofty aurora! - dagger.Secret -- that was spot on! My mista...

Like @grave salmon said, it's understandable because it's not well documented, but the fix isn't to document dagger.json, it's to polish dagger config. A couple examples:

You have include and exclude to filter which files get uploaded when loading a module (example)
And you have custom views that help you filter a directory argument for a function (example). Example usage: dagger call --source=./src:default test. The --source argument there is filtered with that view's patterns.
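
As a rough picture of what include/exclude filtering does before anything is uploaded, here is a small Go sketch using the standard library's path.Match. The helper names and the sample file list are invented, and Dagger's real pattern syntax and precedence rules may differ from this simplification.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one glob pattern.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// filterPaths keeps paths that match an include pattern (or all paths when
// include is empty) and then drops those that match an exclude pattern.
func filterPaths(paths, include, exclude []string) []string {
	var kept []string
	for _, p := range paths {
		if len(include) > 0 && !matchesAny(include, p) {
			continue
		}
		if matchesAny(exclude, p) {
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	paths := []string{"main.go", "go.mod", "build/app.bin", "node_modules/x.js", "README.md"}
	// Only the files that survive the filter would need to be sent to the engine.
	fmt.Println(filterPaths(paths, nil, []string{"build/*", "node_modules/*"}))
}
```

For a repository with large build outputs, excluding those directories is what keeps the upload close to the size of the actual sources.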

lofty aurora May 8, 2024, 5:15 PM

#

dark valve > thank you @lofty aurora! - dagger.Secret -- that was spot on! My mista...

As for From, images in the docker engine are not considered "cached locally" from Dagger's POV because even though Dagger runs containers, it doesn't go through docker run. It's its own thing. But the plumbing is already there for From and Publish to take advantage of Dagger's networking capabilities so that they can reach systems other than a publicly available OCI registry; it's just pending prioritization to implement.

However, there are workarounds in the meantime for dagger to talk to the docker engine. See:

lofty kestrel May 8, 2024, 6:45 PM

#

Hi all, I will try to address each point, while avoiding repeating what others have already said, as much as possible 🙂 Thanks for all the feedback, it's super appreciated.

#

I have to send all this code to a docker container.

I think that's mostly been addressed. To summarize, yes that's correct, and we think it's a good thing, because having all inputs to your pipelines content-addressed and cacheable is part of the magic of Dagger. Obviously that sets a higher bar for things like filtering content to upload. Clearly we have work to do there. This should really help: https://github.com/dagger/dagger/issues/7199
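
To make "content-addressed" a bit more concrete, here is a minimal Go sketch that derives one digest from a set of files, so identical inputs map to the same cache key regardless of ordering. The function digestFiles and the in-memory file map are assumptions for illustration; the engine's real hashing scheme is not described in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digestFiles computes one digest over a set of files (path -> content).
// Paths are sorted so the result does not depend on map iteration order.
func digestFiles(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		// Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
		fmt.Fprintf(h, "%d:%s%d:", len(p), p, len(files[p]))
		h.Write(files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := digestFiles(map[string][]byte{"main.go": []byte("package main"), "go.mod": []byte("module x")})
	b := digestFiles(map[string][]byte{"go.mod": []byte("module x"), "main.go": []byte("package main")})
	c := digestFiles(map[string][]byte{"main.go": []byte("package main // edited"), "go.mod": []byte("module x")})
	fmt.Println(a == b, a == c)
}
```

Unchanged inputs produce the same key and can be served from cache, while any edit yields a new key.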

#

docker volumes. [...] I vote for making it either self-cleaning or reusing artifacts already present in that volume

That's how it works. The data in that volume is Dagger's cache. It is reused across runs, and it does evict data automatically as you add to it. The eviction could definitely be smarter. But it does exist, and the default behavior is what you expected: cached operations are reused, and the volume doesn't grow forever.

This is also the case in CI. The main challenge in a CI environment, is to persist that volume across runs. How to do that depends on your CI infrastructure. Persistence of cache data in various CI configurations is something we spend a lot of time talking about, and there are a lot of improvements coming to that.
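
The reuse-plus-eviction behavior described above can be pictured with a size-bounded, least-recently-used cache. The Go sketch below is a generic LRU, chosen here only as an example policy; the type boundedCache, the budget of 40 and the 16-unit entries are invented, and the engine's actual eviction rules are not spelled out in this thread.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key  string
	size int64
}

// boundedCache tracks cached entries by key and evicts the least recently
// used ones once the total size exceeds maxBytes.
type boundedCache struct {
	maxBytes int64
	used     int64
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newBoundedCache(maxBytes int64) *boundedCache {
	return &boundedCache{maxBytes: maxBytes, order: list.New(), items: map[string]*list.Element{}}
}

// Get reports a cache hit and marks the entry as recently used.
func (c *boundedCache) Get(key string) bool {
	el, ok := c.items[key]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// Put records an entry and evicts old ones until the cache fits again.
// The newest entry is always kept, even if it alone exceeds the budget.
func (c *boundedCache) Put(key string, size int64) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(entry{key, size})
	c.used += size
	for c.used > c.maxBytes && c.order.Len() > 1 {
		e := c.order.Remove(c.order.Back()).(entry)
		delete(c.items, e.key)
		c.used -= e.size
	}
}

func main() {
	c := newBoundedCache(40) // think "40 GB budget"; units are arbitrary here
	c.Put("build-1", 16)
	c.Put("build-2", 16)
	c.Get("build-1")     // build-1 is now the most recently used
	c.Put("build-3", 16) // exceeds the budget, so the oldest entry is evicted
	fmt.Println(c.Get("build-1"), c.Get("build-2"), c.used)
}
```

Under this policy ten 16-unit builds never add up to 160: old entries are dropped as new ones arrive, while recently used ones stay available for reuse.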

#

3. docker. Why.... Why.....?

That made me laugh 😂

I understand the frustration. We were very careful to not tie the design of Dagger to Docker. But then, we have this hardcoded "docker run" and the poorly documented alternatives. So at the moment, we are getting the worst of both worlds. We pay the price for strong decoupling from Docker (for example, no automatic support for local docker images or networks, as that would lock us in); but we don't reap the reward, because we didn't add the polish needed to leverage it.

We plan to fix that. Here's an issue summarizing the problem and planned solution: https://github.com/dagger/dagger/issues/5583

#

Another one is github.com, as I can only load modules from github. Come on!!

I understand the frustration. We had to cut that corner to ship faster. I believe that @fierce brook is working on that issue right now. (not sure if there's an issue)

fierce brook May 8, 2024, 7:02 PM

#

lofty kestrel > 4. Another one is github.com, as I can only load modules from github. Come on!...

Yes totally, the open issue to discuss the design is https://github.com/dagger/dagger/issues/7218, we'd be happy to have your feedback and insights 👼

The current in-progress implementation will follow Go's pattern, as chosen by the community as a strong developer experience. To answer your question, we will also be able to support vanity URLs as Go does (and by relying on the same go-get=1 pattern), so you will be able to expose your own repo / self hosted repo behind a vanity URL too

The design and implementation are well advanced; it is a matter of hours/days 👼
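
For readers unfamiliar with the Go pattern mentioned above: the go tool requests a URL with ?go-get=1 and reads a meta tag named go-import whose content is "import-prefix vcs repo-root". The Go sketch below parses such a tag from a page body. The regular expression assumes the usual attribute order, the example.com URLs are placeholders, and nothing here is taken from Dagger's in-progress implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goImportRe matches <meta name="go-import" content="..."> in its usual attribute order.
var goImportRe = regexp.MustCompile(`<meta\s+name="go-import"\s+content="([^"]+)"`)

// parseGoImport extracts the import prefix, VCS type and repository root.
func parseGoImport(html string) (prefix, vcs, repoRoot string, err error) {
	m := goImportRe.FindStringSubmatch(html)
	if m == nil {
		return "", "", "", fmt.Errorf("no go-import meta tag found")
	}
	fields := strings.Fields(m[1])
	if len(fields) != 3 {
		return "", "", "", fmt.Errorf("malformed go-import content %q", m[1])
	}
	return fields[0], fields[1], fields[2], nil
}

func main() {
	// What a self-hosted vanity URL might serve for https://example.com/ci?go-get=1.
	page := `<html><head><meta name="go-import" content="example.com/ci git https://git.example.com/team/ci"></head></html>`
	prefix, vcs, root, err := parseGoImport(page)
	fmt.Println(prefix, vcs, root, err)
}
```

The appeal of this scheme is that any web server able to return a small HTML page can point a stable name at a repository hosted anywhere.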

lofty kestrel May 8, 2024, 7:02 PM

#

local docker image. Why.... Why...? You're already integrating with docker. Why not leverage what it's got to offer then?

That is the other end of the tradeoff discussed in point 3. Dagger is actually strongly decoupled from Docker. As discussed in point 3, we're going to make that more clear, so that you reap the benefits (easy and obvious how to use Dagger without Docker). But, the price remains the same: no magical integration in the Docker engine.

That doesn't mean you can't integrate with your local Docker engine's images. It just means the engine won't do it as a core feature. You can implement that integration in your own code, though (or install a module that does it for you). In practice that means accessing the host docker socket, and making calls to that. As one example, here's a module I wrote that does exactly that (it can also run a docker engine as a service inside dagger, which is also useful but not what you're after I believe) https://daggerverse.dev/mod/github.com/shykes/daggerverse/docker
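
As a sketch of what "accessing the host docker socket, and making calls to that" can look like, the Go program below queries the Docker Engine API's GET /images/json endpoint over a Unix socket and collects the image tags. It runs directly on the host and assumes the default socket path /var/run/docker.sock; inside a Dagger function the socket would have to be handed in by the caller, and the linked module may well do things differently.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
)

// imageSummary holds the fields of interest from a GET /images/json response.
type imageSummary struct {
	ID       string   `json:"Id"`
	RepoTags []string `json:"RepoTags"`
}

// parseImageTags flattens the RepoTags of every image in a /images/json body.
func parseImageTags(body []byte) ([]string, error) {
	var images []imageSummary
	if err := json.Unmarshal(body, &images); err != nil {
		return nil, err
	}
	var tags []string
	for _, img := range images {
		tags = append(tags, img.RepoTags...)
	}
	return tags, nil
}

// listLocalImageTags asks the Docker engine behind socketPath for its images.
func listLocalImageTags(socketPath string) ([]string, error) {
	client := &http.Client{Transport: &http.Transport{
		// Ignore the URL host and always dial the Unix socket.
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socketPath)
		},
	}}
	resp, err := client.Get("http://docker/images/json")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseImageTags(body)
}

func main() {
	tags, err := listLocalImageTags("/var/run/docker.sock") // assumed default path
	if err != nil {
		fmt.Println("docker engine not reachable:", err)
		return
	}
	fmt.Println(tags)
}
```

Listing is only the first step; getting an image into a pipeline would still need an export from the engine, which this sketch does not attempt.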

#

Integration with CIs.

There are two ways to visualize what your Dagger pipelines do in CI. They are not mutually exclusive (you can do both).

Read the Dagger logs in your CI interface, the same way you read all the other logs. In my experience this works fine, but you seem to have had a bad experience. Can you share more details on what output you found confusing, or too verbose? We can address it in more detail; maybe you hit an edge case, or are using an older version, or we have a blind spot.
Configure your Dagger Engine to send telemetry to Dagger Cloud. Then you can visualize your Dagger pipeline activity as Traces: https://dagger.io/blog/introducing-dagger-traces. If you use GitHub (I know, GitHub only for now, sorry about that), Dagger Cloud will also integrate with your Pull Requests, and add checks for each individual Dagger call.

#

Secrets

I'll skip this one, it looks like it's already been addressed

#

script a whole pipeline out of functions through CLI, not through code

Yes! I also want that very badly. We are navigating the best way to do it (without disrupting everyone with too rapid and half-baked UX changes)

Here are some related issues:

#

docs. Oh my, they are messy!

Yes we've had a lot of churn. It takes time to accumulate great content, and we've had to break a lot of it twice: first when switching away from CUE, then again when switching to functions. We are definitely paying the price for that.

The good news is that we are making a big investment to fix this. In addition to @elder ice who has carried a lot of this work on his back (and dealing with the churn of product changes), @hoary hinge and @grave salmon are going to spend more time writing and updating docs, including porting over some of the great content from older versions. @obsidian panther is also working on better tooling to generate multi-language examples, which should help make API docs and guides more useful down the line. Hopefully you will start seeing material improvements in the next few weeks

#

Thanks again for taking the time to write all this, we really appreciate it. We will take 9 paragraphs of "why? why????" over quiet indifference any day, because it means you care enough to write it 🙂

We'll do our very best to get another shot with one of your projects in the future!

dark valve May 9, 2024, 8:00 AM

#

@lofty kestrel this is an incredibly professional response. Thank you! If PR is your role in this project - you're doing it very well; if it's not, then... well, it should be 🙂

I've skimmed over it, got the keywords, links and the overall gist. I'll reread it (at least once) again a bit later.

dark valve May 9, 2024, 9:23 AM

#

Can you share more details on what output you found confusing, or too verbose?

I'll share an invocation of one of my modules as an example. I tried to add comments to the output lines to the best of my ability, but I've got to admit, in some output blocks I simply gave up and skipped them.

As a user, I'd like to see

clear, concise and consistently formatted output that gives me understandable and valuable information I could use to confirm or troubleshoot the CI job
LOGLEVELS. May I suggest introducing loglevels? The default loglevel could be INFO, only printing short but descriptive messages relevant to a sunny-path execution of the CI function. Higher loglevels could offer some debug and/or trace verbosity (IMO the current output is DEBUG at best).
FORMATTING. I just don't get it. What's the purpose of '...' and those numbers in front of each line? Some numbers repeat, some don't,... I don't get it. And there are some inconsistencies.
CLARITY. There are some ambiguous messages in the output, like 'connecting', 'initializing', 'uploading', etc... They do not provide me ANY valuable information. After a while I start to think they do, but after reading the output a bit more I'm back to not having a clue again. Either print an informative message that I might benefit from or don't print it at all (at least not in the INFO loglevel...)
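
The loglevel suggestion above could look roughly like this in Go. It is a generic threshold filter written for illustration; the level names, the record type and the sample messages are made up and do not correspond to any existing dagger option.

```go
package main

import (
	"fmt"
	"strings"
)

type level int

const (
	levelTrace level = iota
	levelDebug
	levelInfo
	levelError
)

type record struct {
	lvl level
	msg string
}

// render returns only the records at or above the threshold, one per line.
func render(records []record, threshold level) string {
	var b strings.Builder
	for _, r := range records {
		if r.lvl >= threshold {
			b.WriteString(r.msg)
			b.WriteByte('\n')
		}
	}
	return b.String()
}

func main() {
	records := []record{
		{levelDebug, "connecting"},
		{levelDebug, "initializing"},
		{levelInfo, "lint: ok"},
		{levelInfo, "build: ok"},
		{levelError, "test: 2 failures"},
	}
	// With INFO as the default, the internal chatter is hidden.
	fmt.Print(render(records, levelInfo))
}
```

Lowering the threshold to levelDebug would bring the 'connecting' and 'initializing' lines back for troubleshooting.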

Just my 2¢

📎 dagger_commented.log

lofty aurora May 9, 2024, 9:47 AM

#

dark valve > Can you share more details on what output you found confusing, or too verbose...

Thanks. Some of that comes directly from buildkit and having concurrent things running (the numbers track the same step since there may be other steps running at the same time), but maybe @limpid pendant can explain it better. He's actively working on that output in feat: improve plain progress, as a fix to Output in automatic provisioning like 0.10.
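
Since the numbers identify which step a line belongs to, interleaved output can be regrouped per step after the fact. The Go sketch below assumes a simplified "N: message" line shape, which is only an approximation of the real plain progress format, and the sample lines are invented.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// groupByStep collects messages per step number and remembers the order in
// which steps first appeared. Lines without an "N: " prefix are skipped.
func groupByStep(lines []string) (order []int, groups map[int][]string) {
	groups = make(map[int][]string)
	for _, line := range lines {
		prefix, msg, ok := strings.Cut(line, ": ")
		if !ok {
			continue
		}
		id, err := strconv.Atoi(prefix)
		if err != nil {
			continue
		}
		if _, seen := groups[id]; !seen {
			order = append(order, id)
		}
		groups[id] = append(groups[id], msg)
	}
	return order, groups
}

func main() {
	// Two concurrent steps whose lines arrive interleaved.
	lines := []string{"1: build start", "2: test start", "1: build done", "2: test done"}
	order, groups := groupByStep(lines)
	for _, id := range order {
		fmt.Printf("step %d: %s\n", id, strings.Join(groups[id], " | "))
	}
}
```

Grouping like this is one way a CI log could show each step as a single block instead of a stream of repeated numbers.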

limpid pendant May 9, 2024, 9:51 AM

#

Yup, thanks @dark valve - I'm wanting to almost entirely rework the plain progress output, I think pretty much everything you mention is already on my radar 😄 (except loglevels, that's a really interesting point, might have to think a bit more about this one)

dark valve May 9, 2024, 11:02 AM

#

If it were me, I'd ditch (move to DEBUG) like 90+% of the output in the file I've attached. If I can't make sense of it, it only gets in the way of readability of the lines that I know I need and can make sense of. And get rid of all the duplications.
