#Live feedback on Go SDK
1 messages · Page 1 of 1 (latest)
Reading the last thread @sullen citrus , you're looking for a direct equivalent of Dockerfile's COPY. This is technically possible today, but not easily "chainable" because you have to exit your Container chain to produce a Directory, modify that directory with Directory.WithDirectory, then merge it back in.
So the possible solution would be a convenience Container.WithDirectory or something like that.
Did I get that right?
(cc @civic shell @lethal horizon )
Yep! I imagined something like this...
builder.Copy(root + "/services/go.*", "/src/").Copy(root + "services/" + servicePath, "/src/")
where root is some representation of my repository root
dealing with Directorys vs HostDirectorys vs directory IDs also feels a bit verbose at the moment
In any case you will need to explicitly designate the source to copy from. In your case it's the repository checkout on the host machine, but it could also be any other directory.
Don't worry about HostDirectory. Directory is what you want.
Sure, but I have to start with a HostDirectory and then I need to know that I have to Read() it into a Directory , right? That confused me at first
Yes correct, what I mean is that the Host part is more cosmetic and more likely to change (I think the read part is already planned to change?). The core types are Directory and Container so it's most important that we make your use case work with those types, because they will be harder to change in general
I think there's some tension here in that, when the build goes wrong, I will want to access/inspect all the pieces of it as first-class objects. And I probably also want that for passing e.g. Directory around between builds/helper functions. But at the level of writing a couple build steps, when the build isn't going wrong, I want to be able to shorten that in my head to "in this image, with these files, do this"
Go probably makes things more wordy here, because in e.g. Node you could have an API .With(string | Directory)
where the string is assumed to be a host path that should be normalized into a Directory()
The limitation is GraphQL in this case, it has pretty strict typing too
Maybe I should think in terms of "building the filesystem I need, and then merging that into the image?"
That would be a good approach too yeah
I don't see any strong reason not to also give you a convenience like Container.WithCopiedDirectory, so you could do both
I'm also having trouble figuring out how to construct a FileID for copying individual files (important for e.g. go.mod, go.sum)
The existence of the With and Withouts makes me imagine a glob-filter api e.g. FS.Filter("go.*")
@sullen citrus are you actually seeing any FS types? That should be completely deprecated by Directory and Container now
@ionic cave I guess I'm not. I think I just sort of merged FS and Directory in my head without really thinking about it
Ok, it's an artifact of you being exposed to earlier versions, I understand 🙂
yep lol
@sullen citrus I think your approach of constructing the rootfs as one pipeline, then setting it as the container's rootfs, sounds really good. Would love to see that in action
@severe moat am I right that the hostfs / read stuff is going to change? Or has already changed? Can't remember
Or is it tied to that broader "dynamic localdirs" design problem
yeah, it's tied to dynamic localdirs/sessions
Any way we could have a "bridge" to get the current API closer to the intended goal, without stepping on the dynamic session landmine?
latest status is @lethal horizon brought up an issue with it, and i'm trying to convince us not to care 😄 https://github.com/dagger/dagger/pull/3421#issuecomment-1288327419 + https://github.com/dagger/dagger/pull/3421#issuecomment-1289356477
i don't think so, unfortunately localdirs are pretty fundamental to the buildkit session lifecycle 😕 I'll keep thinking on it though
Intuition for the above: in our repository, we have a number of go services in subdirectories that all share a go.mod/go.sum in their parent directory. So I'm experimenting with creating an abstraction here that's suitable for multiple services: this function gives you a container context with deps installed and with your service's subdir copied in, and then from there you build.
I don't know that this is necessarily the right or most useful abstraction for my builds, but that's the kind of thing I would like to be able to experiment with easily
I'm trying to write a useful comment in your gist, but the short version is: the less you leak host-specific details into your pipeline, the easier things will be
This would be much easier, I think, if we supported includes and excludes when loading host directory
Then you could simply open the whole monorepo, but with only the parts you want included, and then pass it as a Directory to the rest of your pipeline logic. Tomorrow you could swap out the local dir for a plain git checkout and everything would work
yeah, I think that would help a lot
Also it would allow you to take advantage of chaining much more, with the benefit of reducing the number of intermediary error checks
(which in your case are really killing overall readability)
In the use cases I've been playing with I am really missing something like a ".daggerignore" file which would be used when loading a host directory, the includes / excludes filters mentioned would solve the same issue but being able to effectively re-use dockerignore files would be nice
@pure dragon That would also be useful! But include/exclude at the level of an individual operation are still necessary IMO
Something I'm discovering as I continue building is that the lazy execution makes debugging tricky -- it's sometimes difficult to tell what code in my build app generated which build instructions as expressed in the output. Case in point...
#13 copy /workspaces/core/frontends/app /frontends/app
#13 ERROR: failed to calculate checksum of ref exyujs2e3fpbs04yl6565q1l3::3irgq831w46392rnqgmq6ebgd: failed to walk /tmp/buildkit-mount3755617588/workspaces/core/frontends: lstat /tmp/buildkit-mount3755617588/workspaces/core/frontends: no such file or directory
My code reads /workspaces/core/frontends/app in as a Directory , then performs some transformations on it, then copies the result into a container. By the time I reach this line, I can tell from the logs that the directory has been copied into buildkit and the transformations have been performed. I expected the next copy to be something like / /src/frontends/app (copying the transformed /app, which is a Directory unto itself, into another folder that mirrors the structure of my repository), so it's tricky to figure out how/why I'm seeing this instead.
If the output gave some indication of the identity of the context being copied to and from, that might help -- although with me creating anonymous Directorys all over the place, it might not. I'm wondering if it would make sense to have debug information be an annotation on a Container or Directory, e.g. directory = dagger.Debug("sparse repo dir", directory)
@spice coyote @severe moat @lethal horizon this is why I was asking about possibility of decoupling query building from sending 👆
or I suppose Debug could be an imperative action that forces query evaluation... 🤔
there’s an interactive debug coming @sullen citrus which will help a lot I think
Ah, yeah, I remember hearing about that! That'd be the ability to just drop into a shell in any intermediate container, right?
this also makes me think about my frustrations with the filesystem API: transformations return Directorys, but take in DirectoryIDs, so basically any transformation has to be interrupted by an error check as you evaluate the .ID() of the filesystem you're copying in.
I could imagine that a world where building and sending are decoupled might also allow using unresolved directory representations that only get made concrete at the time the query is sent, if that makes sense?
Hah, https://github.com/dagger/dagger/issues/3558#issuecomment-1293985134 you already had the same idea
Confusingly, that is sort of what is happening when you use ID. ID is actually the recipe for building the Directory, but for reasons™️ it requires checking an error. I think we can fix this. It's probably the number 1 most common source of confusion so far
@sullen citrus not yet launched but if you want to get a sense for the underlying graphql API: https://play.dagger.cloud
Welcome to Dagger API Playground
Just for my own edification, could you expand on that a bit? I assumed that .ID() was forcing a query to be sent to the GQL server, but it sounds like you're saying it's actually doing some intermediate resolution?
or maybe the server is required to do the resolution?
It is forcing a query to be sent to the graphql server, but all the server is doing is calculating the LLB (and some of our own metadata) and then returning that back to you. It's not actually submitting the LLB to buildkit yet. Evaluation only happens once you retrieve other concrete values. E.g. one you read the contents of stdout of an exec, then the exec will actually be submitted as a solve
To be clear, this isn't incredibly logical or obvious, so confusion is to be expected. The reason behind this is that it's nice for ID to be lazily evaluated since combine more evaluations into one solve, which makes it easier to get nice parallelism and efficiency in general.
My hope is that if we hide ID entirely, then this behavior will become more intuitive. ID is just a huge glaring exception to the rule that throws a wrench into intuition
would "hiding ID" mean "doing a bunch of ID-computing queries at the last minute before starting the final solve?" Or would it mean "sending some other representation to the server such that IDs get computed as part of the final solve?" Not that the difference matters much to me as a consumer; I'm just curious
Great question, I think we'd start with "doing a bunch of ID-computing queries at the last minute" (all hidden from users of course). The second possibility is probably possible along a number of different routes, but it's a lot more work up front since there's not a standard way of submitting multiple graphql queries that reference each other. There are unofficial ways bubbling up in the graphql community, so I wouldn't be surprised if we move to that model someday in the future
- extensions of course 🙂 (I guess with extensions you still need to deal with one of the above to implement the extension. But then it’s hidden to everyone else)
Yes that's another way of abstracting lots of queries into one too 🙂
Fair. That was our fear with lazy-ness … performance vs weird debugging stuff
We leaned on the performance side but maybe we should be less lazy and force evaluation at every step, so things blow up where they blow up
I definitely wouldn't swing that far in the other direction! I'm going to be using Dagger to write stuff that runs on every commit, I want to be able to make it fast.
I think it's OK to have debugging take some more work as long as you give me tools I can use for it
Yeah that would be great. However it’s tricky because:
-
it wouldn’t really fix the “weirdness” of failures happening in a different place (e.g. grabbing the wrong directory will fail when it’s used, not when it’s declared)
-
we’d have to rethink query builder a bit. You can’t build a single query (at the end of the day, the API does take IDs), so somehow we should magically split that up into multiple queries?
Yeah makes sense. At the very least, logs should be contextual (e.g. tell you the operation that failed)
Yeah hiding ID is a tangential thread in response to this: #1034221596853932084 message
The suggestion on how to implement it is here (could be missing something): https://github.com/dagger/dagger/issues/3558#issuecomment-1293985134
We had that in dagger 0.2 (contextual build logs), we should port them over.
This basically would prefix build logs with the operation that’s being done (e.g. instead of “no such file or directory” it would say “host.workdir.directory(path: “foo”).read: no such file or directory”)
Basically a “default” value for the dagger.Debug you suggested above
@sullen citrus would that help or I misinterpreted the problem?
I think that would solve the problem, yeah!
Oh yes of course
Actually we could piggyback on the graphqlmarshaller interface I hacked together for extensions (so they can return native types instead of IDs)
Except in query builder, we’d parallel resolve all of them
Completely unrelated to all this -- a little thing I just noticed is that Directory.entries() gives me strings, whereas I was hoping to get paths + directory-vs-file. What I'm really trying to do is a filesystem walk -- though perhaps this won't be necessary once include/exclude are in...? not sure
Neat!
We didn’t port it back because we weren’t super happy with it, but I guess it’s better than what we have now
(/cc @severe moat @lethal horizon the whole buildkit progress writer and custom vertex names topic that’d include the gql path being solved)
This seems like it'd be hard to do in Go, but it'd be cool if it could be annotated with filename/line number
I think we could use Go’s stacktrace stuff — although I don’t know what’s the penalty of doing that on every query (it’s meant for errors and “special” occasions)
Tons of good stuff in this thread 🙂
Another API note: WithDirectory(dirId, path) feels like it conflicts with WithCopiedFile(path, dirId) -- I would expect the order to be the same in both
And I know this would also be hard (impossible??) in Go, but I'd love it if the API were just .With(path, fileOrDir)
It would be a big change from our current approach to codegen, but just speaking hypothetically it doesn't seem inherently impossible to codegen something like this from our graphql schemas:
type Container struct {
Args []string
Env []string
Rootfs Directory
Mounts []Directory
...
}
type ContainerOpt func(*Container)
func (c *Container) With(opts ...ContainerOpt) *Container {
...
}
func FileAt(path, contents string) ContainerOpt {
...
}
There'd be lots of tricky stuff, but it's interesting food for thought if nothing else.
I hate Go sometimes 😅