#I did read you mentioning sessions etc
Yeah long-running engine is going to require a lot more changes since it would by default result in there only ever being one buildkit session, which breaks a lot. Fixable, but most likely not by the next SDK release.
Last night what I did was put buildkitd+cloak in an image and then use commandconn to do docker exec into that container to exec the engine. But then I later realized that -v /:/host doesn't really work on macOS, so I think we probably need to fall back to just having SDKs do a local fork/exec of the binary, same as before.
Either way, need the stdio-dialer
Hey @cyan stag and @last wyvern, quick check-in before going to bed... need some feedback from me?
No nothing urgent at all! We are good to go for now
Ok, can you sum up for me how we are on provisioning in the other SDKs? There's quite a bit of discussion but I don't have time to go through.
I will but first I want to double-check that everyone else is on the same page, and before that I want to finish the docs I'm writing right now
I will absolutely write the summary out though once we have it (either tonight or tomorrow)
Ok, I've put a pause on that feature until we know what's best
Oh man, I'm behind 67 commits 😲 (from main)
Just to update here, I'm currently building on the previous prototype but this time instead of doing commandconn to docker exec I'm bundling cloak in the dagger-engine image and then having the SDK copy it out of there locally (for its platform) and using that to talk to buildkitd running in the dagger-engine image. Makes packaging way easier (just an image, not an image plus also figure out how to bundle binaries in pypi+npm) and also makes it way easier to switch to the approach where engine is fully remote in the future.
I'm just running through it w/ go first to make sure it actually works (so don't take as final design or anything), but if anyone has any immediate thoughts lemme know. The end result would be that SDKs need to do a little bit of shelling out to docker (similar to what engine does today), but I'm thinking that's a good tradeoff given the packaging work it saves us and other benefits
Hit roadblocks with commandconn'ing the engine from a container?
(the volume hack I assume?)
Oh I've abandoned the volume hack at this point, but my new approach of having the sdks obtain the cloak binary out of the container image is working
(to me it makes sense -- long running engine > engine as a command in the container > engine copied from the container and executed locally)
in order of "getting closer to the real deal"
Yep that's exactly it, there's a nice step-by-step path
option 1 is the long term approach
option 2 is kinda long term as well: SDKs will always exec a command (dial-stdio). We cheat to start and dial-stdio spawns a new engine instead of proxying traffic to long running
option 3 is the stopgap
what I liked about option 2 is that there kinda wouldn't be any SDK changes -- we just change the meaning of dial-stdio (and btw, that's how buildkit does it -- dial-stdio is a proxy from stdio to unix)
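To make the "proxy from stdio to unix" idea concrete, here is a minimal Go sketch. It is not the buildkit or dagger implementation. The names `proxy`, `roundTrip` and `engine.sock` are made up for illustration. A real helper would pass `os.Stdin` and `os.Stdout` to `proxy`; the demo uses an in-process echo server so it can run on its own.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"strings"
)

// proxy copies in -> conn and conn -> out until the remote side closes.
// A dial-stdio style helper would call proxy(conn, os.Stdin, os.Stdout).
func proxy(conn net.Conn, in io.Reader, out io.Writer) error {
	go func() {
		io.Copy(conn, in)
		// Half-close the write side, if supported, so the peer sees EOF.
		if cw, ok := conn.(interface{ CloseWrite() error }); ok {
			cw.CloseWrite()
		}
	}()
	_, err := io.Copy(out, conn)
	return err
}

// roundTrip is a demo only: it starts a throwaway echo server on a unix
// socket (hypothetical name "engine.sock") and proxies msg through it.
func roundTrip(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "dial-stdio")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "engine.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // echo everything back until EOF
	}()

	conn, err := net.Dial("unix", sock)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	var out bytes.Buffer
	if err := proxy(conn, strings.NewReader(msg), &out); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := roundTrip("ping")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("echoed:", got)
}
```

The point of the shape: the SDK only ever sees a child process with stdin/stdout, so what sits behind the socket can change later without touching the SDK.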
The code I have works on linux-client<->linux-host and now I'm just making some updates so it works on darwin-client<->linux-host, once I see that working I'll send out a draft PR. Code will need cleanup but we can get consensus on the details of the plan
but if it's not possible, the next best thing is option 3 (downside is extra work for the SDKs, copying stuff into the user machine). Not the end of the world
We can go to option 2 once we have localdir syncing, which @round dove is looking into, so we're already taking steps towards it
awesome
Yeah I'm doing what I can to make that minimally painful, but right now there's a TODO for cleaning up old binaries
it's going in ~/.cache/dagger
yeah to me, option 3 is way better than "just talk to a CLI that got installed somehow, disregard compat issues"
Yeah trying to package binaries for every package manager scares me
pre-emptive nit: should probably include <os>-<arch>-<version>. Making sure that multiple SDK versions on the same machine don't end up using "whatever the other SDK pulled"
Oh I have TODOs for all that yeah
I'm doing all the good stuff like including the sha with name, downloading it to a tmp file and then rename(2) it to the final things, etc. etc.
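A rough Go sketch of the "tmp file then rename(2)" pattern combined with the `<os>-<arch>` plus sha naming discussed above. The file name format, the `engine-` prefix and both function names are assumptions for illustration, not what the PR does. The demo writes into a throwaway temp dir rather than ~/.cache/dagger.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// binaryName builds a cache file name that differs per platform and per
// content hash, so different SDK versions never pick up each other's binary.
// The "engine-" prefix and layout are hypothetical.
func binaryName(goos, goarch, sha string) string {
	return fmt.Sprintf("engine-%s-%s-%s", goos, goarch, sha)
}

// installAtomically writes data to a temp file in dir and renames it to its
// final name, so readers never observe a partially written binary.
func installAtomically(dir, name string, data []byte) (string, error) {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", err
	}
	final := filepath.Join(dir, name)
	if _, err := os.Stat(final); err == nil {
		return final, nil // already installed by an earlier run
	}

	// The temp file lives in the same directory so the rename stays on one
	// filesystem.
	tmp, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Chmod(0o700); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	if err := os.Rename(tmp.Name(), final); err != nil {
		return "", err
	}
	return final, nil
}

func main() {
	dir, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	// "abc123" stands in for a real content sha.
	name := binaryName(runtime.GOOS, runtime.GOARCH, "abc123")
	path, err := installAtomically(dir, name, []byte("fake binary"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("installed at", path)
}
```

This only uses create, write, chmod and rename, which all have equivalents in the nodejs and python stdlibs, and it does not rely on flock.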
posted an update here btw: https://github.com/dagger/dagger/issues/3629#issuecomment-1301189176 tl;dr - gRPC over HTTP possible, seems like a good bet for the various server->client reqs we might need to support, aiming to figure out how much the client needs to implement and try to avoid any hard dependency on github.com/moby/buildkit
But also making sure i'm only doing stuff that's available in nodejs+python stdlibs (nodejs lacks flock...)
go:embed of a go binary in a Go SDK is 🤢
One thing to possibly consider: there might be a way to do gRPC over websockets. Like, as a transport wrapper
Like, not a "simpler websocket endpoint", but literally "over websockets"
which someday could become a simpler websocket endpoint (e.g. v1 is gRPC as is, v2 is a simpler protocol)
I found this POC online, not sure how good it is: https://github.com/tmc/grpc-websocket-proxy
is there a practical difference between that and a HTTP/2 connection upgrade? the latter seems to be what docker<->buildx<->buildkit does
Fair.
The only practical difference is we'll probably end up using websockets one way or another. GraphQL subscriptions are websockets-based
That's what I picked websockets for dagger service attach -- could have been http/2, but I figured we'll pull in WS for other reasons (e.g. if we expose logs over GraphQL, it's probably going to be a WS subscription)
But ... it's a nice to have. Like, priority -12
gotcha. i don't feel strongly really, just curious. sounds like something we can slip in later too if/when we want it right? (including if "when" is just after I finish my prototype)
also what are our plans for graphql subscriptions?
yeah, agree. Eventually I think we'll have our own protocol (not sure if you agree with this) so that SDKs can implement it directly. Right now with the gRPC based approach, SDKs will have to spawn a binary to handle this
("none yet but want to keep the door open" works for me)
There's a long standing TODO in my brain for "Figure out how we could use subscriptions"
but anyway -- whether we do this or not, at that point we could switch to another protocol
none yet
I think in the future the first use case is logs
Right now for instance, the only way for SDKs to grab logs is from stderr
As soon as the Engine is a long running container and we communicate exclusively over HTTP, that trick is gone -- we'll need to stream logs somehow, subscriptions sound like a good candidate
e.g. right now we're cheating: the API is GraphQL + stderr. Make the engine a "true" API server, and we need to shove logs somewhere else
gRPC/protobuf is a good candidate for implementing in other languages too, no? I haven't tried in non-Go yet but I thought that was the point
Yep. But if we decide we want a simpler protocol than the buildkit one (because it's hard to implement), then we might as well reconsider transport (gRPC is not the easiest to work with)
it did take a while to grok. seems really handy once you know it, there's a lot of moving parts
just tried to test on macos but I can't run git because I updated to ventura and now I have to reinstall all dev command line tools including git........
While I'm waiting @last wyvern we're gonna need a container image registry by the next SDK release and associated automation for releasing to it when we do engine releases. Do you have any thoughts on what registry to use? I'm also guessing it would be nice to use a vanity URL so we can switch backends in the future, but I have no clue if that's possible with registries, never tried before
we're supposed to chat with the docker folks about rate limiting etc (/cc @midnight widget @stiff temple)
Using a vanity URL would be nice to avoid being tied to one provider
It's not a widespread practice but I've seen a few doing this
If rate limiting or other issues become a blocker there's always the fallback option of just putting the image tarball somewhere and having the SDKs download it and docker load it in. But if we can have a solid registry setup by the next release that seems ideal
@cyan stag Yeah agreed. Especially since we still need to run the "engine" image for buildkit itself
I mean, not a hard requirement for next week, but we could have an engine image that includes the engine binaries and also a "runner" binary (buildkitd), so you pull this one thing and done
that's what I've implemented
(draft PR is imminent)
yeah, neat!
so either way -- even if we pull tarballs, we still got to distribute the image right?
Yep exactly, I was just thinking if it's easier to put a tarball in S3 or GitHub releases and not have to deal with rate limiting, then we can fall back to it
https://github.com/dagger/dagger/pull/3647 Gonna go do more cleanup passes to fix the stuff mentioned in the description, but if there's any high level concerns with the approach let me know
Do we have a timeline on talking to dockerhub about rate limiting? It would be good to figure that out sooner than later. I can make an issue too (presuming there is not one already)
lol, I just implemented another singleConnListener before finding we had one already. (not that it's a ton of work, just funny that we've now run into such an odd thing twice.)
We do weird things here I think
(in a good way)
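For readers who haven't met one: a single-connection listener wraps one already-established connection (for example a stdio pipe) so it can be handed to server code that expects a `net.Listener`. Below is a minimal Go sketch of the idea. It is an assumption about the general shape, not the code in the repo, and `acceptTwice` is just a demo helper.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
)

// singleConnListener hands out exactly one pre-established connection.
type singleConnListener struct {
	conn      net.Conn
	once      sync.Once
	closeOnce sync.Once
	done      chan struct{}
}

func newSingleConnListener(c net.Conn) *singleConnListener {
	return &singleConnListener{conn: c, done: make(chan struct{})}
}

// Accept returns the wrapped connection on the first call. Later calls block
// until Close is called and then report net.ErrClosed.
func (l *singleConnListener) Accept() (net.Conn, error) {
	var c net.Conn
	l.once.Do(func() { c = l.conn })
	if c != nil {
		return c, nil
	}
	<-l.done
	return nil, net.ErrClosed
}

// Close unblocks pending Accept calls. It does not close the connection.
func (l *singleConnListener) Close() error {
	l.closeOnce.Do(func() { close(l.done) })
	return nil
}

func (l *singleConnListener) Addr() net.Addr { return l.conn.LocalAddr() }

// acceptTwice reports whether the first Accept returned the wrapped conn and
// whether a second Accept after Close failed with net.ErrClosed.
func acceptTwice() (firstOK bool, secondClosed bool) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	ln := newSingleConnListener(server)
	c, err := ln.Accept()
	firstOK = err == nil && c == server

	ln.Close()
	_, err = ln.Accept()
	secondClosed = errors.Is(err, net.ErrClosed)
	return firstOK, secondClosed
}

func main() {
	first, second := acceptTwice()
	fmt.Println("first accept ok:", first)
	fmt.Println("second accept closed:", second)
}
```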

On the topic of hosting providers:
- @stiff temple is talking to Docker about their special startup program
- @dapper grotto mentioned that ECR is a very strong candidate: pricing is basically S3, will be hard for Docker to beat that since they're on top of AWS
- Either way I agree that having a vanity URL would be good for future flexibility. Note that docker hub short name is not a vanity url! since obviously it's tied to Docker Hub
Chatted with @dapper grotto about ECR and so forth.
So far my preference would be:
- Vanity URL + ECR.
  - With a vanity URL users don't see the "real" registry address, and that's the only thing Docker Hub has going on
  - Vanity requires infrastructure which will be on AWS, might as well have the registry there and not pay traffic between the 2
  - ECR is cheaper and way more stable than Hub
- IF vanity URLs don't work ... I'd suggest we go with GitHub
  - We're already "leaking" github.com/dagger/dagger, might as well also leak ghcr.io/dagger if vanity doesn't work
  - I trust the stability more
- Hub: this would be last preference, depending on the startup program
  - If vanity doesn't work, we get a "vanity-ish" image (hub short name)
Hub startup program might be "promo pricing for a year and then 10 (basic + remove-rate-limiting) to 30k(add advanced reporting)/year after that"
As a comparison, on ECR that price buys us between 100TB & 600TB of data transfer -- roughly 2 to 12 million image pulls outside of AWS. On AWS->AWS it's 2.5 times that
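The pull counts above follow from simple division, assuming the ~50MB image size mentioned in this thread and decimal units (1TB = 1,000,000MB). A small Go sketch of that arithmetic; the function names are made up for illustration:

```go
package main

import "fmt"

// pullsFor returns how many image pulls a transfer budget covers,
// using decimal units: 1 TB = 1,000,000 MB.
func pullsFor(transferTB, imageMB int64) int64 {
	return transferTB * 1_000_000 / imageMB
}

// insideAWS applies the "2.5 times that" factor quoted for AWS->AWS traffic.
func insideAWS(pulls int64) int64 {
	return pulls * 5 / 2
}

func main() {
	const imageMB = 50 // assumed image size
	for _, tb := range []int64{100, 600} {
		p := pullsFor(tb, imageMB)
		fmt.Printf("%d TB: %d pulls outside AWS, %d inside AWS\n", tb, p, insideAWS(p))
	}
}
```

With a 50MB image, 100TB works out to 2 million pulls and 600TB to 12 million, matching the range quoted above; the AWS->AWS equivalents are 5 million and 30 million.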
what's the pricing like on GitHub?
So for example where image is 50MB (and we store several copies, maybe up to 1TB storage) and 100TB of egress
But if the package is consumed by any user's GH Actions workflow, then egress is not charged, and there is some egress included for all plans
I wonder about rate limits, which seem to be always present with GH. We can see if we hit them
Yeah
Wouldn't be surprised if we hit rate limits that are only lifted for projects not backed by a company
someone had same question: https://github.com/orgs/community/discussions/27387#discussion-4256987