#vito9876 Erik Sipsma3294 jlongtine6168


smoky pawn
mellow thorn
#

seems like it might, but I'm not sure (not enough Cue context). any path can be added as a host dir with read/write support, it works the same as the old graphql API, just different names

clear briar
#

Yeah, that should do it, honestly.

smoky pawn
#

I was hoping we could support core.#Source without supporting arbitrary filesystem access... That may prove impossible but it would be nice to at least discuss options

empty oracle
#

My only thought was whether we could enforce that all files you want to core.#Source either come with the workdir or can be downloaded directly into the engine rather than loaded from a local dir. But I think @clear briar said that wasn't feasible, so I don't have any other ideas right away besides supporting arbitrary local dirs.

mellow thorn
#

in my experience with bass, being able to pass multiple focused local dirs instead of a monolithic workdir becomes really important to minimize cache busting for fast feedback loops (which @clear briar also pointed out) so I went ahead with that implementation, but I could be missing something about how we plan for this to be used
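To make the cache-busting point concrete, here is a minimal stdlib-only Go sketch. It assumes cache keys are content hashes over the files a directory exposes, and it models the host filesystem as an in-memory map; `digest`, `under` and `everything` are hypothetical helpers, not Dagger, Bass or buildkit APIs. Editing a file outside a focused directory leaves that directory's digest alone, while a single monolithic digest changes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// digest hashes the files selected by keep. files maps slash-separated paths
// to contents and stands in for a synced host directory (hypothetical model).
func digest(files map[string]string, keep func(path string) bool) string {
	var paths []string
	for p := range files {
		if keep(p) {
			paths = append(paths, p)
		}
	}
	sort.Strings(paths) // map order is random; sort so the digest is stable
	h := sha256.New()
	for _, p := range paths {
		// Length-prefix each field so different path/content splits can't collide.
		fmt.Fprintf(h, "%d:%s%d:%s", len(p), p, len(files[p]), files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// under selects files beneath one focused directory.
func under(dir string) func(string) bool {
	return func(p string) bool { return strings.HasPrefix(p, dir+"/") }
}

// everything selects the whole tree, like one monolithic workdir.
func everything(string) bool { return true }

func main() {
	files := map[string]string{
		"src/main.go": "package main",
		"docs/README": "hello",
	}
	srcBefore := digest(files, under("src"))
	allBefore := digest(files, everything)

	files["docs/README"] = "hello, world" // edit outside src/

	fmt.Println(srcBefore == digest(files, under("src"))) // true: focused dir unaffected
	fmt.Println(allBefore == digest(files, everything))   // false: monolithic digest changes
}
```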

empty oracle
#

Splitting up into multiple localdirs also means that buildkit syncs each of them separately, so you have to make sure they don't overlap at all, or else buildkit will sync duplicate data. But that also means that if you are just going to sync every subdir anyway, you may as well sync the whole thing. The cache-busting behavior is complicated by the fact that we never use local refs directly; we copy them first. So depending on exactly how core.#Source is used, we may not need to worry about cache busting: it should hopefully be content-based hashing on just the isolated data rather than on the entire synced dir.
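A minimal stdlib-only Go sketch of the "make sure they don't overlap" check, assuming each localdir is identified by a Unix-style host path. `contains` and `overlapping` are hypothetical helpers for illustration, not part of buildkit or Dagger. The check is lexical (it does not resolve symlinks) and avoids the classic prefix bug where `/work/src` would wrongly "contain" `/work/srcgen`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// contains reports whether child is parent itself or nested beneath it.
func contains(parent, child string) bool {
	rel, err := filepath.Rel(parent, child)
	if err != nil {
		return false // e.g. mixing absolute and relative paths
	}
	return rel == "." || (rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

// overlapping returns each pair of dirs where one contains the other; syncing
// both as separate localdirs would transfer the shared files twice.
func overlapping(dirs []string) [][2]string {
	var pairs [][2]string
	for i := 0; i < len(dirs); i++ {
		for j := i + 1; j < len(dirs); j++ {
			a, b := filepath.Clean(dirs[i]), filepath.Clean(dirs[j])
			if contains(a, b) || contains(b, a) {
				pairs = append(pairs, [2]string{dirs[i], dirs[j]})
			}
		}
	}
	return pairs
}

func main() {
	dirs := []string{"/work/src", "/work/docs", "/work/src/vendor", "/work/srcgen"}
	fmt.Println(overlapping(dirs)) // [[/work/src /work/src/vendor]]
}
```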

#

The above requires that when we use core.#Source the llb state can be mounted either read-only or as ForceNoOutput (which disables the ability to obtain changes made to the mount). I think that should be the case based on how it is used in Europa? If it is, then I think we should see if we can get away with just the workdir alone. If it's not, then the caching behavior becomes even more complicated; I would have to go back and refresh myself on how it works when you have a rw mount with a subselector. All I remember is that it's complicated 🙂

mellow thorn
#

i meant more for use cases beyond core.#Source - e.g. it doesn't really make sense for Bass to be restricted to a single workdir, since it has 'host path' types which can be passed around independently and would typically be scoped to specific dirs without any overlap

empty oracle
#

Ah okay I see, yeah I am just thinking about the back compat use case right now. I agree that we want some way of doing this in the long-run, the right api just isn't super obvious to me immediately (and ties into all the stuff around architecture+session-state) so if there's a chance we could avoid it for now that is nice in some ways. If it's unavoidable, that's not the end of the world, it just creates a point where we may have to make a breaking change in the future.

mellow thorn
#

I think we at least need multiple directories, but I wonder if we can have a streaming export API of some sort instead of being able to write to HostDirectories. having an API that can write to the host feels spooky. for Bass I actually just need a streaming export, and now I can do it by using HostDirectory { write } to a tempdir and streaming from there, but bleh

#

that doesn't fit graphql at all though, so I'm not sure how it'd look

empty oracle
# mellow thorn that doesn't fit graphql _at all_ though, so I'm not sure how it'd look

We were casually discussing sync-as-a-service in this thread: https://discord.com/channels/707636530424053791/1027669081748422810

Basically, doing something like using the rsync algorithm but running over websockets, which we are trying to center all of our "streaming" use cases around. The HLB folks actually did essentially that w/ mutagen directly over buildkit ssh sockets, kind of similar.
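The piece of the rsync algorithm that makes this kind of sync cheap is its weak rolling checksum: it can slide along a file one byte at a time in O(1), so the receiver can find blocks it already has at any offset. The Go sketch below shows only that checksum, using the a/b sums from the rsync technical report (both modulo 2^16); it is not a sync protocol and has no websocket transport, and `weakSum`/`roll` are hypothetical names.

```go
package main

import "fmt"

const mod = 1 << 16

// weakSum computes rsync's weak checksum for a window x[0..n-1]:
// a = sum(x_i), b = sum((n - i) * x_i), both mod 2^16.
func weakSum(window []byte) (a, b uint32) {
	n := uint32(len(window))
	for i, x := range window {
		a += uint32(x)
		b += (n - uint32(i)) * uint32(x) // uint32 wraparound is harmless: 2^16 divides 2^32
	}
	return a % mod, b % mod
}

// roll slides a window of length n forward one byte: out leaves, in enters.
func roll(a, b uint32, n int, out, in byte) (uint32, uint32) {
	a = (a - uint32(out) + uint32(in)) % mod
	b = (b - uint32(n)*uint32(out) + a) % mod
	return a, b
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	const n = 8
	a, b := weakSum(data[:n])
	for k := 1; k+n <= len(data); k++ {
		a, b = roll(a, b, n, data[k-1], data[k+n-1])
		if wa, wb := weakSum(data[k : k+n]); a != wa || b != wb {
			panic(fmt.Sprintf("mismatch at offset %d", k))
		}
	}
	fmt.Println("rolled checksums match direct computation")
}
```

In real rsync, a weak-checksum hit is then confirmed with a strong hash before a block is reused.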

If we did that it could be used both directions (into buildkit and out). It's extremely appealing for a ton of different reasons but also just a pipe dream at this moment; would involve refactoring all sorts of stuff around sessions+local-engine too. That's one reason why it might be nice to avoid supporting arbitrary local dirs right away if we can in a reasonable way, but obviously if doing so creates a huge burden then it's not worth creating enormous hurdles in the present for the sake of a fantasy about what we might want to do in the future.

mellow thorn
#

i've been very lightly following that thread, a lot of it goes over my head, hoping to stumble into the true meaning/challenges as I touch more code 😅

empty oracle
smoky pawn
#

So my takeaway is that, no, there is no easy way to implement core.#Source without allowing arbitrary access to the host filesystem (i.e. arbitrary localdirs).

#

I'm not super convinced by the "upload too much" argument. The actual number and nature of buildkit localdirs are an implementation detail: if we wanted, we could implement each access to a workdir subdirectory as its own buildkit localdir (they don't all have to be chained from the same llb.Local). That would take care of the "upload too much" issue.

BUT it wouldn't take care of the situation where cue.mod is outside of workdir altogether...
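The cue.mod case comes down to a simple question: does the requested path resolve inside the workdir? A minimal Go sketch, assuming Unix-style paths and a purely lexical check (symlinks are not resolved); `insideWorkdir` is a hypothetical helper. Anything it rejects could only be served by allowing arbitrary host dirs.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideWorkdir reports whether target resolves to workdir or somewhere below it.
// Lexical only: "a/../b" is cleaned, but symlinks are not followed.
func insideWorkdir(workdir, target string) (bool, error) {
	absWork, err := filepath.Abs(workdir)
	if err != nil {
		return false, err
	}
	absTarget, err := filepath.Abs(target)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absWork, absTarget)
	if err != nil {
		return false, err
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)), nil
}

func main() {
	for _, p := range []string{
		"/home/me/proj/cue.mod",    // inside the workdir
		"/home/me/cue.mod",         // cue.mod in a parent dir
		"/home/me/proj/../cue.mod", // same, after cleaning
	} {
		ok, err := insideWorkdir("/home/me/proj", p)
		if err != nil {
			panic(err)
		}
		fmt.Println(p, ok)
	}
}
```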

empty oracle
smoky pawn
#

Yes technically cue.mod can be outside. I don't know if anyone uses it that way though

smoky pawn
#

I'm fine either way, with a slight preference for deferring unlimited filesystem access (just out of general principle to cross one-way doors as late as possible)

clear briar
#

It largely depends on where they run their workflows from.

#

I'm also noticing that there isn't a way to filter a host directory.

#

I found that pretty important/valuable when we were implementing Europa, because otherwise simply changing the Plan busts the cache.
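A minimal Go sketch of why a host-directory filter matters for caching, assuming the cache key is a content hash over only the files the filter keeps. The `filter` type and `cacheKey` function are hypothetical, not the host API being discussed; patterns use Go's `path.Match` syntax, where `*` does not cross `/`. Excluding the plan file means editing it no longer changes the key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
	"sort"
)

// filter is a hypothetical include/exclude spec for a host directory.
type filter struct {
	Include []string // empty means "include everything"
	Exclude []string
}

func matchAny(patterns []string, name string) bool {
	for _, p := range patterns {
		// A malformed pattern returns an error; this sketch treats it as no match.
		if ok, _ := path.Match(p, name); ok {
			return true
		}
	}
	return false
}

func (f filter) keep(name string) bool {
	if len(f.Include) > 0 && !matchAny(f.Include, name) {
		return false
	}
	return !matchAny(f.Exclude, name)
}

// cacheKey hashes only the files the filter keeps, in sorted order.
func cacheKey(files map[string]string, f filter) string {
	var kept []string
	for name := range files {
		if f.keep(name) {
			kept = append(kept, name)
		}
	}
	sort.Strings(kept)
	h := sha256.New()
	for _, name := range kept {
		fmt.Fprintf(h, "%d:%s%d:%s", len(name), name, len(files[name]), files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	files := map[string]string{
		"plan.cue":    "v1",
		"go.mod":      "module example",
		"src/main.go": "package main",
	}
	f := filter{Exclude: []string{"*.cue"}}
	key := cacheKey(files, f)

	files["plan.cue"] = "v2"
	fmt.Println(key == cacheKey(files, f)) // true: the plan edit is filtered out

	files["src/main.go"] = "package main // edited"
	fmt.Println(key == cacheKey(files, f)) // false: real inputs changed
}
```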

empty oracle
#

I was just starting to look at @mellow thorn's host API PR, so I guess I'll find out if he beat us to it already 🙂