#MergeOp bug
1 messages Ā· Page 1 of 1 (latest)
Was looking into this, there's a few problems happening:
-
tmateis just exiting right away, but you can't see that becauseshellhides the output of what started once the process exits. So need to fix that by probably just not resetting the whole screen once the shell'd process exits? Or one way or another not hiding that output. -
The reason
tmateis exiting is because/bin/shdoesn't exist in the container. Was very confused but realized it's actually a flaw in our core logic that uses merge-op to optimize copies...
In ubuntu, /bin is a symlink to /usr/bin. But when we internally use merge-op for the WithFile call, we end up doing essentially merge(ubuntu, /bin/tmate). We're using the same logic as COPY --link in dockerfiles use.
The problem is that the /bin in /bin/tmate is a directory, not a symlink. So the merge overwrites the resulting container to have a directory /bin that only contains tmate. So then /bin/sh can't be found, which breaks lots of programs (including tmate).
Unfortunately, this isn't even a flaw in merge-op, it's the correct behavior for it. The golden rule for merge-op is that the result must look the same as if layers were unpacked as specified by the oci image spec. And in the implementations of the spec (i.e. containerd, docker), directories in a layer will overwrite symlinks from lower layers rather than following them.
I'm sort of surprised we've gone this long without noticing the bug (I totally forgot about it when I made this change a few months back), but either way I think the best solution is sadly going to be to not use merge-op to internally optimize WithDirectory/WithFile anymore š¢ In theory we could try to detect these situations, but it would be quite complex and defeat many of the performance benefits of merge-op.
We can instead just add a separate API like WithMerged that allows the users to opt in to the "layer merge behavior" and the subtleties that come along w/ it. Would be for advanced users only, but still useful.
@short violet in the meantime, there's at least a very trivial workaround by just changing to use baseCtr.WithMountedFile instead of WithFile. That fixes the problem for me
but either way I think the best solution is sadly going to be to not use merge-op to internally optimize WithDirectory/WithFile anymore š¢ In theory we could try to detect these situations, but it would be quite complex and defeat many of the performance benefits of merge-op
šæ
I've been thinking about it a bit more, my current idea is to add an optional flag to the With* APIs called like Overlay: bool or something. It would be documented not in term of layers/merge-op but in terms of the behavior in these corner cases, roughly: "Overlay results in the contents directly overwriting any existing contents; this means that directories will replace any existing symlinks rather than following them".
Basically, the thing that would really make me sad is if the only way to expose merge-op related optimizations would be to have a completely specialized API (WithMerged) or to overly leak details about layers to users. I really prefer to avoid that since our long-term hope is to be disentangled from the layer model entirely (though that will take a while of course).
Overlay as an option would technically just be a flag that changes the behavior of how things are copied (at least from the user perspective) and would thus avoid those problems. Overlay could also plausibly be named DoNotFollowSymlinks or something too.
And there is also the option to inverse the behavior; make the current behavior the default but give an option for enabling the version where symlinks do get followed. That would give better performance by default, but possibly be less intuitive in these sorts of corner cases. Still thinking it over...
Honestly, the general idea of this approach could result in us being able to deprecate WithMounted* too and just centralize everything on the With* APIs, with flags available for different behavior. WithMounted* would instead be a flag like Replace (since it causes all the existing contents of the destination to disappear and be replaced with the new ones).
Not saying we should do that, just thinking out loud
Thinking even further, I can see a quick route that doesnāt require any api changes. Basically, only enable merge-op optimizations if the destination is ā/ā. Thereās a bit more subtly than that but thatās the basic idea. Only real downside is you have to be āin the knowā or lucky to trigger the optimizations (similar to how sometimes you have to write code a certain way to get the compiler to optimize it better), but thatās not a huge deal at the end of the day.
I'm leaning towards the overlay option with the ability to make it a default tbh. Once we have a better DX to debug containers by either shelling into any step of the pipeline or inspecting the FS at any given time these kinds of situations might be better prevented?
MergeOp bug
The approach Iām honing in on will keep the opportunistic merge-op optimizations by default and only fall back to llb copy when needed.
The only tricky part is that we want it all to be hidden from users, which means the end result has to be identical for both backend implementations, whether we can optimize to merge op or not.
It almost works today but we need one upstream change to the copy implementation to support a flag that enables it to work in the same situations merge op works, sent out a pr for that: https://github.com/tonistiigi/fsutil/pull/169
Without that, thereās situations where the mergeop implementation will work but copy returns an error, which would leak the details to the user. But once that flag is available, then it can be totally seamless.
At most, we could consider giving our users a flag that results in an error being returned if an existing path is going to be overwritten (in which case our backend would just have to skip merge op optimizations), but that would be an option totally agnostic to merge op vs copy and would thus still avoid leaking too much to users. And we donāt need to add that option immediately, just if a need comes up.