# Hey @Erik Sipsma
I guess I'm confused about exactly how you'd use fscopy.Copy in Directory.Chown. Directory.Chown makes a new overlay layer and then runs chown on all the files, which triggers an overlay copy-up and changes the owner. If you use fscopy.Copy, you'd also be copying all the data to a new ref and resetting the metadata, which if anything should be a little slower (in theory), since overlay copy-up happens in-kernel.
I believe you that it's faster for some reason, but I'd really like to understand why filepath.WalkDir + os.Lchown is slower.
might be easiest if you send out a PR w/ the fix so I can see exactly what the change is
@night vapor you nerd-sniped me here. I'm using my eBPF PR to debug further and can confirm that, for some unknown reason, each chown syscall (i.e. the time spent in the kernel to run a single chown) is averaging about 8ms, so 8ms × 100k = 800s (using your repro)
8ms is an absolute eternity though
something very very strange is going on
I can show you in lounge if you want and are around
yes please
this eBPF tracing is very useful, it's amazing ahahah 😍
```
time="2025-12-12T00:57:15Z" level=info msg=filetracer dur=1.8us op=LSM_SETATTR process=dagger-engine tgid=1520166
time="2025-12-12T00:57:15Z" level=info msg=filetracer dur=7.98ms op=OVL_COPY_UP process=dagger-engine tgid=1520166
time="2025-12-12T00:57:15Z" level=info msg=filetracer dur=1.3us op=LSM_SETATTR process=dagger-engine tgid=1520166
time="2025-12-12T00:57:15Z" level=info msg=filetracer dur=7.99ms op=OVL_SETATTR process=dagger-engine tgid=1520166
time="2025-12-12T00:57:15Z" level=info msg=filetracer dur=8.01ms gid=1000 op=CHOWN path=/tmp/buildkit-mount623003268/src/file12935.txt process=dagger-engine tgid=1520166 uid=1000
```
Drilling deeper, it seems like the copy-up is extremely slow for some reason (I could understand it if these were large files, but they are each 1 byte)...
Why is overlay fsyncing after every chown???
```
time="2025-12-12T01:05:20Z" level=info msg=filetracer dur=7.90ms op=VFS_FSYNC process=dagger-engine tgid=1532820
```

I will laugh if we get another performance improvement by not syncing to disk as much
I guess that's just how overlayfs works unless you pass the volatile mount option... My manual test earlier with hand-created overlays wasn't valid because I ran it under /tmp, which is tmpfs on my system, so fsync is essentially free.
My PR isn't totally wrong; it just doesn't work with named uids yet, and I'm still digging into that. For numeric uids, it does seem to be faster.
It's driving me crazy too 🤣 Ok, timebox on my end has been entirely used, time to resume lazy git 😢