#eager git ls-remote pulls *all refs*

1 messages ยท Page 1 of 1 (latest)

glad granite
#

Starting a thread ๐Ÿงต

#

<@&946480760016207902>

smoky swift
#

my take? we should drop this behavior - clone with --no-tags

#

this avoids the host of issues with that behavior (and i don't have a bunch of bugs to go fix in the future), and also means the pinning can entirely drop those.

smoky swift
glad granite
#

In terms of pure lockfile size, the main issue seems to be that github servers include all pull requests, ever, in their list of refs

smoky swift
#

so it seems that it's actually just accidental behavior

glad granite
#

@smoky swift sorry I didn't understand what change you're proposing

smoky swift
#

sorry, suggesting removing the behavior where if you say git(url).branch(main).tree(depth=0) you'll get a tags that on the main branch

#

getting those tags is the default behavior of git clone and such

smoky swift
#

my suggestion is to officially remove this behavior

#

i think this would simplify the lockfile

#

i do have a proposal though about lockfiles ๐Ÿ™‚
i would really like to implement this through dagql. my idea would be that we could add a new "pin" field to return as part of IDs.

the server could record all IDs that have a pin, and write them to a file. the key of the pin would be a human readable version of the ID (e.g. container.from("alpine") or http("url"), the value would be specific to each handler.

a handler that returns a pin would be expected to read the pin back in (and handle that)

#

but then, a lockfile is just an id rewriter! it annotates ids with pin values it finds.

#
  1. this feels super powerful - not just for locking, we could even one day use this kind of logic to dynamically rewrite IDs - i could write a lock file that replaced my container.from("alpine") with a host container. think "vendoring".
  2. adding new things to the lockfile is just a matter of modifying the dagql handler - there's no special handling in lockfile.go
  3. it means we could eventually extend pinning to module functions ๐Ÿ™‚ e.g. i could write an s3 module, that allows pinning to a sha.
#

oh and of course:
4. better visualization in cloud! if it's part of an id, we can add that to the trace viewer, and show that it was pinned.

#

also this fits better with caching - otherwise, everywhere we use the lockfile session attachable we'll need to give per-client caching, or we'll inevitably be mixing up the lockfiles from different clients
adding the per-client cache keys could potentially make performance worse in some cases.

glad granite
#

@smoky swift I'm interested in learning more. It's not immediately obvious how your proposal relates to mine.

#

adding new things to the lockfile is just a matter of modifying the dagql handler - there's no special handling in lockfile.go

This is already the case.

#

it means we could eventually extend pinning to module functions ๐Ÿ™‚ e.g. i could write an s3 module, that allows pinning to a sha.

Also planned in my proposal (note the module field, currently always set to core)

#

better visualization in cloud! if it's part of an id, we can add that to the trace viewer, and show that it was pinned.

I don't really understand the fundamental difference. In the case of container.from, the pinning should already show up in the trace (because container.from already recursively called itself with the new arguments. Maybe that's what you mean by "ID rewriting"?

I didn't do it this way for git and http, but it should be trivial to do.

So perhaps we're already talking about the same thing?

#

Re-reading your explanation, it looks like a slight generalization of my prototype, where my "hooks" could be simpler because they could rely on a low-level facility where you can attach a pin field to the ID. In which case: sure why not!

--> Since either way we'll need a stable file format and session attachable to actually read to and read from the lockfile, perhaps we can start with my less ambitious version, then seamlessly "upgrade" to the more powerful version based on ID pinning, with no breaking change in wire protocol or file format?

smoky swift
smoky swift
smoky swift
#

maybe I can have two demos for later ๐Ÿ˜›

glad granite
#

That actually sounds like a robust long-term fix - just independent from any pinning system (which to be clear also sounds super cool)

smoky swift
#

yeah we could make ref lookups lazy - but the point of the original PR I did was to make it so we didn't do that ๐Ÿ˜ข we used to do that

#

the problem was, that meant:

  1. we were sometimes doing lots of listing remote tags
  2. in between getting remote tags, the remote tags could change
glad granite
#

If so, then it's best of both worlds

#

you get a reliable snapshot for the duration of the session (same as eager fetch) but also, benefits of laziness

smoky swift
#

not really, we were just calling it everywhere, not through dagql

glad granite
#

seems fixable though no?

smoky swift
#

yeahhh, I just don't see it as an easy fix, it's essentially a revert

#

we could do it, but there were other nice things about it, like better cache key computation

#

and simpler checking of is private (it used to be worse)

#

if it's necessary for the pinning? id just exclude the ls-remote from the pin, just put the git sha+ref for the git ref pin.

glad granite
#

If we keep it the way it is, a viable mitigation is to hardcode an exception where we exclude the github PR refs from the list. At least for our repo (and most typical github repos) that should remove 99% of the file size. But it will still be an unreasonably large upfront download for eg. large enterprise repos

smoky swift
glad granite
#

Even excluding the lock file, it's really hard to justify an upfront download of so much data

glad granite
smoky swift
#

mm

#

fuck

#

kinda wish we'd had this discussion two weeks ago

glad granite
#

Unless you can make that data a layer, so it can be retrieved from the cache

#

then I only need to store the digest of that file in the lockfile

#

that would completely decouple my lockfile size problem, from your "eager vs lazy" dilemma

#

Given its size, it kind of makes sense to make it an actual file

smoky swift
#

but then you can't share a lockfile between different engines

#

unless we do persistent objects

#

lemme have a think about this one

glad granite
#

ok ๐Ÿ™‚

#

actually maybe even the lockfile can be removed from this equation

#

If you can make that ls-remote offlineable (eg. by making it a cached operation) then that's maybe all we need

#

or is it already, and I did my experiment wrong?

#

basically I turn off my wifi, then see what blows up, and investigate that part of the code ๐Ÿ™‚

#

that's how I found the "is it public" check

smoky swift
#

either - we kinda fundamentally rework how git works and fetches data, or we find a better way to pin it, orrrr, we say git can't be offlineable

#

off to bed with you btw