#Problems with private SDKs and Git Pulls


scarlet mulch
#

Hi all, I found this project about a week ago, and it looked like it would be really helpful in cleaning up our current messy CI workflows. The model we've decided to try is to put Dagger scripts in each repo that we are building and then combine them into a full build and publish in a centralized repo. This was partially decided because some repos only build specific applications, and multiple applications are then combined into a single image.

The problem I am running into is that on macOS, the mechanism to pass in the SSH_AUTH_SOCK seems really flaky. I've tried multiple versions, and it only seems to work 15% of the time. I've dug through the GitHub issues, this Discord, and even tried asking AI, but I'm still not sure how to make it more reliable. I've verified that the commands Dagger is trying to run work on the local CLI, and I've made sure that ssh-add only has one key, the key I want it to use. I've also tried deleting the Docker container and pruning the cache, but nothing seems to work.

Here are a few snippets of how I have the project set up:

dagger.json:

{
  "name": "project-name",
  "engineVersion": "v0.20.8",
  "sdk": {
    "source": "python"
  },
  "dependencies": [
    {
      "name": "repo-1",
      "source": "ssh://git@github.com/my-org/repo-1@feature/dagger-build",
      "pin": "<..omitted..>"
    },
    {
      "name": "repo-2",
      "source": "ssh://git@github.com/my-org/repo-2@feature/dagger-build",
      "pin": "<..omitted..>"
    },
    {
      "name": "repo-3",
      "source": "ssh://git@github.com/my-org/repo-3@feature/dagger-build",
      "pin": "<..omitted..>"
    }
  ],
  "source": ".dagger"
}

main.py:

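from dagger import Container, Socket, dag, function, object_type

# DEFAULT_BRANCH and DEFAULT_PLATFORM are module-level constants omitted from this snippet.

@object_type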
class ProjectName:
    @function
    def build_main_from_git(
            self,
            sock: Socket,
            repo_1_ref: str = DEFAULT_BRANCH,
            repo_2_ref: str = DEFAULT_BRANCH,
            platform: str = DEFAULT_PLATFORM,
    ) -> Container:
        repo_1_src = (
            dag.git("ssh://git@github.com/my-org/repo-1", ssh_auth_socket=sock)
            .branch(repo_1_ref)
            .tree(discard_git_dir=True)
        )
        repo_2_src = (
            dag.git("ssh://git@github.com/my-org/repo-2", ssh_auth_socket=sock)
            .branch(repo_2_ref)
            .tree(discard_git_dir=True)
        )

        return self.build_main(repo_1_src, repo_2_src, platform)

And it would be invoked like this:

dagger -c 'build-main-from-git $SSH_AUTH_SOCK | terminal'

Any thoughts on what I might be doing wrong?

Edit: Here is a sample error I am seeing:

✘ git fetch --no-tags --update-head-ok --force --depth=1 origin <...omitted...> 40.0s ERROR
Connection closed by 140.82.114.3 port 22

fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.

! git error: exit status 128
scarlet mulch
#

So for what it is worth, I experimented with moving everything into a single repo and it is working. The issue seems to be with how the cache is interacting with the SSH socket and it appears to be losing it, so when it goes to pull the modules defined in dagger.json, it is failing with that error.

#

I really don't want to keep everything in a single repo, so any suggestions on how to fix this are appreciated.

modest charm
scarlet mulch
#

Then it slowly stopped working. I was able to get dagger develop to work yesterday by deleting the container in Docker and then explicitly passing in SSH_AUTH_SOCK="$SSH_AUTH_SOCK" dagger develop, but it never worked for the dagger -c calls, nor did it work for any subsequent dagger develop calls.

#

As soon as I removed the remote dependencies from the dagger.json and moved everything into the same repo it started working perfectly every time even though I was using dag.git(…) on a remote private repo with the SSH_AUTH_SOCK passed in.

#

I honestly thought it was GitHub’s fault at first because it was sometimes timing out, and sometimes running the commands in my CLI also timed out (usually git ls-remote commands).

modest charm
#

I'll create a similar setup and see if I can repro somehow

scarlet mulch
modest charm
#

we do have a handful of users in our community using private module dependencies and this is the first time that I've seen this kind of behavior. I'm more inclined to think that something might be happening with your local setup somehow

#

if you could by any chance create a Dagger Cloud account and send us a trace with the error that will very likely give us more info about what could be happening here 🙏

scarlet mulch
modest charm
scarlet mulch
#

Ok let me set up a separate set of repos to repro

scarlet mulch
modest charm
#

@scarlet mulch mind running git ls-remote --symref ssh://git@github.com/incendium/dagger-a a few times from your terminal to check if you eventually get the same error Dagger gets?

scarlet mulch
#

If it fails, it's because of a timeout on the GH side and not an auth issue

#

But I haven't seen that today

modest charm
modest charm
scarlet mulch
#

You mean specs?

modest charm
scarlet mulch
#

macOS

modest charm
#

you're using Docker Desktop on a mac, right?

scarlet mulch
#

Correct

#

I can try on my Linux desktop as well if it would be helpful but I don't have a development environment or any development tools set up there

modest charm
scarlet mulch
#

Sure, for reference it's on Fedora 44

modest charm
#

I think this might be related to how macOS is setting up the SSH_AUTH_SOCK

#

mind sharing the value of your SSH_AUTH_SOCK variable?

scarlet mulch
#

/var/run/com.apple.launchd.9pWbrnTc0p/Listeners

modest charm
#

mostly checking if it's macOS native agent or some wrapper tool like 1password or things like that
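
A quick way to make that check is sketched below. `classify_ssh_agent` is a hypothetical helper, not part of Dagger, and the path patterns are heuristics: the launchd pattern is taken from the value shared above, while the 1Password pattern is an assumption about its default socket location.

```python
import os


def classify_ssh_agent(sock_path: str) -> str:
    """Guess which SSH agent owns a socket path (heuristic only)."""
    lowered = sock_path.lower()
    # macOS's built-in agent is started by launchd and exposes a
    # socket under a com.apple.launchd.* directory.
    if "com.apple.launchd" in lowered:
        return "macos-launchd"
    # Assumption: wrapper agents such as 1Password put their name in the path.
    if "1password" in lowered:
        return "1password"
    return "unknown"


if __name__ == "__main__":
    # Classify whatever the current shell exports.
    print(classify_ssh_agent(os.environ.get("SSH_AUTH_SOCK", "")))
```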

#

@keen wadi still around?

#

since you have a mac, have 2 minutes to see if you can repro?

#

@scarlet mulch I'm on linux so probably that's why I can't repro

keen wadi
scarlet mulch
#

Gotcha

keen wadi
#

catching up

scarlet mulch
#

I wonder if maybe there is some sort of rate limiting on the GH side?

#

I have 5 traces with that git core command and the first 4 worked but the 5th failed

#

But then it doesn't make sense that it still works at the OS level

modest charm
#

@keen wadi the easy repro would be: run dagger core git --url ssh://git@github.com/$private_repo branches a few times on your local machine in v0.20.8. Do you eventually get any access denied errors? Or does it always work for you?

#

also verify that your SSH_AUTH_SOCK is set to the apple launchd thing

#

just to make sure that you're using the same agent that the OP is using

keen wadi
modest charm
#

you're also using Docker Desktop, aren't you?

keen wadi
#

ran it 25 times, yes docker desktop

#

now doing: local dagger.json with an SSH dependency and a pin

modest charm
#

and you're also on v0.20.8, correct?

keen wadi
#

yes: 0.20.8

modest charm
keen wadi
scarlet mulch
#

Hrm

#

ls-remote command just seems to be running on the Linux system

modest charm
scarlet mulch
#

What I mean is that it has been running for 5+ minutes

#

Oh the core command eventually got the issue but I also ran dagger develop on the repo and that is still running

#

That is on Fedora 44 + Docker CE

#

fwiw:

$ nslookup github.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.114.4
#

Seems to jump between 114.3 and 114.4

modest charm
#

you're not using any proxies or VPNs, are you?

scarlet mulch
#

Nope

keen wadi
#

just ran it 100 times, still no race

#

as you're saying it was consistent on 0.26 ... checking

scarlet mulch
#

The proxy or VPN question makes me think it might be an issue on my side

#

Specifically, I have a firewall with IDS/IPS, it could be detecting GH traffic as malicious, currently looking through the logs

modest charm
#

the more tests that I do here the more I think it's related to your setup somehow

scarlet mulch
#

That's what I'm thinking too

#

And yeah, I just need to look up the GitHub IP block to whitelist it
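
For anyone checking whether an address from the errors above falls inside a block they are about to whitelist, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR in `EXAMPLE_GIT_RANGES` is an assumed example value, not an authoritative list; GitHub publishes its current ranges through its meta API (the `git` key), and those should be used instead.

```python
import ipaddress

# Assumption: example range only. Replace with the ranges GitHub
# currently publishes before building a firewall rule from it.
EXAMPLE_GIT_RANGES = ["140.82.112.0/20"]


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    # Addresses seen in this thread's error output and DNS lookups.
    for candidate in ("140.82.114.3", "140.82.114.4", "140.82.112.3"):
        print(candidate, ip_in_ranges(candidate, EXAMPLE_GIT_RANGES))
```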

keen wadi
#

or do it from a phone hotspot

modest charm
#

still doesn't explain why running git ls-remote directly seems to always work

keen wadi
#

If it's true, can you please run:

  for i in {1..20}; do
    tmp="$(mktemp -d)"
    echo "fetch $i in $tmp"
    git -C "$tmp" init
    git -C "$tmp" remote add origin ssh://git@github.com/incendium/dagger-a
    git -C "$tmp" fetch --no-tags --update-head-ok --force --depth=1 origin main || break
    rm -rf "$tmp"
  done
#

as part of our git operations, we do things that take longer than what ls-remote does. This should fail for him more consistently (due to the same firewall)
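
To compare how often a short command (such as git ls-remote) and a longer one (such as git fetch) fail, a tally can be more telling than stopping at the first error. The sketch below is one way to do that; `count_failures` is a hypothetical helper and the 120-second timeout is an arbitrary assumption.

```python
import subprocess
import sys


def count_failures(cmd: list[str], attempts: int, timeout: float = 120.0) -> int:
    """Run `cmd` `attempts` times and count non-zero exits and timeouts."""
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # A hung connection counts as a failure too.
            failures += 1
    return failures


if __name__ == "__main__":
    # Pass the command to repeat as arguments to this script.
    if len(sys.argv) > 1:
        print(count_failures(sys.argv[1:], attempts=20))
```

Saved under a name of your choosing, it could be called with the ls-remote command in one run and the fetch command in another, and the two failure counts compared.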

scarlet mulch
#

Seems to be working a lot better now. Not seeing the auth issues but still seeing the occasional timeout (which is probably a GH issue)

#

Apologies for wasting y'all's time, it didn't even occur to me that my firewall might be detecting the traffic as an intrusion attempt

keen wadi
scarlet mulch
#

No failures with the whitelist rule

keen wadi
#

Yaaay

scarlet mulch
#

I did have an unrelated question

#

Is there a way to define a STOPSIGNAL entry in a container definition?

keen wadi
keen wadi
scarlet mulch
#

Any plans? I'm doing conditional container modifications based on input flags and it would be handy to have support for something like that.

#

Oh sure let me change the rules and try again

keen wadi
#

Not against it at least, just a matter of priority ahahh

#

the hardest is the design of the API, discussions are great for that

scarlet mulch
#

I hear you on that

modest charm
# scarlet mulch I hear you on that

@scarlet mulch as @keen wadi is pointing out, the hardest part is usually the UX bikeshedding. For example, in the case of Dagger, I'd assume that stopsignal would only make sense in the case of services, wouldn't it?

#

mostly because beyond services, all other container commands are always expected to eventually finish

#

since there's no docker stop equivalent in Dagger

scarlet mulch
#

Not sure I would necessarily agree that non services wouldn't need it, if you have a long running task and for some reason you need to reboot (assuming you're running it on your local machine) then it might make sense there as well to do cleanup

#

But I do agree that it would make more sense for images that end up published and not necessarily those that are managed by/running inside dagger itself

scarlet mulch
keen wadi
#

Adding it to my toolbelt to isolate firewall issues, thanks 🙏

scarlet mulch
#

FWIW, it's a Ubiquiti firewall and this is what I saw in the logs: A network intrusion attempt from <hostname> to 140.82.112.3 has been detected and blocked.

modest charm
#

probably because it uses all the network bridge NAT traffic routing

keen wadi
#

I think it's a device named Desktop, not Docker Desktop here

scarlet mulch
#

I just changed that to obscure the actual hostname of the system

keen wadi
#

ChatGPT says: git ls-remote works because it is short and only lists refs. git fetch --depth=1 can trigger Ubiquiti IDS/IPS because it opens a fuller Git-over-SSH transfer

#

I guess it's part of the detection heuristic of that firewall

scarlet mulch
#

Maybe. If you want to research more, here are some additional details:

#

IPS Alert 2: Attempted Information Leak. Signature ET SCAN Potential SSH Scan OUTBOUND. From: <redacted>:57198, to: 140.82.112.3:22, protocol: TCP
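
If there are many such entries to sift through, the fields can be pulled out with a regular expression. This is a sketch that assumes every alert follows the exact layout of the line quoted above; `parse_ips_alert` is a hypothetical helper, and other firmware versions may format alerts differently.

```python
import re
from typing import Optional

# Pattern modeled only on the single alert line quoted in this thread.
ALERT_RE = re.compile(
    r"Signature (?P<signature>.+?)\. "
    r"From: (?P<src>.+?):(?P<src_port>\d+), "
    r"to: (?P<dst>[\d.]+):(?P<dst_port>\d+), "
    r"protocol: (?P<protocol>\w+)"
)


def parse_ips_alert(line: str) -> Optional[dict]:
    """Extract signature, endpoints and protocol, or None if no match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Ports are more useful as integers when filtering (e.g. port 22).
    fields["src_port"] = int(fields["src_port"])
    fields["dst_port"] = int(fields["dst_port"])
    return fields
```

Filtering parsed alerts on a destination port of 22 and a GitHub address is one way to confirm that the blocked traffic is the Git-over-SSH fetches.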

modest charm
keen wadi
# scarlet mulch

I guess Ubiquiti is classifying repeated GitHub SSH fetches as outbound SSH scanning 🤔

scarlet mulch
#

Yeah, that would track. I've had this firewall for I guess 3-4 years now and this is the first time I've run into something like this. And I'm a pretty heavy user of Git, but I don't normally make repeated Git commands in quick succession. Though I also haven't used a Git repo with submodules either so I dunno if that could also trigger it.

modest charm
scarlet mulch
scarlet mulch
#

Thanks again for the help y'all.