anyone have an example github actions | Dagger | Page 1

shadow skiff Jul 6, 2022, 4:12 PM

#

cc @jade plover have u seen this in the wild?

wary plover Jul 6, 2022, 4:18 PM

#

it's kind of a moot point right now, i can't get my dagger based workflow to complete successfully on github actions ... x86 even ... it either abruptly aborts (and google seems to indicate this is from a process using too much cpu) or else it hangs indefinitely :/

shadow skiff Jul 6, 2022, 4:23 PM

#

GH aborts the action if it's consuming a lot of CPU? interesting..

wary plover Jul 6, 2022, 4:24 PM

#

i had several attempts simply cancel on their own with no error message in the standard web interface, but when i viewed the workflow debug raw output the last thing that it showed was that the workflow had been cancelled with SIGINT or something similar, i googled the exact message and the results seemed to indicate that yes, too much cpu and the runner cancels the workflow

#

but i'm more concerned about the workflow runs that ran for 30min (with no more debug output after about 15min) which i ended up cancelling myself

shadow skiff Jul 6, 2022, 4:28 PM

#

would it be possible to share the dagger plan with us in a way that we could run it without the actual project code?

wary plover Jul 6, 2022, 4:30 PM

#

not without a lot of auditing on my side first i think

#

when everything is cached, it takes about 2min to run the identical dagger plan on my local 3-year-old workstation

#

gonna run it with --no-cache here to see how long it takes

#

        User time (seconds): 151.11
        System time (seconds): 6.92
        Percent of CPU this job got: 78%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:21.17
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11848448

#

that's with --no-cache

jade plover Jul 6, 2022, 4:37 PM

#

We get some timeouts in our CI with GitHub Actions which forces us to re-run the workflows. I see that when reviewing PRs.

wary plover Jul 6, 2022, 4:37 PM

#

ugh

#

this is not promising 😦

jade plover Jul 6, 2022, 4:37 PM

#

Seems to be a GHA thing

wary plover Jul 6, 2022, 4:38 PM

#

i never saw any such timeouts or aborts with our old build system... which is leading me to believe that dagger+buildx is just too intense for gha :/

jade plover Jul 6, 2022, 4:42 PM

#

I'm looking to see what other projects encounter this and perhaps what they do to work around:

https://github.com/golang/vscode-go/issues/2165

#

@bleak terrace @fading sun do you have any insights into why GHA timeouts may pop up in CI?

@wary plover can you share what your GHA workflow looks like? Caching bits, etc

wary plover Jul 6, 2022, 4:47 PM

#

jade plover @bleak terrace @fading sun do you have any insights into why GH...

env:
  DAGGER_CACHE_BASE: dagger-ci-build
  DAGGER_LOG_LEVEL: debug

jobs:
  build-publish:
    name: Build image and publish
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Rentals-API
        uses: actions/checkout@v2
        with:
          fetch-depth: 0
          path: src/Rentals-API
          ref: ${{ github.ref }}

      - name: Extract branch name
        shell: bash
        run: echo "GITHUB_BRANCH=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_ENV

      - name: Configure caches
        run: |
          echo "DAGGER_CACHE_TO=type=gha,mode=max,scope=${{env.DAGGER_CACHE_BASE}}-${{env.GITHUB_BRANCH}}" >> $GITHUB_ENV
          echo "DAGGER_CACHE_FROM=type=gha,scope=${{env.DAGGER_CACHE_BASE}}-${{env.GITHUB_BRANCH}}" >> $GITHUB_ENV

      - name: Dagger Release
        uses: dagger/dagger-for-github@v3
        with:
          workdir: src/Rentals-API
          cmds: |
            do verboseRelease
            do verboseRollout

      - name: Print Buildkitd Logs
        if: ${{ failure() }}
        run: |
          docker logs dagger-buildkitd

#

oh sorry, should have used a pastebin

#

obviously that doesn't include any of our primary env vars

#

verboseRelease action is basically a docker.#Build that has 3 steps:

emit start message via slack
build everything and push to AWS ECR
emit end message via slack

fading sun Jul 6, 2022, 4:52 PM

#

wary plover ``` User time (seconds): 151.11 System time (seconds): 6.92 ...

Maximum resident set size (kbytes): 11848448
If I'm reading that correctly, it's saying it used up to 11 GB of RSS when you ran locally. GHA runners appear to only have 7GB of RAM (https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources) so, it could be that you are entering swap hell in GHA and timing out the build?
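
For reference, the comparison above can be worked through as a quick sketch. It assumes the `kbytes` reported by `/usr/bin/time -v` are units of 1024 bytes and takes the 7 GB runner figure from the linked GitHub docs as quoted in the thread; the variable names are made up for illustration.

```python
# Peak RSS as reported by `/usr/bin/time -v` in the thread (kbytes).
MAX_RSS_KIB = 11848448
# RAM of a GitHub-hosted runner as quoted in the thread (assumed here to be GiB).
RUNNER_RAM_GIB = 7


def kib_to_gib(kib: int) -> float:
    """Convert kibibytes to gibibytes (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


peak_gib = kib_to_gib(MAX_RSS_KIB)
shortfall_gib = peak_gib - RUNNER_RAM_GIB

print(f"peak RSS: {peak_gib:.1f} GiB")
print(f"exceeds runner RAM by: {shortfall_gib:.1f} GiB")
```

If those assumptions hold, the local peak is several GiB above what the hosted runner has, which is consistent with the swapping theory.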

wary plover Jul 6, 2022, 4:53 PM

#

fading sun > Maximum resident set size (kbytes): 11848448 If I'm reading that correctly, it...

heh, yeah, i hadn't noticed that but i think you're right ... my local workstation has 64gb ram which is probably why i didn't notice anything

#

so...... this seems a problem with dagger ?

fading sun Jul 6, 2022, 4:54 PM

#

wary plover so...... this seems a problem with dagger ?

Possible, does 11 GB seem high for what you are actually trying to build/deploy?

wary plover Jul 6, 2022, 4:55 PM

#

absolutely... the build itself is basically along the lines of...

fetch official python docker image
install build-essential tools in case any of the deps require C lib or compilation
setup virtualenv
install this source package along with any of its dependencies

but considering that all runs in like 3min on my local workstation, i dunno

#

ok... my most minimal action which does...

download latest debian-slim
do an apt dist-upgrade
install git
mount local src
run git in local src mount to get some info like branch name

uses 4.4gb ram ... something doesn't seem right

#

(this is all with --no-cache)

fading sun Jul 6, 2022, 5:01 PM

#

I also just realized that the data you posted must have been from the dagger binary itself (not BuildKit), right? In which case, there almost certainly must be something wrong since the client binary is not the one doing any sort of intensive work

wary plover Jul 6, 2022, 5:04 PM

#

it's actually running like this: `/usr/bin/time -v make dagger ACTION="--no-cache report"` ... all make is doing is loading a .env to set up necessary env vars and then invoking dagger with the ACTION param

fading sun Jul 6, 2022, 5:08 PM

#

wary plover it's actually running like this: `/usr/bin/time -v make dagger ACTION="--no-cach...

Ah good to know, so yeah the time command will collect the usage data from the command being invoked and any subprocesses, which will include the dagger binary. However, BuildKit is not a subprocess of dagger (it's more like a server running in docker that the dagger binary is a client of), so I don't think the 11GB could include BuildKit.
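
A small sketch of that distinction, assuming Linux: the kernel's resource accounting for waited-for children (which Python exposes as `resource.getrusage(resource.RUSAGE_CHILDREN)`) covers processes descended from the caller, so a server running elsewhere, such as BuildKit inside a Docker container, would not show up in it. The child command below is a hypothetical stand-in for a memory-hungry client, not anything dagger actually runs.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd: list[str]) -> int:
    """Run cmd to completion and return the peak RSS recorded for children.

    On Linux, ru_maxrss is in kilobytes and reflects the largest single
    waited-for descendant, not the sum over the whole process tree.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Hypothetical stand-in for the client binary: allocate ~50 MiB, then exit.
hungry_child = [sys.executable, "-c", "buf = b'x' * (50 * 1024 * 1024)"]

peak_kib = peak_child_rss_kib(hungry_child)
print(f"peak child RSS: {peak_kib / 1024:.0f} MiB")
```

Because the figure is a per-process maximum, an 11 GB reading would point at one process in the `make` -> `dagger` chain rather than at a total across all of them.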

#

If that's the case, then 11GB definitely seems like some sort of memory leak

#

(thinking of how best to debug this further)

wary plover Jul 6, 2022, 5:10 PM

#

well... i was just running top while running the same dagger command... and this was a quick cut/paste snapshot of the dagger entry when it was running (about halfway done)...
1082307 rocky 20 0 3582184 2.8g 24096 S 152.3 9.1 0:38.87 dagger

#

so... that's 152% of CPU as well which means dagger is definitely doing something intensive
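
To make the snapshot easier to read, here is a minimal sketch that labels its fields. The column order is an assumption (the default `top` layout, which is configurable), and the per-core reading of %CPU assumes top's default mode where 100% equals one fully busy core.

```python
# Assumed default `top` process columns; top's layout can be customised.
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR", "S",
               "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_line(line: str) -> dict[str, str]:
    """Split one `top` process line into a column-name -> value mapping."""
    values = line.split(None, len(TOP_COLUMNS) - 1)
    if len(values) != len(TOP_COLUMNS):
        raise ValueError(f"expected {len(TOP_COLUMNS)} fields, got {len(values)}")
    return dict(zip(TOP_COLUMNS, values))


# The line pasted in the thread.
snapshot = "1082307 rocky 20 0 3582184 2.8g 24096 S 152.3 9.1 0:38.87 dagger"

row = parse_top_line(snapshot)
cores_busy = float(row["%CPU"]) / 100  # 100% == one core under the assumed mode

print(row["COMMAND"], "resident:", row["RES"], f"cores busy: {cores_busy:.2f}")
```

Read that way, the dagger client alone was holding 2.8g resident and keeping roughly one and a half cores busy at that moment.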

#

i understand how you think buildx is being run via network connection in the docker daemon, but there must be more going on

fading sun Jul 6, 2022, 5:15 PM

#

wary plover i understand how you think buildx is being run via network connection in the doc...

Oh for sure, the data is clearly saying that the dagger binary is doing something intensive, I just am questioning if it actually should be or if there's some sort of bug causing this to happen

wary plover Jul 6, 2022, 5:16 PM

#

gotcha

#

well, perhaps it's the Cue interpreter process going haywire

#

dagger is a Go app right ? using an embedded Cue interpreter of sorts ?

fading sun Jul 6, 2022, 5:18 PM

#

wary plover dagger is a Go app right ? using an embedded Cue interpreter of sorts ?

Yes that's one suspect for sure. I'm thinking we may need to run the dagger binary w/ a profiler enabled so we can get insight into what is actually taking up so much memory+cpu.

wary plover Jul 6, 2022, 5:18 PM

#

well, it effectively means GHA (at least the github hosted runners) is a no-go for me atm 😦

#

i'm toying with possibly setting up our own AWS based runners, since that seems to be the best way to do our arm64 builds regardless

bleak terrace Jul 6, 2022, 5:31 PM

#

if there is a memory leak somewhere in dagger, we should fix it. We had similar issues with cue a while back. Is there a way for us to repro outside of GHA? Something we could `dagger do` locally would be awesome.

bleak terrace Jul 6, 2022, 5:54 PM

#

@slow peak for context ☝️

wary plover Jul 6, 2022, 6:03 PM

#

bleak terrace if there is a memory leak somewhere in dagger, we should fix it. We had similar ...

well, in fact the huge amounts of memory i'm seeing being used were all with local runs, not on gha... and those memory issues are what we currently think are causing the problems on gha

fading sun Jul 6, 2022, 6:30 PM

#

@wary plover have a branch w/ pprof profiling enabled here: https://github.com/sipsma/dagger/commit/a56bd9b2f7b800e2408b29542a27792147e1cdd8

If you run with that binary and then, in the middle of the build when memory usage is high, separately run `go tool pprof http://localhost:6060/debug/pprof/heap`, you'll drop into a profiling shell from which you can run `top` to see what's using so much memory.

Are you comfortable with building dagger from that branch yourself? I don't have an x86 machine around at the moment

GitHub

add pprof endpoint for temp debugging · sipsma/dagger@a56bd9b

Signed-off-by: Erik Sipsma

wary plover Jul 6, 2022, 6:31 PM

#

fading sun @wary plover have a branch w/ pprof profiling enabled here: https://git...

that's something i can toy around with .. yes... but it'll have to be in off-business hours (ie later this evening)

slow peak Jul 6, 2022, 6:31 PM

#

I can try and get you pre-compiled binaries in a few minutes, if you prefer

wary plover Jul 6, 2022, 6:31 PM

#

i should be fine building it myself, if i have trouble, i know where to scream 😉

slow peak Jul 6, 2022, 6:33 PM

#

Started a build in parallel, which binaries do you need? (e.g. linux amd64?)

wary plover Jul 6, 2022, 6:35 PM

#

yep

slow peak Jul 6, 2022, 6:45 PM

#

📎 dagger_v0.2.21_linux_amd64-PPROF.tar.gz

slow peak Jul 6, 2022, 6:46 PM

#

fading sun @wary plover have a branch w/ pprof profiling enabled here: https://git...

This is the prebuilt binary for this

wary plover Jul 6, 2022, 6:47 PM

#

cool, got em

slow peak Jul 6, 2022, 6:51 PM

#

Also -- if you can't run the `go tool pprof` command on your end, you can just grab the heap information using curl: `curl -s http://localhost:6060/debug/pprof/heap > ~/Downloads/base.heap`, and send it our way -- we should be able to run pprof on our end

#

Either way, it's a point in time snapshot -- you should run `go tool pprof` or `curl` once you notice it's taking a bunch of memory

#

(basically it gives insight as to WHAT is taking memory, at that particular point in time)
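
Since the snapshot only helps if it is taken while memory is high, one way to time it is sketched below. This is an illustration under assumptions, not part of dagger: it assumes Linux (`/proc/<pid>/status` with a `VmRSS` line), uses the heap URL from the thread, and the helper names and the 2 GiB threshold are made up.

```python
import urllib.request

# pprof heap endpoint exposed by the debug build mentioned in the thread.
HEAP_URL = "http://localhost:6060/debug/pprof/heap"


def vm_rss_kib(status_text: str) -> int:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def over_threshold(status_text: str, threshold_gib: float) -> bool:
    """True once resident memory reaches the given threshold."""
    return vm_rss_kib(status_text) >= threshold_gib * 1024 * 1024


def save_heap_snapshot(dest: str, url: str = HEAP_URL) -> int:
    """Download the heap profile to dest; returns the number of bytes written."""
    with urllib.request.urlopen(url, timeout=10) as resp, open(dest, "wb") as out:
        data = resp.read()
        out.write(data)
    return len(data)


# Illustrative status excerpt (values invented to resemble a ~2.8 GiB process).
sample_status = "Name:\tdagger\nVmRSS:\t 2936012 kB\n"
print(over_threshold(sample_status, 2.0))
```

In practice one would read `/proc/<pid>/status` for the running dagger process in a loop and call `save_heap_snapshot("base.heap")` the first time `over_threshold` returns true.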

wary plover Jul 7, 2022, 4:37 PM

#

so i have the profiling dagger installed and i have go tool pprof installed, but when i run dagger now i get...

dagger do report --log-format plain
4:36PM ERROR system | failed to load plan: this plan requires dagger 0.2.21 or newer. Run `dagger version --check` to check for latest version
this plan requires dagger 0.2.21 or newer. Run `dagger version --check` to check for latest version

#

rocky@devwork:~/dev/rentals/src/Rentals-API$ dagger version
dagger v0.2.21-next (a56bd9b2) linux/amd64

#

i don't even recall where in my dagger setup/plan i declared it needed dagger >= 0.2.21

#

@slow peak perhaps the "-next" version suffix you gave it is confusing dagger ?

#

also, i just tried running a very very simple dagger plan on an amazon EC2 t3a.small instance and it completely froze the VM ... i'm guessing swap-hell

slow peak Jul 7, 2022, 4:56 PM

#

wary plover @slow peak perhaps the "-next" version suffix you gave it is confusin...

My bad, fixing it

#

📎 dagger_v0.2.21_linux_amd64-PPROF-2.tar.gz

wary plover Jul 7, 2022, 5:08 PM

#

fading sun @wary plover have a branch w/ pprof profiling enabled here: https://git...

here's the output from running a very basic "report" action that i have that just runs git to get branch info...

(pprof) top
Showing nodes accounting for 1521.79MB, 72.08% of 2111.16MB total
Dropped 206 nodes (cum <= 10.56MB)
Showing top 10 nodes out of 102
      flat  flat%   sum%        cum   cum%
  238.02MB 11.27% 11.27%   238.02MB 11.27%  cuelang.org/go/internal/core/adt.updateCyclic
  211.02MB 10.00% 21.27%   763.58MB 36.17%  cuelang.org/go/internal/core/adt.(*nodeContext).addStruct
  204.15MB  9.67% 30.94%   260.23MB 12.33%  cuelang.org/go/internal/core/adt.(*OpContext).NewPosf
  189.02MB  8.95% 39.89%   189.02MB  8.95%  cuelang.org/go/internal/core/adt.(*Vertex).GetArc
  176.54MB  8.36% 48.26%   176.54MB  8.36%  cuelang.org/go/internal/core/adt.(*Vertex).addConjunct (inline)
  144.02MB  6.82% 55.08%   311.04MB 14.73%  cuelang.org/go/internal/core/adt.(*ForClause).yield
  136.01MB  6.44% 61.52%   136.01MB  6.44%  cuelang.org/go/internal/core/adt.(*Vertex).AddStruct (inline)
   97.48MB  4.62% 66.14%    97.48MB  4.62%  cuelang.org/go/internal/core/adt.getScratch
      67MB  3.17% 69.31%       67MB  3.17%  cuelang.org/go/internal/core/adt.CloseInfo.SpawnRef (inline)
   58.52MB  2.77% 72.08%    58.52MB  2.77%  cuelang.org/go/internal/core/adt.(*ValueError).AddPosition

#

so it looks like it is Cue that is the memory hog
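
That reading can be double-checked from the numbers in the paste itself. The sketch below only re-parses the ten rows shown above (flat size and function name) and checks which package they belong to; the parsing helper is hypothetical and assumes the column layout of this particular `top` output.

```python
# The ten rows of the (pprof) top output pasted above.
PPROF_TOP = """\
  238.02MB 11.27% 11.27%   238.02MB 11.27%  cuelang.org/go/internal/core/adt.updateCyclic
  211.02MB 10.00% 21.27%   763.58MB 36.17%  cuelang.org/go/internal/core/adt.(*nodeContext).addStruct
  204.15MB  9.67% 30.94%   260.23MB 12.33%  cuelang.org/go/internal/core/adt.(*OpContext).NewPosf
  189.02MB  8.95% 39.89%   189.02MB  8.95%  cuelang.org/go/internal/core/adt.(*Vertex).GetArc
  176.54MB  8.36% 48.26%   176.54MB  8.36%  cuelang.org/go/internal/core/adt.(*Vertex).addConjunct (inline)
  144.02MB  6.82% 55.08%   311.04MB 14.73%  cuelang.org/go/internal/core/adt.(*ForClause).yield
  136.01MB  6.44% 61.52%   136.01MB  6.44%  cuelang.org/go/internal/core/adt.(*Vertex).AddStruct (inline)
   97.48MB  4.62% 66.14%    97.48MB  4.62%  cuelang.org/go/internal/core/adt.getScratch
      67MB  3.17% 69.31%       67MB  3.17%  cuelang.org/go/internal/core/adt.CloseInfo.SpawnRef (inline)
   58.52MB  2.77% 72.08%    58.52MB  2.77%  cuelang.org/go/internal/core/adt.(*ValueError).AddPosition
"""
TOTAL_MB = 2111.16  # "of 2111.16MB total" from the pprof header line


def parse_rows(text: str) -> list[tuple[float, str]]:
    """Return (flat MB, function name) for each row of the pasted table."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        flat_mb = float(parts[0].removesuffix("MB"))
        rows.append((flat_mb, parts[5]))
    return rows


rows = parse_rows(PPROF_TOP)
flat_sum = sum(mb for mb, _ in rows)
all_cue = all(fn.startswith("cuelang.org/go/") for _, fn in rows)

print(f"top-10 flat: {flat_sum:.2f} MB ({flat_sum / TOTAL_MB:.1%} of heap), all CUE: {all_cue}")
```

Every one of the top ten allocation sites is in `cuelang.org/go/internal/core/adt`, and together they account for roughly 72% of the sampled heap.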

wary plover Jul 7, 2022, 5:42 PM

#

i'm gonna continue testing, but take my project plan out of the equation (and all of its many associated custom actions) and try on a blank project

#

so at first glance, it appears to be the many layers of deeply nested actions with their deps that seem to be the culprit

#

almost makes me think there's a memory leak

#

fwiw, the absolute barest plan in a fresh project is still using 124mb ram according to /usr/bin/time

jade plover Jul 7, 2022, 6:04 PM

#

@young belfry @timber mist FYI ☝️

wary plover Jul 7, 2022, 7:53 PM

#

so here's issue #1 ... when i remove a bunch of actions that aren't needed for my simple "report" action and re-run the "report" action... the dagger memory consumption goes from 5gb to about 1gb and runs waaaay faster

#

so it's obviously parsing everything even when 95% isn't required

bleak terrace Jul 7, 2022, 9:03 PM

#

looks like indeed a mem leak in CUE, that's what we were suspecting yesterday with @slow peak

#

@timber mist do you know if it's a known issue on cue upstream, or if some mem leak was fixed in the latest release?

timber mist Jul 7, 2022, 9:07 PM

#

@bleak terrace I’m not aware of these sorts of cases being solved in cue yet. I suspect it won’t happen until the cycle fixes are in.

#

And those changes have been taking a while to land.

wary plover Jul 7, 2022, 9:32 PM

#

i'm not sure what i can do on my end... i mean consuming 5gb or higher for a simple run makes running my build in GHA impractical and we depend on GHA :/

#

running this simple dagger command actually killed an AWS EC2 t3a.small vm on me... 2gb of ram

bleak terrace Jul 7, 2022, 9:46 PM

#

we need to investigate further, is there anything in your config that you can share? If we can reproduce locally, it'll help a lot

wary plover Jul 7, 2022, 9:49 PM

#

bleak terrace we need to investigate further, is there anything in your config that you can sh...

i'll see what i can do about extracting some portion, but for me right now, for every action i comment out in my main.cue plan file, memory usage drops quite noticeably ... seems like it's just the overall using/import of lots of cue files

slow peak Jul 7, 2022, 9:49 PM

#

We don't need a fully working config, just enough bits to make it slow and memory intensive

#

Chances are there's something innocent in there triggering massive amounts of RAM usage by CUE. Could be nesting, or references, or something like that

#

If you can share something close enough, we'd be happy to run the investigation on our end

jade plover Jul 7, 2022, 9:51 PM

#

yep, even a code "shape" that will allow us to repro? Lots of nested actions, or for loops, or nested definitions, etc. We don't need details/secrets/working scripts/etc.

wary plover Jul 7, 2022, 9:55 PM

#

i'm trying... it just seems like it's the amount of everything that's the central issue

jade plover Jul 7, 2022, 10:23 PM

#

Were you logged in to Dagger Cloud by any chance @wary plover ?

wary plover Jul 7, 2022, 10:24 PM

#

no

jade plover Jul 7, 2022, 10:24 PM

#

ok. if you were ever in doubt, you could `dagger logout` and run again.

wary plover Jul 7, 2022, 10:24 PM

#

i don't even know what dagger cloud is 😉

jade plover Jul 7, 2022, 10:25 PM

#

helps us to work with folks on debug and such.

wary plover Jul 7, 2022, 10:25 PM

#

ah

#

i'm still trying to extract enough of this Cue code to have something reasonable that consumes far too much ram

jade plover Jul 7, 2022, 10:26 PM

#

Yeah, if you were logged in to it, on a current version of dagger you'd see something like this at the start of a run:

wary plover Jul 7, 2022, 10:26 PM

#

nope, not seeing anything like that, nor have i ever 😉

jade plover Jul 7, 2022, 10:27 PM

#

Only certain Dagger folks (and yourself) could see the URL which will show the CUE file, how it was invoked, stats, errors.

#

Yep, you need to log in to activate it

wary plover Jul 7, 2022, 10:27 PM

#

so are you saying i should be using dagger cloud so you guys can see more ?

jade plover Jul 7, 2022, 10:27 PM

#

It could be helpful.

wary plover Jul 7, 2022, 10:28 PM

#

and it won't expose any secrets or anything ?

jade plover Jul 7, 2022, 10:28 PM

#

right. no secrets

wary plover Jul 7, 2022, 10:29 PM

#

ok, i'm cool with doing that... trying to manually extract things is painful and i can't just dump my source base somewhere

jade plover Jul 7, 2022, 10:29 PM

#

https://docs.dagger.io/1241/dagger-cloud/

Debugging with Dagger Cloud | Dagger

Dagger Cloud is under development, but we have just released the first telemetry feature!

wary plover Jul 7, 2022, 10:30 PM

#

does it only expose cue files? or does it also expose the entire source set of my project?

jade plover Jul 7, 2022, 10:30 PM

#

Only cue files.

#

In fact. Right now, only the main cue file you invoke

wary plover Jul 7, 2022, 10:30 PM

#

and you don't have any access to build artifacts ? 🙂

jade plover Jul 7, 2022, 10:30 PM

#

correct

#

so I'm hoping we can see enough in that main cue file to get a sense of what's happening for a repro 🤞

#

I know you've got lots of includes, etc

wary plover Jul 7, 2022, 10:32 PM

#

heh, dagger login is failing because i'm in a ssh terminal and the env i'm using has no gui :/

jade plover Jul 7, 2022, 10:32 PM

#

oof. I know there's a workaround. not sure if it's pretty. @shadow skiff ?

shadow skiff Jul 7, 2022, 10:34 PM

#

jade plover oof. I know there's a workaround. not sure if it's pretty. @shadow skiff...

yeah... you can log in on your computer and copy the ~/.config/dagger/credentials file to the destination host. I know it's very hacky but it's the only way until we implement headless auth 😢

wary plover Jul 7, 2022, 10:45 PM

#

ok, got it working

#

@jade plover so if i share the run url here only me and the developers can access?

jade plover Jul 7, 2022, 10:46 PM

#

yes

wary plover Jul 7, 2022, 10:46 PM

#

https://dagger.cloud/runs/70c3120b-0da2-47df-a84f-6a0c1782d290

jade plover Jul 7, 2022, 10:47 PM

#

got it. Thanks! Taking a look.

wary plover Jul 7, 2022, 10:48 PM

#

it looks like that doesn't show the many packages i wrote

#

Rentals-DaggerIO is not public

jade plover Jul 7, 2022, 10:49 PM

#

yep. it's a bit shallow at the moment.

wary plover Jul 7, 2022, 10:58 PM

#

i recently asked my company for permission to opensource Rentals-DaggerIO but haven't yet gotten a response 😦

wary plover Jul 8, 2022, 3:19 PM

#

for anyone still paying attention, i just removed all docker.#Build use from Rentals-DaggerIO and my plan ... cut the memory usage of my primary action down from approx 10gb to 4.2gb ... so progress 😉

bleak terrace Jul 8, 2022, 4:09 PM

#

Still paying attention! we've been digging into this with the team. Thanks for the extra context. @slow peak spotted a big leak in cue, we're still on an old version. Knowing that docker.#Build triggers this will help isolate it. Ideally we can repro on the latest cue version and send a repro upstream. I also see @young belfry is on this thread, (Paul is co-author of cue 👋 ), so clearly all the right people have their eyes on this!

wary plover Jul 8, 2022, 4:12 PM

#

bleak terrace Still paying attention! we've been digging into this with the team. Thanks for t...

i very much appreciate this... it's amazing knowing the level of support you folks are providing 👏

young belfry Jul 8, 2022, 5:00 PM

#

Happy to help look at a repro

wary plover Jul 9, 2022, 1:28 PM

#

don't suppose there's been any developments on this? i apologize in advance if i'm being impatient 😉

bleak terrace Jul 9, 2022, 6:25 PM

#

wary plover don't suppose there's been any developments on this? i apologize in advance if i...

unfortunately not much progress, I'd recommend relying on a workaround for now (relying less on docker.#Build with the steps array). There are multiple ways to do this, for instance replacing some of those base images with an inlined Dockerfile (https://docs.dagger.io/1241/docker#dockerdockerfile), using docker.#Dockerfile. That should cut your memory usage, although the problem won't disappear. We're working on a fix in the meantime and will update you of course.

The docker package | Dagger

The universe.dagger.io module is meant to provide higher level abstractions on top of core actions. Of these, the universe.dagger.io/docker package provides a general base for building and running docker images.

#

btw, if you need help adapting your config, let us know
