#anyone have an example github actions

shadow skiff
#

cc @jade plover have u seen this in the wild?

wary plover
#

it's kind of a moot point right now, i can't get my dagger based workflow to complete successfully on github actions ... x86 even ... it either abruptly aborts (and google seems to indicate this is from a process using too much cpu) or else it hangs indefinitely :/

shadow skiff
#

GH aborts the action if it's consuming a lot of CPU? interesting..

wary plover
#

i had several attempts simply cancel on their own with no error message in the standard web interface, but when i viewed the workflow debug raw output the last thing that it showed was that the workflow had been cancelled with SIGINT or something similar, i googled the exact message and the results seemed to indicate that yes, too much cpu and the runner cancels the workflow

#

but i'm more concerned about the workflow runs that ran for 30min (with no more debug output after about 15min) which i ended up cancelling myself

shadow skiff
#

would it be possible to share the dagger plan with us in a way that we could run it without the actual project code?

wary plover
#

not without a lot of auditing on my side first i think

#

when everything is cached, it takes about 2min to run the identical dagger plan on my local 3-year-old workstation

#

gonna run it with --no-cache here to see how long it takes

#
        User time (seconds): 151.11
        System time (seconds): 6.92
        Percent of CPU this job got: 78%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:21.17
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11848448
#

that's with --no-cache
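
For reference, the "Maximum resident set size" line above converts to roughly 11.3 GiB (11848448 kbytes divided by 1024 * 1024). A minimal sketch of that conversion, assuming GNU time's `-v` output format as pasted above; the function name `max_rss_gib` and the file name `time.log` are hypothetical:

```shell
# Read GNU `time -v` output on stdin and print the peak resident set size in GiB.
# GNU time reports this value in kbytes, so divide by 1024 * 1024.
max_rss_gib() {
  awk -F': ' '/Maximum resident set size/ { printf "%.2f\n", $2 / 1048576 }'
}

# Example (time.log is a hypothetical file name; -o writes the report to a file):
#   /usr/bin/time -v -o time.log make dagger ACTION="--no-cache report"
#   max_rss_gib < time.log
```

Writing the report to a file with `-o` keeps it separate from dagger's own log output, which makes the number easier to compare between runs.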

jade plover
#

We get some timeouts in our CI with GitHub Actions which forces us to re-run the workflows. I see that when reviewing PRs.

wary plover
#

ugh

#

this is not promising 😦

jade plover
#

Seems to be a GHA thing

wary plover
#

i never saw any such timeouts or aborts with our old build system... which is leading me to believe that dagger+buildx is just too intense for gha :/

jade plover
#

@bleak terrace @fading sun do you have any insights into why GHA timeouts may pop up in CI?

@wary plover can you share what your GHA workflow looks like? Caching bits, etc

wary plover
#
env:

  DAGGER_CACHE_BASE: dagger-ci-build
  DAGGER_LOG_LEVEL: debug

jobs:
  build-publish:
    name: Build image and publish
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Rentals-API
        uses: actions/checkout@v2
        with:
          fetch-depth: 0
          path: src/Rentals-API
          ref: ${{ github.ref }}

      - name: Extract branch name
        shell: bash
        run: echo "GITHUB_BRANCH=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_ENV

      - name: Configure caches
        run: |
          echo "DAGGER_CACHE_TO=type=gha,mode=max,scope=${{env.DAGGER_CACHE_BASE}}-${{env.GITHUB_BRANCH}}" >> $GITHUB_ENV
          echo "DAGGER_CACHE_FROM=type=gha,scope=${{env.DAGGER_CACHE_BASE}}-${{env.GITHUB_BRANCH}}" >> $GITHUB_ENV

      - name: Dagger Release
        uses: dagger/dagger-for-github@v3
        with:
          workdir: src/Rentals-API
          cmds: |
            do verboseRelease
            do verboseRollout

      - name: Print Buildkitd Logs
        if: ${{ failure() }}
        run: |
          docker logs dagger-buildkitd
#

oh sorry, should have used a pastebin

#

obviously that doesn't include any of our primary env vars

#

verboseRelease action is basically a docker.#Build that has 3 steps:

  1. emit start message via slack
  2. build everything and push to AWS ECR
  3. emit end message via slack
fading sun
wary plover
#

so...... this seems a problem with dagger ?

fading sun
wary plover
#

absolutely... the build itself is basically along the lines of...

  1. fetch official python docker image
  2. install build-essential tools in case any of the deps require C lib or compilation
  3. setup virtualenv
  4. install this source package along with any of its dependencies

but considering that all runs in like 3min on my local workstation, i dunno

#

ok... my most minimal action which does...

  1. download latest debian-slim
  2. do an apt dist-upgrade
  3. install git
  4. mount local src
  5. run git in local src mount to get some info like branch name

uses 4.4gb ram ... something doesn't seem right

#

(this is all with --no-cache)

fading sun
#

I also just realized that the data you posted must have been from the dagger binary itself (not BuildKit), right? In which case, there almost certainly must be something wrong since the client binary is not the one doing any sort of intensive work

wary plover
#

it's actually running like this: /usr/bin/time -v make dagger ACTION="--no-cache report" ... all make is doing is loading a .env to setup necessary env vars and then invoking dagger with the ACTION param

fading sun
#

If that's the case, then 11GB definitely seems like some sort of memory leak

#

(thinking of how best to debug this further)

wary plover
#

well... i was just running top while running the same dagger command... and this was a quick cut/paste snapshot of the dagger entry when it was running (about halfway done)...
1082307 rocky 20 0 3582184 2.8g 24096 S 152.3 9.1 0:38.87 dagger

#

so... that's 152% of CPU as well which means dagger is definitely doing something intensive

#

i understand how you think buildx is being run via network connection in the docker daemon, but there must be more going on

fading sun
wary plover
#

gotcha

#

well, perhaps it's the Cue interpreter process going haywire

#

dagger is a Go app right ? using an embedded Cue interpreter of sorts ?

fading sun
wary plover
#

well, atm it effectively means GHA (at least the github hosted runners) is a no-go for me atm 😦

#

i'm toying with possibly setting up our own AWS based runners, since that seems to be the best way to do our arm64 builds regardless

bleak terrace
#

if there is a memory leak somewhere in dagger, we should fix it. We had similar issues with cue a while back. Is there a way for us to repro outside of GHA? Something we could dagger do locally would be awesome.

bleak terrace
#

@slow peak for context ☝️

wary plover
fading sun
#

@wary plover have a branch w/ pprof profiling enabled here: https://github.com/sipsma/dagger/commit/a56bd9b2f7b800e2408b29542a27792147e1cdd8

If you run with that binary and then in the middle of the build when memory usage is high separately run go tool pprof http://localhost:6060/debug/pprof/heap you'll drop into a profiling shell from which you can run top to see what's using so much memory.

Are you comfortable with building dagger from that branch yourself? I don't have an x86 machine around at the moment

wary plover
slow peak
#

I can try and get you pre-compiled binaries in a few minutes, if you prefer

wary plover
#

i should be fine building it myself, if i have trouble, i know where to scream 😉

slow peak
#

Started a build in parallel, which binaries do you need? (e.g. linux amd64?)

wary plover
#

yep

slow peak
slow peak
wary plover
#

cool, got em

slow peak
#

Also -- if you can't run the go tool pprof command on your end, you can just grab the heap information using curl: curl -s http://localhost:6060/debug/pprof/heap > ~/Downloads/base.heap, and send it our way -- we should be able to run pprof on our end

#

Either way, it's a point in time snapshot -- you should run go tool pprof or curl once you notice it's taking a bunch of memory

#

(basically it gives insight as to WHAT is taking memory, at that particular point in time)
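
Because the snapshot is only useful while memory is high, the capture can be automated. The following is a rough sketch, not something from the dagger tooling: it assumes a Linux host with `/proc`, the profiling build listening on localhost:6060 as described above, and hypothetical helper names `rss_mib` and `snapshot_when_high`.

```shell
# Print the resident set size of a process in MiB, read from /proc (Linux only).
rss_mib() {
  awk '/^VmRSS:/ { print int($2 / 1024) }' "/proc/$1/status"
}

# Poll once per second until the process's RSS reaches the threshold,
# then save a single heap profile from the pprof endpoint.
# Returns 1 if the process is gone before the threshold is reached.
# Usage: snapshot_when_high PID THRESHOLD_MIB OUTPUT_FILE
snapshot_when_high() {
  pid="$1"; threshold="$2"; out="$3"
  while [ -r "/proc/$pid/status" ]; do
    rss="$(rss_mib "$pid")"
    if [ "${rss:-0}" -ge "$threshold" ]; then
      curl -s http://localhost:6060/debug/pprof/heap > "$out"
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, running `snapshot_when_high "$(pgrep -n -x dagger)" 2048 base.heap` in a second terminal would write `base.heap` once the newest dagger process passes about 2 GiB; the 2048 threshold and the file name are arbitrary choices for illustration.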

wary plover
#

so i have the profiling dagger installed and i have go tool pprof installed, but when i run dagger now i get...

dagger do report --log-format plain
4:36PM ERROR system | failed to load plan: this plan requires dagger 0.2.21 or newer. Run `dagger version --check` to check for latest version
this plan requires dagger 0.2.21 or newer. Run `dagger version --check` to check for latest version
#
rocky@devwork:~/dev/rentals/src/Rentals-API$ dagger version
dagger v0.2.21-next (a56bd9b2) linux/amd64
#

i don't even recall where in my dagger setup/plan i declared it needed dagger >= 0.2.21

#

@slow peak perhaps the "-next" version suffix you gave it is confusing dagger ?

#

also, i just tried running a very very simple dagger plan on an amazon EC2 t3a.small instance and it completely froze the VM ... i'm guessing swap-hell

wary plover
#

here's the output from running a very basic "report" action that i have that just runs git to get branch info...

(pprof) top
Showing nodes accounting for 1521.79MB, 72.08% of 2111.16MB total
Dropped 206 nodes (cum <= 10.56MB)
Showing top 10 nodes out of 102
      flat  flat%   sum%        cum   cum%
  238.02MB 11.27% 11.27%   238.02MB 11.27%  cuelang.org/go/internal/core/adt.updateCyclic
  211.02MB 10.00% 21.27%   763.58MB 36.17%  cuelang.org/go/internal/core/adt.(*nodeContext).addStruct
  204.15MB  9.67% 30.94%   260.23MB 12.33%  cuelang.org/go/internal/core/adt.(*OpContext).NewPosf
  189.02MB  8.95% 39.89%   189.02MB  8.95%  cuelang.org/go/internal/core/adt.(*Vertex).GetArc
  176.54MB  8.36% 48.26%   176.54MB  8.36%  cuelang.org/go/internal/core/adt.(*Vertex).addConjunct (inline)
  144.02MB  6.82% 55.08%   311.04MB 14.73%  cuelang.org/go/internal/core/adt.(*ForClause).yield
  136.01MB  6.44% 61.52%   136.01MB  6.44%  cuelang.org/go/internal/core/adt.(*Vertex).AddStruct (inline)
   97.48MB  4.62% 66.14%    97.48MB  4.62%  cuelang.org/go/internal/core/adt.getScratch
      67MB  3.17% 69.31%       67MB  3.17%  cuelang.org/go/internal/core/adt.CloseInfo.SpawnRef (inline)
   58.52MB  2.77% 72.08%    58.52MB  2.77%  cuelang.org/go/internal/core/adt.(*ValueError).AddPosition
#

so it looks like it is Cue that is the memory hog
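
One way to back that reading up is to add the flat column for every row that belongs to a given package. This is a small sketch under the assumption that the listing was saved as text and that every matching row reports its flat value in MB, as in the output above; `sum_flat_mb` and `top.txt` are hypothetical names.

```shell
# Sum the "flat" column (first field) of pprof `top` rows whose text matches a pattern.
# Assumes matching rows report flat memory in MB; rows in other units are skipped.
sum_flat_mb() {
  awk -v pat="$1" '
    $0 ~ pat && $1 ~ /MB$/ { v = $1; sub(/MB$/, "", v); total += v }
    END { printf "%.2f\n", total }
  '
}

# Example (top.txt is a hypothetical file name):
#   go tool pprof -top http://localhost:6060/debug/pprof/heap > top.txt
#   sum_flat_mb 'cuelang.org/go' < top.txt
```

In the listing above all ten rows are in cuelang.org/go/internal/core/adt, so the whole 72.08% shown is attributed to CUE evaluation rather than to dagger's own code.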

wary plover
#

i'm gonna continue testing, but take my project plan out of the equation (and all of its many associated custom actions) and try on a blank project

#

so at first glimpse, it appears to be the very many layers nested deep actions with their deps that seems to be the culprit

#

almost makes me think there's a memory leak

#

fwiw, the absolute barest plan in a fresh project is still using 124mb ram according to /usr/bin/time

jade plover
#

@young belfry @timber mist FYI ☝️

wary plover
#

so here's issue #1 ... when i remove a bunch of actions that aren't needed for my simple "report" action and re-run the "report" action... the dagger memory consumption goes from 5gb to about 1gb and runs waaaay faster

#

so it's obviously parsing everything even when 95% isn't required

bleak terrace
#

looks like indeed a mem leak in CUE, that's what we were suspecting yesterday with @slow peak

#

@timber mist do you know if it's a known issue on cue upstream, or if some mem leak was fixed in the latest release?

timber mist
#

@bleak terrace I'm not aware of these sorts of cases being solved in cue yet. I suspect it won't happen until the cycle fixes are in.

#

And those changes have been taking a while to land.

wary plover
#

i'm not sure what i can do on my end... i mean consuming 5gb or higher for a simple run makes running my build in GHA impractical and we depend on GHA :/

#

running this simple dagger command actually killed a AWS EC2 t3a.small vm on me... 2gb of ram

bleak terrace
#

we need to investigate further, is there anything in your config that you can share? If we can reproduce locally, it'll help a lot

wary plover
slow peak
#

We don't need a fully working config, just enough bits to make it slow and memory intensive

#

Chances are there's something innocent in there triggering massive amounts of ram usage by CUE. Could be nesting, or references, or something like that

#

If you can share something close enough, we'd be happy to run the investigation on our end

jade plover
#

yep, even a code "shape" that will allow us to repro? Lots of nested actions, or for loops, or nested definitions, etc. We don't need details/secrets/working scripts/etc.

wary plover
#

i'm trying... it just seems like it's the amount of everything that's the central issue

jade plover
#

Were you logged in to Dagger Cloud by any chance @wary plover ?

wary plover
#

no

jade plover
#

ok. if you were ever in doubt, you could dagger logout and run again.

wary plover
#

i don't even know what dagger cloud is 😉

jade plover
#

helps us to work with folks on debug and such.

wary plover
#

ah

#

i'm still trying to extract enough of this Cue code to have something reasonable that consumes far too much ram

jade plover
#

Yeah, if you were logged in to it, on a current version of dagger you'd see something like this at the start of a run:

wary plover
#

nope, not seeing anything like that, nor have i ever 😉

jade plover
#

Only certain Dagger folks (and yourself) could see the URL which will show the CUE file, how it was invoked, stats, errors.

#

Yep, you need to log in to activate it

wary plover
#

so are you saying i should be using dagger cloud so you guys can see more ?

jade plover
#

It could be helpful.

wary plover
#

and it won't expose any secrets or anything ?

jade plover
#

right. no secrets

wary plover
#

ok, i'm cool with doing that... trying to manually extract things is painful and i can't just dump my source base somewhere

jade plover
wary plover
#

does it only expose cue files? or does it also expose the entire source set of my project?

jade plover
#

Only cue files.

#

In fact. Right now, only the main cue file you invoke

wary plover
#

and you don't have any access to build artifacts ? 🙂

jade plover
#

correct

#

so I'm hoping we can see enough in that main cue file to get a sense of what's happening for a repro 🤞

#

I know you've got lots of includes, etc

wary plover
#

heh, dagger login is failing because i'm in a ssh terminal and the env i'm using has no gui :/

jade plover
#

oof. I know there's a workaround. not sure if it's pretty. @shadow skiff ?

shadow skiff
wary plover
#

ok, got it working

#

@jade plover so if i share the run url here only me and the developers can access?

jade plover
#

yes

jade plover
#

got it. Thanks! Taking a look.

wary plover
#

it looks like that doesn't show the many packages i wrote

#

Rentals-DaggerIO is not public

jade plover
#

yep. it's a bit shallow at the moment.

wary plover
#

i recently asked my company for permission to opensource Rentals-DaggerIO but haven't yet gotten a response 😦

wary plover
#

for anyone still paying attention, i just removed all docker.#Build use from Rentals-DaggerIO and my plan ... cut the memory usage of my primary action down from approx 10gb to 4.2gb ... so progress 😉

bleak terrace
#

Still paying attention! we've been digging into this with the team. Thanks for the extra context. @slow peak spotted a big leak in cue, we're still on an old version. Knowing that docker.#Build makes this worse will help isolate it. Ideally we can repro on the latest cue version and send a repro upstream. I also see @young belfry is on this thread, (Paul is co-author of cue 👋 ), so clearly all the right people have their eyes on this!

wary plover
young belfry
#

Happy to help look at a repro

wary plover
#

don't suppose there's been any developments on this? i apologize in advance if i'm being impatient 😉

bleak terrace
#

unfortunately not much progress, I'd recommend relying on a workaround for now (relying less on docker.#Build with the steps array). There are multiple ways to do this, for instance replacing some of those base images with an inlined Dockerfile (https://docs.dagger.io/1241/docker#dockerdockerfile), using docker.#Dockerfile. That should cut your memory usage, although the problem won't disappear. We're working on a fix in the meantime and will update you of course.

The universe.dagger.io module is meant to provide higher level abstractions on top of core actions. Of these, the universe.dagger.io/docker package provides a general base for building and running docker images.

#

btw, if you need help adapting your config, let us know