little veldt Aug 20, 2024, 2:05 PM

#

👋 hello! with the ssh auth stuff in, i'm now motivated by the idea of doing another release soon 🙂

#

context dirs feels like it's in the final stages? so maybe we should aim for early next week? maybe tuesday?

daring pier Aug 20, 2024, 5:01 PM

#

SGTM!

little veldt Aug 21, 2024, 12:58 PM

#

pinging <@&946480760016207902> 👋 (just so everyone's in the thread)

#

i'm gonna suggest next wednesday (monday is a bank holiday in the uk, and would like at least one day next week to test/get things in order)
we'll include the ssh private modules, --interactive-command from https://github.com/dagger/dagger/pull/8171, and a bunch of misc fixes

#

✨ v0.12.6 - 28th August 2024

tough vessel Aug 26, 2024, 10:16 PM

#

opened https://github.com/dagger/dagger/pull/8234 which fixes #go message and the telemetry.Close() breakage

little veldt Aug 27, 2024, 10:38 AM

#

there's a few remaining prs in the milestone, that would be good to get some eyes on 🎉 https://github.com/dagger/dagger/milestone/57 (will be going through all of not-mine in there today)

GitHub

v0.12.6 Milestone · dagger/dagger

An engine to run your pipelines in containers. Contribute to dagger/dagger development by creating an account on GitHub.

silent notch Aug 27, 2024, 11:35 AM

#

SGTM! I am aiming to get the resolution of https://github.com/dagger/dagger/pull/8212 merged too, so that we can test the effectiveness of the split with this new release.

#

Just added it to the milestone so that we track it all together.

#

I would really like to get this in the Helm chart release: https://github.com/dagger/dagger/pull/8219. I have added it to the milestone & pushed the commit that makes it OK to merge from my POV. Go for it @scarlet nacelle if it works for you too.

fiery bison Aug 27, 2024, 9:25 PM

#

little veldt there's a few remaining prs in the milestone, that would be good to get some eye...

I also approved some of your PR so you can merge them tomorrow!

#

Btw it seems there's something weird with the CI, all my PRs are extra red after a rebase, I'm confused

fiery bison Aug 27, 2024, 9:47 PM

#

@coral pilot Context directory CI is green, can I merge it? 😄

#

https://github.com/dagger/dagger/pull/7744

GitHub

Feat/context directory by TomChv · Pull Request #7744 · dagger/dagg...

This is an implementation of #7647
API changes
This PR updates the Function and FunctionArg from our engine API to support defaultPathFromContext

#

(sorry to spam ping you, I know it's a big PR so I don't want to make any mistake)

coral pilot Aug 27, 2024, 9:48 PM

#

fiery bison <@949034677610643507> Context directory CI is green, can I merge it? 😄

yep LGTM!

fiery bison Aug 27, 2024, 9:48 PM

#

https://tenor.com/view/minions-yehey-excited-gif-5483182

Tenor

YOUHOU!

#

@rustic gorge

#

^^^

rustic gorge Aug 27, 2024, 9:51 PM

#

https://tenor.com/view/diddy-kong-racing-diddy-kong-conker-pipsy-checkered-flag-gif-6432125918942089752

Tenor

#

@fiery bison did we resolve your question of "relative path in dep"?

fiery bison Aug 27, 2024, 9:53 PM

#

Yep!

#

There's also a test for it

#

I'm so happy that this PR is finally merged, almost 2 months of joint work haha

#

I can focus on docs and extending ignore metadata later, I'll work on interface support on TS first because Helder started to work on Python's support

#

I don't know if doc update should be part of the release but here's an update of the TS doc with package manager config & runtime version config: https://github.com/dagger/dagger/pull/8251
/cc @prime spruce

GitHub

docs: add package manager config page & update runtime by TomChv · ...

Add packageManager config doc page.
Update runtime page with version configuration.

rustic gorge Aug 27, 2024, 10:25 PM

#

@fiery bison so we're aligned on dep/$(jq -r .source dep/dagger.json)?

fiery bison Aug 28, 2024, 7:53 AM

#

rustic gorge <@281874480651829250> so we're aligned on `dep/$(jq -r .source dep/dagger.json)...

I didn’t write a test for that but you should try it

#

And let me know

rustic gorge Aug 28, 2024, 8:00 AM

#

Yeah but is it the way it's implemented?

fiery bison Aug 28, 2024, 8:12 AM

#

Can you re-explain to me what you mean by dep/$(jq -r .source dep/dagger.json) so I'm sure I can answer your question accurately

rustic gorge Aug 28, 2024, 8:17 AM

#

fiery bison Can you re explain to me what you mean by `dep/$(jq -r .source dep/dagger.json` ...

When defaultPath is a relative path, for example +defaultPath="./foo" then it should be relative to the source directory of the callee module, as defined in that module's source field in dagger.json.

For example:

$ cat ./dep/dagger.json
{
 "name": "dep",
 "source": "./src"
}
$ ls ./dep/src
go.mod
go.sum
main.go
countries.txt

func (m *Dep) Countries(
 ctx context.Context,
 // +defaultPath="./countries.txt"
 file *dagger.File,
) (string, error) {
 return file.Contents(ctx)
}

dagger call -m ./dep countries

In this example, the function countries from module ./dep with +defaultPath="./countries.txt" should open ./dep/src/countries.txt and not ./dep/countries.txt (source dir and not root dir)
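
The rule described above can be sketched as a small path helper. This is only an illustration, not Dagger's implementation: `resolveDefaultPath` and its parameters are hypothetical names, and the handling of absolute paths (anchored at the context directory root) is an assumption taken from how the rule is discussed in this thread.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveDefaultPath is a hypothetical helper, not Dagger's implementation.
// moduleRoot is the directory that holds dagger.json (relative to the context
// directory root) and source is the "source" field from that dagger.json.
// The result is a path relative to the context directory root.
func resolveDefaultPath(moduleRoot, source, defaultPath string) string {
	if strings.HasPrefix(defaultPath, "/") {
		// Assumption: absolute paths are anchored at the context directory root.
		return path.Clean(strings.TrimPrefix(defaultPath, "/"))
	}
	// Relative paths are anchored at the module's source directory.
	return path.Join(moduleRoot, source, defaultPath)
}

func main() {
	// Mirrors the ./dep example: source is "./src", so the file is found
	// under dep/src and not directly under dep.
	fmt.Println(resolveDefaultPath("dep", "./src", "./countries.txt")) // dep/src/countries.txt
	fmt.Println(resolveDefaultPath("dep", "./src", "/go.work"))        // go.work
}
```

In this sketch a relative path containing `..` is simply cleaned upward from the source directory; whether the real engine allows that is not something the sketch settles.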

fiery bison Aug 28, 2024, 8:20 AM

#

rustic gorge When `defaultPath` is a relative path, for example `+defaultPath="./foo"` then i...

Oh no, right now it's pointing to the root dir, but I can update that really quick, should be a one liner fix
However I thought we all agreed that absolute path = context dir & relative path = root dir :/

rustic gorge Aug 28, 2024, 8:20 AM

#

fiery bison Oh no, right now it's pointing to the root dir, but I can update that really qui...

That's why I was checking 🙂

fiery bison Aug 28, 2024, 8:21 AM

#

Okay, I need to fix that and also fix tests then

rustic gorge Aug 28, 2024, 8:22 AM

#

@fiery bison re-reading the comments in https://github.com/dagger/dagger/issues/7647, it's definitely source dir, unless there were more discussions later outside of the issue.

#

I see Helder suggesting source dir, and Alex agreeing

fiery bison Aug 28, 2024, 8:26 AM

#

Right there: https://github.com/dagger/dagger/pull/7744#issuecomment-2191285230

#

I remember that I switched from source dir to root dir with that one

#

Maybe you forgot to specify source dir vs root dir, and I didn't ask any questions :/ My bad

rustic gorge Aug 28, 2024, 8:34 AM

#

In the spec I actually say "source file" which would be the dream, but not technically possible sadly. So in the comments Helder and Alex propose the next best thing which is source dir

fiery bison Aug 28, 2024, 8:40 AM

#

Yeah I know, I got confused by your comment on my PR, that's my bad

#

I'll fix that

rustic gorge Aug 28, 2024, 8:40 AM

#

Thank you! The end is in sight 🙂

fiery bison Aug 28, 2024, 8:46 AM

#

I added the fix, tests are in progress 😄

little veldt Aug 28, 2024, 9:28 AM

#

Relative to the source makes sense, but just to check, can a relative path also be "../" if I want to go up to the parent if the source is in .dagger for instance?

little veldt Aug 28, 2024, 10:23 AM

#

if anyone's around can i get a review on https://github.com/dagger/dagger/pull/8217 ?

little veldt Aug 28, 2024, 10:54 AM

#

oops, we also need to bump goreleaser after bumping go to 1.23 https://github.com/dagger/dagger/pull/8256

little veldt Aug 28, 2024, 11:19 AM

#

moving the ci test split out of the milestone - https://github.com/dagger/dagger/pull/8212
since this just affects our test suite/ci, this shouldn't block the release

little veldt Aug 28, 2024, 12:03 PM

#

@fiery bison what's the timeline on the ts optimization work you've been doing? as in, will it land in the next couple hours?
otherwise, i'd like to go ahead with the release even if it's not in there, there's a pretty critical go fix (pinning otel deps) we should be getting out asap

fiery bison Aug 28, 2024, 12:11 PM

#

little veldt <@281874480651829250> what's the timeline on the ts optimization work you've bee...

It should land soon, I'm fixing context dir first

#

@little veldt Fix of the context dir: https://github.com/dagger/dagger/pull/8260

GitHub

fix: context dir relative path point to source dir by TomChv · Pull...

Follow up to #7744 to point relative path to source dir instead of root dir.

eager river Aug 28, 2024, 12:25 PM

#

just saw this thread and excited pepe_hands . Are we planning to release 0.12.6 today? If we're planning, I need to update my demo for tomorrow to new version due to this ssh support.

fiery bison Aug 28, 2024, 12:36 PM

#

would love to get a quick approval on https://github.com/dagger/dagger/pull/8251 btw 😄
/cc @silent notch

#

@little veldt CI's green, ready to be merged: https://github.com/dagger/dagger/pull/8260

little veldt Aug 28, 2024, 1:23 PM

#

somehow managed to mess up the linting a couple weeks back: https://github.com/dagger/dagger/pull/8261

GitHub

chore: fix linting by jedevc · Pull Request #8261 · dagger/dagger

Fixes these issues on main: https://github.com/dagger/dagger/actions/runs/10596511181/job/29364633106, which was introduced in #8151.
Some of these lints had stopped working, since the paths had go...

silent notch Aug 28, 2024, 2:50 PM

#

fiery bison would love to get a quick approval on https://github.com/dagger/dagger/pull/8251...

Looking now.

silent notch Aug 28, 2024, 2:52 PM

#

fiery bison would love to get a quick approval on https://github.com/dagger/dagger/pull/8251...

It needs a few quick fixes, but it's 95% there. Making them now, then approving & merging.

silent notch Aug 28, 2024, 3:10 PM

#

fiery bison would love to get a quick approval on https://github.com/dagger/dagger/pull/8251...

Does this include v3, or does it mean Yarn v4 and above?

fiery bison Aug 28, 2024, 3:13 PM

#

silent notch Does this include `v3`, or does it mean Yarn v4 and above?

yarn v3 and above

#

so yep, it includes v3

little veldt Aug 28, 2024, 3:28 PM

#

i'm about half an hour from my eod, so don't have time to cut the release - if anyone else is around and wants to, they can, otherwise i'll pick it up first thing tomorrow

silent notch Aug 28, 2024, 3:44 PM

#

little veldt i'm about half an hour from my eod, so don't have time to cut the release - if a...

Thursday is a good day to release 😉

silent notch Aug 28, 2024, 3:48 PM

#

fiery bison would love to get a quick approval on https://github.com/dagger/dagger/pull/8251...

Where is this property configured?

#

Same question for

fiery bison Aug 28, 2024, 4:04 PM

#

silent notch Where is this property configured?

I explained it just above in the doc page, (it will generate a xx file)

silent notch Aug 28, 2024, 4:04 PM

#

fiery bison I explained it just above in the doc page, (it will generate a xx file)

Good to merge from my side.

fiery bison Aug 28, 2024, 4:05 PM

#

Thanks!

silent notch Aug 28, 2024, 4:05 PM

#

eager river just saw this thread and excited <:pepe_hands:717706059501928549> . Are we plann...

Unlikely to happen today, most likely tomorrow.

silent notch Aug 28, 2024, 4:05 PM

#

little veldt somehow managed to mess up the linting about a couple weeks back: https://github...

Looking at this now.

fiery bison Aug 28, 2024, 5:42 PM

#

https://github.com/dagger/dagger/pull/8236 CI is green, waiting for an approval to be merged

GitHub

feat: optimize TS SDK runtime by TomChv · Pull Request #8236 · dagg...

Reorder steps to not list all entries when configuring modules. Move install dependencies step & corepack init before mounting sources.
Add template after initializing dependencies.
It seem...

#

/cc @fluid yew

#

Btw where is your benchmark workflow? Or is there a way for me to try it?

fluid yew Aug 28, 2024, 5:48 PM

#

fiery bison /cc <@707661669819613324>

reviewed

fluid yew Aug 28, 2024, 5:48 PM

#

fiery bison Btw where is your benchmark workflow? Or is there a way for me to try it?

it runs off of main every night (or manually), not out of PRs

#

it's not doing anything fancy, just an init and then 3 dagger functions in a row:

the first one to try performance out of the box after dagger init
the second one to try caching
the third one after changing main.ts to check the performance after a code change

#

you can do the same locally and just look at a trace

prime spruce Aug 28, 2024, 7:01 PM

#

fiery bison I don't know if doc update should be part of the release but here's an update of...

Just saw this was merged, sorry that I didn't have a chance to review it until just now. I had some questions which I added in the PR. Let's discuss further in https://discord.com/channels/707636530424053791/1278425034586587300

fiery bison Aug 28, 2024, 9:02 PM

#

prime spruce Just saw this was merged, sorry that I didnt have a chance to review it until ju...

Okay

little veldt Aug 29, 2024, 11:27 AM

#

hey @fiery bison i'm going to push out https://github.com/dagger/dagger/pull/8236 into the next release

#

andrea is on holiday, and i'm not fully aware of the context of the pr to approve today - since it's not fixing a regression in the last release, i don't think there's a rush and we can wait till next week?

fiery bison Aug 29, 2024, 11:41 AM

#

Yes sure! We can keep it for the next week 😄

#

There's already plenty of changes for this release

little veldt Aug 29, 2024, 11:54 AM

#

prep pr (release notes, sdk updates, etc): https://github.com/dagger/dagger/pull/8268

#

could also get this super minor little typo fix in https://github.com/dagger/dagger/pull/8267 (mostly ci flake debugging related)

#

going for lunch now, once these are merged, i can tag and release

little veldt Aug 29, 2024, 1:14 PM

#

hm the wolfi publishing seems to have broken

#

https://github.com/dagger/dagger/actions/runs/10615424278/job/29423476460#step:8:441

#

re-running, maybe it's an infra fluke?

#

hm, nope, not a fluke

#

we're getting io errors?

tough vessel Aug 29, 2024, 1:47 PM

#

that's an odd one. maybe an infra thing? bad disk? thinkies cc @silent notch

#

(38/38) Installing go-1.23 (1.23.0-r0)
2 errors; 703 MiB in 50 packages
Stderr:
ERROR: Failed to create usr/lib/libisl.so.23.3.0: Input/output error
ERROR: isl-0.26-r4: IO ERROR
ERROR: Failed to create usr/bin/ld.gold: Input/output error
ERROR: binutils-gold-2.43.1-r0: IO ERROR

little veldt Aug 29, 2024, 1:47 PM

#

yeah, i can't repro this locally

#

wolfi container builds and runs fine

#

hm, doesn't look like we're running out of space

/dev/nvme1n1    1.8T   40G  1.7T   3% /host/var/lib/dagger

from df -h for that node

#

hm, okay, the job just passed 🤔

#

maybe the bad node is gone 👀

#

okay, i'm gonna tag main now

#

hmmm

#

no it happened again, on an entirely different node

#

node uptime is 6 minutes

#

suspiciously, this seems to only happen on the wolfi publish?

silent notch Aug 29, 2024, 2:14 PM

#

tough vessel ``` (38/38) Installing go-1.23 (1.23.0-r0) 2 errors; 703 MiB in 50 packages Stde...

That is rare, but possible. Two different nodes, highly unlikely. Which nodes are these?

little veldt Aug 29, 2024, 2:14 PM

#

perhaps there's some new update? unfortunately, does wolfi have patch notes?

#

there's a currently running job on ip-192-168-109-156.us-east-2.compute.internal

#

https://github.com/dagger/dagger/actions/runs/10616587869/job/29427944487

GitHub

chore: prep for v0.12.6 release (#8268) · dagger/dagger@5fd87a2

An engine to run your pipelines in containers. Contribute to dagger/dagger development by creating an account on GitHub.

#

job is dagger-g2-v0-12-5-16c-od-wxc9f-runner-l8hss

silent notch Aug 29, 2024, 2:16 PM

#

looking now

little veldt Aug 29, 2024, 2:17 PM

#

looking at the wolfi/os history, i see no indication of what might have changed in the last few hours 🤔

#

we have a few half done engine builds in https://github.com/dagger/dagger/pkgs/container/engine
does it make sense to delete those and delete the tag while we investigate?

silent notch Aug 29, 2024, 2:21 PM

#

yes

little veldt Aug 29, 2024, 2:21 PM

#

i don't have permissions to delete packages i don't think

silent notch Aug 29, 2024, 2:21 PM

#

yeah, that is an odd one. I don't think that Wolfi is pinned, but it should be. this may help us: https://github.com/dagger/dagger/pull/7782/files#diff-71da6ba00bc676920605e36f15ded23d1da35d117eff8c671c7dc0aa62e7b539R33-R34

#

I can see that we had issues publishing here too: https://github.com/dagger/dagger/actions/runs/10615397944/attempts/2 , and then it eventually worked.

silent notch Aug 29, 2024, 2:23 PM

#

little veldt i don't have permissions to delete packages i don't think

on it

little veldt Aug 29, 2024, 2:23 PM

#

i'll do the tag

little veldt Aug 29, 2024, 2:24 PM

#

silent notch on it

should be doable from this page i think: https://github.com/dagger/dagger/pkgs/container/engine/versions?filters[version_type]=tagged

silent notch Aug 29, 2024, 2:24 PM

#

wolfi published earlier, just not the gpu variant:

little veldt Aug 29, 2024, 2:24 PM

#

mm, but the cli didn't run at all, since it's dependent on everything succeeding

silent notch Aug 29, 2024, 2:25 PM

#

done, all those versions are now gone

little veldt Aug 29, 2024, 2:25 PM

#

awesome 🎉

#

right okay, we could try pinning wolfi, but what do we pin it to?

#

ideally we want a hash from something like yesterday

silent notch Aug 29, 2024, 2:27 PM

#

We keep all COMMIT-wolfi & COMMIT-wolfi-gpu images, so I would pick one of those.

#

checking now.

little veldt Aug 29, 2024, 2:29 PM

#

okay, so the history of c5687d86a6ba78ec2fffcb46be5caaa73f561b54-wolfi pulls in cgr.dev/chainguard/wolfi-base:latest@sha256:72c8bfed3266b2780243b144dc5151150015baf5a739edbbde53d154574f1607

silent notch Aug 29, 2024, 2:30 PM

#

dive registry.dagger.io/engine:ff17731b8ca5e2a86850fef100cb88cf8d239955-wolfi for https://github.com/dagger/dagger/commit/ff17731b8ca5e2a86850fef100cb88cf8d239955 is showing cgr.dev/chainguard/wolfi-base:latest@sha256:72c8bfed3266b2780243b144dc5151150015baf5a739edbbde53d154574f1607

little veldt Aug 29, 2024, 2:30 PM

#

nice 😄 jinx

silent notch Aug 29, 2024, 2:31 PM

#

Another thing that we should do is run this locally and confirm that latest wolfi does indeed fail

#

doing that now

little veldt Aug 29, 2024, 2:32 PM

#

dagger -m . call --source=.:default engine with-base --image=wolfi container terminal seems to work for me 😢

little veldt Aug 29, 2024, 2:33 PM

#

silent notch `dive registry.dagger.io/engine:ff17731b8ca5e2a86850fef100cb88cf8d239955-wolfi` ...

hm, this is currently what :latest points to - will look back a bit further

#

hm, yeah, this is the same version that appeared to be fine for the last commit merged yesterday

#

> docker buildx imagetools inspect --raw ghcr.io/dagger/engine:ff17731b8ca5e2a86850fef100cb88cf8d239955-wolfi@sha256:03731c12adeff0682a5a3a4ffb8f74624f1d272bc08baa3b943c84e1011275e2 | jq '.history[0]'

{
  "created": "2024-08-28T16:57:36.26278684Z",
  "created_by": "pulled from cgr.dev/chainguard/wolfi-base:latest@sha256:72c8bfed3266b2780243b144dc5151150015baf5a739edbbde53d154574f1607",
  "comment": "buildkit.exporter.image.v0"
}
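
To pick a digest to pin to, the `created_by` field in a history entry like the one above already carries the full reference. A minimal sketch of extracting it, assuming the entry always has the `pulled from <ref>` shape shown here (`pinnedBase` and `historyEntry` are hypothetical names, not part of any Dagger or Docker tooling):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// historyEntry mirrors the three fields visible in the jq output.
type historyEntry struct {
	Created   string `json:"created"`
	CreatedBy string `json:"created_by"`
	Comment   string `json:"comment"`
}

// pinnedBase is a hypothetical helper: it returns the image reference from a
// "pulled from <ref>" history entry, but only if that reference has a digest.
func pinnedBase(raw []byte) (string, error) {
	var h historyEntry
	if err := json.Unmarshal(raw, &h); err != nil {
		return "", fmt.Errorf("decode history entry: %w", err)
	}
	ref, ok := strings.CutPrefix(h.CreatedBy, "pulled from ")
	if !ok {
		return "", fmt.Errorf("unexpected created_by: %q", h.CreatedBy)
	}
	if !strings.Contains(ref, "@sha256:") {
		return "", fmt.Errorf("reference %q is not pinned to a digest", ref)
	}
	return ref, nil
}

func main() {
	// The entry printed in the thread, reproduced as input.
	entry := `{
	  "created": "2024-08-28T16:57:36.26278684Z",
	  "created_by": "pulled from cgr.dev/chainguard/wolfi-base:latest@sha256:72c8bfed3266b2780243b144dc5151150015baf5a739edbbde53d154574f1607",
	  "comment": "buildkit.exporter.image.v0"
	}`
	ref, err := pinnedBase([]byte(entry))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ref)
}
```

Comparing this value across a known-good and a failing build is a quick way to tell whether the base image actually changed between them.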

#

some googling indicates that this could potentially also be a networking error

silent notch Aug 29, 2024, 2:43 PM

#

OK, so these could be both disk or network issues. While one bad disk is possible, these failed across 3 different machines, each using the local disk, which makes it very unlikely. I suspect network issues, which are usually transient.

#

yes, that is my assumption too.

little veldt Aug 29, 2024, 2:43 PM

#

potentially an issue in the wolfi registry? https://status.cgr.dev/

#

no issue reported yet though

silent notch Aug 29, 2024, 2:44 PM

#

I checked, nothing there.

#

Small blips usually go unnoticed. All systems should assume 99.9% reliability, which I suspect is the case here.
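
If the failures really are short transient blips, the usual mitigation is to retry with a growing delay. The sketch below is a generic illustration only: `retry` and `errTransient` are hypothetical names, and nothing in this thread says the publish job does or should work this way.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a flaky failure such as the apk IO error.
var errTransient = errors.New("Input/output error")

// retry runs op up to attempts times, doubling the delay after each failure.
// It returns nil on the first success, or the last error wrapped with context.
func retry(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // simulated blip on the first two calls
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

A retry only hides a blip that clears within the retry window; it does nothing for a fault that persists for hours.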

#

Want to try again?

little veldt Aug 29, 2024, 2:44 PM

#

i can tag again, yup

silent notch Aug 29, 2024, 2:45 PM

#

Running this locally too.

little veldt Aug 29, 2024, 2:45 PM

#

publishing here: https://github.com/dagger/dagger/actions/runs/10617352810/job/29429955970

silent notch Aug 29, 2024, 2:48 PM

#

watching the node via dmesg too

little veldt Aug 29, 2024, 2:49 PM

#

imo, we can take this opportunity to update this process and daggerize more - ideally we should build all the images, and then push all the images
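
The "build all the images, and then push all the images" ordering can be sketched as two separate loops, so that one failed build leaves nothing half-published. This is an illustration under assumptions: `publishAll`, `builder`, and `pusher` are hypothetical names, the variant list is illustrative, and the real pipeline is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errIO stands in for the build failure seen in the publish job.
var errIO = errors.New("Input/output error")

// builder and pusher are hypothetical stand-ins for the real pipeline steps.
type builder func(variant string) (artifact string, err error)
type pusher func(artifact string) error

// publishAll builds every variant first and pushes only if all builds succeed.
func publishAll(variants []string, build builder, push pusher) error {
	artifacts := make([]string, 0, len(variants))
	for _, v := range variants {
		a, err := build(v)
		if err != nil {
			return fmt.Errorf("build %s: %w (nothing pushed)", v, err)
		}
		artifacts = append(artifacts, a)
	}
	for _, a := range artifacts {
		if err := push(a); err != nil {
			return fmt.Errorf("push %s: %w", a, err)
		}
	}
	return nil
}

func main() {
	var pushed []string
	build := func(v string) (string, error) {
		if v == "wolfi-gpu" {
			return "", errIO // simulate one failing variant
		}
		return "engine:" + v, nil
	}
	push := func(a string) error {
		pushed = append(pushed, a)
		return nil
	}
	err := publishAll([]string{"alpine", "wolfi", "wolfi-gpu"}, build, push)
	fmt.Println(err)         // build wolfi-gpu: Input/output error (nothing pushed)
	fmt.Println(len(pushed)) // 0
}
```

A failure during the push loop can still leave a partial publish, so this ordering narrows the window rather than closing it.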

#

i'll queue that work for tomorrow

#

looks like it's happened again 🤔

silent notch Aug 29, 2024, 2:52 PM

#

Indeed. This dmesg output makes me suspect that the networking on the AWS EC2 instances themselves is dropping:

#

more context:

[   22.945923] IPv6: ADDRCONF(NETDEV_CHANGE): enia46631b86c3: link becomes ready
[   22.948513] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   22.991230] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   23.043544] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   24.211421] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   26.797440] pci 0000:00:06.0: [1d0f:ec20] type 00 class 0x020000
[   26.799595] pci 0000:00:06.0: reg 0x10: [mem 0x00000000-0x00001fff]
[   26.801762] pci 0000:00:06.0: reg 0x14: [mem 0x00000000-0x00001fff]
[   26.803937] pci 0000:00:06.0: reg 0x18: [mem 0x00000000-0x000fffff pref]
[   26.806354] pci 0000:00:06.0: enabling Extended Tags
[   26.808626] pci 0000:00:06.0: BAR 2: assigned [mem 0xc0000000-0xc00fffff pref]
[   26.811112] pci 0000:00:06.0: BAR 0: assigned [mem 0xc0100000-0xc0101fff]
[   26.813418] pci 0000:00:06.0: BAR 1: assigned [mem 0xc0102000-0xc0103fff]
[   26.815775] ena 0000:00:06.0: enabling device (0000 -> 0002)
[   26.826648] ena 0000:00:06.0: ENA device version: 0.10
[   26.828435] ena 0000:00:06.0: ENA controller version: 0.0.1 implementation version 1
[   26.932621] ena 0000:00:06.0: Forcing large headers and decreasing maximum TX queue size to 512
[   26.938067] ena 0000:00:06.0: ENA Large LLQ is enabled
[   26.953325] ena 0000:00:06.0: Elastic Network Adapter (ENA) found at mem c0100000, mac addr 02:95:49:22:53:e7
[   27.225434] ena 0000:00:06.0 eth1: Local page cache is disabled for less than 16 channels
[   37.200361] xfs filesystem being remounted at /var/lib/kubelet/pods/0604b692-52ad-4c9b-8682-4dda5f4ef5d0/volume-subpaths/dagger-engine-config/dagger-engine/2 supports timestamps until 2038 (0x7fffffff)
[   37.311880] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  192.113385] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  327.349581] IPv6: ADDRCONF(NETDEV_CHANGE): eniee287a2af97: link becomes ready
[  327.353455] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

#

and the node just went away

little veldt Aug 29, 2024, 2:54 PM

#

hm, if we still suspect something ephemeral, is it worth pausing the process until tomorrow/next week?

silent notch Aug 29, 2024, 2:54 PM

#

eth0 should not change state
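
One way to spot this pattern without reading the whole log is to count "link becomes ready" events that occur well after boot. The sketch below assumes the dmesg line format shown above; `lateLinkEvents` is a hypothetical helper and the 60-second cutoff is an arbitrary choice.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// linkReady matches kernel lines such as:
// [  192.113385] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
var linkReady = regexp.MustCompile(`^\[\s*([0-9.]+)\] IPv6: ADDRCONF\(NETDEV_CHANGE\): (\S+): link becomes ready$`)

// lateLinkEvents counts, per interface, the "link becomes ready" events whose
// timestamp is greater than afterSeconds of uptime.
func lateLinkEvents(dmesg string, afterSeconds float64) map[string]int {
	counts := map[string]int{}
	for _, line := range strings.Split(dmesg, "\n") {
		m := linkReady.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue // not a link-state line
		}
		ts, err := strconv.ParseFloat(m[1], 64)
		if err != nil || ts <= afterSeconds {
			continue
		}
		counts[m[2]]++
	}
	return counts
}

func main() {
	// A few lines taken from the output pasted in the thread.
	sample := `[   24.211421] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   26.815775] ena 0000:00:06.0: enabling device (0000 -> 0002)
[   37.311880] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  192.113385] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  327.353455] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready`

	counts := lateLinkEvents(sample, 60)
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random, so sort for stable output
	for _, name := range names {
		fmt.Printf("%s: %d late link event(s)\n", name, counts[name])
	}
}
```

This only surfaces the events; whether a late link event indicates a real fault still needs to be checked against what else was happening on the node.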

#

yes, that sounds reasonable. we should also be able to switch CI runners in cases like this one.

little veldt Aug 29, 2024, 2:57 PM

#

i'll delete the tag? can you delete the engine packages?

silent notch Aug 29, 2024, 2:57 PM

#

yes

#

done

little veldt Aug 29, 2024, 3:00 PM

#

tag done too

tough vessel Aug 29, 2024, 3:09 PM

#

oldmanyellsataws

little veldt Aug 29, 2024, 3:10 PM

#

little veldt imo, we can take this opportunity to update this process and daggerize more - id...

working on this now 😛

silent notch Aug 29, 2024, 3:43 PM

#

we are re-running the job which failed and watching the instance. we couldn't find any networking issues. no disk issues either. things are pointing to overlayfs. maybe a new kernel?

#

jumping in team so that others can join us if interested.

little veldt Aug 29, 2024, 4:06 PM

#

little veldt working on this now 😛

https://github.com/dagger/dagger/pull/8269

#

also our test publish github job should now attempt to build all the variants as well, so we should hopefully catch the weird EOF issue in PRs now as well if it keeps happening

coral pilot Aug 29, 2024, 4:38 PM

#

That bpfcc package installed in the sysadmin pod now comes with a ton of tracers besides just mountsnoop, can see them with ls /sbin/*-bpfcc or in the README here: https://github.com/iovisor/bcc

Suspect if we can get the right one running while that IO error is hit we might find something useful. Some that stick out as potentially related (depending on whether the IO error is coming from a filesystem syscall or network syscall): https://github.com/iovisor/bcc/blob/master/tools/opensnoop.py and https://github.com/iovisor/bcc/blob/master/tools/tcplife.py

Might be tricky with the timing but getting a random IO error while installing packages with apk is otherwise kind of a hopeless situation, so just throwing out as a possibility

silent notch Aug 29, 2024, 5:34 PM

#

coral pilot That bpfcc package installed in the sysadmin pod now comes with a ton of tracers...

I would really like to try that out. Care to show us how to use it? @scarlet nacelle is still looking into this, I am unlikely to have more time today, but would really like to circle back to this tomorrow if you're up for it. Finding 30 mins in your calendar so that we don't forget.

coral pilot Aug 29, 2024, 5:37 PM

#

Yeah happy to meet, have to go to a dr appt w/ Alice at 9:45 Pacific, so maybe 9? It's also pretty much just a matter of opening a shell and running opensnoop-bpfcc or tcplife-bpfcc and watching the output, so not a ton to go over

silent notch Aug 29, 2024, 5:39 PM

#

It's the unexpected things that we find out which I am most interested in. I am sure that we will find things to improve in the 30 mins that we get together.

coral pilot Aug 29, 2024, 8:08 PM

#

While trying to repro flakes I think I may have hit a similar-ish looking situation: https://github.com/dagger/dagger/actions/runs/10621505762/job/29443632581?pr=8203#step:3:1244

In the testdev workflow, it took like ~20m to build the dev engine (seemed to take a long time to install apk packages) and then when it was trying to get the server version got internal HTTP2 stream errors while downloading go deps (for modules)

#

So I would err on the side of this being some deep networking problem rather than disk, if this is indeed the same problem manifesting elsewhere

#

I didn't catch it while it was happening so didn't get to run any of the trace tools, hopefully I can catch it on re-runs

#

(If there's more updates I'll make a separate thread so we don't overly pollute the release thread)

scarlet nacelle Aug 29, 2024, 8:32 PM

#

One of the questions we had was whether this could be related to some of the commits we added lately. We think not, but we did the following to validate: start a pod with the same wolfi image variant (cgr.dev/chainguard/wolfi-base:latest) and run the command that is giving us the issue: apk add --no-cache git openssh pigz xz iptables ip6tables. Doing that shows the exact same failure we see on our pipelines. Doing the same thing but using a regular alpine image works every time. First screenshot is alpine, second screenshot is wolfi. Could there be some issue related to https://packages.wolfi.dev/os/x86_64/APKINDEX.tar.gz? I'm not sure. I'll run the tools suggested by @coral pilot while reproing the issue and see if something interesting pops up

coral pilot Aug 29, 2024, 8:36 PM

#

scarlet nacelle One of the questions we had was whether this could be related to some of the com...

see my messages right above, I'm seeing extreme slowness installing apk packages from alpine (not wolfi) and then weird internal networking errors from go later. If it's the same problem, then it doesn't seem specific to wolfi

#

Possible it's unrelated

scarlet nacelle Aug 29, 2024, 8:37 PM

#

I'm not able to repro the slowness in alpine, at least when executing it outside of the dagger engine

#

Both screenshots above are containers running directly on the host

#

Were you able to use the bcc tools in a sysadmin pod? Getting failures at the moment

coral pilot Aug 29, 2024, 8:39 PM

#

scarlet nacelle Were you able to use the bcc tools in a sysadmin pod? Getting failures at the mo...

Yeah I just hit that too, it worked yesterday when I manually installed the package 🤷‍♂️ I figured out the right symlinks to get the headers in place, let me grab the commands from the shell history, one sec

scarlet nacelle Aug 29, 2024, 8:39 PM

#

In case it's useful: right now we have a host that we are keeping around for this investigation. It won't go away until we want it to. The host is 192-168-187-242. If you want to add containers on that host, the easiest way is to do something like this: kubectl debug -n dagger-runners -it dagger-od-engines-v0-12-5-engine-lkg6j --image=cgr.dev/chainguard/wolfi-base:latest --target=dagger-engine

coral pilot Aug 29, 2024, 8:40 PM

#

mkdir /lib/modules
ln -s /host/lib/modules/5.10.223-211.872.amzn2.x86_64/ /lib/modules/5.10.223-211.872.amzn2.x86_64
ln -s /host/usr/src/kernels /usr/src/kernels

that should put the headers in the right place in the container and get those commands to work ^

coral pilot Aug 29, 2024, 8:42 PM

#

scarlet nacelle I'm not able to repro the slowness in alpine, at least when executing it outside...

To repro what I'm seeing right now, I'm just pushing empty commits to a PR: https://github.com/dagger/dagger/pull/8203

Last two pushes:

First push - Extreme slowness in installing alpine apks in one of the testdev workflows, followed by internal http2 stream errors https://github.com/dagger/dagger/actions/runs/10621505762/job/29443632581?pr=8203#step:3:1244
Second push - Extreme slowness in installing alpine apks in testdev, but no failures ultimately https://github.com/dagger/dagger/actions/runs/10621995772/job/29445265710?pr=8203

#

It's inconsistent though, other testdev workflows are fine

scarlet nacelle Aug 29, 2024, 8:43 PM

#

👍. I'm 100% getting the slowness on wolfi now

#

Yeah, it's working okay now

#

Trying to make it fail now with opensnoop on the side and it works every time 😆

coral pilot Aug 29, 2024, 8:56 PM

#

This potentially related issue has some good suggestions on root causes (which could all be extremely ephemeral issues, and later comments suggest the problem just went away): https://github.com/moby/buildkit/issues/746#issuecomment-447311499

GitHub

IO ERROR after enabling BuildKit · Issue #746 · moby/buildkit

We enabled buildkit in our project by adding DOCKER_BUILDKIT=1 to our builds in docker-ce, and for one build we consistently get this error with it enabled (and no errors when buildkit is disabled)...

#

None of those suggestions are actually that specific to docker afaict either, they could all happen outside docker

#

The path MTU thing mentioned in particular is something that could result in both slowness and/or bizarre errors

coral pilot Aug 29, 2024, 9:05 PM

#

scarlet nacelle Trying to make it fail now with opensnoop on the side and it works every time 😆

if you happen to get it to happen again, other things to try:

apt install -y traceroute; traceroute packages.wolfi.dev - could be helpful if this is some weird network path thing. may need to run a few times since you won't always get the same path
since you can repro sometimes with a one-off command, plain old strace might be easier than the bpf tools (those tools are mainly helpful when you don't even know what process is gonna break and just need to trace everything): strace -f --seccomp-bpf <command>

coral pilot Aug 29, 2024, 9:29 PM

#

Still getting super random network errors every time I push to that PR, just saw a brand new one: https://github.com/dagger/dagger/actions/runs/10622859102/job/29448066758?pr=8203#step:3:57

It's a "fun" game because I need to know the node before the gh job dies otherwise I won't know which eks node to pop a shell on

coral pilot Aug 29, 2024, 10:13 PM

#

They seem to have dissipated now... I also just tried to re-run the publish job on main and wolfi built successfully: https://github.com/dagger/dagger/actions/runs/10619063519/job/29448999619

I really wouldn't be surprised if this was just a very ill-timed networking blip either with AWS or some other intermediate network that packets get commonly routed through, especially since it seemed to intermittently affect multiple endpoints besides wolfi. Path MTU issues in particular triggered memories of similar problems when I was actually working on this stuff at AWS, you can just randomly lose packets in a black hole if anywhere in the route has a misconfiguration 😵‍💫

coral pilot Aug 30, 2024, 12:11 AM

#

Separate from above, Helm CI fails everywhere right now because it's looking for a v0.12.6 engine? e.g. https://github.com/dagger/dagger/actions/runs/10624216581/job/29452098917

Side effect of starting the release today but then hitting those errors and stopping?

little veldt Aug 30, 2024, 7:30 AM

#

coral pilot Separate from above, Helm CI fails everywhere right now because it's looking for...

Yeah woops 😭 I'm gonna follow up after the release to try and get this to install the :main engine instead, like the rest of the provisioning tests 🎉

little veldt Aug 30, 2024, 9:59 AM

#

hopefully the weird wolfi errors are gone now 🤞

#

i've merged the pr to build all the variants before pushing as well, so hopefully even if it is happening, we won't have half-published packages

#

so i'm gonna go ahead and tag 🎉

#

yes #releases message

#

happy now 😄

little veldt Aug 30, 2024, 10:41 AM

#

some slight issues in the sdk automated release notes - python accidentally had the elixir ones, php tried to publish them to the wrong repo (both easy to fix, just took a bit of manual intervention, will fix for next time)

#

engine + sdks successful, published docs as well now

#

cc @daring pier @scarlet nacelle @silent notch dagger playground can now be updated 🎉

#

cc @daring pier @scarlet nacelle @leaden hollow likewise with the daggerverse

#

dagger-for-github pr here: https://github.com/dagger/dagger-for-github/pull/144 (cc @silent notch @vagrant loom)
