@Win thanks for looking at the elixir | Dagger

raw dirge Sep 3, 2024, 5:40 PM

#

Here's the relevant diff:

#

📎 message.txt

#

(I only included the 3 relevant files in the diff)

coral goblet Sep 3, 2024, 5:47 PM

#

This is my guess: we use the same cache key across 3 SDKs in the integration tests, and those tests mount the cache in SHARED mode, which is the default. Your PR makes everything run simultaneously, so the tests fail because of how the shared cache behaves.

#

So, I don't think the YAML configs on that PR do anything wrong.

raw dirge Sep 3, 2024, 6:06 PM

#

Ah I see, I kept the same concurrency grouping logic, but it uses the workflow ID in the group

#

mmm, but the grouping key also includes the job ID no?

coral goblet Sep 3, 2024, 6:25 PM

#

It seems that it uses the workflow name (github.workflow) in concurrency.group, so no two workflows use the same key.

raw dirge Sep 3, 2024, 6:27 PM

#

OK that's the issue then. I will change to workflow + job

raw dirge Sep 3, 2024, 9:45 PM

#

@coral goblet I'm working on implementation now FYI

#

I thought I understood the issue, but actually I'm not so sure

#

Before: one big workflow called SDK, with 41 jobs (14 SDKs x 3 jobs - 1 exception). Concurrency group is SDK-$REF
After: 41 workflows with 1 job each. Each workflow has its own concurrency group.
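
To make the before/after arithmetic concrete, here is a rough sketch in Python. It is a toy model, not the real workflow generator: the SDK names, the ref, the per-workflow naming scheme, and the choice of which job is the "1 exception" are all placeholders, and the group format `<workflow name>-<ref>` is assumed from the `SDK-$REF` description above.

```python
# Toy model of the job counts and concurrency groups described above.
# All names below are placeholders, not the real SDK or workflow names.
SDK_COUNT = 14
JOBS_PER_SDK = ("lint", "test", "test-publish")
REF = "refs/heads/example"  # hypothetical ref


def concurrency_group(workflow: str, ref: str) -> str:
    """Mimic a concurrency group of the form <workflow name>-<ref>."""
    return f"{workflow}-{ref}"


# 14 SDKs x 3 jobs, minus one job to model the single exception.
jobs = [(f"sdk{i:02d}", job) for i in range(1, SDK_COUNT + 1) for job in JOBS_PER_SDK]
jobs.pop()  # the real exception is not stated; dropping the last job is arbitrary

# Before: every job lives in one workflow named "SDK", so there is one group.
before = {concurrency_group("SDK", REF) for _ in jobs}

# After: one workflow per job (assumed naming: "<sdk>-<job>"), so one group each.
after = {concurrency_group(f"{sdk}-{job}", REF) for sdk, job in jobs}

print(len(jobs), len(before), len(after))
```

In this model the old layout yields a single group shared by all 41 jobs, while the new layout yields 41 distinct groups, so no two workflows would ever cancel or queue behind each other.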

#

I suspect that the elixir SDK relies on lint, test, test-publish being called in sequence, with the same cache volume being reused. But in my PR they run in parallel, on 3 different machines
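
A minimal sketch of the suspected failure mode, in Python. This is purely illustrative: the idea that each step needs something the previous step left in the cache is an assumption made up for the model, not a confirmed property of the elixir SDK, and a plain dict stands in for a cache volume.

```python
# Toy model: each step is assumed (hypothetically) to need the previous
# step's output in the cache. A dict stands in for a cache volume.
STEPS = ["lint", "test", "test-publish"]
NEEDS = {"lint": None, "test": "lint", "test-publish": "test"}


def run_step(step: str, cache: dict) -> bool:
    """Return True if the step's assumed prerequisite is in the cache."""
    prerequisite = NEEDS[step]
    if prerequisite is not None and prerequisite not in cache:
        return False  # the step starts from a cache that lacks what it expects
    cache[step] = "done"
    return True


# Serial, one machine: a single cache volume is reused from step to step.
shared_cache = {}
serial = [run_step(step, shared_cache) for step in STEPS]

# Parallel, three machines: each step starts from its own empty cache.
parallel = [run_step(step, {}) for step in STEPS]

print(serial)    # [True, True, True]
print(parallel)  # [True, False, False]
```

If the real steps had a dependency like this, running them on separate machines would break everything after lint, which matches the suspicion above but does not prove it.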

#

So I don't think the concurrency group is the issue

raw dirge Sep 4, 2024, 12:18 AM

#

Update: I implemented this fix, and just pushed it. Let's see if it works! As a bonus, it simplifies both the code and the generated yaml. Also brings our job count from 41 to 30 🙂

coral goblet Sep 4, 2024, 1:10 AM

#

This error is new to me

https://github.com/dagger/dagger/actions/runs/10692693511/job/29641584369?pr=8241

It happens while starting the engine: the start fails, and the rest of the workflow gets cancelled.

raw dirge Sep 4, 2024, 1:48 AM

#

Yeah I think I broke something 😭

#

Doesn't seem specific to elixir though

coral goblet Sep 4, 2024, 3:12 AM

#

It passed on the next retry. 😭

raw dirge Sep 4, 2024, 7:21 AM

#

OK it looks like my simplification fixed it 🙂

#

The problem was indeed that the elixir SDK needs lint->test->publish-test to be done serially, and not in parallel

coral goblet Sep 4, 2024, 7:29 AM

#

Could you make it run in parallel? I think it should work because the steps aren't related. If it doesn't, then I should investigate and eliminate the cause. ✌️

raw dirge Sep 4, 2024, 7:35 AM

#

But if everything still works when I run them in parallel, then I really don't understand why it was failing in the first place 😭

coral goblet Sep 4, 2024, 7:48 AM

#

Is it possible that the 3 SDKs (Go, Python, Elixir) conflict with each other because of the same cache key? If you see my screenshot above, those SDKs use the same cache key and mount the cache in SHARED mode.

split acorn Sep 4, 2024, 8:57 AM

#

coral goblet This error is new to me https://github.com/dagger/dagger/actions/runs/10692693...

mmmmmm i have seen this one somewhere before, something internal going quite wrong there

#

i'll split this to a separate issue

#

https://github.com/dagger/dagger/issues/8330

GitHub

Query failed with `evaluating released result` · Issue #8330 · dagg...

Seen last in https://github.com/dagger/dagger/actions/runs/10692693511/job/29641584369?pr=8241 (but I also do think I've seen this flake out before): Stdout: marshal: json: error calling Marsha...

coral goblet Sep 4, 2024, 10:28 AM

#

Thank you.

raw dirge Sep 4, 2024, 11:17 AM

#

coral goblet Is it possible that 3 SDKs (Go, Python, Elixir) conflict each others because of ...

I don't think so because they are running on different machines (1 job = 1 machine)

coral goblet Sep 4, 2024, 11:49 AM

#

So that is weird then. I didn't use this cache key anywhere other than this test. 🥲
