@Win thanks for looking at the elixir


raw dirge
Here's the relevant diff:

(I only included the 3 relevant files in the diff)

coral goblet
Here's my guess: we use the same cache key across 3 SDKs in the integration tests, and those tests mount the cache in SHARED mode, which is the default. Your PR makes everything run simultaneously, so the SDKs hit the shared cache at the same time and the tests fail because of that cache behaviour.
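A toy model of this guess, in Python. Everything in it is hypothetical: the SDK list (Go, Python, Elixir) comes from later in the thread, but the key names and the `state` file are made up, and a local temp directory stands in for a cache that several jobs can use at once without locking. The assumption is that all three SDKs write before any of them reads back, approximating three test jobs running simultaneously.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict

SDKS = ["go", "python", "elixir"]


def simulate(cache_key_for: Callable[[str], str]) -> Dict[str, bool]:
    """Return, per SDK, whether it read back the state it wrote.

    Assumption: every SDK writes to its cache before any SDK reads,
    which approximates concurrent jobs sharing a cache with no locking.
    """
    with tempfile.TemporaryDirectory() as root:

        def cache_dir(sdk: str) -> Path:
            # The cache key decides which directory (volume) an SDK gets.
            d = Path(root) / cache_key_for(sdk)
            d.mkdir(exist_ok=True)
            return d

        # Phase 1: every SDK writes its own marker into "its" cache.
        for sdk in SDKS:
            (cache_dir(sdk) / "state").write_text(sdk)

        # Phase 2: every SDK checks that the marker is still its own.
        return {sdk: (cache_dir(sdk) / "state").read_text() == sdk for sdk in SDKS}


# Hypothetical key schemes: one key shared by all SDKs vs. one key per SDK.
shared = simulate(lambda sdk: "integration-tests")
per_sdk = simulate(lambda sdk: f"integration-tests-{sdk}")
print(shared)   # {'go': False, 'python': False, 'elixir': True}
print(per_sdk)  # {'go': True, 'python': True, 'elixir': True}
```

With the shared key only the last writer finds its own state; with per-SDK keys all three do. This only illustrates the hypothesis, it doesn't show that the real tests fail this way.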

So I don't think the YAML configs in that PR are doing anything wrong.

raw dirge
Ah, I see. I kept the same concurrency grouping logic, but it uses the workflow ID in the group.

Hmm, but the grouping key also includes the job ID, no?

coral goblet
It seems to use only the workflow name (github.workflow) in the concurrency.group, so no two workflows share the same key.

raw dirge
OK, that's the issue then. I'll change it to workflow + job.

raw dirge
@coral goblet FYI, I'm working on the implementation now

I thought I understood the issue, but actually I'm not so sure

  • Before: one big workflow called SDK, with 41 jobs (14 SDKs x 3 jobs - 1 exception). Concurrency group is SDK-$REF
  • After: 41 workflows with 1 job each. Each workflow has its own concurrency group.
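One way to see the difference between the two layouts is to count the distinct concurrency.group keys each produces. In GitHub Actions, runs that share a concurrency group don't run at the same time, so fewer distinct keys means more serialization. This Python sketch is hypothetical: the SDK names, the ref, and the per-workflow naming scheme are placeholders, and the "1 exception" is modeled by dropping one arbitrary SDK/job pair, since the thread doesn't say which one it is.

```python
from collections import Counter
from typing import List, Tuple

# Placeholders: the thread only says there are 14 SDKs with 3 jobs each.
SDKS = [f"sdk{i:02d}" for i in range(14)]
JOBS = ["lint", "test", "test-publish"]
REF = "refs/heads/main"  # hypothetical value of $REF


def jobs() -> List[Tuple[str, str]]:
    pairs = [(sdk, job) for sdk in SDKS for job in JOBS]
    # Assumption: drop one arbitrary pair to model the "1 exception".
    return pairs[:-1]


def groups_before() -> Counter:
    # One workflow called "SDK": every job sits under the group SDK-$REF.
    return Counter(f"SDK-{REF}" for _ in jobs())


def groups_after() -> Counter:
    # One workflow per (SDK, job); the workflow name is a made-up scheme.
    return Counter(f"{sdk}-{job}-{REF}" for sdk, job in jobs())


print(len(jobs()))           # 41
print(len(groups_before()))  # 1
print(len(groups_after()))   # 41
```

Before, all 41 jobs fall under a single key; after, every key is unique, so no two of these workflows share a concurrency group.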

I suspect that the Elixir SDK relies on lint, test and test-publish being called in sequence, reusing the same cache volume. But in my PR they run in parallel, on 3 different machines.

So I don't think the concurrency group is the issue

raw dirge
Update: I implemented this fix and just pushed it. Let's see if it works! As a bonus, it simplifies both the code and the generated YAML, and it brings our job count down from 41 to 30 🙂

coral goblet
raw dirge
Yeah I think I broke something 😭

Doesn't seem specific to Elixir, though

coral goblet
It passed on the next retry. 😭

raw dirge
OK it looks like my simplification fixed it 🙂

The problem was indeed that the Elixir SDK needs lint->test->publish-test to run serially, not in parallel

coral goblet
Could you make it run in parallel? I think it should still work, because I don't think the ordering is the cause. If it doesn't work, then I should investigate and eliminate the real cause. ✌️

raw dirge
But if everything still works when I run them in parallel, then I really don't understand why it was failing in the first place 😭

coral goblet
Is it possible that the 3 SDKs (Go, Python, Elixir) conflict with each other because they share a cache key? As my screenshot above shows, those SDKs use the same cache key and mount the cache in SHARED mode.

split acorn
I'll split this into a separate issue

coral goblet
Thank you.

raw dirge
coral goblet
So that's weird, then. I didn't use this cache key anywhere other than this test. 🥲