I am still seeing builds failing wit | Dagger

slow citrus Sep 20, 2024, 10:34 PM

#

I filed this https://github.com/dagger/dagger/issues/8524

Please add any other details you might have

GitHub

🐞 Typescript SDK - Intermittent Issues "Encountered an unknown erro...

What is the issue? When running a build with the Typescript SDK this error causes the entire build to fail. We have seen this happen in several different Typescript projects since 0.13 came out. Da...

bright venture Sep 20, 2024, 11:49 PM

#

No progress. But I’ve hit it

slow citrus Sep 21, 2024, 12:02 AM

#

I've been hitting it all afternoon unfortunately, but it's def not consistent. It happens to me like 30% of the time.

bright venture Sep 21, 2024, 12:43 AM

#

I thought it was due to a longer running function and a timeout perhaps, but haven’t built a function with a sleep to attempt repro yet

slow citrus Sep 21, 2024, 1:12 AM

#

Do you have a sense of the order of magnitude for "longer"?

Here's a very recent example of one: https://dagger.cloud/levs-test-org/traces/028defc376ba5be1c93145f7f349c6cb

The step that triggered it was only running for about 5 minutes.

#

Just happened again, after about 7 minutes this time: https://dagger.cloud/levs-test-org/traces/8bff13ba44e63f2d7702b03955c6579a

#

Oh interesting I got a stack trace this time

Full trace at https://dagger.cloud/levs-test-org/traces/b4c15419bf49ce5834a877a24f9ac175

Error: response from query: input: medplum.buildMatrix resolve: call function "buildMatrix": process "tsx --no-deprecation --tsconfig /src/.dagger/tsconfig.json /src/.dagger/src/__dagger.entrypoint.ts" did not complete successfully: exit code: 1

Stderr:
/src/.dagger/sdk/api/utils.ts:234
    throw new UnknownDaggerError(
          ^


UnknownDaggerError: Encountered an unknown error while requesting data via graphql
    at compute (/src/.dagger/sdk/api/utils.ts:234:11)
    at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
    at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1831:39) {
  cause: TypeError: fetch failed
      at node:internal/deps/undici/undici:12502:13
      at <anonymous> (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:191:10)
      at runRequest (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:72:25)
      at GraphQLClient.request (/src/.dagger/node_modules/graphql-request/src/legacy/classes/GraphQLClient.ts:131:22)
      at compute (/src/.dagger/sdk/api/utils.ts:202:20)
      at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
      at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1831:39) {
    [cause]: HeadersTimeoutError: Headers Timeout Error
        at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:7569:32)
        at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:6659:17)
        at listOnTimeout (node:internal/timers:573:17)
        at process.processTimers (node:internal/timers:514:7) {
      code: 'UND_ERR_HEADERS_TIMEOUT'
    }
  },
  code: 'D101'
}

Node.js v20.15.0

#

https://dagger.cloud/levs-test-org/traces/b4c15419bf49ce5834a877a24f9ac175#183b30b67af20167

jaunty frost Sep 23, 2024, 4:30 PM

#

Ok so it definitely comes from there: https://github.com/dagger/dagger/blob/f15c2996d26dbbc17bec4b6377629bdf1e934ddf/sdk/typescript/api/utils.ts#L202

So it's during the request

GitHub

dagger/sdk/typescript/api/utils.ts at f15c2996d26dbbc17bec4b6377629...

An engine to run your pipelines in containers. Contribute to dagger/dagger development by creating an account on GitHub.

bright venture Sep 23, 2024, 4:32 PM

#

cc @lime inlet 👆

jaunty frost Sep 23, 2024, 4:33 PM

#

Maybe I could increase the timeout: https://github.com/jasonkuhrt/graffle/issues/103 ??

GitHub

Timeout configuration · Issue #103 · jasonkuhrt/graffle

Hei, would it be possible to support configuring the timeout ?

bright venture Sep 23, 2024, 4:34 PM

#

So maybe back to a timeout with https://github.com/nodejs/undici/?

GitHub

GitHub - nodejs/undici: An HTTP/1.1 client, written from scratch fo...

An HTTP/1.1 client, written from scratch for Node.js - nodejs/undici

#

code: 'UND_ERR_HEADERS_TIMEOUT'
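
The trace above nests three errors: the SDK's `UnknownDaggerError` (code `D101`), a `TypeError: fetch failed`, and undici's `HeadersTimeoutError`. A minimal sketch of how one might pull the innermost code out of such a chain follows. The helper names and the `sample` object are hypothetical and hand-built to mirror the trace; they are not part of the Dagger SDK.

```typescript
// Minimal structural type for errors that may carry a `code` and a nested `cause`.
interface ErrorLike {
  message?: string
  code?: string
  cause?: unknown
}

// Collect every string `code` found while following `cause` links, outermost first.
// `maxDepth` guards against accidental cycles in the chain.
function collectErrorCodes(err: unknown, maxDepth = 10): string[] {
  const codes: string[] = []
  let current: unknown = err
  for (
    let depth = 0;
    depth < maxDepth && current != null && typeof current === "object";
    depth++
  ) {
    const e = current as ErrorLike
    if (typeof e.code === "string") codes.push(e.code)
    current = e.cause
  }
  return codes
}

// Hypothetical helper: true when any error in the chain carries undici's headers-timeout code.
function isHeadersTimeout(err: unknown): boolean {
  return collectErrorCodes(err).includes("UND_ERR_HEADERS_TIMEOUT")
}

// Hand-built object mirroring the nesting in the trace (not a real SDK error).
const sample = {
  message: "Encountered an unknown error while requesting data via graphql",
  code: "D101",
  cause: {
    message: "fetch failed",
    cause: { message: "Headers Timeout Error", code: "UND_ERR_HEADERS_TIMEOUT" },
  },
}

console.log(collectErrorCodes(sample)) // [ 'D101', 'UND_ERR_HEADERS_TIMEOUT' ]
console.log(isHeadersTimeout(sample)) // true
```

A check like this only identifies which layer reported the failure; it says nothing about why the engine did not send response headers in time.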

jaunty frost Sep 23, 2024, 4:34 PM

#

I also see that our graphql-client will have a big update with 8.0.0, I'll have some work to do to convert to the new code.

#

I'll open a PR to extend the timeout

#

@bright venture I wonder what the timeout of the request should be though

#

cc @lime inlet if we have an operation that takes 10 minutes to resolve, what should happen? Should I set a 30-minute timeout? I'm not sure that actually makes sense

lime inlet Sep 23, 2024, 4:42 PM

#

jaunty frost cc @lime inlet if we have an operation that takes 10 minutes to resolve...

should work, there's no expected timeouts on requests as a whole

jaunty frost Sep 23, 2024, 4:43 PM

#

So a 30-minute timeout? 😮

jaunty frost Sep 23, 2024, 5:10 PM

#

PR is open: https://github.com/dagger/dagger/pull/8549

GitHub

feat: extend typescript client timeout by TomChv · Pull Request #85...

lime inlet Sep 23, 2024, 9:39 PM

#

jaunty frost So a 30-minute timeout? 😮

There are no expected timeouts. If a user exec is running a build that takes several hours (not implausible for certain use cases), then it shouldn't time out

#

(will comment on the PR)
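
To make the "30 minutes vs. no timeout" question concrete, here is a small sketch under stated assumptions. undici documents `headersTimeout` and `bodyTimeout` options (in milliseconds) with a default of 300 seconds each, and `0` disables them. The helper names below are hypothetical, the model ignores undici's real timer granularity, and none of this is claimed to be what PR #8549 actually does.

```typescript
// Option names match undici's documented Agent/Client options; values are milliseconds.
interface UndiciTimeouts {
  headersTimeout: number
  bodyTimeout: number
}

// undici documents 300 seconds as the default for both options.
const UNDICI_DEFAULT_MS = 300 * 1000

// Hypothetical helper: `null` means "never time out", which undici expresses as 0.
function timeoutsFor(limitMs: number | null): UndiciTimeouts {
  const value = limitMs === null ? 0 : limitMs
  return { headersTimeout: value, bodyTimeout: value }
}

// Simplified model: does a request whose server stays silent for `silentMs`
// hit the headers timeout under the given options?
function headersTimeoutFires(silentMs: number, opts: UndiciTimeouts): boolean {
  return opts.headersTimeout > 0 && silentMs >= opts.headersTimeout
}

const minute = 60 * 1000
const defaults = timeoutsFor(UNDICI_DEFAULT_MS)
const thirtyMinutes = timeoutsFor(30 * minute)
const unlimited = timeoutsFor(null)

console.log(headersTimeoutFires(7 * minute, defaults)) // true
console.log(headersTimeoutFires(7 * minute, thirtyMinutes)) // false
console.log(headersTimeoutFires(180 * minute, thirtyMinutes)) // true
console.log(headersTimeoutFires(180 * minute, unlimited)) // false
```

In real code these options would be passed to `new Agent(...)` from the `undici` package, with that agent supplied as the `dispatcher` option of `fetch`. Under this model, any finite value still fails the several-hour build described above, which is the argument for disabling the timeout rather than raising it.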

slow citrus Sep 25, 2024, 11:11 PM

#

Hey @jaunty frost sadly I am still seeing this exact same error on latest dagger

Setup tracing at https://dagger.cloud/traces/setup. To hide: export NOTHANKS=1

Error: response from query: input: medplum.buildMatrix resolve: call function "buildMatrix": process "tsx --no-deprecation --tsconfig /src/.dagger/tsconfig.json /src/.dagger/src/__dagger.entrypoint.ts" did not complete successfully: exit code: 1

Stderr:
/src/.dagger/sdk/api/utils.ts:234
    throw new UnknownDaggerError(
          ^


UnknownDaggerError: Encountered an unknown error while requesting data via graphql
    at compute (/src/.dagger/sdk/api/utils.ts:234:11)
    at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
    at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1843:39) {
  cause: TypeError: fetch failed
      at node:internal/deps/undici/undici:12502:13
      at <anonymous> (/src/.dagger/sdk/graphql/client.ts:19:14)
      at <anonymous> (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:191:10)
      at runRequest (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:72:25)
      at GraphQLClient.request (/src/.dagger/node_modules/graphql-request/src/legacy/classes/GraphQLClient.ts:131:22)
      at compute (/src/.dagger/sdk/api/utils.ts:202:20)
      at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
      at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1843:39) {
    [cause]: HeadersTimeoutError: Headers Timeout Error
        at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:7569:32)
        at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:6659:17)
        at listOnTimeout (node:internal/timers:573:17)
        at process.processTimers (node:internal/timers:514:7) {
      code: 'UND_ERR_HEADERS_TIMEOUT'
    }
  },
  code: 'D101'
}

Node.js v20.15.0

Running with dagger v0.13.4-010101000000-dev-2a8c3f8854b9 (registry.dagger.io/engine:) linux/amd64

This is for a step that failed in under 5 minutes so I don't think the timeout is actually the issue here.

You can see a trace inside of the terminal tab here: https://dagger.cloud/levs-test-org/traces/60677125f038f95c65c821ce8cec9987

#

Here's a one-liner to reproduce the issue, but note that it does not happen consistently for me.

dagger -m https://github.com/dagger/dagger call dev with-mounted-directory --path "medplum" --source "https://github.com/levlaz/medplum#daggerize" with-workdir --path "medplum" with-exec --args "dagger,call,build-matrix"

#

Since this does not happen consistently and the steps do get cached, one way to try to reproduce this is to go into a terminal and pass a different value for node-version:

dagger -m https://github.com/dagger/dagger call dev with-mounted-directory --path "medplum" --source "https://github.com/levlaz/medplum#daggerize" with-workdir --path "medplum" terminal

then

dagger call build --node-version 19

That should bust the cache and get the full build to run

jaunty frost Sep 26, 2024, 8:13 AM

#

slow citrus Hey @jaunty frost sadly I am still seeing this exact same error on lates...

Thanks! I'm checking rn

#

https://github.com/nodejs/undici/issues/1272#issuecomment-1501147658

GitHub

Headers Timeout Error · Issue #1272 · nodejs/undici

Bug Description On our error reporting system, we see Headers Timeout Errors. HeadersTimeoutError: Headers Timeout Error at Timeout.onParserTimeout [as _onTimeout] ([NODE_MODULES]/undici/lib/client...

jaunty frost Sep 26, 2024, 8:50 AM

#

Currently testing another fix. Btw, I found a quicker way to test with the dagger engine:

./hack/dev

dagger -m https://github.com/levlaz/medplum@daggerize call build --node-version 19

jaunty frost Sep 26, 2024, 9:08 AM

#

@slow citrus I've hit a different error with your repro: https://dagger.cloud/Quartz/traces/d90f485ff2d457c1544662010d04133a?span=094ea0eeec4eddeb#910af4513f45ddfd

#

Trying with node 20, I want to see if I can repro it with my latest changes

jaunty frost Sep 26, 2024, 9:48 AM

#

Okay, with node 20 it works. I'm trying with node 21

#

I also opened a PR, if you wanna give it a try: https://github.com/dagger/dagger/pull/8576

GitHub

feat: ts-sdk replace `fetch` with `node-fetch` by TomChv · Pull Req...

Following up with #8549, it seems the issue is caused by the native fetch. I'm trying to replace the client with node-fetch which is a more optimized version of fetch for server-side request.

#

We'll wait for your tests before merging it this time

jaunty frost Sep 26, 2024, 10:34 AM

#

I'm not able to repro the bug with my PR's version (node 20, 21 & 22)

jaunty frost Sep 26, 2024, 3:27 PM

#

PR is green, waiting for your feedback @slow citrus

slow citrus Sep 26, 2024, 3:56 PM

#

jaunty frost Currently testing another fix. Btw, I found a quicker way to test with the dagge...

Thanks for this! I am trying to test this now, but please do note that the problem was intermittent so it's unfortunately difficult to confirm

One other thing to note: I like the other approach because anyone can run it without needing to clone our repo.

For example when Marcos said "install dagger cli from main and test it out"

jaunty frost Sep 26, 2024, 4:03 PM

#

slow citrus Thanks for this! I am trying to test this now but please do note that the proble...

True!

slow citrus Sep 26, 2024, 5:18 PM

#

Hey @jaunty frost !

I am getting some other error that I was not getting before (I think the same one you saw) which feels odd :/

This entire pipeline failed intermittently but sometimes succeeded

Now it fails consistently on node 20. It's possible it's not related to dagger, but I am suspicious of that because my code has not changed

https://dagger.cloud/levs-test-org/traces/ff98dd5fba5f43c96f78bcf6c4d76f62

#

@copper orbit just FYI - in that "nested trace" above the thing fails and shows up as failed at the high level but the specific failing step appears to be green for some reason

jaunty frost Sep 26, 2024, 5:22 PM

#

slow citrus Hey @jaunty frost ! I am getting some other error that I was not getti...

It worked on node 20 last time I tried. Btw, I don't see the failing trace details (as you mentioned to Alex)

copper orbit Sep 26, 2024, 5:23 PM

#

it's because it never saw the end of the relevant spans, for some reason

jaunty frost Sep 26, 2024, 5:23 PM

#

But no error was thrown...

#

Weird

#

Can you try again @slow citrus ?

copper orbit Sep 26, 2024, 5:24 PM

#

well, the error may have been thrown, but we just never received it

jaunty frost Sep 26, 2024, 5:25 PM

#

hmm

#

Did you see anything in your terminal @slow citrus ?

slow citrus Sep 26, 2024, 5:44 PM

#

Yeah I got some stuff in terminal

      ┃ medplum-nextjs-demo:build:
      ┃ @medplum/graphiql:build: rendering chunks...
      ┃ medplum-nextjs-demo:build:    Creating an optimized production build ...
      ┃ medplum-websocket-subscriptions-demo:build: vite v5.4.5 building for produ
      ┃ ction...
      ┃ medplum-websocket-subscriptions-demo:build: transforming...
      ┃ medplum-task-demo:build: vite v5.4.5 building for production...
      ┃ medplum-task-demo:build: transforming...
      ┃ medplum-provider:build: vite v5.4.5 building for production...
      ┃ medplum-provider:build: transforming...
      ┃ medplum-live-chat-demo:build: vite v5.4.5 building for production...
      ┃ medplum-live-chat-demo:build: transforming...
      ┃ medplum-scheduling-demo:build: ✓ 6960 modules transformed.
      ✔ Container.withExec(args: ["npm", "run", "lint"]): Container! 34m28.2s
      ● Container.stdout: String! 34m28.2s
● Container.sync: ContainerID! 34m48.4s

Full trace at https://dagger.cloud/levs-test-org/traces/ff98dd5fba5f43c96f78bcf6c4d76f62

Error: response from query: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.13.3 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
Run 'dagger call dev with-mounted-directory with-workdir with-exec --help' for usage.

#

Hm, actually this once again feels like an error with the engine to me

#

Engine logs in case it helps

📎 message.txt

slow citrus Sep 26, 2024, 5:48 PM

#

copper orbit it's because it never saw the end of the relevant spans, for some reason

Doh yeah sorry I should have seen that huge "!" 😄

copper orbit Sep 26, 2024, 5:49 PM

#

slow citrus Yeah I got some stuff in terminal ``` ┃ medplum-nextjs-demo:build: ...

exit status 137 is odd there, is your machine running out of RAM or something? 137 is usually kill -9 so maybe the OOM killer kicking in?
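
For reference, the "137 is usually kill -9" reading follows the common shell convention that an exit status above 128 means "terminated by signal (status - 128)". A minimal sketch of that arithmetic is below. The function name and the small signal table are illustrative only, and the status alone cannot prove that the OOM killer was the sender.

```typescript
// Small illustrative subset of POSIX signal numbers (not exhaustive).
const SIGNAL_NAMES: Record<number, string> = {
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
}

// Interpret an exit status using the 128 + signal-number convention.
function describeExitStatus(status: number): string {
  if (status === 0) return "success"
  if (status > 128 && status <= 128 + 64) {
    const signal = status - 128
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`
    return `killed by ${name}`
  }
  return `exited with code ${status}`
}

console.log(describeExitStatus(137)) // killed by SIGKILL
console.log(describeExitStatus(1)) // exited with code 1
```

SIGKILL can come from the kernel's OOM killer, from a container runtime enforcing a memory limit, or from a manual `kill -9`, so checking host memory and kernel logs is still needed to tell these apart.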

slow citrus Sep 26, 2024, 5:51 PM

#

Yeah the engine seems to be dying 😦

#

@jaunty frost something crazy is happening because the thing where the parallel builds were not working now seems to be working!

My builds are happening concurrently and I am running into the same type of issues that motion used to complain about, where CPU and memory spike locally for no apparent reason (it's really just running npm build... installing some dependencies)

jaunty frost Sep 26, 2024, 6:31 PM

#

slow citrus @jaunty frost something crazy is happening because the thing where the p...

Lol so wait, are my fixes working or not? haha

jaunty frost Sep 26, 2024, 6:32 PM

#

slow citrus Yeah I got some stuff in terminal ``` ┃ medplum-nextjs-demo:build: ...

Yup looks like you should kill/restart your engine, not a TS issue there

slow citrus Sep 26, 2024, 6:35 PM

#

I did try to restart the engine but still running into issues

I have no evidence, but I still think this is broken.

The graphql errors really feel to me like a misdirection and this is finally starting to show the real underlying issue (even though it's not clear what it might be)

I would still like to try a build with that rolled back version of this library to rule that out as an issue if it's possible

jaunty frost Sep 26, 2024, 6:49 PM

#

Okay, will open up a PR tomorrow

slow citrus Sep 27, 2024, 12:00 AM

#

@jaunty frost FYI I ran a bunch of builds and was not able to reproduce the issue, so I would feel comfortable merging your most recent PR. It's not any worse than the current state 🙂

However, I'll leave it up to you to decide if you want to roll back like you said and wait for the next iteration of that library.

I am having some strange issues with concurrency but I don't think those have anything to do with this issue.

jaunty frost Sep 27, 2024, 8:02 AM

#

slow citrus @jaunty frost FYI i ran a bunch of builds and was not able to reproduce ...

Hmmm, do we have another way to repro that issue, with another module, to confirm that it actually fixes our issue?
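
One rough way to judge how many clean runs count as evidence: the failure was reported earlier in the thread at roughly 30% of runs. Assuming that rate and assuming runs fail independently (both are assumptions; cached steps in particular can break independence), the chance of a streak of passing builds with the bug still present can be computed directly. The function names below are hypothetical.

```typescript
// Probability that `runs` independent builds all pass when each fails with `failureRate`.
function chanceAllPass(failureRate: number, runs: number): number {
  return Math.pow(1 - failureRate, runs)
}

// Smallest streak of passing runs for which the chance of seeing that streak
// with the bug still present is at most (1 - confidence).
function cleanRunsNeeded(failureRate: number, confidence: number): number {
  if (failureRate <= 0 || failureRate >= 1) {
    throw new RangeError("failureRate must be strictly between 0 and 1")
  }
  if (confidence <= 0 || confidence >= 1) {
    throw new RangeError("confidence must be strictly between 0 and 1")
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate))
}

const observedFailureRate = 0.3 // the "like 30% of the time" estimate from the thread

console.log(cleanRunsNeeded(observedFailureRate, 0.95)) // 9
console.log(chanceAllPass(observedFailureRate, 9).toFixed(3)) // 0.040
```

Under these assumptions, nine consecutive uncached passing builds would be fairly convincing, while two or three would not be.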

slow citrus Sep 27, 2024, 5:03 PM

#

I asked this person to help because they seemed to be running into this issue more consistently #1288358859190308874 message

#

@bright venture do you have a project you could test with too?
