#I am still seeing builds failing wit


slow citrus
bright venture
#

No progress. But I’ve hit it

slow citrus
#

I've been hitting it all afternoon unfortunately, but it's def not consistent. It happens to me like 30% of the time
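A rough sketch of what a ~30% failure rate means for reproduction attempts. It assumes every run is independent and fails with the same probability, which cached steps will violate. The function names are hypothetical and only for illustration.

```typescript
// Probability that `runs` independent runs all pass, when each run
// fails with probability `failureRate`.
function chanceOfAllPassing(failureRate: number, runs: number): number {
  if (failureRate <= 0 || failureRate >= 1) {
    throw new RangeError("failureRate must be strictly between 0 and 1");
  }
  return Math.pow(1 - failureRate, runs);
}

// Smallest number of runs needed to observe at least one failure
// with probability >= `confidence`.
function runsNeeded(failureRate: number, confidence: number): number {
  if (failureRate <= 0 || failureRate >= 1 || confidence <= 0 || confidence >= 1) {
    throw new RangeError("failureRate and confidence must be strictly between 0 and 1");
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
}

// With the 30% figure: three clean runs in a row still happen about 34% of the time.
console.log(chanceOfAllPassing(0.3, 3).toFixed(3)); // 0.343
// Runs needed to be 95% sure of seeing the failure at least once.
console.log(runsNeeded(0.3, 0.95)); // 9
```

Under those assumptions, a handful of green runs is weak evidence that an intermittent failure is gone.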

bright venture
#

I thought it was due to a longer running function and a timeout perhaps, but haven’t built a function with a sleep to attempt repro yet

slow citrus
#

Oh interesting I got a stack trace this time

Full trace at https://dagger.cloud/levs-test-org/traces/b4c15419bf49ce5834a877a24f9ac175

Error: response from query: input: medplum.buildMatrix resolve: call function "buildMatrix": process "tsx --no-deprecation --tsconfig /src/.dagger/tsconfig.json /src/.dagger/src/__dagger.entrypoint.ts" did not complete successfully: exit code: 1

Stderr:
/src/.dagger/sdk/api/utils.ts:234
    throw new UnknownDaggerError(
          ^


UnknownDaggerError: Encountered an unknown error while requesting data via graphql
    at compute (/src/.dagger/sdk/api/utils.ts:234:11)
    at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
    at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1831:39) {
  cause: TypeError: fetch failed
      at node:internal/deps/undici/undici:12502:13
      at <anonymous> (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:191:10)
      at runRequest (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:72:25)
      at GraphQLClient.request (/src/.dagger/node_modules/graphql-request/src/legacy/classes/GraphQLClient.ts:131:22)
      at compute (/src/.dagger/sdk/api/utils.ts:202:20)
      at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
      at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1831:39) {
    [cause]: HeadersTimeoutError: Headers Timeout Error
        at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:7569:32)
        at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:6659:17)
        at listOnTimeout (node:internal/timers:573:17)
        at process.processTimers (node:internal/timers:514:7) {
      code: 'UND_ERR_HEADERS_TIMEOUT'
    }
  },
  code: 'D101'
}

Node.js v20.15.0
jaunty frost
bright venture
#

cc @lime inlet 👆

jaunty frost
bright venture
#

code: 'UND_ERR_HEADERS_TIMEOUT'
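The code sits two levels down in the `cause` chain of the trace (D101, then `fetch failed`, then the headers timeout). A minimal sketch of digging it out programmatically follows. `findCauseWithCode` is a hypothetical helper, not part of the Dagger SDK or graphql-request, and the error objects below only mimic the shape shown in the trace.

```typescript
// Hypothetical helper: walk an error's `cause` chain and return the first
// link whose `code` property matches, or undefined if none does.
function findCauseWithCode(err: unknown, code: string): unknown {
  const seen = new Set<unknown>(); // guards against circular cause chains
  let current: unknown = err;
  while (typeof current === "object" && current !== null && !seen.has(current)) {
    seen.add(current);
    const candidate = current as { code?: unknown; cause?: unknown };
    if (candidate.code === code) {
      return current;
    }
    current = candidate.cause;
  }
  return undefined;
}

// Stand-ins that mimic the nesting in the trace (not real SDK error classes).
const headersTimeout = Object.assign(new Error("Headers Timeout Error"), {
  code: "UND_ERR_HEADERS_TIMEOUT",
});
const fetchFailed = Object.assign(new TypeError("fetch failed"), {
  cause: headersTimeout,
});
const daggerError = Object.assign(
  new Error("Encountered an unknown error while requesting data via graphql"),
  { code: "D101", cause: fetchFailed },
);

const isHeadersTimeout =
  findCauseWithCode(daggerError, "UND_ERR_HEADERS_TIMEOUT") !== undefined;
console.log(isHeadersTimeout); // true
```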

jaunty frost
#

I also see that our graphql-client will have a big update with 8.0.0, I'll have some work to do to convert to the new code.

#

I'll open a PR to extend the timeout

#

@bright venture I wonder what should be the timeout of the request though

#

cc @lime inlet if we have an operation that takes 10 minutes to resolve, what should happen? Should I set a 30-minute timeout? I'm not sure that actually makes sense
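For context on the trade-off, here is a generic sketch of a caller-side deadline around a long-running promise. It does not show how the SDK or undici configure their own timeouts. `withTimeout` is a hypothetical helper, and it only stops waiting; it does not cancel the underlying work.

```typescript
// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`operation exceeded ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
  });
  // Whichever settles first wins. Always clear the timer afterwards so a
  // finished operation does not leave a pending timer behind.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Usage: a fast operation passes through, a stuck one is rejected.
withTimeout(Promise.resolve("done"), 1000).then((value) => console.log(value));
withTimeout(new Promise<string>(() => {}), 20).catch((err: Error) =>
  console.log(err.name),
);
```

Whatever value is picked, a fixed deadline has to sit above the slowest legitimate operation, which is the awkward part of the question.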

lime inlet
jaunty frost
#

So 30-minute timeout? 😮

lime inlet
#

(will comment on the PR)

slow citrus
#

Hey @jaunty frost sadly I am still seeing this exact same error on latest dagger

Setup tracing at https://dagger.cloud/traces/setup. To hide: export NOTHANKS=1

Error: response from query: input: medplum.buildMatrix resolve: call function "buildMatrix": process "tsx --no-deprecation --tsconfig /src/.dagger/tsconfig.json /src/.dagger/src/__dagger.entrypoint.ts" did not complete successfully: exit code: 1

Stderr:
/src/.dagger/sdk/api/utils.ts:234
    throw new UnknownDaggerError(
          ^


UnknownDaggerError: Encountered an unknown error while requesting data via graphql
    at compute (/src/.dagger/sdk/api/utils.ts:234:11)
    at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
    at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1843:39) {
  cause: TypeError: fetch failed
      at node:internal/deps/undici/undici:12502:13
      at <anonymous> (/src/.dagger/sdk/graphql/client.ts:19:14)
      at <anonymous> (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:191:10)
      at runRequest (/src/.dagger/node_modules/graphql-request/src/legacy/helpers/runRequest.ts:72:25)
      at GraphQLClient.request (/src/.dagger/node_modules/graphql-request/src/legacy/classes/GraphQLClient.ts:131:22)
      at compute (/src/.dagger/sdk/api/utils.ts:202:20)
      at computeQuery (/src/.dagger/sdk/api/utils.ts:158:10)
      at Container.stdout (/src/.dagger/sdk/api/client.gen.ts:1843:39) {
    [cause]: HeadersTimeoutError: Headers Timeout Error
        at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:7569:32)
        at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:6659:17)
        at listOnTimeout (node:internal/timers:573:17)
        at process.processTimers (node:internal/timers:514:7) {
      code: 'UND_ERR_HEADERS_TIMEOUT'
    }
  },
  code: 'D101'
}

Node.js v20.15.0

Running with dagger v0.13.4-010101000000-dev-2a8c3f8854b9 (registry.dagger.io/engine:) linux/amd64

This is for a step that failed in under 5 minutes so I don't think the timeout is actually the issue here.

You can see a trace inside of the terminal tab here: https://dagger.cloud/levs-test-org/traces/60677125f038f95c65c821ce8cec9987

#

Here's a one liner to reproduce the issue, but note that it does not happen consistently for me.

dagger -m https://github.com/dagger/dagger call dev with-mounted-directory --path "medplum" --source "https://github.com/levlaz/medplum#daggerize" with-workdir --path "medplum" with-exec --args "dagger,call,build-matrix"
#

since this does not happen consistently and the steps do get cached, one way to try to reproduce this is to go into a terminal and pass a different value for node-version

dagger -m https://github.com/dagger/dagger call dev with-mounted-directory --path "medplum" --source "https://github.com/levlaz/medplum#daggerize" with-workdir --path "medplum" terminal

then

dagger call build --node-version 19

That should bust the cache and get the full build to run

jaunty frost
jaunty frost
#

Currently testing another fix, btw I found a quicker way to test with the dagger engine:

./hack/dev

dagger -m https://github.com/levlaz/medplum@daggerize call build --node-version 19 
jaunty frost
#

Trying with node 20, I want to see if I can repro it with my latest changes

jaunty frost
#

okay with node 20 it works, I'm trying with node 21

#

We'll wait for your tests before merging it this time

jaunty frost
#

I'm not able to repro the bug with my PR's version (node 20, 21 & 22)

jaunty frost
#

PR is green, waiting for your feedback @slow citrus

slow citrus
slow citrus
#

Hey @jaunty frost !

I am getting some other error that I was not getting before (I think the same one you saw) which feels odd :/

This entire pipeline failed intermittently but sometimes succeeded

Now it fails consistently on node 20, possible it's not related to dagger but I am suspicious of that because my code has not changed

https://dagger.cloud/levs-test-org/traces/ff98dd5fba5f43c96f78bcf6c4d76f62

#

@copper orbit just FYI - in that "nested trace" above the thing fails and shows up as failed at the high level but the specific failing step appears to be green for some reason

jaunty frost
copper orbit
#

it's because it never saw the end of the relevant spans, for some reason

jaunty frost
#

But no error thrown...

#

Weird

#

Can you try again @slow citrus ?

copper orbit
#

well, the error may have been thrown, but we just never received it

jaunty frost
#

hmm

#

Did you see anything in your terminal @slow citrus ?

slow citrus
#

Yeah I got some stuff in terminal

      ┃ medplum-nextjs-demo:build:
      ┃ @medplum/graphiql:build: rendering chunks...
      ┃ medplum-nextjs-demo:build:    Creating an optimized production build ...
      ┃ medplum-websocket-subscriptions-demo:build: vite v5.4.5 building for produ
      ┃ ction...
      ┃ medplum-websocket-subscriptions-demo:build: transforming...
      ┃ medplum-task-demo:build: vite v5.4.5 building for production...
      ┃ medplum-task-demo:build: transforming...
      ┃ medplum-provider:build: vite v5.4.5 building for production...
      ┃ medplum-provider:build: transforming...
      ┃ medplum-live-chat-demo:build: vite v5.4.5 building for production...
      ┃ medplum-live-chat-demo:build: transforming...
      ┃ medplum-scheduling-demo:build: ✓ 6960 modules transformed.
      ✔ Container.withExec(args: ["npm", "run", "lint"]): Container! 34m28.2s
      ● Container.stdout: String! 34m28.2s
● Container.sync: ContainerID! 34m48.4s

Full trace at https://dagger.cloud/levs-test-org/traces/ff98dd5fba5f43c96f78bcf6c4d76f62

Error: response from query: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.13.3 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
Run 'dagger call dev with-mounted-directory with-workdir with-exec --help' for usage.
#

Hm, actually this once again feels like an error with the engine to me
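On the `exit status 137` in the error above: by the usual shell convention, a status above 128 means the process was terminated by signal `status - 128`, so 137 corresponds to signal 9 (SIGKILL). That is commonly, though not exclusively, what an out-of-memory kill looks like. Nothing in the trace confirms the cause here. A small sketch of the decoding, with a hypothetical helper name and a deliberately short signal table:

```typescript
// A few standard POSIX signal numbers; intentionally not exhaustive.
const SIGNAL_NAMES: Record<number, string> = {
  1: "SIGHUP",
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
};

// Hypothetical helper: interpret a shell-style exit status.
function describeExitStatus(status: number): string {
  if (status > 128 && status < 256) {
    const signal = status - 128;
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`;
    return `terminated by ${name}`;
  }
  return status === 0 ? "exited successfully" : `exited with error code ${status}`;
}

console.log(describeExitStatus(137)); // terminated by SIGKILL
console.log(describeExitStatus(1)); // exited with error code 1
```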

slow citrus
copper orbit
slow citrus
#

Yeah the engine seems to be dying 😦

#

@jaunty frost something crazy is happening because the thing where the parallel builds were not working now seems to be working!

My builds are happening concurrently and I am running into the same type of issues that motion used to complain about, where CPU and memory spike locally for no apparent reason (it's really just running npm build... installing some dependencies)

jaunty frost
jaunty frost
slow citrus
#

I did try to restart the engine but still running into issues

I have no evidence, but I still think this is broken.

The graphql errors really feel to me like a misdirection and this is finally starting to show the real underlying issue (even though it's not clear what it might be)

I would still like to try a build with that rolled back version of this library to rule that out as an issue if it's possible

jaunty frost
#

Okay, will open up a PR tomorrow

slow citrus
#

@jaunty frost FYI I ran a bunch of builds and was not able to reproduce the issue, so I would feel comfortable merging your most recent PR, it's not any worse than the current state 🙂

However, I'll leave it up to you to decide if you want to roll back like you said and wait for the next iteration of that library.

I am having some strange issues with concurrency but I don't think those have anything to do with this issue.

jaunty frost
slow citrus
#

@bright venture do you have a project you could test with too?