#Slowness with nested modules


sage barn
#

Hello,

I am trying to debug slowness in our deployed instance of a Dagger engine. When we use modules within modules (..within modules), there seems to be substantial overhead that slows the overall build, even when everything is cached (simply replaying the build without making any changes to source code).
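One way to put numbers on the cached-replay slowness is to time the same invocation several times and compare the results. The following is a minimal sketch, not something from the thread: `time_command` and `summarize` are hypothetical helper names, and the command at the bottom is only a stand-in that should be replaced with the real cached build invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed silently.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Stand-in command so the sketch runs anywhere (assumption: replace it with
# your own cached build invocation, passed as a list of arguments).
durations = time_command([sys.executable, "-c", "pass"], runs=3)
print(summarize(durations))
```

If the median stays roughly constant across fully cached replays, that points to a fixed per-invocation cost rather than to cache misses.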

In the attached example, we are building a React app using submodules for getting git information, standard naming conventions, the AWS CLI, and sending email. A root module, BuildAndDeployReactApp, leverages these submodules.

Based on what I can see, is it possible that the telemetry SDK is impacting the performance of the Dagger operations? Is it possible to disable it completely, just so I can verify?

Some more info:
Dagger CLI: v0.15.2 (provisioned on an 8 vCPU, 16 GB ECS Fargate task)
Engine version: v0.15.2 (provisioned on a 10 vCPU, 30 GB memory EC2 task with a Docker volume mounted at /var/lib/dagger, startup command --addr 0.0.0.0:50051)
Modules: all modules are written with the Python SDK

The Dagger CLI task connects to the engine through a Network Load Balancer that forwards to the engine on port 50051.

Let me know if I can provide anything else!
Thanks,

sage barn
#

Adding some updates: I have tested the engine locally with my local CLI (with OTel) and did not perceive any slowness. Could the issue be a result of the CLI and engine not being on the same host?

shadow chasm
#

I was going to ask you if you saw the same locally...

A few follow ups...

  1. Are the ECS Fargate and EC2 tasks running in the same AZs?
  2. Is it possible to run the CLI task on the EC2 instance to eliminate the network?
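Before moving the CLI task, a rough check of the network path is to time fresh TCP connections from the CLI host to the engine address. This is only a sketch under assumptions: `tcp_connect_times` is a hypothetical helper, the host in the usage comment is a placeholder, and connection setup time says nothing about how many round trips a Dagger session actually makes.

```python
import socket
import time


def tcp_connect_times(host, port, samples=5, timeout=3.0):
    """Return the time in seconds taken by each of `samples` fresh TCP connects."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed;
        # the connection is closed again when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            times.append(time.perf_counter() - start)
    return times


# Usage (placeholder host, port taken from the thread):
#   print(tcp_connect_times("engine.example.internal", 50051))
```

Running it once from the Fargate task and once from a host next to the engine gives two sets of numbers to compare. If both are in the low milliseconds, raw network latency alone is unlikely to explain a delay of several seconds.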

The awsvpc network mode is a weird configuration for Fargate... it's "in" your VPC but not really "in" your VPC, since it's running on an AWS-managed machine

https://www.ecsworkshop.com/ecs_networking/awsvpc/

#

Also, I deployed Dagger Engine in almost the same manner - one thing I did notice was random slowdowns I never got to the bottom of.

I did notice the following:

  1. EC2 usage never peaked above 50 percent CPU
  2. We had the Dagger engine running on an EC2 instance with Fedora CoreOS (so no ECS on EC2, just a straight systemd unit) and only had one EC2 instance handling multiple builds/requests
  3. The slowdown appeared to occur often when more than one ECS Fargate CLI task was running against the single EC2 engine

sage barn
#

Hey Jason, thanks for your reply.

  • I have noticed the slowdown when multiple clients are connected as well.
  • I am also not seeing massive CPU usage (attached)

Currently it is not guaranteed that they run in the same AZ; within our infra, workloads are distributed across 3 AZs.

Honestly, that is not a bad idea. I can configure our Jenkins ECS agents to run on an overprovisioned EC2 instance with the Dagger Engine already running, so that they can all share a cache.

All of this is really just a workaround until we have a shared cache available between engines, either using a container registry backend or a self-hosted Dagger Cloud, if that becomes possible.

Do you have any other pointers or reference architectures that have worked well for you for running workloads using Dagger in AWS?

sage barn
#

@shadow chasm After connecting my local CLI to the remote Dagger engine, I see similar results: a static overhead cost of ~3s. So I don't think it is Fargate related. It just seems that every Dagger module call, and each sub-module call, carries quite a large static overhead. I think we will be forced to rewrite the modules to remove as many dependencies as possible.
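To see how a fixed per-call cost adds up across nested modules, here is a back-of-the-envelope sketch. It makes several assumptions: the ~3s figure from the message above applies equally to every call, each dependency is called exactly once, and the calls run sequentially. The submodule names are hypothetical labels for the modules described earlier in the thread, not their real identifiers.

```python
def count_module_calls(deps, root):
    """Count one call for `root` plus one for every nested dependency call."""
    return 1 + sum(count_module_calls(deps, child) for child in deps.get(root, []))


def estimated_overhead_seconds(deps, root, per_call_seconds=3.0):
    """Total fixed overhead if every call pays `per_call_seconds` once."""
    return count_module_calls(deps, root) * per_call_seconds


# Hypothetical dependency map: the root module calls four submodules once each.
deps = {
    "BuildAndDeployReactApp": ["git-info", "naming", "aws-cli", "email"],
}

print(estimated_overhead_seconds(deps, "BuildAndDeployReactApp"))  # 15.0
```

Under these assumptions, one root call plus four submodule calls costs 5 × 3s = 15s before any real work happens. Each extra level of nesting adds its own calls, which is why removing dependencies would reduce the total.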

shadow chasm