Hello! The first step is to upgrade, we' | Dagger | Page 1

tight blade Feb 17, 2026, 6:01 PM

#

Hi, I upgraded 4 days ago, so the Python SDK init is on the latest release, and I tried re-running some commands locally

weak scaffold Feb 17, 2026, 6:02 PM

#

OK - so it's still slow then 😅 sorry about that

#

copying <@&946480760016207902> since performance is something we all care about

#

@tight blade next question: is it also very slow on warm cache? Or only on cold cache?

In other words what's the time difference between 1st and 2nd run

tight blade Feb 17, 2026, 6:03 PM

#

yes, I can still observe 30-40s between the CLI connecting to the engine and seeing the first commands / actions from my dagger functions

#

a warm cache fixes the issue, but when I tried it, it did not work in our case because we have different modules in our monorepo, so if I remember correctly it's avoiding to have the sdk in cache

#

if it can be easier to understand I can try to do a kind of ascii schema in order to represent our dagger modules organizations

weak scaffold Feb 17, 2026, 6:06 PM

#

tight blade if it can be easier to understand I can try to do a kind of ascii schema in orde...

Sure that would be great. You say "avoiding to have the sdk in cache" -> you mean preventing? Or is your system intentionally not keeping sdk in cache?

tight blade Feb 17, 2026, 6:06 PM

#

yes sorry I mean preventing

weak scaffold Feb 17, 2026, 6:08 PM

#

As long as you're loading the same version of the SDK from all your modules, there's no reason for the SDK to not be cached.

If your engine is running many different kinds of modules, at high volume, you could have a situation where the different modules evict each other from the cache more rapidly

tight blade Feb 17, 2026, 6:13 PM

#

if I have a gha pipeline and this one trigger 5 pipelines and all these pipelines are using the python sdk, is there a way to speed up the sdk initialization?

weak scaffold Feb 17, 2026, 6:15 PM

#

tight blade if I have a gha pipeline and this one trigger 5 pipelines and all these pipeline...

Yes, normally in this situation Dagger will cache the python sdk loading for all 5 pipelines. But of course the cache has to be available on the next run - which depends on your CI infra configuration

tight blade Feb 17, 2026, 6:17 PM

#

yes in my case nodes are ephemeral but maybe I could have something for warming the sdk. I think I tried that a few months ago, and the warm-up only worked at the level of a single module in my repo, not across all modules

#

I generated this, in case it helps to understand

📎 dagger_overview.txt

weak scaffold Feb 17, 2026, 6:18 PM

#

tight blade yes in my case nodes are ephemeral but maybe I could have something for warming ...

warming is only useful if the cache is persisted

tight blade Feb 17, 2026, 6:21 PM

#

yes, I was wondering whether it's better to initialize the sdk once at boot instead of X times, once per job

#

because sometimes we can have 5-6 pipelines per node; after that I think Karpenter boots a new node

weak scaffold Feb 17, 2026, 6:24 PM

#

The answer to that depends on your CI infra config, so I don't know

#

Looks like your real problem is lack of cache persistence

tight blade Feb 17, 2026, 6:27 PM

#

ok so without cache this kind of waiting time, is normal ? The only things to do is to try to find a way to cache ?
Currently we are running in EKS using the instance's local NVMe

#

do you have anything to recommend or explore to reduce the init time?

fossil owl Feb 17, 2026, 6:30 PM

#

tight blade ok so without cache this kind of waiting time, is normal ? The only things to do...

do you have a dagger.cloud trace link for this run or a similar one?

tight blade Feb 17, 2026, 6:31 PM

#

No I can try to add a personal token but actually we are not using dagger cloud in the company

weak scaffold Feb 17, 2026, 6:31 PM

#

One option is to use the dang scripting language instead of Python. Dang is optimized for speed, there is no codegen phase, and no third party tools to install and configure. If your modules are only orchestrating dagger API calls, and not relying on native python libraries, then it's pretty easy to switch (and you can switch one module at a time)

It won't solve your caching problem, but it will remove the python sdk loading time

fossil owl Feb 17, 2026, 6:32 PM

#

tight blade No I can try to add a personal token but actually we are not using dagger cloud ...

that would help quite a bit since it would show more definitively what specifically is slow, it's hard to tell from just that output in the screenshot

weak scaffold Feb 17, 2026, 6:33 PM

#

Also @tight blade Dagger Cloud has experimental engine hosting, with automatic scale-out and cache persistence. If you want to evaluate that, we can give you early access (we use it in prod for our own CI)

tight blade Feb 17, 2026, 6:37 PM

#

for dang it seems difficult because sometime we are using native python libraries or have some code logic and the company is developing in python
@fossil owl I will set a token temporarily in order to get some traces and share something with more information
@weak scaffold yes, that could be interesting, as we were thinking of moving away from github actions due to some limitations in the workflow design possibilities (I would just need a rough idea of the future cost, and of what I would have to change between my current setup and the engine hosting)

weak scaffold Feb 17, 2026, 6:41 PM

#

tight blade for dang it seems difficult because sometime we are using native python librarie...

FYI you can also use Dagger Cloud as a complete Github Actions replacement. End-to-end dagger-native CI infra, from git event to check execution. It's an extra layer on top of cloud-hosted engines.

How about we give you a demo, and we can talk about your use case at the same time? I'll DM you a scheduling link

tight blade Feb 17, 2026, 6:46 PM

#

Ok for me

tight blade Feb 17, 2026, 7:44 PM

#

@fossil owl I have a small issue in my gha workflow: I don't know why the dagger token is not available in all my steps. I have to go, so I will investigate that tomorrow and share some links

tight blade Feb 18, 2026, 1:48 PM

#

Ok, now I have traces, and looking at the details, if I'm reading the trace correctly, the issue is not in the sdk init

#

I have one trace here: https://dagger.cloud/Dudesons/traces/72f14068730a442f4499f7b2e812f7ff?listen=5c76e0c88cf9a2b7 (everything seems good at least at the beginning)

Dagger Cloud

Browse and visualize Dagger traces.

#

but from the GHA UI I still see a significant delay

#

#

line 28->29

#

I just realized maybe I misread the github action output where all dots printed are commands / processing done between lines 28->29, and the engine is just working on different actions like installing packages, setting env vars, etc.

dusty grotto Feb 18, 2026, 4:07 PM

#

tight blade I just realized maybe I misread the github action output where all dots printed ...

yeah I wonder if it's just that there are no logs to print during that time period, and GHA doesn't print any dots until the line completes? (i.e. their UI doesn't flush on ..... - only on .....\n)
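
A minimal sketch of the behavior described above, assuming a log viewer that only renders complete lines. `LineBufferedViewer` is a hypothetical stand-in, not GitHub Actions' actual implementation; it only shows how progress dots written without a trailing newline could stay invisible until the line ends.

```python
class LineBufferedViewer:
    """Hypothetical log viewer that only displays newline-terminated lines."""

    def __init__(self):
        self._pending = ""     # text received but not yet shown
        self.displayed = []    # lines the user can actually see

    def write(self, chunk):
        self._pending += chunk
        # Only complete lines are moved to the display.
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            self.displayed.append(line)


viewer = LineBufferedViewer()
for _ in range(5):
    viewer.write(".")                  # progress dots, no newline yet
shown_before_newline = list(viewer.displayed)

viewer.write(" done\n")                # the newline releases the whole line
shown_after_newline = list(viewer.displayed)

print(shown_before_newline)
print(shown_after_newline)
```

Under that assumption, a long-running step would look like a silent gap in the UI even though the engine is busy the whole time.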

fossil owl Feb 18, 2026, 5:16 PM

#

tight blade I just realized maybe I misread the github action output where all dots printed ...

Yeah I don't believe this is the SDK loading being slow, the `load module .` step took 1.9s there. Seems like almost all the time is spent running your actual code, this pytest step https://dagger.cloud/Dudesons/traces/72f14068730a442f4499f7b2e812f7ff?listen=5c76e0c88cf9a2b7&listen=385225002e642a75&listen=1ec49988d9fab757#1ec49988d9fab757


#

But I do see what you're saying about the various withEnv steps before that taking quite a bit, ~~though it seems like each of those are bottlenecked by various uv sync calls~~ actually that's just one of them, others don't seem to do much and still take 3s

#

So that overhead might not be the SDK loading per-se, but just the overhead of invoking a python module function at runtime

#

cc @twin wren @snow zodiac I have a vague memory of talking about how the python SDK re-does a bunch of work every time it gets invoked and that there were theories on how to fix it? I could be totally misremembering though

snow zodiac Feb 18, 2026, 5:25 PM

#

fossil owl cc @twin wren @snow zodiac I have a vague memory of talking ...

Yes, the python SDK analyzes the code at runtime. In contrast, for Go we generate the entrypoint during codegen, so all future calls use the generated code instead of redoing the analysis. I don't know if we want to go that way for python, but that could be something interesting.
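
To illustrate the difference described above, here is a toy sketch in plain Python. It is not the Dagger SDK's code: `Pipeline`, `analyze`, `invoke_with_runtime_analysis`, and `generate_entrypoint` are hypothetical names. It only contrasts re-inspecting a module on every call with analyzing it once and reusing the result.

```python
import inspect


class Pipeline:
    """Hypothetical user module with one exposed function."""

    def with_env(self, name: str, value: str) -> str:
        return f"{name}={value}"


analysis_count = 0  # counts how often the module gets inspected


def analyze(obj):
    """Discover public methods and their parameter names via introspection."""
    global analysis_count
    analysis_count += 1
    return {
        name: list(inspect.signature(member).parameters)
        for name, member in inspect.getmembers(obj, inspect.ismethod)
        if not name.startswith("_")
    }


def invoke_with_runtime_analysis(obj, fn_name, args):
    """Runtime approach: re-analyze the module on every invocation."""
    table = analyze(obj)
    kwargs = dict(zip(table[fn_name], args))
    return getattr(obj, fn_name)(**kwargs)


def generate_entrypoint(obj):
    """Codegen-like approach: analyze once, return a reusable dispatcher."""
    table = analyze(obj)

    def entrypoint(fn_name, args):
        kwargs = dict(zip(table[fn_name], args))
        return getattr(obj, fn_name)(**kwargs)

    return entrypoint


pipeline = Pipeline()
for _ in range(3):
    invoke_with_runtime_analysis(pipeline, "with_env", ["A", "1"])
runtime_analyses = analysis_count        # one analysis per call

analysis_count = 0
entrypoint = generate_entrypoint(pipeline)
for _ in range(3):
    entrypoint("with_env", ["A", "1"])
pregenerated_analyses = analysis_count   # a single analysis up front
```

In this toy model the analysis cost is paid three times in the first loop and once in the second, which is the kind of saving a generated entrypoint would aim for.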

twin wren Feb 18, 2026, 5:27 PM

#

That's quite confusing because the traces shows that the job took 5m28 with the test function taking 5m20.
So is there some time that isn't recorded?

As I understand from the thread, the python sdk loading takes too long but based on the traces it's only 2seconds

#

Something very weird that I'm seeing in the trace is this:

fossil owl Feb 18, 2026, 5:29 PM

#

twin wren That's quite confusing because the traces shows that the job took 5m28 with the...

> As I understand from the thread, the python sdk loading takes too long but based on the traces it's only 2seconds
Yes the trace revealed that the python SDK loading is not slow, it's all cached. I'm asking about the withEnv steps where it seems like potentially not much is happening but they all take 3s each anyways

twin wren Feb 18, 2026, 5:29 PM

#

Why does `load sdk runtime` keep appearing?

fossil owl Feb 18, 2026, 5:29 PM

#

twin wren Why does `load sdk runtime` keep appearing?

because of lazy loading

#

it is cached and doesn't do anything, but it hits that codepath each time
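
A small sketch of why a cached step can still show up on every call, assuming a lazy loader that records a span before consulting its cache. The names `load_sdk_runtime`, `_load_runtime`, and `trace_spans` are hypothetical and do not reflect the engine's internals.

```python
import functools

load_count = 0      # how many times the expensive load really ran
trace_spans = []    # what a trace viewer would list


@functools.lru_cache(maxsize=None)
def _load_runtime(sdk_version):
    """Expensive work; runs only on a cache miss."""
    global load_count
    load_count += 1
    return f"runtime-{sdk_version}"


def load_sdk_runtime(sdk_version):
    """Called lazily before every function invocation."""
    trace_spans.append("load sdk runtime")   # the span appears each time
    return _load_runtime(sdk_version)         # but the result is cached


for _ in range(4):
    load_sdk_runtime("hypothetical-version")

print(len(trace_spans), load_count)
```

Four spans are recorded while the expensive load runs only once, which matches the picture of a codepath that is hit every time but does no real work after the first call.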

twin wren Feb 18, 2026, 5:29 PM

#

Ohhh okay make sense

fossil owl Feb 18, 2026, 5:31 PM

#

@tight blade do those withEnv steps do something expensive directly in your code (i.e. not calling dagger APIs but instead calling some python library or similar)? Just wondering whether the 3s overhead is that vs. just the python function being slow to invoke

tight blade Feb 18, 2026, 5:37 PM

#

yes, I spotted the WithEnv call which takes a lot of time; it's a function from our internal python module:

    @function
    async def with_env(
            self,
            name: Annotated[str, Doc("the environment variable name")],
            value: Annotated[str, Doc("the plain-text value, if any")] | None,
            value_secret: Annotated[dagger.Secret, Doc("the secret value, if any")] | None,
    ) -> Self:
        """Add environment variable for a container"""
        if not value and not value_secret:
            raise Exception("value or value_secret should be set")

        if value:
            self._ctr = (
                self.
                _ctr.
                with_env_variable(name, value)
            )
        else:
            self._ctr = (
                self.
                _ctr.
                with_secret_variable(name, value_secret)
            )

        return self

#

and we were using it like this in the apps pipeline:

ctr = await (
            dag.
            python(
                self.pipeline_id,
                self.source,
                base_workdir="/applications",
                sub_path=self._app_name,
                worskpace_pyproject=self.worskpace_pyproject,
                worskpace_uv_lock=self.worskpace_uv_lock,
                libraries=self.libraries_source,
                build_system_packages = self.__SYSTEM_PACKAGES
            ).
            with_discover_python_version().
            install().
            with_env(name="MONGO_MAIN_URI", value_secret=mongo.uri()).
            with_env(name="CORE_ROUTE_BASE", value="http://core-api.service:9000/").
            with_env(name="CATALOG_APP_URL", value="http://catalog-app-url").
            with_env(name="CASHER_HOST", value="http://casher_url").
            with_env(name="INDUS_ROUTE_BASE", value="http://core-api.service:9000/indus").
            with_env(name="CORE_API_ENV_NAME", value="some-env").
            with_env(name="CORE_API_SERVICE_NAME", value="some-service").
            with_env(name="CORE_API_SERVICE_VERSION", value="some-version").
            with_env(name="DISABLE_AWS_AUTHENTICATION", value="1").
            container().
            sync()
        )

#

now I changed it to a new call in our python module:

    @function
    async def with_envs(
            self,
            source: Annotated[dagger.EnvFile, Doc("Collection of environment variables to set")],
    ) -> Self:
        """Add multiple environment variables at once from an EnvFile.

        This is more efficient than calling with_env() multiple times,
        as it applies all variables in a single operation.
        """
        self._ctr = (
            self.
            _ctr.
            with_env_file_variables(source)
        )

        return self

and the pipeline looks like this:

ctr = await (
            dag.
            python(
                self.pipeline_id,
                self.source,
                base_workdir="/applications",
                sub_path=self._app_name,
                worskpace_pyproject=self.worskpace_pyproject,
                worskpace_uv_lock=self.worskpace_uv_lock,
                libraries=self.libraries_source,
                build_system_packages=self.__SYSTEM_PACKAGES,
                python_version=python_version.strip()
            ).
            install().
            with_envs(
                dag.
                env_file().
                with_variable("CORE_ROUTE_BASE", "http://core-api.service:9000/").
                with_variable("CATALOG_APP_URL", "http://catalog-app-url").
                with_variable("CASHER_HOST", "http://casher_url").
                with_variable("INDUS_ROUTE_BASE", "http://core-api.service:9000/indus").
                with_variable("CORE_API_ENV_NAME", "some-env").
                with_variable("CORE_API_SERVICE_NAME", "some-service").
                with_variable("CORE_API_SERVICE_VERSION", "some-version").
                with_variable("DISABLE_AWS_AUTHENTICATION", "1").
                with_variable("MONGO_MAIN_URI", await mongo.uri().plaintext())
            ).
            container().
            sync()
        )
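
One possible variation on the pipeline above, sketched under assumptions: a single module function could accept all variables at once and loop over the core `with_env_variable` / `with_secret_variable` calls internally, which would avoid reading the secret with `plaintext()`. This assumes core API calls made inside one function do not each pay the per-invocation overhead discussed in this thread. `apply_env` and `FakeContainer` are hypothetical; the fake stands in for `dagger.Container` so the sketch runs without an engine, and how a mapping of secrets would be exposed as Dagger function arguments is not covered here.

```python
from typing import Any, Mapping


def apply_env(ctr: Any, plain: Mapping[str, str], secrets: Mapping[str, Any]) -> Any:
    """Apply every variable inside one function body.

    `ctr` is expected to behave like dagger.Container: each with_* call
    returns a new container, so the result is re-assigned on every step.
    """
    for name, value in plain.items():
        ctr = ctr.with_env_variable(name, value)
    for name, secret in secrets.items():
        ctr = ctr.with_secret_variable(name, secret)
    return ctr


class FakeContainer:
    """Hypothetical stand-in for dagger.Container, for illustration only."""

    def __init__(self, env=(), secret_names=()):
        self.env = dict(env)
        self.secret_names = tuple(secret_names)

    def with_env_variable(self, name, value):
        return FakeContainer({**self.env, name: value}, self.secret_names)

    def with_secret_variable(self, name, secret):
        return FakeContainer(self.env, self.secret_names + (name,))


result = apply_env(
    FakeContainer(),
    plain={"CORE_API_ENV_NAME": "some-env", "DISABLE_AWS_AUTHENTICATION": "1"},
    secrets={"MONGO_MAIN_URI": object()},  # placeholder for a dagger.Secret
)
```

In this sketch the secret is attached by name only and never appears among the plain values.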

#

I have this trace: https://dagger.cloud/Dudesons/traces/0a3484164d363b2680c345d013b58637?listen=0d086806be80f5ed&listen=1b395ece412ee816


#

where I can see this is reducing the time on env var setup

#

but to be honest I don't really understand why my initial function had a side effect like that

fossil owl Feb 18, 2026, 5:43 PM

#

tight blade I have this trace: https://dagger.cloud/Dudesons/traces/0a3484164d363b2680c345d0...

Okay nice, yeah that's a good workaround. Now it's just that single .withEnvs taking ~3s rather than a bunch of individual ones taking ~3s each.

The fact that the combined step also takes 3s is pretty good evidence that it is indeed just inherent overhead of python functions. So not the loading step but the actual runtime invocation overhead.

Like we were discussing above there's definitely some improvements we can make in this area, but in the meantime I think your workaround is good
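
A back-of-the-envelope model of that reasoning, assuming a fixed cost of roughly 3s per python module function call (the approximate figure quoted in this thread, not a measured constant). `invocation_overhead` is a hypothetical helper, and the call counts come from the two pipelines posted above.

```python
PER_CALL_OVERHEAD_S = 3.0   # approximate figure quoted in the thread


def invocation_overhead(module_calls, per_call_s=PER_CALL_OVERHEAD_S):
    """Fixed invocation overhead only; excludes time spent doing real work."""
    return module_calls * per_call_s


chained = invocation_overhead(9)   # nine with_env() calls in the first pipeline
batched = invocation_overhead(1)   # one with_envs() call in the second pipeline

print(f"chained: ~{chained:.0f}s, batched: ~{batched:.0f}s, "
      f"saved: ~{chained - batched:.0f}s")
```

The model ignores the other module calls in the chain (such as `install()` and `container()`), so it only estimates the env-var portion of the run.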

tight blade Feb 18, 2026, 5:46 PM

#

ok there is somewhere in the documentation or issue where I can find some overhead like that in the python sdk in order to review our internal modules ?
Previously I was developing modules in golang, but where I currently work people code in python, so I'm trying to stay on the python sdk

fossil owl Feb 18, 2026, 5:48 PM

#

tight blade ok there is somewhere in the documentation or issue where I can find some overhe...

Not currently, sorry, it's mostly an implementation detail at this point and one that will improve as we get bandwidth to address it

tight blade Feb 18, 2026, 5:53 PM

#

ok no problem, it was just to be sure.
So for now, is the idea to reduce the number of chained function calls on my custom python modules, or do I keep what I have and occasionally apply a workaround like I did?
