#Hello! The first step is to upgrade, we'


tight blade
#

Hi, I just upgraded 4 days ago, so the python sdk init is on the latest release, and I tried to rerun some commands locally

weak scaffold
#

OK - so it's still slow then 😅 sorry about that

#

copying <@&946480760016207902> since performance is something we all care about

#

@tight blade next question: is it also very slow on warm cache? Or only on cold cache?

In other words what's the time difference between 1st and 2nd run
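
One way to get that number is to run the same command twice in a row and compare wall-clock times. A minimal sketch: `time_runs` is a hypothetical helper, not part of dagger, and the command below is only a placeholder to swap for whatever `dagger call` invocation you normally run.

```python
import subprocess
import sys
import time


def time_runs(cmd: list[str], runs: int = 2) -> list[float]:
    """Run `cmd` `runs` times back to back and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a failed run is not mistaken for a fast one
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder command so the sketch runs anywhere; replace it with your own dagger invocation.
placeholder_cmd = [sys.executable, "-c", "pass"]
first, second = time_runs(placeholder_cmd)
print(f"1st run: {first:.1f}s, 2nd run: {second:.1f}s, difference: {first - second:.1f}s")
```

On an ephemeral node the first run is the cold-cache case. The second run only counts as warm if it hits the same engine.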

tight blade
#

yes, I can still observe 30-40s between the cli connecting to the engine and seeing the first commands / actions from my dagger functions

#

a warm cache fixes the issue, but when I tried it, it did not work in our case because we have different modules in our monorepo, so if I remember correctly it's avoiding having the sdk in cache

#

if it makes things easier to understand, I can try to draw a kind of ascii diagram of how our dagger modules are organized

weak scaffold
tight blade
#

yes sorry I mean preventing

weak scaffold
#

As long as you're loading the same version of the SDK from all your modules, there's no reason for the SDK to not be cached.

If your engine is running many different kinds of modules, at high volume, you could have a situation where the different modules evict each other from the cache more rapidly

tight blade
#

if I have a gha pipeline that triggers 5 pipelines, and all of these pipelines use the python sdk, is there a way to speed up the sdk initialization?

weak scaffold
tight blade
#

yes, in my case nodes are ephemeral. Maybe I could have something to warm the sdk, but I think I tried that a few months ago and the warm-up only worked at the module level of my repo, not across modules

weak scaffold
tight blade
#

yes, I was wondering whether it's better to initialize the sdk once at boot instead of once per job

#

because sometimes we can have 5-6 pipelines per node; after that I think karpenter boots a new node

weak scaffold
#

The answer to that depends on your CI infra config, so I don't know

#

Looks like your real problem is lack of cache persistence

tight blade
#

ok, so without cache this kind of waiting time is normal? The only thing to do is to find a way to cache?
Currently we are running in EKS using the local NVMe of the instance

#

do you have anything to recommend or explore in order to reduce the init time?

fossil owl
tight blade
#

No, I can try to add a personal token, but currently we are not using dagger cloud in the company

weak scaffold
#

One option is to use the dang scripting language instead of Python. Dang is optimized for speed, there is no codegen phase, and no third party tools to install and configure. If your modules are only orchestrating dagger API calls, and not relying on native python libraries, then it's pretty easy to switch (and you can switch one module at a time)

It won't solve your caching problem, but it will remove the python sdk loading time

fossil owl
weak scaffold
#

Also @tight blade Dagger Cloud has experimental engine hosting, with automatic scale-out and cache persistence. If you want to evaluate that, we can give you early access (we use it in prod for our own CI)

tight blade
#

for dang it seems difficult because sometimes we use native python libraries or have some code logic, and the company develops in python
@fossil owl I will set a token temporarily in order to get some traces and share something with more information
@weak scaffold yes, that could be an option, as we were thinking of moving away from github actions due to some limitations in the workflow design possibilities (I would just need a rough idea of the future cost, and of what I have to change between my current setup and the engine hosting)

weak scaffold
tight blade
#

Ok for me

tight blade
#

@fossil owl I have a small issue in my gha workflow: I don't know why the dagger token is not available in all my steps. I have to go, I will investigate that tomorrow and share some links

tight blade
#

Ok, now I have traces, and looking at the details, if I'm reading the trace correctly, the issue is not in the sdk init

#

but from the GHA UI I still see a significant delay

#

line 28->29

#

I just realized I may have misread the github action output: all the dots printed between line 28->29 are commands / processing, and the engine is just working on different actions like installing packages, setting env vars, etc.

dusty grotto
fossil owl
#

But I do see what you're saying about the various withEnv steps before that taking quite a bit. It seems like each of those is bottlenecked by various uv sync calls... actually that's just one of them, the others don't seem to do much and still take 3s

#

So that overhead might not be the SDK loading per-se, but just the overhead of invoking a python module function at runtime

#

cc @twin wren @snow zodiac I have a vague memory of talking about how the python SDK re-does a bunch of work every time it gets invoked, and that there were theories on how to fix it? I could be totally misremembering though

snow zodiac
twin wren
#

That's quite confusing because the trace shows that the job took 5m28 with the test function taking 5m20.
So is there some time that isn't recorded?

As I understand from the thread, the python sdk loading takes too long, but based on the trace it's only 2 seconds

#

Something very weird that I'm seeing in the trace is this:

fossil owl
twin wren
#

Why does the load sdk runtime keep appearing?

fossil owl
#

it is cached and doesn't do anything, but it hits that codepath each time

twin wren
#

Ohhh okay, makes sense

fossil owl
#

@tight blade do those withEnv steps do something expensive directly in your code (i.e. not calling dagger APIs but instead calling some python library or similar)? Just wondering whether the 3s overhead is that vs. just the python function being slow to invoke
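
One rough way to tell the two apart is to time only the function body. If the body finishes in milliseconds while the trace span shows ~3s, the time is going to invocation rather than to your code. A minimal sketch: `timed` and `with_env_body` are hypothetical names, and whether such a decorator can be stacked under `@function` in a real module is an assumption that has not been checked here. Calling `time.perf_counter()` inline inside the function is the fallback.

```python
import asyncio
import functools
import time


def timed(fn):
    """Print how long the body of an async function takes, excluding caller-side overhead."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__} body took {elapsed_ms:.1f} ms")
    return wrapper


@timed
async def with_env_body(name: str, value: str) -> dict:
    # Stand-in for the real function body; it only records the variable.
    return {name: value}


result = asyncio.run(with_env_body("CASHER_HOST", "http://casher_url"))
```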

tight blade
#

yes, I spotted the WithEnv call which takes a lot of time; it's a function from our internal python module:

    @function
    async def with_env(
            self,
            name: Annotated[str, Doc("the environment variable name")],
            value: Annotated[str, Doc("the plain-text value, if not a secret")] | None,
            value_secret: Annotated[dagger.Secret, Doc("the secret value, if not plain text")] | None,
    ) -> Self:
        """Add environment variable for a container"""
        if not value and not value_secret:
            raise Exception("value or value_secret should be set")

        if value:
            self._ctr = (
                self.
                _ctr.
                with_env_variable(name, value)
            )
        else:
            self._ctr = (
                self.
                _ctr.
                with_secret_variable(name, value_secret)
            )

        return self
#

and we were using it like this in the apps pipeline:

ctr = await (
            dag.
            python(
                self.pipeline_id,
                self.source,
                base_workdir="/applications",
                sub_path=self._app_name,
                worskpace_pyproject=self.worskpace_pyproject,
                worskpace_uv_lock=self.worskpace_uv_lock,
                libraries=self.libraries_source,
                build_system_packages = self.__SYSTEM_PACKAGES
            ).
            with_discover_python_version().
            install().
            with_env(name="MONGO_MAIN_URI", value_secret=mongo.uri()).
            with_env(name="CORE_ROUTE_BASE", value="http://core-api.service:9000/").
            with_env(name="CATALOG_APP_URL", value="http://catalog-app-url").
            with_env(name="CASHER_HOST", value="http://casher_url").
            with_env(name="INDUS_ROUTE_BASE", value="http://core-api.service:9000/indus").
            with_env(name="CORE_API_ENV_NAME", value="some-env").
            with_env(name="CORE_API_SERVICE_NAME", value="some-service").
            with_env(name="CORE_API_SERVICE_VERSION", value="some-version").
            with_env(name="DISABLE_AWS_AUTHENTICATION", value="1").
            container().
            sync()
        )
#

now I changed it to a new call in our python module:

    @function
    async def with_envs(
            self,
            source: Annotated[dagger.EnvFile, Doc("Collection of environment variables to set")],
    ) -> Self:
        """Add multiple environment variables at once from an EnvFile.

        This is more efficient than calling with_env() multiple times,
        as it applies all variables in a single operation.
        """
        self._ctr = (
            self.
            _ctr.
            with_env_file_variables(source)
        )

        return self

and the pipeline looks like this:

ctr = await (
            dag.
            python(
                self.pipeline_id,
                self.source,
                base_workdir="/applications",
                sub_path=self._app_name,
                worskpace_pyproject=self.worskpace_pyproject,
                worskpace_uv_lock=self.worskpace_uv_lock,
                libraries=self.libraries_source,
                build_system_packages=self.__SYSTEM_PACKAGES,
                python_version=python_version.strip()
            ).
            install().
            with_envs(
                dag.
                env_file().
                with_variable("CORE_ROUTE_BASE", "http://core-api.service:9000/").
                with_variable("CATALOG_APP_URL", "http://catalog-app-url").
                with_variable("CASHER_HOST", "http://casher_url").
                with_variable("INDUS_ROUTE_BASE", "http://core-api.service:9000/indus").
                with_variable("CORE_API_ENV_NAME", "some-env").
                with_variable("CORE_API_SERVICE_NAME", "some-service").
                with_variable("CORE_API_SERVICE_VERSION", "some-version").
                with_variable("DISABLE_AWS_AUTHENTICATION", "1").
                with_variable("MONGO_MAIN_URI", await mongo.uri().plaintext())
            ).
            container().
            sync()
        )
#

and I can see this reduces the time spent on env var setup
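
If the variable list grows, the chained `with_variable` calls can be built from a dict. A minimal sketch: `build_env_file` is a hypothetical helper, and `FakeEnvFile` is only a stand-in so the snippet runs outside dagger. In the module, the first argument would be `dag.env_file()` as in the pipeline above.

```python
from functools import reduce


def build_env_file(env_file, variables: dict[str, str]):
    """Chain one with_variable call per entry and return the resulting env file object."""
    return reduce(
        lambda ef, item: ef.with_variable(item[0], item[1]),
        variables.items(),
        env_file,
    )


class FakeEnvFile:
    """Hypothetical stand-in for the object returned by dag.env_file()."""

    def __init__(self, variables=None):
        self.variables = dict(variables or {})

    def with_variable(self, name: str, value: str) -> "FakeEnvFile":
        # Return a new object instead of mutating, mirroring the chained style above.
        return FakeEnvFile({**self.variables, name: value})


env = build_env_file(
    FakeEnvFile(),
    {
        "CORE_ROUTE_BASE": "http://core-api.service:9000/",
        "DISABLE_AWS_AUTHENTICATION": "1",
    },
)
print(env.variables)
```

The result would then be passed to the single `with_envs(...)` call, so the module function is still invoked only once.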

#

but to be honest I don't really understand why my initial function had a side effect like that

fossil owl
# tight blade I have this trace: https://dagger.cloud/Dudesons/traces/0a3484164d363b2680c345d0...

Okay nice, yeah that's a good workaround. Now it's just that single .withEnvs taking ~3s rather than a bunch of individual ones taking ~3s.

The fact that the combined step also takes 3s is pretty good evidence that it is indeed just inherent overhead of python functions. So not the loading step but the actual runtime invocation overhead.
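
As a rough back-of-the-envelope model of that, assuming a fixed ~3s cost per module function call (the figure seen in the trace, not a guaranteed constant) and counting only the env var steps:

```python
def env_setup_seconds(calls: int, overhead_per_call: float = 3.0) -> float:
    """Estimated invocation overhead alone, assuming a fixed cost per module function call."""
    return calls * overhead_per_call


# The original pipeline chained nine with_env calls; the workaround makes one with_envs call.
before = env_setup_seconds(9)
after = env_setup_seconds(1)
print(f"before: ~{before:.0f}s, after: ~{after:.0f}s, saved: ~{before - after:.0f}s")
# prints "before: ~27s, after: ~3s, saved: ~24s"
```

`env_setup_seconds` is a hypothetical helper for illustration. The real numbers should come from the trace.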

Like we were discussing above there's definitely some improvements we can make in this area, but in the meantime I think your workaround is good

tight blade
#

ok, is there somewhere in the documentation or an issue where I can read about overhead like that in the python sdk, in order to review our internal modules?
Because previously I was developing modules in golang, but where I'm working now people code in python, so I'm trying to stay on the python sdk

fossil owl
tight blade
#

ok no problem, it was just to be sure.
So the idea for now is either to reduce the number of chained function calls on my custom python modules, or to keep them as they are and sometimes apply a workaround like I did?