#Dagger Python SDK pypi mirror

silver swallow
#

Hey everyone! I am working on a Dagger integration within our corporate environment and have an issue with the Python SDK referencing the pypi.org and pythonhosted.org indexes. Internally, we have a hosted PyPI mirror, and direct access to public mirrors is not possible in our CI environment. Is there a workaround to instruct the Dagger engine to use a proxy, similar to what is done for Golang? I have read some tickets and documentation regarding this issue, and it seems the only solution is to configure HTTP_PROXY. Is there any simpler way to achieve this? Or could you share how exactly the proxy needs to be configured to work smoothly with the other Dagger engine functionality? PS: [[tool.uv.index]] is used at the module level and works as expected, but the issue is that the Python SDK libraries still require the public indexes.

(I moved my question from #general to here. I guess this is a better place for asking questions.)

manic relic
#

👋 double posting from general here: hey there! if HTTP_PROXY is an option for you I'd try to go this route since it's globally more consistent across all the components of the Dagger ecosystem.

We currently don't have another way to set custom indices or URLs that our core SDKs use.

Maybe an alternative could be to fork our python SDK in the meantime and use your fork with modified URLs but not sure if that has been tried before. cc @glossy robin

silver swallow
#

Thanks for the hint @manic relic. I will look into it (forking the Python SDK).
It seems ordinary proxy forwarding is not possible, and I was trying to redirect pypi requests to our internal repository using squid, but this is a rabbit hole (couldn't make it work, yet).

Currently I'm looking into creating a custom base Python image with PIP_INDEX_URL and UV_INDEX_URL set as env variables pointing to our internal PyPI index. I see that it's possible to define the image through the [tool.dagger].base-image configuration in the module's pyproject.toml.
Is such an approach possible, and would these env vars be honored when uv is installing dependencies?

glossy robin
#

I don't think you need to fork, this is supported.

[[tool.uv.index]] is used on the module level and works as expected but the issue is that Python SDK libraries are still requiring public indexes.

That setting should set it globally on the base runtime container. Can you share your pyproject.toml?

silver swallow
#

Here is pyproject.toml. I'm using [[tool.uv.index]] that points to our internal Nexus repository:
```toml
[project]
name = "..."
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["dagger-io"]

[tool.uv.sources]
dagger-io = { path = "sdk", editable = true }

[[tool.uv.index]]
name = "nexus"
url = "https://nexus.company.net/repository/pypi-blessed/simple"
defult = true

[build-system]
requires = ["hatchling==1.25.0"]
build-backend = "hatchling.build"
```

#

Also, just running dagger develop produces an sdk folder with a uv.lock pointing to the official repositories.

glossy robin
#

Ok, try removing the name field.

silver swallow
#

... manually adding [[tool.uv.index]] to sdk/pyproject.toml and running sync solves the local issue.

glossy robin
#
```toml
[[tool.uv.index]]
url = "https://nexus.company.net/repository/pypi-blessed/simple"
defult = true
```
silver swallow
#

After removing name I don't see a difference locally. It still produces the same sdk/uv.lock.

glossy robin
#

I'm not sure that matters

#

Do you see the mirror in the module's uv.lock after dagger develop?

silver swallow
#

No. I also tried restarting the engine to avoid caching issues (not sure when things are cached).

glossy robin
#

But if you try dagger functions it returns an error?

silver swallow
#

Locally, I don't have a problem with accessing pypi and dagger call works. Issue is on CI. I was expecting to see a different index used also in /sdk/uv.lock if I configure this correctly.

glossy robin
#

/sdk/uv.lock won't change. That's bundled.

silver swallow
#

PS: the non-SDK uv.lock is using the correct index. On CI we have a problem with dependencies that are required by the SDK itself, not the ones that are defined in the module's pyproject.toml.

glossy robin
#

Do you have any env vars for uv set up locally?

silver swallow
#

No. On CI we are dropping all connections to external/public network.

glossy robin
#

Have you set up a custom runner locally?

silver swallow
#

You mean engine.toml? Only mirrors for Docker images are configured there, and that works OK. Nothing else is configured.

glossy robin
#

Just trying to figure out what you have in CI that's different.

#

What error do you get on CI?

silver swallow
#
```yaml
    image: docker.company.net/dagger-engine:v0.15.3
    container_name: custom-dagger-engine
    privileged: true
    volumes:
      - ./docker/docker_agent/files/etc/dagger-engine.toml:/etc/dagger/engine.toml
      - ./docker/docker_agent/files/certs/certs.crt:/usr/local/share/ca-certificates/company.crt
      - ./docker/docker_agent/files/certs/certs.crt:/etc/ssl/certs/company.crt
      - /var/lib/dagger:/var/lib/dagger
    environment:
      - _DAGGER_ENGINE_SYSTEMENV_GOPROXY=https://athens-proxy.company.net/
```
silver swallow
#

This is basically how the engine is started as a Docker container.

glossy robin
#

Is that CI or local?

silver swallow
#

CI

#

btw. dagger with go module works ok

#

we are now trying to also use python SDK....

glossy robin
#

Have you tried running again on CI after removing the "name" field in tool.uv.index?

silver swallow
#
```
45  : [1.3s] |   × Failed to download `graphql-core==3.2.5`
45  : [1.3s] |   ├─▶ Failed to fetch:
45  : [1.3s] |   │   `https://files.pythonhosted.org/packages/e3/dc/078bd6b304de790618ebb95e2aedaadb78f4527ac43a9ad8815f006636b6/graphql_core-3.2.5-py3-none-any.whl`
45  : [1.3s] |   ├─▶ Request failed after 3 retries
45  : [1.3s] |   ├─▶ error sending request for url
45  : [1.3s] |   │   (https://files.pythonhosted.org/packages/e3/dc/078bd6b304de790618ebb95e2aedaadb78f4527ac43a9ad8815f006636b6/graphql_core-3.2.5-py3-none-any.whl)
45  : [1.3s] |   ├─▶ client error (Connect)
45  : [1.3s] |   ╰─▶ Connection reset by peer (os error 104)
45  : [1.3s] |   help: `graphql-core` (v3.2.5) was included because `codegen` (v0.0.0)
45  : [1.3s] |         depends on `graphql-core`
```
#

I will try it in a few minutes. This is the error I was getting on CI.

glossy robin
#

Without the name field it should work. I think that was the issue.

silver swallow
#

That would be great. Will come back with the results.

#

Thanks

silver swallow
#

The same issue exists on CI. Interestingly, when I configured the mirror without authentication in the URL, it failed to retrieve hatchling. However, after passing the username and password in the URL, it seems to pass this first step and then fails to retrieve the graphql_core library. The library is available on our company Nexus, and is also found locally when I update sdk/pyproject.toml to use the company index.

#

I see in logs that UV_EXTRA_INDEX_URL is correctly set and passed as an env variable to the runner.

glossy robin
#

How were you passing the credentials?

#

Env vars?

silver swallow
#

As part of URL - https://<username>:<password>@....
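
When credentials are embedded in an index URL like this, characters such as `@` or `/` in the username or password need to be percent-encoded, otherwise the URL is parsed incorrectly. A minimal sketch of building such a URL; the function name and the sample credentials are made up for illustration, and the host is the one from the pyproject.toml shared earlier:

```python
from urllib.parse import quote


def index_url_with_credentials(base_url: str, username: str, password: str) -> str:
    """Return base_url with percent-encoded credentials placed before the host."""
    scheme, rest = base_url.split("://", 1)
    # safe="" makes quote() also encode "/" in addition to "@" and ":".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{rest}"


# Hypothetical credentials, for illustration only.
url = index_url_with_credentials(
    "https://nexus.company.net/repository/pypi-blessed/simple",
    "ci-bot",
    "p@ss/word",
)
print(url)
```

Keeping the password in a CI secret and assembling the URL at runtime avoids committing it to pyproject.toml.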

glossy robin
#

I mean locally

#

When you said it was working.

silver swallow
#

same

glossy robin
#

But what are you referring as the "runner"?

silver swallow
#

I'm not sure how this is executed or what the internal architecture is. It's basically what I see in the logs after dagger call is executed in the pipeline. So it should be local to the Dagger engine container.

#

Nothing fancy is currently set up: one node and a Docker container for the engine.

glossy robin
#

Your uv sync isn't being run in the Dagger engine container. The Dagger engine creates a new container to run the Python SDK in. Every module has its own container, and each SDK tells the engine how to build that container.

#

When the Python SDK detects [[tool.uv.index]] in a module (no name, default=true), it adds UV_INDEX_URL to the module's container. So when it installs from sdk/uv.lock, even though sdk/pyproject.toml doesn't have your mirror setup, it does have the UV_INDEX_URL env var when uv runs to install graphql-core. Then when your module's uv.lock itself is synced, it has both (i.e., the env var and the setting).

silver swallow
#

Ok, I think I understand. I was also trying to set ENV variables directly on engine, but this is pointless.

glossy robin
#

Yes, it's pointless. We don't currently have a way to pass env vars to module containers.

#

But it can be done, with a bit of customization.

silver swallow
#

So which direction should I look in? Is the UV_EXTRA_INDEX_URL env variable that's being passed to this container not correct, then?

#

Because the value is OK, and it seems to be retrieved from pyproject.toml.

glossy robin
#

If you see that one, it suggests to me you're not using default = true which is strange because you shared above that you had it. Unless you changed it.

#

If it's not the default it means it'll have lower priority, so the public pypi.org will be tried first.

silver swallow
#

Yes, just saw it 😦

#

Now it's passing UV_INDEX_URL, but it's still trying to fetch from https://files.pythonhosted.org. I will try restarting the engine to see if this helps (again, not sure what is cached :))

glossy robin
#

I don't think restarting will help.

#

Do you have a dagger cloud trace?

silver swallow
#

No, we don't have it yet.

#

It's the same issue. Somehow hatchling is retrieved if the mirror is correctly set, but it fails with other deps: it's still trying to fetch from https://files.pythonhosted.org, and these package locations are from pypi.org. It seems that it still uses uv.lock somehow. The module's uv.lock is using the correct mirror, though.

#

Hmm, is it maybe connected to using a local module as a dependency?

#
```json
    {
      "name": "some-module",
      "source": "../../some-module"
    }
  ],
```
silver swallow
#

I tried using a minimal example and still encountered the same issue. Not sure where the problem is.

silver swallow
#

Hey. After some more digging, I think I found the issue. It seems there is no workaround other than preparing a forked Python SDK.
As I see it, the issue is that the Python SDK runtime is initialized by invoking uv --freeze, which uses the bundled uv.lock with official pypi URLs.
Since the runtime is initialized on our CI (where there is no access to pypi.org), the dagger call breaks before actually touching the code and installing project-specific dependencies.
UV_INDEX_URL is passed to the Python base container, but it does not help when uv installs deps defined in a lock file.
To validate this, I manually updated uv.lock in the dagger-engine snapshots volume (/var/lib/dagger/worker/snapshots/snapshots/4/fs) using our internal index. After this change, I successfully ran our pipeline using the python sdk on CI.
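
One way to see which hosts a lock file will reach for, before and after such a change, is to count the hostnames of the URLs it pins. A minimal sketch: the sample text below is an abbreviated, made-up excerpt in the shape of a uv.lock entry (real entries carry hashes and full file paths), and the function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Match http(s) URLs up to the next whitespace or quote character.
URL_RE = re.compile(r"https?://[^\s\"']+")


def hosts_in_lock(lock_text: str) -> Counter:
    """Count how often each hostname appears in the URLs of a lock file."""
    return Counter(urlsplit(url).hostname for url in URL_RE.findall(lock_text))


# Abbreviated, illustrative excerpt; not copied from a real uv.lock.
sample_lock = """
[[package]]
name = "graphql-core"
version = "3.2.5"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/aa/graphql_core-3.2.5.tar.gz" }
wheels = [
    { url = "https://files.pythonhosted.org/packages/bb/graphql_core-3.2.5-py3-none-any.whl" },
]
"""

for host, count in hosts_in_lock(sample_lock).most_common():
    print(host, count)
```

Reading the real file with `pathlib.Path("sdk/uv.lock").read_text()` and passing it to `hosts_in_lock` would show whether any public hosts remain pinned.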

manic relic
#

I'm a bit surprised that UV_INDEX_URL seems to be ignored with uv --freeze 🤔

silver swallow
#

Nice. I was already looking into building a custom dagger-engine, which would also be an okay-ish temporary solution for us.
I don't know the details, but a general solution could be to ship the SDK deps together with the dagger-engine (which would also improve initialization time). Ignoring the lock file altogether to make this work without pypi.org connectivity is probably also a security risk.

#

ps. If pypi.org is not available for some reason (enterprise network), dagger init --sdk=python also fails with the same issue. In this case, you don't even have any project-specific configuration (custom uv index).

glossy robin
#

Hmm... there are other people using a mirror and it seemed like they didn't need a custom image to make it work. Note that the sdk/uv.lock is only used for codegen, i.e., it's only there to get the pinned version of a single dependency (graphql-core). The other SDK dependencies are pinned in your module's uv.lock. I plan on centralizing codegen across SDKs, meaning that dependency would no longer be necessary for codegen. But until then, I may just vendor that library so it doesn't need to reach for the package index.