mount custom devices | Dagger | Page 1

glacial hatch Mar 7, 2024, 7:57 PM

#

Just to confirm, you're aware of existing GPU support, and want to generalize it beyond specific hardcoded devices?

candid spade Mar 7, 2024, 7:58 PM

#

Yes.

#

We use things like Intel GPUs and AMD GPUs that don't use CUDA.

#

We also have things like funky network cards that have CPUs in them and support remote direct memory access.

#

I'm looking for a path to add these.

#

in a way that isn't hard-coded for each repo.

glacial hatch Mar 7, 2024, 8:02 PM

#

Our approach has been to specifically not expose general device access in the API (where the caller can just pass the path as-is) because I worry that it will introduce fragmentation: "this module works on machines of this type, make sure to run this out-of-platform command on the host system", stuff like that.

#

So my preferred approach would be to generalize gradually, by expanding the abstraction, without removing it

#

For example I wouldn't mind having a WithKVM call

#

Maybe my fragmentation fear is excessive; you could argue that /dev/xx is already a well-defined API that won't cause fragmentation, no different from e.g. Linux syscalls. But I don't have enough confidence that that is true.

candid spade Mar 8, 2024, 5:49 PM

#

Fair enough, but my understanding from reading the GPU implementation is that 1) it requires a custom engine build, and 2) it's really hard-coded for NVIDIA. 3) If we had the ability to mount devices (and some host driver libraries) from the host, we wouldn't have to wait on upstream to add these.

While I agree that NVIDIA is the leader commercially, 4 of the top 5 most powerful supercomputers in the world (including the top 2) do not use NVIDIA accelerators. They use AMD or Intel accelerators, and in the case of Fugaku in Japan, a custom home-grown accelerator. And all of the top 10 use network interconnects that require certain devices to be mounted instead of traditional TCP/IP, and that doesn't include some of the other accelerator types like TPUs, Cerebras wafers, etc. And as much as people try to make codes interoperable (it's amazing how interoperable PyTorch is), it's not perfect.

While there is value in modules being supported everywhere, for users like me, there is value in being able to use the same modules on different systems which share a given hardware type because of how allocations work on these machines.

glacial hatch Mar 8, 2024, 5:52 PM

#

@candid spade is there any chance we could have our cake and eat it too? Meaning 1) Dagger supports the diversity of devices that you need, out of the box, and 2) we don't open the Pandora's box of each host system's device API leaking into the Dagger API?

candid spade Mar 8, 2024, 5:58 PM

#

Could Dagger just support whatever is needed to use https://github.com/cncf-tags/container-device-interface?

#

In other words, we configure device support in the container runtime, and pass a list of device vendors to load in Dagger as strings?
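To make the "devices as strings" idea concrete: CDI identifies a device by a fully qualified name of the form `vendor/class=name`, for example `nvidia.com/gpu=0`. The Go sketch below only splits such a string into its parts. It is an illustration, not Dagger's API or the CDI library's parser: `CDIDevice`, `ParseCDIDevice`, and the `vendor.example/nic=rdma0` name are hypothetical, and the stricter character rules from the CDI spec are not enforced.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CDIDevice holds the three parts of a fully qualified CDI device name,
// which has the form "vendor/class=name" (for example "nvidia.com/gpu=0").
// The type name is hypothetical and used only for this sketch.
type CDIDevice struct {
	Vendor string
	Class  string
	Name   string
}

// ParseCDIDevice splits a qualified name into vendor, class, and name.
// It only checks that each part is present and non-empty; it does not
// validate the allowed characters defined by the CDI spec.
func ParseCDIDevice(s string) (CDIDevice, error) {
	kind, name, ok := strings.Cut(s, "=")
	if !ok || name == "" {
		return CDIDevice{}, errors.New("missing \"=name\" part in " + s)
	}
	vendor, class, ok := strings.Cut(kind, "/")
	if !ok || vendor == "" || class == "" {
		return CDIDevice{}, errors.New("missing \"vendor/class\" part in " + s)
	}
	return CDIDevice{Vendor: vendor, Class: class, Name: name}, nil
}

func main() {
	// The second name is a made-up example of a non-GPU device;
	// the third is deliberately invalid.
	names := []string{"nvidia.com/gpu=0", "vendor.example/nic=rdma0", "gpu0"}
	for _, s := range names {
		d, err := ParseCDIDevice(s)
		if err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("%s -> vendor=%s class=%s name=%s\n", s, d.Vendor, d.Class, d.Name)
	}
}
```

With names like these, the host-specific details (device nodes, driver libraries) would stay in the CDI configuration on the host, and a caller would only pass the strings.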

glacial hatch Mar 8, 2024, 10:16 PM

#

That looks like the kind of abstraction that could help, yes. I think we actually discussed this exact spec with @glass coyote @whole cradle @earnest hemlock at some point

#

and @sleek coyote who contributed GPU support in the first place

#

To your other point about GPU access requiring a custom engine build: that is just because we're being cautious in rolling out changes to the engine. The goal is to unify so that we have only one engine build for everyone, and it supports GPU out of the box.

I don't know if we are following a pre-established calendar for getting there, or if maybe we just collectively dropped the ball and forgot. @vital drift would know

vital drift Mar 9, 2024, 8:15 AM

#

glacial hatch To your other point about GPU access requiring a custom engine build: that is ju...

It's great to see the GPU variant of the Engine gaining popularity! Now that we see growing interest from the Dagger community in this use case, I am keen to spend some time continuing down this path and trying the next step in the https://github.com/cncf-tags/container-device-interface integration. Would you be up for a 30-minute sync sometime next week, @candid spade? cc @gritty smelt

candid spade Mar 9, 2024, 1:46 PM

#

I'm in ET, would 4pm or 4:30pm on Tuesday work? @vital drift

vital drift Mar 11, 2024, 8:09 PM

#

candid spade I'm in ET, would 4pm or 4:30pm on Tuesday work? @vital drift

Following-up via DM

potent tangle Jan 21, 2025, 4:05 AM

#

hey folks - new to Dagger and was wanting to port some integration tests of some virtualization code and found this thread - has there been any movement on being able to support KVM inside Dagger containers (i.e. passing host devices such as /dev/kvm through to the container)?

earnest hemlock Jan 21, 2025, 4:51 AM

#

potent tangle hey folks - new to dagger and was wanting to port some integration tests of some...

/dev/kvm is one of the "special" mountpoints that gets passed to the containers if you set the InsecureRootCapabilities flag in WithExec
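One way to check this from inside a container is a small probe that reports whether /dev/kvm is actually visible. The Go sketch below uses only the standard library and does not call the Dagger API; the assumption is that you would build it and run it as the command of a WithExec step, once with the InsecureRootCapabilities flag set and once without, and compare the results. `DeviceStatus` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
)

// DeviceStatus reports whether path exists and is a character device.
// Device nodes such as /dev/kvm are character devices on Linux.
func DeviceStatus(path string) string {
	info, err := os.Stat(path)
	if err != nil {
		if os.IsNotExist(err) {
			return "missing"
		}
		return "error: " + err.Error()
	}
	if info.Mode()&os.ModeCharDevice == 0 {
		return "not a character device"
	}
	return "present"
}

func main() {
	// Default to /dev/kvm; allow another device path as the first argument.
	path := "/dev/kvm"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	fmt.Printf("%s: %s\n", path, DeviceStatus(path))
}
```

A "present" result only shows that the device node is there; whether the process can open it still depends on permissions inside the container.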
