Device pass through | Dagger | Page 1

opal dove Apr 18, 2024, 5:12 PM

#

Haven't tried it yet, but you mean that with dagger I should be able to pass through the FPGA device?

naive zealot Apr 18, 2024, 5:15 PM

#

Dagger doesn't support that today, but I'm saying it would be cool if it did 🙂

#

There's a related design discussion, which is to pass through the native GPU from the client to the container, using WebGPU. That would mean, for example, allowing a container to use the Apple Silicon GPU of a Mac client.

As you can imagine there is a lot of demand for that, with generative AI workloads being all the rage. I'm thinking once we ship that (no timeline at the moment), other devices could benefit.

steady plank Apr 19, 2024, 12:15 AM

#

This is the same feature needed in modules to support KVM for cross-OS modules (e.g. Windows/macOS in Dagger), as well as GPUs from non-Nvidia vendors, and accelerator network cards used in high-performance computing. I’m hyped to hear you think adding this would be cool.

CC: @shell egret

naive zealot Apr 19, 2024, 12:16 AM

#

steady plank This is the same feature needed in modules to support KVM for cross os modules (...

Yes, to be clear, this would use WebGPU to stream the client's GPU to the container

steady plank Apr 19, 2024, 12:17 AM

#

Here the guy is talking about FPGAs

#

Though WebGPU is another cool use case

naive zealot Apr 19, 2024, 12:18 AM

#

Yes, I know. I'm pointing out a potential difference from past discussions about device access, which is that the device would not be mounted into the container from the BuildKit/runc host

#

I love the clean sandboxing of client streaming. Not sure if it works for all devices. But it would be a fantastic way to separate concerns between "dumb" container runners and "smart" clients

steady plank Apr 19, 2024, 12:20 AM

#

Has anyone actually tested WebGPU as a backend for Megatron-DeepSpeed or the other AI frameworks? My impression from talking to people who run those workloads was that you need actual CUDA, HIP, or oneAPI for that to work

naive zealot Apr 19, 2024, 12:20 AM

#

steady plank Has anyone actually tested webgpu as a backend for megatron deepspeed or the oth...

No idea. If it works in mainstream browsers today, it will work on Dagger by the end of this year 🤞. If not, it probably won't.

steady plank Apr 19, 2024, 12:23 AM

#

Also, those frameworks often need NCCL or MPI to be efficient, which is yet another set of devices to support: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html
