# Need Help: Optimizing CPU-GPU Communication for LLM Training


wicked obsidian

Does anyone here work with LLMs? I'm relatively new to training models and have run into a serious problem: slow communication between the CPU and GPU. Data transfers between the two have high latency and low throughput. I've gone through a lot of resources looking for an effective solution, but so far without success. Does anyone have ideas, pointers to relevant reading, or proven methods for optimizing this?
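A common first step against host-to-device transfer overhead is to prepare and copy the next batch while the current one is still being processed, so the GPU is not left waiting on the copy. In PyTorch the built-in versions of this are `DataLoader(..., num_workers=N, pin_memory=True)` together with `batch.to(device, non_blocking=True)` (non-blocking copies are only truly asynchronous from pinned host memory). The sketch below shows the underlying prefetching pattern in plain Python under stated assumptions: `prefetch` and `transfer` are hypothetical names, and `transfer` stands in for whatever moves a batch to the device (for example `lambda b: b.to("cuda", non_blocking=True)`). A Python thread only helps when that work releases the GIL, as I/O and CUDA copies do.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

_DONE = object()  # sentinel marking the end of the stream


def prefetch(batches: Iterable[T], transfer: Callable[[T], U], depth: int = 2) -> Iterator[U]:
    """Yield transfer(batch) for each batch, in order.

    A background thread runs transfer() on upcoming batches while the caller
    is still working on the current one; at most `depth` transferred batches
    wait in the queue, which bounds the extra memory used for staging.
    """
    staged: "queue.Queue[object]" = queue.Queue(maxsize=depth)
    errors: List[BaseException] = []

    def producer() -> None:
        try:
            for batch in batches:
                staged.put(transfer(batch))  # blocks while the queue is full
        except BaseException as exc:  # hand the error over to the consumer
            errors.append(exc)
        finally:
            staged.put(_DONE)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = staged.get()
        if item is _DONE:
            break
        yield item
    worker.join()
    if errors:
        raise errors[0]


if __name__ == "__main__":
    import time

    def fake_transfer(batch):
        time.sleep(0.01)  # stands in for a host-to-device copy
        return [x * 2 for x in batch]

    for out in prefetch([[1, 2], [3, 4], [5, 6]], fake_transfer):
        print(out)  # "compute" on a batch that was already transferred
```

The bounded queue is the key design choice: a larger `depth` hides more latency spikes but keeps more batches resident in memory at once.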

mellow parcel

I recently came across YaFSDP (Yet another Fully Sharded Data Parallel). As I understand it, their extension addresses the inefficient utilization of the communication channels between GPUs during training. I haven't tested it myself yet, but I plan to very soon to see how much it speeds things up. Looks promising. https://github.com/yandex/YaFSDP
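For anyone unfamiliar with the fully sharded data parallel idea (as in PyTorch's `torch.distributed.fsdp.FullyShardedDataParallel`, and per its name in YaFSDP): each of N GPUs keeps only 1/N of the parameters, gathers the full set with an all-gather when it needs them, and receives only its slice of the summed gradients via a reduce-scatter. Note that these collectives move data between GPUs, not between host and device. The toy simulation below is an illustration under stated assumptions, not YaFSDP's implementation: plain Python lists stand in for GPU tensors, list operations stand in for NCCL collectives, and all function names are hypothetical.

```python
from typing import List

Shard = List[float]


def shard(flat: List[float], world_size: int) -> List[Shard]:
    """Split a flat parameter vector into `world_size` equal shards (rank r keeps shards[r]).

    The tail is zero-padded so every shard has the same length, since collectives
    such as all-gather and reduce-scatter expect equal-sized chunks.
    """
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = list(flat) + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[Shard], numel: int) -> List[float]:
    """Concatenate every rank's shard and drop the padding.

    In FSDP this happens just before a layer's forward/backward and the full
    copy is freed afterwards; here we simply return it.
    """
    return [x for s in shards for x in s][:numel]


def reduce_scatter(per_rank_grads: List[List[float]], world_size: int) -> List[Shard]:
    """Sum the full-size gradients computed on each rank, then give rank r only
    the slice matching its parameter shard (real implementations usually average)."""
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    return shard(summed, world_size)


def local_sgd_step(param_shards: List[Shard], grad_shards: List[Shard], lr: float) -> List[Shard]:
    """Each rank updates only the slice it owns, so optimizer state is sharded too."""
    return [[p - lr * g for p, g in zip(ps, gs)] for ps, gs in zip(param_shards, grad_shards)]


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0, 5.0]
    shards = shard(params, world_size=2)          # [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
    full = all_gather(shards, numel=len(params))  # each rank sees all 5 values
    grads = [[1.0] * 5, [3.0] * 5]                # local gradients from 2 ranks
    grad_shards = reduce_scatter(grads, world_size=2)
    shards = local_sgd_step(shards, grad_shards, lr=0.25)
    print(all_gather(shards, numel=len(params)))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

The trade-off this illustrates is memory for communication: each GPU stores far fewer parameters, but every step adds an all-gather and a reduce-scatter, which is why how efficiently those collectives use the inter-GPU links matters.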
