#P2P is disabled between NVLINK connected GPUs 1 and 0


brisk kettle

Hey team! Could you fix the NVLink issue on H100 SXM Community pods? I run into this error frequently. Affected pod ID: 4a5acwxj2kene6
P2P is disabled between NVLINK connected GPUs 1 and 0. This should not be the case given their connectivity, and is probably due to a hardware issue. If you still want to proceed, you can set NCCL_IGNORE_DISABLED_P2P=1.

I can proceed by setting NCCL_IGNORE_DISABLED_P2P=1, but this drops performance by about 10%.
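
For reference, a minimal sketch of applying that workaround from Python. It assumes the job is launched from a Python script and that the variable is set before NCCL initializes; the helper name `enable_p2p_workaround` is hypothetical and only for illustration.

```python
import os


def enable_p2p_workaround(env=os.environ):
    """Set the flag named in the NCCL error message, unless it is already set.

    `enable_p2p_workaround` is a hypothetical helper, not part of NCCL or PyTorch.
    """
    # This has to run before the first NCCL communicator is created,
    # for example before torch.distributed.init_process_group(...).
    env.setdefault("NCCL_IGNORE_DISABLED_P2P", "1")
    return env["NCCL_IGNORE_DISABLED_P2P"]


if __name__ == "__main__":
    print("NCCL_IGNORE_DISABLED_P2P =", enable_p2p_workaround())
```

Using `setdefault` keeps any value already exported in the shell, so the script does not silently override an explicit setting.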

misty token

Forwarded it to the team.

misty token

@brisk kettle We got a response: apparently GPU 5 does not support P2P.
For now, our advice is to pick a different machine.
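
When trying a different machine, a quick pre-flight check along these lines might help spot the problem before a long job starts. This is only a sketch: it assumes PyTorch is installed on the pod, `find_disabled_p2p_pairs` is a hypothetical helper, and it queries CUDA peer access through `torch.cuda.can_device_access_peer`, which may not match NCCL's own check exactly.

```python
from itertools import permutations


def find_disabled_p2p_pairs(num_gpus, can_access):
    """Return ordered (src, dst) GPU index pairs for which peer access is unavailable.

    `can_access(src, dst)` is any callable returning True when src can reach dst.
    """
    return [
        (src, dst)
        for src, dst in permutations(range(num_gpus), 2)
        if not can_access(src, dst)
    ]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("PyTorch with CUDA is not available; nothing to check.")
    else:
        bad = find_disabled_p2p_pairs(
            torch.cuda.device_count(),
            torch.cuda.can_device_access_peer,
        )
        print("GPU pairs without P2P:", bad if bad else "none")
```

If the list is non-empty on a fresh pod, switching machines again is probably cheaper than running with the performance penalty.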

brisk kettle

@misty token ok, thanks