Hey folks. I've been attempting my first project but have already hit a brick wall. I'm trying to load the Llama 4 Scout model, but my GPU (RTX 5070 Ti) refuses to take any portion of the load. I've confirmed (many times) that my PyTorch build is compatible, or at least as compatible as I think it can be: ver. 2.8.0.dev20250408+cu128. Copilot is stumped and Gemini insists on loading a non-weighted model, but this doesn't address my concern that my processing power is capping my memory/disk (the GPU sits idle and the CPU sits at 50%). Can anyone provide some insight?
#GPU Help requested
Hello, I'm facing the same problem (or a similar one), so I'll help you debug yours. Could you please specify where you are running your GPU? Is it your own GPU, or a rented one (e.g. from SageMaker or RunPod)? What's your goal with this? Based on what I've faced, the error usually lies in how the GPU is configured. The GPU is most likely working and connected; maybe check with Gemini to verify that you are using your GPU properly?
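One way to check the configuration question above without relying on a chatbot is to ask PyTorch directly what it sees. This is a minimal sketch, assuming PyTorch is installed in the same environment the model is loaded from; `gpu_report` is a made-up helper name, and the script only reports facts, it does not fix anything.

```python
def gpu_report():
    """Collect basic facts about the PyTorch install and the GPU it can see."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version the wheel was built against; None means a CPU-only build.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Free and total VRAM in bytes, as seen by the CUDA driver.
        free_bytes, total_bytes = torch.cuda.mem_get_info(0)
        report["vram_free_gib"] = round(free_bytes / 2**30, 2)
        report["vram_total_gib"] = round(total_bytes / 2**30, 2)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, the installed wheel is CPU-only. If `cuda_available` is `False` on a CUDA build, the problem is more likely the driver or the environment than the model.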
Are you still facing issues with this?
The issue you are having is that your VRAM is being completely used.
Which doesn't make sense given that you are loading a 17B 4-bit float16 model, which I would assume should use 8 GB at most. Just in case, you might want to check whether anything else running in the background is using your GPU memory, such as a game. Otherwise, open Task Manager to see where the memory is going (there's a GPU tab in Task Manager).
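A back-of-the-envelope check of that estimate, as a rough sketch. It counts weights only (no KV cache, activations, or framework overhead). It also assumes Meta's published figures for Llama 4 Scout: 17B parameters active per token, but roughly 109B in total across its experts, all of which must be held somewhere at load time. The 16 GB figure is the 5070 Ti's VRAM.

```python
def weight_gb(num_params, bits_per_param):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


ACTIVE_PARAMS = 17e9   # parameters used per token (the "17B" in the name)
TOTAL_PARAMS = 109e9   # assumed total across all experts (Meta's published figure)
VRAM_GB = 16           # RTX 5070 Ti

for label, params in [("active only", ACTIVE_PARAMS), ("all experts", TOTAL_PARAMS)]:
    for bits in (4, 16):
        size = weight_gb(params, bits)
        verdict = "fits" if size < VRAM_GB else "does not fit"
        print(f"{label:12s} {bits:2d}-bit: {size:6.1f} GB ({verdict} in {VRAM_GB} GB)")
```

Under these assumptions the 17B active parameters alone come to about 8.5 GB at 4-bit, which matches the estimate above. The full set of experts comes to about 54.5 GB at 4-bit, which would explain VRAM filling up even with nothing else running.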
Hi there! I appreciate the help. I ended up switching models entirely. The GPU was initially not engaging during the model loading process, only RAM. After about a week of troubleshooting and tweaking I eventually managed to get the GPU to engage, only to consistently run into this error. Which doesn't make sense. I'm on an RTX 5070 Ti, a Ryzen 9 9900X3D, and 64 GB of RAM, and all files related to my AI projects are housed on their own 4 TB NVMe drive.
There were no other programs open besides Copilot for many of my thousands of attempts.
My unconfirmed diagnosis is that there has to be some underlying issue between the Llama 4 Scout model and the nightly version of PyTorch (the 5070 Ti has no official support).
Have you tried using the non-nightly version or other versions of CUDA?
CUDA and the 5070 Ti are not compatible.
Running without nightly was my initial issue when loading the model: no GPU engagement.
Llama 4 forums have reported similar things. My last dive into it saw someone suggest something as easy as setting 'auto' to 'sequence' (or something similar; it didn't work).
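If that suggestion referred to the `device_map` argument used when loading models through Hugging Face Transformers with Accelerate (an assumption; the thread doesn't say), the accepted strings are `"auto"`, `"balanced"`, `"balanced_low_0"` and `"sequential"`. The sketch below only builds the loading arguments, with an explicit per-device memory cap. `build_load_kwargs` and the 14 GiB / 48 GiB limits are illustrative choices, not values taken from this thread.

```python
VALID_DEVICE_MAPS = {"auto", "balanced", "balanced_low_0", "sequential"}


def build_load_kwargs(device_map="sequential", gpu_gib=14, cpu_gib=48):
    """Build keyword arguments for from_pretrained with explicit memory caps."""
    if device_map not in VALID_DEVICE_MAPS:
        raise ValueError(
            f"unknown device_map {device_map!r}; expected one of {sorted(VALID_DEVICE_MAPS)}"
        )
    return {
        "device_map": device_map,
        # Cap GPU 0 below its physical VRAM to leave headroom; spill the rest to CPU RAM.
        "max_memory": {0: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"},
    }


kwargs = build_load_kwargs()
print(kwargs)

# Hypothetical usage (needs transformers and accelerate installed;
# MODEL_ID is a placeholder for whichever checkpoint is being loaded):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)
```

With `"sequential"`, GPU 0 is filled up to its cap before anything goes to the CPU, so an explicit `max_memory` makes it easier to see whether the GPU is being used at all.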
I see that now. CUDA 12.8 seems to be the compatible version.
Because of the Blackwell architecture
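A quick way to see whether a given PyTorch build was actually compiled for the card is to compare the device's compute capability with the architectures listed in the build. This is a rough sketch: `arch_supported` and `check_installed_build` are made-up helper names, the check ignores PTX forward compatibility, and to my knowledge consumer Blackwell cards report capability 12.0 (`sm_120`).

```python
def arch_supported(capability, arch_list):
    """Return True if the build lists native kernels for this compute capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def check_installed_build():
    """Run the check against the local install; None if it cannot be determined."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)  # e.g. (12, 0)
    arch_list = torch.cuda.get_arch_list()            # e.g. ['sm_90', 'sm_120']
    return arch_supported(capability, arch_list)


if __name__ == "__main__":
    print("native kernels for this GPU:", check_installed_build())
```

A `False` result would be consistent with the "no GPU engagement" seen on the non-nightly build, while `True` on the cu128 nightly would point the investigation back at memory instead of compatibility.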
I can't win 😭
I'd be curious whether using TensorFlow instead of PyTorch would make a difference, but I'm fairly exhausted with the idea. And Nemotron is loading as I type this, which might be better for my purposes anyway.
Have you tried deploying it in a prepackaged Docker container? Or potentially on WSL or Linux?