#SFT and GRPO on local RTX 3090

shadow vortex
Hello, I’m looking for some help setting up local fine-tuning with SFT and GRPO. I keep getting segmentation faults and I can’t figure out why.

I’m tuning on Ubuntu 24.04 with CUDA toolkit 12.8. I’ve tried various setups with different CUDA wheels for torch (cu124 etc.) and different torch installs, but then I get dependency errors when using vllm, xformers etc.

Has anybody managed to get it working smoothly?

Cheers

edgy sparrow BOT
Unsloth SFT and GRPO fine-tuning on a local RTX 3090 with CUDA 12.8 can be challenging due to strict dependency requirements between PyTorch, CUDA, Triton, vllm, and xformers. The most reliable approach is to use the exact versions recommended in Unsloth's Blackwell/RTX 50XX install guides, even for 30XX cards, as recent Unsloth and vllm versions are tested with CUDA 12.8+ and PyTorch 2.7/2.8. Segmentation faults often result from mismatched versions or missing dependencies, especially with xformers or flash-attn. Multi-GPU is not yet supported in Unsloth, so ensure you use only one GPU for training. For best results, purge all previous torch, triton, xformers, and vllm installs, then follow the Blackwell install steps, and compile flash-attn from source if needed. See Unsloth Blackwell install guide and relevant troubleshooting.

Would you like a step-by-step example of the recommended installation and environment setup?

dark finch
Yea I run on a 3090. What is the error? Did you run with CUDA launch blocking? And did you also try from a fresh venv?

shadow vortex
Hey @dark finch, thanks for getting back to me. I installed it again as per the bot response and it ran for quite a while, around 120 steps, but then it just stopped. The terminal didn’t show any output, but SEGV was next to the path. I was going to try the Docker container to see if that works. I don’t remember setting CUDA launch blocking, but I have tried lots of things, including many venvs 😂. It seems like I can never quite line the dependencies up: you go to torch 2.5 or something, but then you can’t use the later vllm versions, and then the example GRPO doesn’t work because it uses vllm. I’m trying to run the Qwen 4B GRPO example. I am running SFT and GRPO in the same script with an empty_cache call in between. My best guess is that I might need to run them in two different scripts 😮‍💨
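
A rough sketch of the two-script idea, in case it is useful. The script names sft.py and grpo.py and the helper run_stages are hypothetical placeholders, not part of any Unsloth example. The assumption is that each stage lives in its own file, so every stage gets a fresh Python process and its GPU memory is released when that process exits. On Linux a negative return code means the stage was killed by a signal, and -11 corresponds to SIGSEGV, which would at least show which stage is crashing.

```python
import subprocess
import sys


def run_stages(scripts, runner=subprocess.run):
    """Run each script in its own Python process, stopping at the first failure.

    Returns None if every stage exits with code 0, otherwise a tuple of
    (failing script, return code).
    """
    for script in scripts:
        # sys.executable keeps the child in the same venv as the parent.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            return script, result.returncode
    return None


# Hypothetical usage, assuming sft.py and grpo.py exist next to this file:
#   failure = run_stages(["sft.py", "grpo.py"])
#   if failure is not None:
#       print("stage failed:", failure)
```

The runner argument is only there so the helper can be exercised without launching real training runs.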

dark finch
Then you don’t need to worry about package installs

But yes, vllm can be tricky since it lags behind a bit. One option would be to run a GRPO notebook in Colab and check out the pip freeze output, then match the versions locally.
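
The version-matching step could be sketched roughly like this. It assumes you have saved the pip freeze output from the notebook and from your local venv as text. The helper names parse_freeze and diff_pins are made up for illustration, and only plain name==version lines are handled (editable installs and URL pins are skipped).

```python
def parse_freeze(text):
    """Turn `pip freeze` output into a {package name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def diff_pins(reference, local, only=None):
    """Return {name: (reference version, local version)} where the two differ.

    `only` optionally restricts the comparison to a list of package names;
    a version of None means the package is missing on that side.
    """
    names = only if only is not None else sorted(reference)
    return {
        name: (reference.get(name), local.get(name))
        for name in names
        if reference.get(name) != local.get(name)
    }
```

Passing only=["torch", "triton", "xformers", "vllm"] keeps the output focused on the packages that tend to clash in this thread.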

shadow vortex
Cheers, I managed to get Docker set up and recognising my GPU after a bit of troubleshooting, but I haven’t managed to set a run off yet. Hopefully it’ll work. What’s the environment variable I need to set for CUDA launch blocking?

dark finch
Sorry, you might not need the variable, I misread the initial error.
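
For reference, the variable is CUDA_LAUNCH_BLOCKING. Setting it to 1 makes CUDA kernel launches synchronous, so an error surfaces at the call that caused it. For a plain segfault, Python's built-in faulthandler module may be the more useful tool, since it prints the Python traceback when the process receives SIGSEGV. A minimal sketch, where the helper name enable_crash_diagnostics is made up and the assumption is that it runs at the very top of the training script, before torch is imported:

```python
import faulthandler
import os
import sys


def enable_crash_diagnostics(log_file=sys.stderr):
    """Turn on synchronous CUDA launches and Python-level segfault tracebacks.

    Call this before importing torch, so the variable is already set when
    CUDA is initialised.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    # Dump the Python traceback to log_file on SIGSEGV and similar fatal signals.
    faulthandler.enable(file=log_file)
```

Synchronous launches slow training down noticeably, so this is something to switch on only while hunting the crash.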

shadow vortex
Just FYI, when I checked the dependencies of the Blackwell setup above, they didn’t quite match my setup after following the steps. I’ll see if I can get them to match and report back, as well as on the Docker setup.
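
One way to see what actually ended up in a venv is to ask the standard library for the installed versions. This is only a sketch: the package list below is taken from the names mentioned in this thread, and the helper name installed_versions is made up.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in this thread whose versions need to agree.
PACKAGES = ["torch", "triton", "xformers", "vllm", "flash-attn", "unsloth"]


def installed_versions(packages=PACKAGES):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def format_report(versions):
    """Render the versions dict as one 'name: version' line per package."""
    return "\n".join(
        f"{name}: {ver if ver is not None else 'not installed'}"
        for name, ver in versions.items()
    )
```

Running print(format_report(installed_versions())) inside the venv gives something short enough to paste into a thread like this one.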

dark finch
You don’t need the Blackwell setup. The RTX 3090 is not a Blackwell card.
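
If you want to double-check the architecture, the CUDA compute capability tells you. The RTX 3090 reports 8.6, which is Ampere. A small sketch, where the lookup table is a partial, hand-written one and the function names are made up for illustration:

```python
# Partial, hand-written mapping from compute capability to architecture name.
_ARCHITECTURES = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",        # RTX 30XX cards, including the RTX 3090
    (8, 9): "Ada Lovelace",  # RTX 40XX cards
    (9, 0): "Hopper",
}


def architecture_name(major, minor):
    """Map a (major, minor) compute capability to an architecture name."""
    if major in (10, 12):    # 12.x is what RTX 50XX cards report
        return "Blackwell"
    return _ARCHITECTURES.get((major, minor), "unknown")


def local_gpu_architecture(device=0):
    """Look up the architecture of a local GPU (requires torch with CUDA)."""
    import torch  # imported lazily so the lookup above works without torch

    return architecture_name(*torch.cuda.get_device_capability(device))
```

On a 3090, local_gpu_architecture() should come back as Ampere, so the Blackwell-specific install steps are not required for the card itself.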