grad_norm: 0 and LR: 0 | Unsloth AI | Page 1

storm moat Jan 18, 2026, 2:56 PM

#

📎 message.txt

#

📎 message.txt

#

I am trying to fine-tune Qwen3-4b-instruct using a novel GRPO-like RL approach, but after the first step I am getting the attached error. It would be great if anyone could help me!

fathom prairie Jan 18, 2026, 2:58 PM

#

can't say anything about the rest, but LR=0 on step 1 is normal since you're using warmup
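
To illustrate the warmup point, here is a minimal sketch of a linear warmup-then-decay schedule. The function name `linear_warmup_lr` and all values (peak LR 2e-5, 10 warmup steps, 100 total steps) are hypothetical and not taken from the attached logs; the actual scheduler in the run may differ.

```python
def linear_warmup_lr(step: int, warmup_steps: int, total_steps: int, peak_lr: float) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp from 0 up to peak_lr; step 0 gives exactly 0.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Hypothetical schedule: the first entry printed is the LR at step 0.
    for step in (0, 1, 5, 10, 100):
        lr = linear_warmup_lr(step, warmup_steps=10, total_steps=100, peak_lr=2e-5)
        print(step, lr)
```

Under these assumptions the learning rate at step 0 is exactly zero, so a logged LR of 0 on the first step does not by itself indicate a problem.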

storm moat Jan 18, 2026, 2:59 PM

#

I believe the cause of the CUDA error is LR=0 and grad_norm: NaN

fathom prairie Jan 18, 2026, 2:59 PM

#

LR=0 is fine

#

it'll just zero out the update when the gradients are applied

#

grad norm NaN isn't normal
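
A toy sketch of why LR=0 is harmless while a NaN gradient is not, using plain SGD on a single scalar. This is an assumption for illustration only: the optimizer used in the actual run is not shown in the thread, and `sgd_step` is a hypothetical helper.

```python
import math


def sgd_step(param: float, grad: float, lr: float) -> float:
    """One plain SGD update on a scalar parameter."""
    return param - lr * grad


if __name__ == "__main__":
    # Finite gradient with lr=0: the update is zero, the parameter is unchanged.
    print(sgd_step(1.5, grad=0.3, lr=0.0))
    # NaN gradient with lr=0: 0.0 * nan is nan, so the parameter becomes nan.
    print(sgd_step(1.5, grad=math.nan, lr=0.0))
```

With a finite gradient, a zero learning rate leaves the parameter where it was. With a NaN gradient, even a zero learning rate produces NaN, because `0.0 * nan` is `nan` in floating point, so the NaN is the part worth chasing.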

#

also

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

#

it says right there in the error message how to get a more sensible stack trace
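
One way to apply the hint from the error message inside a script, sketched under the assumption that the variable is set before anything initializes CUDA (most simply, before importing torch):

```python
import os

# Set this before CUDA is initialized (most simply, before importing torch)
# so that kernel launches run synchronously and errors surface at the
# call that actually caused them.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework only after the variable is set

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Setting it in the shell when launching the training script has the same effect. If the variable is set after CUDA has already been initialized, it may not take effect, which is one possible reason for seeing an unchanged stack trace.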

storm moat Jan 18, 2026, 3:00 PM

#

yep, I did it; nothing changed, still the same log lol

fathom prairie Jan 18, 2026, 3:00 PM

#

same stack trace?

storm moat Jan 18, 2026, 3:00 PM

#

yea

spice gust Jan 18, 2026, 3:10 PM

#

Before any further steps, did you make sure to try the official Unsloth notebooks and read the docs?

Start by running the Unsloth notebook without any changes. If that still throws an error, there's an environment issue. If not, add your changes one by one and see where the crash happens.

#

https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/tutorial-train-your-own-reasoning-model-with-grpo

Tutorial: Train your own Reasoning model with GRPO | Unsloth Docume...

Beginner's Guide to transforming a model like Llama 3.1 (8B) into a reasoning model by using Unsloth and GRPO.

storm moat Jan 18, 2026, 4:13 PM

#

the error has been resolved!

#

I just set both bf16 and fp16 to False, and it's done
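
A minimal sketch of the fix described above. The thread does not show the training script, so which config class receives these flags is an assumption; `precision_kwargs` is a hypothetical name.

```python
# Precision flags as described in the message above: both disabled.
precision_kwargs = {"bf16": False, "fp16": False}

# Assumption: the trainer config accepts these as keyword arguments, as
# Hugging Face TrainingArguments-style configs do, for example:
#   config = GRPOConfig(output_dir="outputs", **precision_kwargs)
print(precision_kwargs)
```

With both flags off, a Hugging Face style trainer does not enable mixed-precision autocast, which typically costs more memory and time than bf16 or fp16 training.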
