TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in method matmul of type object at 0x7d24f8af6fa0>(*(GradTrackingTensor(lvl=1, value=
FakeTensor(..., device='cuda:0', size=(1, s3, s2), dtype=torch.float16,
requires_grad=True)
), GradTrackingTensor(lvl=1, value=
FakeTensor(..., device='cuda:0', size=(2048, 65537), dtype=torch.float16)
)), **{}): got RuntimeError('a and b must have same reduction dim, but got [s3, s2] X [2048, 65537].')
from user code:
File "/content/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 346, in accumulate_chunk
(chunk_grad_input,), (chunk_loss, (unscaled_loss, chunk_completion_length, chunk_mean_kl,)) = torch.func.grad_and_value(
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/apis.py", line 441, in wrapper
return eager_transforms.grad_and_value_impl(
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/vmap.py", line 48, in fn
return f(*args, **kwargs)
File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/eager_transforms.py", line 1364, in grad_and_value_impl
output = func(*args, **kwargs)
File "/content/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 294, in compute_loss
new_logits = torch.matmul(new_hidden_states.to(lm_head.dtype), lm_head.t())
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
This happened after taking the Gemma 1B GRPO Colab notebook,
changing the model to unsloth/Falcon-H1-1.5B-Deep-Instruct,
adding os.environ['TRITON_JIT_DISABLE_OPT'] = '1',
and running !uv pip install --no-build-isolation mamba-ssm[causal-conv1d]
Those changes seemed like they would give me the fast path on the T4. Training started, but then it failed with the error above. I can post the full output if needed.
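
For anyone reading the error message: the rule it refers to is that the last dimension of the first matmul operand must equal the first dimension of the second. Below is a minimal plain-Python sketch of that check, not PyTorch's actual implementation. The helper name matmul_result_shape and the concrete sizes 7 and 1280 are made up for illustration. In the trace the hidden-state shape is symbolic (1, s3, s2), so the trace alone does not show what the real hidden width is.

```python
def matmul_result_shape(a_shape, b_shape):
    """Result shape of (..., n, k) @ (k, m); raises if the reduction dims differ.

    Hypothetical helper for illustration only; it mirrors the rule behind the
    "a and b must have same reduction dim" message, nothing more.
    """
    if len(a_shape) < 2 or len(b_shape) != 2:
        raise ValueError("this sketch only handles (..., n, k) @ (k, m)")
    k_a, k_b = a_shape[-1], b_shape[0]
    if k_a != k_b:
        raise ValueError(f"reduction dims differ: {k_a} vs {k_b}")
    return tuple(a_shape[:-1]) + (b_shape[1],)


# (2048, 65537) is the shape of lm_head.t() reported in the trace.
# A hidden width of 2048 is compatible with it:
ok_shape = matmul_result_shape((1, 7, 2048), (2048, 65537))

# Any other hidden width (1280 is an arbitrary example) is rejected:
try:
    matmul_result_shape((1, 7, 1280), (2048, 65537))
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
```

If it helps with triage, printing new_hidden_states.shape and lm_head.shape just before the matmul in an uncompiled run should show whether the hidden width really differs from 2048 or whether only the traced symbolic shape is the problem.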