This error usually occurs when the input sequence length or batch size is too large for your available GPU memory, which can surface as a tensor shape mismatch during training. Try reducing `max_seq_length` or `batch_size` to fit within your hardware limits; large values can lead to both shape mismatches and OOM errors during vision model fine-tuning with Unsloth.
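
One way to lower per-step memory without changing the effective batch size is to halve the per-device batch and raise gradient accumulation to compensate. The sketch below is only illustrative: `shrink_batch` is a hypothetical helper, and it assumes your trainer config uses the standard Hugging Face `TrainingArguments` fields `per_device_train_batch_size` and `gradient_accumulation_steps`.

```python
from typing import Tuple


def shrink_batch(per_device_train_batch_size: int,
                 gradient_accumulation_steps: int) -> Tuple[int, int]:
    """Hypothetical helper: halve the per-device batch size and raise
    gradient accumulation so the effective batch size does not shrink."""
    if per_device_train_batch_size <= 1:
        # Already at the minimum; reduce max_seq_length instead.
        return per_device_train_batch_size, gradient_accumulation_steps

    effective = per_device_train_batch_size * gradient_accumulation_steps
    new_batch = per_device_train_batch_size // 2
    # Ceiling division, so odd batch sizes round the effective batch up.
    new_accum = -(-effective // new_batch)
    return new_batch, new_accum


batch, accum = shrink_batch(4, 2)
print(batch, accum)  # 2 4
```

The returned values would then go into your training arguments. If the batch size is already 1, lowering `max_seq_length` is the remaining lever.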
If the error persists, check that your data preprocessing and collator settings match the input shapes your model expects, and that no dataset example exceeds `max_seq_length` once it has been tokenized and its image embeddings are included. Would you like more detailed troubleshooting steps or code examples?
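
As a starting point, the length check described above can be sketched as follows. This is a minimal sketch under stated assumptions: `find_overlong` is a hypothetical helper, `extra_tokens` is a rough placeholder for the positions taken up by image embeddings (the real count depends on your model and processor), and `str.split` stands in for a real tokenizer.

```python
from typing import Callable, Iterable, List


def find_overlong(texts: Iterable[str],
                  tokenize: Callable[[str], list],
                  max_seq_length: int,
                  extra_tokens: int = 0) -> List[int]:
    """Hypothetical helper: return the indices of examples whose token
    count, plus positions reserved for image embeddings, exceeds
    max_seq_length."""
    overlong = []
    for i, text in enumerate(texts):
        if len(tokenize(text)) + extra_tokens > max_seq_length:
            overlong.append(i)
    return overlong


# str.split is a stand-in tokenizer, used only to keep the example runnable.
samples = ["describe the image", "a " * 50]
print(find_overlong(samples, str.split, max_seq_length=16, extra_tokens=4))  # [1]
```

With a Hugging Face tokenizer you could pass something like `lambda t: tokenizer(t)["input_ids"]` as `tokenize`. Any example this flags is a candidate for truncation or removal before training.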