Guys, I've been stuck on this problem for a day and a half and I genuinely don't know what's wrong.
I have 2 files attached here. One is a testing script that uses the same system prompt and the same user prompt, with 2 images per sample and a sample answer.
The rest of the config is literally the same as the main script (same model, same configs, same LoRA).
How come the testing script runs successfully without any error, but the main file gives me this error? (Error attached.)
I need help.
Summary of what I did so far:
Debugging Notes: GRPO Multi‑Image Training Mismatch
- Problem Summary
Error: ValueError: Image features and image tokens do not match, tokens: 596, features: 298 (and previously 34320 vs 17160).
Token count is always exactly 2× the feature count.
Occurs during the GRPO training step (scoring phase), not during data loading/tokenization.
The same dataset and model work perfectly in a separate testing script but fail in the main training script.
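To pin down where the 2× factor appears, it may help to count both sides of the mismatch by hand for a single batch, right before the failing forward pass. The sketch below is only an illustration: it assumes a Qwen2-VL-style processor where each image contributes `t * h * w / spatial_merge_size**2` features (the `image_grid_thw` layout is an assumption, since the model is not named above), and the function names and the toy token id are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def count_image_tokens(input_ids: Iterable[int], image_token_id: int) -> int:
    """Count how many image placeholder tokens appear in one tokenized sequence."""
    return sum(1 for tok in input_ids if tok == image_token_id)


def expected_feature_count(
    grid_thw: Sequence[Tuple[int, int, int]], spatial_merge_size: int = 2
) -> int:
    """Number of image features implied by the patch grids (assumed Qwen2-VL-style).

    Each image has t * h * w patches, merged in blocks of
    spatial_merge_size x spatial_merge_size.
    """
    merge_area = spatial_merge_size ** 2
    return sum((t * h * w) // merge_area for t, h, w in grid_thw)


def token_feature_ratio(
    input_ids: Iterable[int],
    image_token_id: int,
    grid_thw: Sequence[Tuple[int, int, int]],
    spatial_merge_size: int = 2,
) -> float:
    """Ratio of placeholder tokens to expected features; 1.0 means they match."""
    tokens = count_image_tokens(input_ids, image_token_id)
    features = expected_feature_count(grid_thw, spatial_merge_size)
    return tokens / features


# Toy data only: 9 stands in for the real image token id.
IMAGE_TOKEN_ID = 9
toy_input_ids = [1, 2] + [IMAGE_TOKEN_ID] * 16 + [3]
toy_grid_thw = [(1, 4, 4), (1, 4, 4)]  # two images, 16 patches each
ratio = token_feature_ratio(toy_input_ids, IMAGE_TOKEN_ID, toy_grid_thw)
```

Running the same count in both the testing script and the main script, on the exact tensors passed to the model, would show whether the extra tokens are already in the prompt or the features are missing on the image side.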
- What Was Debugged (all ruled out)
| Suspect | Why It Was Ruled Out |
| --- | --- |
| Dataset content/format | Testing script works with identical data; explicit mapping (convert_prompt_and_resize_images) does not fix it. |
| Processor vs tokenizer | Both scripts load the same object from FastVisionModel.from_pretrained(); diagnostic prints confirm identical type, image token count (142), and config. |
| FastVisionModel.for_training() call | Removing it did not resolve the error. |
| remove_unused_columns / data collator | Same settings used in both scripts. |
| Reward function | Replacing with dummy reward yields same error. |
| Vision config spatial_merge_size | Both scripts print spatial_merge_size: 2 after loading. |
| Image type/size/PIL handling | Both scripts produce identical PIL images with the same size and token counts. |
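Since several of the checks above rest on "same settings in both scripts", one way to make that comparison mechanical is to dump the effective settings from each script into a dict and diff them. This is a minimal sketch with a hypothetical helper name and made-up example values; the keys only echo the items listed above and are not taken from any real config.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key that exists in only one script


def diff_settings(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    diffs: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(a) | set(b)):
        value_a = a.get(key, MISSING)
        value_b = b.get(key, MISSING)
        if value_a != value_b:
            diffs[key] = (value_a, value_b)
    return diffs


# Made-up values for illustration only.
testing_script = {
    "remove_unused_columns": False,
    "spatial_merge_size": 2,
    "image_token_count": 142,
    "for_training_called": False,
}
main_script = {
    "remove_unused_columns": False,
    "spatial_merge_size": 2,
    "image_token_count": 142,
    "for_training_called": True,
    "reward_function": "dummy",
}
differences = diff_settings(testing_script, main_script)
```

Anything left in `differences` is a setting that is not actually identical between the two runs and is worth a second look.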