#weird mismatch error in grpo multi image training


undone prawn

Guys, I've been stuck on this problem for a day and a half, and I genuinely don't know what's wrong.

I have 2 files attached here. One is a testing script that uses the same system prompt and the same user prompt, with 2 images per sample and a sample answer.

The rest of the config is literally the same as in the main script (same model, same configs, same LoRA).

How come the testing script runs successfully without any error, but the main file gives me this error? (Error attached.)

I need help

Summary of what I did so far:

Debugging Notes: GRPO Multi‑Image Training Mismatch

  1. Problem Summary
    Error: ValueError: Image features and image tokens do not match, tokens: 596, features: 298 (and previously 34320 vs 17160).

Token count is always exactly 2× the feature count.

Occurs during the GRPO training step (scoring phase), not during data loading/tokenization.

The same dataset and model work perfectly in a separate testing script but fail in the main training script.
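The 2× pattern above can be checked directly on a batch just before the scoring step. Below is a minimal sketch, assuming a Qwen2-VL-style layout where each image contributes `t * h * w / spatial_merge_size**2` features and the prompt holds one placeholder token per feature. The thread does not name the model, so that formula, the `image_grid_thw` field name, and the toy `image_token_id` of 99 are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def count_image_tokens(input_ids: Sequence[int], image_token_id: int) -> int:
    """Count image placeholder tokens in one tokenized sequence."""
    return sum(1 for tok in input_ids if tok == image_token_id)


def expected_image_features(
    image_grid_thw: Sequence[Tuple[int, int, int]],
    spatial_merge_size: int = 2,
) -> int:
    """Features the vision tower should emit for these image grids.

    Assumption: Qwen2-VL-style merging, i.e. (t * h * w) patches per image
    divided by spatial_merge_size squared.
    """
    merge = spatial_merge_size ** 2
    return sum((t * h * w) // merge for t, h, w in image_grid_thw)


def token_feature_ratio(
    input_ids: Sequence[int],
    image_token_id: int,
    image_grid_thw: Sequence[Tuple[int, int, int]],
    spatial_merge_size: int = 2,
) -> float:
    """Ratio of placeholder tokens to expected features (1.0 means they match)."""
    tokens = count_image_tokens(input_ids, image_token_id)
    features = expected_image_features(image_grid_thw, spatial_merge_size)
    return tokens / features if features else float("inf")


# The two errors reported in this thread both show the same ratio.
for tokens, features in [(596, 298), (34320, 17160)]:
    print(tokens, features, tokens / features)

# Toy example (hypothetical ids): 8 placeholders, but one 4x4 grid gives 4 features.
toy_ids = [1] + [99] * 8 + [2]
print(token_feature_ratio(toy_ids, image_token_id=99, image_grid_thw=[(1, 4, 4)]))
```

Running the same check in both the testing script and the main script, on the tensors that actually reach the model's forward pass, should show at which point the token count and the feature count diverge.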

  2. What Was Debugged (all ruled out)
    Suspect | Why It Was Ruled Out
    Dataset content/format | Testing script works with identical data; explicit mapping (convert_prompt_and_resize_images) does not fix it.
    Processor vs tokenizer | Both scripts load the same object from FastVisionModel.from_pretrained(); diagnostic prints confirm identical type, image token count (142), and config.
    FastVisionModel.for_training() call | Removing it did not resolve the error.
    remove_unused_columns / data collator | Same settings used in both scripts.
    Reward function | Replacing with dummy reward yields same error.
    Vision config spatial_merge_size | Both scripts print spatial_merge_size: 2 after loading.
    Image type/size/PIL handling | Both scripts produce identical PIL images with same size, token counts.
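
Since several of the checks above rest on "same settings in both scripts", it may help to confirm that mechanically rather than by eye. Here is a minimal sketch of a settings diff. The helper `diff_settings` and the example values are hypothetical, not taken from the attached scripts.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key that exists in only one of the two dicts


def diff_settings(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    differences = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, MISSING)
        right = b.get(key, MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical settings dumped from the two scripts.
test_script = {"remove_unused_columns": False, "num_generations": 2}
main_script = {"remove_unused_columns": False, "num_generations": 4, "max_prompt_length": 1024}

for key, (left, right) in diff_settings(test_script, main_script).items():
    print(f"{key}: test={left!r} main={right!r}")
```

If both configs are `transformers.TrainingArguments` subclasses, the output of their `to_dict()` method could be passed in directly. Any key that shows up in the diff is a candidate for what the main script does differently.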