#Evaluation & custom compute_metrics can't produce coherent output


latent mango

Has anyone here ever tried computing custom metrics during the eval runs whilst finetuning? I'm currently trying to finetune Qwen2.5 VL for a bounding box object detection task and therefore want to evaluate the performance based on bounding box accuracy metrics. I've got all the necessary code to actually evaluate the output, but I'm having trouble figuring out how to generate outputs correctly during the eval.
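For reference, one common bounding box accuracy metric is intersection over union (IoU). The thread does not show which metric is actually used, so the sketch below is only an assumption: `bbox_iou` is a hypothetical helper name, and the `[x_min, y_min, x_max, y_max]` box order is taken from the sample output further down.

```python
from typing import Sequence


def bbox_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as [x_min, y_min, x_max, y_max].

    Hypothetical helper for illustration; not taken from the linked gist.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b

    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

A metric like this only makes sense once the predicted text has been parsed into boxes, which is why it depends on the eval step producing coherent, parsable generations.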

The model produces very decent outputs in a normal inference run after training, but in the evaluation step during training it generates broken sentences: it seems like every ~10 tokens or so it generates a completely nonsensical token. The beginning especially is complete gibberish; towards the end it gets slightly better.
Example: https://gist.githubusercontent.com/hugohabicht01/a776b2d5b921b2cd93fd58a4b277dead/raw/93b41a71128811026580563385bb6421c7440b13/bad_generation.txt
Small sample:

I a AI analyzing-level analysis you-depth a the name the paragraph><th through the the in the image describe their they are private not<th with a <think><th that through the, please the analysis a HTMLanalysis>output> block. a JSON. the following keys:-image": "1
 "is": , "isposure": str} "is_box": {"bbox,, y_min, x_max, y_max]} "is": <th examples to consider:

The first incoherent part seems to be my prompt, but rephrased completely incoherently. Everything towards the end looks slightly better, but is still not correct: during normal inference after training the model can easily produce parsable data, but this isn't parsable. Even when loading early checkpoints, this phenomenon isn't happening.
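One possible reading of this symptom, offered as an assumption since the training code is only linked and not shown here: the default Hugging Face `Trainer` evaluation loop runs a plain forward pass on the labelled batch instead of calling `generate`. What reaches `compute_metrics` is then one next-token prediction per input position, each conditioned on the true preceding tokens (teacher forcing). Taking the argmax of those gives something that looks like the prompt shifted by one token with scattered mistakes, followed by a slightly better answer. The toy sketch below contrasts the two decoding modes; `toy_next_token` and the other names are hypothetical stand-ins, not a real model or library API.

```python
from typing import Callable, List

NextTokenFn = Callable[[List[str]], str]


def toy_next_token(prefix: List[str]) -> str:
    """Hypothetical stand-in for a language model: a bigram lookup table."""
    table = {
        "describe": "the",
        "the": "image",
        "image": "<answer>",
        "<answer>": "cat",
        "cat": "<eos>",
    }
    # Unknown context falls back to a frequent token, mimicking a wrong guess.
    return table.get(prefix[-1], "the")


def teacher_forced_predictions(model: NextTokenFn, tokens: List[str]) -> List[str]:
    """One prediction per position, each conditioned on the TRUE prefix."""
    return [model(tokens[: i + 1]) for i in range(len(tokens))]


def greedy_generate(model: NextTokenFn, prompt: List[str],
                    max_new_tokens: int = 10, eos: str = "<eos>") -> List[str]:
    """Autoregressive decoding: each new token is fed back as context."""
    context = list(prompt)
    new_tokens: List[str] = []
    for _ in range(max_new_tokens):
        nxt = model(context)
        if nxt == eos:
            break
        context.append(nxt)
        new_tokens.append(nxt)
    return new_tokens


prompt = ["please", "describe", "the", "image"]
answer = ["<answer>", "cat"]

# Forward pass over prompt + labels: a garbled echo of the prompt, then the answer.
forced = teacher_forced_predictions(toy_next_token, prompt + answer)
# forced == ["the", "the", "image", "<answer>", "cat", "<eos>"]

# Generation from the prompt alone: only the continuation.
generated = greedy_generate(toy_next_token, prompt)
# generated == ["<answer>", "cat"]
```

If this is indeed the cause, the metrics would need to be computed on text produced by an actual generation call over the prompt only, rather than on the argmax of the evaluation logits.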

This is the code I'm using for fine-tuning. It's pretty much just the code from the Qwen2.5 VL notebook, slightly adjusted to use an eval_dataset as well:
https://gist.github.com/hugohabicht01/fea70df3c004b6cc303e14813467c8cb#file-broken-training-setup-py-L65

Is anyone able to help me please? What am I doing wrong?

half olive

@latent mango have you solved your issue?