#ValueError when running inference with Qwen3-VL-8B LoRA checkpoint (FastVisionModel)


devout pilot
#

Hi Unsloth team,

I’m fine-tuning unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit with LoRA for a Sinhala OCR task using FastVisionModel + UnslothVisionDataCollator and SFTTrainer. Training and saving LoRA work fine, but when I reload the saved LoRA model for inference I get a ValueError from the tokenizer.

Hardware / environment

  • GPU: NVIDIA RTX 3090 24GB
  • OS: Linux (Kaggle environment)
  • Unsloth: 2026.2.1
  • unsloth_zoo: 2026.2.1
  • PyTorch: 2.10.0+cu128
  • CUDA toolkit: 12.8
  • Transformers: 4.57.1
  • Datasets, accelerate, xformers, bitsandbytes etc. are from the Unsloth vision/Qwen3-VL setup
    (versions are from the notebook cell that installs and prints packages)
#

Problem: error after loading saved LoRA for inference
In a fresh cell / session, I load the saved LoRA like this:

from unsloth import FastVisionModel
import jiwer

model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "sinhala_qwen3vl_ocr_lora",
    load_in_4bit = True,
)

FastVisionModel.for_inference(model)

image = dataset["test"][0]["image"]          # PIL Image
ref_text = dataset["test"][0]["text"]        # ground-truth OCR text

test_instruction = "Perform OCR on this image and extract all the text exactly as it appears."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": test_instruction},
        ],
    }
]

input_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)

inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=False,   # greedy for evaluation
    use_cache=True,
)

generated_text = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
).strip()

cer = jiwer.cer(ref_text.strip(), generated_text)
wer = jiwer.wer(ref_text.strip(), generated_text)

When I run this after loading the LoRA checkpoint, I get:

ValueError: text input must be of type `str` (single example),
`List[str]` (batch or single pretokenized example)
or `List[List[str]]` (batch of pretokenized examples).
limpid wharf
#

since you're loading from the LoRA adapter directly

#

a temporary workaround is to load the processor from the base model

#

and use that
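
A minimal sketch of that workaround, assuming the base checkpoint is the `unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit` repo named in the original post and that `AutoProcessor` resolves it to the full Qwen3-VL processor. The helper names `is_full_processor`, `load_base_processor` and `build_inputs` are hypothetical, introduced only for illustration. The check relies on the assumption that a vision processor exposes an `image_processor` attribute while a plain text tokenizer does not, which would explain why passing an image as the first argument raises the `text input must be of type str` error.

```python
BASE_MODEL = "unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit"  # base checkpoint from the original post


def is_full_processor(obj):
    """Heuristic (assumption): vision processors carry an image_processor, plain tokenizers do not."""
    return hasattr(obj, "image_processor")


def load_base_processor(base_model=BASE_MODEL):
    """Load the multimodal processor from the base model instead of the adapter folder."""
    from transformers import AutoProcessor  # lazy import so the other helpers stay importable
    return AutoProcessor.from_pretrained(base_model)


def build_inputs(processor, image, messages, device="cuda"):
    """Render the chat prompt as a string, then encode image and text together."""
    prompt = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,  # return a prompt string, not token ids
    )
    batch = processor(
        images=image,
        text=prompt,
        add_special_tokens=False,
        return_tensors="pt",
    )
    return batch.to(device)


# Intended use in the notebook (not executed here):
# if not is_full_processor(tokenizer):
#     tokenizer = load_base_processor()
# inputs = build_inputs(tokenizer, image, messages)
# output_ids = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
```

The LoRA weights still come from the saved adapter through `FastVisionModel.from_pretrained`; only the object used to build `inputs` is swapped.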

limpid wharf
#

hello this was just fixed in the main repo

#

can you pull from there instead of pypi

devout pilot
#

pip uninstall unsloth unsloth_zoo -y
pip install --no-deps git+https://github.com/unslothai/unsloth_zoo.git
pip install --no-deps git+https://github.com/unslothai/unsloth.git

you mean like this?


devout pilot
#

`%%capture
import os, re

if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Pick the xformers build that matches the installed torch major.minor version
    import torch; v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
    xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install transformers==4.57.1
!pip install --no-deps trl==0.22.2
!pip install jiwer`

These are the libraries I install.

limpid wharf
#

add --force-reinstall alongside --no-deps for unsloth_zoo and unsloth
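
After a forced reinstall it can help to confirm which builds the notebook actually picked up. A small sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of Unsloth. Run it in a fresh cell after restarting the kernel so previously imported modules do not mask the new install.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("unsloth", "unsloth_zoo", "transformers", "trl")):
    """Return {package: version string}, or None for packages that are not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions())
```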

devout pilot
#

Can I get some help if I send the sample code I use to test on a text image? It is not running in Colab; I am running it on Kaggle.

devout pilot
#

cell 1
`%%capture
import os, re

!pip install --force-reinstall unsloth
!pip install --force-reinstall unsloth_zoo
!pip install transformers==4.57.1
!pip install --no-deps trl==0.22.2
!pip install jiwer`

cell 2

`from datasets import load_dataset
from huggingface_hub import login

# Log in to Hugging Face (if private dataset)

login() # Enter your token
`

cell 3

`from datasets import load_dataset

dataset = load_dataset("avishadilhara/sinhala-ocr-lk-acts-1010")

test_dataset = dataset["test"]
print(f"\nTest set: {len(test_dataset)} samples")

# Preview

sample = test_dataset[0]
print(f" Image size: {sample['image'].size}")
print(f" Text length: {len(sample['text'])} chars")
print(f" Preview: {sample['text'][:100]}...")
`

cell 4

`import jiwer
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    model_name="/kaggle/input/models/avishauow/qwen-vl-3/pytorch/r16-bs4-ga1-lr2e-4-rtx3090gpu/1/sinhala_qwen3vl_ocr_lora",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)
`
cell 5

`
image = dataset['test'][0]['image']
ref_text = dataset['test'][0]['text'] # ground truth reference
test_instruction = "Perform OCR on this image and extract all the text exactly as it appears."

messages = [
    {"role": "user",
     "content": [
         {"type": "image"},
         {"type": "text", "text": test_instruction}
     ]}
]

input_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

print(type(input_text), input_text[:200])

inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")
`

#

cell 6

`
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)

output_ids = model.generate(
    **inputs,
    streamer = text_streamer,
    max_new_tokens = 4096,
    do_sample = False, # greedy, deterministic for evaluation
    use_cache = True,
)

generated_text = tokenizer.decode(
    output_ids[0][inputs['input_ids'].shape[1]:],
    skip_special_tokens=True
).strip()

cer = jiwer.cer(ref_text.strip(), generated_text)
wer = jiwer.wer(ref_text.strip(), generated_text)

print("=" * 60)
print("GENERATED OCR OUTPUT:")
print("=" * 60)
print(generated_text)

print("\n" + "=" * 60)
print("REFERENCE TEXT:")
print("=" * 60)
print(ref_text.strip())

print("\n" + "=" * 60)
print("EVALUATION METRICS:")
print("=" * 60)
print(f"Character Error Rate (CER) : {cer:.4f} ({cer*100:.2f}%)")
print(f"Word Error Rate (WER) : {wer:.4f} ({wer*100:.2f}%)")
print("=" * 60)
`