#Qwen/Qwen3-Omni-30B-A3B-Instruct
18 messages · Page 1 of 1 (latest)
With this models smaller cousin (Qwen/Qwen3-VL-30B-A3B-Thinking) I got away with these settings:
MODEL_MAX_LEN=131072
GPU_MEMORY_UTILIZATION=0.98
Let me know if that works 😄
oh thanks, but I need help with specifically the omni qwen3 multimodal which uses vllm-omni i believe... Qwen/Qwen3-VL-30B-A3B-Thinking isn't omni with stages.
does the regular vllm serverless template (https://console.runpod.io/hub/runpod-workers/worker-vllm) support omni multimodal? i need to input text and audio and output text. i can ignore video/images to save ram
i think the worker handler the vllm template is based on needs to be made omni compatible... i tried having claude vibe code it for me, but i don't know what i'm doing when it comes to this inference stuff. i've burned through $14 of time just spinning up tests and failing lol...
?
With all due respect I have no idea (if the worker version supports the omni models), I can go find out though. Just a moment.
yes please... ask who ever you need to. thanks 🙏
I made the pod template version but they're very different 😛
I'm just gonna run it myself, one sec
i've been trying on and off for a day and a half now lol... she's not easy to run and zero out there on the net about anyone else doing so either... just full blown instances which i've already got working before... but I don't want the overhead... just want serverless
I get what you mean, and tuning vllm to a specific model sucks a lot it took me maybe half an hour to get the settings I sent earlier for a much smaller model.
vLLM has been working on "Recipes" but it doesn't include this model yet https://docs.vllm.ai/projects/recipes/en/latest/index.html
Maybe useful to check in the future? But trust me I know it takes a lot of back and forward
yea, i figured november til now they would have more out there... qwen3 omni is new but its been months now... wth
i know there is this... https://docs.vllm.ai/projects/vllm-omni/en/latest/ ... thats what I needed to use when i ran it locally on my 3080ti gaming rig
Efficient omni-modality model serving for everyone
and not sure if runpod's vllm is omni capable... it has a --omni cli arg... can i just have an env var as OMNI=1 or something to enable omni? cause as it sits i had to fork the worker-vllm github project and convert to using vllm-omni so it would launch vllm in omni mode properly