#How much GPU VRAM is actually used for Whisper small & medium en models?

elfin mason
#

Hello all, I'm just setting up my hardware: a Lenovo P350 Tiny (10th-gen i5), and I'm planning to use either a Yeston 3050 LP 6GB or a modded RTX A2000 LP 6GB GPU. The issue is that I want to use the Qwen2.5 3B LLM at Q6_K_L as a conversation assistant, and that alone requires 2.5 GB.

So if I intend to use Wyoming Whisper with medium-int8, I have read that the model consumes at least 4-5 GB of VRAM. I'm scratching my head here, as I really cannot stretch the budget to a better GPU.

Could anyone running Wyoming Whisper on a GPU please let me know how much VRAM the small and medium models consume on your GPU? Also, any idea how much Piper consumes if you are running it on the GPU too?
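For anyone who wants to report numbers, one way to read per-process VRAM is to query the NVIDIA driver directly. Below is a minimal sketch, assuming an NVIDIA GPU with `nvidia-smi` on the PATH. The helper names `parse_compute_apps` and `gpu_memory_by_process` are made up for illustration. With `nounits`, `used_memory` is reported in MiB.

```python
import subprocess

# nvidia-smi query listing every compute process and the VRAM it holds (MiB).
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue  # blank or malformed line
        pid, mem = parts[0], parts[-1]
        name = ",".join(parts[1:-1])  # tolerate commas inside the process name
        if not (pid.isdigit() and mem.isdigit()):
            continue  # some setups report "[N/A]" instead of a number
        rows.append((int(pid), name, int(mem)))
    return rows


def gpu_memory_by_process():
    """Run nvidia-smi and return per-process VRAM usage (hypothetical helper)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

Calling `gpu_memory_by_process()` while Whisper, Piper and the LLM are all loaded gives one row per process, which makes it easy to see which service holds how much.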

mellow blaze
#

I'm running on GPU, but with large-v3; it only takes around 3.3 GB.
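As a rough sanity check against a 6 GB card, the budget arithmetic can be sketched like this. It uses only the two figures quoted in this thread (2.5 GB for the LLM, 3.3 GB for Whisper large-v3). The function name `vram_headroom_gb` is made up for illustration, and the sum leaves out the LLM's context/KV cache, Piper, and driver overhead, so treat the result as optimistic.

```python
def vram_headroom_gb(total_gb, loads_gb, reserve_gb=0.0):
    """Return VRAM left over after the listed loads and an optional reserve."""
    used = sum(loads_gb.values())
    return total_gb - used - reserve_gb


# Figures quoted in this thread; real usage depends on backend and settings.
loads = {
    "qwen2.5-3b-q6_k_l": 2.5,  # GB, as stated by the original poster
    "whisper-large-v3": 3.3,   # GB, as reported above
}

headroom = vram_headroom_gb(6.0, loads)
print(f"{headroom:.1f} GB left")  # prints "0.2 GB left"
```

With so little left over, a smaller Whisper model would leave more room for the LLM's context.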

mellow blaze
#

Also may be worth noting: I'm not sure how well qwen2.5:3b will do. On the leaderboards the 7b model only hits around 80-85% accuracy and the 3b will probably be much lower, so it may not be practical for day-to-day use for home control. If you just want to use it for some Q&A it's probably ok, though it might give some funny answers 😄

elfin mason
#

Interesting! It seems that among the top performers only a few are available for free through Ollama! Which one are you using, and what GPU are you running it on?

mellow blaze
#

I tried Qwen2.5:7b @ q4 with a 32768 context, and it worked ok but struggled with some more advanced commands or instructions, so I am presently back to using OpenAI until I can get a GPU that allows running a more robust model, or running the 14b model with high context and quant. Using an RTX 4060 Ti in a k8s cluster.

elfin mason
#

.

mellow blaze
#

Yeah

elfin mason
#

K8s cluster, even more interesting! Are you using multiple nodes, each with a 4060 Ti?

mellow blaze
#

no, just a worker and a control plane node, running in a large Proxmox server.

#

Just find it easier to manage my pods and ingress and such with k8s since I am used to it 😄

elfin mason
#

Ok. Assuming yours has 16 GB of VRAM, can you try the Ollama models I posted above? I'm just scratching my head over how the hell they are available for free for local use, or whether Ollama is just connecting to their paid APIs?

mellow blaze
#

so the first model is a vision model only, not the full gpt-4o-mini; it's actually a distillation model, it seems

#

dunno about that second one

#

with LLMs best thing to do is try and see 🙂

#

not sure about these models, there's no info really

elfin mason
#

Yeah, that’s the point! No info at all! I am still setting up my servers and procuring my GPUs before I can try them.

#

How about the new kid on the block, DeepSeek R1? I don’t see it in the leaderboard charts.

mellow blaze
#

I've had some friends try it with mixed results.

#

for the local/distill models, it's a bit hit or miss, depends on the environment, things exposed, etc

elfin mason
#

Ok, good to know.