#about the RTX 4060 Ti 16gb
You only need 12GB of VRAM for 13B models at 4-bit precision, and 24GB for 30-34B models.
So 16GB is more than you need for 13B and less than you need for 30+B ¯_(ツ)_/¯.
The only notable exception is that you can't properly load 14B models with 12GB of VRAM, but these are quite rare.
However, you could use the extra 4GB to extend model context with hacks like RoPE scaling; you could probably squeeze 8192 tokens instead of 4096 out of 13B models.
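For a rough sense of what doubling the context costs, here is a back-of-the-envelope sketch of KV cache size. The shape numbers are my assumption (a Llama-2-13B-style model: 40 layers, 40 KV heads, head dimension 128, FP16 cache, no grouped-query attention), not something stated in this thread, and real loaders add their own overhead on top.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for a given context length."""
    # Factor of 2: one tensor for keys and one for values per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value
    return total_bytes / 2**30


# Assumed Llama-2-13B-style shape (not taken from the thread).
MODEL_13B = dict(n_layers=40, n_kv_heads=40, head_dim=128)

for ctx in (4096, 8192):
    print(f"{ctx} tokens: {kv_cache_gib(n_tokens=ctx, **MODEL_13B):.2f} GiB")
```

Under those assumptions the cache is roughly 3.1 GiB at 4096 tokens and twice that at 8192, so the extra context costs about 3 GiB, which is in the same ballpark as the extra 4GB on this card.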
It might be useful for running a 13B model and Stable Diffusion at the same time. That's already quite possible with a 12GB card, but it would be faster with 16GB since less would be offloaded to system RAM when swapping models.
It could be useful for advanced image generation with upscaling or animation.
It might be a more future-proof solution compared to 12GB cards.
You could also try loading 13B models at higher quality: at 4.65bpw a 13B model fills almost the entire VRAM of a 12GB card, while with 16GB you might be able to use 5-6bpw instead.
It's also worth noting that there are pretty good Solar-10.7B models. With 12GB of VRAM you can almost use 8bpw quants, but they go slightly over the memory limit. With 16GB they would load fully at 8bpw without any issues.
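The bpw numbers above can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight. This is only a sketch of the weights themselves; KV cache, activations and loader overhead come on top, which is why a model whose weights are well under 12GB can still fill a 12GB card.

```python
def weight_gib(params_billion: float, bpw: float) -> float:
    """Approximate size of quantized weights in GiB (weights only)."""
    total_bits = params_billion * 1e9 * bpw
    return total_bits / 8 / 2**30


# Configurations mentioned in this thread.
configs = [
    ("13B @ 4.65bpw", 13.0, 4.65),
    ("13B @ 6bpw", 13.0, 6.0),
    ("10.7B @ 8bpw", 10.7, 8.0),
    ("34B @ 4bpw", 34.0, 4.0),
]

for name, params, bpw in configs:
    print(f"{name}: {weight_gib(params, bpw):.1f} GiB of weights")
```

By this estimate, 13B at 4.65bpw is about 7 GiB of weights and 10.7B at 8bpw is about 10 GiB, so the second one leaves very little room for context on a 12GB card.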
Overall, is it worth buying? It's currently only $450 on Amazon.
The catch is the low memory bandwidth this card has.
I have one. 13B runs comfortably and quickly. Unless you push it, offload to the CPU, or get into playing with weights and different loaders, you won't really notice the bandwidth. It's better than what you have (or you wouldn't be looking), and you can get it now.
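To put a rough number on the bandwidth point: during generation every weight has to be read once per token, so memory bandwidth divided by model size gives an upper bound on tokens per second. The 288 GB/s figure is the card's published spec and the 51.2 GB/s figure is the theoretical peak of dual-channel DDR4-3200; both are my additions, not numbers from this thread, and real speeds will be lower.

```python
def weight_gb(params_billion: float, bpw: float) -> float:
    """Approximate quantized weight size in decimal GB."""
    return params_billion * bpw / 8


def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling: each token reads all weights once."""
    return bandwidth_gb_s / model_gb


model = weight_gb(13.0, 4.65)  # 13B at 4.65bpw, as discussed above

# Assumed bandwidth figures (not from the thread).
GPU_4060TI_GB_S = 288.0      # RTX 4060 Ti published memory bandwidth
DDR4_3200_DUAL_GB_S = 51.2   # theoretical dual-channel DDR4-3200 peak

print(f"GPU ceiling: {max_tokens_per_s(GPU_4060TI_GB_S, model):.0f} tok/s")
print(f"System RAM ceiling: {max_tokens_per_s(DDR4_3200_DUAL_GB_S, model):.0f} tok/s")
```

Even as a ceiling, that works out to a few dozen tokens per second on the GPU for a 13B model, versus single digits when the weights sit in system RAM.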
I am using the 4060 Ti 16GB. It's good, works flawlessly, and only consumes 80 watts, or 135 watts max in games.
My PC also lets the card use up to 50% of system RAM, so even though it might be a bit slower then, it just works for some big questions!
The extra 4GB is useful for higher context, higher quants, or running a TTS/STT module alongside the LLM, and the "low" bandwidth is not a real issue in those use cases.
Still far better than offloading to the CPU.
I think the extreme value prop is 2x4060s vs 1 4090. Seems like a good choice