# Finding ideal GPU for local development

onyx basin
#

I'm currently using a 1660 Super (6GB), a Ryzen 3500, and 16GB of RAM. It seems to struggle running RAG. At the moment I'm only planning to upgrade my GPU, as RAM prices are not reasonable nowadays.
For context, I am using these:
BAAI/bge-small-en-v1.5
BAAI/bge-reranker-base
PaddleOCR (running on CPU)
meta-llama/Llama-3.2-3B-Instruct

I looked at the usage on the server, and it exceeds what my local GPU has (it took around 7-8 GB). I am also planning to expand toward LLaVA or something similar for image, table, and chart parsing.

Should I get a 3060 12GB or a 5060 8GB? Or maybe a 5060 Ti 16GB? Is it worth it? My budget is kind of tight.

muted spade
#

You are going to struggle running a 3B parameter model like Llama 3.2 3B if you are using pure Python (transformers + torch + ...). That model won't even run on my GPU for finetuning or inference (I have a 3060 with 12GB VRAM). You should use a quantized GGUF version of that model with something like Ollama if you are serious about running it.

In this game, VRAM is the main bottleneck. So if you currently have 8GB VRAM on your card and you buy another card with 8 GB VRAM, you have made no gains.
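As a rough sanity check on the VRAM point, weight memory is approximately parameter count times bytes per weight. The sketch below is a back-of-the-envelope estimate only: the function name, the ~3.2 billion parameter figure, and the 4.5 bits-per-weight value for a 4-bit quant are assumptions for illustration, and it ignores the KV cache, activations, and any other models loaded alongside.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3


# Assumption: Llama 3.2 3B has roughly 3.2 billion parameters.
N_PARAMS = 3.2e9

# Assumption: a 4-bit quant averages about 4.5 bits per weight once
# per-block scaling metadata is included; real files vary by quant type.
PRECISIONS = [
    ("fp32", 32),
    ("fp16/bf16", 16),
    ("8-bit", 8),
    ("~4-bit quant", 4.5),
]

for label, bits in PRECISIONS:
    print(f"{label:>12}: {weight_memory_gib(N_PARAMS, bits):.1f} GiB (weights only)")
```

At 16 bits per weight this comes to about 6 GiB for the weights alone, which already fills a 6GB card before the embedding model, reranker, and KV cache are counted. A 4-bit quant of the same model should need well under half of that.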

onyx basin
#

Ah, I see. I should probably expand the budget to afford at least a 5060 Ti.

#

How about LLaVA? Does it run within 16GB? (for development purposes only)

muted spade
#

Again, if you are using Python only, probably not. If you use a GGUF-quantized copy of the model, maybe (depending on the number of parameters and your RAM/VRAM).

#

Use Ollama to find the largest model (in terms of parameters) that you can run for inference on your machine.

#

You can hook up the Ollama instance to your RAG pipeline and run generation through it. If you intend to finetune the LLM, you will need significantly more resources.
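A minimal sketch of what hooking a local Ollama instance into a RAG step might look like, using only the standard library. It assumes Ollama is serving on its default port (11434) and that the model has been pulled under the tag `llama3.2:3b`. The helper names (`build_prompt`, `build_payload`, `ask_ollama`) and the prompt wording are hypothetical, and the retrieved chunks are assumed to come from your existing embedding and reranking steps.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.2:3b"  # assumed tag; use whatever `ollama list` shows on your machine


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_payload(question: str, chunks: list[str], model: str = MODEL) -> dict:
    """Request body for /api/generate; stream=False asks for a single JSON reply."""
    return {"model": model, "prompt": build_prompt(question, chunks), "stream": False}


def ask_ollama(question: str, chunks: list[str], model: str = MODEL,
               timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(question, chunks, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `ask_ollama("Which GPU has more VRAM?", top_chunks)` would return the answer text, where `top_chunks` is the list of passages left after reranking. Keeping generation behind an HTTP call also means the quantized model can be swapped for a larger one without touching the retrieval code.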