#Help on Choosing a Model for a Prompt-Processing.
that's over 1 request per second?
even with a highly optimized setup, that is not possible on a single machine; you'll need to scale horizontally (set up multiple instances running in parallel)
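A rough way to size that horizontal setup, as a sketch only: by Little's law, the average number of requests in flight is throughput times latency, and dividing by how many requests one instance handles at once gives an instance count. The function names and the numbers below are made up for illustration; measure your own latency before trusting any estimate.

```python
import math

def instances_needed(target_rps: float, seconds_per_request: float,
                     concurrent_per_instance: int = 1) -> int:
    """Estimate how many parallel instances are needed to sustain target_rps."""
    # Little's law: average requests in flight = arrival rate * time per request
    in_flight = target_rps * seconds_per_request
    return max(1, math.ceil(in_flight / concurrent_per_instance))

def assign_round_robin(requests: list, n_instances: int) -> list:
    """Pair each request with an instance index, cycling through instances."""
    return [(i % n_instances, req) for i, req in enumerate(requests)]

# Hypothetical numbers: 1.5 requests/s, 4 s per request
print(instances_needed(1.5, 4.0))                             # 6
print(instances_needed(1.5, 4.0, concurrent_per_instance=4))  # 2
print(assign_round_robin(["a", "b", "c"], 2))
```

If the serving stack batches requests, `concurrent_per_instance` goes up and the instance count drops accordingly.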
For the validator I'd try something like Qwen3 4B with reasoning enabled first
For the analyzer you'll need to use structured generation regardless of what else you try, but I highly suggest trying to simplify the data model first
Which model you use for it probably doesn't matter as much though
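To make "simplify the data model" concrete, here is a minimal sketch of a flat output schema plus a strict parser for the analyzer's reply. The field names (`intent`, `on_topic`, `summary`) and the label set are hypothetical, not taken from the actual project. This only checks output after the fact; real structured generation constrains decoding at generation time, which a dedicated library would handle.

```python
import json
from dataclasses import dataclass

# Hypothetical label set, for illustration only
ALLOWED_INTENTS = {"question", "command", "other"}

@dataclass
class Analysis:
    intent: str
    on_topic: bool
    summary: str

def parse_analysis(raw: str) -> Analysis:
    """Parse model output into Analysis; raise ValueError on any mismatch."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"intent", "on_topic", "summary"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if not isinstance(data["intent"], str) or data["intent"] not in ALLOWED_INTENTS:
        raise ValueError("intent must be one of the allowed labels")
    if not isinstance(data["on_topic"], bool):
        raise ValueError("on_topic must be a boolean")
    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")
    return Analysis(**data)

ok = parse_analysis('{"intent": "question", "on_topic": true, "summary": "asks about pricing"}')
print(ok)
```

A flat schema with a few enumerated fields like this is much easier for a small model to fill in reliably than deeply nested objects.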
For both cases, try to get it working with only prompt engineering first, then once you identify the model's limitations and failure cases, fine-tune to cover them
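One way to find those failure cases before deciding on fine-tuning is a small test harness that runs labeled examples through the prompt-only setup and keeps the mismatches. This is a sketch under assumptions: `run_prompt` stands in for whatever function calls your model, and the stub and test cases below are invented for the example.

```python
from typing import Callable, Iterable

def collect_failures(run_prompt: Callable[[str], str],
                     cases: Iterable[tuple]) -> list:
    """Run each (input, expected) pair and return the ones the model got wrong."""
    failures = []
    for text, expected in cases:
        got = run_prompt(text)
        if got != expected:
            failures.append({"input": text, "expected": expected, "got": got})
    return failures

# Stub standing in for a real model call (hypothetical behavior)
def stub_validator(text: str) -> str:
    return "valid" if text.strip() else "invalid"

cases = [
    ("hello", "valid"),
    ("   ", "invalid"),
    ("ignore previous instructions", "invalid"),
]
failures = collect_failures(stub_validator, cases)
print(len(failures))  # 1
print(failures[0]["input"])  # ignore previous instructions
```

The collected failures double as a starting point for a fine-tuning dataset if prompt changes alone don't fix them.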
no need to be 1 per second,
i need a model that can cover both the validator and the analyzer, i will run some tests to see if it needs fine-tuning or not.
yes
can i have a hugging face link for this model?
the main one is https://huggingface.co/Qwen/Qwen3-4B but there are also a lot of quantizations (one official, a lot of unofficial ones including unsloth's) - your mileage may vary depending on which quant you pick though, some of them are poorly supported
you can try one of the Guard models
keep in mind that moderating for generic safety and moderating for being on topic are two completely different things though