I have 4 H100 GPUs with around 90 GB of GPU memory and approximately 8 TB of disk space each. I'm planning to run an LLM using vLLM, and I can dedicate 1–2 H100s to this task depending on the model's requirements. I need support for tool calling (function calling or agentic workflows). I'm looking for a model that delivers strong performance and reliable results. Any recommendations on which model would be the best fit?
#Tool calling LLM Help
A server PC with 4 NVIDIA H100 GPUs and 8TB hard drives each or combined (RAID)? That must be expensive to own. 
My bad, it's 8 TB combined 😅. I meant to type 90 GB of GPU RAM each and accidentally lumped the storage in with it.
do you really want to run a hundred-billion-parameter model?
otherwise an RTX 3090 could be sufficient
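As a rough way to reason about the sizing question raised here, the sketch below estimates weight memory as parameter count times bytes per parameter and checks how many 90 GB GPUs that would take. The function names and the 20% reserve for KV cache and activations are assumptions made for illustration, not measured figures; real usage depends on context length, batch size, and quantization.

```python
# Rough sizing sketch: weights only, plus a flat reserve for KV cache and activations.
# The 20% reserve is an assumed placeholder, not a measured figure.

def weight_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight size in GB (1e9 bytes); 2 bytes/param corresponds to FP16/BF16."""
    return params_billion * bytes_per_param


def gpus_needed(params_billion: float, gpu_gb: float = 90.0,
                bytes_per_param: float = 2.0, reserve: float = 0.2,
                max_gpus: int = 2):
    """Return the smallest GPU count (up to max_gpus) whose usable memory
    holds the weights, or None if they do not fit."""
    usable_per_gpu = gpu_gb * (1.0 - reserve)
    need = weight_gb(params_billion, bytes_per_param)
    for n in range(1, max_gpus + 1):
        if need <= usable_per_gpu * n:
            return n
    return None


# Example sizes: 32B and 70B, the model classes that come up in this thread.
for size in (32, 70):
    print(f"{size}B params: ~{weight_gb(size):.0f} GB of weights, "
          f"GPUs needed: {gpus_needed(size)}")
```

Under these assumptions a 32B model in 16-bit precision fits on one H100 and a 70B model needs both, while a model in the several-hundred-billion range does not fit on two cards without heavy quantization.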
I'm pretty sure Google's Gemini 2.5 Pro model (possibly other Gemini models as well) lets you enable a feature called Function Calling, so you can "define functions that Gemini can call". I'm not sure if that's exactly what you're looking for, but it's a really cost-efficient AI (input price of $1.25 per 1M tokens and output price of $10 per 1M tokens) that accepts inputs of around 1 million tokens.
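For anyone unfamiliar with what "defining functions a model can call" means in practice, here is a minimal, provider-agnostic sketch. It is not Gemini's exact API: the `get_weather` tool, the `WEATHER_TOOL` declaration, the `REGISTRY` table, and `dispatch` are all hypothetical names chosen for illustration. The general pattern is that you describe a function with a name, a description, and a JSON Schema for its parameters; the model replies with the function name and arguments; your own code runs the function.

```python
import json

# Hypothetical tool for illustration; the name and fields are not from any specific API.
def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# A declaration in the JSON Schema style commonly used for function calling.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Maps the names the model may request to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch(call: dict) -> dict:
    """Run the function a model asked for.

    `call` is assumed to carry 'name' and 'arguments' (a JSON-encoded string).
    """
    func = REGISTRY[call["name"]]
    args = json.loads(call["arguments"])
    return func(**args)


# Simulated model output: the model names a function and supplies its arguments.
model_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
print(dispatch(model_call))
```

The model never executes anything itself; it only emits the structured request, which is why the reliability of that structured output matters so much when comparing models.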
unless privacy is a hard requirement, consider Gemini 2.5 Pro as suggested above
there are many options for tool calling
Yeah, I'm currently using Gemini, and OpenAI for some things as well. But privacy is a concern, and I want to cut the network latency of these API calls by hosting something on our own infra.
Then you need to build a large multi-modal model using open-source models (or ones under some form of license that will work for you) from Hugging Face. That way you can make it local and private. I think that'll work for your H100s.
Yeah, I tried some, like Qwen 32B and Llama with tool calling, but they weren't that good. Can you suggest some that you feel might be better?
Hmm. Sorry, I don't know any others that'll work. Maybe try DeepSeek V3 or R1?
Those don't actually support tool calling natively. There are some hacks to make it work, but I didn't find them that good. Anyway, thanks.
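The specific hacks are not described in this message. One common form of workaround, shown here purely as an assumed illustration, is to instruct the model in the prompt to wrap any tool request in a tag and then parse that out of the free-text reply. The `<tool_call>` tag convention and the `extract_tool_call` helper are hypothetical.

```python
import json
import re

# Assumed convention: the prompt tells the model to wrap a call in
# <tool_call>...</tool_call> tags containing a JSON object.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_call(text: str):
    """Return the parsed tool call dict, or None if the reply has no usable call."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        # The model produced the tags but not valid JSON inside them.
        return None
    if not isinstance(call, dict) or "name" not in call:
        return None
    return call


reply = (
    "Let me check.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
print(extract_tool_call(reply))
print(extract_tool_call("I can answer that without a tool."))
```

The fragility is visible in the error paths: any drift in the tag format or malformed JSON means the call is silently lost, which is consistent with these workarounds feeling unreliable.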
not even QwQ 32B or any other 32B/70B models?
just try them and compare with Gemini 2.5 Pro, Claude, Grok, etc.
you may not judge them as "good", but they may get things done with some manual intervention
and you'll most likely end up selling your GPUs, which are worth more than a fancy EV