#Tool calling LLM Help


fiery lark
#

I have 4 H100 GPUs, each with around 90GB of GPU memory and approximately 8TB of disk space. I'm planning to run an LLM using vLLM, and I can dedicate 1–2 H100s to this task depending on the model requirements. I need support for tool calling (function calling or agentic workflows). I'm looking for a model that delivers strong performance and reliable results. Any recommendations on which model would be the best fit?
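A rough way to sanity-check whether a candidate model fits on 1–2 of these cards is to estimate the memory its weights need. The sketch below is a back-of-the-envelope estimate only: the 90GB figure comes from the question, while the `usable_fraction` and `overhead` values (headroom for KV cache and activations) are assumptions for illustration, not measured numbers.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def gpus_needed(
    params_billion: float,
    bytes_per_param: float,
    gpu_gb: float = 90.0,          # per-GPU memory quoted in the question
    usable_fraction: float = 0.9,  # assumed share of memory the server may use
    overhead: float = 1.2,         # assumed headroom for KV cache and activations
) -> int:
    """Estimate how many GPUs are needed under the assumptions above."""
    required = weight_memory_gb(params_billion, bytes_per_param) * overhead
    return math.ceil(required / (gpu_gb * usable_fraction))


if __name__ == "__main__":
    # bytes_per_param: 2 for fp16/bf16 weights, 1 for 8-bit quantized weights
    for size, width in [(32, 2), (70, 2), (70, 1)]:
        print(f"{size}B params at {width} byte(s)/param:",
              gpus_needed(size, width), "GPU(s)")
```

Under these assumptions, a model in the 30B range at 16-bit precision fits on a single card, while a 70B model needs either quantization or more than two cards. Longer contexts and larger batches push the real requirement up.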

devout rapids
#

otherwise an RTX 3090 could be sufficient

shell stag
#

I'm pretty sure Google's Gemini 2.5 Pro model (and possibly other Gemini models as well) lets you enable a feature called Function Calling, so you can "define functions that Gemini can call". I'm not sure if that's exactly what you're looking for, but it's a really cost-efficient model (input price of $1.25 per 1M tokens and output price of $10 per 1M tokens) with a context window of over 1 million input tokens.
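To make "define functions that the model can call" concrete, here is a minimal sketch of the two pieces involved: a schema describing the function, and a dispatcher that runs the call the model asks for. The schema uses the OpenAI-style `tools` layout as an assumption (Gemini's declaration format is similar in spirit but not identical), and `get_weather` is a hypothetical tool invented for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# JSON-Schema-style declaration sent to the model alongside the prompt
# (OpenAI-compatible layout, assumed here for illustration).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Maps tool names the model may emit to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a tool call given the name and JSON-encoded arguments."""
    func = REGISTRY.get(name)
    if func is None:
        return f"error: unknown tool {name!r}"
    try:
        kwargs = json.loads(arguments_json)
        return func(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the signature
        return f"error: bad arguments ({exc})"


if __name__ == "__main__":
    print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

The model never runs anything itself: it only returns a function name plus arguments, and the calling code decides whether and how to execute it, then feeds the result back as a new message.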

devout rapids
# fiery lark Yes

unless privacy is a hard requirement, consider Gemini 2.5 Pro as suggested above

exotic wharf
#

there are many options for "tool calling"

fiery lark
#

those actually don't support tool calling natively. there are some hacks to make it work, but I didn't find them that good. anyway, thanks.
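For reference, the usual "hack" for models without native tool calling is to instruct the model in the prompt to reply with a JSON object and then fish that object out of the free-text reply. The sketch below shows only the parsing half. The `{"tool": ..., "args": ...}` shape is an assumed convention for illustration, not a standard.

```python
import json
from typing import Optional


def extract_tool_call(text: str) -> Optional[dict]:
    """Return the first JSON object in `text` that looks like a tool call.

    A tool call here means a dict with a "tool" key and a dict under "args"
    (an assumed convention). Returns None if no such object is found.
    """
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos`
            # and ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool" in obj and isinstance(obj.get("args"), dict):
            return obj
        pos = text.find("{", pos + 1)
    return None


if __name__ == "__main__":
    reply = 'Let me look that up. {"tool": "search", "args": {"q": "vllm"}}'
    print(extract_tool_call(reply))
```

This is fragile by nature: the model can emit malformed JSON, wrap it in extra prose, or skip it entirely, which is a large part of why this approach tends to feel worse than native tool calling.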

devout rapids
#

just try them and compare with models such as Gemini 2.5 Pro, Claude, Grok, etc.

#

you may not judge them as "good", but they may get things done with some manual intervention

#

and you'll most likely end up selling your GPUs, which are worth more than a fancy EV