#Well well.. Open Claw paired with Gemma4 - alopexus


severe copperBOT
dense arch
#

Nice! I'm also working with Gemma4: ASR for direct speech input into the LLM for an avatar. With G4 being a local, Apache-licensed model, it's a definite win. I'll share it on the showcase when I'm done with the POC. Thanks for sharing this, it inspires me to keep going.

summer bough
#

Hi, curious which GPU you are running Gemma4 26b on?

dense arch
#

I'm running a 5090, as I have a lot of interest in video processing and avatars. I use the GGUF version in LM Studio for my POCs.

left lynx
# summer bough Hi, curious which gpu are you running gemma4 26b on

RTX 3090 Ti here. Approximately 115 response tokens per second via Ollama.
In answer to the question below (there's a senseless timer on posts in this thread, it seems?):
I went with the 26b param model for speed and context length. With yesterday's Ollama update, flash attention is enabled for gemma4, so I can get 200,000 tokens of context out of the 26b param model, all within the 24GB envelope of the 3090 Ti, with a super fast response token rate and plenty of context. The 31b param model currently overflows into RAM past about 3200 tokens, which makes it less than useful. @summer bough
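
For anyone who wants to check numbers like these on their own card, here is a rough sketch of requesting a large context window from a local Ollama server and computing response tokens per second from the reply. It assumes Ollama's default endpoint on port 11434. The model tag `gemma4:26b` is a guess, not confirmed anywhere in this thread, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:26b", num_ctx=200_000):
    """Build a non-streaming generate payload with an explicit context window.

    The model tag is an assumption; check `ollama list` for the real one.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context length in tokens
    }


def tokens_per_second(response):
    """Response tokens per second.

    Ollama reports eval_count (generated tokens) and eval_duration
    (generation time in nanoseconds) in the final response object.
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


def generate(prompt, **kwargs):
    """Send the request to the local server and return the parsed JSON reply."""
    body = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running, `tokens_per_second(generate("some prompt"))` gives the response rate for that call. Lowering `num_ctx` is one way to see where a larger model stops fitting in VRAM on a given card.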