#Well well.. Open Claw paired with Gemma4 - alopexus
Nice! I'm also working with Gemma4: ASR for direct speech input into the LLM for avatars, with G4 being a local Apache model. A definite win. I'll share it on the showcase when I'm done with the POC. Thanks for sharing this, it inspires me to keep going.
Hi, curious which GPU you're running Gemma4 26b on?
I'm running a 5090, as I have a lot of interest in video processing and avatars. I use the GGUF version in LM Studio for my POCs.
RTX 3090 Ti here. Approximately 115 response tokens per second via Ollama.
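For anyone wanting to compare numbers like the one above, here is a minimal sketch of how a response token rate can be derived from an Ollama API response. It assumes the non-streaming JSON response carries `eval_count` (generated tokens) and `eval_duration` (nanoseconds); the helper name `tokens_per_second` is made up for illustration, and the sample values are synthetic, not measurements from this thread.

```python
def tokens_per_second(response: dict) -> float:
    """Compute the generation rate from an Ollama-style response dict.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


# Synthetic example values (hypothetical): 230 tokens generated in 2 seconds.
sample = {"eval_count": 230, "eval_duration": 2_000_000_000}
rate = tokens_per_second(sample)
print(f"{rate:.1f} tokens/s")
```

This only counts generation time; prompt processing is reported separately, so rates measured another way may not match.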
In answer to the question below (there's a senseless timer on posts in this thread it seems?)
I went with the 26b param model for speed and context length. With yesterday's Ollama update, flash attention is enabled for gemma4, so I can get 200,000 tokens of context out of the 26b param model, all within the 24GB envelope of the 3090 Ti, with a super fast response token rate. The 31b param model currently overflows into RAM past about 3200 tokens, which makes it less than useful. @summer bough
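A rough sketch of how a large context window like that can be requested per call, assuming a local Ollama server on its default port and the `num_ctx` option of the `/api/generate` endpoint. The model tag `gemma4:26b` is a guess at the naming, not confirmed in this thread, and the helper names `build_payload` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "gemma4:26b", num_ctx: int = 200_000) -> dict:
    """Build a non-streaming generate request with an explicit context size.

    The model tag is an assumption; use whatever `ollama list` shows locally.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }


def generate(prompt: str, **kwargs) -> dict:
    """Send the request to the local server and return the parsed JSON reply."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    result = generate("Say hello in one sentence.")
    print(result.get("response"))
```

Whether the full window actually fits in VRAM depends on the model, quantization, and server settings, so it is worth checking with `ollama ps` that nothing has spilled into system RAM.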