#Google (Gemini, Vertex)
Had a lot of provider overload issues with Gemini yesterday. Also got an email saying gemini-3-pro-preview is decommissioned and 3.1-pro-preview is now what the "latest" free tier tag points to.
For those who don't know: an AI Studio API key has a 90-day free tier with about $300 in credits. It is worth trying for heartbeat (gemini 2.5 flash), since heartbeat makes fewer calls per minute than using it as the primary model.
can i use those $300 to use nano banana 2 via gemini?
The whole antigravity account banning thing, does this also apply to Gemini CLI auth?
The '3' flash lites are certainly a lot more expensive than the '2's https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/
yep, can tell from experience.
I'm curious if anyone has switched to 3.1 flash lite from Kimi k2.5. Allegedly it's almost the same cost but with improved performance.
Not tried 3.1 for anything yet...was going to try zero shot vlm but this is an interesting idea
https://skatebench.t3.gg/ according to this matrix 3.1 flash is cheaper and better than kimi so I wonder.
I'm not able to use gemini-3.1-flash-lite directly through Gemini AI Studio. Anyone facing the same issue?
my api keys seem to keep getting blocked
gemini flash latest proved to me over and over that it was the slowest model, and the one with the most mistakes, missing info, and bad info. I just killed the agent running it yesterday, won't be going back.
Has anybody been able to use vertex ai api for models other than gemini?
i'm surprised by how far gemini api can take it when its safety filters are turned off. it was unhinged, it could swear, and did some things i can't mention here
How to jailbreak it?
you can turn off its filters https://ai.google.dev/gemini-api/docs/safety-settings
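For anyone following that link, here is a minimal sketch of what a request body with adjusted filters might look like. It assumes the REST `generateContent` payload shape and the category and threshold names from the safety-settings docs; the helper name `build_request_body` is made up for illustration. Check the linked page for the current categories and which thresholds your model accepts.

```python
import json

# Category and threshold names as listed in the Gemini API safety-settings docs.
# Verify against the linked page; the set can differ between model versions.
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
}


def build_request_body(prompt: str, threshold: str = "BLOCK_ONLY_HIGH") -> dict:
    """Hypothetical helper: build a generateContent body with one threshold for all categories."""
    if threshold not in THRESHOLDS:
        raise ValueError(f"unknown threshold: {threshold!r}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "safetySettings": [
            {"category": category, "threshold": threshold}
            for category in HARM_CATEGORIES
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would POST; no network call is made here.
    print(json.dumps(build_request_body("Hello"), indent=2))
```

The sketch only builds the JSON; sending it (endpoint, API key header) is left out on purpose.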
gemma 4 31b is amazing with openclaw. Funny that the best tool calling model from google is gemma. I'm using their api, I think it's free for now
Has anyone managed to set up Gemma 4 to work with openclaw yet? I'm using lmstudio, it's really fast on my system but tool calling is not working properly for now. Anyone managed to get tools working?
@steel niche I tried it tonight with Gemma-4-26b-a4b on my ASUS RTX 5090, latest LM Studio, Q4_0 K&V Cache. For reference, I've been running unsloth/Qwen3.5-27b without any issues (and at max context length). I couldn't max out the context length with G4, so settled on 128k.
It seems stable and fast until you start having it do real work. I told an agent through Discord to do a simple text change on a page and after 5 minutes of 90-100% GPU I called it quits. I switched back to Qwen3.5-27b and it did the job in less than 2 mins. Also tested the unsloth versions and same problem.
The tests on that X link are great and all, but they're not testing in the token-heavy OC environment like we are.
Free you say? Tell me more 
Where are you guys running your gemma 4?
folks, i often have multiple responses in a turn, like it gives me 5 responses to the same prompt in the same turn. anyone else have the same?
How did you run it? Mine is saying it's not a model
How did you run the gemma 4? Did you set it up manually, and if so, how?
Need assistance running gemini-2.5-flash with reasoning enabled. I've tried editing my openclaw.json file and adding the "thinking": true under the agents.defaults.model.primary value. gateway didn't start. Any clue?
Thinking is not reasoning. It's completely different settings. Thinking is for smart vs fast responses. Reasoning is to debug model thought process.
Thinking:
minimal → “think”
low → “think hard”
medium → “think harder”
high → “ultrathink” (max budget)
xhigh → “ultrathink+” (GPT-5.2 + Codex models only)
Reasoning:
Levels: on|off|stream
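A rough sketch of keeping the two settings apart in code. The level names are copied from the list above; the function name and the idea of validating them yourself are illustrative assumptions, not OpenClaw's actual config schema.

```python
# Level names copied from the list above; everything else here is hypothetical.
THINKING_LEVELS = ("minimal", "low", "medium", "high", "xhigh")
REASONING_LEVELS = ("on", "off", "stream")


def check_levels(thinking: str, reasoning: str) -> tuple[str, str]:
    """Return the normalized (thinking, reasoning) pair or raise ValueError."""
    t = thinking.strip().lower()
    r = reasoning.strip().lower()
    if t not in THINKING_LEVELS:
        raise ValueError(f"thinking must be one of {THINKING_LEVELS}, got {thinking!r}")
    if r not in REASONING_LEVELS:
        raise ValueError(f"reasoning must be one of {REASONING_LEVELS}, got {reasoning!r}")
    return t, r
```

Going by that list, a value like `true` is not a thinking level, so a named level such as `low` or `medium` would be the thing to try. Whether that is what stopped the gateway from starting is only a guess.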
anyone having the same issues - making it NOT have a default emoji really improves it and stopped the repetitions for me
I have the same question. How can I use this model via google ai studio api key?
Is it possible to use a gemini subscription for openclaw in any way?
What's your feedback guys after testing gemma 4 for a week?
for my purposes, it "works" for the model tests I run, but I don't have a reason to switch to it (26b-a4b) from qwen3.5 (35b-a3b): it's about 20 tok/s slower (80ish vs 100ish). I'm using the unsloth versions w/ llama.cpp, and you do need to keep up with llama.cpp updates, since they are continuously fixing gemma4 issues, so maybe it'll get faster too. The b8770 build from yesterday did seem improved over b8733, the last one I ran: it did the same work with fewer requests, so maybe it's more efficient. gemma4-31b won't work for my model tests because context is limited to 40k for that one, and it's a little slower than qwen35-26b, but it may be fine for smaller tasks. It's still early and inference engines are still fine-tuning, so there will probably be improvements.
https://github.com/khaney64/llm-stuff/blob/main/model-test-report-2026-04-11.md
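To put the 80ish vs 100ish tok/s gap in wall-clock terms, a quick back-of-the-envelope sketch. The 2,000-token response size is an arbitrary example, not a figure from the report, and the calculation ignores prompt processing time.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` output tokens at a steady decode rate."""
    return tokens / tokens_per_second


RESPONSE_TOKENS = 2000  # arbitrary example size, not from the report

gemma_s = generation_seconds(RESPONSE_TOKENS, 80)   # ~80 tok/s, as quoted above
qwen_s = generation_seconds(RESPONSE_TOKENS, 100)   # ~100 tok/s, as quoted above

print(f"gemma4: {gemma_s:.1f}s, qwen3.5: {qwen_s:.1f}s, extra: {gemma_s - qwen_s:.1f}s")
```

So at those rates the slower model takes about 25% longer per response, which adds up over an agent run with many turns.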