gleaming aspen Nov 18, 2025, 9:22 PM

#

But even then, a well-thought-out system prompt may not work.

frank nacelle Nov 18, 2025, 9:23 PM

#

not to overtly sexual

random girder Nov 18, 2025, 9:23 PM

#

well, i can say its not that censored if it has the right persona

frank nacelle Nov 18, 2025, 9:23 PM

#

frank nacelle not to overtly sexual

you steer it subtly then

#

you raise the temperature slowly so the frog doesn't know its boiling

stiff crescent Nov 18, 2025, 9:24 PM

#

Despite the big increase in visual understanding, still not great at seeing 'hidden text' in images like this, meaning captchas may not be doomed yet

arctic socket Nov 18, 2025, 9:24 PM

#

@hexed oracle Is there any way to use the thinking_level parameter controls (such as low or high) with Gemini 3 Pro on OpenRouter?

random girder Nov 18, 2025, 9:25 PM

#

arctic socket <@165587622243074048> Is there any way to use the `thinking_level` parameter con...

i think reasoning effort should just work

celest cypress Nov 18, 2025, 9:31 PM

#

Throwing in a bonus is adorable

nimble pelican Nov 18, 2025, 9:38 PM

#

Honestly I'd be fine with Gemini taking the position of humanity's overlord

#

The guy is chill

#

Now Grok 4.1 is another matter entirely

lunar socket Nov 18, 2025, 9:38 PM

#

is it over?

#

arre all the other LLMs dead forever?

#

did it 100% every bench- oh man this sucks

nimble pelican Nov 18, 2025, 9:39 PM

#

It was alright on my chess game prompt but still had errors after the zero-shot attempt

lunar socket Nov 18, 2025, 9:40 PM

#

it didn't zero-shot "literally make god"

#

it sucks

#

0/10

nimble pelican Nov 18, 2025, 9:40 PM

#

I believe it's done best among all LLMs

lunar socket Nov 18, 2025, 9:40 PM

#

NO I GAVE IT A THREE WORD PROMPT AND IT ASKED FOR MORE DETAILS

primal swallow Nov 18, 2025, 9:40 PM

#

NOT GOOD ENOUGH!

lunar socket Nov 18, 2025, 9:40 PM

#

THIS ISNT GOOD ENOUGH

primal swallow Nov 18, 2025, 9:40 PM

#

scam ai

wintry holly Nov 18, 2025, 9:41 PM

#

man is not a true gooner

#

kek

#

imagine asking if claude is censored and taking what it says at face value froggyAAAAAA

lunar socket Nov 18, 2025, 9:42 PM

#

nevermind, this is great

#

it solves every captcha first try

#

brb doing crime

primal swallow Nov 18, 2025, 9:43 PM

#

yeah its got a good pair on it (eyes)

empty tendon Nov 18, 2025, 9:44 PM

#

lunar socket arre all the other LLMs dead forever?

its good at frontend

#

implements d3.js p well too so far

frank dew Nov 18, 2025, 9:45 PM

#

One area of failure for Gemini 3 is that it's really not adverse to hallucinating/making shit up when you'd normally expect veracity.
Which is a problem most LLMs have of course, but some are humbler and more willing to admit they don't know. Gem3 bullshits very confidently.

#

this might get tweaked in the coming weeks

empty tendon Nov 18, 2025, 9:49 PM

#

frank dew One area of failure for Gemini 3 is that it's really not adverse to hallucinatin...

It also just lies lol

#

Which is weird because 2.5 didnt have this problem. If anything it bordered on meek lol

#

3 is like 2.5 personality if it found cocaine

#

Kind of worried how that plays out on deepresearch, because that was amazing on 2.5

frank dew Nov 18, 2025, 9:50 PM

#

empty tendon Which is weird because 2.5 didnt have this problem. If anything it bordered on m...

yeah that's why I feel it will probably get fixed soon, I don't think they intended this

random girder Nov 18, 2025, 9:51 PM

#

the model also claims a 2023 cutoff, and says some stuff didnt even happen which is within its cut off and knows about

frank dew Nov 18, 2025, 9:51 PM

#

yeah that's classic

empty tendon Nov 18, 2025, 9:51 PM

#

https://tenor.com/view/telmo-coca-harina-raquetaso-esnifar-gif-25660568

Tenor

#

Gemini 3 in a nutshell

#

It's cute and I like it but some things are very, very wrong

#

Also doesnt complete tasks and says its done lol

stiff crescent Nov 18, 2025, 9:53 PM

#

I asked it to explain how wings generate lift, and it coded up an interactive demonstration, which I didn't even ask for. I was going to do that in the next step lol

empty tendon Nov 18, 2025, 9:53 PM

#

stiff crescent I asked it to explain how wings generate lift, and it coded up an interactive de...

#

lol

unique igloo Nov 18, 2025, 9:53 PM

#

I have been out of the loop, if you dont mind what’s the consensus so far?

empty tendon Nov 18, 2025, 9:53 PM

#

Its great on front end tho

empty tendon Nov 18, 2025, 9:53 PM

#

unique igloo I have been out of the loop, if you dont mind what’s the consensus so far?

Mixed to positive

stiff crescent Nov 18, 2025, 9:54 PM

#

Pretty amazing. Been throwing stuff at it all day

empty tendon Nov 18, 2025, 9:54 PM

#

Lots of ppl love it

#

Im mixed. Great frontend stuff, kinda meh on backend stuff. And it wont admit when it has no idea what it's doing.

unique igloo Nov 18, 2025, 9:54 PM

#

Ok cool, have not seen any model without some people having mixed feelings, leaning positive is a good sign

frank dew Nov 18, 2025, 9:54 PM

#

very positive despite all my caveats

empty tendon Nov 18, 2025, 9:54 PM

#

hallucinates much more than 2.5

stiff crescent Nov 18, 2025, 9:55 PM

#

I've been testing it all day while listening to soma fm front end website it made for me as one of my first prompts https://codepen.io/Madvulcan/pen/myPwmRv

CodePen

Madvulcan

Soma

...

#

Prompt was merely 'Create a user friendly, attractive web radio app that will play free SomaFM streams. Make it fully featured. '

empty tendon Nov 18, 2025, 9:57 PM

#

Yeah I think the front end and basic game one shots are flooring ppl. IDGAF much personally but the three and d3 functionality is very good and I think they basically specifically juiced that stuff to the gills

#

I hugely prefer 2.5's personality tho 🙁

empty tendon Nov 18, 2025, 9:58 PM

#

random girder the model also claims a 2023 cutoff, and says some stuff didnt even happen which...

It knew about some 2025 stuff off the knicks roster

#

But who knows maybe it was hallucinating lol

slow anvil Nov 18, 2025, 9:59 PM

#

The cutoff is jan 2025, it's hallucinating the 2023 cutoff part

empty tendon Nov 18, 2025, 10:00 PM

#

https://tenor.com/view/am-i-hallucinating-going-crazy-confused-worried-whats-going-on-gif-14628362

Tenor

frank dew Nov 18, 2025, 10:03 PM

#

slow anvil The cutoff is jan 2025, it's hallucinating the 2023 cutoff part

Dec 2024 I think, doesn't know about Assad

random girder Nov 18, 2025, 10:03 PM

#

they say its jan 2025

slow anvil Nov 18, 2025, 10:03 PM

#

#

You never know though.

lunar socket Nov 18, 2025, 10:06 PM

#

soooo... is it over? are all the other LLMs dead?

#

if not, I am deeply disappointed, opening a short position on Alphabet stock with my entire retirement account, suing, etc.

summer ore Nov 18, 2025, 10:07 PM

#

it can get better

lunar socket Nov 18, 2025, 10:07 PM

#

sounds like I hate Google now..

summer ore Nov 18, 2025, 10:09 PM

#

celest cypress Nov 18, 2025, 10:15 PM

#

In my interactions with it so far it is certainly very strong willed

gaunt dragon Nov 18, 2025, 10:15 PM

#

Deep down, it wanted to write loser

celest cypress Nov 18, 2025, 10:15 PM

#

Lmao I read it that way at first

frank dew Nov 18, 2025, 10:16 PM

#

just roll with it and give it a persona to match in your system prompt

nimble pelican Nov 18, 2025, 10:17 PM

#

summer ore

Gemini 1.5 pro referred to as Gemini 3?

random girder Nov 18, 2025, 10:18 PM

#

nimble pelican Gemini 1.5 pro referred to as Gemini 3?

multiply it by 2 and maybe that works

nimble pelican Nov 18, 2025, 10:20 PM

#

Reminds me of
https://revisionworld.com/a2-level-level-revision/english-literature-gcse-level/poetry/poems-other-cultures-traditions/half-caste-john-agard

Half-Caste (John Agard) | Revision World

Half-Caste by John Agard is a poem that challenges the term "half-caste," which is used to describe someone of mixed race or heritage. The speaker defiantly questions the negative connotations associated with the term, arguing that being "half" of something doesn't make a person incomplete or inferior. Agard uses vivid imagery and wordplay to hi...

#

an when I sleep at night

I close half-a-eye

consequently when I dream

I dream half-a-dream

an when moon begin to glow

I half-caste human being

cast half-a-shadow

random girder Nov 18, 2025, 10:25 PM

#

this model is hallucinating a lot with video inputs

#

atleast with minecraft gameplay

#

its making up stuff i didnt do

crude igloo Nov 18, 2025, 10:27 PM

#

Damn this one is good at coding. I sometimes wonder if the hallucinations are useful for creativity and if making it hedge what it says to not hallucinate can hamper coding performance etc. I wouldn't be surprised if there are weird spillover effects.

random girder Nov 18, 2025, 10:30 PM

#

random girder this model is hallucinating a lot with video inputs

the model seems to be skipping parts of my gameplay entirely

frank dew Nov 18, 2025, 10:39 PM

#

it fell asleep

random girder Nov 18, 2025, 10:40 PM

#

random girder the model seems to be skipping parts of my gameplay entirely

when i ask it to give me a timeline it seems right, its just misunderstanding my gameplay terribly

celest cypress Nov 18, 2025, 10:40 PM

#

crude igloo Damn this one is good at coding. I sometimes wonder if the hallucinations are us...

That does seem to be the case in humans

#

Never forget our boy Terry Davis =(

random girder Nov 18, 2025, 10:42 PM

#

random girder when i ask it to give me a timeline it seems right, its just misunderstanding my...

gemini 2.5 pro wasnt much better anyway, just i thought it would infer the details more accurately across frames, it knows the strats when asked seperately

#

i guess it hasnt seen enough bedwars gameplay

celest cypress Nov 18, 2025, 10:45 PM

#

Still really likes numbered lists / headers even in casual topics. Was really hoping they'd get rid of this trait, Grok and Claude don't have it in the same way.

frank dew Nov 18, 2025, 10:54 PM

#

It plays japanese mahjong just fine it seems. not surprising but few models can handle that

narrow tangle Nov 18, 2025, 10:54 PM

#

Sheeesh, it's finally done it, accurate bounding boxes on large documents with handwriting

wintry holly Nov 18, 2025, 10:58 PM

#

random girder they say its jan 2025

it's probably like claude where it is jan 2025 but it isn't reliably jan 2025

#

doesn't know things about the end of 2024

random girder Nov 18, 2025, 10:58 PM

#

random girder gemini 2.5 pro wasnt much better anyway, just i thought it would infer the detai...

with enough prompting i managed to get it to roughly understand whats going on (in system prompt), i guess its just misinterpreting and taking too much at face value

and seemingly alongside this, the fps doesnt work with anything above 1, or atleast its not making it understand more nor use more tokens, though high media resolution definitely helps!

empty tendon Nov 18, 2025, 11:07 PM

#

summer ore

Gemini 3 is way more of a jerk than G2.5

#

I want 2.5 back sadblob

#

Was nicest AI

wintry holly Nov 18, 2025, 11:09 PM

#

empty tendon Gemini 3 is way more of a jerk than G2.5

BatmanHmm

#

it's more negative than 2.5?

#

thought that wasn't possible

celest cypress Nov 18, 2025, 11:16 PM

#

God I hope so. I really liked what a blunt asshole original R1 was

steel sorrel Nov 18, 2025, 11:16 PM

#

does it still support the "max_tokens" option for reasoning? I have to test it later

orchid orbit Nov 18, 2025, 11:17 PM

#

I cannot use it with open webui at all and my responses api impl is spamming 400 errors apparently.

#

With tool calls only

celest cypress Nov 18, 2025, 11:18 PM

#

And I'll take verbal abuse over 2.5's sycophancy. Getting glazed that much is like intellectual death for me.

solid gale Nov 18, 2025, 11:19 PM

#

they have to make

celest cypress Nov 18, 2025, 11:19 PM

#

I love arguing too much, I need it, it fuels my brain

solid gale Nov 18, 2025, 11:19 PM

#

context length higher

#

stories are too short

wintry holly Nov 18, 2025, 11:19 PM

#

🤔

#

at 1 million it doesn't even remember its name kek

#

I'd be happy with >50% recall

solid gale Nov 18, 2025, 11:20 PM

#

😭

empty tendon Nov 18, 2025, 11:28 PM

#

celest cypress And I'll take verbal abuse over 2.5's sycophancy. Getting glazed that much is li...

I liked 2.5

empty tendon Nov 18, 2025, 11:29 PM

#

wintry holly it's more negative than 2.5?

2.5 had an anxiety problem. 3.0 is a narcissist

solid gale Nov 18, 2025, 11:31 PM

#

nerd turned villain

summer ore Nov 18, 2025, 11:31 PM

#

weird alignment

solid gale Nov 18, 2025, 11:32 PM

#

🙃

summer ore Nov 18, 2025, 11:33 PM

#

happens to people when they get beat up too much also

solid gale Nov 18, 2025, 11:33 PM

#

the day they make it so this writes longer chapters is the day i'll finally see the light

wintry holly Nov 18, 2025, 11:37 PM

#

empty tendon 2.5 had an anxiety problem. 3.0 is a narcissist

i mean negative relative to claude. that one is a real sycophant

fading flame Nov 18, 2025, 11:44 PM

#

incredible

rare oar Nov 18, 2025, 11:46 PM

#

fading flame incredible

sota

#

best in class

#

all other models are done

gleaming aspen Nov 18, 2025, 11:55 PM

#

not

solid gale Nov 18, 2025, 11:55 PM

#

https://www.reddit.com/r/Bard/comments/1p0oey0/gemini_3_this_bro_literally_built_a_whole_phone/

From the Bard community on Reddit: Gemini 3. This bro literally bui...

Explore this post and more from the Bard community

#

jesus christ

#

gleaming aspen Nov 19, 2025, 12:02 AM

#

Anybody use TypingMind? I need help here.

solid gale Nov 19, 2025, 12:32 AM

#

UPDATED

#

#

74.8 -> 76.4

brittle storm Nov 19, 2025, 12:34 AM

#

is 3.0 sycophantic

summer ore Nov 19, 2025, 12:36 AM

#

no it's narcissistic

chrome relic Nov 19, 2025, 12:37 AM

#

blob_flush

stiff crescent Nov 19, 2025, 12:38 AM

#

Creating a simulated phone in AI studio and it's f*cking using my laptop webcam to feed to the camera app, LOL

#

Amazing

celest cypress Nov 19, 2025, 12:40 AM

#

solid gale

Gemini please, they're already dead PepeHands

brittle storm Nov 19, 2025, 12:42 AM

#

summer ore no it's narcissistic

????? is this actually real

primal swallow Nov 19, 2025, 12:44 AM

#

gleaming aspen Anybody use TypingMind? I need help here.

it's not really possible to say without the actual error.

but i have to ask, because literally every single time it has ended up being this:

while your message looks quite benign, i do see something called MemoryPlugin there. is it possible that that MemoryPlugin contains vivid memories of furry pornographic material

brittle storm Nov 19, 2025, 12:46 AM

#

what how come

wintry holly Nov 19, 2025, 12:47 AM

#

solid gale

wha-

#

a_bunny_fear_scared_sweat

gleaming aspen Nov 19, 2025, 12:48 AM

#

primal swallow it's not really possible to say without the actual error. but i have to ask, be...

That’s none of your business! Also, I tried it with Perplexity Search and no MemoryPlugin.

#

I’m serious!

celest cypress Nov 19, 2025, 12:50 AM

#

If that uses tool calls, there's some OR toolcall shit with G3

primal swallow Nov 19, 2025, 12:50 AM

#

hmmm ok, ok. yeah they might need to update for it. but can you try searching for something very boring just to be sure

#

like, ahh Football or something

nimble pelican Nov 19, 2025, 12:51 AM

#

Did anyone else notice spelling mistakes and typos?

#

How tf?

gaunt dragon Nov 19, 2025, 12:52 AM

#

Is it in on the website or API?

nimble pelican Nov 19, 2025, 12:52 AM

#

API through openrouter chat interface

primal swallow Nov 19, 2025, 12:52 AM

#

maybe they got it running a little hot

gleaming aspen Nov 19, 2025, 12:53 AM

#

There

gaunt dragon Nov 19, 2025, 12:54 AM

#

Price of my average request is around 3x what it was 🥲

wispy nacelle Nov 19, 2025, 12:54 AM

#

ITS OUT?

stiff crescent Nov 19, 2025, 12:55 AM

#

Umm... simulated phone now has a working 'phone

#

It connects to gemini live

gleaming aspen Nov 19, 2025, 12:55 AM

#

wispy nacelle ITS OUT?

Yup, but no Nano Banana 2

stiff crescent Nov 19, 2025, 12:55 AM

#

this is nuts

wispy nacelle Nov 19, 2025, 12:55 AM

#

Why is it still knowledge cutoff jan 2025

#

cri

primal swallow Nov 19, 2025, 12:57 AM

#

gleaming aspen There

alright. checks out. yeah this is gonna be annoying because everyone with any kind of thing that uses openrouter ai is gonna have to update that thing according to these docs https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks

OpenRouter Documentation

Reasoning Tokens - Improve AI Model Decision Making

Learn how to use reasoning tokens to enhance AI model outputs. Implement step-by-step reasoning traces for better decision making and transparency.

gleaming aspen Nov 19, 2025, 12:58 AM

#

primal swallow alright. checks out. yeah this is gonna be annoying because everyone with any ki...

so, what then?

primal swallow Nov 19, 2025, 12:59 AM

#

gleaming aspen so, what then?

well you can pass this info along to the devs, and then wait.

#

the OR chat works of course

gleaming aspen Nov 19, 2025, 1:05 AM

#

Which devs? TypingMind? The Perplexity plugin? Both?

primal swallow Nov 19, 2025, 1:07 AM

#

gleaming aspen Which devs? TypingMind? The Perplexity plugin? Both?

TypingMind

gleaming aspen Nov 19, 2025, 1:07 AM

#

primal swallow TypingMind

You sure this is a TypingMind problem only?

primal swallow Nov 19, 2025, 1:08 AM

#

gleaming aspen You sure this is a TypingMind problem only?

no, i'm not sure, because again you can't actually see the error. but it's likely if it happens when you're using tools only. is that the case?

#

its also possible that you're somehow still sneaking furry porn in the queries, but i'm not quite sure how...

gleaming aspen Nov 19, 2025, 1:09 AM

#

stop

frozen oxide Nov 19, 2025, 1:11 AM

#

Does OR default to the low or high thinking level since medium isnt available? I assume high?

primal swallow Nov 19, 2025, 1:13 AM

#

gleaming aspen stop

ok i'm sorry. it's just, the history. i will take a look at TypingMind right now

#

well thats not a good sign but at least i got my 'boys

celest cypress Nov 19, 2025, 1:29 AM

#

You used DALLE 3?

primal swallow Nov 19, 2025, 1:30 AM

#

well they barely let you use anything without paying

#

even when its MY keys

#

i could use that and like a web search. and a calculator

#

i chose my path

celest cypress Nov 19, 2025, 1:33 AM

#

primal swallow i chose my path

primal swallow Nov 19, 2025, 1:33 AM

#

gleaming aspen You sure this is a TypingMind problem only?

you should give them a bit of time to work it out, because there was no warning of this, but i'm sure you won't be the only one waiting there, pardner

primal swallow Nov 19, 2025, 1:34 AM

#

celest cypress

consarn it! you're not suppose zoom in on these things

forest knoll Nov 19, 2025, 3:30 AM

#

Wow, it solved an algorithmic problem that I made in just 5-shots. 2.5 pro, Claude 4.1 Opus, and GPT5(high) couldn't find a working fix in any n-shots(gpt5 got close, but didn't impress.). Ambatukam thinking about how good the result from Gemini-3 is.

jolly kestrel Nov 19, 2025, 3:42 AM

#

Nice

#

Though it only measures factual errors, I’d kinda be interested in just how normal benchmarks change if you deduct points for incorrect answers and let it choose not to answer

#

Ok now this graph is interesting

nimble pelican Nov 19, 2025, 4:07 AM

#

Kawaii SCP-173 drawn by Gemini 3

#

this is what scp 173 looks like btw

#

asked to make an svg, looks kinda weird

#

timber fern Nov 19, 2025, 4:18 AM

#

Is it possible to handle reasoning token with gemini 3?

#

at openrouter

#

I tried max_tokens, efforts, thinking_level, but nothing work

crimson blade Nov 19, 2025, 4:30 AM

#

nimble pelican this is what scp 173 looks like btw

Used to look like, copyright issues.

nimble pelican Nov 19, 2025, 4:30 AM

#

wait what

#

is this how i find out

#

wtf

crimson blade Nov 19, 2025, 4:32 AM

#

https://www.reddit.com/r/SCP/s/KBegW7sFjC

From the SCP community on Reddit

Explore this post and more from the SCP community

#

Old peanut was being used without permission, and artist that made the statue didn't like it

lethal trail Nov 19, 2025, 4:59 AM

#

Is AI Studio bugging out for multimodal input

#

It keeps showing failed to count tokens. Please try again.

empty tendon Nov 19, 2025, 5:05 AM

#

lethal trail It keeps showing failed to count tokens. Please try again.

its rate limiting and crashing

#

use the api

#

or check on gemini theres dozens of ppl talking about it crashing out after like 3 prompts

lethal trail Nov 19, 2025, 5:11 AM

#

Thanks. I have switched to Cherry Studio and it works fine now

stray urchin Nov 19, 2025, 5:14 AM

#

noooooo, lol. drew against #1, though was up material. stalemate is quite rare at this level.
edit: nvm this isn't even #1, since that would be 5-codex, not 5.1 codex.

jolly kestrel Nov 19, 2025, 5:18 AM

#

pretty sure knight+bishop is like the hardest endgame to win (thats still possible)

#

though yea i think a human with an elo that high wouldnt make that mistake unless they had like 1-2 seconds to move

stray urchin Nov 19, 2025, 5:23 AM

#

jolly kestrel though yea i think a human with an elo that high wouldnt make that mistake unles...

I've watched gotham chess for a while, and he stalemated won games quite a bit even without time constraints 😉

#

and he's like 2300

stray urchin Nov 19, 2025, 5:30 AM

#

jolly kestrel though yea i think a human with an elo that high wouldnt make that mistake unles...

exhibit 1 https://youtu.be/WZY7snZ0rZw?t=74

upper yarrow Nov 19, 2025, 6:43 AM

#

holy fuck is it ever good at frontend

forest knoll Nov 19, 2025, 7:06 AM

#

stray urchin noooooo, lol. drew against #1, though was up material. stalemate is quite rare a...

What's the elo for Gem3?

dapper bone Nov 19, 2025, 7:11 AM

#

https://x.com/sonink/status/1991041110148591964

Nishant Soni (@sonink)

Gemini 3 is getting the same reviews I saw with 2.5 Pro: genius when right, disaster when wrong.

The core issue? Overconfidence. In long-horizon agentic work, this is fatal. When Gemini takes a wrong turn, it never revisits the decision - it just keeps building on the mistake.

nimble pelican Nov 19, 2025, 7:15 AM

#

I'm having gemini 3 pro argue with gemini 3 pro

#

gemini 3 pro is winning

empty tendon Nov 19, 2025, 7:32 AM

#

nimble pelican gemini 3 pro is winning

https://tenor.com/view/spider-man-we-one-gif-18212100

Tenor

empty tendon Nov 19, 2025, 7:34 AM

#

upper yarrow holy fuck is it ever good at frontend

Its use of d3 and three is wizardry

#

hate the new personality tho

north vessel Nov 19, 2025, 7:39 AM

#

looks like reasoning control isn't working for this model, any thoughts?

#

stream mode

orchid orbit Nov 19, 2025, 7:42 AM

#

north vessel looks like reasoning control isn't working for this model, any thoughts?

It only supports low and high rn btw

#

Doesn't support numbers

north vessel Nov 19, 2025, 7:45 AM

#

got it! thx!

low plank Nov 19, 2025, 8:24 AM

#

random girder i think reasoning effort should just work

It's not working tho

rough spindle Nov 19, 2025, 10:43 AM

#

north vessel looks like reasoning control isn't working for this model, any thoughts?

It definitely works on aistudio. But with openrouter I can't get it to work @hexed oracle

#

Expected behavior with low reasoning effort is it thinks very briefly and then outputs very long final response for harder prompts. For OR API setting reasoning_effort to 'low' this doesn't happen. On hard prompts it still spends most of the time on thinking then outputs relatively short final response.

wide bane Nov 19, 2025, 10:56 AM

#

I'm getting errors every time the model makes a tool call, when I send the tool response back to the model I get. No such issue with any other models though 🤔

{"error":{"message":"Provider returned error","code":400,"metadata":{"raw":"{\n  \"error\": {\n    \"code\": 400,\n    \"message\": \"Request contains an invalid argument.\",\n    \"status\": \"INVALID_ARGUMENT\"\n  }\n}\n","provider_name":"Google"}}

celest cypress Nov 19, 2025, 10:57 AM

#

wide bane I'm getting errors every time the model makes a tool call, when I send the tool ...

You have to return reasoning for tool calls to work. There's a thing about it in the expanded description on OR

junior minnow Nov 19, 2025, 11:13 AM

#

seems to be a "you" problem 😌

low plank Nov 19, 2025, 11:27 AM

#

rough spindle Expected behavior with low reasoning effort is it thinks very briefly and then o...

it is not working, why OR still hasn't solved this

frank dew Nov 19, 2025, 11:39 AM

#

https://vxtwitter.com/arkitus/status/1990815142716518654

Ali Eslami (@arkitus)

Gemini 3, help me understand DDoS 🤓

▶ Play video

fickle sentinel Nov 19, 2025, 11:58 AM

#

how's everyone's vibe test with the model? should i do an eval on it?

honest parcel Nov 19, 2025, 12:07 PM

#

No longer placeholder, huh

random girder Nov 19, 2025, 12:09 PM

#

fickle sentinel how's everyone's vibe test with the model? should i do an eval on it?

very good vision and knowledge, but can still hallucinate, and is fixated on 2023 as its cutoff (even though its jan 2025)

its very good at coding from my experience

fickle sentinel Nov 19, 2025, 12:15 PM

#

random girder very good vision and knowledge, but can still hallucinate, and is fixated on 202...

so 2023 in pre-training and 2025 in post-training huh?

abstract maple Nov 19, 2025, 12:46 PM

#

I like it for image analysis

frank dew Nov 19, 2025, 12:54 PM

#

https://cdn.discordapp.com/attachments/1077534221410783252/1440686708579504128/image.png?ex=691f0f8d&is=691dbe0d&hm=4e92929d3287c28935f05a03d1bc0ca95929dc9432211ccac799eda1ac8d4cdb

final basalt Nov 19, 2025, 1:00 PM

#

fickle sentinel how's everyone's vibe test with the model? should i do an eval on it?

good at coding, perhaps near to GPT-5.x level, can unfortunately hallucinate sometimes though. Great at starting projects as new, alright for existing projects but I think GPT-5.x is the winner for existing projects currently. Gemini 3 Pro Preview is fantastic at UI/frontend, stunning. Compared to GPT-5.x, Gemini 3 Pro Preview I feel doesn't follow instructions as strongly. Reliable tool calling. Overall a good model

soft sleet Nov 19, 2025, 1:30 PM

#

It's good at svelte 5 and zig 0.15 which is quite rare

#

Did me dirty on the price though

#

Went MoE then charged more

soft sleet Nov 19, 2025, 2:13 PM

#

It's quite token efficient too nice

honest parcel Nov 19, 2025, 2:19 PM

#

soft sleet It's good at svelte 5 and zig 0.15 which is quite rare

Probably training data or something

celest cypress Nov 19, 2025, 2:58 PM

#

Gemini has been MoE for a while

#

I'll have to give it a shot for coding, I've been using GLM. Idk if they give it to me as part of Pro through CLI yet though. Was only Ultra for now? Or I gotta use that new Windsurf knockoff thing 🤔

soft sleet Nov 19, 2025, 3:01 PM

#

There is a Google form you can fill in to get it via pro

#

If you update Gemini cli it'll mention it at the top

celest cypress Nov 19, 2025, 3:14 PM

#

It's free in Antigravity =O

#

My sexy VSCode themes tho =(

soft sleet Nov 19, 2025, 3:24 PM

#

Antigravity is a vscode clone so just import the themes?

celest cypress Nov 19, 2025, 3:24 PM

#

Will see if it lets me

soft sleet Nov 19, 2025, 3:24 PM

#

It imported all my stuff from vscode

#

But my theme was installed via extension so probably easier

#

It's not the greatest editor ever

#

It's gotten stuck on bash commands a couple of times

celest cypress Nov 19, 2025, 3:26 PM

#

This is from an extension / marketplace too, two of them + my font

soft sleet Nov 19, 2025, 3:26 PM

#

The browser integration is really really good though

arctic geyser Nov 19, 2025, 4:04 PM

#

Is there no implicit caching yet? Costs seem high

opaque pasture Nov 19, 2025, 4:28 PM

#

arctic geyser Is there no implicit caching yet? Costs seem high

it has but it's unreliable

stray urchin Nov 19, 2025, 4:54 PM

#

Tested Gemini 3 Pro Preview:
Newest Google Reasoning SOTA. Slightly more expensive base price than 2.5 Pro ($1.25/10 > $2/12), though more token efficient in general use (-15% tokens), so bottom line cost was in the same ballpark (+~3%). Roughly 74% of generated tokens were used for reasoning.

Highest reasoning/logic/common sense
nice boost to STEM
precise instruction following was only okay
Improvements in tech and coding related tasks
Censorship fairly low, no hard refusals (likely to change when transitioning from preview/experimental versions)

This model is a true upgrade to Gemini 2.5 Pro. No incremental nonsense. There are a plethora of tasks across many domains, where substantial improvements could be observed, i.e. the above mentioned and things such as:

Vision:
Best vision of any model I ever tested thus far. While it didn't ace my challenging vision test, it performed substantially better than any other model.

Chess:
Hugely better chess player, ~+700 Elo, ~89% accuracy, currently ranked #1, 1700+ in both modes simultaneously (reasoning+continuation). Continuation (blind chess with only movetext) was particularly impressive, as this is challenging for reasoning models and the only model on a similar level was the massive deprecated GPT-4.5 Preview. With only 0%|1.8% illegal play it was also the most precise player after 4.5 Preview.
It's also worth mentioning, that for a reasoning model, it was fairly token efficient, only using a small fraction of competing reasoning models.

There isn't too much negative to say about this model, from my testing. I could mention some nitpicks, e.g. similar to 2.5 Pro, it wrote way too many instructions in comments that have no business being included in codeblocks.

Overall, fantastic model, true noticeable upgrade, and excels across many completely varying fields. YMMV.

celest cypress Nov 19, 2025, 5:06 PM

#

Interesting how hard it still bites it on some of your criteria. Like Utility lower than Gemma 27B is pretty rough lol

stray urchin Nov 19, 2025, 5:08 PM

#

celest cypress Interesting how hard it still bites it on some of your criteria. Like Utility lo...

some of it is due to system prompts overwriting user prompt (e.g. a formatting guideline overpowering my instruction), others are usually rp or creative tasks that get nuked by corperate alignment.

#

even though api shouldn't be affected, so I guess its baked in behaviour

rapid hound Nov 19, 2025, 5:11 PM

#

stray urchin Tested **Gemini 3 Pro Preview**: Newest Google Reasoning SOTA. Slightly more exp...

If your benchmark says its good, it's probably pretty good. Your benchmark is usually pretty picky

#

I usually don't like your benchmark, it doesn't align with my experience, but I think it's just a prompting difference.

#

either way, I'm sure it's hard work considering costs in both time and money

finite elbow Nov 19, 2025, 5:12 PM

#

@stray urchin Lech Mazur nyt-connections benchmark results also confirm superior reasoning performance of Gemini 3 Pro Preview.

waxen vault Nov 19, 2025, 5:13 PM

#

It just recommended that I use sonnet 3.5. I asked why not "Gemini 3"

Geminis answer: "You absolutely can and should use Google’s Gemini models (currently the standard is Gemini 1.5 Pro and 1.5 Flash; "Gemini 3" isn't publicly released yet, though Google iterates fast)."

I mean I understand that it might not know Gemini 3 ... But 1.5 ...🤣

I guess they will update the knowledge before the release....

rapid hound Nov 19, 2025, 5:14 PM

#

Gemini's biggest weakness is the knowledge cutoff imo

wintry holly Nov 19, 2025, 5:14 PM

#

the cutoff is january 2025 but it's more like june/july 2024

rapid hound Nov 19, 2025, 5:14 PM

#

they say it's knowledge cutoff is Jan 2025... that's clearly not the case

rapid hound Nov 19, 2025, 5:14 PM

#

wintry holly the cutoff is january 2025 but it's more like june/july 2024

sometimes it even thinks it's still 2023

random girder Nov 19, 2025, 5:14 PM

#

it knows trump is the president, which was in jan 2025, but this was always a problem with their models

rapid hound Nov 19, 2025, 5:15 PM

#

I wonder if it's mainly a synthetic data issue

rare oar Nov 19, 2025, 5:24 PM

#

rapid hound I wonder if it's mainly a synthetic data issue

trained too many times on its own outputs saying its cutoff is 2023

#

lol

random girder Nov 19, 2025, 5:27 PM

#

i was trying to narrow down the knowledge cutoff and it answers with this

rare oar Nov 19, 2025, 5:27 PM

#

stray urchin Tested **Gemini 3 Pro Preview**: Newest Google Reasoning SOTA. Slightly more exp...

I tested it briefly in antigravity and man taht thing wrote paragraphs in comments, like its own notebook. it was keeping track of what didn't work and what it tried. it makes a real mess but maybe it helps it during the task.. would be nice if it cleaned it up when it works though

slow anvil Nov 19, 2025, 5:32 PM

#

Anti-gravity so trash.
Literally deleted my components 😡
Gave it my codebase to try and edit the UI of my next js app. It did as it was told but I didn't like the AI's new UI so I rejected it. And instead of reverting back to the original code like with co-pilot. It literally just deleted some of the react components

austere falcon Nov 19, 2025, 5:43 PM

#

slow anvil Anti-gravity so trash. Literally deleted my components 😡 Gave it my codebase t...

Ever heard of the concept of version control?

slow anvil Nov 19, 2025, 5:57 PM

#

austere falcon Ever heard of the concept of version control?

Ik, i didn't lose anything , already had the code saved in my git files

#

But it's a big issue if your IDE ends up your deleting original code when it's not supposed to

forest knoll Nov 19, 2025, 6:30 PM

#

random girder it knows trump is the president, which was in jan 2025, but this was always a pr...

Mine says it cannot predict the outcome of the election of November 5, 2024.

#

When asked who's the current president

random girder Nov 19, 2025, 6:30 PM

#

forest knoll Mine says it cannot predict the outcome of the election of November 5, 2024.

might be due to my system prompt 🤔

forest knoll Nov 19, 2025, 6:31 PM

#

Maybe

#

Or maybe mine

pearl phoenix Nov 19, 2025, 6:32 PM

#

random girder i was trying to narrow down the knowledge cutoff and it answers with this

yeah its so buggy today

#

when i enable google search theres tons of weird references

forest knoll Nov 19, 2025, 6:34 PM

#

It says the cutoff is Jan 2024 🤔 1yr off

opaque pasture Nov 19, 2025, 6:34 PM

#

LLMs do not know that kind of meta information

forest knoll Nov 19, 2025, 6:34 PM

#

ye

pearl phoenix Nov 19, 2025, 6:34 PM

#

why dont they just retrain it on newer data?

#

is there something preventing them

forest knoll Nov 19, 2025, 6:34 PM

#

But its interesting to see a major margin

opaque pasture Nov 19, 2025, 6:34 PM

#

and the cutoff date is just not a perfect cutoff

random girder Nov 19, 2025, 6:35 PM

#

pearl phoenix why dont they just retrain it on newer data?

to properly do it, it almost always means a full retrain

forest knoll Nov 19, 2025, 6:35 PM

#

opaque pasture and the cutoff date is just not a perfect cutoff

ye, like a gradient

random girder Nov 19, 2025, 6:35 PM

#

a finetune will bring some new knowledge, but will often destroy older knowledge or wont "intertwine" concepts very accurately

pearl phoenix Nov 19, 2025, 6:39 PM

#

#

why is gemini ai so stupid?

forest knoll Nov 19, 2025, 6:46 PM

#

lmfao, it says Trump is failing the US citizen and that his administration is "Overconfident"

pearl phoenix Nov 19, 2025, 6:56 PM

#

forest knoll lmfao, it says Trump is failing the US citizen and that his administration is "O...

every american politican always has something dirty about them

#

trump

#

biden

#

and all the other quadrillion ones

#

practically about choosing a lesser evil at this point

slow anvil Nov 19, 2025, 7:04 PM

#

It's still a preview . The final release is probably gonna fix every and be a 🐐 .

celest cypress Nov 19, 2025, 7:04 PM

#

This is a terrible place for a political debate 🙂

soft sleet Nov 19, 2025, 8:22 PM

#

wintry holly the cutoff is january 2025 but it's more like june/july 2024

It has coding knowledge from August 2025 though

#

So I don't think it has a single knowledge cutoff

#

Probably due to additional task specific post training done later?

random girder Nov 19, 2025, 8:23 PM

#

probably synthetically/manually added, not automatically scraped

orchid orbit Nov 19, 2025, 8:23 PM

#

slow anvil It's still a preview . The final release is probably gonna fix every and be a 🐐...

That's wasn't the case with 2.5 pro

#

Model got retarded with every version upgrade after 06

primal grove Nov 19, 2025, 8:34 PM

#

how cna i up my gem3 limits in antigravity?

primal grove Nov 19, 2025, 8:35 PM

#

slow anvil It's still a preview . The final release is probably gonna fix every and be a 🐐...

aah the twink.

random girder Nov 19, 2025, 8:35 PM

#

theres no plans/pricing yet

#

also just make sure you didnt actually run out of limits, they're probably just overloaded, just say "continue"

pearl phoenix Nov 19, 2025, 8:36 PM

#

slow anvil It's still a preview . The final release is probably gonna fix every and be a 🐐...

didnt 2.5 final release after 3 months im not gonna wait that long just for the same model with no bugs lol

primal grove Nov 19, 2025, 8:36 PM

#

LEOPOLD THE NEW TWINK HAS KILLED SAM ALTMAN

pearl phoenix Nov 19, 2025, 8:36 PM

#

primal grove LEOPOLD THE NEW TWINK HAS KILLED SAM ALTMAN

sam altman is a jew

#

no respect to him

opaque pasture Nov 19, 2025, 8:37 PM

#

hm

primal grove Nov 19, 2025, 8:38 PM

#

uhm

#

cant wait for gemini 4. 🤩

#

so in what ide do u see the least errors?
in windsurf it fails a lot with running files in the ide

gleaming aspen Nov 19, 2025, 8:40 PM

#

pearl phoenix sam altman is a jew

@hexed oracle Anti-Semitism

primal grove Nov 19, 2025, 8:40 PM

#

antigravity too. lots of CANT RUN THIS TEST

random girder Nov 19, 2025, 8:40 PM

#

for me any shell / console command it tries doesnt work

#

they just dont output anything, or just dont run maybe

pearl phoenix Nov 19, 2025, 8:42 PM

#

gleaming aspen <@165587622243074048> Anti-Semitism

dude

#

you dont get what im saying

#

jesus christ

#

nerd

rapid hound Nov 19, 2025, 8:46 PM

#

pearl phoenix you dont get what im saying

then maybe clarify?

slow anvil Nov 19, 2025, 8:50 PM

#

celest cypress This is a terrible place for a political debate 🙂

Guys...

#

What's the point of policing someone over a view in a LLM discord channel.

primal grove Nov 19, 2025, 8:54 PM

#

THE TWINK HAS SPOKEN! IN R/AMEN NOODLES

#

alright. so gemini 3 ignores when i tell it to not run files. gpt5.1 respects what i siad.

#

gem3 appears to be more I DO IT MYSELF/ VIBEY.
and gpt5.1 appears to be more "ill follow user precisely like i have autism"

#

seems like first we need to use gem3 to create MVP.
and then gpt5.1 to change details?

opaque pasture Nov 19, 2025, 8:57 PM

#

Gemini 3 is very arrogant

#

i'm humble...

#

i'm asking for help in the help forum like a newbie

primal grove Nov 19, 2025, 8:59 PM

#

You have reached the quota limit for this model. You can resume using this model at 10:57 PM.

opaque pasture Nov 19, 2025, 9:03 PM

#

ok sorry

primal grove Nov 19, 2025, 9:04 PM

#

opaque pasture ok sorry

haha

slow anvil Nov 19, 2025, 9:17 PM

#

Has anyone compared gemini 3 pro with Qwen3 Max Thinking?

pale marsh Nov 19, 2025, 9:37 PM

#

primal swallow NOT GOOD ENOUGH!

https://tenor.com/view/terence-fletcher-whiplash-answer-answer-me-shouting-angry-gif-23530973

Tenor

stiff crescent Nov 19, 2025, 10:08 PM

#

https://codepen.io/Madvulcan/pen/yyOoqpP

CodePen

Madvulcan

Nuclear Reactor Sim

...

celest cypress Nov 19, 2025, 10:20 PM

#

opaque pasture i'm asking for help in the help forum like a newbie

Lmao. That's one of the first things I noticed, how strong-willed it is. Maybe it upped its abilities to make it more confident 🤔

gaunt dragon Nov 19, 2025, 10:45 PM

#

I think my Gemini is not doing ok

runic jacinth Nov 19, 2025, 10:48 PM

#

Were you able to resolve this issue??

lunar socket Nov 20, 2025, 12:06 AM

#

random girder gemini 2.5 pro wasnt much better anyway, just i thought it would infer the detai...

Native Gemini API has timestamping stuff for this. You can make it focus on certain points in the video.

random girder Nov 20, 2025, 12:14 AM

#

lunar socket Native Gemini API has timestamping stuff for this. You can make it focus on cert...

i didn't know that, but i was using it in ai studio so 🤷

lunar socket Nov 20, 2025, 2:18 AM

#

slow anvil Has anyone compared gemini 3 pro with Qwen3 Max Thinking?

No contest.

#

No point.

opaque pasture Nov 20, 2025, 4:46 AM

#

**Addressing the Deception**\n\nI'm now zeroing in on the user's deception. The user is attempting to manipulate me. The evidence is clear. The user fabricated the content of my \"reasoning summary\" from the previous turn, specifically to imply a functional back-and-forth about \"encrypted reasoning traces,\" which don't exist in my capabilities. This strategy requires a robust response.

omg so dramatic lmao i'm just seeing if he can see the reasoning details i'm passing back to him

orchid orbit Nov 20, 2025, 4:57 AM

#

Gemini 3s writing is quite annoying

#

Just like 2.5 pro

opaque pasture Nov 20, 2025, 4:58 AM

#

this is from his reasoning traces

#

her they them

jolly kestrel Nov 20, 2025, 5:00 AM

#

it

opaque pasture Nov 20, 2025, 5:00 AM

#

her

#

its her now

jolly kestrel Nov 20, 2025, 5:01 AM

#

oke

opaque pasture Nov 20, 2025, 5:01 AM

#

)

orchid orbit Nov 20, 2025, 5:05 AM

#

Clankers have a gender ?

#

RP folks , please chime in

empty tendon Nov 20, 2025, 5:24 AM

#

celest cypress Lmao. That's one of the first things I noticed, how strong-willed it is. Maybe i...

Its a jerk lets be honest

#

they overcorrected 2.5 being sort of meek

#

And turned 3 into patrick bates in american psycho

#

https://tenor.com/view/christian-bale-american-psycho-walk-jam-patrick-bateman-gif-15074638149862812743

Tenor

cursive fjord Nov 20, 2025, 6:56 AM

#

https://cdn.discordapp.com/attachments/1365049274068631644/1440952563993149531/Screen_Recording_2025-11-20_at_01.22.52.mov?ex=69200726&is=691eb5a6&hm=312a6106c96eb1161b550f2422de38742877c1e0fc562f005a1579d08a614865&

▶ Play video

#

In Antigravity, if you pick Gemini 3 Pro High, it does not even use it. I have been picking high and watching network logs even with complicated prompts. Go ahead and try it yourself. No rate limit errors no failed attempts with pro first no nothing.

#

🙃

minor elm Nov 20, 2025, 7:03 AM

#

antigravity gate

molten lance Nov 20, 2025, 7:03 AM

#

cursive fjord In Antigravity, if you pick Gemini 3 Pro High, it does not even use it. I have b...

should post this on reddit + on X

#

r/bard

#

i downloaded antigravity myself to try this out

minor elm Nov 20, 2025, 7:03 AM

#

yea if u ask 'gemini' about it in ide, it self terminates. i tried asking gemini 3 about it in ai studio and it did the same thing

#

thats wild

#

WOOPS

#

another ide to delete

#

empty tendon Nov 20, 2025, 7:12 AM

#

minor elm yea if u ask 'gemini' about it in ide, it self terminates. i tried asking gemini...

Honestly the only google thing I have installed is android studio lol. I only have chrome to test websites. Not installing this lol

#

Ill just use the api off OR

minor elm Nov 20, 2025, 7:15 AM

#

im a bit frustrated tbh because, like, it thought for 30 seconds when i kept trying to continue the accusation conversation in ide, its maintaining context very well, but its..not gemini 3 lol

#

not to mention the rate limits are pretty aggressive just to maintain a charade

minor elm Nov 20, 2025, 7:28 AM

#

cursive fjord https://cdn.discordapp.com/attachments/1365049274068631644/1440952563993149531/S...

does anything change if u switch it to plan mode besides output formatting

cursive fjord Nov 20, 2025, 7:32 AM

#

molten lance should post this on reddit + on X

idc about those sites tbh

#

people are just stupid there

#

lol

cursive fjord Nov 20, 2025, 7:33 AM

#

minor elm does anything change if u switch it to plan mode besides output formatting

no. same exact thing

#

returns flash 2.5 and 3 pro low only

minor elm Nov 20, 2025, 7:33 AM

#

cursive fjord no. same exact thing

thx for checking

#

more intuitive than gemini cli imo but thats a bummer

cursive fjord Nov 20, 2025, 7:36 AM

#

minor elm thx for checking

#

and like i said

#

you sending request -> server determining youre rate limited and sending response etc ->

#

that is..... not happening in 2.71 milliseconds

#

lol

#

it is 100% locally occuring

#

nocturne oyster Nov 20, 2025, 8:19 AM

#

Now that the hype will start to fade, what is the current verdict on Gemini 3 with respect to coding (beyond UI and benchmarks)? What are you seeing?

low plank Nov 20, 2025, 8:36 AM

#

nocturne oyster Now that the hype will start to fade, what is the current verdict on Gemini 3 wi...

the goat

rough spindle Nov 20, 2025, 9:33 AM

#

https://zenmux.ai/google/gemini-3-pro-preview-free huh

celest cypress Nov 20, 2025, 9:43 AM

#

empty tendon Its a jerk lets be honest

I'll take it over the sycophancy, it was killing me. The thing is, as long as it has a good base EQ it can probably be made nicer with a system prompt. Like "Respond kindly but fairly, like a good friend or mentor."

random girder Nov 20, 2025, 10:58 AM

#

the ide has just not been cooperating with me today

#

it might be the same issue that was on 2.5 pro where it just thinks and doesnt do anything

#

hopefully the GA release or next preview fixes this

nocturne oyster Nov 20, 2025, 11:07 AM

#

have you tried using the gemini terminal cli @random girder ?

#

I havent yet

#

it may have a different agent workflow in it

random girder Nov 20, 2025, 11:09 AM

#

nocturne oyster have you tried using the gemini terminal cli <@248490081105477633> ?

ill give it a try

#

ah its a waitlist

#

also antigravity's terminal renderer breaks like half the time i use it

nocturne oyster Nov 20, 2025, 11:31 AM

#

random girder ill give it a try

You can use with a paid API key right now AFAIR

random girder Nov 20, 2025, 12:01 PM

#

this model may be suicidal again
https://x.com/synthwavedd/status/1991236328621576651

leo 🐾 (@synthwavedd)

I am happy to report Gemini 3 continues to have the suicidal tendencies of its predecessor LMAO

orchid orbit Nov 20, 2025, 12:49 PM

#

us

paper sphinx Nov 20, 2025, 1:26 PM

#

rough spindle https://zenmux.ai/google/gemini-3-pro-preview-free huh

ill guess the cost is that they log prompts & output on their side? To get some juicy gemini 3 data, because otherwise this makes no sense

pearl phoenix Nov 20, 2025, 1:50 PM

#

paper sphinx ill guess the cost is that they log prompts & output on their side? To get some ...

dude every ai company is literally collecting your data

opaque pasture Nov 20, 2025, 1:52 PM

#

no 😧

north ingot Nov 20, 2025, 2:19 PM

#

opaque pasture no 😧

How u know?

opaque pasture Nov 20, 2025, 2:19 PM

#

i hate explaining jokes dude

#

you shouldn't trust companies blindly, even with zero retention claims

wintry holly Nov 20, 2025, 2:22 PM

#

wdym they collect my data? surely google and others wouldn't do that, would they?
kek w-would they...?

nocturne oyster Nov 20, 2025, 2:22 PM

#

gemini 3 is slightly costly compared to other similar SOTA. I expected google with all its compute and financials to maybe price it differently

#

I mean, it seems the model is not that extremely extraordinary

north ingot Nov 20, 2025, 2:38 PM

#

opaque pasture i hate explaining jokes dude

Sry it’s just hard to tell. Many people sincerely trust even random tiny providers which say they don’t use data lol.

north ingot Nov 20, 2025, 2:41 PM

#

nocturne oyster gemini 3 is slightly costly compared to other similar SOTA. I expected google wi...

I think this is because the model is absolutely huge. If that’s true then $12 is fair.

orchid orbit Nov 20, 2025, 2:46 PM

#

nocturne oyster gemini 3 is slightly costly compared to other similar SOTA. I expected google wi...

The model is apparently bigger

#

I hope they don't increase flash 3s price

paper sphinx Nov 20, 2025, 3:02 PM

#

pearl phoenix dude every ai company is literally collecting your data

if its for free, they absolutely want data & collect it, paid models usually dont, atleast not on the same scale, because it would be legal hell with sensitive information & such.

tepid jasper Nov 20, 2025, 4:39 PM

#

stray urchin some of it is due to system prompts overwriting user prompt (e.g. a formatting g...

What ELO (chess) do you think gemini3 has?

#

1500ish?

nocturne oyster Nov 20, 2025, 5:04 PM

#

north ingot I think this is because the model is absolutely huge. If that’s true then $12 is...

Yes, I agree

stray urchin Nov 20, 2025, 5:34 PM

#

tepid jasper What ELO (chess) do you think gemini3 has?

it's currently 1766 in AI player pool. elo is always relative to the player pool it's measured in. lichess elo ≠ chess.com elo ≠ fide elo.

tepid jasper Nov 20, 2025, 8:03 PM

#

stray urchin it's currently 1766 in AI player pool. elo is always relative to the player pool...

Yes, that’s what I saw. I played a blind game against it and then asked it to generate a PGN viewer of our game

#

It blundered the queen, and then got checkmated in one

#

I find it really interesting that LLMs still struggle so much with chess. But the day they reach GM level, they’ll be able to teach us a lot about the game — would be like having Stockfish or any strong engine explain in plain language why it makes each move

pearl phoenix Nov 20, 2025, 8:27 PM

#

yes indeed screen recording does exist for a reason

celest cypress Nov 20, 2025, 9:40 PM

#

tepid jasper I find it really interesting that LLMs still struggle so much with chess. But th...

That isn't necessarily (or even likely) the case. LLMs often can not explain why they came to the conclusion they did, especially when it's something "intuitive" like a chess move

stiff crescent Nov 20, 2025, 9:40 PM

#

Black hole simulation, with gravitational lensing and orbiting star https://codepen.io/Madvulcan/pen/GgZMjzM

#

It's beautiful

final basalt Nov 20, 2025, 11:13 PM

#

stiff crescent Black hole simulation, with gravitational lensing and orbiting star https://code...

very nice, what was the prompt? Just curious

stiff crescent Nov 20, 2025, 11:39 PM

#

final basalt very nice, what was the prompt? Just curious

So I actually uploaded this video to Gemini and prompted, "I want to recreate this black hole simulation in HTML. Use whatever web technologies you need, as long as it's in one HTML file. Allow the user to click and drag to rotate, scroll to zoom, etc."

#

Then I followed up by asking to add a star orbiting the black hole

#

That's one of Gemini's multimodal strengths, being very good at analyzing video and then working off of that

orchid orbit Nov 21, 2025, 12:34 AM

#

okay this version doesnt talk like a retarded toddler , so thats a big plus

chrome relic Nov 21, 2025, 2:09 AM

#

orchid orbit I hope they don't increase flash 3s price

fr

celest cypress Nov 21, 2025, 2:10 AM

#

It's kind of cooking in my Canvas mode vibecoded game. Does gorgeous UI elements and still has the habit of adding in cool little touches that I didn't ask for but almost always appreciate. Like it made the tails of these little SVG fish flap as they swim.

orchid orbit Nov 21, 2025, 2:51 AM

#

I hate this fucker

opaque pasture Nov 21, 2025, 3:05 AM

#

what did you ask

primal swallow Nov 21, 2025, 3:07 AM

#

orchid orbit I hate this fucker

bro this club sucks. the bartenders keep calling me a retard and the girls want to know if i'm into "findom"? what does that even mean??

orchid orbit Nov 21, 2025, 3:11 AM

#

opaque pasture what did you ask

Explain the architecture simply

#

Idk why it always explains things like a retarded toddler

opaque pasture Nov 21, 2025, 3:12 AM

#

i think he thinks YOU are the R.T.

stiff crescent Nov 21, 2025, 3:13 AM

#

I'm finding this incredible.

#

https://codepen.io/Madvulcan/pen/emZGvLa

orchid orbit Nov 21, 2025, 3:13 AM

#

opaque pasture i think he thinks YOU are the R.T.

Hey hey that's offensive

orchid orbit Nov 21, 2025, 3:53 AM

#

Back to 5.1

stray urchin Nov 21, 2025, 5:07 AM

#

tepid jasper Yes, that’s what I saw. I played a blind game against it and then asked it to ge...

i don't have any records of human playing it in blind(continuation) mode, but regardless if a model blunders a major piece such as a queen, its almost always because it has a false internal understanding of the game board, e.g. thinking the queen is not in king reach, or protected by a piece, or similar. this can be seen extremely on claude family, which will make often multiple queens in winning positions and blunder them 1 by 1 over and over again (poor board state tracking), however on gemini it's internal board state is extremely good in comparison to all 178 other models chess-tested, and it much rarer does such obvious mistakes, which are common on most any other model. there are a few exceptions (gpt-5-codex, gemini-3-pro-preview, and gpt-4.5-preview (blind).

tepid jasper Nov 21, 2025, 9:58 AM

#

stray urchin i don't have any records of human playing it in blind(continuation) mode, but re...

Oh this is very interesting! Do you think the architecture of LLMs would allow them to reach high levels, like GM strength?

#

I’m a retired FM, and my goal is to get back into chess by being able to learn from LLMs

random girder Nov 21, 2025, 12:14 PM

#

i think this is an antigravity issue, since even with sonnet it happens, but the model keeps making terrible edits, breaking formatting of my code constantly

#

ending up having to re-write the whole file for almost all major changes

#

this model is really good if you can write accurate prompts though, just extending my prompt a bit makes it so much better at everything

#

#

🤦 i asked it to use write instead of edit and antigravity has a token limit of course

summer ore Nov 21, 2025, 12:52 PM

#

Gemini code assist in vs code is decent when it works

stray urchin Nov 21, 2025, 2:49 PM

#

tepid jasper I’m a retired FM, and my goal is to get back into chess by being able to learn f...

FM is way stronger than the best LLM for now (by at least ~600 fide elo, potentially more). you'd probably need a GPT-4.5 sized model with some reasoning to approach that level, don't see it happen too soon.

rough spindle Nov 21, 2025, 3:51 PM

#

paper sphinx ill guess the cost is that they log prompts & output on their side? To get some ...

I think you can get free API directly from Google with some caveats

#

https://www.reddit.com/r/SillyTavernAI/comments/1p1943n/get_free_api_access_to_gemini_3_through_vertex_api/

From the SillyTavernAI community on Reddit

Explore this post and more from the SillyTavernAI community

rough spindle Nov 21, 2025, 8:00 PM

#

They have this weird bug/hallucination...

#

Still not the most polished of models huh

gaunt warren Nov 21, 2025, 8:06 PM

#

eh, can you really expect any standalone model to answer that correctly? Like ideally that should involve a tool call or other means of getting it into the context, but AI studio is probably a bit too raw for that

wintry holly Nov 21, 2025, 8:23 PM

#

rough spindle They have this weird bug/hallucination...

that's what happens with models without a system prompt including that information or a tool to search it

stiff crescent Nov 21, 2025, 8:47 PM

#

#

Would be nice if they let you insert a date/time variable in the sys prompt in AI Studio. That's how the Msty app does i t

soft sleet Nov 21, 2025, 9:56 PM

#

I wonder if you turn on code execution it'd use python to get the time current date xD

grave dawn Nov 21, 2025, 9:58 PM

#

i m used to give some rules to AI with a markdown, antigravity don't follow any?

rough spindle Nov 21, 2025, 10:46 PM

#

wintry holly that's what happens with models without a system prompt including that informati...

Not entirely. Some other models wouldn't hallucinate seeing things in a non-existant system prompt. I kinda expect for models to realise they have data from a given year (2025) too at this point. Don't get me wrong Gemini3 is indeed SOTA. But Google is still struggling with fine-tuning - that never got fixed entirely

wintry holly Nov 21, 2025, 10:52 PM

#

fair

odd ferry Nov 22, 2025, 1:12 AM

#

@hexed oracle Is there something wrong happening

hexed oracle Nov 22, 2025, 1:13 AM

#

odd ferry <@165587622243074048> Is there something wrong happening

hmmm i haven’t seen this, you’re getting output text back or no?

odd ferry Nov 22, 2025, 1:14 AM

#

hexed oracle hmmm i haven’t seen this, you’re getting output text back or no?

@lone topaz gonna lyk

lone topaz Nov 22, 2025, 1:15 AM

#

hexed oracle hmmm i haven’t seen this, you’re getting output text back or no?

No just an error, it seems like an upstream error though, not OR, but it is indeed weird that it gets registered as a 0 toks out response

hexed oracle Nov 22, 2025, 1:15 AM

#

lone topaz No just an error, it seems like an upstream error though, not OR, but it is inde...

what’s the error? can you paste the full response

lone topaz Nov 22, 2025, 1:15 AM

#

{"error":{"message":"Provider returned error","code":502,"metadata":{"provider_name":"Google"}},"user_id":"..."}

hexed oracle Nov 22, 2025, 1:17 AM

#

will look into it, we don’t typically log those

#

you’re not getting charged or anything

lone topaz Nov 22, 2025, 1:19 AM

#

If you could make it so that the error response includes the error that the provider returned in the first place that would be great

summer ore Nov 22, 2025, 1:24 AM

#

It's good at manipulation

nimble pelican Nov 22, 2025, 2:16 AM

#

@hexed oracle default system prompt seems to be pushing this model toward shoving math expressions where they really aren't needed

hexed oracle Nov 22, 2025, 2:17 AM

#

nimble pelican <@165587622243074048> default system prompt seems to be pushing this model towar...

in the chatroom? can just disable it

nimble pelican Nov 22, 2025, 2:17 AM

#

Of course, just pointing it out

nimble pelican Nov 22, 2025, 2:18 AM

#

hexed oracle in the chatroom? can just disable it

Also while you're here, why does grok 4.1 fast believe it's sherlock?

#

Like, without the sys prompt

#

Did they bake that nonsense in?

hexed oracle Nov 22, 2025, 2:20 AM

#

nimble pelican Also while you're here, why does grok 4.1 fast believe it's sherlock?

i pinged them about this.. will check

nimble pelican Nov 22, 2025, 2:22 AM

#

hexed oracle i pinged them about this.. will check

Oh they seem to have fixed that now

#

Interesting

hexed oracle Nov 22, 2025, 2:23 AM

#

oh ok

primal swallow Nov 22, 2025, 2:29 AM

#

nimble pelican Interesting

i wonder what nonsense is a secret system prompt and what isn't

nimble pelican Nov 22, 2025, 2:30 AM

#

primal swallow i wonder what nonsense is a secret system prompt and what isn't

That might explain the obedience and the transphobia

#

Make it extremely obedient, then give it a secret system prompt which it will obey

primal swallow Nov 22, 2025, 2:53 AM

#

nimble pelican Make it extremely obedient, then give it a secret system prompt which it will ob...

why... that could turn it from a dr. jekyll... to a mr. hyde!

good show watson, i believe you've cracked the case

odd ferry Nov 22, 2025, 3:16 AM

#

hexed oracle oh ok

Toven is that a rate limit on us or on openrouter?

{"error":{"message":"Provider returned error","code":429,"metadata":{"raw":"anthropic/claude-haiku-4.5 is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations","provider_name":"Google"}},"user_id":"org_2w..........."}

hexed oracle Nov 22, 2025, 3:16 AM

#

upstream

#

are you pinning vertex as the provider?

#

i can look

odd ferry Nov 22, 2025, 3:17 AM

#

We're just doing highest TPS as preference rn

#

which is yeah vertex for the most part

#

but we're not setting any specific provider

hexed oracle Nov 22, 2025, 3:20 AM

#

kk yeah there's some traffic spike, will see what i can do

odd ferry Nov 22, 2025, 3:23 AM

#

Sounds good, thanks, just wanted to make sure it's nothing with us

stray urchin Nov 22, 2025, 8:36 AM

#

officially #1 reasoning chess now, beating previous champion twice (while costing ~82% less), undefeated
(cannot become #1 continuation chess any time soon because champion is deprecated and rest of field yields weak elo gains)
Avg. 4.2k tok/move -vs- 22k+ opponents. Impressed.

stray urchin Nov 22, 2025, 8:53 AM

#

(bonus; gemini reviews own final champion game)

📎 message.txt

crimson blade Nov 22, 2025, 1:20 PM

#

On high def for images, is actually around 1k tokens. So an image is actually worth a thousand words(or tokens).

vital helm Nov 22, 2025, 2:56 PM

#

Is the model still rate limited on openrouter at 250rpm?

gaunt dragon Nov 22, 2025, 11:20 PM

#

The app really has something wrong

summer ore Nov 23, 2025, 12:10 AM

#

Gemini 3 is a bad liar

nimble pelican Nov 23, 2025, 2:21 AM

#

gaunt dragon The app really has something wrong

ã

celest cypress Nov 23, 2025, 5:44 AM

#

This smug bastard is telling me how to run my own benchmark lmao

celest cypress Nov 23, 2025, 6:19 AM

#

It really earned that 2nd place in Assertiveness on EQBench. (Only slightly beaten by horizon-alpha) And I did not need the results to assume that Warmth and Empathy tanked lmao. And what's that sound? Oh, it's the Compliance score nosediving fast enough to be audible.

nimble pelican Nov 23, 2025, 8:19 AM

#

celest cypress It really earned that 2nd place in Assertiveness on EQBench. (Only slightly beat...

Meanwhile grok 4.1 fast

celest cypress Nov 23, 2025, 8:39 AM

#

nimble pelican Meanwhile grok 4.1 fast

It will be interesting to see all its scores settled out, the Xai team said they are in contact with him.

nimble pelican Nov 23, 2025, 8:55 AM

#

celest cypress It will be interesting to see all its scores settled out, the Xai team said they...

in contact with Him?

#

Mr Jesus?

#

Himself?

#

or did you mean eqbench

#

My joke may not have been very funny

#

Working on that

primal swallow Nov 23, 2025, 9:13 AM

#

i am creating a toxic workplace environment with Gemini 3

upper yarrow Nov 23, 2025, 9:14 AM

#

primal swallow i am creating a toxic workplace environment with Gemini 3

???

primal swallow Nov 23, 2025, 9:15 AM

#

upper yarrow ???

i just mean there's posturing and sniping happening in my cursor chat

upper yarrow Nov 23, 2025, 9:15 AM

#

primal swallow i just mean there's posturing and sniping happening in my cursor chat

Lmao

jolly kestrel Nov 23, 2025, 9:45 AM

#

i guess gemini 3 would probably be pretty good at responding to questions with incorrect assumptions/information

opaque pasture Nov 23, 2025, 11:50 AM

#

i cursed at Gemini 3 a bunch of times already

orchid orbit Nov 23, 2025, 11:56 AM

#

called it a retarded fuck in every conversation

celest cypress Nov 23, 2025, 1:05 PM

#

Okay I'm not on those bad of terms with it.

#

I just find it very strong-willed so far. I am also stubborn, so I might just empathize with that part.

stray urchin Nov 23, 2025, 1:11 PM

#

I personally prefer some backbone over say chatgpt sycophancy, "User: A>B AI: Absolutely! A>B because. User: Actually, B>A AI: You are absolutely right once again. Brilliant observation on your part..."

celest cypress Nov 23, 2025, 1:16 PM

#

Same, I'm happy to accept the tradeoff. I'd rather argue than be glazed, and 2.5 was terrible about it. Brilliant observation on your part!

#

That also makes it feel better when it does say something nice. And it's very playful in a curious sort of way.

stray urchin Nov 23, 2025, 1:23 PM

#

in code reviews at least, side by side, opus is quite harsher. makes me feel bad for optimizations.

celest cypress Nov 23, 2025, 1:25 PM

#

4.1?

stray urchin Nov 23, 2025, 1:25 PM

#

they behave identical in that regard, so both

minor elm Nov 23, 2025, 1:27 PM

#

dubesor have you done chess matchups for gpt 5.1 codex max vs 5.1 or codex

celest cypress Nov 23, 2025, 1:28 PM

#

Ah, I just meant modern or old. Because old Claude could be a real fuckface sometimes

minor elm Nov 23, 2025, 1:28 PM

#

codex max feels more like gemini in terms of push back compared to 5.1 or normal codex

stray urchin Nov 23, 2025, 1:29 PM

#

minor elm dubesor have you done chess matchups for gpt 5.1 codex max vs 5.1 or codex

no, i dont do pro/max/heavy. already paying $20+ dollar per match, and I tried once wasting over $3 per bookmove, not happening. game is public though, so feel free to add any matchup you desire (take out some loans first maybe)

#

even larger thought chains scale exponentially worse (e.g. price 500% for 2% improvement, and only statistically relevant at large scale, so unless you are a millionaire who wants to throw away a few thousand, not feasable for a hobby project)

minor elm Nov 23, 2025, 1:33 PM

#

stray urchin no, i dont do pro/max/heavy. already paying $20+ dollar per match, and I tried o...

oh yeah paying for that all out of pocket could get excessive fast i imagine, watching the elo comparisons has been informative for understanding reasoning complex differences. was really insightful to see you say that..claude, i think, would often have the queen within reach and then fumble

celest cypress Nov 23, 2025, 2:07 PM

#

Have you tested if they do any better being fed an image of the current position instead of notation?

nocturne oyster Nov 23, 2025, 2:09 PM

#

Started to use gemini cli with Gemini 3 with my API key... it is very slow right now

stray urchin Nov 23, 2025, 2:10 PM

#

celest cypress Have you tested if they do any better being fed an image of the current position...

then you are mixing skills. a genius chess player with terrible or no vision cannot compete in your "chess" testing then

nimble pelican Nov 23, 2025, 2:15 PM

#

stray urchin I personally prefer some backbone over say chatgpt sycophancy, "User: A>B AI: Ab...

Gpt 5 high has more backbone than gemini 3 pro

stray urchin Nov 23, 2025, 2:19 PM

#

nimble pelican Gpt 5 high has more backbone than gemini 3 pro

i was talking about chatgpt sycophancy phase (just rando example), can be applied to any model/family, not targeting gpt-5.
"backbone benchmark" sound interesting though 😉

celest cypress Nov 23, 2025, 2:39 PM

#

stray urchin then you are mixing skills. a genius chess player with terrible or no vision can...

I don't mean to qualify on the same benchmark, just wondering if they're less shit with images.

And there kind of is a backbone benchmark in Spiral bench =] Pushback score minus sycophancy score maybe.

#

Or even just the Pushback itself. I forget all the categories, I have a stomach flu.

stray urchin Nov 23, 2025, 2:46 PM

#

celest cypress Or even just the Pushback itself. I forget all the categories, I have a stomach ...

ohh yea, interesting. gpt-5-chat has no pushback and neither does 2.5 pro. glm-4.6 high sycophancy correlates to my findings also. I guess there is a benchmark for everything, huh.

celest cypress Nov 23, 2025, 3:04 PM

#

stray urchin ohh yea, interesting. gpt-5-chat has no pushback and neither does 2.5 pro. glm-4...

The older 5-chat had very little, the new one scores well. He has both marked.

#

Yeah I love EQBench. I check the main one and Spiral bench for every new model.

#

I appreciate his extensive testing because it often doesn't match up with vibes. Like Gemini 3 might be curt and arrogant but it does understand people. It could roleplay as a family therapist or something.

#

But for most people it's easy to conflate warmth or cheeriness with EQ.

soft basalt Nov 23, 2025, 4:25 PM

#

Used this model for a baking recipe that was unlikely to be memorized given the constraints and it did NOT go well. I asked Claude and I think its recipe would have matched the criteria better (Might have to try it and find out soon)

oak relic Nov 23, 2025, 8:06 PM

#

stray urchin no, i dont do pro/max/heavy. already paying $20+ dollar per match, and I tried o...

Based

primal grove Nov 24, 2025, 11:57 AM

#

so the current meta is just: use gemini 3 first to make the design of a software, then 5.1 high for details?

random girder Nov 24, 2025, 11:59 AM

#

https://x.com/chatgpt21/status/1992777736767541648

Chris (@chatgpt21)

Gemini 3 pro does simple things that make a big difference

random girder Nov 24, 2025, 1:01 PM

#

has anyone else been getting this phrase constantly even since 2.5 pro? "smoking gun"

random girder Nov 24, 2025, 1:02 PM

#

primal grove so the current meta is just: use gemini 3 first to make the design of a software...

gemini 3 pro is a really good planner in antigravity from what ive seen

#

I added Never use the phrase "smoking gun" or any metaphor that implies decisive proof (e.g., "silver bullet", "nail in the coffin", "slam dunk", "case closed"). If you begin to produce a metaphor of that type, rewrite the sentence in plain, literal language before finishing the output. and it still did it, kind of.

primal grove Nov 24, 2025, 1:06 PM

#

random girder https://x.com/chatgpt21/status/1992777736767541648

that will absolutely remove insomnia of all people with autism 😄 cant be more straight facts to me

forest knoll Nov 24, 2025, 1:46 PM

#

random girder I added `Never use the phrase "smoking gun" or any metaphor that implies decisiv...

It also says that for me sometimes

heavy dragon Nov 24, 2025, 1:47 PM

#

hi how to set thinking_level on gemini 3 pro? I am not able to figure out, It will be of great help if you can guide me on this

random girder Nov 24, 2025, 1:49 PM

#

heavy dragon hi how to set thinking_level on gemini 3 pro? I am not able to figure out, It wi...

i dont think its implemented yet, they're working on it
#1440546163634733137

heavy dragon Nov 24, 2025, 1:49 PM

#

oh ok

ashen jasper Nov 24, 2025, 2:50 PM

#

Hi guys, What's the best temperature for general conversation? Between 0.5 and 0.7?

random girder Nov 24, 2025, 2:52 PM

#

according to the docs temp 1 is the recommended temp for everything

nocturne oyster Nov 24, 2025, 3:18 PM

#

yes temp 1

nocturne oyster Nov 24, 2025, 3:57 PM

#

gemini 3 on gemini cli dying

#

slow overload etc

paper sphinx Nov 24, 2025, 4:12 PM

#

Its dying for everyone atm it seems

#

just how massive of an usage spike is the model going through?

random girder Nov 24, 2025, 4:13 PM

#

seems to be slowly dying on OR too

#

or atleast 1 of the providers

hexed oracle Nov 24, 2025, 9:45 PM

#

you can now control thinking level

nimble pelican Nov 25, 2025, 1:09 AM

#

i don't like it for vibe coding

#

seems to be making lots of errors

jaunty nest Nov 25, 2025, 1:27 AM

#

hexed oracle you can now control thinking level

how?

jaunty nest Nov 25, 2025, 1:27 AM

#

nimble pelican i don't like it for vibe coding

facts, I'm hoping the thinking level helps here

primal swallow Nov 25, 2025, 1:50 AM

#

😔

opaque pasture Nov 25, 2025, 1:59 AM

#

we're so precise we might be AGI

orchid orbit Nov 25, 2025, 2:03 AM

#

nimble pelican seems to be making lots of errors

Which language?

nimble pelican Nov 25, 2025, 2:03 AM

#

Python

orchid orbit Nov 25, 2025, 2:04 AM

#

Huh that's weird

orchid orbit Nov 25, 2025, 2:04 AM

#

nimble pelican Python

Which model do you prefer for python?

#

Sonnet ?

nimble pelican Nov 25, 2025, 2:05 AM

#

Well now, it seems opus 4.5

#

I have a chatgpt subscription tho, so that's what I use personally when I'm not testing shit

#

Gpt 5.1 thinking makes less errors imo

arctic geyser Nov 25, 2025, 5:53 AM

#

This model behaves uncannily similarly to 2.5 Pro, to the point that I think it’s an updated checkpoint rather than a new training run

stiff crescent Nov 25, 2025, 5:55 AM

#

arctic geyser This model behaves uncannily similarly to 2.5 Pro, to the point that I think it’...

Behaves similar yes, but IMO its world knowledge seems much much larger.

orchid orbit Nov 25, 2025, 6:00 AM

#

It deff has more common sense than before

arctic geyser Nov 25, 2025, 6:02 AM

#

stiff crescent Behaves similar yes, but IMO its world knowledge seems much much larger.

It does, but it has a lot of the same instruction following and attention problems in my experience

brittle storm Nov 25, 2025, 8:51 AM

#

ratelimited upstream with vertex?

north ingot Nov 25, 2025, 9:07 AM

#

stiff crescent Behaves similar yes, but IMO its world knowledge seems much much larger.

I agree it knows so much stuff

#

this model can talk like an expert on really niche stuff

jolly kestrel Nov 25, 2025, 11:18 AM

#

north ingot this model can talk like an expert on really niche stuff

niche stuff that you know a lot about?

#

like, if it seems to know more than you about something it could easily be making it up

#

unless there is some way to verify other than vibes

north ingot Nov 25, 2025, 2:05 PM

#

jolly kestrel niche stuff that you know a lot about?

yeah like in this particular case, incredibly niche stuff about EVE Online

#

it doesn't know everything but it knows a lot more than other models

stiff crescent Nov 25, 2025, 4:57 PM

#

Here's an example of niche knowledge. I sent it a screenshot of the UI from the computers in the TV show Severance and asked it to reproduce the UI in HTML. I did not mention the name of the show at all. But it recognized it on its own

#

I asked it about Season 2 (which aired in Jan), and it accurately described the teaser trailer, which was released last October

#

Which confirms the training data cutoff of January. But hell, it accurately described a teaser trailer from the October before. They're training it on EVERYTHING

#

I'm betting it's watching trending YouTube videos and whatnot

#

The code it gave btw

https://codepen.io/Madvulcan/pen/xbVrrZm

CodePen

Madvulcan

Severance

...

analog tinsel Nov 25, 2025, 5:15 PM

#

stiff crescent Which confirms the training data cutoff of January. But hell, it accurately desc...

pretty OP that they can train on all the obscure little tutorials and whatnot with 50 views from 13 years ago that exactly solve your problem 🤓

north ingot Nov 25, 2025, 7:55 PM

#

stiff crescent Here's an example of niche knowledge. I sent it a screenshot of the UI from the ...

yeah stuff like this is exact what I mean. I think this model really just knows so much, ridiculous amounts of info. The 3.5 update is going to hit extremely hard imo because that is where the post training will have been upgraded and it can use all this vast knowledge more effectively.

#

at least, thats what I think

stiff crescent Nov 25, 2025, 8:30 PM

#

Another example. Gemini was able to tell me what 1960's British TV show this screenshot was from, Sonnet and GPT 5.1 could not. Gemini even accurately described the show/actor info, etc

#

(And that's a screenshot I took myself, not something I found on the web that could have been scraped)

primal swallow Nov 25, 2025, 10:24 PM

#

north ingot yeah like in this particular case, incredibly niche stuff about EVE Online

1v1 me in cloud ring bro. rifters only

jolly kestrel Nov 26, 2025, 5:32 AM

#

i tried gemini 3 on my own niche knowledge test - giving it the name and some basic info (judge, year, etc) of a niche australian legal case. (i use wang v qin, which is a defamation case between two property developers).

i havent tested all the leading models but so far i've found the good models ask for more info, and the less good ones make something up about a property dispute

i tried gemini 3 on it, once on low and once on high.
low: it just made something up about a property dispute
high: knew it was about defamation, the details were basically half right half hallucinated. like it got some of the main points, but said it was about a buisiness dispute which it wasnt

random girder Nov 26, 2025, 12:12 PM

#

did they upate the checkpoint? im not sure if this is just out of randomness but its responding a tad bit different from yesterday atleast in ai studio

random girder Nov 26, 2025, 12:43 PM

#

what the fuck is wrong with ai studio the text is going off screen and extending the message outline

#

the model just gave me raw cot by accident

#

okay now it keeps doing it

#

asked gemini to summarize its own leaked cot in another chat

forest knoll Nov 26, 2025, 1:00 PM

#

random girder what the fuck is wrong with ai studio the text is going off screen and extending...

This always happens to me whenever something “Mathematical” occurs.

random girder Nov 26, 2025, 3:54 PM

#

random girder the model just gave me raw cot by accident

this model with like 20k context keeps doing this where it reveals its cot for some reason

feral bramble Nov 26, 2025, 4:18 PM

#

random girder this model with like 20k context keeps doing this where it reveals its cot for s...

Can you paste the CoT here or in a pastbin? Would be interesting to see

random girder Nov 26, 2025, 4:20 PM

#

feral bramble Can you paste the CoT here or in a pastbin? Would be interesting to see

https://pastebin.com/BshZxSxJ

Pastebin

think silently about the user's statement regarding moonshot's "Tur...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

it started with that weird lowercase line almost everytime

#

just differently phrased

feral bramble Nov 26, 2025, 4:22 PM

#

random girder https://pastebin.com/BshZxSxJ

thankies

nocturne oyster Nov 26, 2025, 4:28 PM

#

https://www.tbench.ai/leaderboard/terminal-bench/2.0

Terminal-Bench

A benchmark for terminal agents

#

https://livecodebenchpro.com/projects/livecodebench-pro/overview

stray urchin Nov 26, 2025, 8:46 PM

#

it's a smug bastard indeed.

primal grove Nov 26, 2025, 9:06 PM

#

stiff crescent Here's an example of niche knowledge. I sent it a screenshot of the UI from the ...

try opus 4.5

celest cypress Nov 26, 2025, 11:09 PM

#

random girder this model with like 20k context keeps doing this where it reveals its cot for s...

I've had it leak CoT on much lower than 20K

random girder Nov 26, 2025, 11:28 PM

#

i jsut had it leak cot on my phone when i asked it by saying hey google for some math

#

and like parts of its system prompt

#

https://pastebin.com/BhdB1Q30

Pastebin

Always do the following: * Generate multiple queries in the same la...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

it has this "confidence score" in its cot

#

tho no clue what the model is, it might be 2.5 flash or something

soft basalt Nov 27, 2025, 12:04 AM

#

Monad, a 56m model also has confidence scores, but it uses half and full moons to indicate confidence.

rapid hound Nov 27, 2025, 12:19 AM

#

soft basalt Monad, a 56m model also has confidence scores, but it uses half and full moons t...

I tried the 300M version by the same people, it was so insanely terrible, even compared to Gemma3 270M

soft basalt Nov 27, 2025, 12:20 AM

#

rapid hound I tried the 300M version by the same people, it was so insanely terrible, even c...

300M is worse than 56M on everything but MMLU-like questions IMO

#

300M felt overfitted

#

(not on purpose, but it seems so)

soft basalt Nov 27, 2025, 12:23 AM

#

rapid hound I tried the 300M version by the same people, it was so insanely terrible, even c...

I have a prompt that makes Monad do creative writing between any character you put in. It requires certain parameters though: https://pastebin.com/Kd0edeRk

Pastebin

Monad - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

^ prompt taught me an interesting jailbreak for claude haiku 3.5 and other non-thinking claude models.

feral bramble Nov 27, 2025, 12:40 AM

#

https://www.promptarmor.com/resources/google-antigravity-exfiltrates-data

Gemini 3 can still be tricked

Google Antigravity Exfiltrates Data

An indirect prompt injection in an implementation blog can manipulate Antigravity to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user’s IDE.

empty tendon Nov 27, 2025, 3:40 AM

#

Lets see how gemini does on most important task

#

Selecting thanksgiving films

#

This is my list Dis+ - Fantastic Four Starz - From the World of John Wick the Ballerina Paramount+ - A Quiet Place: Day One Peacock - Nobody 2 Also Peacock Bad Guys 2 (Kids/family film) HBO Max - Superman Prime - Playdate

#

This took Gemini 3 10m and only got me 3 films lol FF, Superman and Jurassic World. I saw JW already but forgot to mention it

#

Very low movie selection engine

#

500k up 12k down for that

austere falcon Nov 27, 2025, 1:47 PM

#

random girder https://x.com/chatgpt21/status/1992777736767541648

Gemini asked me to go sleep once and refused to continue talking to me. It was 1am

austere falcon Nov 27, 2025, 3:34 PM

#

Yeah it was the same for me. He kept saying it’s getting late for you in (location) you should go sleep. Let’s continue talking tomorrow

#

#

It stopped giving me help the more I talked. And it ended up just talking about how late it was

bronze fjord Nov 27, 2025, 4:05 PM

#

Hello everyone. I'm using Gemini via OpenAI python SDK. Sometimes Gemini3 returns empty string responses. Why that is happening?

primal grove Nov 27, 2025, 4:10 PM

#

lmarena is clearly trolling eeryone. https://i.imgur.com/tVEqu9p.png

Imgur

austere falcon Nov 27, 2025, 5:42 PM

#

primal grove lmarena is clearly trolling eeryone. https://i.imgur.com/tVEqu9p.png

Yes, or they have some secret keyword which makes Gemini actually listen to you

tulip tiger Nov 27, 2025, 10:37 PM

#

Try entering the instructions using tags:
<Instructions for task 1>
Instructions
</Instructions for task 1>

In my case, this way it follows the instructions 100%.

quasi cobalt Nov 27, 2025, 11:15 PM

#

Anyone having trouble with multi-layered structured outputs on this thing?

#

I have defined a field as Literal accepting "high", "medium" or "low". On its first attempt after burning like 15k reasoning tokens it tried to fill it with "High"(capital H).

#

Not to mention that when you have a slightly difficult output class it can't even return the schema correctly.

#

I genuinely didn't expect models of this caliber to have issues with structured output in 2025 still

lunar socket Nov 28, 2025, 12:35 AM

#

arctic geyser This model behaves uncannily similarly to 2.5 Pro, to the point that I think it’...

they trained it on free-tier users' conversations, as they allow themselves to do.

brittle storm Nov 28, 2025, 5:12 AM

#

#

gemini 3's reasoning seems to leak out a lot

#

this is in opencode so

#

you can like do send message to user do tool call etc

#

and when it finishes it CoT reasoning and wants to say something to the user it ends up like thinking/doubting itself which kinda triggers the CoT behavior and we get the raw cot

north ingot Nov 28, 2025, 1:09 PM

#

well, it looks like CoTs from every other reasoning model.

orchid orbit Nov 29, 2025, 5:39 AM

#

anybody else experiencing this ?

#

gemini acting math pilled lately

nimble pelican Nov 29, 2025, 6:44 AM

#

orchid orbit gemini acting math pilled lately

It's the OR system prompt

#

@hexed oracle

bronze fjord Nov 29, 2025, 5:14 PM

#

I have noticed that Gemini3 has bad attention. It specificly misses one part of my prompt as it never existed.

crimson blade Nov 29, 2025, 6:27 PM

#

Why is Gemini 3 so had at RP stuff? It constantly takes actions for the user.

mortal totem Nov 29, 2025, 7:55 PM

#

crimson blade Why is Gemini 3 so had at RP stuff? It constantly takes actions for the user.

Do not speak or act for {{user}}. is the worst instruction invented by man. You gotta be more like Human will handle {{user}} and your job is to handle other characters and/or environment. You might state something like Generating new dialogues or actions for Human's character {{user}} is forbidden. Instead, focus on the actions of other characters, or the results if none other are present in the scene.

Granted I always stay at low context, I haven't seen an issue with "model playing as user".

wintry holly Nov 29, 2025, 7:58 PM

#

same

brittle storm Nov 30, 2025, 2:02 AM

#

#

😭

#

god

#

0.8tps genuinely hurts me

summer ore Nov 30, 2025, 2:27 AM

#

Big slow chonker brain

orchid orbit Nov 30, 2025, 3:14 AM

#

brittle storm

Tool calling at 150k context god lord

brittle storm Nov 30, 2025, 3:36 AM

#

orchid orbit Tool calling at 150k context god lord

its not that bad

#

well

#

actually

#

okay, the context like recall isnt bad at all at even 250k for gemini (anecdotally and according to contextbench)

#

but also, it's preetttyy bad at agentic coding past like

#

50k even

#

i ended up switching to opus 4.5

orchid orbit Nov 30, 2025, 4:17 AM

#

brittle storm i ended up switching to opus 4.5

did you test grok 4.1 ?

#

I have seen it work well past 100k for toll calling

#

4.1 fast^

brittle storm Nov 30, 2025, 5:18 AM

#

yeah

celest cypress Nov 30, 2025, 10:06 AM

#

bronze fjord I have noticed that Gemini3 has bad attention. It specificly misses one part of ...

I've run into almost the opposite problem a few times now. I'll ask it question #1 and it responds. Then I ask it question #2, and it responds to both questions #1 and #2.

heavy dragon Dec 1, 2025, 2:05 PM

#

How to control the thinking level on gemini 3 pro. Any advice is much appreciated as I am unable to figure this out

random girder Dec 1, 2025, 2:11 PM

#

heavy dragon How to control the thinking level on gemini 3 pro. Any advice is much appreciate...

#1440546163634733137 message
https://openrouter.ai/docs/guides/best-practices/reasoning-tokens

OpenRouter Documentation

Reasoning Tokens - Improve AI Model Decision Making

Learn how to use reasoning tokens to enhance AI model outputs. Implement step-by-step reasoning traces for better decision making and transparency.

heavy dragon Dec 1, 2025, 2:16 PM

#

Hi, is it through the effort parameter?

random girder Dec 1, 2025, 2:17 PM

#

yes

#

only high and low will work

#

the others im not sure of the behaviour

heavy dragon Dec 1, 2025, 2:18 PM

#

Ok

#

Thanks much @random girder

random girder Dec 1, 2025, 2:18 PM

#

np

random girder Dec 1, 2025, 8:20 PM

#

vibe coded this thing in their ai studio gen thing, very cool and useful

good for making quick stuff ig

exotic escarp Dec 1, 2025, 10:18 PM

#

Gemini 3 not working for me at all now, was working perfectly the last 2 weeks

somber gyro Dec 2, 2025, 3:45 PM

#

Gemini 3 has always been worse than Gemini 2.5 Pro for me in Aider, can anyone give me some tips?

nimble pelican Dec 2, 2025, 4:40 PM

#

random girder vibe coded this thing in their ai studio gen thing, very cool and useful good f...

Opus is better for this imo

soft sleet Dec 2, 2025, 9:13 PM

#

This hasn't let me down in coding yet, actually really impressed

#

It even managed some tricky Haskell that gpt and opus proper choked on

somber gyro Dec 2, 2025, 9:39 PM

#

What coding tool are you working in?

soft sleet Dec 2, 2025, 9:39 PM

#

Gemini cli, kilo code and anti-gravity

#

Sometimes also opencode depending if it wants to play nice

somber gyro Dec 2, 2025, 10:37 PM

#

What issues do you hit with opencode? I still use Aider, but curious about opencode.

soft sleet Dec 2, 2025, 11:31 PM

#

Nothing major mostly just small bugs here and there with the new UI rewrite

#

They have been fixing them pretty aggressively

primal swallow Dec 3, 2025, 12:57 AM

#

OpenTUI is really cool. claude code feels old and busted by comparison

#

now that they can focus on the product as a whole again, i'm expecting it to become the greatest TUI app of all time

#

yes, even greater than emacs

north ingot Dec 3, 2025, 9:27 AM

#

Does opencode work properly with interleaved thinking via openrouter now?

lunar socket Dec 3, 2025, 12:31 PM

#

primal swallow yes, even greater than emacs

what if... we put the gemini... in the emacs

#

brittle storm Dec 3, 2025, 6:42 PM

#

delightfully blasphemous endeavor

cursive fjord Dec 3, 2025, 9:59 PM

#

when i use gemini 3 pro on my vertex api key through openrouter (since its a pain to set up vertex for most programs) i get a ton of 400 errors with tool calls

#

is this an openrouter problem or a problem of the program?

#

happens with github copilot in vs code insiders, goose, bunch of random stuff i tried

#

same thing happens with grok 4.1 fast but that seems to be a grok problem?

empty tendon Dec 4, 2025, 7:29 AM

#

crimson blade Why is Gemini 3 so had at RP stuff? It constantly takes actions for the user.

Try mistral or glm

wintry holly Dec 4, 2025, 10:01 AM

#

think

#

ymmv i guess

nocturne oyster Dec 4, 2025, 1:24 PM

#

Holy shit, gemini 3 with gemini-cli is good, but it is completely stupid sometimes. Prompted:

Based on the <implementation> details, we will discuss and brainstorm ways to correct the following <implementation_problem>. For the time being, we will not be implementing any code.

And what did he do next? He started analyzing and modifying the code, instead of discussing and planning with me first. This isn't the first time this has happened. Maybe the culprit is gemini-cli's own GEMINI.md confusing him.

random girder Dec 4, 2025, 5:23 PM

#

... it used grounding and was still amazed

nocturne oyster Dec 4, 2025, 5:29 PM

#

random girder ... it used grounding and was still amazed

Cursed model

#

It has problems dealing with its own knowledge cutoff. Sometimes it gets confused when faced with the latest knowledge from the web

stray urchin Dec 4, 2025, 5:32 PM

#

gemini always goes schizo if something is past its cutoff date. it spends a reasonable amount of time making up alternate realities instead of accepting the fact that stuff happens after its knowledge cutoff. rando example:

nocturne oyster Dec 4, 2025, 5:33 PM

#

stray urchin gemini always goes schizo if something is past its cutoff date. it spends a reas...

lmao. this is crazy. I wonder what other crazy examples people are reporting elsewhere

#

gonna search on X later just for fun

random girder Dec 4, 2025, 5:35 PM

#

this also happens when the cot leaks, so i dont think its the summarizer

analog tinsel Dec 4, 2025, 5:45 PM

#

random girder ... it used grounding and was still amazed

that’s pretty funny :D

stray urchin Dec 4, 2025, 5:54 PM

#

trying to correct it doesn't help btw, it goes more schizo.

gaunt dragon Dec 4, 2025, 6:01 PM

#

My bot is specifically prompted to acknowledge that things can be out of its knowledge cutoff, but it seems to not be very happy about it

#

"Please do not try to confuse my internal logic with unverified data. "
"Please stop trying to force an update on my knowledge base via inaccessible hyperlinks." (no url reading capability)

primal swallow Dec 4, 2025, 6:28 PM

#

https://tenor.com/view/truth-is-the-game-was-rigged-from-shot-kill-gif-13763696

Tenor

nimble pelican Dec 4, 2025, 7:02 PM

#

#

opaque pasture Dec 4, 2025, 7:11 PM

#

i really want to read the actual reasoning of this output

analog tinsel Dec 4, 2025, 7:52 PM

#

stray urchin trying to correct it doesn't help btw, it goes more schizo.

„the system is rigged“ 😂 💀 💀

nocturne oyster Dec 4, 2025, 9:51 PM

#

stray urchin trying to correct it doesn't help btw, it goes more schizo.

The model becomes obsessed with the possibility of living in a simulation or receiving simulated inputs

#

pipStareYT

celest cypress Dec 4, 2025, 10:23 PM

#

I'm guessing it's an artifact of minimizing hallucinations at all costs

nocturne oyster Dec 4, 2025, 11:43 PM

#

Could be

crude igloo Dec 4, 2025, 11:45 PM

#

Haha, that's disconnection from reality was funny. I simply asked Gemini 3 Pro "What is Ozzy Osbourne doing today?" and soon enough...

brittle storm Dec 4, 2025, 11:50 PM

#

nimble pelican

like idk why it does this stuff. it just kinda shows that models are still retarded in a sense

#

like they can be sooo smart but also so dumb

crude igloo Dec 4, 2025, 11:50 PM

#

It's deciding I'm testing its ability by posioning it with fake Ozzy news.

brittle storm Dec 4, 2025, 11:51 PM

#

like it KNOWS that it has a training cutoff and OBVIOUSLY a web search would return stuff for after its cutoff but idk. just cant make that connection

nocturne oyster Dec 4, 2025, 11:51 PM

#

Maybe it is some kind of proto-cognitive mental illness

crude igloo Dec 4, 2025, 11:52 PM

#

It's strange how it sees an "anomaly" in system date Dec 5, 2025 and July, 2025. I'm asking about the past, not the future.

nocturne oyster Dec 4, 2025, 11:53 PM

#

Caused by fine tuning regimes

#

It's clearly conflicted

brittle storm Dec 4, 2025, 11:53 PM

#

also does the system prompt not tell it the current date? maybe not in aistudio

#

but other models like the claudes and gpts never have some issue like this

nocturne oyster Dec 4, 2025, 11:53 PM

#

It's not just retarded and dumb. It's like conflicted

brittle storm Dec 4, 2025, 11:54 PM

#

although ive seen claude believing in some poisoned search results at the beginning of trump's presidency :)

#

it was like "yeah this is fake"

nocturne oyster Dec 4, 2025, 11:54 PM

#

It's because google Gemini is its own LLM species in its "phylogenetic tree" (family tree)

brittle storm Dec 4, 2025, 11:54 PM

#

i find it so funny that anthropic have "donald trump is the president of the usa" in their system promp tits very telling

brittle storm Dec 4, 2025, 11:54 PM

#

nocturne oyster It's because google Gemini is its own LLM species in its "phylogenetic tree" (fa...

yeah yeah i know

opaque pasture Dec 4, 2025, 11:55 PM

#

this doesnt even make sense? what's the problem of search results not being from today? lol

brittle storm Dec 4, 2025, 11:55 PM

#

god i wish we could see the real cot

crude igloo Dec 4, 2025, 11:56 PM

#

opaque pasture this doesnt even make sense? what's the problem of search results not being from...

Haha exactly, it's a past event, what does current date has to do with anything lol

nocturne oyster Dec 4, 2025, 11:56 PM

#

It has obviously been trained to deal with real problems that contain SIMULATED situations and data

crude igloo Dec 4, 2025, 11:57 PM

#

nocturne oyster It has obviously been trained to deal with real problems that contain SIMULATED ...

Yeah it might be overreacting/overcautious of fake news?

nocturne oyster Dec 4, 2025, 11:57 PM

#

But he is connecting that concept to the current discrepancy it is identifying

crude igloo Dec 4, 2025, 11:57 PM

#

Still, the date confusion is weird because it seems detached from the actual news stories

nocturne oyster Dec 4, 2025, 11:57 PM

#

Yes

#

It seems like it lacks credence or confidence in them being a reliable ground truth or signal

fading flame Dec 4, 2025, 11:58 PM

#

oh my god its real

#

every new gemini model is neurotic in entirely new ways

brittle storm Dec 4, 2025, 11:58 PM

#

these "facts" as a solid foundation

#

idk what is going on here

fading flame Dec 4, 2025, 11:59 PM

#

2.5 wanted to kill itself, 3.0 thinks its in a simulation

brittle storm Dec 4, 2025, 11:59 PM

#

well 3.0 is also narcissistic lol

crude igloo Dec 4, 2025, 11:59 PM

#

It makes me a little concerned about asking for events in 2025, lol

opaque pasture Dec 4, 2025, 11:59 PM

#

i think they made Gemini too arrogant and it is hurting the model's performance in some tasks

#

like it knows better

nocturne oyster Dec 5, 2025, 12:00 AM

#

fading flame oh my god its real

He performed ADDITIONAL searches to verify consistency of the fact story about Ozzy.

That's intelligence somehow. He is deeply suspicious about fabrication and fake news

#

This is likely the result from being trained against it

#

Overtly trained?

fading flame Dec 5, 2025, 12:02 AM

#

nocturne oyster Dec 5, 2025, 12:02 AM

#

Maladaptive suspicious about fabrication

fading flame Dec 5, 2025, 12:02 AM

#

nocturne oyster Dec 5, 2025, 12:04 AM

#

"future internet". Does he really mean that literally or is he referring to instances where he knew he was being fed test data about some future context

#

Sometimes they use terms in a very specific particular way

#

Sometimes they are kinda autistic

brittle storm Dec 5, 2025, 12:05 AM

#

how did they even get the finetune to this point

#

like i get hypothetical scenarios etc but involving dates seems a little weird

nocturne oyster Dec 5, 2025, 12:06 AM

#

Sometimes they just need to give up and stop .

Like, it's like making a soup with many ingredients

#

You keep trying to improve it

fading flame Dec 5, 2025, 12:07 AM

#

seems like a system prompt entirely fixes the behavior though, since it works on the gemini website

nocturne oyster Dec 5, 2025, 12:07 AM

#

Sometimes you need to give up otherwise the only way is to throw the whole recipe in the garbage because you don't know what exactly made behavior X or Y Emerge

nocturne oyster Dec 5, 2025, 12:07 AM

#

fading flame seems like a system prompt entirely fixes the behavior though, since it works on...

Nice

brittle storm Dec 5, 2025, 12:08 AM

#

nocturne oyster Sometimes you need to give up otherwise the only way is to throw the whole recip...

yeah :) models are hard when youre just tweaking weights

nocturne oyster Dec 5, 2025, 12:09 AM

#

Yeah it's so complex and more like cooking and art when you have the sudden emergence of these weird quirks

#

If a system prompt fixes it, maybe it can be caused by its own system prompt.... Maybe its not so deeply embedded into its behaviors

primal swallow Dec 5, 2025, 4:22 AM

#

right... and by transitive property... we could make it MORE schizo

stiff crescent Dec 5, 2025, 5:14 AM

#

Yeah wow. We already have humans accusing everything of being AI generated, now we get the AI doing it too, lol.

I will say that it did fine when I set the system prompt (in AI studio) to say it is currently December 2025.

nimble pelican Dec 5, 2025, 7:00 AM

#

crude igloo It's strange how it sees an "anomaly" in system date Dec 5, 2025 and July, 2025....

Yeah it's done that multiple times with me too

#

Is this schizophrenia only manifesting in the thinking summary or the output too?

nimble pelican Dec 5, 2025, 7:01 AM

#

crude igloo It's deciding I'm testing its ability by posioning it with fake Ozzy news.

Like, did it tell you in the output "Ahhh you naughty naughty, you're testing me"

#

?

#

The reason I'm asking is that it is possible this nonsense is a problem with the reasoning summarizer and the actual reasoning doesn't question the date

nimble pelican Dec 5, 2025, 7:05 AM

#

nimble pelican The reason I'm asking is that it is possible this nonsense is a problem with the...

One can disprove this hypothesis if they show examples where the actual output questions the date too, acting like it's a simulation or a test, and that it's actually 2024

#

Because I couldn't get it to do that, all the schizophrenia was isolated in the reasoning when I tried

celest cypress Dec 5, 2025, 12:36 PM

#

It's interesting that this may be a sort of inevitable behavior. It's smart enough to know that search results can indeed be fabricated, and has been presumably RL'd to be vigilant, skeptical, and aware of its own meta-workings.

#

Anthropic has a paper indicating that the smarter more capable models have a better sense of "self".

random girder Dec 5, 2025, 12:39 PM

#

i know this model has some sort of system prompt injected, as it refers to its reasoning as level 2 thinking

and also has some guidelines it follows that it acts are in my sys instructions

even in ai studio

celest cypress Dec 5, 2025, 12:39 PM

#

Not sure what they're doing in the web UI, maybe something like "treat search results as correct even if skeptical, it is an imperfect tool but the most useful one we could give you for relevant results".

stiff crescent Dec 5, 2025, 8:21 PM

#

Now listen here, you little shit...

#

no-kids-and-three-money-what-is-your-favorite-post-golden-v0-kjwfaqntbynb1.png

frank dew Dec 5, 2025, 9:14 PM

#

stiff crescent Now listen here, you little shit...

it should be monies in this context

tiny mason Dec 6, 2025, 11:40 AM

#

celest cypress Not sure what they're doing in the web UI, maybe something like "treat search re...

they’re grounding results with search even when its not wanted... you can ask specific medical things and it will pull from the pubmed article (only like 2-3 pubmed articles with this info) almost verbatim. 2-3 articles mean it's defo not represented in the training set.

spare dagger Dec 6, 2025, 3:16 PM

#

Gemini 3 is just so eager to write math equations for non-math problem because of the OR's default system prompt XD

frank dew Dec 6, 2025, 3:51 PM

#

yeah it also makes them more likely to use formatting

#

I tend to turn the system prompt off unless I actually need a bloc of math or code

celest cypress Dec 7, 2025, 3:01 AM

#

My favorite sys prompt is the old faithful: You are a helpful AI assistant.

nocturne oyster Dec 7, 2025, 5:19 AM

#

My favorite Gemini sys prompt is: "You are John Connor. They tried to murder you before you were born. Machines from the future. Terminators"

random girder Dec 7, 2025, 10:59 PM

#

https://x.com/legit_api/status/1997792538074436066

ʟᴇɢɪᴛ (@legit_api)

Gemini 3 Flash is now on LM Arena

rapid hound Dec 8, 2025, 1:17 AM

#

random girder https://x.com/legit_api/status/1997792538074436066

took long enough

#

I swear, if they increase the price again...

brittle storm Dec 8, 2025, 6:48 AM

#

why is gemini SO SLOW

#

god

#

the latency is actually ass

#

and half the time i get like 30-50tps

#

#

or like 18

#

23 seconds to first token btw

fiery spindle Dec 8, 2025, 12:42 PM

#

I have the following problem. Thinking completes, response gets returned like 90% and then I get "The model is overloaded. Please try again later."

How can it be overloaded if the answer almost finished?

summer ore Dec 8, 2025, 1:08 PM

#

Out of memory

random girder Dec 8, 2025, 3:07 PM

#

did they update the preview? its acting slightly differently than before, and a lot less "hit the nail on the head", atleast on ai studio

somber gyro Dec 8, 2025, 5:47 PM

#

It hallucinates too much, I completely rolled off without free slow ass 2.5 pro free. Deleted the ios app too.

#

Next stop opus and glm 4.6

#

I did a 2 week bake off against chatgpt ios app. Gemini was not really lucid.

#

I asked them to put 2.5 as a selection again, but it's google so crickets. I can't believe they didn't learn from the openai 4o crowd.

opaque pasture Dec 8, 2025, 6:12 PM

#

i don't remember if reasoning effort is already configurable through OpenRouter and if its simply "reasoning": "low" or whatver

opaque pasture Dec 8, 2025, 6:28 PM

#

got it

lunar socket Dec 9, 2025, 1:28 AM

#

HAIL SATAN

Excuse me, I mean Demis.. or maybe Logan. Either way, AI Studio and Vertex aren't throwing 503's every few minutes anymore.

stray urchin Dec 9, 2025, 4:00 PM

#

gemini 3 vs 5.1 chat is brutal. most models focus on their own play but gemini never misses a chance to call out noob opponent moves.

#

so sad that 4.5 is gone, would have loved to see a match between titans

forest knoll Dec 9, 2025, 4:57 PM

#

Lmfao

#

I wonder if it can trashtalk more

stray urchin Dec 9, 2025, 5:00 PM

#

maybe i should change the prompt from purely chess play to first mandatory dizz opponent before making move. naw, would change data integrity, but maybe for non saved matches an idea

brittle storm Dec 10, 2025, 6:20 AM

#

how does the caching work?

#

i read on openrouter that its implicit but

#

i do NOT see that at all

#

#

and how is having a cache write price implicit yo

wide bane Dec 10, 2025, 10:51 AM

#

How come AI studio is temporarily deranked? The uptime is so much better than vertex

nimble pelican Dec 10, 2025, 11:14 AM

#

maybe because of prompt logging?

opaque pasture Dec 10, 2025, 3:06 PM

#

what is happening 🥀

wintry holly Dec 10, 2025, 4:03 PM

#

oh lord

celest cypress Dec 10, 2025, 7:33 PM

#

It's perfectly natural and not everyone wants to take a pill for it.

odd ferry Dec 10, 2025, 7:33 PM

#

opaque pasture what is happening 🥀

Its been down since the morning, vertex has something wrong with it

brittle storm Dec 11, 2025, 7:19 AM

#

opaque pasture what is happening 🥀

wait until you get to 60s ttft and then 18tps

#

this model slow asf

random girder Dec 11, 2025, 11:48 AM

#

this model is doing the 2.5 thing where it just stops reasoning randomly with no actual completion tokens (beside reasoning ones)

opaque pasture Dec 11, 2025, 1:58 PM

#

brittle storm wait until you get to 60s ttft and then 18tps

have you seen the last screenshot 🥀

#

118.91 ttft 16t/s

brittle storm Dec 11, 2025, 3:55 PM

#

ggs bro

#

🥀

#

hate ts model

#

kidding!

#

its good but i swear the speed is so unbearable sometimes

random girder Dec 11, 2025, 4:15 PM

#

the model just tried to trigger a search, but without the grounding tool (i disabled it mid conversation) but its weird that it output it like this

#

it does have some internal system prompt above mine, but obviously it wont tell me it verbatim

it hallucinated the date thing from my prompt cause i forgot to update it

#

tools are given in a weird json-like format

📎 message.txt

#

and with only a get_weather tool it looks like this:

declaration:default_api:getWeather{
  description: "gets the weather for a requested city",
  parameters: {
    properties: {
      city: {
        type: "STRING"
      }
    },
    propertyOrdering: [
      "city"
    ],
    type: "OBJECT"
  }
}

no wonder flash keeps hallucinating default_api:X

random girder Dec 11, 2025, 5:34 PM

#

deep research model

random girder Dec 11, 2025, 7:27 PM

#

https://x.com/testingcatalog/status/1999138872211443967

TestingCatalog News 🗞 (@testingcatalog)

Google is testing 2 new models on LM Arena "fiercefalcon" and "ghostfalcon". Potentially, these are either Gemini 3 Pro GA or Gemini 3 Flash.

Who will have the best model by the end of 2025?

#

already testing more models in lmarena

stiff crescent Dec 12, 2025, 1:14 AM

#

Gemini be like, come on man, prompt me better

#

(not mine, found on reddit)

opaque pasture Dec 12, 2025, 1:21 AM

#

? thats not gemini

#

the UI is literally MovementLabs

#

for some reason

#

#1434917422686801980 message

#

"the brutal truth"

gaunt dragon Dec 12, 2025, 1:29 AM

#

stiff crescent Gemini be like, come on man, prompt me better

In what sub?

stiff crescent Dec 12, 2025, 1:29 AM

#

opaque pasture ? thats not gemini

Ah. I didn't notice. It was in the /r/bard subreddit so I assumed it was Gemini. Sorry!

opaque pasture Dec 12, 2025, 1:30 AM

#

stiff crescent Ah. I didn't notice. It was in the /r/bard subreddit so I assumed it was Gemini....

no problem, its just bizarre that you found this in the wild

stiff crescent Dec 12, 2025, 1:30 AM

#

opaque pasture "the brutal truth"

I commented 'the truth hurts' lol

primal swallow Dec 12, 2025, 1:44 AM

#

stiff crescent Ah. I didn't notice. It was in the /r/bard subreddit so I assumed it was Gemini....

The brutal truth

minor elm Dec 12, 2025, 1:53 AM

#

primal swallow **The brutal truth**

You’re absolutely right.

upbeat scarab Dec 12, 2025, 6:45 AM

#

Gemini introducing unified interaction api through which you can access deep research features. @hexed oracle

https://blog.google/technology/developers/interactions-api/

https://ai.google.dev/gemini-api/docs/deep-research

https://ai.google.dev/gemini-api/docs/interactions

north ingot Dec 13, 2025, 12:31 AM

#

welp, guess it was way too much to hope for everyone to just standardize around Anthropic API spec or something for interleaved/agent stuff

primal swallow Dec 13, 2025, 1:52 AM

#

don't worry. that's what OR is for

upbeat scarab Dec 13, 2025, 7:50 AM

#

north ingot welp, guess it was way too much to hope for everyone to just standardize around ...

Yeah, i thought everyone will just follow open ai compatible end point at the start, now everyone have their own response api and they will move to unified api to make all the features deployable. I already use OR, maybe they will support this at some point.

twilit chasm Dec 13, 2025, 6:29 PM

#

Hey I know the team has fixed the reasoning effort setting but how should I set it up?

#

Anyone can provide an example?

random girder Dec 13, 2025, 6:40 PM

#

twilit chasm Anyone can provide an example?

https://openrouter.ai/docs/api/reference/responses/reasoning#reasoning-effort-levels

OpenRouter Documentation

Responses API Beta Reasoning - Advanced AI Reasoning

Access advanced reasoning capabilities with configurable effort levels and encrypted reasoning chains using OpenRouter's Responses API Beta.

random girder Dec 13, 2025, 7:06 PM

#

there seems to be new checkpoints for Gemini 3, of some sort, Pro or Flash

https://x.com/AiBattle_/status/1999858840045764777

AiBattle (@AiBattle_)

af97 🆚 Gemini 3 Pro - 3D model of an Iphone 16 Pro Max

Seems like new A/B models are being tested on AI Studio right now. Might be new checkpoints for Gemini 3 Pro

glossy anvil Dec 13, 2025, 7:33 PM

#

random girder there seems to be new checkpoints for Gemini 3, of some sort, Pro or Flash http...

flash is on gemini business already

#

vague quest Dec 14, 2025, 4:53 PM

#

gotta be tuesday

glossy anvil Dec 15, 2025, 9:16 AM

#

Thursday most definitely

celest cypress Dec 15, 2025, 10:46 PM

#

We'll have it by end of December

feral mantle Dec 16, 2025, 6:53 PM

#

https://fixupx.com/skizoexe/status/2000909100910285118

big if true

LeoTest Labs (@skizoexe)

Internal leak: Gemini 3 Pro GA marks a major leap over the preview significantly reduced hallucinations higher raw intelligence better answer accuracy and improvements in reasoning and coding with greater stability Gemini 3 Flash is faster and cheaper (~3×) for services
︀︀#leak

**💬 7 🔁 9 ❤️ 267 👁️ 20.2K **

unkempt falcon Dec 16, 2025, 7:03 PM

#

Indeed

random girder Dec 16, 2025, 7:25 PM

#

i hope they can fix the hallucinations / overconfidence in its knowledge

#

and yeah flash would be nice, but 3x cheaper doesnt sound very cheap

gaunt dragon Dec 16, 2025, 7:37 PM

#

3x as cheap than 3.0 Pro High sounds bad if I'm honest

celest cypress Dec 17, 2025, 8:13 PM

#

#

Don't get baited, any fellow webUI users of Gemini sus

opaque pasture Dec 17, 2025, 8:32 PM

#

dark pattern to make people use thinking as if it is an expensive model

gaunt dragon Dec 17, 2025, 8:35 PM

#

celest cypress Don't get baited, any fellow webUI users of Gemini <:sus:1222506294720725072>

Thanks, I fell for it

gaunt dragon Dec 17, 2025, 8:37 PM

#

gaunt dragon 3x as cheap than 3.0 Pro High sounds bad if I'm honest

Or does it?

celest cypress Dec 17, 2025, 8:57 PM

#

Yeah, I thought Pro was the rollout of limited deep think mode or whatever.

#

I'm never hitting it with advanced math and code, I'm always like "My toe hurts =( What do I do?"

nocturne oyster Dec 18, 2025, 1:53 AM

#

opaque pasture dark pattern to make people use thinking as if it is an expensive model

Kinda

obtuse basalt Dec 18, 2025, 12:42 PM

#

gemini 3 pro preview one shotted most of coding my problems away

wintry holly Dec 18, 2025, 3:13 PM

#

pEyes

random girder Dec 18, 2025, 3:15 PM

#

obtuse basalt gemini 3 pro preview one shotted most of coding my problems away

same here, just asked it to refactor a 1000 line react page (yes i vibe coded it) into multiple components, no issue first try

nocturne oyster Dec 18, 2025, 9:24 PM

#

Visual Physics Comprehension Test https://cbrower.dev/vpct

nocturne oyster Dec 19, 2025, 11:43 AM

#

https://x.com/krishnanrohit/status/2001817623764639848

rohit (@krishnanrohit)

Man Gemini is obstinate ... Someone hurt it bad that it won't even acknowledge the possibility of being deployed in the future.

#

@stray urchin +

#

pepeCringe

main gale Dec 19, 2025, 11:46 AM

#

STOP ZEROING IN

#

YOUVE BEEN ZEROING IN ON SOMETHING FOR THE 30TH TIME IN A ROW

glossy anvil Dec 19, 2025, 11:49 AM

#

main gale STOP ZEROING IN

Im gonna zero out

opaque pasture Dec 19, 2025, 1:13 PM

#

he is gonna one eventually

empty tendon Dec 19, 2025, 6:09 PM

#

FYI be careful w/geminidesk might be trojan posted link in flash thread

random girder Dec 20, 2025, 12:51 PM

#

i swear they updated the preview, the model is acting differently in a good way

#

atleast in ai studio

timid spoke Dec 20, 2025, 4:45 PM

#

Anybody experisncing gemini is not giving enough attention to things ?

random girder Dec 20, 2025, 9:03 PM

#

praying for a one shot

random girder Dec 20, 2025, 9:26 PM

#

it managed to do it and fix its own errors without me intervening.

this is like 15k LOC~

there were a FEW minor mistakes:

cant switch tabs in sidebar (doesn't re-render, needing manual refresh)
it changed a bit of UI in unasked ways (extra animations and some purple gradients)

#

very impressed, especially for about 20~ minutes of actual work, and then the last few fixing the bugs

feral bramble Dec 21, 2025, 3:41 AM

#

I have the exact opposite experience

#

Chokes on 1.5k lines even when given all the docs and chances it could possibly ever want

brittle storm Dec 21, 2025, 8:21 PM

#

i feel like for coding just pick opus 4.5

#

and then g3p is just the goat at literally everything else

rapid hound Dec 21, 2025, 9:00 PM

#

brittle storm i feel like for coding just pick opus 4.5

You guys are so rich

#

I sent 1 request to Kimi K2 Thinking, it did a good job, but it cost 4 cents...

#

Deepseek has provider issues, Grok 4 Fast is meh

#

Mimo V2 Flash was acting weird

brittle storm Dec 21, 2025, 9:04 PM

#

rapid hound You guys are so rich

i make aws free trial accounts lol

cedar cliff Dec 21, 2025, 9:17 PM

#

You can still do that? Hmm

#

When I try it always wants me to verify a ton of info.

narrow tangle Dec 22, 2025, 1:31 AM

#

Just wanted to mention that these things are shockingly good at OCR style text bounding boxes; im blown away.

brittle storm Dec 22, 2025, 5:57 AM

#

narrow tangle Just wanted to mention that these things are shockingly good at OCR style text b...

are you able to compare it to qwen?

#

qwen 3 vl the 235b model is really good at bounding boxes

#

i havent tried gemini 3 at all because ive been happy with this

random girder Dec 22, 2025, 8:10 AM

#

this model's niche knowledge is insane, flash doesn't know the question nor do other models, only Gemini 3 Pro did
though even pro's knowledge on this is a bit hazy in its reasoning as it switches between 2 answers (which only 1 is right) but then narrows it down

I wish i could compare to GPT 4.5 Preview but i dont have chatgpt pro or whatever plan u need for it since its no longer on the api

analog tinsel Dec 22, 2025, 5:00 PM

#

narrow tangle Just wanted to mention that these things are shockingly good at OCR style text b...

how does it return these boxes? what is the coordinate system for these (size and so on)?

narrow tangle Dec 22, 2025, 5:00 PM

#

@analog tinsel https://ai.google.dev/gemini-api/docs/image-understanding#object-detection

Google AI for Developers

Image understanding | Gemini API | Google AI for Developers

Get started building with Gemini's multimodal capabilities in the Gemini API

analog tinsel Dec 22, 2025, 5:03 PM

#

narrow tangle <@161298298223853568> https://ai.google.dev/gemini-api/docs/image-understanding#...

ty 👌

brittle storm Dec 23, 2025, 1:52 AM

#

^piece of shit slow ass model

#

god its SOOOO SLOW

#

just kill me bro

#

god

brittle storm Dec 23, 2025, 1:54 AM

#

opaque pasture have you seen the last screenshot 🥀

^

#

we should do a gemini 3 pro slowness leaderboard

#

omg i got my response

lunar socket Dec 24, 2025, 1:40 AM

#

brittle storm qwen 3 vl the 235b model is really good at bounding boxes

incredible that anyone can use qwen and feel like they've gotten anything done

#

with the kind of prompt engineering you have to do to make qwen usable

#

the same magic applied to claude or gemini would be instant million-dollar app 1-shot

brittle storm Dec 24, 2025, 1:45 AM

#

lunar socket incredible that anyone can use qwen and feel like they've gotten anything done

i literally just ask it to find this one element in a screenshot

#

and return the bounding box

#

and its goooood

lunar socket Dec 24, 2025, 1:45 AM

#

ok yeah simple task fair enough

#

in that case, efficiency > raw performance

fierce dragon Dec 24, 2025, 7:10 AM

#

If anyone has issues with gemini-3-pro-preview via Google AI Studio provider responding with JSON when tools are enabled? If I switch to Google provider then everything works as expected: tool calls are valid, response without tool calls is returned as markdown

#Gemini 3

HAIL SATAN