#general | Arena | Page 16

sturdy mica Apr 9, 2025, 5:40 AM

#

https://tenor.com/view/half-life-gman-g-man-half-life-2-rise-and-shine-gif-5438670790719714756

balmy mist Apr 9, 2025, 5:40 AM

#

maybe for kids

sturdy mica Apr 9, 2025, 5:40 AM

#

i’m leaving this chat

balmy mist Apr 9, 2025, 5:40 AM

#

it sucks we cant have multiple tools at once

sturdy mica Apr 9, 2025, 5:40 AM

#

balmy mist maybe for kids

no. no friend, no

balmy mist Apr 9, 2025, 5:40 AM

#

sturdy mica no. no friend, no

lol

sturdy mica Apr 9, 2025, 5:40 AM

#

balmy mist it sucks we cant have multiple tools at once

it does :(

#

i wish we could, you can with api

balmy mist Apr 9, 2025, 5:41 AM

#

yeah does not make sense

sturdy mica Apr 9, 2025, 5:41 AM

#

and with the live streaming thing but that only works with 2.0 flash

balmy mist Apr 9, 2025, 5:41 AM

#

have you played around with gemini 2.5 and tools?

sturdy mica Apr 9, 2025, 5:41 AM

#

yes

balmy mist Apr 9, 2025, 5:41 AM

#

how was it and what did you use?

sturdy mica Apr 9, 2025, 5:41 AM

#

i use gemini 2.5 with grounding, function calling, code execution

#

they all work how you think they would

#

why?

balmy mist Apr 9, 2025, 5:43 AM

#

i guess this is how you use multiple tools

balmy mist Apr 9, 2025, 5:44 AM

#

sturdy mica why?

i just was curious because i never played with it

#

what does code execution do?

#

just executed what code you give it?

#

does it pop open a view for it?

#

i need to dive deeper

sturdy mica Apr 9, 2025, 5:47 AM

#

balmy mist just executed what code you give it?

you can ask it to generate and run code like

#

“generate and run code that gives a random number”

#

and it will write python code and show output
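For anyone else curious: with code execution enabled, the model answers a prompt like that by writing a short script, running it, and showing the printed result. Below is a hypothetical example of the kind of Python it might produce (written for illustration, not an actual Gemini transcript); the optional seed is an added assumption that makes the output reproducible.

```python
import random
from typing import Optional


def random_number(low: int = 1, high: int = 100, seed: Optional[int] = None) -> int:
    """Return a random integer in [low, high]; pass a seed to make the result reproducible."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return rng.randint(low, high)


if __name__ == "__main__":
    print(random_number())
```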

#

ill screenshot rn

#

ok i cant but

#

you get the idea

#

just test it urself

balmy mist Apr 9, 2025, 5:52 AM

#

lmaoo yeah

#

have you tried fine tuning a model on studio?

torn mantle Apr 9, 2025, 6:15 AM

#

20 reports/day

for-now-its-only-available-for-google-one-ai-premium-users-v0-d7cjk021xqte1.png

#

quite generous

#

xd

drifting thorn Apr 9, 2025, 6:20 AM

#

ok Cline is not that bad

#

I finally got past those 2 chapters of my character's flashback

#

with Cline and a prompt that I've edited multiple times

balmy mist Apr 9, 2025, 6:24 AM

#

torn mantle quite generous

you only get like 20 per day with openai right?

teal mantle Apr 9, 2025, 6:33 AM

#

Can you still use credits and still qualify for higher request limits?

balmy mist Apr 9, 2025, 6:35 AM

#

teal mantle Can you still use credits and still qualify for higher request limits?

did you hit the 1000 limit?

keen beacon Apr 9, 2025, 6:40 AM

#

teal mantle Can you still use credits and still qualify for higher request limits?

I don't believe so but u should really ask it in OR discord server lol

teal mantle Apr 9, 2025, 6:57 AM

#

balmy mist did you hit the 1000 limit?

Far from

torn mantle Apr 9, 2025, 7:05 AM

#

balmy mist you only get like 20 per day with openai right?

much less

#

like 2 per day or smth

#

or 1

#

xddd

#

didnt age well

oblique flint Apr 9, 2025, 7:15 AM

#

meanwhile microsoft still cant produce a frontier model themselves, they're entirely dependent on openai to do so

alpine coral Apr 9, 2025, 7:49 AM

#

oh i noticed the colour but didn't realise what it meant til now - nice / congrats! 👍
(will be good to have an active mod ha)

alpine coral Apr 9, 2025, 7:52 AM

#

torn mantle 20 reports/day

have you tested it yet? If it's as good as they're saying it is, 20/day is epic value (vs chatgpt's Deep Research offering, which is limited to like 10/month or something it feels like)

keen fulcrum Apr 9, 2025, 7:52 AM

#

torn mantle xddd

https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/ai-mode-animation.mp4

oblique flint Apr 9, 2025, 7:55 AM

#

https://www.youtube.com/watch?v=_WvtdRtG1aY

wish we had something like this for gemini and other OSes than macos

YouTube

OpenAI

Shipping code to your IDE with ChatGPT

See how OpenAI’s ChatGPT can now integrate directly with apps like IDEs to help engineers write, debug, and refactor code in real time. In this demo, we fix a checkout error and ship the fix directly to our IDE.—no copy/paste needed.

Ideal for developers and technical teams looking to enhance their daily tools with AI-powered code generatio...

alpine coral Apr 9, 2025, 7:56 AM

#

torn mantle or 1

yeah even less (for Plus anyway).. it's 10/month

keen beacon Apr 9, 2025, 8:55 AM

#

gemini 2.5 flash today apparently

oblique flint Apr 9, 2025, 9:02 AM

#

keen beacon gemini 2.5 flash today apparently

sauce?

sage raptor Apr 9, 2025, 9:06 AM

#

stargazer

oblique flint Apr 9, 2025, 9:08 AM

#

how do you know its releasing today tho

north vale Apr 9, 2025, 9:16 AM

#

https://x.com/btibor91/status/1909895821589458989?s=46

Tibor Blaho (@btibor91) on X

gemini-2.5-flash-preview-04-09 + thinking_config/thinking_budget have been added to Google Gen AI Python SDK

hardy pecan Apr 9, 2025, 9:23 AM

#

cool

alpine coral Apr 9, 2025, 9:29 AM

#

north vale https://x.com/btibor91/status/1909895821589458989?s=46

oh interesting so thinking_budget (in this case 10,000) is presumably their language for the parameter that defines token allowance for 'thinking'

#

sonnet is capped at 32k tokens. oai lets you adjust via low / med / high

keen beacon Apr 9, 2025, 9:30 AM

#

i think u can do 64k

#

or with claude u can do even more by taking advantage of the fact that it's a single model, and continuously prefilling it lol

#

not the exact antml:thinking tag but the behavior is close enough

alpine coral Apr 9, 2025, 9:31 AM

#

lol indeed

#

yeah ik what you mean

keen beacon Apr 9, 2025, 9:32 AM

#

its interesting to me anthropic didnt make it a special token

#

so the behavior can 'leak' easily

alpine coral Apr 9, 2025, 9:33 AM

#

as far as i can tell, there is nothing 'special' going on at all with sonnet3.7 thinking

#

it just has a 'scratchpad' thing given to it and some system prompt

#

there doesn't seem to be a fundamental difference; it just does CoT reasoning (as part of its regular inference, even if it's rendered in a box on the UI) and that informs its 'final' response

#

when 'thinking' is enabled / used as the model

keen beacon Apr 9, 2025, 9:35 AM

#

alpine coral it just has a 'scratchpad' thing given to it and some system prompt

it doesnt have comprehensive thinking instructions, just an instruction to think in antml:thinking and max thinking length where they probably tuned in several values with differing 'thinking' lengths. it was trained in to think when the response starts with antml:thinking, otherwise its normal

barren prairie Apr 9, 2025, 9:35 AM

#

keen beacon gemini 2.5 flash today apparently

I want Gemini 2.5 flash thinking not Gemini 2.5 flash 😭

keen beacon Apr 9, 2025, 9:36 AM

#

keen beacon it doesnt have comprehensive thinking instructions, just an instruction to think...

antml:thinking isnt a special token but its sanitized hard so u cant really use it (without degradation) but since it isn't a special token, <thinking> will basically be seen as the same because of pretraining associations instead of adding a new special token to the vocab

hardy pecan Apr 9, 2025, 9:37 AM

#

a new challenger:

hazy quest Apr 9, 2025, 9:38 AM

#

We have been saying that every few months for over a year lol. Remember Opus 3.0

keen beacon Apr 9, 2025, 9:41 AM

#

anthropic doesnt use a regular eot special token even though its present in the tokenizer since claude 1. (this is how this breakage occurs, conditional stop on human:)

they seem to avoid adding special tokens whenever possible to maintain pretraining knowledge throughout the window

(they replace antml:thinking with <thinking>, even though it's actually antml:thinking if the screenshot is confusing)

sage raptor Apr 9, 2025, 9:45 AM

#

is it good ?

oblique flint Apr 9, 2025, 9:48 AM

#

north vale https://x.com/btibor91/status/1909895821589458989?s=46

please be better than o3 mini at coding

harsh flume Apr 9, 2025, 9:56 AM

#

https://x.com/BoshiWang2/status/1909772639104540677

Boshi Wang (@BoshiWang2) on X

LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why?

Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design

#

I wonder how much (if any) research is going on at the major companies into things outside of the mainstream transformer architecture

cedar tide Apr 9, 2025, 10:04 AM

#

barren prairie I want Gemini 2.5 flash thinking not Gemini 2.5 flash 😭

It's the thinking version today, and we also get a thinking budget

keen beacon Apr 9, 2025, 10:52 AM

#

so 2.5 flash today

#

but i find it hard to believe that's all we'll be getting

#

they've literally had like

#

5 or 6 anonymous models on the text arena

#

and a few on webdev

keen beacon Apr 9, 2025, 10:54 AM

#

hardy pecan a new challenger:

this is a base model

#

didn't think with my first test Q (which it got right)

keen beacon Apr 9, 2025, 10:55 AM

#

keen beacon and a few on webdev

they removed all the anon google models on web dev i think

#

thats interesting

oblique flint Apr 9, 2025, 10:55 AM

#

what price do yall think 2.5 flash api is gonna be.

Current flash is $0.10 per 1M input and $0.40 per 1M output.

o3 mini is $1.10 per 1M input and $4.40 per 1M output

my guess: 2.5 flash will be $0.20 per 1M input and $1.00 per 1M output. I think it's prolly still worse than o3 mini at math and coding but it's also a lot cheaper ofc
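To make the price comparison concrete, here is a small cost calculator. The per-1M-token prices are the ones quoted in the message; the 2.5 Flash entry is only the speaker's guess, and the 10k-input / 2k-output workload is a hypothetical example. With these numbers o3 mini comes out 11x the cost of 2.0 Flash per request, since both of its prices are 11x higher.

```python
# USD per 1M tokens as (input, output). The 2.5 Flash entry is a guess from the chat, not an official price.
PRICES = {
    "gemini-2.0-flash": (0.10, 0.40),
    "o3-mini": (1.10, 4.40),
    "gemini-2.5-flash (guess)": (0.20, 1.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given per-1M-token input and output prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```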

keen beacon Apr 9, 2025, 10:57 AM

#

keen beacon they removed all the anon google models on web dev i think

yeah i just checked

#

looks like those models may be public today, or at least one of them

#

hopefully nightwhisperer...

drifting thorn Apr 9, 2025, 11:01 AM

#

Hope nightwhisper is 2.5 Ultra

keen beacon Apr 9, 2025, 11:02 AM

#

i highly doubt they're working on an ultra model tbh but i would be happy to be proven wrong

drifting thorn Apr 9, 2025, 11:03 AM

#

Like I won’t hope that nightwhisper is a detuned version of 2.5 Pro

#

Either increasing parameter or thinking tokens

barren prairie Apr 9, 2025, 11:11 AM

#

keen beacon 5 or 6 anonymous models on the text arena

That's great. With Meta we got like 1526366373 anonymous models just to end up with a stupid one

alpine coral Apr 9, 2025, 11:47 AM

#

drifting thorn Either increasing parameter or thinking tokens

i think we could see a thinking_budget API param (and like a slider in the aistudio UI)

balmy mist Apr 9, 2025, 11:54 AM

#

alpine coral i think we could see a `thinking_budget` API param (and like a slider in the ais...

thats pretty interesting

gentle plinth Apr 9, 2025, 12:01 PM

#

javascript honestly: just tell it to generate everything in one html file, copy that and open it. no compiling, special software or anything needed, and it can run on almost any pc

#

obviously c++ would be faster but it's tenfold more complicated and highly dependent on the exact system and hardware configuration

#

i mean for example gemini 2.5 pro can generate a 3d airplane simulator without a problem in one-shot

#

all in one html file with html/js/css embedded

#

it probably uses the three.js lib I think, but you don't need to ask it, it will use whatever is appropriate

brittle tiger Apr 9, 2025, 12:21 PM

#

keen beacon this is a base model

Sure it's not thinking? Would be first non-thinking model to get one of my test arc-agi problems correct and it went through a bunch of hypotheses and combined them to reach final answer

keen fulcrum Apr 9, 2025, 12:22 PM

#

Nightwhisper
when? I want it now

drifting thorn Apr 9, 2025, 12:23 PM

#

I need Deepseek R2

#

Hope DeepSeek R2 has a million token context window

keen beacon Apr 9, 2025, 12:23 PM

#

brittle tiger Sure it's not thinking? Would be first non-thinking model to get one of my test ...

well it started streaming immediately when i tested it

cloud meadow Apr 9, 2025, 12:24 PM

#

Elon is both naive and simultaneously a narcissist.

#

You could already do that I think.

#

Not sure if it's new but Gemini has started asking "Which answer is better?" on some prompts.

cloud meadow Apr 9, 2025, 12:31 PM

#

hazy quest We have been saying that every few months for over a year lol. Remember Opus 3.0

The only time an AI will get me to actually le soyjak (mouth wide open) would be if I put in an entire 1 million token project and it efficiently recodes it to another language and another stack. Maybe 3.5 years away?

drifting thorn Apr 9, 2025, 12:31 PM

#

What if AIs have unlimited thought tokens?

#

Would AI be able to make decisions on how many tokens they used for thinking? If not, then the above idea is disastrous

cloud meadow Apr 9, 2025, 12:32 PM

#

Gemini 2.5 is pretty good and I'm excited for the future of coding AIs

hardy pecan Apr 9, 2025, 12:34 PM

#

https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

Google

Ironwood: The first Google TPU for the age of inference

We’re introducing Ironwood, our seventh-generation Tensor Processing Unit (TPU) designed to power the age of generative AI inference.

#

TPU dominance

oblique flint Apr 9, 2025, 12:34 PM

#

https://blog.google/products/google-cloud/next-2025/

Gemini 2.5 Flash, our workhorse model with low latency and cost efficiency, will soon be available in Vertex AI.

wen ai studio lol

blog.google

Google Cloud Next 25

Here’s a look at what we announced at Google Cloud Next 25.

keen beacon Apr 9, 2025, 12:37 PM

#

https://blog.google/products/google-cloud/cloud-next-gen-ai-vertex-ai-updates/

Google

New video, image, speech and music generative AI tools are coming t...

Today at Google Cloud Next, we announced four big updates for generative media within Vertex AI, Google Cloud’s fully-managed, unified AI development platform:Lyria, Goo…

calm sequoia Apr 9, 2025, 12:42 PM

#

ivory schooner Apr 9, 2025, 12:44 PM

#

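My 24k...... but I believe 24k's successor will be Behemoth......

#

But I heard from the Chinese community that it'll probably be a long wait......

#

(Summer, to be more precise.......)

#

Looking forward to Behemoth, then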

brittle tiger Apr 9, 2025, 12:53 PM

#

torn mantle Apr 9, 2025, 12:55 PM

#

brittle tiger

mm

#

no wonder

torn mantle Apr 9, 2025, 12:56 PM

#

alpine coral have you tested it yet? If it's as good as they're saying it is, 20/day is epic ...

not yet

balmy mist Apr 9, 2025, 1:07 PM

#

they changed ui back?

barren prairie Apr 9, 2025, 1:07 PM

#

balmy mist they changed ui back?

Who?

balmy mist Apr 9, 2025, 1:41 PM

#

barren prairie Who?

nvm my chrome glitching

#

and studio

drifting thorn Apr 9, 2025, 1:48 PM

#

I think Anthropic will be able to train a model way better than 3.7, R1, o3-mini or even Gemini 2.5 Pro when they can get an ‘honest’ large multimodal model

#

The answers of CoT models will adhere to what they ‘think’, and we can train them much more efficiently with RL.

keen beacon Apr 9, 2025, 2:08 PM

#

oh i love the new aistudio ui

lime coral Apr 9, 2025, 2:09 PM

#

lol

#

Weird on mobile. Will need some adaptation

harsh flume Apr 9, 2025, 2:13 PM

#

Anyone can see a feasible path any company surpasses Google this month?

sturdy mica Apr 9, 2025, 2:13 PM

#

no

#

also is the new gemini model coming out today

keen beacon Apr 9, 2025, 2:14 PM

#

yes 2.5 flash

sturdy mica Apr 9, 2025, 2:14 PM

#

awww

#

shucks

oblique flint Apr 9, 2025, 2:14 PM

#

where are the 2.5 flash benchmarks tho

keen beacon Apr 9, 2025, 2:19 PM

#

oblique flint where are the 2.5 flash benchmarks tho

when its out in aistudio

#

woah i just saw the new ai studio

#

it looks nice

drifting thorn Apr 9, 2025, 2:26 PM

#

Do you guys think Gemini models are based on MoE architecture?

keen beacon Apr 9, 2025, 2:27 PM

#

drifting thorn Do you guys think Gemini models are based on MoE architecture?

yes

#

they said so in 1.5 pro's announcement i think.

torn mantle Apr 9, 2025, 2:27 PM

#

kinda sad we wont have gemini coder

oblique flint Apr 9, 2025, 2:27 PM

#

moe is the only way their api prices can be so low I think

torn mantle Apr 9, 2025, 2:27 PM

#

its the only model i was looking forward to tbh

drifting thorn Apr 9, 2025, 2:28 PM

#

I’m looking forward to Claude 4.0 after they found out that reasoning models aren’t honest

keen beacon Apr 9, 2025, 2:28 PM

#

oblique flint moe is the only way their api prices can be so low I think

moe is faster/cheaper but i think in high batching/etc the calculus is more complicated and the gains are reduced. i dont know much tho lol

#

its not just moe thats making the difference
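For anyone unfamiliar with the term: a mixture-of-experts layer routes each token to only a few of its expert sub-networks, which is where the per-token compute savings come from. The sketch below is a generic pure-Python illustration of top-k gating with toy scalar "experts" (the experts, logits, and k=2 are made-up assumptions); it is not a description of Gemini's internals.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> Tuple[float, List[int]]:
    """Top-k MoE: evaluate only the k highest-scoring experts and mix their outputs.

    Returns (output, indices of the experts that were actually evaluated).
    """
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalise over the chosen experts only
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top


if __name__ == "__main__":
    toy_experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]  # toy "experts"
    y, used = moe_forward(1.0, [0.1, 2.0, -1.0, 1.0], toy_experts, k=2)
    print(used, round(y, 4))  # only 2 of the 4 experts ran
```

Cost scales with k rather than the total number of experts, though, as noted above, batching and serving details complicate how much of that saving shows up in practice.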

drifting thorn Apr 9, 2025, 2:29 PM

#

Anthropic is definitely training a new model to be “honest” in showing its chain of thought

#

After their research showing 3.7 thinking hides its actual thoughts

#

https://www.anthropic.com/research/reasoning-models-dont-say-think

brittle tiger Apr 9, 2025, 2:42 PM

#

torn mantle kinda sad we wont have gemini coder

Gemini coder was just a dream, a whisper in the night

barren prairie Apr 9, 2025, 2:43 PM

#

brittle tiger Gemini coder was just a dream, a whisper in the night

We will have some stars today

#

Or maybe luna or some dreams

drifting thorn Apr 9, 2025, 2:56 PM

#

For every model, how often does the chain of thought align with the answer?

fleet lintel Apr 9, 2025, 2:56 PM

#

close to zero

balmy mist Apr 9, 2025, 2:57 PM

#

brittle tiger Gemini coder was just a dream, a whisper in the night

fr im gonna cry tonight

sturdy mica Apr 9, 2025, 2:57 PM

#

torn mantle kinda sad we wont have gemini coder

they should release coder instead of flash today…

balmy mist Apr 9, 2025, 2:57 PM

#

i dont care about flash unless its blazing fast and dirt dirt cheap

sturdy mica Apr 9, 2025, 2:58 PM

#

yeah

oblique flint Apr 9, 2025, 2:59 PM

#

o3 mini killer would be great

barren prairie Apr 9, 2025, 2:59 PM

#

balmy mist i dont care about flash unless its blazing fast and dirt dirt cheap

I don't care about flash unless it is flash thinking 🙂 🩷🩵 I love the flash thinking bot

#

But is it confirmed ? There is no NW?

keen beacon Apr 9, 2025, 3:00 PM

#

barren prairie I don t care about flash unless it is flash thinking 🙂 🩷🩵 I love the flash th...

it is thinking dw

barren prairie Apr 9, 2025, 3:00 PM

#

I mean Gemini coder?

keen beacon Apr 9, 2025, 3:01 PM

#

balmy mist i dont care about flash unless its blazing fast and dirt dirt cheap

2.0 flash was cheaper than 4o mini, i dont expect it to be much more expensive per token

#

considering each request is generally more expensive because of thinking tokens and they want to be competitive
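The point about thinking tokens can be shown with simple arithmetic: even at an unchanged per-token price, a reasoning model that emits hidden thinking tokens costs more per request. This sketch assumes thinking tokens are billed at the output rate (that is how OpenAI bills o-series reasoning tokens; 2.5 Flash's actual billing isn't known from this chat), and the token counts are hypothetical.

```python
def cost_with_thinking(input_tokens: int, thinking_tokens: int, answer_tokens: int,
                       in_price: float, out_price: float) -> float:
    """USD cost of one request given per-1M-token prices.

    Assumption: thinking tokens are billed at the output-token rate.
    """
    billed_output = thinking_tokens + answer_tokens
    return (input_tokens * in_price + billed_output * out_price) / 1_000_000


# Hypothetical request priced at the 2.0 Flash rates quoted earlier ($0.10 in / $0.40 out per 1M).
plain = cost_with_thinking(2_000, 0, 500, 0.10, 0.40)         # no thinking
thinking = cost_with_thinking(2_000, 4_000, 500, 0.10, 0.40)  # same answer plus 4k thinking tokens

if __name__ == "__main__":
    print(f"plain ${plain:.4f}, with thinking ${thinking:.4f} ({thinking / plain:.1f}x)")
```

Here 4k thinking tokens make the same request 5x more expensive, which is why a thinking model needs a low per-token price to stay competitive.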

sturdy mica Apr 9, 2025, 3:01 PM

#

oblique flint o3 mini killer would be great

??

#

Gemini 2.5 pro is a o3 mini killer

#

i think

#

its way better anyway

#

also when are o4 mini and o3 coming out

balmy mist Apr 9, 2025, 3:02 PM

#

gg https://x.com/mark_k/status/1909980772338901448

Mark Kretschmann (@mark_k) on X

Google "Ironwood" TPU v7 is a 10x jump in compute over the last generation, and it has vastly more memory. Can anyone even compete with Google's AI hardware at this point?

#

this is y

#

makes sense now

#

openai who?

blazing rune Apr 9, 2025, 3:03 PM

#

sturdy mica Gemini 2.5 pro is a o3 mini killer

well, yeah, but it's also likely a lot bigger, and a good bit more expensive for output

#

the input is about the same I think

#

honestly I would say it beats o3 mini considerably, at least for ML stuff

sage raptor Apr 9, 2025, 3:04 PM

#

what tpu openai uses

barren prairie Apr 9, 2025, 3:05 PM

#

sturdy mica Gemini 2.5 pro is a o3 mini killer

You mean at coding bcz o3 mini is trash on other tasks

torn mantle Apr 9, 2025, 3:06 PM

#

balmy mist gg https://x.com/mark_k/status/1909980772338901448

no wonder they are serving all of these models

#

its just so crazy

oblique flint Apr 9, 2025, 3:07 PM

#

sturdy mica Gemini 2.5 pro is a o3 mini killer

its way more expensive

sage raptor Apr 9, 2025, 3:08 PM

#

oblique flint o3 mini killer would be great

oblique flint Apr 9, 2025, 3:09 PM

#

yea I saw it and I'll definitely try it out, but I doubt it holds up against o3 mini in actual practical coding. This benchmark I believe is more competitive coding

blazing rune Apr 9, 2025, 3:09 PM

#

sage raptor

that's almost definitely fake

sage raptor Apr 9, 2025, 3:09 PM

#

it's not fake

blazing rune Apr 9, 2025, 3:09 PM

#

well, not fake

#

but it certainly won't be good in the real world

sage raptor Apr 9, 2025, 3:10 PM

#

idk

blazing rune Apr 9, 2025, 3:10 PM

#

oblique flint yea I saw it and I'll definitely try it out, but I doubt it holds up against o3 ...

because of this

#

competitive coding isn't very useful in the real world, at least for LLMs. it's good for humans because we can apply the concepts in many different ways, but LLMs can't generalize nearly as well

brittle tiger Apr 9, 2025, 3:25 PM

#

https://www.reddit.com/r/singularity/s/aVgBNOf67x

From the singularity community on Reddit: Gemini 2.5 Pro got added ...

Explore this post and more from the singularity community

lavish orchid Apr 9, 2025, 3:38 PM

#

hey everyone, very happy to share that I got accepted into YC's AI Startup School!! will hopefully see Sam Altman, Elon Musk and others! 🙂

https://x.com/ahmetdedeler101/status/1909988039863959833

Ahmet ☕ (@ahmetdedeler101) on X

Beyond excited to share that I was accepted into @ycombinator's AI Startup School!!

Grateful to be joining at 17, can't wait to learn and build :)

keen beacon Apr 9, 2025, 3:54 PM

#

i'd love to meet sam, dario, demis... leave out elon or i might do something i regret

sonic tendon Apr 9, 2025, 3:59 PM

#

keen beacon i'd love to meet sam, dario, demis... leave out elon or i might do something i r...

lmaoo based

leaden palm Apr 9, 2025, 4:05 PM

#

who watching cloud next

keen beacon Apr 9, 2025, 4:07 PM

#

watched the opening keynote and will just keep an eye on the blog for everything else

leaden palm Apr 9, 2025, 4:10 PM

#

lm arena mentioned

#

ok he just said 2.5 flash

#

thinking

#

has reasoning effort

#

"coming soon"...

#

lame

#

should have a giant lever to deploy it to prod

keen fulcrum Apr 9, 2025, 4:12 PM

#

leaden palm lm arena mentioned

Shame not the alpha site is shown

leaden palm Apr 9, 2025, 4:12 PM

#

depends on how soon "soon" is

#

the work day is still young

keen beacon Apr 9, 2025, 4:13 PM

#

where's nightwhisperer at 😔

keen fulcrum Apr 9, 2025, 4:13 PM

#

leaden palm depends on how soon "soon" is

Did they share nightwhisper yet?

leaden palm Apr 9, 2025, 4:13 PM

#

theyre not gonna call it that (probably)

keen beacon Apr 9, 2025, 4:13 PM

#

well yeah

leaden palm Apr 9, 2025, 4:13 PM

#

so far its just been mentioning 2.5 pro and flash (both thinking, and flash has reasoning effort)

keen beacon Apr 9, 2025, 4:13 PM

#

either it was a 2.5 pro variant or it was an update

leaden palm Apr 9, 2025, 4:14 PM

#

ts not serious

keen beacon Apr 9, 2025, 4:14 PM

#

they done got the mcdonalds ceo on stage 💔

leaden palm Apr 9, 2025, 4:18 PM

#

leaden palm so far its just been mentioning 2.5 pro and flash (both thinking, and flash has ...

oh and dont forget the new tpu announcement

torn mantle Apr 9, 2025, 4:19 PM

#

keen beacon either it was a 2.5 pro variant or it was an update

when is the google event

#

how much time left

leaden palm Apr 9, 2025, 4:19 PM

#

TODAY

#

NOW

#

HOP ON

#

https://www.youtube.com/live/Md4Fs-Zc3tg

YouTube

Google Cloud

Opening Keynote: The new way to cloud

Organizations around the world are driving change with innovative solutions, boosting efficiency, empowering employees, engaging customers, and fueling growt...

torn mantle Apr 9, 2025, 4:19 PM

#

link?

#

thanks

sturdy mica Apr 9, 2025, 4:20 PM

#

leaden palm should have a giant lever to deploy it to prod

minecraft release in 2011

leaden palm Apr 9, 2025, 4:20 PM

#

3.7 can keep working without breaking things for longer but doesn't think as well

sturdy mica Apr 9, 2025, 4:20 PM

#

2011

#

https://tenor.com/view/minecraft-lever-notch-2011-official-release-gif-6894684587692301610

#

this

leaden palm Apr 9, 2025, 4:21 PM

#

man i would be so hyped rn if i was a devops engineer

#

gemini weights leak when /s

#

~~is this new~~ no, already made and tested in products, just now in vertex ai

keen beacon Apr 9, 2025, 4:27 PM

#

depends on how you mean

#

new on GCP? yes, it was announced earlier today

#

new as a thing? no

#

it was already being tested publicly

#

on musicFX

#

it's a pretty meh model, we've been spoilt w/ things like suno and udio

leaden palm Apr 9, 2025, 4:33 PM

#

they just said mindful and demure in the big 25

#

ehh what should i expect

balmy mist Apr 9, 2025, 4:40 PM

#

is the event over?

leaden palm Apr 9, 2025, 4:40 PM

#

balmy mist is the event over?

no!

#

one stream

#

many hours (probably)

barren prairie Apr 9, 2025, 4:41 PM

#

balmy mist is the event over?

Still speaking

balmy mist Apr 9, 2025, 4:41 PM

#

thnx

lime coral Apr 9, 2025, 4:41 PM

#

Would turn into a generational hater if no update on native audio/img

balmy mist Apr 9, 2025, 4:42 PM

#

yall notice that gemini works a lot better in api than in studio? is that by design?

#

might be noob observation

#

just started using api for gemini lol

lime coral Apr 9, 2025, 4:48 PM

#

If you want to try nightwhisper this might be the way to go https://x.com/testingcatalog/status/1910010822937698425?s=46

TestingCatalog News 🗞 (@testingcatalog) on X

Google launched Firebase Studio 🔥

There, you can build an app using prompting, and it runs on the web.

"lovable+cursor+replit+bolt+windsurf all in one"

lime coral Apr 9, 2025, 4:49 PM

#

balmy mist yall notice that gemini works a lot better in api then on studio? is that by des...

Logan once said ai studio is purposely built to reflect the api behavior. It runs on the api

torn mantle Apr 9, 2025, 4:49 PM

#

lime coral If you want to try nightwhisper this might be the way to go https://x.com/testin...

smh

keen beacon Apr 9, 2025, 4:51 PM

#

for anyone who wants a link - https://studio.firebase.google.com/

Firebase Studio

Firebase Studio is an entirely web-based workspace for full-stack application development, complete with the latest generative AI from Gemini, and full-fidelity app previews, powered by cloud emulators.

alpine coral Apr 9, 2025, 4:51 PM

#

these events are always so cringe lol

keen beacon Apr 9, 2025, 4:51 PM

#

third edit is the charm 💀

balmy mist Apr 9, 2025, 4:52 PM

#

lime coral If you want to try nightwhisper this might be the way to go https://x.com/testin...

wait what?

#

what model is powering this?

#

nw?

#

or the tools being used with 2.5

#

makes it like nw

lime coral Apr 9, 2025, 4:52 PM

#

Obviously I don’t know

balmy mist Apr 9, 2025, 4:52 PM

#

or nw was just 2.5 with tools all along?

#

ohh lmaoo

keen beacon Apr 9, 2025, 4:53 PM

#

using it now

#

this thing is fast

#

it uses next.js

leaden palm Apr 9, 2025, 4:55 PM

#

ah yes, "war of the checkboxes" needs ai

#

also buggy

keen beacon Apr 9, 2025, 4:56 PM

#

yeah i've just got to that stage as well

#

it ran into an error, said it auto fixed it and it turns out it didn't

leaden palm Apr 9, 2025, 4:57 PM

#

agentspace is kinda cool though, personalized deep research and tool use (sending mail, analyzing data, generating audio overviews) plus chrome could be useful if i was an employee

keen beacon Apr 9, 2025, 4:57 PM

#

oh nevermind.. just needed to put in a key for it to let me click fix

balmy mist Apr 9, 2025, 5:01 PM

#

wow tbh its too much stuff being released lmaoo

#

like i cant keep up

#

i gotta test out this firebase studio tho

oblique flint Apr 9, 2025, 5:01 PM

#

bro release it already darn it

barren prairie Apr 9, 2025, 5:03 PM

#

This fire studio is for what?? 😅

lime coral Apr 9, 2025, 5:06 PM

#

Full stack app

keen beacon Apr 9, 2025, 5:06 PM

#

oblique flint bro release it already darn it

they're waiting to release 2.5 flash and gemini coder in one go trust me bro trust me 💔

barren prairie Apr 9, 2025, 5:08 PM

#

keen beacon they're waiting to release 2.5 flash and gemini coder in one go trust me bro tru...

When is the "soon" 🥲

spare mango Apr 9, 2025, 5:10 PM

#

leaden palm Apr 9, 2025, 5:10 PM

#

the voice agent demo is actually really cool

spare mango Apr 9, 2025, 5:10 PM

#

Is this for real? Do most people not have access to 2.5 Pro (experimental) on the free version of Gemini?

#

According to Gemini 2.5 Pro itself, only a select few people have access to this experimental model, and I'm one of them?

#

Is this making stuff up or do any of you guys not have access to this?

leaden palm Apr 9, 2025, 5:12 PM

#

spare mango Is this for real? Do most people not have access to 2.5 Pro (experimental) on th...

llms are llms and llms hallucinate

spare mango Apr 9, 2025, 5:12 PM

#

leaden palm llms are llms and llms hallucinate

so the "best" llm in the market is hallucinating about this simple fact, interesting...

#

I asked it if there's any difference between Gemini Free and Gemini Advanced, since I have access to 2.5 Pro (experimental) on the free version anyway, and it went off on a tangent about how this model does not exist, then went on to say how I'm part of a select cohort that has access to it.

leaden palm Apr 9, 2025, 5:13 PM

#

if you think that the hallucination problem will never be solved consider going to a prediction market

spare mango Apr 9, 2025, 5:14 PM

#

leaden palm if you think that the hallucination problem will never be solved consider going ...

Sorry, I don't know what you mean?

leaden palm Apr 9, 2025, 5:14 PM

#

gemini advanced still has advantages (first few i can think of are higher usage limits and more deep research)

keen beacon Apr 9, 2025, 5:14 PM

#

the biggest advantage for gemini advanced rn is 2.5 pro deep research

#

i am tempted to go for it because of that

#

otherwise i wouldn't care

spare mango Apr 9, 2025, 5:14 PM

#

keen beacon the biggest advantage for gemini advanced rn is 2.5 pro deep research

deep research is another model

#

apparently the non-deep research model is still the best in the market.

keen beacon Apr 9, 2025, 5:14 PM

#

?

leaden palm Apr 9, 2025, 5:14 PM

#

keen beacon the biggest advantage for gemini advanced rn is 2.5 pro deep research

is that only for advanced?

keen beacon Apr 9, 2025, 5:15 PM

#

yeah

spare mango Apr 9, 2025, 5:15 PM

#

keen beacon ?

what

keen beacon Apr 9, 2025, 5:15 PM

#

free deep research is on 2.0 flash thinking

keen beacon Apr 9, 2025, 5:15 PM

#

spare mango deep research is another model

it isn't

leaden palm Apr 9, 2025, 5:15 PM

#

i read this as a general rollout

#

but yeah no 2.5 pro is a good model for deep research purposes, at least when testing with my own harness

spare mango Apr 9, 2025, 5:15 PM

#

keen beacon it isn't

LMArena did not place 2.5 Pro Experimental as number 1 because of Deep Research is what I'm saying.

keen beacon Apr 9, 2025, 5:15 PM

#

leaden palm i read this as a general rollout

keen beacon Apr 9, 2025, 5:15 PM

#

spare mango LMArena did not place 2.5 Pro Experimental as number 1 because of Deep Research ...

i never said they did

spare mango Apr 9, 2025, 5:16 PM

#

keen beacon i never said they did

you said you wouldn't use it if it weren't for that so I thought you were implying it was.

keen beacon Apr 9, 2025, 5:16 PM

#

i said i wouldn't use it if it wasn't for 2.5 pro deep research because you can get 2.5 pro with no discernible rate limit for free on ai studio

leaden palm Apr 9, 2025, 5:18 PM

#

fun fact: if you trust google's deep research evals, their version would be 146 elo above openai's

keen beacon Apr 9, 2025, 5:18 PM

#

i'd like to see its performance on HLE

#

at least that way we'd get a more direct comparison

leaden palm Apr 9, 2025, 5:18 PM

#

leaden palm fun fact: if you trust google's deep research evals, their version would be 146 ...

(that's like the difference between llama nemotron 49b and gemini 2.5 pro exp)
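
For context on the Elo figure: under the standard Elo model, a rating gap maps to an expected head-to-head win rate through a logistic curve on a 400-point scale. The sketch below assumes that textbook formula; it is not taken from Google's eval write-up. It converts the 146-point gap into an implied preference rate and also does the inverse conversion.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected win rate of the higher-rated side under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


def elo_diff_from_win_rate(p: float) -> float:
    """Inverse of expected_score: the rating gap implied by a win rate p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    gap = 146
    p = expected_score(gap)
    print(f"{gap} Elo -> {p:.1%} expected win rate")
    print(f"{p:.3f} win rate -> {elo_diff_from_win_rate(p):.0f} Elo")
```

Under these assumptions, a 146-point gap corresponds to roughly a 70% head-to-head preference rate.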

keen beacon Apr 9, 2025, 5:19 PM

#

balmy mist Apr 9, 2025, 5:20 PM

#

keen beacon

nutss

keen beacon Apr 9, 2025, 5:21 PM

#

nemotron ultra is now out on nvidia build

#

it beats R1 in most benchmarks

#

https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1

NVIDIA NIM

llama-3.1-nemotron-ultra-253b-v1 Model by NVIDIA | NVIDIA NIM

Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.

#

balmy mist Apr 9, 2025, 5:22 PM

#

is firebase studio free?

#

if it is gg

keen beacon Apr 9, 2025, 5:22 PM

#

in my testing it was... meh

balmy mist Apr 9, 2025, 5:23 PM

#

keen beacon in my testing it was... meh

lol

#

the thing that is good about it is that it can handle large code

#

the other apps fail when i give it anything above 32k for some reason

#

but you are right it is kinda meh, but I would still say it's better than gemini 2.5 by itself

#

im cooking rn, give me a sec

keen beacon Apr 9, 2025, 5:25 PM

#

I don't even know if it's fully powered by 2.5 pro

#

some of it may be a flash model

balmy mist Apr 9, 2025, 5:25 PM

#

bruhh

#

it better not be, but if it is that is impressive

#

really shows how tools can boost up a model

#

nevermind

#

it keeps failing

keen beacon Apr 9, 2025, 5:28 PM

#

keen beacon

Isn't this fp8. The other ones don't have native support. there's still a big leap though

#

slightly off topic but

#

the more i test chatgpt 4o latest

#

(the march version)

#

the higher opinion i have of its creative writing

#

it feels like R1 quality (great) but it doesn't fall apart after more than a few chapters like R1 does

#

What about quasar

#

quasar disappointed me for writing tbh

#

step back from chatgpt 4o latest

#

more robotic

#

for the most part, i agree with this

#

although c3.7s would def be in my top 5 minimum

balmy mist Apr 9, 2025, 5:34 PM

#

wow so deepseek taking ws?

keen beacon Apr 9, 2025, 5:35 PM

#

yeah

#

i'm very excited for R2

sturdy mica Apr 9, 2025, 5:35 PM

#

what is quasar

keen beacon Apr 9, 2025, 5:35 PM

#

we should be getting that quite soon

sturdy mica Apr 9, 2025, 5:35 PM

#

model

keen beacon Apr 9, 2025, 5:35 PM

#

sturdy mica what is quasar

anonymous openai model on openrouter

sturdy mica Apr 9, 2025, 5:35 PM

#

oh

keen beacon Apr 9, 2025, 5:35 PM

#

its just a gpt 4o update

#

prob an api dated version

sturdy mica Apr 9, 2025, 5:35 PM

#

oh 🥀

#

💔

keen beacon Apr 9, 2025, 5:36 PM

#

they shouldve released chatgpt 4o the last one as an api dated version

sturdy mica Apr 9, 2025, 5:36 PM

#

https://tenor.com/view/case-oh-caseoh-waffle-house-waffle-house-gif-10934642274965704175

keen beacon Apr 9, 2025, 5:36 PM

#

chatgpt 4o and 4o on the api are too different for them to have released it as a 4o dated version

sturdy mica Apr 9, 2025, 5:36 PM

#

case oh

keen beacon Apr 9, 2025, 5:36 PM

#

4o is more professional/"serious", chatgpt 4o is optimised for chat

#

and ofc creative tasks

sturdy mica Apr 9, 2025, 5:36 PM

#

ok

#

did u guys see new secret model in battle mode

keen beacon Apr 9, 2025, 5:37 PM

#

although it did take them too long to release the current latest chatgpt 4o model as a dated version, for a while you could only access it via the -latest endpoint

sturdy mica Apr 9, 2025, 5:37 PM

#

its called noob pro hacker obby tycoon

keen beacon Apr 9, 2025, 5:37 PM

#

keen beacon although it did take them too long to release the current latest chatgpt 4o mode...

thats still the case tho?

#

iirc

sturdy mica Apr 9, 2025, 5:37 PM

#

it is o6 mini

keen beacon Apr 9, 2025, 5:37 PM

#

only lmsys has access to older versions

keen beacon Apr 9, 2025, 5:38 PM

#

keen beacon only lmsys has access to older versions

oh this is news to me lmao

#

yeah nevermind

balmy mist Apr 9, 2025, 5:40 PM

#

wait the event over?

#

so where is flash?

keen beacon Apr 9, 2025, 5:40 PM

#

announcement by logan later maybe?

barren prairie Apr 9, 2025, 5:41 PM

#

balmy mist so where is flash?

"sooner" 🙂

woeful geyser Apr 9, 2025, 5:44 PM

#

The entire venom system prompt, summarized by Claude:

Screenshot_2025-04-10-00-43-51-816_com.android.chrome.jpg

keen beacon Apr 9, 2025, 5:45 PM

#

keen beacon the higher opinion i have of its creative writing

lol this is what i mean

sturdy mica Apr 9, 2025, 5:45 PM

#

woeful geyser The entire `venom` system prompt, summarized by Claude:

what is venom

keen beacon Apr 9, 2025, 5:46 PM

#

lmfao

sturdy mica Apr 9, 2025, 5:49 PM

#

keen beacon lmfao

what is this

#

also what is venom

#

is that a prompt

keen beacon Apr 9, 2025, 5:49 PM

#

sturdy mica Apr 9, 2025, 5:49 PM

#

ok

keen beacon Apr 9, 2025, 5:49 PM

#

sturdy mica also what is venom

it was an anonymous meta model on the arena with a long system prompt

sturdy mica Apr 9, 2025, 5:50 PM

#

oh ok…………

#

im gonna play roblox now

#

https://tenor.com/view/minecraft-minecraft-movie-a-minecraft-movie-steve-jack-black-gif-17997916878979753001

#

actually team fortress 2

wintry tinsel Apr 9, 2025, 5:51 PM

#

keen beacon for the most part, i agree with this

DeepSeek isn’t that good for creative writing; it introduces random non sequiturs and makes everything overly verbose and dramatic

keen beacon Apr 9, 2025, 5:51 PM

#

yeah that was the reason i said mostly

#

sonnet 3.7 is king

leaden palm Apr 9, 2025, 5:52 PM

#

"optimus alpha"

ocean vortex Apr 9, 2025, 5:52 PM

#

keen beacon sonnet 3.7 is king

lol what

keen beacon Apr 9, 2025, 5:52 PM

#

ocean vortex lol what

...for creative writing

#

in my experience

balmy mist Apr 9, 2025, 5:52 PM

#

leaden palm "optimus alpha"

really?

ocean vortex Apr 9, 2025, 5:52 PM

#

oh. Ok that's less ridiculous lol

balmy mist Apr 9, 2025, 5:52 PM

#

i cant take any more trolling

wintry tinsel Apr 9, 2025, 5:52 PM

#

It is across the board but in some areas with the right prompt 2.5 pro can be better

leaden palm Apr 9, 2025, 5:52 PM

#

balmy mist really?

soon™️

balmy mist Apr 9, 2025, 5:52 PM

#

i just want my damn nightwhisper

#

man

brittle tiger Apr 9, 2025, 5:53 PM

#

Could this be nightwhisper?

https://x.com/_davideast/status/1909984439985229940?t=ddJfOgwZywH0inE1AxyWyg&s=19

David East (@_davideast) on X

Hey Firebase. I'm back.

https://t.co/e8XYdNhGxk

balmy mist Apr 9, 2025, 5:53 PM

#

forget everything else

wintry tinsel Apr 9, 2025, 5:53 PM

#

2.5 pro is the best at nsfw writing lmao

balmy mist Apr 9, 2025, 5:53 PM

#

brittle tiger Could this be nightwhisper? https://x.com/_davideast/status/1909984439985229940...

noooo

#

its bunns

sage raptor Apr 9, 2025, 5:53 PM

#

lime coral If you want to try nightwhisper this might be the way to go https://x.com/testin...

nightwhisper there ?

balmy mist Apr 9, 2025, 5:53 PM

#

jk

#

its solid

#

just not nightwhisper

balmy mist Apr 9, 2025, 5:53 PM

#

sage raptor nightwhisper there ?

2.5 pro maybe with tool calls

keen beacon Apr 9, 2025, 5:53 PM

#

woah

#

new model

balmy mist Apr 9, 2025, 5:53 PM

#

keen beacon new model

really

#

give me plz

ocean vortex Apr 9, 2025, 5:53 PM

#

is it better than opus for that in your experience? @keen beacon

keen beacon Apr 9, 2025, 5:54 PM

#

leaden palm "optimus alpha"

let me test this

keen beacon Apr 9, 2025, 5:54 PM

#

ocean vortex is it better than opus for that in your experience? @keen beacon

yeah once u get it setup. multi turn, context usage, all their training tricks, etc., are just awesome and result in a really good experience imho

balmy mist Apr 9, 2025, 5:54 PM

#

keen beacon let me test this

it says coming soon

#

so not out yet

keen beacon Apr 9, 2025, 5:54 PM

#

oh

#

hmph

#

should be up by the end of today then

#

must be another oai model

balmy mist Apr 9, 2025, 5:55 PM

#

nooooo

#

it better be o3

upper wolf Apr 9, 2025, 5:55 PM

#

2.5 pro still throws so many refusals. It won’t do anything related to web scraping (puppeteer/selenium etc.). on top of that, it can’t even say that it won’t, it says that it’s “not sufficiently trained” on it

balmy mist Apr 9, 2025, 5:55 PM

#

keen beacon oh

wait you already tested o3 right and it was mid?

keen beacon Apr 9, 2025, 5:56 PM

#

i have tested o3 medium

#

it was pretty good

#

but there are some things it performs meh on

#

web development is still a weak point

#

but significantly better than o1/o3 mini

barren prairie Apr 9, 2025, 5:56 PM

#

keen beacon i have tested o3 medium

O3 is out? But when ???

keen beacon Apr 9, 2025, 5:56 PM

#

it's not out

#

i help labs out sometimes

wintry tinsel Apr 9, 2025, 5:56 PM

#

Google event goes through 11th right

balmy mist Apr 9, 2025, 5:56 PM

#

barren prairie O3 is out? But when ???

only for chads

balmy mist Apr 9, 2025, 5:57 PM

#

keen beacon but significantly better than o1/o3 mini

okay i guess thats okay

#

ill just stick to 2.5

wintry tinsel Apr 9, 2025, 5:58 PM

#

ocean vortex is it better than opus for that in your experience? @keen beacon

Opus has less flavor and prompt adherence but more personality and a “natural feel to it”

#

There’s no reason to use it though since it’s much more pricy

keen beacon Apr 9, 2025, 5:58 PM

#

i really like opus even now when price isn't a consideration

#

just has good vibes

sturdy mica Apr 9, 2025, 5:59 PM

#

what was that super good coder on lmarena again

#

it was nightfall or something

keen beacon Apr 9, 2025, 5:59 PM

#

nightwhisperer

#

or nightwhisper

#

or whatever it was

#

it was webdev arena only

sturdy mica Apr 9, 2025, 6:00 PM

#

yeah

#

when is that gonna come out sigh

#

anyway what are you all talking about

wintry tinsel Apr 9, 2025, 6:00 PM

#

The reason google models are so cheap is they are trying to roll them out en masse, if they went the other route and had less instances for higher compute and cost they might be able to have the model give much better outputs relatively, I could not know what I’m talking about tho

leaden palm Apr 9, 2025, 6:01 PM

#

wintry tinsel The reason google models are so cheap is they are trying to roll them out en mas...

nah google is now sota

sturdy mica Apr 9, 2025, 6:01 PM

#

why can i not

#

react to things

#

oh

#

there

#

i cant react to ktibow

keen beacon Apr 9, 2025, 6:01 PM

#

hmm sorta weird 2.5 flash isnt released yet, maybe they might do an anon model on openrouter 🤣

barren prairie Apr 9, 2025, 6:02 PM

#

sturdy mica i cant react to ktibow

I can react 🙂

sturdy mica Apr 9, 2025, 6:02 PM

#

lucky you

#

https://tenor.com/view/minecraft-minecraft-movie-minecraft-meme-minecraft-villager-villager-gif-5342559956921446299

#

im going to hit you with my car

#

that villager is YOU

#

this is mildly uninteresting and nobody is talking now

#

https://tenor.com/view/half-life-gman-g-man-half-life-2-rise-and-shine-gif-5438670790719714756

keen beacon Apr 9, 2025, 6:06 PM

#

peak shitposter

torn mantle Apr 9, 2025, 6:09 PM

#

brittle tiger Could this be nightwhisper? https://x.com/_davideast/status/1909984439985229940...

can someone confirm

torn mantle Apr 9, 2025, 6:09 PM

#

balmy mist just not nightwhisper

i see

#

thanks

wintry tinsel Apr 9, 2025, 6:10 PM

#

They should add auto regressive image capabilities to 2.5 pro

sturdy mica Apr 9, 2025, 6:12 PM

#

keen beacon peak shitposter

thanks

sturdy mica Apr 9, 2025, 6:13 PM

#

wintry tinsel They should add auto regressive image capabilities to 2.5 pro

what’s that

keen beacon Apr 9, 2025, 6:13 PM

#

native image gen

keen beacon Apr 9, 2025, 6:14 PM

#

wintry tinsel They should add auto regressive image capabilities to 2.5 pro

it already has them

#

they just haven't released it yet

#

actually no

#

i may be getting mixed up with 2.0 pro

keen beacon Apr 9, 2025, 6:14 PM

#

keen beacon i may be getting mixed up with 2.0 pro

2.5 pro is highly likely to be a cpt of 2.0 pro it should still have them

#

im curious whether they worked on it at all with 2.5 pro

sturdy mica Apr 9, 2025, 6:15 PM

#

ok nerd

brittle tiger Apr 9, 2025, 6:16 PM

#

torn mantle can someone confirm

I don't think it js

sturdy mica Apr 9, 2025, 6:20 PM

#

whats the best model rn

#

its 2.5 pro right

#

ai studio version / api version

#

torn mantle Apr 9, 2025, 6:24 PM

#

https://x.com/Kimi_Moonshot/status/1910035354570371082

Kimi.ai (@Kimi_Moonshot) on X

🚀 Meet Kimi-VL and Kimi-VL-Thinking! 🌟 Our latest open source lightweight yet powerful Vision-Language Model with reasoning capability.

✨ Key Highlights:
💡 An MoE VLM and an MoE Reasoning VLM with only ~3B activated parameters
🧠 Strong multimodal reasoning (36.8% on

keen fulcrum Apr 9, 2025, 7:09 PM

#

There is still imagefx which gives you exceptional results

balmy mist Apr 9, 2025, 7:20 PM

#

as i play with firebase some more i see what it truly is

#

it's pretty much a competitor for cursor and every other ide and even claude code tbh

#

so it's basically gemini code

#

just not in the cli

#

and this agentspace is nuts

#

wow

olive mesa Apr 9, 2025, 7:23 PM

#

are there any models better than 2.5 yet

balmy mist Apr 9, 2025, 7:23 PM

#

gg open ai

balmy mist Apr 9, 2025, 7:23 PM

#

olive mesa are there any models better than 2.5 yet

there really is no need for any models better than 2.5

wintry locust Apr 9, 2025, 7:23 PM

#

skibidi toilet rizz

olive mesa Apr 9, 2025, 7:23 PM

#

balmy mist there really is no need for any models better than 2.5

there kinda is

balmy mist Apr 9, 2025, 7:23 PM

#

i would like it

olive mesa Apr 9, 2025, 7:23 PM

#

usually theres a new best every week to a month

balmy mist Apr 9, 2025, 7:23 PM

#

but its not necessary for what we need

#

i would say maybe faster

#

and larger output and maybe window

keen beacon Apr 9, 2025, 7:23 PM

#

2.5 flash soon 🤔

balmy mist Apr 9, 2025, 7:24 PM

#

but thats pretty much it

keen beacon Apr 9, 2025, 7:24 PM

#

is faster

balmy mist Apr 9, 2025, 7:24 PM

#

IQ wise no

#

it would be nice tho

olive mesa Apr 9, 2025, 7:24 PM

#

it would be nice to have a passive superintelligence

thorny drum Apr 9, 2025, 7:24 PM

#

balmy mist there really is no need for any models better than 2.5

i could use some agi rn

balmy mist Apr 9, 2025, 7:24 PM

#

i mean but do we need that

olive mesa Apr 9, 2025, 7:24 PM

#

not the allied mastercomputer type

#

yeah

balmy mist Apr 9, 2025, 7:24 PM

#

agi is subjective

#

some people say its already here

#

some say years away

#

some say this year

#

depends on your definition at this point

#

remember when it was to pass the turing test lol

olive mesa Apr 9, 2025, 7:26 PM

#

yeah

balmy mist Apr 9, 2025, 7:26 PM

#

but this agentspace is very interesting

#

this is gonna go wild once people get on it, both it and firebase studio

olive mesa Apr 9, 2025, 7:27 PM

#

are most ais trained with curriculum learning?

#

i honestly feel like we would have asi by now if so

#

like just giving them near-impossible questions and every once in a while they produce an extremely good cot and response

#

then add that to a dataset

#

train

#

and repeat

keen beacon Apr 9, 2025, 7:28 PM

#

that is sorta being done rn
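
The loop olive mesa describes (sample many attempts on hard problems, keep only the verified good ones, fine-tune on them, repeat) resembles what is usually called rejection-sampling fine-tuning or expert iteration. Below is a toy simulation, not a real training pipeline: it assumes a made-up one-number "skill" model, a perfect verifier, and a hypothetical update rule. `solve_probability`, `skill`, and the `lr * sum(dataset)` step are illustrative stand-ins for sampling from and training a real LLM.

```python
import math
import random


def solve_probability(skill: float, difficulty: float) -> float:
    """Toy model (hypothetical): chance that one sampled attempt solves a task."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def expert_iteration(tasks, rounds=5, samples_per_task=8, lr=0.05, seed=0):
    """Sample attempts, keep verified successes, 'train' on them, repeat.

    Returns the toy skill value after each round.
    """
    rng = random.Random(seed)
    skill = 0.0
    history = []
    for _ in range(rounds):
        dataset = []
        for difficulty in tasks:
            for _ in range(samples_per_task):
                # Assumption: a perfect verifier tells us exactly which attempts succeed.
                if rng.random() < solve_probability(skill, difficulty):
                    dataset.append(difficulty)
                    break  # keep one good trace per task
        # Hypothetical training step: solving harder tasks raises skill more.
        skill += lr * sum(dataset)
        history.append(skill)
    return history


if __name__ == "__main__":
    hard_tasks = [1.0, 2.0, 3.0, 4.0, 5.0]
    for i, s in enumerate(expert_iteration(hard_tasks), start=1):
        print(f"round {i}: skill = {s:.2f}")
```

The key design point is the filter: only verified successes enter the dataset. A real pipeline's gains depend on the verifier's quality, and this toy ignores that by assuming the verifier is perfect.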

barren prairie Apr 9, 2025, 7:29 PM

#

balmy mist there really is no need for any models better than 2.5

I need deepSeek r2 😁

balmy mist Apr 9, 2025, 7:30 PM

#

yo firebase is free

#

wtf

#

why the hell am i using roo anymore lol

balmy mist Apr 9, 2025, 7:30 PM

#

barren prairie I need deepSeek r2 😁

yeah i wouldnt mind that

#

how is it free?

#

like am i missing something?

#

gg cline, roocode, augment, cursor, windsurf RIP

#

bolt

#

claude code lol, but i still like that its a CLI so claude still got a lil hope

#

no its using 2.5

#

why the hell would i use my own api lmaooo

#

actually nevermind i know what to do

#

gonna use my own free exp model, but it's pretty much like roocode in terms of how you pay

#

oh i see what you mean

#

so by default the built in is 2.0?

#

gonna try it with 2.5

hidden widget Apr 9, 2025, 8:03 PM

#

torn mantle https://x.com/Kimi_Moonshot/status/1910035354570371082

i can see another clickbait video from IcodeKing here

oblique flint Apr 9, 2025, 8:03 PM

#

2.5 flash still not released?

keen beacon Apr 9, 2025, 8:04 PM

#

a little strange right?

balmy mist Apr 9, 2025, 8:04 PM

#

nahh firebase is expensive

#

going back to free version lol

keen beacon Apr 9, 2025, 8:04 PM

#

keen beacon a little strange right?

maybe 2.5 flash doesnt have enough votes in the arena

oblique flint Apr 9, 2025, 8:07 PM

#

disappointing performance and they want to keep cooking or smth else?

keen beacon Apr 9, 2025, 8:12 PM

#

oblique flint disappointing performance and they want to keep cooking or smth else?

if its stargazer i dont think its disappointing

oblique flint Apr 9, 2025, 8:15 PM

#

Im just confused cause the leaked model string in the python sdk said april 9th right?

keen beacon Apr 9, 2025, 8:20 PM

#

oblique flint Im just confused cause the leaked model string in the python sdk said april 9th ...

yes

balmy mist Apr 9, 2025, 8:36 PM

#

nice

#

you gonna use that in firebase?

#

also whats a good prompt for refactoring an app i have to look better visually? should I say apple design expert and stuff?

keen fulcrum Apr 9, 2025, 9:31 PM

#

Did they announce nightwhisper?

storm bolt Apr 9, 2025, 9:32 PM

#

brittle tiger Apr 9, 2025, 9:46 PM

#

keen fulcrum Did they announce nightwhisper?

No

wet ingot Apr 9, 2025, 10:28 PM

#

Anyone know what the model “dragontail” is? I got it on lmarena and it was good but I can’t find anything about it

balmy mist Apr 9, 2025, 10:34 PM

#

how good is it?

vivid oyster Apr 9, 2025, 10:48 PM

#

wet ingot Anyone know what the model “dragontail” is? I got it on lmarena and it was good ...

Yeah

#

How good

#

Ima try it rn

torn mantle Apr 9, 2025, 10:49 PM

#

#

why would anyone prefer grok 3 over gemini 2.5 pro?

vivid oyster Apr 9, 2025, 10:49 PM

#

Cuz elon musk is hot

torn mantle Apr 9, 2025, 10:49 PM

#

torn mantle Apr 9, 2025, 10:49 PM

#

vivid oyster Cuz elon musk is hot

did he do it with you?

wet ingot Apr 9, 2025, 10:49 PM

#

balmy mist how good is it?

Idk but it had a better answer to my logic problem than most other models and it was good with an image

torn mantle Apr 9, 2025, 10:50 PM

#

xd

vivid oyster Apr 9, 2025, 10:50 PM

#

torn mantle did he do it with you?

Yes

torn mantle Apr 9, 2025, 10:50 PM

#

vivid oyster Yes

it felt good?

vivid oyster Apr 9, 2025, 10:50 PM

#

Yes

vivid oyster Apr 9, 2025, 10:50 PM

#

wet ingot Idk but it had a better answer to my logic problem than most other models and it...

What model did it say it was

#

Did u ask it

wet ingot Apr 9, 2025, 10:50 PM

#

No

#

Let me see if I can get it again

vivid oyster Apr 9, 2025, 10:50 PM

#

Im trying rn

#

Its google @wet ingot

wet ingot Apr 9, 2025, 10:51 PM

#

I got it

vivid oyster Apr 9, 2025, 10:51 PM

#

wet ingot Apr 9, 2025, 10:51 PM

#

Yeah lmao I just got the exact same thing

vivid oyster Apr 9, 2025, 10:51 PM

#

Only google says 'i am a large language model trained by google'

#

Maybe its nightwhisper

#

Was it better than 2.5 pro

wet ingot Apr 9, 2025, 10:51 PM

#

Hard to say

keen ferry Apr 9, 2025, 10:52 PM

#

Screenshot_2025-04-10-01-52-28-975-edit_com.android.chrome.jpg

vivid oyster Apr 9, 2025, 10:52 PM

#

Whos

vivid oyster Apr 9, 2025, 10:52 PM

#

keen ferry

Did u ask wat middle it was

#

Model

torn mantle Apr 9, 2025, 10:52 PM

#

vivid oyster

Tf

#

Tf

brittle tiger Apr 9, 2025, 10:53 PM

#

Shadebrook passes vibe test. dragon tail does not for me

torn mantle Apr 9, 2025, 10:53 PM

#

2 new models?

vivid oyster Apr 9, 2025, 10:53 PM

#

Yeah

brittle tiger Apr 9, 2025, 10:53 PM

#

Shadebrook is Google

vivid oyster Apr 9, 2025, 10:53 PM

#

Dragontail is google too

keen ferry Apr 9, 2025, 10:53 PM

#

vivid oyster Did u ask wat middle it was

google

wet ingot Apr 9, 2025, 10:53 PM

#

Interesting Google is adding so many models

#

Without saying anything

vivid oyster Apr 9, 2025, 10:54 PM

#

Maybe they're experimenting

#

For 2.5 flash

wet ingot Apr 9, 2025, 10:54 PM

#

Yeah

vivid oyster Apr 9, 2025, 10:54 PM

#

One of them might be nightwhisper

torn mantle Apr 9, 2025, 10:54 PM

#

vivid oyster Dragontail is google too

pro? flash?

vivid oyster Apr 9, 2025, 10:54 PM

#

torn mantle pro? flash?

Idk

#

Im trying to get it

#

So I can ask it

#

Questions

#

But it said the same prompt all google model says

torn mantle Apr 9, 2025, 10:54 PM

#

you are trying to debunk it

#

silly you

wet ingot Apr 9, 2025, 10:55 PM

#

We don’t know what it is but it’s good with logic and images

vivid oyster Apr 9, 2025, 10:55 PM

#

Yes

torn mantle Apr 9, 2025, 10:55 PM

#

shadebrook is not good

#

probably like flash lite

upper wolf Apr 9, 2025, 10:55 PM

#

Is it thinking

torn mantle Apr 9, 2025, 10:55 PM

#

so fast too

#

no

keen beacon Apr 9, 2025, 10:55 PM

#

vivid oyster Cuz elon musk is hot

eugh

keen beacon Apr 9, 2025, 10:55 PM

#

brittle tiger Shadebrook passes vibe test. dragon tail does not for me

ooh

#

let me test this stuff

wet ingot Apr 9, 2025, 10:55 PM

#

Dragontail was good for me

torn mantle Apr 9, 2025, 10:56 PM

#

its probably flash without thinking

#

its blazing fast

brittle tiger Apr 9, 2025, 10:56 PM

#

keen beacon ooh

Actually idk. It just got an arc-agi problem wrong that it got right this morning. It's very fast tho

keen beacon Apr 9, 2025, 10:57 PM

#

hmm

torn mantle Apr 9, 2025, 10:57 PM

#

please let dragontail be good

#

pleaaaaaase

keen ferry Apr 9, 2025, 10:58 PM

#

can someone give me basic questions

#

i got him

#

dragontail

torn mantle Apr 9, 2025, 10:58 PM

#

keen ferry can someone give me basic questions

how many r's are in strawberry?

torn mantle Apr 9, 2025, 10:58 PM

#

keen ferry dragontail

make a discord clone in one html file.

vivid oyster Apr 9, 2025, 10:58 PM

#

keen ferry can someone give me basic questions

How tf do u know

keen ferry Apr 9, 2025, 10:58 PM

#

magic

vivid oyster Apr 9, 2025, 10:58 PM

#

That u got dragontail

torn mantle Apr 9, 2025, 10:58 PM

#

shhhhhhh

upper wolf Apr 9, 2025, 10:58 PM

#

You’re playing Roulette at a casino with a broken wheel that makes it 0.36% more likely to land on Green. What is the new expected value of a $100 bet on the color red?
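
The roulette prompt is worth working out, because the answer depends on assumptions the question leaves open. This is a minimal sketch assuming a single-zero European wheel (18 red, 18 black, 1 green), a reading of "0.36% more likely" as 0.36 percentage points added to green and taken evenly from red and black, and a standard 1:1 payout on red. Exact fractions avoid rounding noise.

```python
from fractions import Fraction


def ev_red_bet(stake, p_red):
    """Expected profit of an even-money (1:1) bet on red."""
    return stake * p_red - stake * (1 - p_red)


# Assumption: European single-zero wheel, so 18 of 37 pockets are red.
P_RED_FAIR = Fraction(18, 37)
# Assumption: green gains 0.36 percentage points, taken evenly from red and black.
SHIFT = Fraction(36, 10000)
p_red_broken = P_RED_FAIR - SHIFT / 2

fair = ev_red_bet(100, P_RED_FAIR)
broken = ev_red_bet(100, p_red_broken)
print(f"fair wheel:   {float(fair):.4f}")
print(f"broken wheel: {float(broken):.4f}")
```

Under these assumptions, the expected profit on a $100 red bet drops from about -$2.70 to about -$3.06. A double-zero American wheel, or a relative rather than absolute 0.36% increase, would give different figures.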

keen ferry Apr 9, 2025, 10:58 PM

#

its soo good

#

There are three 'r's in the word "strawberry".

torn mantle Apr 9, 2025, 10:59 PM

#

ask him this

wet ingot Apr 9, 2025, 10:59 PM

#

I mean a lot of newer models know that

torn mantle Apr 9, 2025, 10:59 PM

#

keen ferry There are **three** 'r's in the word "strawberry".

make a discord clone in one html file.

keen ferry Apr 9, 2025, 10:59 PM

#

k

torn mantle Apr 9, 2025, 10:59 PM

#

and give me the html file

keen ferry Apr 9, 2025, 10:59 PM

#

sec

torn mantle Apr 9, 2025, 10:59 PM

#

and i will tell if its good or not

vivid oyster Apr 9, 2025, 10:59 PM

#

#

This bot is a dumbass

keen beacon Apr 9, 2025, 10:59 PM

#

oh dear

olive mesa Apr 9, 2025, 10:59 PM

#

vivid oyster

why do models get that specifically wrong

#

people probably memed it then it got into a dataset

torn mantle Apr 9, 2025, 11:00 PM

#

dreamtides is meh too

torn mantle Apr 9, 2025, 11:00 PM

#

olive mesa why do models get that specifically wrong

because its a tokenization issue

vivid oyster Apr 9, 2025, 11:00 PM

#

I think I got it

#

It told me there's three

keen beacon Apr 9, 2025, 11:00 PM

#

gottem

zinc ore Apr 9, 2025, 11:00 PM

#

vivid oyster This bot is a dumbass

Don't do R questions, they don't see letters like us
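
On the tokenization point: counting characters is trivial in code, but a model receives subword tokens rather than individual letters. This is a minimal sketch in which the split `["str", "aw", "berry"]` and the small `vocab` are hypothetical examples, not the output of any particular model's tokenizer.

```python
word = "strawberry"

# Character-level view: what a program (or a person) counts.
print(word.count("r"))  # 3

# Hypothetical subword split; real tokenizers vary by model.
tokens = ["str", "aw", "berry"]
assert "".join(tokens) == word

# A model sees token IDs, not letters. Assigning arbitrary IDs shows
# that the letter 'r' is no longer directly visible in the input.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]
print(ids)  # [2, 0, 1]

# The right answer only emerges if the model has learned how each token
# is spelled and then adds up the per-token counts.
print({t: t.count("r") for t in tokens})
```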

keen beacon Apr 9, 2025, 11:00 PM

#

it's taking a while to start streaming

torn mantle Apr 9, 2025, 11:00 PM

#

you got his ash

keen beacon Apr 9, 2025, 11:00 PM

#

okay it just started

vivid oyster Apr 9, 2025, 11:00 PM

#

I made him give me a discord html

torn mantle Apr 9, 2025, 11:01 PM

#

vivid oyster I made him give me a discord html

gimme gimme

keen beacon Apr 9, 2025, 11:01 PM

#

im also giving dragontail a web task

vivid oyster Apr 9, 2025, 11:01 PM

#

📎 message.txt

#

This is what it gave me

torn mantle Apr 9, 2025, 11:01 PM

#

vivid oyster

worse than gemini 2.5 pro thinking

torn mantle Apr 9, 2025, 11:01 PM

#

vivid oyster

whats this

#

are you sure its dragontail?

upper wolf Apr 9, 2025, 11:01 PM

#

Guys

vivid oyster Apr 9, 2025, 11:02 PM

#

Dragontail

torn mantle Apr 9, 2025, 11:02 PM

#

vivid oyster

its shadebrook?

upper wolf Apr 9, 2025, 11:02 PM

#

is dragontail thinking

vivid oyster Apr 9, 2025, 11:02 PM

#

keen beacon Apr 9, 2025, 11:02 PM

#

upper wolf is dragontail thinking

yes

vivid oyster Apr 9, 2025, 11:02 PM

#

upper wolf is dragontail thinking

Yeah

#

It's flash thinking

keen ferry Apr 9, 2025, 11:02 PM

#

mine is still generating

vivid oyster Apr 9, 2025, 11:02 PM

#

Probably

keen beacon Apr 9, 2025, 11:02 PM

#

this seems to be flash

upper wolf Apr 9, 2025, 11:02 PM

#

Thanks

torn mantle Apr 9, 2025, 11:02 PM

#

its meh

keen beacon Apr 9, 2025, 11:02 PM

#

yes

vivid oyster Apr 9, 2025, 11:02 PM

#

It takes a while to start

#

Sending

keen ferry Apr 9, 2025, 11:03 PM

#

📎 ah.html

torn mantle Apr 9, 2025, 11:03 PM

#

keen ferry

wtf

vivid oyster Apr 9, 2025, 11:03 PM

#

keen ferry

Is this dragontail

torn mantle Apr 9, 2025, 11:03 PM

#

wait a minute

#

this is so good

keen ferry Apr 9, 2025, 11:03 PM

#

yeah

torn mantle Apr 9, 2025, 11:03 PM

#

which model is that

keen ferry Apr 9, 2025, 11:03 PM

#

dragontail

torn mantle Apr 9, 2025, 11:03 PM

#

just from that output

keen beacon Apr 9, 2025, 11:04 PM

#

yeah it's at least on par with 2.5 pro thinking in my very limited testing thus far

#

which i wouldn't expect from a small model

torn mantle Apr 9, 2025, 11:05 PM

#

i got it

#

i will try it

#

doesnt seem like a thinking model

#

so fast too

keen ferry Apr 9, 2025, 11:05 PM

#

it was slow for me

zinc ore Apr 9, 2025, 11:06 PM

#

Google's models from now on are hybrid models, so if it's something like flash it'll be both

keen beacon Apr 9, 2025, 11:06 PM

#

torn mantle doesnt seem like a thinking model

it is

#

it seems dynamic

#

i have had a request with practically 0 time and another with like 15s

#

seems pretty good

zinc ore Apr 9, 2025, 11:10 PM

#

Faster than 2.5 pro?

keen beacon Apr 9, 2025, 11:10 PM

#

yes

torn mantle Apr 9, 2025, 11:10 PM

#

its kinda the same as gemini 2.5 pro

#

a bit worse no?

keen beacon Apr 9, 2025, 11:10 PM

#

no

#

they're very similar, i can't really discern much of a performance difference as of yet

#

which is sorta surprising given this seems to spend half as much time on it and yet matches pro

zinc ore Apr 9, 2025, 11:11 PM

#

Could also be updated pro if not flash

#

But my bet is on flash

vivid oyster Apr 9, 2025, 11:12 PM

#

It's faster

#

Its probly just flash thinking

zinc ore Apr 9, 2025, 11:12 PM

#

Flash 2.5

#

They don't call it thinking anymore

keen beacon Apr 9, 2025, 11:13 PM

#

zinc ore But my bet is on flash

i would be kinda surprised

#

in my (again somewhat limited) testing it doesn't seem worse than pro

drifting thorn Apr 9, 2025, 11:13 PM

#

All 2.5 models are thinking models now

keen beacon Apr 9, 2025, 11:13 PM

#

and if this is flash

#

wtf are the other like 3 anon google models on the arena rn

#

it doesn't make sense to add flash as an anonymous model today when they're releasing it like tomorrow 😭

zinc ore Apr 9, 2025, 11:14 PM

#

Yeah depends if we getting flash this week, or more like next week or something

drifting thorn Apr 9, 2025, 11:15 PM

#

I guess they are Gemma 4?

torn mantle Apr 9, 2025, 11:15 PM

#

drifting thorn I guess they are Gemma 4?

its you

#

you are giving us all these outputs

drifting thorn Apr 9, 2025, 11:15 PM

#

Hhhhhhj

vivid oyster Apr 9, 2025, 11:31 PM

#

.;

brittle tiger Apr 9, 2025, 11:34 PM

#

keen beacon it seems dynamic

This seems plausible bc it nailed the arc-agi problem this morning after running like 4 hypotheses to solve and combining relevant ones and then this afternoon zipped out a totally wrong answer on same problem

keen beacon Apr 9, 2025, 11:37 PM

#

ffs 🤦‍♂️

#

the full switch to the new arena couldn't come soon enough

harsh flume Apr 9, 2025, 11:57 PM

#

Did google remove the option to data train from AI studio or am I just not finding it?

#

I was gonna test it out today 😭

#

We should get new LB update today btw

keen fulcrum Apr 10, 2025, 12:29 AM

#

https://www.bleepingcomputer.com/news/google/google-takes-on-cursor-with-firebase-studio-its-ai-builder-for-vibe-coding/

BleepingComputer

Google takes on Cursor with Firebase Studio, its AI builder for vib...

Google has quietly launched Firebase Studio, which is a cloud-based AI-powered integrated development environment that lets you build full-fledged apps using prompts.

#

Microsoft has copilot and vscode (roocode is better), Google has Firebase Studio and then there are third party ones

#

I wonder if Apple will do something about it

#

Xcode with AI?

#

Firebase Studio and Nightwhisper can be the cheapest option honestly.

alpine coral Apr 10, 2025, 1:01 AM

#

keen beacon i have had a request with practically 0 time and another with like 15s

were they the same prompts? perhaps it is dynamic if they were the same (though if they were different, esp in length and / complexity) that might better explain the discrepancy

#

i just got it. v strong indeed. would've said almost certainly thinking model based on the quality and time(/delay) of the response (+ it was against command-a, which isn't thinking afaik)

raven void Apr 10, 2025, 1:10 AM

#

grok 3 mini high is looking pretty good

balmy mist Apr 10, 2025, 1:22 AM

#

raven void grok 3 mini high is looking pretty good

thats new?

raven void Apr 10, 2025, 1:23 AM

#

yes, API

#

https://docs.x.ai/docs/models#models-and-pricing

Models and Pricing | xAI Docs

Grok model descriptions and pricing

balmy mist Apr 10, 2025, 1:30 AM

#

oh snapp

#

finally

keen beacon Apr 10, 2025, 1:32 AM

#

pretty bad pricing lmao

balmy mist Apr 10, 2025, 1:33 AM

#

lmaoo yeah its like sonnet right?

raven void Apr 10, 2025, 1:39 AM

#

mini looks like a pareto frontier model though

brittle tiger Apr 10, 2025, 1:41 AM

#

keen beacon pretty bad pricing lmao

For being huge and propped up on infra faster than even seemed possible it's not that surprising tho

ivory schooner Apr 10, 2025, 2:08 AM

#

May the upcoming Behemoth be 24k......

sturdy mica Apr 10, 2025, 2:30 AM

#

ivory schooner May the upcoming Behemoth be 24k......

llama 4 behemoth?

#

what

balmy mist Apr 10, 2025, 2:38 AM

#

ivory schooner May the upcoming Behemoth be 24k......

its coming out?

vivid oyster Apr 10, 2025, 2:44 AM

#

balmy mist its coming out?

No

#

He's saying he wishes it's 24k

#

24k karat gold

#

When it comes out

balmy mist Apr 10, 2025, 3:01 AM

#

oh lol okay

upper wolf Apr 10, 2025, 3:21 AM

#

It might’ve just been maverick with a modified system prompt it didnt seem that accurate

vivid oyster Apr 10, 2025, 3:30 AM

#

upper wolf It might’ve just been maverick with a modified system prompt it didnt seem that ...

It is

#

Behemoth is prob anonymous-test

#

It's shet

hardy pecan Apr 10, 2025, 5:16 AM

#

https://x.com/bindureddy/status/1910185483545776630

Bindu Reddy (@bindureddy) on X

Qwen 3 is coming in hours!

Good chance of topping the open source leaderboard

hardy pecan Apr 10, 2025, 6:16 AM

#

Tested Grok 3 Beta in OpenRouter for the 20 public SimpleBench questions, it got 6/20

torn mantle Apr 10, 2025, 6:17 AM

#

hardy pecan Tested Grok 3 Beta in OpenRouter for the 20 public SimpleBench questions, it got...

How much did gemini pro get

oblique flint Apr 10, 2025, 6:18 AM

#

still no 2.5 flash

#

🥲

hardy pecan Apr 10, 2025, 6:27 AM

#

torn mantle How much did gemini pro get

9/20

torn mantle Apr 10, 2025, 6:34 AM

#

hardy pecan 9/20

does grok 3 use reasoning?

woeful geyser Apr 10, 2025, 7:20 AM

#

Got Maverick full release! 😌

Screenshot_2025-04-10-14-18-27-070_com.android.chrome.jpg

#

Quite a surprise, but that's a good thing: I can't pinpoint the exact model because its "vibe" doesn't scream out loud.

Hoping for the actual result.

torn mantle Apr 10, 2025, 7:48 AM

#

https://x.com/btibor91/status/1910237861674353108

Tibor Blaho (@btibor91) on X

New ChatGPT web app version deployed just now (after midnight San Francisco time) adds mentions of "o4-mini", "o4-mini-high" and "o3"

drifting thorn Apr 10, 2025, 7:50 AM

#

WHY DO ALL APIs have a context limit of only 1 million

#

I was just trying to set up an MCP server with Cline

#

it failed

#

it says it's out of context window when it's like 80% done

#

arghhhhhhhhhhh

cedar tide Apr 10, 2025, 7:58 AM

#

torn mantle does grok 3 use reasoning?

Not yet

cedar tide Apr 10, 2025, 8:00 AM

#

hardy pecan https://x.com/bindureddy/status/1910185483545776630

https://x.com/JustinLin610/status/1910230919346270300?t=y1qXCeqpSD6AIPKMQVNn-w&s=19

Junyang Lin (@JustinLin610) on X

Sorry Bindu, this is not gonna happen that soon. We still need some more time.

drifting thorn Apr 10, 2025, 8:01 AM

#

I need real 10 million context window model

#

It's been 3 months since Google published the "Titans" architecture

#

I hope Google can make a large multimodal reasoning model out of that architecture

cedar tide Apr 10, 2025, 8:10 AM

#

torn mantle why would anyone prefer grok 3 over gemini 2.5 pro?

Wait grok 3 reasoning

drifting thorn Apr 10, 2025, 8:12 AM

#

Since Grok 3 has no ethics regulations

tall summit Apr 10, 2025, 8:16 AM

#

hihi

fleet lintel Apr 10, 2025, 8:20 AM

#

is dragontail as good as nightwhisper?

torn mantle Apr 10, 2025, 8:24 AM

#

cedar tide https://x.com/JustinLin610/status/1910230919346270300?t=y1qXCeqpSD6AIPKMQVNn-w&s...

seems like they werent satisfied with the performance

torn mantle Apr 10, 2025, 8:24 AM

#

fleet lintel is dragontail as good as nightwhisper?

coding wise?

#

no

#

nightwhisper is finetuned intensively on coding

#

they probably made a good reward model for web dev

#

styling wise

cedar tide Apr 10, 2025, 8:28 AM

#

Dragontail

Screenshot_2025-04-10-10-26-54-996_com.android.chrome-edit.jpg

noble zinc Apr 10, 2025, 8:30 AM

#

could dragon tail be o4-mini or o3?

cedar tide Apr 10, 2025, 8:31 AM

#

noble zinc could dragon tail be o4-mini or o3?

Its from google

noble zinc Apr 10, 2025, 8:32 AM

#

hopefully pricing remains similar to 2.0 flash

torn mantle Apr 10, 2025, 8:33 AM

#

its def not nightwhisper

#

i got much better results for a discord clone from nw

#

its on par with gemini 2.5 pro

#

idk if its better or not

cedar tide Apr 10, 2025, 8:50 AM

#

cedar tide Dragontail

sorry for the bad screenshot

#

screenshot on laptop
dragontail

#

2.5 pro

#

i think its gemini 2.5 pro, low thinking

#

the results are very similar to 2.5 and it thinks but for less time

oblique flint Apr 10, 2025, 9:00 AM

#

o4 mini 👀 I have no hope for o3 being affordable though lol

hazy quest Apr 10, 2025, 9:04 AM

#

Do you guys have access to Veo2 in AI Studio? It seems to be rolling out, some have it, some don't. I don't 😦

hardy pecan Apr 10, 2025, 9:05 AM

#

Yeh I've tried it out

keen beacon Apr 10, 2025, 9:18 AM

#

torn mantle https://x.com/btibor91/status/1910237861674353108

it's openai's launch day in the week... we may actually be getting things 👀

oblique flint Apr 10, 2025, 9:25 AM

#

hmm will o4 mini launch before 2.5 flash, seems like it lol

keen beacon Apr 10, 2025, 9:34 AM

#

i wonder if o4 mini is based on an updated 4o mini base

#

or if its just more roids 🤣

#

seems like google sort of forced their hand

#

how could it be o4 mini

#

its literally not thinking

#

it streams immediately and there is no apparent thinking

#

yeah

sage raptor Apr 10, 2025, 9:49 AM

#

maybe its o4 mini low

keen beacon Apr 10, 2025, 9:49 AM

#

no

#

and i benchmarked it: if it were o4 mini, it'd be an insane regression from o3 mini 🤣

#

there is 0 thinking

#

nada

#

zero

#

people are crazy 🤣

#

i'm more interested in this

#

they're adding it to openrouter "tomorrow morning" (EST)

#

imagine if its 2.5 flash 🤣

#

same naming scheme as the anon openai model so i don't think so lmao

keen beacon Apr 10, 2025, 9:50 AM

#

keen beacon same naming scheme as the anon openai model so i don't think so lmao

could be named like that to trip people

#

shrug i find that a little unlikely

brittle tiger Apr 10, 2025, 9:50 AM

#

Why does open router say o4-mini and have same stats basically?

keen beacon Apr 10, 2025, 9:51 AM

#

openrouter doesn't say o4 mini

brittle tiger Apr 10, 2025, 9:51 AM

#

Lmaoo I need to stop trying to use my brain 2 min after waking up

torn mantle Apr 10, 2025, 10:10 AM

#

im not hyped for any o-series model

keen beacon Apr 10, 2025, 10:11 AM

#

suit yourself

#

i'm still very interested

#

competition = always good

alpine coral Apr 10, 2025, 10:30 AM

#

keen beacon its literally not thinking

quasar?

keen beacon Apr 10, 2025, 10:31 AM

#

alpine coral quasar?

ye

alpine coral Apr 10, 2025, 10:31 AM

#

yeah no thinking there.. but it can't be o4-mini can it?

#

tf is the point of their naming schema if that is the case ha

keen beacon Apr 10, 2025, 10:31 AM

#

alpine coral yeah no thinking there.. but it can't be o4-mini can it?

i measured gpqa diamond and math 500: it would be a severe regression if it were o4 mini (which it can't be, because it's not thinking)

alpine coral Apr 10, 2025, 10:32 AM

#

yeah, the non-thinking alone

keen beacon Apr 10, 2025, 10:32 AM

#

#general message

alpine coral Apr 10, 2025, 10:32 AM

#

but also performance-wise, solid as it is - it's not rubbing up against the frontier in any way

alpine coral Apr 10, 2025, 10:33 AM

#

keen beacon https://discord.com/channels/1340554757349179412/1340554757827461211/13591565679...

ah yup i recall

keen beacon Apr 10, 2025, 10:34 AM

#

also: o3 mini gpqa diamond: 74.8%, math 500: 97.3%

#

@keen beacon you should test optimus alpha when it releases

keen beacon Apr 10, 2025, 10:35 AM

#

keen beacon @keen beacon you should test optimus alpha when it releases

i will if rate limits are like quasar

alpine coral Apr 10, 2025, 10:37 AM

#

some scores from a quiz (~20 questions in one prompt; same one shared above somewhere)
just fwiw

#

wait.. that wasn't sorted right

#

tall summit Apr 10, 2025, 10:40 AM

#

keen beacon i'm more interested in this

which server's that?

keen beacon Apr 10, 2025, 10:43 AM

#

openrouter

visual turret Apr 10, 2025, 10:43 AM

#

why is 3.7 sonnet in 23rd place? gemini flash lite is in 16th

keen beacon Apr 10, 2025, 10:43 AM

#

use style control..

visual turret Apr 10, 2025, 10:45 AM

#

keen beacon use style control..

well i know but i want to know why tbh.

visual turret Apr 10, 2025, 10:47 AM

#

keen beacon use style control..

still doesn't make sense

keen beacon Apr 10, 2025, 10:48 AM

#

that's just what something based on human preference will produce 🤷‍♂️

oblique flint Apr 10, 2025, 11:02 AM

#

man the difference in capability between 2.5 pro in ai studio vs cursor is so huge. I really wish it wasnt so bad in cursor lol

calm sequoia Apr 10, 2025, 11:29 AM

#

keen beacon openrouter

Could you explain what "the open-router server" means?

fleet lintel Apr 10, 2025, 11:41 AM

#

alpine coral

That's disappointing. I am hoping for better models than Gemini 2.5

keen fulcrum Apr 10, 2025, 11:50 AM

#

oblique flint man the difference in capability between 2.5 pro in ai studio vs cursor is so hu...

Just use roocode

#

It deleted my 800k context conversation which is frustrating though

oblique flint Apr 10, 2025, 11:51 AM

#

I have been avoiding roocode because I hear it can be expensive af

golden ocean Apr 10, 2025, 12:12 PM

#

I created a new chatgpt alt account on a new chrome profile, and it seems I can use gpt 4o forever, unlike on my og chatgpt account. I can also generate way more images

#

Is that a feature for new accounts or something

#

Will prob return to normal after a while

balmy mist Apr 10, 2025, 12:18 PM

#

wait who owns dragontail?

balmy mist Apr 10, 2025, 12:19 PM

#

fleet lintel That's disappointing. I am hoping for better models than 2.5 Gemini

2.5 is too good

#

lol

#

https://x.com/googlecloud/status/1910124837781328117

Google Cloud (@googlecloud) on X

TPU x @ssi news: "OpenAI co-founder and former chief scientist @ilyasut’s new AI startup, Safe Superintelligence, is using Google Cloud’s TPU chips to power its AI research," via @TechCrunch ↓ https://t.co/G0jnB5rEua

#

i wonder what progress he has made

vague orbit Apr 10, 2025, 12:28 PM

#

Any reports on how Cognito performs?

sonic tendon Apr 10, 2025, 12:35 PM

#

visual turret still doesn't make sense

i think most of it is that claude's terse response style doesn't work well when you're trying to do quick one-shot comparisons between models

#

which is what lmarena "power users" tend to do en masse, i think

vague orbit Apr 10, 2025, 12:37 PM

#

What kind of persona is a lmarena power user?

sonic tendon Apr 10, 2025, 12:37 PM

#

sonic tendon i think most of it is that claude's terse response style doesn't work well when ...

it's nice when you're expecting it though

sonic tendon Apr 10, 2025, 12:37 PM

#

vague orbit What kind of persona is a lmarena power user?

people that talk a lot on this discord, basically

#

and/or do more than ~5 chats on lmarena a day on average

#

(that number is totally arbitrary, i just made it up)

vague orbit Apr 10, 2025, 12:38 PM

#

I would guess that if you have a problem and wanted to solve it with an LLM, throwing it at LMArena would lead to really mixed results

sonic tendon Apr 10, 2025, 12:39 PM

#

vague orbit I would guess that if you have a problem and wanted to solve it with an LLM, thr...

personally, i do it just to try and explore different models (plus, it's one of the lowest-barrier-of-entry ways to get alpha access to new LLMs as they're being developed)

#

but yeah, you're right

#

i don't use it as a general-purpose chatbot, it's more for my own curiosity

tall summit Apr 10, 2025, 12:40 PM

#

sonic tendon and/or do more than ~5 chats on lmarena a day on average

guess im a power user now

sonic tendon Apr 10, 2025, 12:40 PM

#

tall summit guess im a power user now

welcome to the club :3

sonic tendon Apr 10, 2025, 12:42 PM

#

keen beacon @keen beacon you should test optimus alpha when it releases

what's optimus alpha

keen beacon Apr 10, 2025, 12:42 PM

#

anonymous model releasing on arena today

sonic tendon Apr 10, 2025, 12:42 PM

#

ah

#

openrouter too?

keen beacon Apr 10, 2025, 12:42 PM

#

sorry i meant openrouter

#

lmao

sonic tendon Apr 10, 2025, 12:42 PM

#

all good

tall summit Apr 10, 2025, 12:42 PM

#

sonic tendon welcome to the club :3

hyped

calm sequoia Apr 10, 2025, 12:43 PM

#

calm sequoia's poll "Best Deep Research" ended: Gemini 🫡 won with 8 of 15 votes

sonic tendon Apr 10, 2025, 12:43 PM

#

does openrouter have a discord?

keen beacon Apr 10, 2025, 12:43 PM

#

yup

#

look at their footer

sonic tendon Apr 10, 2025, 12:43 PM

#

joined

#

oh, grok 3's finally open access now

tall summit Apr 10, 2025, 12:45 PM

#

sonic tendon what's optimus alpha

i am hearing about so many random models and i've been in the server for 2 minutes

sonic tendon Apr 10, 2025, 12:46 PM

#

lmaoo

#

yeah 2025 ai scene in a nutshell

sonic tendon Apr 10, 2025, 12:52 PM

#

drifting thorn WHY ALL APIs have a context limit of only 1 million

what exactly are you trying to do? most models tend to perform pretty poorly when operating with that much context, even if they do support it officially

#

especially with coding, in my experience

#

off the top of my head, i don't know of any models that support much more than 1M context, anyway

keen beacon Apr 10, 2025, 12:54 PM

#

the new llama models are supposed to but they suck balls

#

and some of the geminis iirc go to 2M

sonic tendon Apr 10, 2025, 12:55 PM

#

i forget what that fiction context window benchmark is called

ocean vortex Apr 10, 2025, 12:56 PM

#

I find this name match odd. Even the context length matches? 🤯

https://x.com/SILXLAB/status/1909475116637208937?t=Jd0rllJnICEeVyKNPuYZ_g&s=19

SILX (@SILXLAB) on X

Quasar this, Quasar that...

It’s finally time to fully announce the new Quasar Series — introducing our 400B parameter models (1m context!!!) and smaller ones, the first of their kind. Built using a new scaling law training pipeline, advanced techniques, and more. By @LambdaAPI

keen beacon Apr 10, 2025, 12:56 PM

#

ocean vortex I find it odd this name matching, even the context matches? 🤯 https://x.com/...

17 yr old grifter or smthing. just a larper just ignore

sonic tendon Apr 10, 2025, 12:57 PM

#

lol

keen beacon Apr 10, 2025, 12:57 PM

#

😭

ocean vortex Apr 10, 2025, 12:58 PM

#

keen beacon 17 yr old grifter or smthing. just a larper just ignore

probably. It's somewhat unusual for openai though as well. It's still not released and only on openrouter

sonic tendon Apr 10, 2025, 12:58 PM

#

0 weights

keen beacon Apr 10, 2025, 12:58 PM

#

ocean vortex probably. It's somewhat unusual for openai though as well. It's still not releas...

this guy is actually 17 yrs old

sonic tendon Apr 10, 2025, 12:59 PM

#

keen beacon the new llama models are *supposed* to but they suck balls

meta in general seems to have been sucking balls lately

#

/nonsexual

keen beacon Apr 10, 2025, 1:00 PM

#

behemoth better be good

sonic tendon Apr 10, 2025, 1:00 PM

#

it's gonna suck

#

probably

keen beacon Apr 10, 2025, 1:00 PM

#

it'll probably underperform

#

we'll see

#

hopefully we get o3 today

#

o4 mini

sonic tendon Apr 10, 2025, 1:00 PM

#

oh, on that note

#

#

man, i need to stop doubting myself

keen beacon Apr 10, 2025, 1:01 PM

#

wats that for

#

i kinda doubt it will take top position

#

idk

sonic tendon Apr 10, 2025, 1:01 PM

#

keen beacon wats that for

https://polymarket.com/event/which-company-has-best-ai-model-end-of-april/will-openai-have-the-top-ai-model-on-april-30?tid=1744290054541

Polymarket

Which company has best AI model end of April?

Polymarket | This market will resolve according to the company which owns the model which has the highest arena score based off the Chatbot Arena LLM Leaderb...

keen beacon Apr 10, 2025, 1:01 PM

#

maybe joint 1st w/ stylectrl

north vale Apr 10, 2025, 1:01 PM

#

How o4-mini performs will say a lot about how well reasoning scales to superhuman levels

#

It really depends whether they posttrained o3 on their new 4o stuff

#

imo

keen beacon Apr 10, 2025, 1:02 PM

#

keen beacon maybe joint 1st w/ stylectrl

it also depends on which reasoning effort version they put on the arena

#

o3 low will be very different from o3 high

#

its conceivable the new anonymous chatbot could top the leaderboard

keen beacon Apr 10, 2025, 1:02 PM

#

keen beacon its conceivable the new anonymous chatbot could top the leaderboard

which one

#

there are many

sonic tendon Apr 10, 2025, 1:02 PM

#

keen beacon i kinda doubt it will take top position

i mean, i had a pretty good impression of it, but you may be right

keen beacon Apr 10, 2025, 1:02 PM

#

keen beacon which one

anonymous chatbot/quasar by openai. since the leaderboard is mainly human preference

#

quasar isn't on the arena

#

and even if it was

keen beacon Apr 10, 2025, 1:03 PM

#

keen beacon quasar isn't on the arena

it is lol

north vale Apr 10, 2025, 1:03 PM

#

It’s ass compared to 2.5

keen beacon Apr 10, 2025, 1:03 PM

#

i don't think it would take the top spot tbh

#

it's not as nice to talk to as chatgpt 4o latest

#

also seems less creative

oblique flint Apr 10, 2025, 1:03 PM

#

keen beacon o3 low will be very different from o3 high

well if it takes like several minutes to answer a question idk if it will get the top spot lol

keen beacon Apr 10, 2025, 1:03 PM

#

keen beacon it is lol

no it isn't

#

it's on openrouter

sonic tendon Apr 10, 2025, 1:03 PM

#

keen beacon it's not as nice to talk to as chatgpt 4o latest

yeah, as we've learned, this matters a lot more than people think it does

keen beacon Apr 10, 2025, 1:03 PM

#

keen beacon it's on openrouter

anonymous chatbot is the same though lol

sonic tendon Apr 10, 2025, 1:04 PM

#

oblique flint well if it takes like several minutes to answer a question idk if it will get th...

in the original (gradio) lmarena, it pauses the output of both models until they both start sending tokens

keen beacon Apr 10, 2025, 1:04 PM

#

anonymous chatbot isn't on the arena

#

not the same name, but its the same model

brittle tiger Apr 10, 2025, 1:04 PM

#

making up 30 elo points is harder than it seems too

sonic tendon Apr 10, 2025, 1:04 PM

#

so, both models get paused until they both finish thinking, basically

keen beacon Apr 10, 2025, 1:04 PM

#

there's anonymous-test