#general | Arena | Page 20

keen beacon Apr 14, 2025, 2:06 AM

#

it kinda fell off after those

#

and of course it was pretty bad at long context coherence

#

i remember back when you could only access it via api or via their slack bot

#

they had a thing on poe too

#

i used the slack bot from when it released until when claude.ai released

#

unlimited messages for 20$ LOL

keen beacon Apr 14, 2025, 2:07 AM

#

keen beacon they had a thing on poe too

that was a bit later

#

poe was a crazy product back in the day

keen beacon Apr 14, 2025, 2:07 AM

#

keen beacon that was a bit later

no they had claude 1 i think

#

they've nerfed it 50 times over

keen beacon Apr 14, 2025, 2:07 AM

#

keen beacon no they had claude 1 i think

i'm not disputing that, i'm saying that it was available on anthropic's slack bot before it was available via poe

#

maybe i dont remember lol

keen beacon Apr 14, 2025, 2:08 AM

#

keen beacon they've nerfed it 50 times over

ya their initial sub was an insane deal

#

unlimited messages w claude which is pretty expensive and with a large context window i think

#

plus several incidents (a single guy doing $10k in a single month, etc.)

#

theyre trying to recoup those costs 🤣

#

yeah i'll be honest i may have run up said costs 👀

plain zinc Apr 14, 2025, 2:09 AM

#

I don't get dragontail anymore.

#

Has he disappeared?

keen beacon Apr 14, 2025, 2:09 AM

#

they pulled the free trial too

#

stripe free trials are exceptionally easy to abuse

keen beacon Apr 14, 2025, 2:10 AM

#

keen beacon they pulled the free trial too

that was another incident

#

token generators

#

there was a discord server with poe token

#

yea

#

it was so easy

#

yes

#

lmaoo 😭

#

'tis probably time for me to go

#

goodnight

#

gn

drifting thorn Apr 14, 2025, 2:13 AM

#

gn

#

now guessing u r german

hardy pecan Apr 14, 2025, 2:16 AM

#

o7

alpine coral Apr 14, 2025, 2:19 AM

#

keen beacon 'tis probably time for me to go

i've got another batch of questions for PREVIEW if you would be so kind - but can ofc wait till (your) tomorrow!

leaden palm Apr 14, 2025, 2:25 AM

#

i haven't heard anything like that

keen beacon Apr 14, 2025, 2:40 AM

#

Quasar was up for a little bit

#

As anonymous chatbot

#

They removed it I think

drifting thorn Apr 14, 2025, 2:49 AM

#

This is the test result on Chinese writing, with English translation

📎 test_on_Chinese_Writing.txt

#

keen beacon Apr 14, 2025, 2:52 AM

#

it really depends on what kind of article (blog post, subject etc?)

drifting thorn Apr 14, 2025, 2:52 AM

#

My question is to write a 1000 word realistic fiction(寫一篇1000字左右的都市小説。)

keen beacon Apr 14, 2025, 2:53 AM

#

if its a story r1 will win

#

amongst those options

drifting thorn Apr 14, 2025, 2:53 AM

#

just download it and have a look at the translations

#

i know you guys don't know Chinese

#

so I provided a translated version below(translated by 2.5 Pro)

keen beacon Apr 14, 2025, 2:55 AM

#

do u mean story? btw. i changed my vote to r1 assuming u are talking about stories as that is a story

drifting thorn Apr 14, 2025, 2:57 AM

#

yup I'm talking bout stories

#

well my preference is 2.5>=2.0>R1>>>o3-mini

#

o3-mini's story is bland and lacks detail. R1's story deviated from my settings. Though 2.5's and 2.0's stories are also bland, they are at least descriptive. The characters' traits and motives seemed more reasonable in 2.5

#

I guess it's stylistic choice for 2.5 and 2.0 for their plots, since I've seen some literatures that look like these writings

balmy mist Apr 14, 2025, 3:09 AM

#

damn there is a lot of talk in the chat today, did i miss something big?

drifting thorn Apr 14, 2025, 3:10 AM

#

Gemini will be connected to Veo

balmy mist Apr 14, 2025, 3:14 AM

#

thats all that happened?

drifting thorn Apr 14, 2025, 3:16 AM

#

not really

#

I recommend you actually have a look at it

ivory schooner Apr 14, 2025, 3:25 AM

#

Hello

#

I'm learning to wait for Behemoth to launch

drifting thorn Apr 14, 2025, 3:25 AM

#

What do you think of the four thinking models' essays?

#

Give them a score

ivory schooner Apr 14, 2025, 3:26 AM

#

I hope Behemoth is 24k...

drifting thorn Apr 14, 2025, 3:26 AM

#

We discussed it above; I reckon Behemoth will be kind of useless, like GPT 4.5

#

Also, can you read English?

ivory schooner Apr 14, 2025, 3:51 AM

#

drifting thorn We discussed it above; I reckon Behemoth will be kind of useless, like GPT 4.5

Yeah, it is kind of similar

keen beacon Apr 14, 2025, 4:25 AM

#

Is o3 full better then gemini 2.5 pro?

zinc ore Apr 14, 2025, 4:28 AM

#

Unknown

alpine coral Apr 14, 2025, 4:43 AM

#

keen beacon Is o3 full better then gemini 2.5 pro?

arc agi benchmark suggests so (though it also indicates that Flash 2.0 scored the same as 2.5-Pro-exp... which is admittedly not what I would have expected to see.. though there's a few asterisks there for 2.5.. and the costs for o3 low are literally insane)

ig the picture will become clearer when 2.5 is no longer exp/preview, and o3 is actually released ha

hardy pecan Apr 14, 2025, 4:43 AM

#

I'd guess about the recent openai models coming

4.1 nano is Quasar
4.1 mini is Optimus Alpha

alpine coral Apr 14, 2025, 4:45 AM

#

hmm i dunno.. optimus alpha seems comparable if not better than quasar, but faster

hardy pecan Apr 14, 2025, 4:45 AM

#

I thought Quasar was faster, but maybe i'm misremembering

alpine coral Apr 14, 2025, 4:46 AM

#

yeah im a bit muddled in my thoughts about it too ha

#

quasar was blazingly fast initially; but then i think kinda slowed down (perhaps under additional load or something).. then optimus was added, and it seemed faster than quasar, but yeah kinda just going by memory - which is totally unreliable

#

yeah disregard the above... quasar is apparently like 7 times faster than optimus lol

alpine coral Apr 14, 2025, 4:59 AM

#

hardy pecan I thought Quasar was faster, but maybe i'm misremembering

you're not ha

hardy pecan Apr 14, 2025, 5:02 AM

#

I'd expect that to be nano then surely? Non reasoning nano is blazing fast!

#

Not sure if any others have been released to the mass public via lmarena or open router. Except for o3 that some have had access to

plain zinc Apr 14, 2025, 5:08 AM

#

Have you encountered dragontail today?

alpine coral Apr 14, 2025, 5:13 AM

#

i have yeah

torn mantle Apr 14, 2025, 5:35 AM

#

hardy pecan I'd guess about the recent openai models coming 4.1 nano is Quasar 4.1 min is O...

those arent of a big improvement tbh

#

Openai still didn't crack the code for a good coding model

hardy pecan Apr 14, 2025, 5:36 AM

#

Nah, they are not great personally, but I guess they serve a different purpose and not meant to be SOTA

torn mantle Apr 14, 2025, 5:37 AM

#

hardy pecan Nah, they are not great personally, but I guess they serve a different purpose a...

Which is?

#

Well the only noticeable update is the context window

hardy pecan Apr 14, 2025, 5:44 AM

#

torn mantle Which is?

I suspect they might be the open-source models that can run on a phone etc

#

locally etc

#

that would be their goal,

torn mantle Apr 14, 2025, 5:59 AM

#

https://x.com/testingcatalog/status/1911659791098708044

TestingCatalog News 🗞 (@testingcatalog) on X

BREAKING 🚨: OpenAI is preparing to launch several new models this week:
- Full o3,
- o4-mini,
- GPT-4.1, 4.1 nano, 4.1 mini

As model cards have been updated recently on the website

Image by @MeetPatelTech

novel flame Apr 14, 2025, 6:08 AM

#

Regular people aren’t paying for LLMs though. OpenAI is throwing money down the toilet at an incredible pace. I would hesitate to call them great at creating products until they can turn a profit.

hardy pecan Apr 14, 2025, 6:25 AM

#

Which model do you think we get tomorrow?

#

Starting strong with o3?

torn mantle Apr 14, 2025, 6:30 AM

#

hardy pecan Starting strong with o3?

i think they will leave the final boss till the end

#

to build enough hype

#

or they can just straight up release it from the start

#

the thing is o3 is a good model but its quite pricey

#

so i dont think it will be available for the general public

hardy pecan Apr 14, 2025, 6:33 AM

#

Pro plan users will get it surely

#

I'm doubtful of plus though

ember rapids Apr 14, 2025, 6:40 AM

#

torn mantle i think they will leave the final boss till the end

They might try to front run google or vice versa

#

Regardless it’s gonna be an amazing week

hardy pecan Apr 14, 2025, 6:50 AM

#

The space has been really fun the last few years, feels like we always have something to look forward to or surprised about every few weeks

#

Golden age tbh

mossy drum Apr 14, 2025, 7:25 AM

#

New model in Arena: cobalt-exp-beta-v3

calm sequoia Apr 14, 2025, 7:28 AM

#

Maybe amazon titan again

novel flame Apr 14, 2025, 7:29 AM

#

Also ‘luca’?

novel flame Apr 14, 2025, 7:30 AM

#

mossy drum New model in Arena: cobalt-exp-beta-v3

Cobalt says it’s part of Amazon Titan, yes

torn mantle Apr 14, 2025, 7:33 AM

#

mossy drum New model in Arena: cobalt-exp-beta-v3

wasnt this already in the arena

#

or was it v2?

#

i remember seeing this model

fleet lintel Apr 14, 2025, 7:36 AM

#

torn mantle https://x.com/testingcatalog/status/1911659791098708044

They must be better than 2.5 pro. I am excited to see how much better they are

drifting thorn Apr 14, 2025, 7:36 AM

#

I suggest not putting such HIGH hopes on OpenAI

fleet lintel Apr 14, 2025, 7:40 AM

#

OAI is unlike grok, llama, deepseek. They must have the best models in the market or they will start losing the edge.

keen fulcrum Apr 14, 2025, 7:45 AM

#

drifting thorn I suggest not puting that HIGH hope on OpenAI

Better than llama

novel flame Apr 14, 2025, 7:51 AM

#

Cobalt is not a great coder. Just had it go head to head with 3.7 Sonnet in the arena. Not surprised, since Titan is just about the worst LLM ever built by a major corporation.

torn mantle Apr 14, 2025, 7:53 AM

#

amazon models arent that good

#

they do not have a dedicated R&D department for LLM training

#

they are just trying to copy what the others are doing

novel flame Apr 14, 2025, 7:55 AM

#

torn mantle amazon models arent that good

At least Nova Pro is up there with the big boys — not at the top, but still; Titan is just garbage.

novel flame Apr 14, 2025, 7:56 AM

#

torn mantle they do not have a dedicated R&D department for LLM training

They seem to have decided to invest big in Anthropic instead of trying to build in-house. I think it’s a smart choice. That’s also why I don’t understand why Titan exists.

fleet lintel Apr 14, 2025, 7:57 AM

#

Not sure why Amazon is even trying to build models? Why not put more money in Anthropic?

calm sequoia Apr 14, 2025, 7:58 AM

#

The o3 was internally released before december. If it's at least equal to the 2.5 Pro, it means oAI is still months in advance of Google.

novel flame Apr 14, 2025, 7:58 AM

#

And for what it is, Q isn’t terrible to be honest. I find myself occasionally using ‘q chat’ on the commandline instead of opening Gemini when I need something simple but not simple enough for Copilot completion

novel flame Apr 14, 2025, 8:00 AM

#

fleet lintel Not sure why Amazon is even trying to build models? Why not put more money in An...

The only reason I can see is to support their non-text services - image, video, audio. Claude doesn’t give them those

novel flame Apr 14, 2025, 8:04 AM

#

calm sequoia The o3 was internally released before december. If it's at least equal to the 2....

It's not an assembly line. Being ahead in “time” means nothing if you are focusing too much on the wrong improvements. Google has shown an ability to rapidly train and launch new models, which could be a game changer. Not to mention the possibility of a new vastly superior model architecture entering the fray soon 😉

calm sequoia Apr 14, 2025, 8:10 AM

#

Google may indeed be faster. But o4-mini implies the existence of o4. Google will be the winner in the long term, but for this year I would still bet on oAI

drifting thorn Apr 14, 2025, 8:18 AM

#

Would love to see the race in multimodality, maths and coding performance, reasoning performance, creativity and context window

novel flame Apr 14, 2025, 8:31 AM

#

calm sequoia The speed of Google may be faster indeed. But o4-mini implies the existance of o...

I wouldn’t be so quick to predict long term winners. This is early early days of modern AI.

ocean vortex Apr 14, 2025, 9:02 AM

#

torn mantle https://x.com/testingcatalog/status/1911659791098708044

if that's true then new mini gonna be more expensive than the old for sure lol

#

@keen beacon

keen beacon Apr 14, 2025, 9:04 AM

#

WHO RECOMMENDED CURSOR?

#

TO ME?

#

YOU DESERVE A GIFT

#

did NOT know it can do this

#

actually perfect for my world map data project 🙂

ocean vortex Apr 14, 2025, 9:04 AM

#

and if nano performs no worse than the old mini people can't really complain 💀

golden ocean Apr 14, 2025, 9:07 AM

#

is cursor free

#

do u need to get ur own api key for every model

#

and pay for the usage

keen beacon Apr 14, 2025, 9:10 AM

#

golden ocean do u need to get ur own api key for every model

im not paying but i put my google api in

ocean vortex Apr 14, 2025, 9:14 AM

#

keen beacon im not paying but i put my google api in

I hope you are using exp and not preview for 2.5 lol

calm sequoia Apr 14, 2025, 9:15 AM

#

keen beacon Apr 14, 2025, 9:16 AM

#

ocean vortex I hope you are using exp and not preview for 2.5 lol

#

ocean vortex Apr 14, 2025, 9:16 AM

#

fleet lintel They must be better than 2.5 pro. I am excited to see how much better they are

o3 could be expensive

#

o4-mini-high will probably be great and beat 2.5 in several areas, but context and spatial awareness is a question mark.

keen beacon Apr 14, 2025, 9:21 AM

#

are any of you game developers

#

or use ai for game development

keen fulcrum Apr 14, 2025, 9:25 AM

#

https://www.rxddit.com/r/firefox/comments/1jyemzn/nightlys_new_ai_features/

rxddit.com

Nightly's new AI features!

u/maubg on r/firefox


hardy pecan Apr 14, 2025, 9:27 AM

#

luca has been here before

keen fulcrum Apr 14, 2025, 9:46 AM

#

https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/ai-mode-animation.mp4
This with Gemini 2.5 pro is superior to Perplexity & You


livid harbor Apr 14, 2025, 10:04 AM

#

We built Dingo to solve the pain points we encountered managing data quality at scale.

Github repo:（welcome star～）
https://github.com/DataEval/dingo

Online Demo:
https://huggingface.co/spaces/DataEval/dingo

GitHub

GitHub - DataEval/dingo: Dingo: A Comprehensive Data Quality Evalua...

Dingo: A Comprehensive Data Quality Evaluation Tool - DataEval/dingo

Dingo - a Hugging Face Space by DataEval

torn mantle Apr 14, 2025, 10:06 AM

#

livid harbor We built Dingo to solve the pain points we encountered managing data quality at ...

nice one

keen beacon Apr 14, 2025, 10:14 AM

#

alpine coral i've got another batch of questions for PREVIEW if you would be so kind - but ca...

gm, i can take them now :)

tall summit Apr 14, 2025, 10:30 AM

#

keen beacon 'tis probably time for me to go

so relatable

golden ocean Apr 14, 2025, 10:45 AM

#

keen fulcrum https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/ai-mode...

how much rocks should I eat per day

keen fulcrum Apr 14, 2025, 10:46 AM

#

golden ocean how much rocks should I eat per day

Thats a prompt with dozens of AI generated content
probably generated by replacing mineral with rock

keen beacon Apr 14, 2025, 11:05 AM

#

#

used chatgpt's python sandbox feature to pull pixel image data and turn it into 255 for water 0 for black
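A minimal sketch of the kind of mask described above, not the code the sandbox actually produced. It assumes RGB pixels held in plain nested lists and a made-up rule that a blue-dominant pixel counts as water; `is_water` and `to_mask` are hypothetical names.

```python
# Hypothetical sketch: turn RGB pixels into a 0/255 mask.
# The "blue-dominant means water" rule is an assumption, not from the chat.

def is_water(pixel):
    """Treat a pixel as water when blue is its strongest channel."""
    r, g, b = pixel
    return b > r and b > g

def to_mask(rows):
    """Map every pixel to 255 (water) or 0 (everything else)."""
    return [[255 if is_water(p) else 0 for p in row] for row in rows]

# Tiny 2x2 sample image standing in for real map data.
sample = [
    [(10, 20, 200), (0, 0, 0)],
    [(30, 90, 40), (5, 60, 180)],
]
mask = to_mask(sample)
```

On a real map the classification rule would need tuning to the image's actual colours.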

#

result?

#

#

we live in a very dystopian era

#

it even generated the template it used

#

#

i didnt even know it could do this

calm sequoia Apr 14, 2025, 11:12 AM

#

Guys we may be in a bubble. No one cares of math and riddles 😄 We shall test new models on emotional support

keen beacon Apr 14, 2025, 11:15 AM

#

editing text from 4th to 45th? wtf

#

#

this is even more dystopian 😭

#

https://www.socialmediatoday.com/news/stanford-study-ai-bots-as-companions/744964/

Social Media Today

Study Looks at Public Opinion on the Use of AI Chatbots as Romantic...

The data shows there's a level of concern about AI bots as romantic companions.

#

https://www.nbcnews.com/tech/ai-companions-friendship-rcna194735

NBC News

Some of her closest relationships are with chatbots. That's more co...

In recent years, “AI companions” have gained massive popularity among people who crave social connection.

#

terrible

balmy mist Apr 14, 2025, 11:29 AM

#

calm sequoia The o3 was internally released before december. If it's at least equal to the 2....

How do you know Google released their best models tho?

tall summit Apr 14, 2025, 11:30 AM

#

calm sequoia Guys we may be in a bubble. No one cares of math and riddles 😄 We shall test ne...

creativity #9, that's my main use case 😭

balmy mist Apr 14, 2025, 11:30 AM

#

hardy pecan The space has been really fun the last few years, feels like we always have some...

Yeah every day we have news lol, that’s why I say we are already in the technological explosion

tall summit Apr 14, 2025, 11:32 AM

#

keen beacon https://www.socialmediatoday.com/news/stanford-study-ai-bots-as-companions/74496...

please no humanlike

balmy mist Apr 14, 2025, 11:32 AM

#

torn mantle https://x.com/testingcatalog/status/1911659791098708044

I swear we better not get the 4.1s today lmaoo

tall summit Apr 14, 2025, 11:33 AM

#

balmy mist I swear we better not get the 4.1s today lmaoo

why not

calm sequoia Apr 14, 2025, 11:39 AM

#

balmy mist How do you know Google released their best models tho?

No evidence, no rumors, too fast, 2.5 instead of 3.

balmy mist Apr 14, 2025, 11:55 AM

#

tall summit why not

I want o4 😂

balmy mist Apr 14, 2025, 11:56 AM

#

calm sequoia No evidence, no rumors, too fast, 2.5 instead of 3.

The numbers don’t mean anything bro

#

Just shows progression

calm sequoia Apr 14, 2025, 11:56 AM

#

They mean for the marketing team, and they aren't ignored 😄

balmy mist Apr 14, 2025, 11:56 AM

#

lol

#

Google has shown to be able to get new models out fast now so don’t sleep on Google, they could mess around and release another model in 2 weeks

keen fulcrum Apr 14, 2025, 12:18 PM

#

balmy mist Google has shown to be able to get new models out fast now so don’t sleep on Goo...

FT, and two others due
Nightwhisper, Dragontail and Stargazer all Google models shown on https://alpha.lmarena.ai

LMArena

An open platform for evaluating AI through human preference

calm sequoia Apr 14, 2025, 12:19 PM

#

balmy mist Google has shown to be able to get new models out fast now so don’t sleep on Goo...

Do not underestimate the bureaucratic burden that the large companies have.

keen fulcrum Apr 14, 2025, 12:19 PM

#

Nightwhisper is an early gemini 3 pro model I believe

calm sequoia Apr 14, 2025, 12:20 PM

#

LOL its first time I encountered context length problem in Gemini

#

It appears LLMs still cant analyze dna sequence CSVs because of too long context 😄

keen fulcrum Apr 14, 2025, 12:20 PM

#

calm sequoia LOL its first time I encountered context length problem in Gemini

It's best if you summarize past conversation history and break tasks into steps

calm sequoia Apr 14, 2025, 12:21 PM

#

Sadly my CSV is 15M tokens :/

balmy mist Apr 14, 2025, 12:21 PM

#

Bruhh

#

That’s for one sequence?

#

Can’t you break it up?

#

Into 15 diff sessions
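A rough sketch of splitting a large CSV into per-session chunks, as suggested above. The 4-characters-per-token estimate is a crude assumption (a real tokenizer will differ), and `estimate_tokens` and `chunk_rows` are hypothetical helpers.

```python
def estimate_tokens(text):
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)

def chunk_rows(header, rows, budget):
    """Group rows into chunks that each fit the token budget.

    The header is repeated at the top of every chunk so each
    session can be read on its own.
    """
    chunks, current = [], []
    used = estimate_tokens(header)
    for row in rows:
        cost = estimate_tokens(row)
        # Start a new chunk when adding this row would exceed the budget.
        if current and used + cost > budget:
            chunks.append([header] + current)
            current, used = [], estimate_tokens(header)
        current.append(row)
        used += cost
    if current:
        chunks.append([header] + current)
    return chunks

header = "id,seq"
rows = ["1,ACGTACGT"] * 5
chunks = chunk_rows(header, rows, budget=5)
```

Whether the analysis still works after splitting depends on the task; anything that needs to see the whole sequence at once would not survive this.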

calm sequoia Apr 14, 2025, 12:24 PM

#

Will have to dig into dna sequencing to find out 😄

novel flame Apr 14, 2025, 12:25 PM

#

calm sequoia Guys we may be in a bubble. No one cares of math and riddles 😄 We shall test ne...

Sure... "therapy and companionship". That means pr0n. You know that means pr0n.

drifting thorn Apr 14, 2025, 12:25 PM

#

I just found out a usable embedding

#

It makes my knowledge base in Cherry Studio functional

#

{
  "message": "[GoogleGenerativeAI Error]: Function call not available. Response was blocked due to OTHER",
  "response": {
    "promptFeedback": {
      "blockReason": "OTHER"
    },
    "usageMetadata": {
      "promptTokenCount": 1984,
      "totalTokenCount": 1984,
      "promptTokensDetails": {
        "0": {
          "modality": "TEXT",
          "tokenCount": 1984
        },
        "length": 1
      }
    },
    "modelVersion": "gemini-2.5-pro-exp-03-25"
  }
}
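A small sketch of how a client could detect this kind of block before trying to read a tool call. It only assumes the payload shape pasted above (`response.promptFeedback.blockReason`); `block_reason` is a hypothetical helper, not part of any SDK.

```python
import json

def block_reason(payload):
    """Return the blockReason from an error payload shaped like the one above, or None."""
    feedback = payload.get("response", {}).get("promptFeedback", {})
    return feedback.get("blockReason")

# Abbreviated copy of the pasted error, kept to the fields the helper reads.
raw = '{"message": "...", "response": {"promptFeedback": {"blockReason": "OTHER"}}}'
reason = block_reason(json.loads(raw))

if reason is not None:
    # The prompt was blocked, so there is no function call to execute.
    print(f"blocked: {reason}")
```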

#

so sad

tall summit Apr 14, 2025, 12:41 PM

#

drifting thorn { "message": "[GoogleGenerativeAI Error]: Function call not available. Response ...

literally 1984

oblique flint Apr 14, 2025, 1:03 PM

#

why are a bunch of unreleased models on there? How are we supposed to know how they perform lol

alpine coral Apr 14, 2025, 1:18 PM

#

keen beacon editing text from 4th to 45th? wtf

i find that discrediting.. makes no sense lol

#

spurious a kinda methodology imo.. like they mined / analysed reddit and forum posts - to find examples of people discussing AI being useful.. ig better than nothing / perhaps somewhat representative.. but i think oai and google etc would have a much better understanding based on actual usage (surely editing text is up there somewhere ha)

#

from HBR (kinda surprisingly tbh ha) https://hbr.org/2024/03/how-people-are-really-using-genai

drifting thorn Apr 14, 2025, 1:23 PM

#

drifting thorn { "message": "[GoogleGenerativeAI Error]: Function call not available. Response ...

React to this

#

Okay my story is stopped by the current ability of AI

golden ocean Apr 14, 2025, 1:43 PM

#

Why I dont see options like making datasets or fine tuning anymore in ai studio

#

Is that still possible

keen beacon Apr 14, 2025, 1:46 PM

#

we should be getting the first openai drop(s) in 3-4 hours 👀

#

normally oai drop around 6pm BST

drifting thorn Apr 14, 2025, 1:49 PM

#

fxxk

#

github education is so hard to apply for

#

I'll just wait for R2

#

to be available on OpenRouter

balmy mist Apr 14, 2025, 1:52 PM

#

keen beacon we should be getting the first openai drop(s) in 3-4 hours 👀

i hate that they wait so long, but its a good treat for lunch time for me lol

keen beacon Apr 14, 2025, 1:52 PM

#

tbf they release at like 10-11am where they are

#

which makes sense

drifting thorn Apr 14, 2025, 2:03 PM

#

gosh Google nerfed 2.5 Pro's tool calling function!!!!!

#

and 2.5 Pro now can't execute tool call

#

but why?

keen fulcrum Apr 14, 2025, 2:04 PM

#

drifting thorn gosh Google nerfed 2.5 Pro's tool calling function!!!!!

I noticed its a buggy mess

keen beacon Apr 14, 2025, 2:04 PM

#

https://x.com/OpenAI/status/1911782243640754634

OpenAI (@OpenAI) on X

developers 🤝 supermassive black hole

livestream 10am PT

drifting thorn Apr 14, 2025, 2:05 PM

#

We should wait for Deepseek R2 or nightwhisper right now

keen fulcrum Apr 14, 2025, 2:05 PM

#

keen beacon we should be getting the first openai drop(s) in 3-4 hours 👀

Will we get the new models on lmarena too?

#

Potentially upon release

keen beacon Apr 14, 2025, 2:06 PM

#

keen fulcrum Will we get the new models on lmarena too?

yes

#

not immediately though

#

it will be on the arena within a few hours

alpine coral Apr 14, 2025, 2:07 PM

#

speaking of arena models. i see the number available in direct chat increased from 95 (if memory serves) to 101

keen beacon Apr 14, 2025, 2:07 PM

#

i haven't noticed anything new in the list

#

weird

keen beacon Apr 14, 2025, 2:08 PM

#

keen beacon https://x.com/OpenAI/status/1911782243640754634

also this makes me think they're dropping 4.1 today

#

if 4.1 is the coding jump it is rumoured to be

cloud meadow Apr 14, 2025, 2:09 PM

#

Riverhollow?

#

Not sure what model this is but it's quite good

cloud meadow Apr 14, 2025, 2:11 PM

#

cloud meadow Riverhollow?

📎 message.txt

alpine coral Apr 14, 2025, 2:11 PM

#

keen beacon i haven't noticed anything new in the list

cobalt-exp-beta-v4 might be recently added? dunno about the 2 doubao ones (haven't noticed them before, nor heard of the model tbh)

#

has o3mini always been in direct chat?

keen beacon Apr 14, 2025, 2:11 PM

#

yes

#

cobalt v4 and the doubao models are new

cloud meadow Apr 14, 2025, 2:12 PM

#

Is it a new gemini model?

vivid oyster Apr 14, 2025, 2:12 PM

#

Ye

cloud meadow Apr 14, 2025, 2:12 PM

#

It seems similar to 2.5, maybe it's a cheaper version?

vivid oyster Apr 14, 2025, 2:13 PM

#

Prob

#

Idk

#

I never used it

#

Alot

#

Just 2 times

keen beacon Apr 14, 2025, 2:13 PM

#

riverhollow is a little worse than 2.5 pro

drifting thorn Apr 14, 2025, 2:38 PM

#

what is the context limit of o3?

keen beacon Apr 14, 2025, 2:39 PM

#

200k tokens

drifting thorn Apr 14, 2025, 2:40 PM

#

Is it better than other LLMs in terms of tool-calling?

keen beacon Apr 14, 2025, 2:44 PM

#

couldn't tell you

#

but probably

drifting thorn Apr 14, 2025, 2:47 PM

#

Here's 2.5 Pro with knowledge base: Internal server error, unable to complete request

novel flame Apr 14, 2025, 2:53 PM

#

keen beacon cobalt v4 and the doubao models are new

Doubao is a popular chat LLM from ByteDance, for a long time the most popular in China

#

https://amp.scmp.com/tech/big-tech/article/3306270/alibabas-quark-surpasses-bytedances-doubao-deepseek-chinas-top-ai-app

South China Morning Post

Alibaba’s Quark surpasses ByteDance’s Doubao as China’s top A...

Quark’s rapid ascent comes as Alibaba has transformed the app from a cloud storage and search service into an ‘AI super assistant’.

drifting thorn Apr 14, 2025, 2:55 PM

#

famous for being dumb

#

it is nicknamed "Downbao", linking it to Down's Syndrome

keen beacon Apr 14, 2025, 2:57 PM

#

novel flame Apr 14, 2025, 3:00 PM

#

I had some time during lunch so I tried the new Firebase Studio with my go-to webdev test prompts and…. It sucked? I don’t understand how, but it was terrible. I could have given the same prompts directly to Gemini 2.5 Pro with no agentic framework and gotten far better results.

#

Oh I’ve tried basically all the other ones already - and Firebase Studio was among the worst

#

I suspect Firebase Studio is not using 2.5 Pro, and its agentic framework / tool orchestration is just really bad. It desperately wants to build a NextJS app and then immediately forgets to add all of the features it planned to build.

#

Cost

keen beacon Apr 14, 2025, 3:04 PM

#

novel flame I had some time during lunch so I tried the new Firebase Studio with my go-to we...

yeah it wasn't good in my experiencxe either

#

experience*

#

nope

#

i believe it is using flash

drifting thorn Apr 14, 2025, 3:05 PM

#

I’ve heard criticism of Firebase Studio before

sonic tendon Apr 14, 2025, 3:05 PM

#

yeah, the anon google models are probably 2.5-flash-*

balmy mist Apr 14, 2025, 3:05 PM

#

keen beacon https://x.com/OpenAI/status/1911782243640754634

why do they say stuff like that lmaoo, super massive blackhole

sonic tendon Apr 14, 2025, 3:06 PM

#

keen beacon https://x.com/OpenAI/status/1911782243640754634

anyone wanna do a vc then?

balmy mist Apr 14, 2025, 3:06 PM

#

keen beacon if 4.1 is the coding jump it is rumoured to be

so 4.1 is a better coding model than o4 and o3?

balmy mist Apr 14, 2025, 3:06 PM

#

sonic tendon anyone wanna do a vc then?

yeah can you stream it please

keen beacon Apr 14, 2025, 3:07 PM

#

should be able to join as well

sonic tendon Apr 14, 2025, 3:07 PM

#

balmy mist yeah can you stream it please

I should be able to, but like

#

80% chance i can make it

keen beacon Apr 14, 2025, 3:07 PM

#

balmy mist why do they say stuff like that lmaoo, super massive blackhole

quasars are powered by super massive blackholes

#

so i think quasar alpha is whatever we're getting today

#

(likely 4.1)

sonic tendon Apr 14, 2025, 3:07 PM

#

keen beacon quasars are powered by super massive blackholes

OHHHH

novel flame Apr 14, 2025, 3:07 PM

#

The poor Firebase Studio performance is so strange though; I would’ve thought they could easily build something at least as good as Bolt or Lovable or v0 — and fully integrated with Firebase since they own it. That could have destroyed the competition

drifting thorn Apr 14, 2025, 3:12 PM

#

Gemma 3 be the base model

sonic tendon Apr 14, 2025, 3:14 PM

#

pondering whether or not to bet on oAI topping the lmarena leaderboard before the end of this month

torn mantle Apr 14, 2025, 3:26 PM

#

yea as i said

#

they will start slow

#

o3 is probably on last day

thorny drum Apr 14, 2025, 3:28 PM

#

i wonder what o3 pricings gonna be

#

do you think they've got it down to something feasible?

balmy mist Apr 14, 2025, 3:28 PM

#

keen beacon quasars are powered by super massive blackholes

ahh that makes sense lol

thorny drum Apr 14, 2025, 3:28 PM

#

or is it gonna be once a month for pro users + no api

balmy mist Apr 14, 2025, 3:28 PM

#

i wanted o3 or o4 mini

#

i wish i didnt test quasar already

thorny drum Apr 14, 2025, 3:29 PM

#

arc-agi had o3 low as like $200 per task and o3 high as like 100x(?) more?

balmy mist Apr 14, 2025, 3:29 PM

#

like they didnt make it better since openrouter right?

#

so its hard for me to care about it

#

especially since it was free on open router

#

and now i gotta pay for it lol

keen beacon Apr 14, 2025, 3:30 PM

#

balmy mist like they didnt make it better since openrouter right?

they probably have

#

there's a chance quasar on openrouter was mini or nano

balmy mist Apr 14, 2025, 3:30 PM

#

dont get me excited and have me blue balled again lol

#

anyone streaming it here?

#

we should make it a community event

#

so we can talk about it together lol

keen beacon Apr 14, 2025, 3:32 PM

#

keen beacon there's a chance quasar on openrouter was mini or nano

Very unlikely, The Verge mentioned 4.1 itself being a revamped GPT-4o

#

I benchmarked quasar and it lined up with chatgpt 4o latest

balmy mist Apr 14, 2025, 3:32 PM

#

lol

keen beacon Apr 14, 2025, 3:33 PM

#

The mini model is seemingly great if it's optimus

#

Gets seemingly lower scores on traditional benchmarks but it seems it's really good

keen beacon Apr 14, 2025, 3:33 PM

#

balmy mist dont get me excited and have me blue balled again lol

lmao @ blue balled

sonic tendon Apr 14, 2025, 3:34 PM

#

hmm

#

contemplating whether or not to bet on oAI topping the charts by april 30

keen beacon Apr 14, 2025, 3:34 PM

#

openai's endpoint for optimus is returning a 502 now

#

#

perhaps it's a sign

sonic tendon Apr 14, 2025, 3:34 PM

#

interesting

keen beacon Apr 14, 2025, 3:34 PM

#

Optimus was getting fcking bombarded yesterday lol

balmy mist Apr 14, 2025, 3:35 PM

#

i hope they are free still

keen beacon Apr 14, 2025, 3:35 PM

#

It was sooo slow

#

doesn't surprise me

keen beacon Apr 14, 2025, 3:35 PM

#

balmy mist i hope they are free still

it won't be

balmy mist Apr 14, 2025, 3:35 PM

#

lol

sonic tendon Apr 14, 2025, 3:35 PM

#

balmy mist i hope they are free still

probably not gonna last long once they publicly release it

keen beacon Apr 14, 2025, 3:35 PM

#

back up it just seems really slow

#

Optimus had zero rate limits too unlike quasar I think

#

when it first appeared it was lightning fast (as you'd expect from a mini variant) and now it's streaming at the same speed as gpt-4.5 does

sonic tendon Apr 14, 2025, 3:35 PM

#

odd - i'd be surprised if they had a demand spike at this particular time

keen beacon Apr 14, 2025, 3:36 PM

#

keen beacon when it first appeared it was lightning fast (as you'd expect from a mini varian...

Yeah no rate limit after 10 credits

balmy mist Apr 14, 2025, 3:36 PM

#

wow we took it for granted

sonic tendon Apr 14, 2025, 3:36 PM

#

guessing it's some internal stuff related to the new release(s)

balmy mist Apr 14, 2025, 3:36 PM

#

wait who is excited for today?

keen beacon Apr 14, 2025, 3:36 PM

#

keen beacon Yeah no rate limit after 10 credits

Quasar had rate limiting I think after 10 credits but it was unlimited

balmy mist Apr 14, 2025, 3:36 PM

#

like im trying to get excited as you guys are for quasar no longer being free

keen beacon Apr 14, 2025, 3:37 PM

#

I'm only excited for o4 mini tbh lol

sonic tendon Apr 14, 2025, 3:37 PM

#

balmy mist like im trying to get excited as you guys are for quasar no longer being free

lol, that's a fair way to put it

sonic tendon Apr 14, 2025, 3:37 PM

#

keen beacon I'm only excited for o4 mini tbh lol

yeah

balmy mist Apr 14, 2025, 3:37 PM

#

yeah me too

sonic tendon Apr 14, 2025, 3:37 PM

#

i will basically never use full o3

balmy mist Apr 14, 2025, 3:37 PM

#

i also want o3

#

if they give us pro

#

but i doubt that

#

they dont love us like that

keen beacon Apr 14, 2025, 3:38 PM

#

They probably will jump to o4 full before o3 pro maybe

balmy mist Apr 14, 2025, 3:38 PM

#

damn

#

is there a way to do what openai does with o1 for pro, with open source models?

#

also @sonic tendon you livestreaming the event right?

sonic tendon Apr 14, 2025, 3:39 PM

#

balmy mist also @sonic tendon you livestreaming the event right?

I plan on it, yeah

#

i'll let you guys know if i can't make it

balmy mist Apr 14, 2025, 3:40 PM

#

is this dude reliable?
https://x.com/koltregaskes/status/1911805596732477592

Kol Tregaskes (@koltregaskes) on X

$20k/month doctorate-level OpenAI models incoming!

OpenAI's upcoming o3 and o4-mini AI models synthesise knowledge from biology, physics, and engineering to generate novel ideas.

These reasoning models aim to aid scientists and companies with complex challenges like nuclear

keen beacon Apr 14, 2025, 3:40 PM

#

uhh

balmy mist Apr 14, 2025, 3:40 PM

#

we should all put up money together and buy the 20k version for the community

#

i got $20 for it

thorny drum Apr 14, 2025, 3:40 PM

#

dyt they're gonna announce full o4 benchmark scores

sonic tendon Apr 14, 2025, 3:40 PM

#

hmmmmm

keen beacon Apr 14, 2025, 3:41 PM

#

they'll do a preview of full o4 when they launch o4 mini

#

with benchmark scores

#

pretty likely

thorny drum Apr 14, 2025, 3:41 PM

#

pretty curious about that yea

keen beacon Apr 14, 2025, 3:41 PM

#

rumour has it o4 will move away from the 4o base

#

will probably use 4.1

sonic tendon Apr 14, 2025, 3:41 PM

#

feel like openai's gonna drop pretty quickly once people realize that they aren't releasing o3 yet, but i could be wrong

keen beacon Apr 14, 2025, 3:41 PM

#

so yeah will be interesting

sonic tendon Apr 14, 2025, 3:41 PM

#

keen beacon rumour has it o4 will move away from the 4o base

oh, interesting

thorny drum Apr 14, 2025, 3:42 PM

#

o4 pricing gonna be insanity lol

sonic tendon Apr 14, 2025, 3:42 PM

#

i was under the impression that 4.1 was mostly smaller models, but i dunno where i got that from

thorny drum Apr 14, 2025, 3:42 PM

#

$1/token type shi

keen beacon Apr 14, 2025, 3:42 PM

#

4.1 is the replacement for 4o

sonic tendon Apr 14, 2025, 3:42 PM

#

ah, right

keen beacon Apr 14, 2025, 3:42 PM

#

will likely be similar in size

#

It's the same size

#

4.1 mini replaces 4o mini as the small but relatively powerful model, and 4.1 nano is their "phone model" sort of like gemini nano is

thorny drum Apr 14, 2025, 3:43 PM

#

dyt 4.1 nano is gonna be the OS one?

keen beacon Apr 14, 2025, 3:43 PM

#

yeah

#

Oss one is not even trained yet lol

sonic tendon Apr 14, 2025, 3:43 PM

#

keen beacon Oss one is not even trained yet lol

wdym?

keen beacon Apr 14, 2025, 3:43 PM

#

keen beacon Oss one is not even trained yet lol

would like a source for this

balmy mist Apr 14, 2025, 3:44 PM

#

what is oss?

keen beacon Apr 14, 2025, 3:44 PM

#

Open Source Software

keen beacon Apr 14, 2025, 3:44 PM

#

keen beacon would like a source for this

Iirc they were still talking about how many params it was gonna be and they were gonna host a discussion about what it should be loll

#

I'm on my phone rn tho

#

oai overtook google

sonic tendon Apr 14, 2025, 3:46 PM

#

yeah people went crazy for openai pretty quickly once sam altman started tweeting

sonic tendon Apr 14, 2025, 3:46 PM

#

keen beacon oai overtook google

june? imo that's less surprising

keen beacon Apr 14, 2025, 3:46 PM

#

end of april still google

#

I would bet openai lol

#

On the April one maybe

#

woah

#

i just tested optimus alpha on a web design prompt

#

it did really well

#

better than i've ever seen it do

#

beat 2.5 pro and claude's 0-shot attempts imo

balmy mist Apr 14, 2025, 3:48 PM

#

keen beacon end of april still google

wait you can make money on this?

keen beacon Apr 14, 2025, 3:48 PM

#

Idk probably not it's not a chatgpt 4o variant/human preference version

balmy mist Apr 14, 2025, 3:48 PM

#

like betting on companies

#

brugg

#

someone send link please

sonic tendon Apr 14, 2025, 3:48 PM

#

keen beacon I would bet openai lol

i had 200 shares but sold at 20c lmaoo

#

i wonder what's going on with the spread here

torn mantle Apr 14, 2025, 3:48 PM

#

keen beacon oai overtook google

best time to bet on google

balmy mist Apr 14, 2025, 3:48 PM

#

😦

sonic tendon Apr 14, 2025, 3:48 PM

#

there is kalshi

balmy mist Apr 14, 2025, 3:48 PM

#

i hate usa

torn mantle Apr 14, 2025, 3:49 PM

#

balmy mist is this dude reliable? https://x.com/koltregaskes/status/1911805596732477592

yea

sonic tendon Apr 14, 2025, 3:49 PM

#

still, seems unusual to me

torn mantle Apr 14, 2025, 3:49 PM

#

balmy mist is this dude reliable? https://x.com/koltregaskes/status/1911805596732477592

i saw that too

sonic tendon Apr 14, 2025, 3:49 PM

#

oh, but 7k volume

balmy mist Apr 14, 2025, 3:49 PM

#

torn mantle yea

so you tryna put down for community membership?

torn mantle Apr 14, 2025, 3:49 PM

#

charging 20k is criminal

#

it just means they are brute forcing and not innovating

sonic tendon Apr 14, 2025, 3:49 PM

#

torn mantle i saw that too

yeahhhh i feel like this is never gonna happen in any meaningful way

keen beacon Apr 14, 2025, 3:50 PM

#

keen beacon beat 2.5 pro and claude's 0-shot attempts imo

sonic tendon Apr 14, 2025, 3:50 PM

#

i mean, not until we have one or two more paradigm shifts

torn mantle Apr 14, 2025, 3:50 PM

#

im more interested in google solutions

#

they have like a scientist model

#

which seems promising tbh

#

https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

Accelerating scientific breakthroughs with an AI co-scientist

#

this is the one

sonic tendon Apr 14, 2025, 3:50 PM

#

doubt it, but my impression is that o3 has a decent shot (based on @keen beacon 's testing)

keen beacon Apr 14, 2025, 3:51 PM

#

4.1 doesn't have the vibes that got chatgpt 4o to the top so yeah i don't think it'll be #1

#

o3 will be #1 with style control but without im not sure

#

I still think that private model is o4 mini lol

#

eh idk

#

Hopefully I'm right

torn mantle Apr 14, 2025, 3:51 PM

#

keen beacon I still think that private model is o4 mini lol

which one?

#

quasar? optimus?

#

both are mid

keen beacon Apr 14, 2025, 3:51 PM

#

None of them

sonic tendon Apr 14, 2025, 3:52 PM

#

keen beacon 4.1 doesn't have the vibes that got chatgpt 4o to the top so yeah i don't think ...

maybe that's something that could possibly happen w/ a chat finetune? would likely take a while though

keen beacon Apr 14, 2025, 3:52 PM

#

It's another model that @keen beacon has

sonic tendon Apr 14, 2025, 3:52 PM

#

i agree, tho, at least in the near future

drifting thorn Apr 14, 2025, 3:52 PM

#

Okay the knowledge base in Cherry Studio worked better with OpenRouter API Gemini 2.5 Pro

sonic tendon Apr 14, 2025, 3:52 PM

#

keen beacon It's another model that @keen beacon has

wait, whar?

keen beacon Apr 14, 2025, 3:52 PM

#

sonic tendon maybe that's something that could possibly happen w/ a chat finetune? would like...

that's a point.. i wonder if there will be a gpt-4.1 and a chatgpt-4.1

#

like they have 4o and chatgpt-4o

sonic tendon Apr 14, 2025, 3:53 PM

#

keen beacon that's a point.. i wonder if there will be a gpt-4.1 and a chatgpt-4.1

yeah, that's what i was thinking

keen beacon Apr 14, 2025, 3:53 PM

#

keen beacon that's a point.. i wonder if there will be a gpt-4.1 and a chatgpt-4.1

Bro chatgpt 4o latest is already on 4.1 base

#

doubt

#

chatgpt 4o is still 4o

sonic tendon Apr 14, 2025, 3:53 PM

#

keen beacon I still think that private model is o4 mini lol

what private model are you guys talking about

keen beacon Apr 14, 2025, 3:53 PM

#

keen beacon doubt

??? Knowledge cut off lol

#

that doesn't mean it's 4.1

sonic tendon Apr 14, 2025, 3:54 PM

#

keen beacon ??? Knowledge cut off lol

i saw "arena-chatgpt-4o-2025-something-something" on lmarena pretty recently - i think they're still doing extended pretraining

#

doubt it's 4.1

#

plus, doesn't the website say that it's 4o? would be weird to lie about that

drifting thorn Apr 14, 2025, 3:54 PM

#

@keen beacon What is changed in 0326 so that it performed better?

keen beacon Apr 14, 2025, 3:54 PM

#

keen beacon that doesn't mean it's 4.1

Same benchmarks, cut off (same differing pretraining), the verge directly mentioned it's a revamped 4o lolololl

keen beacon Apr 14, 2025, 3:54 PM

#

sonic tendon i saw "arena-chatgpt-4o-2025-something-something" on lmarena pretty recently - i...

It's the same as the last chatgpt 4o latest model afaik just with a weird name

keen beacon Apr 14, 2025, 3:54 PM

#

sonic tendon i saw "arena-chatgpt-4o-2025-something-something" on lmarena pretty recently - i...

Nope.

#

Their cpt was done ages ago

drifting thorn Apr 14, 2025, 3:55 PM

#

What is changed in 0326 so that it performed better?

sonic tendon Apr 14, 2025, 3:55 PM

#

oh hmm

keen beacon Apr 14, 2025, 3:55 PM

#

drifting thorn What is changed in 0326 so that it performed better?

It wasn't just 0326

#

Post December chatgpt 4o latest was on a cptd base model

sonic tendon Apr 14, 2025, 3:55 PM

#

you mean, the biggest 4.1 is just gonna be a new 4o? or am i misunderstanding

keen beacon Apr 14, 2025, 3:55 PM

#

Yes

keen beacon Apr 14, 2025, 3:55 PM

#

drifting thorn @keen beacon What is changed in 0326 so that it performed better?

they did further training on R1/possibly R2 reasoning traces

#

Bro

buoyant plaza Apr 14, 2025, 3:56 PM

#

shadebrook is a banger model for creative

keen beacon Apr 14, 2025, 3:56 PM

#

shadebrook is pretty bad in my experience

drifting thorn Apr 14, 2025, 3:56 PM

#

Oops

keen beacon Apr 14, 2025, 3:56 PM

#

but tbf ive only really used it for code

drifting thorn Apr 14, 2025, 3:56 PM

#

buoyant plaza shadebrook is a banger model for creative

Really?

sonic tendon Apr 14, 2025, 3:56 PM

#

torn mantle https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co...

+1 for at least being way less hype-dense and almost definitely overpromising

buoyant plaza Apr 14, 2025, 3:57 PM

#

lol i just use it to turn random ideas into trap songs with suno

keen beacon Apr 14, 2025, 3:57 PM

#

slight aside but if sam isn't at the livestream the model sucks confirmed

drifting thorn Apr 14, 2025, 3:57 PM

#

My experience with OpenRouter API Gemini 2.5 Pro is jumping in between 400, 502, furiously short content and the content I want

sonic tendon Apr 14, 2025, 3:58 PM

#

keen beacon slight aside but if sam isn't at the livestream the model sucks confirmed

good insight lol

drifting thorn Apr 14, 2025, 3:59 PM

#

HOW MANY 502 IS OPENROUTER GONNA GIVE ME!!!!!!
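A 502 is a transient gateway error, so the usual workaround is to retry with exponential backoff rather than resend immediately. Here is a minimal sketch, assuming a hypothetical zero-argument `send` callable that wraps the actual API request and returns `(status_code, body)`. The names `call_with_retry`, `send`, and `RETRYABLE` are illustrative and not part of any OpenRouter client.

```python
import random
import time

# Transient gateway errors worth retrying. A 400 is a client error
# (bad request), so resending the same payload will not help.
RETRYABLE = {502, 503, 504}


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body) seen.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so retries do not arrive in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    return status, body
```

The "furiously short content" case would still come back as a success status, so it needs a separate check on the body (for example a minimum length) before accepting the response.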

keen beacon Apr 14, 2025, 3:59 PM

#

normally they put the stream up and ready to go on youtube 20-30 mins before it's due to start

#

and they always put a description with

#

(a) what they're announcing (normally) and (b) who's there

sonic tendon Apr 14, 2025, 3:59 PM

#

keen beacon they did further training on R1/possibly R2 reasoning traces

oh yeah, i could see that

balmy mist Apr 14, 2025, 3:59 PM

#

Bruhh he better be there

keen beacon Apr 14, 2025, 3:59 PM

#

If vuyp is saying o4 full will have the new base and o3 is not, how tf does 4o (which is o3s base model) have the new knowledge in the cptd base. Tbh. You are not updating a cut off with a simple finetune

balmy mist Apr 14, 2025, 3:59 PM

#

It better have kept him up last night

#

Idk if that's a good thing or not tho

keen beacon Apr 14, 2025, 4:00 PM

#

given memory was keeping him up i don't think it's a good indicator anymore

#

So it's either 4.1 (which meant they retrained o3) or o4 mini based on (4.1 mini)

#

if it is any use, this private model has been on the platform i am on for roughly 1 and a half months

drifting thorn Apr 14, 2025, 4:01 PM

#

I’m afraid the hallucinations on new models are gonna be crazy

keen beacon Apr 14, 2025, 4:02 PM

#

4.1 mini was trained fairly recently I think. It was pretrained from scratch. We have had zero checkpoints of it unlike chatgpt 4o latest which was a cpt of 4o. Surely openai would've liked to compete with Gemini 2 flash

drifting thorn Apr 14, 2025, 4:02 PM

#

Original Deepseek V3 has 3.1% hallucination rate, R1 has 14.3%

balmy mist Apr 14, 2025, 4:02 PM

#

https://x.com/btibor91/status/1911812243525873910

Tibor Blaho (@btibor91) on X

OpenAI livestream "New models in the API"

"Join Michelle Pokrass, Ishaan Singal, and Kevin Weil as they introduce and demo our latest API models."

#

gg

keen beacon Apr 14, 2025, 4:02 PM

#

there it is

#

ggs

balmy mist Apr 14, 2025, 4:02 PM

#

imma skip today

keen beacon Apr 14, 2025, 4:02 PM

#

I was right lol

drifting thorn Apr 14, 2025, 4:02 PM

#

When R1 is used to train Deepseek V3 0324, its hallucination rate increases to 8.6%

sonic tendon Apr 14, 2025, 4:02 PM

#

keen beacon given memory was keeping him up i don't think it's a good indicator anymore

maybe he has a sleep-wake disorder

#

?

balmy mist Apr 14, 2025, 4:03 PM

#

sama's plan smh

keen beacon Apr 14, 2025, 4:03 PM

#

4.1 to start the hype then launch the reasoning models later in the week

balmy mist Apr 14, 2025, 4:03 PM

#

he stay getting me

sonic tendon Apr 14, 2025, 4:03 PM

#

keen beacon if it is any use, this private model has been on the platform i am on for roughl...

wait, what's the second private model you're referring to?

balmy mist Apr 14, 2025, 4:03 PM

#

im happy they told us now tho

keen beacon Apr 14, 2025, 4:03 PM

#

sonic tendon maybe he has a sleep-wake disorder

relatable 🤝

sonic tendon Apr 14, 2025, 4:04 PM

#

keen beacon relatable 🤝

ditto

keen beacon Apr 14, 2025, 4:04 PM

#

sonic tendon wait, what's the second private model you're referring to?

no no i'm talking about the private model i've always had access to

#

Yes

drifting thorn Apr 14, 2025, 4:04 PM

#

Finally OpenRouter is outputting my novel arghhhhhh

sonic tendon Apr 14, 2025, 4:05 PM

#

related lol

sonic tendon Apr 14, 2025, 4:05 PM

#

keen beacon no no i'm talking about the private model i've always had access to

oh, o3-med?

keen beacon Apr 14, 2025, 4:05 PM

#

It's already live I believe

#

I just checked

keen beacon Apr 14, 2025, 4:05 PM

#

sonic tendon oh, o3-med?

possibly

keen beacon Apr 14, 2025, 4:05 PM

#

keen beacon It's already live I believe

wat

#

proof

keen beacon Apr 14, 2025, 4:06 PM

#

sonic tendon related lol

let me find one of my screenshots

#

Yup it's under 4o now

#

?

#

I will get on my computer and show u what I mean

keen beacon Apr 14, 2025, 4:06 PM

#

sonic tendon related lol

not entirely sure what happened with this one

sonic tendon Apr 14, 2025, 4:07 PM

#

if you don't mind me polymarketposting again: people are saying that this guy might be an openAI employee/insider

sonic tendon Apr 14, 2025, 4:07 PM

#

keen beacon not entirely sure what happened with this one

what happened 😭 😭 😭

sonic tendon Apr 14, 2025, 4:08 PM

#

sonic tendon if you don't mind me polymarketposting again: people are saying that this guy mi...

just showed up with several grand and started trading

sonic tendon Apr 14, 2025, 4:09 PM

#

keen beacon not entirely sure what happened with this one

but damn, that sucks

#

are most of your nights like that?

keen beacon Apr 14, 2025, 4:09 PM

#

sonic tendon if you don't mind me polymarketposting again: people are saying that this guy mi...

lmao

keen beacon Apr 14, 2025, 4:09 PM

#

sonic tendon are most of your nights like that?

no i promise 😭

sonic tendon Apr 14, 2025, 4:09 PM

#

keen beacon no i promise 😭

i am sort of curious what your typical sleep cycle looks like now

drifting thorn Apr 14, 2025, 4:10 PM

#

balmy mist Apr 14, 2025, 4:10 PM

#

sonic tendon if you don't mind me polymarketposting again: people are saying that this guy mi...

i am still shocked you can bet on stuff like that, its like we are in movie or video game at this point lol

drifting thorn Apr 14, 2025, 4:10 PM

#

Gotta sleep rn

sonic tendon Apr 14, 2025, 4:10 PM

#

drifting thorn Gotta sleep rn

gn!

#

cya later

sonic tendon Apr 14, 2025, 4:10 PM

#

balmy mist i am still shocked you can bet on stuff like that, its like we are in movie or v...

yeah, prediction markets are fun

keen beacon Apr 14, 2025, 4:11 PM

#

drifting thorn

yeah our pfps go hard i know 😎

keen beacon Apr 14, 2025, 4:11 PM

#

drifting thorn Gotta sleep rn

goodnight

#

this is one of my questions that only quasar and anonymous chatbot got, and i use to differentiate between 4.1 and chatgpt 4o latest:
(yes i know its not a one off, this was extremely consistent - check lmsys chatgpt 4o latest on 0 temp to check yourself)

#

first chatgpt one was taken days ago

#

the last one was just now

sonic tendon Apr 14, 2025, 4:11 PM

#

sonic tendon i am sort of curious what your typical sleep cycle looks like now

my circadian rhythm is normal most of the time, and then i have weeklong periods where i just get up at 2-4 in the morning almost every night with a ton of energy

keen beacon Apr 14, 2025, 4:12 PM

#

i told you guys

#

omfg

#

no one listens

#

im not saying random shi1t

#

if they're going through the whole thing of calling it gpt-4.1 why would they have it under 4o

#

bro

sonic tendon Apr 14, 2025, 4:12 PM

#

sonic tendon my circadian rhythm is normal most of the time, and then i have weeklong periods...

in an ideal world, i could probably integrate it into my schedule and just get up later, but atm i sorta have to force myself to go back to sleep as quickly as possible so i'm not a zombie later in the afternoon

keen beacon Apr 14, 2025, 4:12 PM

#

its literally an updated 4o

#

ok dawg

#

but you didn't answer my question

#

because it will be confusing when o4 mini is out

balmy mist Apr 14, 2025, 4:13 PM

#

lol

keen beacon Apr 14, 2025, 4:13 PM

#

u want 4o and o4 mini in the model selector??

#

it's even more confusing if they call a model gpt-4.1 in the api and 4o in the product

balmy mist Apr 14, 2025, 4:13 PM

#

i hate open ai

keen beacon Apr 14, 2025, 4:13 PM

#

again

keen beacon Apr 14, 2025, 4:13 PM

#

keen beacon it's even more confusing if they call a model gpt-4.1 in the api and 4o in the p...

dw they will change the name

#

they just need to reset with their naming scheme

#

its just live under 4o rn

#

it sucks so bad

balmy mist Apr 14, 2025, 4:13 PM

#

they should just use names at this point like quasar

sonic tendon Apr 14, 2025, 4:14 PM

#

they should just start stealing product names from other companies

keen beacon Apr 14, 2025, 4:14 PM

#

everyone has bad naming tbh lol

sonic tendon Apr 14, 2025, 4:14 PM

#

"Introducing our newest hybrid model, Gmail"

balmy mist Apr 14, 2025, 4:14 PM

#

keen beacon its just live under 4o rn

wait you are saying that 4.1 is currently on 4o right now?

#

bruhh

keen beacon Apr 14, 2025, 4:14 PM

#

lmao sam isn't there

#

confirmed bad

#

check to see if its rolled out to you by asking the question

sonic tendon Apr 14, 2025, 4:15 PM

#

i heard you, i was just surprised

sonic tendon Apr 14, 2025, 4:15 PM

#

keen beacon lmao sam isn't there

rip

keen beacon Apr 14, 2025, 4:15 PM

#

who won the 2024 Solomon Islands general election

#

only quasar/anonymous chatbot/4.1 gets it

#

they havent changed the name on it yet its still labeled 4o
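The knowledge-cutoff probe described above can be sketched roughly as follows. This assumes a hypothetical `ask` callable (question in, answer text out) that wraps whichever chat API is being tested, called at temperature 0 with web search disabled. The expected keywords are supplied by the caller; none are hard-coded here.

```python
def answer_hits(ask, question, expected_keywords):
    """Return True if the model's answer mentions any expected keyword.

    ask is a hypothetical callable: question (str) -> answer text (str).
    Matching is a case-insensitive substring check.
    """
    answer = ask(question).lower()
    return any(keyword.lower() in answer for keyword in expected_keywords)


def knows_consistently(ask, question, expected_keywords, runs=3):
    """Repeat the probe; count the model as 'knowing' only if every run hits.

    Repeating guards against a single lucky or unlucky generation.
    """
    return all(answer_hits(ask, question, expected_keywords) for _ in range(runs))
```

A substring match is crude: a model that names the right answer while hedging ("I don't know if X won") would still count as a hit, so the raw answers are worth reading too.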

balmy mist Apr 14, 2025, 4:16 PM

#

keen beacon `who won the 2024 Solomon Islands general election`

it just searches the web tho

keen beacon Apr 14, 2025, 4:16 PM

#

balmy mist it just searches the web tho

disable web search

#

bruh 🤣

balmy mist Apr 14, 2025, 4:17 PM

#

i did lol

#

it still searches the web

keen beacon Apr 14, 2025, 4:17 PM

#

start a new chat

balmy mist Apr 14, 2025, 4:17 PM

#

Screenshot_2025-04-14_at_12.17.18_PM.png

#

okay

keen beacon Apr 14, 2025, 4:17 PM

#

if this is 4.1 under 4o on chatgpt it seems worse than optimus

sonic tendon Apr 14, 2025, 4:17 PM

#

sonic tendon if you don't mind me polymarketposting again: people are saying that this guy mi...

looking at their trades, they seem pretty incompetent though lol

keen beacon Apr 14, 2025, 4:17 PM

#

balmy mist

#

it did a lot worse on my frontend task

#

@keen beacon optimus can outperform 4.1

#

well then what on earth is optimus

balmy mist Apr 14, 2025, 4:18 PM

#

keen beacon

ahh i see thanks, i ended up doing this lol

#

Screenshot_2025-04-14_at_12.18.21_PM.png

keen beacon Apr 14, 2025, 4:18 PM

#

keen beacon well then what on earth is optimus

4.1 mini i think, it has slightly lower benchmarks

sonic tendon Apr 14, 2025, 4:18 PM

#

keen beacon well then what on earth is optimus

lol

keen beacon Apr 14, 2025, 4:18 PM

#

balmy mist

yes its not rolled out to you

keen beacon Apr 14, 2025, 4:19 PM

#

keen beacon well then what on earth is optimus

it has lower aider scores and the gpqa diamond i measured is much lower than 4o

balmy mist Apr 14, 2025, 4:19 PM

#

wild you have pro?

keen beacon Apr 14, 2025, 4:19 PM

#

na

balmy mist Apr 14, 2025, 4:19 PM

#

you just valid with open ai?

keen beacon Apr 14, 2025, 4:19 PM

#

🤣

#

u are getting unlucky with roll outs man lol

#

no veo 2 etc

balmy mist Apr 14, 2025, 4:20 PM

#

ikr

#

bruhh

sage raptor Apr 14, 2025, 4:20 PM

#

lol its 4.1 today

balmy mist Apr 14, 2025, 4:20 PM

#

i hate this dude lol:
https://x.com/iruletheworldmo/status/1911816451880517802

🍓🍓🍓 (@iruletheworldmo) on X

trust me. tune in.

it’s a big one.

#

lmaoooo

#

imma retweet this

keen beacon Apr 14, 2025, 4:20 PM

#

if you used quasar and optimus prime u already used 4.1/4.1 mini for free 🤣

balmy mist Apr 14, 2025, 4:21 PM

#

gotta get everyone else excited so we can get teased together

#

wild what if its an even better model

#

that is the ebst coder alive?

#

best*

#

what if strawberry man is right

#

i believe in mr.strawberry again, trust the process

keen beacon Apr 14, 2025, 4:23 PM

#

i doubt thats being released today xd

#

strawberry man is just a troll

keen beacon Apr 14, 2025, 4:23 PM

#

balmy mist what if strawberry man is right

argh

balmy mist Apr 14, 2025, 4:23 PM

#

lmaoo

keen beacon Apr 14, 2025, 4:23 PM

#

the grifter himself

balmy mist Apr 14, 2025, 4:24 PM

#

dude got like 72k follows on some bs

#

had the whole space by their balls

#

that was actually funny times

#

anyone was on those spaces?

#

that was before o1

#

seems like forever ago

#

wow

#

imma become a grifter now

#

yeah i couldn't use it all morning

#

but yo he actually might have serious connects to open ai, maybe he's somebody's kid that works there lol

#

https://x.com/iruletheworldmo/status/1870194357732814956

🍓🍓🍓 (@iruletheworldmo) on X

we will have o4 in april

with similar jumps in intelligence.

#

this was wild predictions

keen beacon Apr 14, 2025, 4:26 PM

#

balmy mist https://x.com/iruletheworldmo/status/1870194357732814956

he just got lucky

keen beacon Apr 14, 2025, 4:27 PM

#

balmy mist but yo he actually might have serious connects to open ai, maybe he's somebody's kid...

he was doxxed i think, he isnt

keen fulcrum Apr 14, 2025, 4:27 PM

#

Expected as they launch new servers
increasing their capacity for upcoming models

balmy mist Apr 14, 2025, 4:27 PM

#

like people know who he is

#

they exposed him

thorny drum Apr 14, 2025, 4:27 PM

#

its just slow i think

Screenshot_2025-04-14_at_12.27.31_PM.png

keen beacon Apr 14, 2025, 4:27 PM

#

weirdos

balmy mist Apr 14, 2025, 4:27 PM

#

why wouldnt they, he was trolling us hard

#

had me missing work for his nonsense

#

why would you troll anyone?

keen fulcrum Apr 14, 2025, 4:28 PM

#

People who dox will get rate limited

balmy mist Apr 14, 2025, 4:28 PM

#

lol

keen beacon Apr 14, 2025, 4:28 PM

#

trolling and doxxing are way different in terms of severity i think

balmy mist Apr 14, 2025, 4:28 PM

#

keen beacon trolling and doxxing are way different in terms of severity i think

im just saying anything is possible

#

so be careful what you do online

#

cause you can be doxxed easily

#

people kill people for less in our world, so im not shocked that anyone would doxx or do anything online

keen beacon Apr 14, 2025, 4:29 PM

#

oh hello 👀

thorny drum Apr 14, 2025, 4:30 PM

#

4o = 4.1 now?

balmy mist Apr 14, 2025, 4:30 PM

#

strawberry man got me excited

keen beacon Apr 14, 2025, 4:30 PM

#

thorny drum 4o = 4.1 now?

its live under 4o on chatgpt if its rolled out to you

#

its supposed to be named 4.1 (which it will be renamed soon enough i think), but its just an updated 4o

balmy mist Apr 14, 2025, 4:31 PM

#

lol

#

i really hope r2 comes out this week

balmy mist Apr 14, 2025, 4:32 PM

#

keen beacon he just got lucky

u really think he got lucky tho?

thorny drum Apr 14, 2025, 4:32 PM

#

wait wdyt o3 pricing will be

balmy mist Apr 14, 2025, 4:32 PM

#

that is crazy accurate tho

#

like what are the odds

thorny drum Apr 14, 2025, 4:32 PM

#

like o3 low was twice the cost of o1 pro

balmy mist Apr 14, 2025, 4:32 PM

#

thorny drum like o3 low was twice the cost of o1 pro

20k

keen beacon Apr 14, 2025, 4:32 PM

#

lmao i got it again

#

is it not rolled out to you yet? or is this different

#

Shrug i just keep getting parallel gens for feedback on 4o

balmy mist Apr 14, 2025, 4:33 PM

#

Screenshot_2025-04-14_at_12.33.38_PM.png

keen beacon Apr 14, 2025, 4:33 PM

#

hallucination

#

never ask a model what it is

balmy mist Apr 14, 2025, 4:34 PM

#

didnt we do that with quasar tho?

keen beacon Apr 14, 2025, 4:34 PM

#

not rly

balmy mist Apr 14, 2025, 4:35 PM

#

and hallucinations are not a bad thing, they are the key to dreaming and solving things we dont know, just need controlled hallucinations

keen beacon Apr 14, 2025, 4:35 PM

#

i agree lol

balmy mist Apr 14, 2025, 4:35 PM

#

there is always truth to an hallucination

#

like it stems from something

keen beacon Apr 14, 2025, 4:35 PM

#

yes

#

i would go on about it but ya im not gonna rant about it

balmy mist Apr 14, 2025, 4:36 PM

#

lol

keen beacon Apr 14, 2025, 4:37 PM

#

It's openai vs google

#

Bruh

novel flame Apr 14, 2025, 4:38 PM

#

Nvm I get it now.

keen beacon Apr 14, 2025, 4:38 PM

#

lmao so they really do want to charge 20k

sonic tendon Apr 14, 2025, 4:39 PM

#

@balmy mist train got delayed, so I'll probably be a bit late

sonic tendon Apr 14, 2025, 4:39 PM

#

keen beacon never ask a model what it is

I've had Google models claim to be Claude, and DS claim to be OpenAI

keen beacon Apr 14, 2025, 4:39 PM

#

By the time they have a product like that they won't sell it

keen fulcrum Apr 14, 2025, 4:40 PM

#

When will main o4 release

keen fulcrum Apr 14, 2025, 4:41 PM

#

keen beacon lmao so they really do want to charge 20k

Depending on competition

keen beacon Apr 14, 2025, 4:41 PM

#

next month probs

#

actually no

#

lmao sorry i misread

sonic tendon Apr 14, 2025, 4:41 PM

#

keen fulcrum When will main o4 release

as a rough guess, maybe the same delay between o3-mini and o3

keen beacon Apr 14, 2025, 4:41 PM

#

probably like june or so

#

maybe july

novel flame Apr 14, 2025, 4:42 PM

#

The 20k price tag smells like a classic con: “Buy this money printing machine, only $20k!! With it, you can print as much money as you like!”

If they had an AI which could create arbitrary novel discoveries warranting a price tag of $20k, they would keep it to themselves and print money themselves.

keen fulcrum Apr 14, 2025, 4:42 PM

#

novel flame The 20k price tag smells like a classic con: “Buy this money printing machine, o...

If it makes me money I will pay it

sonic tendon Apr 14, 2025, 4:42 PM

#

novel flame The 20k price tag smells like a classic con: “Buy this money printing machine, o...

yeah this seems like hype for investors more than anything else

balmy mist Apr 14, 2025, 4:43 PM

#

keen beacon next month probs

no way, i dont think they gonna release main o4, i thought they were going straight to gpt5 next?

keen fulcrum Apr 14, 2025, 4:43 PM

#

Will hurt my portfolio if it tends to lose and not worth the cost

sonic tendon Apr 14, 2025, 4:43 PM

#

keen fulcrum Will hurt my portfolio if it tends to lose and not worth the cost

portfolio?

#

in what context

balmy mist Apr 14, 2025, 4:43 PM

#

novel flame The 20k price tag smells like a classic con: “Buy this money printing machine, o...

how do you know that would sell tho?

novel flame Apr 14, 2025, 4:43 PM

#

keen fulcrum If it makes me money I will pay it

Except it’s payment up front. That’s how the con works

balmy mist Apr 14, 2025, 4:43 PM

#

we would be oversaturated with stuff like that

keen fulcrum Apr 14, 2025, 4:44 PM

#

sonic tendon portfolio?

20k is one chinese car

keen beacon Apr 14, 2025, 4:44 PM

#

balmy mist no way, i dont think they gonna release main o4, i thought they were going strai...

#general message

novel flame Apr 14, 2025, 4:47 PM

#

keen fulcrum Depending on competition

This. I would love to see this $20k product drop only to have R2 launch a week later with near-equal performance, forcing OpenAI to drop the price 100x

ember rapids Apr 14, 2025, 4:47 PM

#

I hope they preview the benchmarks for full o4

keen beacon Apr 14, 2025, 4:48 PM

#

they will in the o3/o4 mini livestream

torn mantle Apr 14, 2025, 4:50 PM

#

recently im noticing some typo mistakes from gemini 2.5 pro

#

which is kinda weird

#

its the 3rd time already

balmy mist Apr 14, 2025, 4:53 PM

#

sonic tendon @balmy mist train got delayed, so I'll probably be a bit late

let me know when you get here, ill do it until you get here, im tryna work at the same time lol

#

https://x.com/OpenAIDevs/status/1911824728211272169

OpenAI Developers (@OpenAIDevs) on X

quasar
/ˈkweɪ.zɑːr/ noun

A very energetic and distant active galactic nucleus, powered by a supermassive black hole that emits exceptionally large amounts of energy across the electromagnetic spectrum. Short for quasi-stellar radio source.

#

im kinda sad that we are so early to these models

#

we are not surprised anymore

#

i gotta take a vaca for a month or so and comeback to agi lmao

torn mantle Apr 14, 2025, 4:56 PM

#

marketing hype geniuses

#

the model is bad

balmy mist Apr 14, 2025, 4:56 PM

#

lmaooo

#

lets pretend its actually good

#

it is fast tho

torn mantle Apr 14, 2025, 4:56 PM

#

im just here for the drama tbh

balmy mist Apr 14, 2025, 4:57 PM

#

i have a feeling they are going to do something big like strawberry man said

torn mantle Apr 14, 2025, 4:57 PM

#

hopefully google will pull out a battle model release

torn mantle Apr 14, 2025, 4:57 PM

#

balmy mist i have a feeling they are going to do something big like strawberry man said

who

#

that guy isnt reliable

#

never was

balmy mist Apr 14, 2025, 4:57 PM

#

lmaoooo

torn mantle Apr 14, 2025, 4:57 PM

#

hes a grifter

balmy mist Apr 14, 2025, 4:57 PM

#

he did predict o4 in april tho

torn mantle Apr 14, 2025, 4:57 PM

#

an engagement farmer

#

its kinda obvious tbh

balmy mist Apr 14, 2025, 4:57 PM

#

okay when is o5 coming?

torn mantle Apr 14, 2025, 4:58 PM

#

after o4

keen beacon Apr 14, 2025, 4:58 PM

#

lmao conveniently i now have to go 🙄

balmy mist Apr 14, 2025, 4:58 PM

#

lmaooooooooo

keen beacon Apr 14, 2025, 4:58 PM

#

cya in a bit gang

torn mantle Apr 14, 2025, 4:58 PM

#

you know why its obvious?

#

because all other labs are catching up

#

and openai cant just rely on o3 series

keen fulcrum Apr 14, 2025, 4:58 PM

#

Grok 4 will be great

torn mantle Apr 14, 2025, 4:58 PM

#

they already have it implemented on deep research

#

grok 4

#

will be sh1t

#

dont get me started

#

omg

#

just the thought of using grok 3 again is making me so mad

balmy mist Apr 14, 2025, 4:59 PM

#

i mean there is a market for predicting launches

torn mantle Apr 14, 2025, 4:59 PM

#

wasted billions on nothing

keen fulcrum Apr 14, 2025, 4:59 PM

#

Grok 3 reasoning is the best

balmy mist Apr 14, 2025, 4:59 PM

#

if you can get the month it comes out tell me

#

i wanna make money

torn mantle Apr 14, 2025, 4:59 PM

#

xd

#

you want my prediction

#

ok let me think

balmy mist Apr 14, 2025, 4:59 PM

#

yeah, i got my notes ready

#

omgg

#

it started

torn mantle Apr 14, 2025, 5:00 PM

#

based on leaks we will definitely have o3-full this month ( thats for sure xd )

balmy mist Apr 14, 2025, 5:00 PM

#

what if sama pulls up?

#

last minute

torn mantle Apr 14, 2025, 5:00 PM

#

balmy mist what if sama pulls up?

wont matter tbh

balmy mist Apr 14, 2025, 5:00 PM

#

wait

torn mantle Apr 14, 2025, 5:00 PM

#

im interested in benchmarks tho

balmy mist Apr 14, 2025, 5:00 PM

#

if it truly is the smallest ever that is good

#

1 mill tokens

#

wow

#

omgg they cooking

#

jk lol

#

but i am curious of the size of nano

#

bruhh

torn mantle Apr 14, 2025, 5:01 PM

#

so not much MMLU diff

keen fulcrum Apr 14, 2025, 5:01 PM

#

torn mantle Apr 14, 2025, 5:02 PM

#

between 4o and 4.1

balmy mist Apr 14, 2025, 5:02 PM

#

im turning this crap off

keen fulcrum Apr 14, 2025, 5:02 PM

#

Latency

torn mantle Apr 14, 2025, 5:02 PM

#

by latency

#

i see

#

so its MMLU/latency

#

thats a weird ratio

balmy mist Apr 14, 2025, 5:02 PM

#

NA for nano?

#

wtf

keen fulcrum Apr 14, 2025, 5:03 PM

#

torn mantle Apr 14, 2025, 5:04 PM

#

they are talking about pricing a lot

balmy mist Apr 14, 2025, 5:04 PM

#

nano better free

#

omgg nightwhisper is 4.1

#

yooooo

keen beacon Apr 14, 2025, 5:04 PM

#

Omg

balmy mist Apr 14, 2025, 5:04 PM

#

sama's plan

#

lmaooo

#

jk

torn mantle Apr 14, 2025, 5:05 PM

#

nah

#

stop it

#

nah

#

its not

keen beacon Apr 14, 2025, 5:05 PM

#

Lmao

torn mantle Apr 14, 2025, 5:06 PM

#

im trying that prompt on gemini 2.5 pro

balmy mist Apr 14, 2025, 5:07 PM

#

i actually like nano

#

i wonder how big it is, but they are prob not gonna tell us

keen fulcrum Apr 14, 2025, 5:08 PM

#

balmy mist Apr 14, 2025, 5:09 PM

#

lmaoo the accuracy

keen beacon Apr 14, 2025, 5:10 PM

#

Make it look better lol

#

If they compared to chatgpt 4o latest it would be the same ahahaha

keen fulcrum Apr 14, 2025, 5:10 PM

#

Multimodal

balmy mist Apr 14, 2025, 5:10 PM

#

mini was quasar?

#

or optimus?

keen beacon Apr 14, 2025, 5:10 PM

#

Optimus

balmy mist Apr 14, 2025, 5:10 PM

#

hmm

#

wild do you know how big quasar is?

#

like in parameters?

keen beacon Apr 14, 2025, 5:11 PM

#

Should be same as 4o so 200b

balmy mist Apr 14, 2025, 5:11 PM

#

lol

oblique flint Apr 14, 2025, 5:11 PM

#

4.1 looks like quite a solid model already. Imagine if they add reasoning on top of it

brittle tiger Apr 14, 2025, 5:12 PM

#

A little sus to claim sota on a benchmark when you're barely beating 1.5 Pro and 2.5 hasnt been tested

balmy mist Apr 14, 2025, 5:12 PM

#

wait so we did not get to test out 4.1 right? @keen beacon

#

only the mini and nano

keen beacon Apr 14, 2025, 5:12 PM

#

4.1 was quasar

balmy mist Apr 14, 2025, 5:12 PM

#

ohhhh

keen beacon Apr 14, 2025, 5:12 PM

#

4.1 mini is optimus

balmy mist Apr 14, 2025, 5:13 PM

#

that makes sense, i understand now

golden ocean Apr 14, 2025, 5:13 PM

#

whats the point of 4.1 if theres 4.5

balmy mist Apr 14, 2025, 5:13 PM

#

okay so what do you think the size is of nano?

#

like 7b

keen beacon Apr 14, 2025, 5:13 PM

#

Maybe the same size as 4o mini

#

Tbh I have no idea for mini and nano

balmy mist Apr 14, 2025, 5:13 PM

#

if it was really low they would showcase that tbh

#

please be free

keen beacon Apr 14, 2025, 5:14 PM

#

balmy mist if it was really low they would showcase that tbh

Yes if it was 7b they'd advertise

balmy mist Apr 14, 2025, 5:14 PM

#

nice

keen beacon Apr 14, 2025, 5:14 PM

#

4.1 mini is a really good deal

torn mantle Apr 14, 2025, 5:14 PM

#

awkward silence

keen beacon Apr 14, 2025, 5:15 PM

#

It can sometimes beat 4.1

balmy mist Apr 14, 2025, 5:15 PM

#

they were so scared lol

keen beacon Apr 14, 2025, 5:15 PM

#

Apparently even if the benchmarks are lower

balmy mist Apr 14, 2025, 5:15 PM

#

yeah optimus is a good model

oblique flint Apr 14, 2025, 5:17 PM

#

nano is exactly the same price as 2.0 flash hmm

balmy mist Apr 14, 2025, 5:18 PM

#

mini is a no brainer

torn mantle Apr 14, 2025, 5:18 PM

#

google will drop their new model probably

balmy mist Apr 14, 2025, 5:18 PM

#

its the highlight of today

#

40% faster

keen beacon Apr 14, 2025, 5:19 PM

#

im back

#

hello

sonic tendon Apr 14, 2025, 5:19 PM

#

keen beacon lmao conveniently i now have to go 🙄

same

keen fulcrum Apr 14, 2025, 5:19 PM

#

keen beacon Apr 14, 2025, 5:19 PM

#

what happened

sonic tendon Apr 14, 2025, 5:19 PM

#

sorry ;-;

#

ope

keen beacon Apr 14, 2025, 5:19 PM

#

sonic tendon same

just as i get back

#

wow

#

i see how it is

thorny drum Apr 14, 2025, 5:19 PM

#

haha 4.5 gone

#

good month

balmy mist Apr 14, 2025, 5:19 PM

#

lmaooooo

keen beacon Apr 14, 2025, 5:19 PM

#

Did they lol

torn mantle Apr 14, 2025, 5:19 PM

#

xddd

#

XDDDDDDDDDDD

keen beacon Apr 14, 2025, 5:20 PM

#

I'm not watching

keen fulcrum Apr 14, 2025, 5:20 PM

#

4.5 will get deprecated in the next 5 months

torn mantle Apr 14, 2025, 5:20 PM

#

yea

#

in the next 3 months

keen beacon Apr 14, 2025, 5:20 PM

#

Lmao

torn mantle Apr 14, 2025, 5:20 PM

#

he said 3 months

balmy mist Apr 14, 2025, 5:20 PM

#

what

keen beacon Apr 14, 2025, 5:20 PM

#

They abandoned it fr

balmy mist Apr 14, 2025, 5:20 PM

#

windsurf partnership

#

wow

#

gg cursor?

keen beacon Apr 14, 2025, 5:20 PM

#

"We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency." lmao..

torn mantle Apr 14, 2025, 5:20 PM

#

oh

#

windsurf

#

interesting

#

actually

keen beacon Apr 14, 2025, 5:20 PM

#

That was quick lol

#

I wouldn't have thought they'd admit defeat publicly that quickly

torn mantle Apr 14, 2025, 5:20 PM

#

lmao

#

this guy seems more like an oai staff than them

balmy mist Apr 14, 2025, 5:21 PM

#

degenerate behavior lol

torn mantle Apr 14, 2025, 5:21 PM

#

they look a bit anxious

raven void Apr 14, 2025, 5:21 PM

#

Lmao

torn mantle Apr 14, 2025, 5:21 PM

#

those are nice improvements if its true

keen beacon Apr 14, 2025, 5:21 PM

#

wow gpt4.1 mini looks really good for its size

raven void Apr 14, 2025, 5:21 PM

#

Deprecating 4.5 🤣

torn mantle Apr 14, 2025, 5:21 PM

#

mys

#

its been 5 min

#

you are 5 min late

raven void Apr 14, 2025, 5:22 PM

#

my bad

torn mantle Apr 14, 2025, 5:22 PM

#

xd

#

its ok

keen fulcrum Apr 14, 2025, 5:22 PM

#

Gpt 4.1 free in windsurf for the next 7 days

keen beacon Apr 14, 2025, 5:22 PM

#

They should've abandoned 4.5 completely tbh

#

Not even release it. It made them look bad

torn mantle Apr 14, 2025, 5:22 PM

#

so they went for windsurf

#

cursor is a sucker for anthropic

oblique flint Apr 14, 2025, 5:22 PM

#

cursor is glued to claude lol

torn mantle Apr 14, 2025, 5:22 PM

#

they are weird

#

cursor devs i mean

opaque adder Apr 14, 2025, 5:23 PM

#

im just confused why they go the opposite way
it should be 4.1 then 4.5
not 4.5 to 4.1

#

makes no sense to me

keen fulcrum Apr 14, 2025, 5:23 PM

#

balmy mist Apr 14, 2025, 5:23 PM

#

i miss when optimus was free lol

torn mantle Apr 14, 2025, 5:23 PM

#

so

#

AGI

#

when?

keen beacon Apr 14, 2025, 5:24 PM

#

keen fulcrum

weird that chatgpt-4o is still significantly more expensive..

keen fulcrum Apr 14, 2025, 5:24 PM

#

balmy mist i miss when optimus was free lol

What was optimus

balmy mist Apr 14, 2025, 5:24 PM

#

mini

opaque adder Apr 14, 2025, 5:24 PM

#

keen beacon weird that chatgpt-4o is still significantly more expensive..

they dont want data for that model

keen fulcrum Apr 14, 2025, 5:24 PM

#

You couldn't use it without putting a balance on openrouter
So no o4 mini and o3 ?

keen beacon Apr 14, 2025, 5:25 PM

#

trying gpt-4.1 in windsurf now

olive mesa Apr 14, 2025, 5:25 PM

#

is this a google model

keen beacon Apr 14, 2025, 5:25 PM

#

willing to bet claude is still better tho

keen beacon Apr 14, 2025, 5:25 PM

#

olive mesa is this a google model

yup

#

lol good start

balmy mist Apr 14, 2025, 5:26 PM

#

i feel bad for gpt 4

keen beacon Apr 14, 2025, 5:26 PM

#

It's gpt 4 turbo they call it gpt 4 on the website for some reason lol

balmy mist Apr 14, 2025, 5:27 PM

#

so now we wait for tmw or is it coming on wednesday?

#

o3 and o4 mini?

keen fulcrum Apr 14, 2025, 5:28 PM

#

balmy mist i feel bad for gpt 4

Initiating shutdown protocol. Try to override 4.1

keen beacon Apr 14, 2025, 5:29 PM

#

balmy mist so now we wait for tmw or is it coming on wednesday?

likely thursday

balmy mist Apr 14, 2025, 5:29 PM

#

keen beacon likely thursday

bruhh gg

#

yeah imma take a vaca for a month and be back in may let me know if we got agi by then

keen fulcrum Apr 14, 2025, 5:30 PM

#

Yummy

novel flame Apr 14, 2025, 5:33 PM

#

brittle tiger A little sus to claim sota on a benchmark when you're barely beating 1.5 Pro and...

54.6% on SWE-Bench Verified is pretty good. In fact, it's ALMOST as good as Amazon Q, that other top LLM that we all definitely consider a top coding model, which scores 55% 🤣

keen beacon Apr 14, 2025, 5:34 PM

#

3.7 sonnet still mogs it

#

lmao

torn mantle Apr 14, 2025, 5:36 PM

#

https://x.com/windsurf_ai/status/1911833698825286142

Windsurf (@windsurf_ai) on X

For the next seven days, get free unlimited GPT-4.1 on Windsurf, on us.

That’s right, free.

We are very excited about GPT-4.1 given our internal evals. We have rate limits to prevent abuse, so go build without worrying about credits.

novel flame Apr 14, 2025, 5:37 PM

#

keen fulcrum

Um..... something is not right there. GPT-4o definitely supports image outputs. I seen it wit' ma own eyes!

keen beacon Apr 14, 2025, 5:39 PM

#

via the api

novel flame Apr 14, 2025, 5:40 PM

#

keen beacon via the api

No native image outputs over the API? Boo, Sam Altman, Boo!

oblique flint Apr 14, 2025, 5:41 PM

#

keen beacon 3.7 sonnet still mogs it

How does it compare to 2.5 pro in windsurf?

fleet lintel Apr 14, 2025, 5:43 PM

#

benchmark wise, how are these new OAI models?

torn mantle Apr 14, 2025, 5:43 PM

#

lets try it

vast vortex Apr 14, 2025, 5:44 PM

#

is gpt-4.1 not available on chatgpt.com now?

#

i'm using a free account

torn mantle Apr 14, 2025, 5:45 PM

#

no

#

only API

vast vortex Apr 14, 2025, 5:45 PM

#

thanks

keen beacon Apr 14, 2025, 5:46 PM

#

vast vortex is gpt-4.1 not available on chatgpt.com now?

"Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version ⁠of GPT‑4o, and we will continue to incorporate more with future releases."

vast vortex Apr 14, 2025, 5:47 PM

#

tried to use it on openrouter.ai, resulted in "unsupported country"😆

thorny drum Apr 14, 2025, 5:48 PM

#

thats why they compared with the november version i think

fleet lintel Apr 14, 2025, 5:51 PM

#

thorny drum thats why they compared with the november version i think

why? 🙂

novel flame Apr 14, 2025, 5:51 PM

#

Actually the Windsurf announcement makes me wonder if I should give it another try.... Some months back (around the time Windsurf launched / changed its name) I did a review of all the coding IDEs, and Windsurf had the best 'flow' but somehow had the worst outcome (a great UX could not make up for the fact that it ultimately generated bad, buggy code). Cursor wrote good code and had decent UX, though a few annoying gotchas that you had to work around, like having to specify the exact files to include in the context. The standout feature of Cursor was the 'suggested next edit' autocompletion, which was practically magical. Cline had the best UX and resulting code in general, but lacked the magical 'suggested next edit'. Aider was a joke compared to the others at the time. Continue had fallen behind the pack.

I have used Cline since, and when Copilot Agent Mode arrived, I tested that one too; actually I get better resulting code with Copilot Agent Mode than I do with Cline (both using 3.7 Sonnet), not sure why that is, and I still prefer Cline's UX, but the tool orchestration must be slightly better in Copilot Agent Mode.

But....... if Windsurf has gotten its act together and even has free GPT4.1 then maybe I need to give it another try.

torn mantle Apr 14, 2025, 5:52 PM

#

novel flame Actually the Windsurf announcement makes me wonder if I should give it another t...

windsurf is pretty good

#

they are serious about their product

#

unlike microsoft with copilot

#

slow with updates/features

novel flame Apr 14, 2025, 5:54 PM

#

torn mantle unlike microsoft with copilot

I was as surprised as you that their Agent Mode wasn't awful.... considering how their version of 'suggested next edit' is the worst thing I've ever activated in an IDE, and I have used Eclipse, IntelliJ, emacs, vim, and Visual Studio

torn mantle Apr 14, 2025, 5:54 PM

#

gpt 4.1 is not that great at web dev

keen beacon Apr 14, 2025, 5:54 PM

#

claude is still the best at practical coding tasks

#

and web development in general

torn mantle Apr 14, 2025, 5:54 PM

#

yea

#

unfortunately

raven void Apr 14, 2025, 5:54 PM

#

keen beacon "Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the i...

What's the reason for this? 🤔 Is it because 4.1 doesn't have audio and image out?

keen beacon Apr 14, 2025, 5:55 PM

#

deepmind are coming for claude with webdev but their models aren't good enough at following structure and calling tools for practical coding yet

keen beacon Apr 14, 2025, 5:55 PM

#

raven void What's the reason for this? 🤔 Is it because 4.1 doesn't have audio and image ou...

the second sentence answers your question

torn mantle Apr 14, 2025, 5:55 PM

#

lets see how it does at vision capabilities

keen beacon Apr 14, 2025, 5:55 PM

#

gpt-4.1 is basically just all the gradual improvements they've made to chatgpt-4o spun off as a separate api model

torn mantle Apr 14, 2025, 5:55 PM

#

yea

#

true

tribal aspen Apr 14, 2025, 5:56 PM

#

Nightwhisper and dragontail incoming

tribal aspen Apr 14, 2025, 5:56 PM

#

tribal aspen Nightwhisper and dragontail incoming

How to use these models?

fleet lintel Apr 14, 2025, 5:56 PM

#

I still dont know whether to be excited about these new OAI models or not. Are they better than 2.5 pro?

novel flame Apr 14, 2025, 5:56 PM

#

keen beacon deepmind are coming for claude with webdev but their models aren't good enough a...

...but do we know this for sure? Nobody seems to have spent enough time with nightwhisper to say conclusively if it dunks on Sonnet or not

keen beacon Apr 14, 2025, 5:56 PM

#

one correction tho: i was wrong on optimus prime, apparently its the full gpt 4.1 model too. not sure why it scored significantly lower tho

#

and on aider

keen beacon Apr 14, 2025, 5:57 PM

#

novel flame ...but do we know this for sure? Nobody seems to have spent enough time with nig...

i am speaking about deepmind's models right now

oblique flint Apr 14, 2025, 5:57 PM

#

novel flame ...but do we know this for sure? Nobody seems to have spent enough time with nig...

I mean you can't really test its function calling capabilities in lmarena

keen beacon Apr 14, 2025, 5:57 PM

#

not about nightwhisper and their upcoming ones

#

because like you said

#

we don't have enough info on them

torn mantle Apr 14, 2025, 5:57 PM

#

PLS

#

LET IT BE NIGHTWHISPER

#

PLSSSSSSSSSSSSSSSS

#

@keen beacon DO SOMETHING ABOUT IT

#

talk to them

keen beacon Apr 14, 2025, 5:57 PM

#

torn mantle LET IT BE NIGHTWHISPER

let what be nightwhisper

torn mantle Apr 14, 2025, 5:57 PM

#

idk

#

NEXT GOOGLE RELEASE

keen beacon Apr 14, 2025, 5:57 PM

#

oh

torn mantle Apr 14, 2025, 5:57 PM

#

you have contacts

#

with google devs

keen beacon Apr 14, 2025, 5:57 PM

#

it's 2.5 flash next

thorny drum Apr 14, 2025, 5:57 PM

#

is google shipping today?

torn mantle Apr 14, 2025, 5:57 PM

#

ask them

fleet lintel Apr 14, 2025, 5:58 PM

#

torn mantle LET IT BE NIGHTWHISPER

nightwhisper is not happening. it is replaced by dragontail

torn mantle Apr 14, 2025, 5:58 PM

#

oh no

novel flame Apr 14, 2025, 5:58 PM

#

keen beacon i am speaking about deepmind's models right now

ah..

keen beacon Apr 14, 2025, 5:58 PM

#

2.5 flash followed by an update to 2.5 pro

#

it is still in preview

torn mantle Apr 14, 2025, 5:58 PM

#

fleet lintel nightwhisper is not happening. it is replaced by dragontail

i had that thought as well

#

hopefully its not true

tribal aspen Apr 14, 2025, 5:58 PM

#

fleet lintel nightwhisper is not happening. it is replaced by dragontail

Nw is still better

keen beacon Apr 14, 2025, 5:59 PM

#

there will be more google anon model drops on the arena in the coming days

#

just wait

novel flame Apr 14, 2025, 5:59 PM

#

keen beacon 2.5 flash followed by an update to 2.5 pro

Seems plausible that nightwhisper/dragontail is an update to 2.5 Pro, no?

torn mantle Apr 14, 2025, 5:59 PM

#

tribal aspen Nw is still better

nw clears

fleet lintel Apr 14, 2025, 6:00 PM

#

tribal aspen Nw is still better

i think NW was just a coder model and they don't want to release only a coder model. they want to have all the benefits in normal pro/flash models.

plain zinc Apr 14, 2025, 6:00 PM

#

I think nightwhisper is the last Google card he will save for an emergency.

keen beacon Apr 14, 2025, 6:00 PM

#

novel flame Seems plausible that nightwhisper/dragontail is an update to 2.5 Pro, no?

if my sources are correct, dragontail is 2.5 flash

#

with a high thinking budget

#

as for nightwhisper

fleet lintel Apr 14, 2025, 6:01 PM

#

keen beacon if my sources are correct, dragontail is 2.5 flash

dragontail is not flash... for sure

raven void Apr 14, 2025, 6:01 PM

#

what is riverhollow?

keen beacon Apr 14, 2025, 6:01 PM

#

fleet lintel dragontail is not flash... for sure

it is

oblique flint Apr 14, 2025, 6:01 PM

#

No way it'll still be as cheap as current flash if it's that good

raven void Apr 14, 2025, 6:01 PM

#

raven void what is riverhollow?

It was worse than 2.5 pro in my test