#general

keen beacon
#

it kinda fell off after those

#

and of course it was pretty bad at long context coherence

#

i remember back when you could only access it via api or via their slack bot

#

they had a thing on poe too

#

i used the slack bot from when it released until when claude.ai released

#

unlimited messages for 20$ LOL

keen beacon
#

poe was a crazy product back in the day

keen beacon
#

they've nerfed it 50 times over

keen beacon
#

maybe i dont remember lol

keen beacon
#

unlimited messages w claude which is pretty expensive and with a large context window i think

#

plus several incidents (a single guy doing $10k in a single month, etc.)

#

theyre trying to recoup those costs 🤣

#

yeah i'll be honest i may have run up said costs 👀

plain zinc
#

I don't get dragontail anymore.

#

Has he disappeared?

keen beacon
#

they pulled the free trial too

#

stripe free trials are exceptionally easy to abuse

keen beacon
#

token generators

#

there was a discord server with poe token

#

yea

#

it was so easy

#

yes

#

lmaoo 😭

#

'tis probably time for me to go

#

goodnight

#

gn

drifting thorn
#

gn

#

now guessing u r german

hardy pecan
#

o7

alpine coral
leaden palm
#

i haven't heard anything like that

keen beacon
#

Quasar was up for a little bit

#

As anonymous chatbot

#

They removed it I think

drifting thorn
keen beacon
#

it really depends on what kind of article (blog post, subject etc?)

drifting thorn
#

My question is to write a 1000-word realistic fiction piece (write an urban fiction story of around 1,000 words).

keen beacon
#

if its a story r1 will win

#

amongst those options

drifting thorn
#

just download it and have a look at the translations

#

i know you guys don't know Chinese

#

so I provided a translated version below(translated by 2.5 Pro)

keen beacon
#

do u mean story? btw. i changed my vote to r1 assuming u are talking about stories as that is a story

drifting thorn
#

yup I'm talking bout stories

#

well my preference is 2.5>=2.0>R1>>>o3-mini

#

o3-mini's story is bland and lacks detail. R1's story deviates from my settings. Though 2.5's and 2.0's stories are also bland, they manage to be descriptive. The characters' traits and motives seem more reasonable in 2.5

#

I guess it's a stylistic choice for 2.5 and 2.0 in their plots, since I've seen some literature that looks like these writings

balmy mist
#

damn there is a lot of talk in the chat today, did i miss something big?

drifting thorn
#

Gemini will be connected to Veo

balmy mist
#

thats all that happened?

drifting thorn
#

not really

#

I recommend you actually have a look at it

ivory schooner
#

Hello

#

I'm learning to wait for Behemoth to launch

drifting thorn
#

What do you think of the four thinking models' essays?

#

Give them a score

ivory schooner
#

I hope Behemoth is 24k...

drifting thorn
#

We talked about it above; I reckon Behemoth will be kind of useless, like GPT 4.5

#

Also, can you read English?

ivory schooner
keen beacon
#

Is o3 full better than gemini 2.5 pro?

zinc ore
#

Unknown

alpine coral
# keen beacon Is o3 full better than gemini 2.5 pro?

arc agi benchmark suggests so (though it also indicates that Flash 2.0 scored the same as 2.5-Pro-exp... which is admittedly not what I would have expected to see.. though there's a few asterisks there for 2.5.. and the costs for o3 low are literally insane)

ig the picture will become clearer when 2.5 is no longer exp/preview, and o3 is actually released ha

hardy pecan
#

I'd guess about the recent openai models coming

4.1 nano is Quasar
4.1 mini is Optimus Alpha

alpine coral
#

hmm i dunno.. optimus alpha seems comparable if not better than quasar, but faster

hardy pecan
#

I thought Quasar was faster, but maybe i'm misremembering

alpine coral
#

yeah im a bit muddled in my thoughts about it too ha

#

quasar was blazingly fast initially; but then i think kinda slowed down (perhaps under additional load or something).. then optimus was added, and it seemed faster than quasar, but yeah kinda just going by memory - which is totally unreliable

#

yeah disregard the above... quasar is apparently like 7 times faster than optimus lol

alpine coral
hardy pecan
#

I'd expect that to be nano then surely? Non reasoning nano is blazing fast!

#

Not sure if any others have been released to the mass public via lmarena or open router. Except for o3 that some have had access to

plain zinc
#

Have you encountered dragontail today?

alpine coral
#

i have yeah

torn mantle
#

Openai still didn't crack the code for a good coding model

hardy pecan
#

Nah, they are not great personally, but I guess they serve a different purpose and not meant to be SOTA

torn mantle
#

Well the only noticeable update is the context window

hardy pecan
#

locally etc

#

that would be their goal,

torn mantle
novel flame
#

Regular people aren’t paying for LLMs though. OpenAI is throwing money down the toilet at an incredible pace. I would hesitate to call them great at creating products until they can turn a profit.

hardy pecan
#

Which model do you think we get tomorrow?

#

Starting strong with o3?

torn mantle
#

to build enough hype

#

or they can just straight up release it from the start

#

the thing is o3 is a good model but its quite pricey

#

so i dont think it will be available for the general public

hardy pecan
#

Pro plan users will get it surely

#

I'm doubtful of plus though

ember rapids
#

Regardless it’s gonna be an amazing week

hardy pecan
#

The space has been really fun the last few years, feels like we always have something to look forward to or surprised about every few weeks

#

Golden age tbh

mossy drum
#

New model in Arena: cobalt-exp-beta-v3

calm sequoia
#

Maybe amazon titan again

novel flame
#

Also ‘luca’?

novel flame
torn mantle
#

or was it v2?

#

i remember seeing this model

fleet lintel
drifting thorn
#

I suggest not putting such high hopes on OpenAI

fleet lintel
#

OAI is unlike grok, llama, deepseek. They must have the best models in the market or they will start losing the edge.

keen fulcrum
novel flame
#

Cobalt is not a great coder. Just had it go head to head with 3.7 Sonnet in the arena. Not surprised, since Titan is just about the worst LLM ever built by a major corporation.

torn mantle
#

amazon models arent that good

#

they do not have a dedicated R&D department for LLM training

#

they are just trying to copy what the others are doing

novel flame
novel flame
fleet lintel
#

Not sure why Amazon is even trying to build models? Why not put more money in Anthropic?

calm sequoia
#

The o3 was internally released before december. If it's at least equal to the 2.5 Pro, it means oAI is still months in advance of Google.

novel flame
#

And for what it is, Q isn’t terrible to be honest. I find myself occasionally using ‘q chat’ on the commandline instead of opening Gemini when I need something simple but not simple enough for Copilot completion

novel flame
novel flame
calm sequoia
#

Google's speed may indeed be faster. But o4-mini implies the existence of o4. Google will be the winner in the long term. But for this year I would still bet on oAI

drifting thorn
#

Would love to see the race in multimodality, maths and coding performance, reasoning performance, creativity and context window

novel flame
ocean vortex
#

@keen beacon

keen beacon
#

WHO RECOMMENDED CURSOR?

#

TO ME?

#

YOU DESERVE A GIFT

#

did NOT know it can do this

#

actually perfect for my world map data project 🙂

ocean vortex
#

and if nano performs no worse than the old mini people can't really complain 💀

golden ocean
#

is cursor free

#

do u need to get ur own api key for every model

#

and pay for the usage

keen beacon
ocean vortex
calm sequoia
ocean vortex
#

o4-mini-high will probably be great and beat 2.5 in several areas, but context and spatial awareness is a question mark.

keen beacon
#

are any of you game developers

#

or use ai for game development

hardy pecan
#

luca has been here before

keen beacon
#

used chatgpt's python sandbox feature to pull pixel image data and turn it into 255 for water 0 for black

#

result?

#

we live in a very dystopian era

#

it even generated the template it used

#

i didnt even know it could do this
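
For anyone who wants to try the same pixel-to-mask trick outside the sandbox, here is a minimal sketch. It is not the code ChatGPT produced: the "blue-dominant means water" rule, the `margin` value, and the function names are all assumptions made for illustration, and it works on a plain grid of RGB tuples rather than a real image file.

```python
# Hypothetical sketch: turn a grid of RGB pixels into a 255 (water) / 0 (other) mask.
# The blue-dominance rule and the default margin are assumptions, not the original method.

def is_water(pixel, margin=40):
    """Treat a pixel as water when blue exceeds both red and green by `margin`."""
    r, g, b = pixel
    return b - max(r, g) >= margin


def water_mask(pixels, margin=40):
    """Map a 2D grid of (r, g, b) tuples to a grid of 255 (water) and 0 (everything else)."""
    return [[255 if is_water(p, margin) else 0 for p in row] for row in pixels]


if __name__ == "__main__":
    tiny_map = [
        [(10, 20, 200), (30, 120, 40)],  # ocean, forest
        [(0, 0, 0), (20, 60, 180)],      # black border, lake
    ]
    for row in water_mask(tiny_map):
        print(row)
```

With a real map you would first load the pixel grid with an imaging library and probably tune `margin` by eye.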

calm sequoia
#

Guys we may be in a bubble. No one cares about math and riddles 😄 We should test new models on emotional support

keen beacon
#

editing text from 4th to 45th? wtf

#

this is even more dystopian 😭

#

terrible

balmy mist
#

Just shows progression

calm sequoia
#

They mean for the marketing team, and they aren't ignored 😄

balmy mist
#

lol

#

Google has shown to be able to get new models out fast now so don’t sleep on Google, they could mess around and release another model in 2 weeks

keen fulcrum
calm sequoia
keen fulcrum
#

Nightwhisper is an early gemini 3 pro model I believe

calm sequoia
#

LOL it's the first time I've encountered a context length problem in Gemini

#

It appears LLMs still cant analyze dna sequence CSVs because of too long context 😄

keen fulcrum
calm sequoia
#

Sadly my CSV is 15M tokens :/

balmy mist
#

Bruhh

#

That’s for one sequence?

#

Can’t you break it up?

#

Into 15 diff sessions
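
The "break it up" idea can be sketched roughly like this. It assumes a crude 4-characters-per-token estimate (real tokenizers differ) and that rows can be analysed independently, which may not hold for sequence data; the function names are made up for illustration.

```python
# Hypothetical sketch: split a large CSV into chunks that each fit a token budget.
# The 4-chars-per-token estimate is an assumption; use a real tokenizer for accuracy.

def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate based on character count."""
    return max(1, len(text) // chars_per_token)


def chunk_csv_lines(lines, max_tokens, chars_per_token=4):
    """Split CSV lines into chunks under max_tokens, repeating the header in each chunk.

    A single row larger than the budget still gets its own (oversized) chunk.
    """
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    header_cost = estimate_tokens(header, chars_per_token)
    chunks, current, used = [], [header], header_cost
    for row in rows:
        cost = estimate_tokens(row, chars_per_token)
        # Start a new chunk when adding this row would exceed the budget.
        if len(current) > 1 and used + cost > max_tokens:
            chunks.append(current)
            current, used = [header], header_cost
        current.append(row)
        used += cost
    if len(current) > 1:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = ["id,seq"] + ["%d,ACGTACGT" % i for i in range(1, 5)]
    for chunk in chunk_csv_lines(demo, max_tokens=5):
        print(chunk)
```

At 15M tokens and a 1M-token window this gives roughly the 15 sessions mentioned above, though each session would see only its own slice.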

calm sequoia
#

Will have to dig into dna sequencing to find out 😄

novel flame
drifting thorn
#

I just found out a usable embedding

#

It makes my knowledge base in Cherry Studio functional

#

{
  "message": "[GoogleGenerativeAI Error]: Function call not available. Response was blocked due to OTHER",
  "response": {
    "promptFeedback": {
      "blockReason": "OTHER"
    },
    "usageMetadata": {
      "promptTokenCount": 1984,
      "totalTokenCount": 1984,
      "promptTokensDetails": {
        "0": {
          "modality": "TEXT",
          "tokenCount": 1984
        },
        "length": 1
      }
    },
    "modelVersion": "gemini-2.5-pro-exp-03-25"
  }
}
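
A small sketch of how a client could detect this kind of block before trying to read a function call out of the response. It only relies on the field names visible in the payload pasted above; the helper name `block_reason` is made up, and other error shapes may look different.

```python
import json

# Hypothetical helper: pull promptFeedback.blockReason out of an error payload
# shaped like the one pasted above. Returns None when the field is absent.

def block_reason(error_payload):
    """Accept a JSON string or an already-parsed dict and return the block reason."""
    data = json.loads(error_payload) if isinstance(error_payload, str) else error_payload
    feedback = data.get("response", {}).get("promptFeedback", {})
    return feedback.get("blockReason")


if __name__ == "__main__":
    sample = """
    {
      "message": "[GoogleGenerativeAI Error]: Function call not available. Response was blocked due to OTHER",
      "response": {
        "promptFeedback": {"blockReason": "OTHER"},
        "modelVersion": "gemini-2.5-pro-exp-03-25"
      }
    }
    """
    reason = block_reason(sample)
    if reason is not None:
        print("prompt was blocked:", reason)
```

Checking for a block reason first makes it clear that the prompt itself was rejected, rather than the tool call failing.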

#

so sad

oblique flint
#

why are a bunch of unreleased models on there? How are we supposed to know how they perform lol

alpine coral
#

kinda a spurious methodology imo.. like they mined / analysed reddit and forum posts - to find examples of people discussing AI being useful.. ig better than nothing / perhaps somewhat representative.. but i think oai and google etc would have a much better understanding based on actual usage (surely editing text is up there somewhere ha)

drifting thorn
#

Okay my story is stopped by the current ability of AI

golden ocean
#

Why I dont see options like making datasets or fine tuning anymore in ai studio

#

Is that still possible

keen beacon
#

we should be getting the first openai drop(s) in 3-4 hours 👀

#

normally oai drop around 6pm BST

drifting thorn
#

fxxk

#

github education is so hard to apply for

#

I'll just wait for R2

#

to be available on OpenRouter

balmy mist
keen beacon
#

tbf they release at like 10-11am where they are

#

which makes sense

drifting thorn
#

gosh Google nerfed 2.5 Pro's tool calling function!!!!!

#

and 2.5 Pro now can't execute tool call

#

but why?

keen fulcrum
keen beacon
drifting thorn
#

We should wait for Deepseek R2 or nightwhisper right now

keen fulcrum
#

Potentially upon release

keen beacon
#

not immediately though

#

it will be on the arena within a few hours

alpine coral
#

speaking of arena models. i see the number available in direct chat increased from 95 (if memory serves) to 101

keen beacon
#

i haven't noticed anything new in the list

#

weird

keen beacon
#

if 4.1 is the coding jump it is rumoured to be

cloud meadow
#

Riverhollow?

#

Not sure what model this is but it's quite good

cloud meadow
alpine coral
#

has o3mini always been in direct chat?

keen beacon
#

yes

#

cobalt v4 and the doubao models are new

cloud meadow
#

Is it a new gemini model?

vivid oyster
#

Ye

cloud meadow
#

It seems similar to 2.5, maybe it's a cheaper version?

vivid oyster
#

Prob

#

Idk

#

I never used it

#

Alot

#

Just 2 times

keen beacon
#

riverhollow is a little worse than 2.5 pro

drifting thorn
#

what is the context limit of o3?

keen beacon
#

200k tokens

drifting thorn
#

Is it better than other LLMs in terms of tool-calling?

keen beacon
#

couldn't tell you

#

but probably

drifting thorn
#

Here's 2.5 Pro with knowledge base: Internal server error, unable to complete request

novel flame
drifting thorn
#

famous for being dumb

#

it is nicknamed "Downbao", linking it to Down's Syndrome

keen beacon
novel flame
#

I had some time during lunch so I tried the new Firebase Studio with my go-to webdev test prompts and…. It sucked? I don’t understand how, but it was terrible. I could have given the same prompts directly to Gemini 2.5 Pro with no agentic framework and gotten far better results.

#

Oh I’ve tried basically all the other ones already - and Firebase Studio was among the worst

#

I suspect Firebase Studio is not using 2.5 Pro — and its agentic framework / tool orchestration is just really bad. It desperately wants to build a NextJS app and then immediately forgets to add all of the features it planned to build.

#

Cost

keen beacon
#

experience*

#

nope

#

i believe it is using flash

drifting thorn
#

I’ve heard bout Firebase Studio’s critics before

sonic tendon
#

yeah, the anon google models are probably 2.5-flash-*

balmy mist
sonic tendon
balmy mist
balmy mist
keen beacon
#

should be able to join as well

sonic tendon
#

80% chance i can make it

keen beacon
#

so i think quasar alpha is whatever we're getting today

#

(likely 4.1)

sonic tendon
novel flame
#

The poor Firebase Studio performance is so strange though; I would’ve thought they could easily build something at least as good as Bolt or Lovable or v0 — and fully integrated with Firebase since they own it. That could have destroyed the competition

drifting thorn
#

Gemma 3 be the base model

sonic tendon
#

pondering whether or not to bet on oAI topping the lmarena leaderboard before the end of this month

torn mantle
#

yea as i said

#

they will start slow

#

o3 is probably on last day

thorny drum
#

i wonder what o3 pricings gonna be

#

do you think they've got it down to something feasible?

balmy mist
thorny drum
#

or is it gonna be once a month for pro users + no api

balmy mist
#

i wanted o3 or o4 mini

#

i wish i didnt test quasar already

thorny drum
#

arc-agi had o3 low as like $200 per task and o3 high as like 100x(?) more?

balmy mist
#

like they didnt make it better since openrouter right?

#

so its hard for me to care about it

#

especially since it was free on open router

#

and now i gotta pay for it lol

keen beacon
#

there's a chance quasar on openrouter was mini or nano

balmy mist
#

dont get me excited and have me blue balled again lol

#

anyone streaming it here?

#

we should make it a community event

#

so we can talk about it together lol

keen beacon
#

I benchmarked quasar and it lined up with chatgpt 4o latest

balmy mist
#

lol

keen beacon
#

The mini model is seemingly great if it's optimus

#

Gets seemingly lower scores on traditional benchmarks but it seems it's really good

keen beacon
sonic tendon
#

hmm

#

contemplating whether or not to bet on oAI topping the charts by april 30

keen beacon
#

openai's endpoint for optimus is returning a 502 now

#

perhaps it's a sign

sonic tendon
#

interesting

keen beacon
#

Optimus was getting fcking bombarded yesterday lol

balmy mist
#

i hope they are free still

keen beacon
#

It was sooo slow

#

doesn't surprise me

keen beacon
balmy mist
#

lol

sonic tendon
keen beacon
#

back up it just seems really slow

#

Optimus had zero rate limits too unlike quasar I think

#

when it first appeared it was lightning fast (as you'd expect from a mini variant) and now it's streaming at the same speed as gpt-4.5 does sk

sonic tendon
#

odd - i'd be surprised if they had a demand spike at this particular time

keen beacon
balmy mist
#

wow we took it for granted

sonic tendon
#

guessing it's some internal stuff related to the new release(s)

balmy mist
#

wait who is excited for today?

keen beacon
balmy mist
#

like im trying to get excited as you guys are for quasar no longer being free

keen beacon
#

I'm only excited for o4 mini tbh lol

sonic tendon
sonic tendon
balmy mist
#

yeah me too

sonic tendon
#

i will basically never use full o3

balmy mist
#

i also want o3

#

if they give us pro

#

but i doubt that

#

they dont love us like that

keen beacon
#

They probably will jump to o4 full before o3 pro maybe

balmy mist
#

damn

#

is there a way to do what openai does with o1 for pro, with open source models?

#

also @sonic tendon you livestreaming the event right?

sonic tendon
#

i'll let you guys know if i can't make it

balmy mist
keen beacon
#

uhh

balmy mist
#

we should all put up money together and buy the 20k version for the community

#

i got $20 for it

thorny drum
#

dyt they're gonna announce full o4 benchmark scores

sonic tendon
#

hmmmmm

keen beacon
#

they'll do a preview of full o4 when they launch o4 mini

#

with benchmark scores

#

pretty likely

thorny drum
#

pretty curious about that yea

keen beacon
#

rumour has it o4 will move away from the 4o base

#

will probably use 4.1

sonic tendon
#

feel like openai's gonna drop pretty quickly once people realize that they aren't releasing o3 yet, but i could be wrong

keen beacon
#

so yeah will be interesting

sonic tendon
thorny drum
#

o4 pricing gonna be insanity lol

sonic tendon
#

i was under the impression that 4.1 was mostly smaller models, but i dunno where i got that from

thorny drum
#

$1/token type shi

keen beacon
#

4.1 is the replacement for 4o

sonic tendon
#

ah, right

keen beacon
#

will likely be similar in size

#

It's the same size

#

4.1 mini replaces 4o mini as the small but relatively powerful model, and 4.1 nano is their "phone model" sort of like gemini nano is

thorny drum
#

dyt 4.1 nano is gonna be the OS one?

keen beacon
#

yeah

#

Oss one is not even trained yet lol

sonic tendon
keen beacon
balmy mist
#

what is oss?

keen beacon
#

Open Source Software

keen beacon
#

I'm on my phone rn tho

#

oai overtook google

sonic tendon
#

yeah people went crazy for openai pretty quickly once sam altman started tweeting

sonic tendon
keen beacon
#

end of april still google

#

I would bet openai lol

#

On the April one maybe

#

woah

#

i just tested optimus alpha on a web design prompt

#

it did really well

#

better than i've ever seen it do

#

beat 2.5 pro and claude's 0-shot attempts imo

balmy mist
keen beacon
#

Idk probably not it's not a chatgpt 4o variant/human preference version

balmy mist
#

like betting on companies

#

brugg

#

someone send link please

sonic tendon
#

i wonder what's going on with the spread here

torn mantle
balmy mist
#

😦

sonic tendon
#

there is kalshi

balmy mist
#

i hate usa

sonic tendon
#

still, seems unusual to me

sonic tendon
#

oh, but 7k volume

balmy mist
torn mantle
#

charging 20k is criminal

#

it just means they are brute forcing and not innovating

sonic tendon
sonic tendon
#

i mean, not until we have one or two more paradigm shifts

torn mantle
#

im more interested in google solutions

#

they have like a scientist model

#

which seems promising tbh

#

this is the one

sonic tendon
#

doubt it, but my impression is that o3 has a decent shot (based on @keen beacon 's testing)

keen beacon
#

4.1 doesn't have the vibes that got chatgpt 4o to the top so yeah i don't think it'll be #1

#

o3 will be #1 with style control but without im not sure

#

I still think that private model is o4 mini lol

#

eh idk

#

Hopefully I'm right

torn mantle
#

quasar? optimus?

#

both are mid

keen beacon
#

None of them

sonic tendon
keen beacon
#

It's another model that @keen beacon has

sonic tendon
#

i agree, tho, at least in the near future

drifting thorn
#

Okay the knowledge base in Cherry Studio worked better with OpenRouter API Gemini 2.5 Pro

sonic tendon
keen beacon
#

like they have 4o and chatgpt-4o

sonic tendon
keen beacon
#

doubt

#

chatgpt 4o is still 4o

sonic tendon
keen beacon
#

that doesn't mean it's 4.1

sonic tendon
#

doubt it's 4.1

#

plus, doesn't the website say that it's 4o? would be weird to lie about that

drifting thorn
#

@keen beacon What is changed in 0326 so that it performed better?

keen beacon
keen beacon
keen beacon
#

They cpt was done for ages

drifting thorn
#

What is changed in 0326 so that it performed better?

sonic tendon
#

oh hmm

keen beacon
#

Post December chatgpt 4o latest was on a cptd base model

sonic tendon
#

you mean, the biggest 4.1 is just gonna be a new 4o? or am i misunderstanding

keen beacon
#

Yes

keen beacon
#

Bro

buoyant plaza
#

shadebrook is a banger model for creative

keen beacon
#

shadebrook is pretty bad in my experience

drifting thorn
#

Oops

keen beacon
#

but tbf ive only really used it for code

drifting thorn
sonic tendon
buoyant plaza
#

lol i just use it to turn random ideas into trap songs with suno

keen beacon
#

slight aside but if sam isn't at the livestream the model sucks confirmed

drifting thorn
#

My experience with OpenRouter API Gemini 2.5 Pro is jumping in between 400, 502, furiously short content and the content I want

drifting thorn
#

HOW MANY 502 IS OPENROUTER GONNA GIVE ME!!!!!!
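
One common way to live with intermittent 502s is retrying with exponential backoff. The sketch below is generic and does not use any real OpenRouter client: `send` is a hypothetical callable you supply that performs the request and returns `(status_code, body)`. A 400 is treated as non-retryable, on the assumption that it means the request itself is wrong.

```python
import time

# Status codes treated as transient; this set is an assumption for illustration.
RETRYABLE = {502, 503, 504}


def call_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable returning (status_code, body).
    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # No point sleeping after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


if __name__ == "__main__":
    fake_responses = iter([(502, ""), (502, ""), (200, "chapter one...")])
    print(call_with_retries(lambda: next(fake_responses), base_delay=0.01))
```

This does nothing for the "furiously short content" case; that would need a separate check on the length of the body.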

keen beacon
#

normally they put the stream up and ready to go on youtube 20-30 mins before it's due to start

#

and they always put a description with

#

(a) what they're announcing (normally) and (b) who's there

sonic tendon
balmy mist
#

Bruhh he better be there

keen beacon
#

If vuyp is saying o4 full will have the new base and o3 is not, how tf does 4o (which is o3s base model) have the new knowledge in the cptd base. Tbh. You are not updating a cut off with a simple finetune

balmy mist
#

It better have kept him up last night

#

Idk if that a good thing or not tho

keen beacon
#

given memory was keeping him up i don't think it's a good indicator anymore

#

So it's either 4.1 (which meant they retrained o3) or o4 mini based on (4.1 mini)

#

if it is any use, this private model has been on the platform i am on for roughly 1 and a half months

drifting thorn
#

I’m afraid the hallucination in new models is gonna be crazy

keen beacon
#

4.1 mini was trained fairly recently I think. It was pretrained from scratch. We have had zero checkpoints of it unlike chatgpt 4o latest which was a cpt of 4o. Surely openai would've liked to compete with Gemini 2 flash

drifting thorn
#

Original Deepseek V3 has 3.1% hallucination rate, R1 has 14.3%

balmy mist
#

gg

keen beacon
#

there it is

#

ggs

balmy mist
#

imma skip today

keen beacon
#

I was right lol

drifting thorn
#

When R1 is used to train Deepseek V3 0324, its hallucination rate increases to 8.6%

sonic tendon
#

?

balmy mist
#

sama's plan smh

keen beacon
#

4.1 to start the hype then launch the reasoning models later in the week

balmy mist
#

he stay getting me

sonic tendon
balmy mist
#

im happy they told us now tho

keen beacon
sonic tendon
keen beacon
#

Yes

drifting thorn
#

Finally OpenRouter is outputting my novel arghhhhhh

sonic tendon
#

related lol

keen beacon
#

It's already live I believe

#

I just checked

keen beacon
keen beacon
#

proof

keen beacon
#

Yup it's under 4o now

#

?

#

I will get on my computer and show u what I mean

keen beacon
sonic tendon
#

if you don't mind me polymarketposting again: people are saying that this guy might be an openAI employee/insider

sonic tendon
sonic tendon
sonic tendon
#

are most of your nights like that?

drifting thorn
#

Gotta sleep rn

sonic tendon
#

cya later

sonic tendon
keen beacon
keen beacon
#

this is one of my questions that only quasar and anonymous chatbot got, and i use to differentiate between 4.1 and chatgpt 4o latest:
(yes i know its not a one off, this was extremely consistent - check lmsys chatgpt 4o latest on 0 temp to check yourself)

#

first chatgpt one was taken days ago

#

the last one was just now

sonic tendon
keen beacon
#

i told you guys

#

omfg

#

no one listens

#

im not saying random shi1t

#

if they're going through the whole thing of calling it gpt-4.1 why would they have it under 4o

#

bro

sonic tendon
keen beacon
#

its literally an updated 4o

#

ok dawg

#

but you didn't answer my question

#

because it will be confusing when o4 mini is out

balmy mist
#

lol

keen beacon
#

u want 4o and o4 mini in the model selector??

#

it's even more confusing if they call a model gpt-4.1 in the api and 4o in the product

balmy mist
#

i hate open ai

keen beacon
#

again

keen beacon
#

they just need to reset with their naming scheme

#

its just live under 4o rn

#

it sucks so bad

balmy mist
#

they should just use names at this point like quasar

sonic tendon
#

they should just start stealing product names from other companies

keen beacon
#

everyone has bad naming tbh lol

sonic tendon
#

"Introducing our newest hybrid model, Gmail"

balmy mist
#

bruhh

keen beacon
#

lmao sam isn't there

#

confirmed bad

#

check to see if its rolled out to you by asking the question

sonic tendon
#

i heard you, i was just surprised

sonic tendon
keen beacon
#

who won the 2024 Solomon Islands general election

#

only quasar/anonymous chatbot/4.1 gets it

#

they havent changed the name on it yet its still labeled 4o
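
The probe-question check can be sketched as a tiny harness. Everything here is hypothetical: `ask` stands in for whatever function sends a prompt to the model and returns its text, and `expected_keyword` is whatever string you consider a correct answer. Repeating the question gives a hit rate rather than a one-off result.

```python
# Hypothetical sketch: ask the same probe question several times and report
# how often the answer mentions an expected keyword.

def probe_hit_rate(ask, question, expected_keyword, runs=5):
    """Return the fraction of runs whose answer contains expected_keyword (case-insensitive)."""
    hits = 0
    for _ in range(runs):
        answer = ask(question)
        if expected_keyword.lower() in answer.lower():
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # Stub model for demonstration only; replace with a real call.
    def stub_model(prompt):
        return "I don't have information on that."

    rate = probe_hit_rate(
        stub_model,
        "who won the 2024 Solomon Islands general election",
        "EXPECTED_NAME",  # placeholder, fill in the answer you are checking for
    )
    print("hit rate:", rate)
```

A model with web search enabled would defeat the point, so the probe only says something about training data when search is off.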

balmy mist
keen beacon
#

bruh 🤣

balmy mist
#

i did lol

#

it still searches the web

keen beacon
#

start a new chat

balmy mist
#

okay

keen beacon
#

if this is 4.1 under 4o on chatgpt it seems worse than optimus

sonic tendon
keen beacon
#

it did a lot worse on my frontend task

#

@keen beacon optimus can outperform 4.1

#

well then what on earth is optimus

balmy mist
#

wild you have pro?

keen beacon
#

na

balmy mist
#

you just valid with open ai?

keen beacon
#

🤣

#

u are getting unlucky with roll outs man lol

#

no veo 2 etc

balmy mist
#

ikr

#

bruhh

sage raptor
#

lol its 4.1 today

balmy mist
#

lmaoooo

#

imma retweet this

keen beacon
#

if you used quasar and optimus prime u already used 4.1/4.1 mini for free 🤣

balmy mist
#

gotta get everyone else excited so we can get teased together

#

wild what if its an even better model

#

that is the ebst coder alive?

#

best*

#

what if strawberry man is right

#

i believe in mr.strawberry again, trust the process

keen beacon
#

i doubt thats being released today xd

#

strawberry man is just a troll

keen beacon
balmy mist
#

lmaoo

keen beacon
#

the grifter himself

balmy mist
#

dude got like 72k follows on some bs

#

had the whole space by their balls

#

that was actually funny times

#

anyone was on those spaces?

#

that was before o1

#

seems like forever ago

#

wow

#

imma become a grifter now

#

yeah i couldn't use it all morning

#

but yo he actually might have serious connects to open ai, maybe he's somebody's kid that works there lol

#

this was wild predictions

keen beacon
keen beacon
keen fulcrum
#

Expected as they launch new servers
increasing their capacity for upcoming models

balmy mist
#

like people know who he is

#

they exposed him

thorny drum
#

its just slow i think

keen beacon
#

weirdos

balmy mist
#

why wouldnt they, he was trolling us hard

#

had me missing work for his nonsense

#

why would you troll anyone?

keen fulcrum
#

People who dox will get rate limited

balmy mist
#

lol

keen beacon
#

trolling and doxxing are way different in terms of severity i think

balmy mist
#

so be careful what you do online

#

cause you can be doxxed easily

#

people kill people for less in our world, so im not shocked that anyone would doxx or do anything online

keen beacon
#

oh hello 👀

thorny drum
#

4o = 4.1 now?

balmy mist
#

strawberry man got me excited

keen beacon
#

its supposed to be named 4.1 (which it will be renamed soon enough i think), but its just an updated 4o

balmy mist
#

lol

#

i really hope r2 comes out this week

balmy mist
thorny drum
#

wait wdyt o3 pricing will be

balmy mist
#

that is crazy accurate tho

#

like what are the odds

thorny drum
#

like o3 low was twice the cost of o1 pro

balmy mist
keen beacon
#

lmao i got it again

#

is it not rolled out to you yet? or is this different

#

Shrug i just keep getting parallel gens for feedback on 4o

balmy mist
keen beacon
#

hallucination

#

never ask a model what it is

balmy mist
#

didnt we do that with quasar tho?

keen beacon
#

not rly

balmy mist
#

and hallucinations are not a bad thing, they are the key to dreaming and solving things we dont know, just need controlled hallucinations

keen beacon
#

i agree lol

balmy mist
#

there is always truth to an hallucination

#

like it stems from something

keen beacon
#

yes

#

i would go on about it but ya im not gonna rant about it

balmy mist
#

lol

keen beacon
#

It's openai vs google

#

Bruh

novel flame
#

Nvm I get it now.

keen beacon
#

lmao so they really do want to charge 20k

sonic tendon
#

@balmy mist train got delayed, so I'll probably be a bit late

sonic tendon
keen beacon
#

By the time they have a product like that they won't sell it

keen fulcrum
#

When will main o4 release

keen fulcrum
keen beacon
#

next month probs

#

actually no

#

lmao sorry i misread

sonic tendon
keen beacon
#

probably like june or so

#

maybe july

novel flame
#

The 20k price tag smells like a classic con: “Buy this money printing machine, only $20k!! With it, you can print as much money as you like!”

If they had an AI which could create arbitrary novel discoveries warranting a price tag of $20k, they would keep it to themselves and print money themselves.

keen fulcrum
sonic tendon
balmy mist
keen fulcrum
#

Will hurt my portfolio if it tends to lose and not worth the cost

sonic tendon
#

in what context

balmy mist
novel flame
balmy mist
#

we would be oversaturated with stuff like that

keen fulcrum
novel flame
ember rapids
#

I hope they preview the benchmarks for full o4

keen beacon
#

they will in the o3/o4 mini livestream

torn mantle
#

recently im noticing some typo mistakes from gemini 2.5 pro

#

which is kinda weird

#

its the 3rd time already

balmy mist
#

im kinda sad that we are so early to these models

#

we are not surprised anymore

#

i gotta take a vaca for a month or so and comeback to agi lmao

torn mantle
#

marketing hype geniuses

#

the model is bad

balmy mist
#

lmaooo

#

lets pretend its actually good

#

it is fast tho

torn mantle
#

im just here for the drama tbh

balmy mist
#

i have a feeling they are going to do something big like strawberry man said

torn mantle
#

hopefully google will pull out a battle model release

torn mantle
#

that guy isnt reliable

#

never was

balmy mist
#

lmaoooo

torn mantle
#

hes a grifter

balmy mist
#

he did predict o4 in april tho

torn mantle
#

an engagement farmer

#

its kinda obvious tbh

balmy mist
#

okay when is o5 coming?

torn mantle
#

after o4

keen beacon
#

lmao conveniently i now have to go 🙄

balmy mist
#

lmaooooooooo

keen beacon
#

cya in a bit gang

torn mantle
#

you know why its obvious?

#

because all other labs are catching up

#

and openai cant just rely on o3 series

keen fulcrum
#

Grok 4 will be great

torn mantle
#

they already have it implemented on deep research

#

grok 4

#

will be sh1t

#

dont get me started

#

omg

#

just the thought of using grok 3 again is making me so mad

balmy mist
#

i mean there is a market for predicting launches

torn mantle
#

wasted billions on nothing

keen fulcrum
#

Grok 3 reasoning is the best

balmy mist
#

if you can get the month it comes out tell me

#

i wanna make money

torn mantle
#

xd

#

you want my prediction

#

ok let me think

balmy mist
#

yeah, i got my notes ready

#

omgg

#

it started

torn mantle
#

based on leaks we will definitely have o3-full this month ( thats for sure xd )

balmy mist
#

what if sama pulls up?

#

last minute

torn mantle
balmy mist
#

wait

torn mantle
#

im interested in benchmarks tho

balmy mist
#

if it truly is the smallest ever that is good

#

1 mill tokens

#

wow

#

omgg they cooking

#

jk lol

#

but i am curious of the size of nano

#

bruhh

torn mantle
#

so not much MMLU diff

keen fulcrum
torn mantle
#

between 4o and 4.1

balmy mist
#

im turning this crap off

keen fulcrum
#

Latency

torn mantle
#

by latency

#

i see

#

so its MMLU/latency

#

thats a weird ratio

balmy mist
#

NA for nano?

#

wtf

keen fulcrum
torn mantle
#

they are talking about pricing a lot

balmy mist
#

nano better free

#

omgg nightwhisper is 4.1

#

yooooo

keen beacon
#

Omg

balmy mist
#

sama's plan

#

lmaooo

#

jk

torn mantle
#

nah

#

stop it

#

nah

#

its not

keen beacon
#

Lmao

torn mantle
#

im trying that prompt on gemini 2.5 pro

balmy mist
#

i actually like nano

#

i wonder how big it is, but they are prob not gonna tell us

keen fulcrum
balmy mist
#

lmaoo the accuracy

keen beacon
#

Make it look better lol

#

If they compared to chatgpt 4o latest it would be the same ahahaha

keen fulcrum
#

Multimodal

balmy mist
#

mini was quasar?

#

or optimus?

keen beacon
#

Optimus

balmy mist
#

hmm

#

wild do you know how big quasar is?

#

like in parameters?

keen beacon
#

Should be same as 4o so 200b

balmy mist
#

lol

oblique flint
#

4.1 looks like quite a solid model already. Imagine if they add reasoning on top of it

brittle tiger
#

A little sus to claim sota on a benchmark when you're barely beating 1.5 Pro and 2.5 hasnt been tested

balmy mist
#

wait so we did not get to test out 4.1 right? @keen beacon

#

only the mini and nano

keen beacon
#

4.1 was quasar

balmy mist
#

ohhhh

keen beacon
#

4.1 mini is optimus

balmy mist
#

that makes sense, i understand now

golden ocean
#

whats the point of 4.1 if theres 4.5

balmy mist
#

okay so what do you think the size is of nano?

#

like 7b

keen beacon
#

Maybe the same size as 4o mini

#

Tbh I have no idea for mini and nano

balmy mist
#

if it was really low they would showcase that tbh

#

please be free

keen beacon
balmy mist
#

nice

keen beacon
#

4.1 mini is a really good deal

torn mantle
#

awkward silence

keen beacon
#

It can sometimes beat 4.1

balmy mist
#

they were so scared lol

keen beacon
#

Apparently even if the benchmarks are lower

balmy mist
#

yeah optimus is a good model

oblique flint
#

nano is exactly the same price as 2.0 flash hmm

balmy mist
#

mini is a no brainer

torn mantle
#

google will drop their new model probably

balmy mist
#

its the highlight of today

#

40% faster

keen beacon
#

im back

#

hello

sonic tendon
keen fulcrum
keen beacon
#

what happened

sonic tendon
#

sorry ;-;

#

ope

keen beacon
#

wow

#

i see how it is

thorny drum
#

haha 4.5 gone

#

good month

balmy mist
#

lmaooooo

keen beacon
#

Did they lol

torn mantle
#

xddd

#

XDDDDDDDDDDD

keen beacon
#

I'm not watching

keen fulcrum
#

4.5 will get deprecated in the next 5 months

torn mantle
#

yea

#

in the next 3 months

keen beacon
#

Lmao

torn mantle
#

he said 3 months

balmy mist
#

what

keen beacon
#

They abandoned it fr

balmy mist
#

windsurf partnership

#

wow

#

gg cursor?

keen beacon
#

"We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency." lmao..

torn mantle
#

oh

#

windsurf

#

interesting

#

actually

keen beacon
#

That was quick lol

#

I wouldn't have thought they'd admit defeat publicly that quickly

torn mantle
#

lmao

#

this guy seems more like an oai staff than them

balmy mist
#

degenerate behavior lol

torn mantle
#

they look a bit anxious

raven void
#

Lmao

torn mantle
#

those are nice improvements if its true

keen beacon
#

wow gpt4.1 mini looks really good for its size

raven void
#

Deprecating 4.5 🤣

torn mantle
#

mys

#

its been 5 min

#

you are 5 min late

raven void
#

my bad

torn mantle
#

xd

#

its ok

keen fulcrum
#

Gpt 4.1 free in windsurf for the next 7 days

keen beacon
#

They should've abandoned 4.5 completely tbh

#

Not even release it. It made them look bad

torn mantle
#

so they went for windsurf

#

cursor is a sucker for anthropic

oblique flint
#

cursor is glued to claude lol

torn mantle
#

they are weird

#

cursor devs i mean

opaque adder
#

im just confused why they go the opposite way
it should be 4.1 then 4.5
not 4.5 to 4.1

#

makes no sense to me

keen fulcrum
balmy mist
#

i miss when optimus was free lol

torn mantle
#

so

#

AGI

#

when?

keen beacon
# keen fulcrum

weird that chatgpt-4o is still significantly more expensive..

keen fulcrum
balmy mist
#

mini

opaque adder
keen fulcrum
#

You couldn't use it without putting a balance on openrouter
So no o4 mini and o3 ?

keen beacon
#

trying gpt-4.1 in windsurf now

olive mesa
#

is this a google model

keen beacon
#

willing to bet claude is still better tho

keen beacon
#

lol good start

balmy mist
#

i feel bad for gpt 4

keen beacon
#

It's gpt 4 turbo they call it gpt 4 on the website for some reason lol

balmy mist
#

so now we wait for tmw or is it coming on wednesday?

#

o3 and o4 mini?

keen fulcrum
keen beacon
balmy mist
#

yeah imma take a vaca for a month and be back in may let me know if we got agi by then

keen fulcrum
novel flame
keen beacon
#

3.7 sonnet still mogs it

#

lmao

torn mantle
novel flame
# keen fulcrum

Um..... something is not right there. GPT-4o definitely supports image outputs. I seen it wit' ma own eyes!

keen beacon
#

via the api

novel flame
oblique flint
fleet lintel
#

benchmark wise, how are these new OAI models?

torn mantle
#

lets try it

vast vortex
#

i'm using a free account

torn mantle
#

no

#

only API

vast vortex
#

thanks

keen beacon
# vast vortex is gpt-4.1 not available on chatgpt.com now?

"Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases."

vast vortex
thorny drum
#

thats why they compared with the november version i think

novel flame
#

Actually the Windsurf announcement makes me wonder if I should give it another try.... Some months back (around the time Windsurf launched / changed its name) I did a review of all the coding IDEs, and Windsurf had the best 'flow' but somehow had the worst outcome (a great UX could not make up for the fact that it ultimately generated bad, buggy code). Cursor wrote good code and had decent UX, though a few annoying gotchas that you had to work around, like having to specify the exact files to include in the context. The standout feature of Cursor was the 'suggested next edit' autocompletion, which was practically magical. Cline had the best UX and resulting code in general, but lacked the magical 'suggested next edit'. Aider was a joke compared to the others at the time. Continue had fallen behind the pack.

I have used Cline since, and when Copilot Agent Mode arrived, I tested that one too; actually I get better resulting code with Copilot Agent Mode than I do with Cline (both using 3.7 Sonnet), not sure why that is, and I still prefer Cline's UX, but the tool orchestration must be slightly better in Copilot Agent Mode.

But....... if Windsurf has gotten its act together and even has free GPT4.1 then maybe I need to give it another try.

torn mantle
#

they are serious about their product

#

unlike microsoft with copilot

#

slow with updates/features

novel flame
# torn mantle unlike microsoft with copilot

I was as surprised as you that their Agent Mode wasn't awful.... considering how their version of 'suggested next edit' is the worst thing I've ever activated in an IDE, and I have used Eclipse, IntelliJ, emacs, vim, and Visual Studio

torn mantle
#

gpt 4.1 is not that great at web dev

keen beacon
#

claude is still the best at practical coding tasks

#

and web development in general

torn mantle
#

yea

#

unfortunately

raven void
keen beacon
#

deepmind are coming for claude with webdev but their models aren't good enough at following structure and calling tools for practical coding yet

keen beacon
torn mantle
#

lets see how it does at vision capabilities

keen beacon
#

gpt-4.1 is basically just all the gradual improvements they've made to chatgpt-4o spun off as a separate api model

torn mantle
#

yea

#

true

tribal aspen
#

Nightwhisper and dragontail incoming

tribal aspen
fleet lintel
#

I still dont know whether to be excited about these new OAI models or not? Are they better than 2.5 pro?

novel flame
keen beacon
#

one correction tho: i was wrong on optimus prime apparently its the full gpt 4.1 model too. not sure why it scored significantly lower tho

#

and on aider

keen beacon
oblique flint
keen beacon
#

not about nightwhisper and their upcoming ones

#

because like you said

#

we don't have enough info on them

torn mantle
#

PLS

#

LET IT BE NIGHTWHISPER

#

PLSSSSSSSSSSSSSSSS

#

@keen beacon DO SOMETHING ABOUT IT

#

talk to them

keen beacon
torn mantle
#

idk

#

NEXT GOOGLE RELEASE

keen beacon
#

oh

torn mantle
#

you have contacts

#

with google devs

keen beacon
#

it's 2.5 flash next

thorny drum
#

is google shipping today?

torn mantle
#

ask them

fleet lintel
torn mantle
#

oh no

keen beacon
#

2.5 flash followed by an update to 2.5 pro

#

it is still in preview

torn mantle
#

hopefully its not true

tribal aspen
keen beacon
#

there will be more google anon model drops on the arena in the coming days

#

just wait

novel flame
torn mantle
fleet lintel
# tribal aspen Nw is still better

i think NW was just a coder model and they don't want to release only a coder model. they want to have all the benefits in normal pro/flash models.

plain zinc
#

I think nightwhisper is the last Google card he will save for an emergency.

keen beacon
#

with a high thinking budget

#

as for nightwhisper

fleet lintel
raven void
#

what is riverhollow?

keen beacon
oblique flint
#

No way it'll still be as cheap as current flash if it's that good

raven void