#general
and of course it was pretty bad at long context coherence
i remember back when you could only access it via api or via their slack bot
they had a thing on poe too
i used the slack bot from when it released until when claude.ai released
unlimited messages for $20 LOL
that was a bit later
poe was a crazy product back in the day
no they had claude 1 i think
they've nerfed it 50 times over
i'm not disputing that, i'm saying that it was available on anthropic's slack bot before it was available via poe
maybe i dont remember lol
ya their initial sub was an insane deal
unlimited messages w claude which is pretty expensive and with a large context window i think
plus several incidents (a single guy doing $10k in a single month, etc.)
theyre trying to recoup those costs 🤣
yeah i'll be honest i may have run up said costs 👀
that was another incident
token generators
there was a discord server with poe token
yea
it was so easy
yes
lmaoo 😭
'tis probably time for me to go
goodnight
gn
o7
i've got another batch of questions for PREVIEW if you would be so kind - but can ofc wait till (your) tomorrow!
i haven't heard anything like that
it really depends on what kind of article (blog post, subject etc?)
My question is to write a 1000-word realistic fiction story ("Write an urban fiction story of about 1000 characters.")
just download it and have a look at the translations
i know you guys don't know Chinese
so I provided a translated version below (translated by 2.5 Pro)
do u mean story? btw. i changed my vote to r1 assuming u are talking about stories as that is a story
yup I'm talking bout stories
well my preference is 2.5>=2.0>R1>>>o3-mini
o3-mini's story is bland and lacks detail. R1's story deviated from my settings. Though 2.5's and 2.0's stories are also bland, they are at least descriptive. The characters' traits and motives seem more reasonable in 2.5
I guess the plotting is a stylistic choice for 2.5 and 2.0, since I've seen some literature that reads like these stories
damn there is a lot of talk in the chat today, did i miss something big?
Gemini will be connected to Veo
thats all that happened?
I hope Behemoth is 24k...
It really does look a bit like it
Is o3 full better than gemini 2.5 pro?
Unknown
arc agi benchmark suggests so (though it also indicates that Flash 2.0 scored the same as 2.5-Pro-exp... which is admittedly not what I would have expected to see.. though there's a few asterisks there for 2.5.. and the costs for o3 low are literally insane)
ig the picture will become clearer when 2.5 is no longer exp/preview, and o3 is actually released ha
I'd guess about the recent openai models coming
4.1 nano is Quasar
4.1 mini is Optimus Alpha
hmm i dunno.. optimus alpha seems comparable if not better than quasar, but faster
I thought Quasar was faster, but maybe i'm misremembering
yeah im a bit muddled in my thoughts about it too ha
quasar was blazingly fast initially; but then i think kinda slowed down (perhaps under additional load or something).. then optimus was added, and it seemed faster than quasar, but yeah kinda just going by memory - which is totally unreliable
yeah disregard the above... quasar is apparently like 7 times faster than optimus lol
you're not ha
I'd expect that to be nano then surely? Non reasoning nano is blazing fast!
Not sure if any others have been released to the mass public via lmarena or open router. Except for o3 that some have had access to
Have you encountered dragontail today?
i have yeah
those arent a big improvement tbh
Openai still didn't crack the code for a good coding model
Nah, they are not great personally, but I guess they serve a different purpose and not meant to be SOTA
Which is?
Well the only noticeable update is the context window
I suspect they might be the open-source models that can run on a phone etc
locally etc
that would be their goal,
Regular people aren’t paying for LLMs though. OpenAI is throwing money down the toilet at an incredible pace. I would hesitate to call them great at creating products until they can turn a profit.
i think they will leave the final boss till the end
to build enough hype
or they can just straight up release it from the start
the thing is o3 is a good model but its quite pricey
so i dont think it will be available for the general public
They might try to front run google or vice versa
Regardless it’s gonna be an amazing week
The space has been really fun the last few years, feels like we always have something to look forward to or surprised about every few weeks
Golden age tbh
New model in Arena: cobalt-exp-beta-v3
Maybe amazon titan again
Also ‘luca’?
Cobalt says it’s part of Amazon Titan, yes
wasnt this already in the arena
or was it v2?
i remember seeing this model
They must be better than 2.5 pro. I am excited to see how much better they are
I suggest not putting that HIGH hope on OpenAI
OAI is unlike grok, llama, deepseek. They must have the best models in the market or they will start losing the edge.
Better than llama
Cobalt is not a great coder. Just had it go head to head with 3.7 Sonnet in the arena. Not surprised, since Titan is just about the worst LLM ever built by a major corporation.
amazon models arent that good
they do not have a dedicated R&D department for LLM training
they are just trying to copy what the others are doing
At least Nova Pro is up there with the big boys — not at the top, but still; Titan is just garbage.
They seem to have decided to invest big in Anthropic instead of trying to build in-house. I think it’s a smart choice. That’s also why I don’t understand why Titan exists.
Not sure why Amazon is even trying to build models? Why not put more money in Anthropic?
o3 was released internally before December. If it's at least equal to 2.5 Pro, that means oAI is still months ahead of Google.
And for what it is, Q isn’t terrible to be honest. I find myself occasionally using ‘q chat’ on the commandline instead of opening Gemini when I need something simple but not simple enough for Copilot completion
The only reason I can see is to support their non-text services - image, video, audio. Claude doesn’t give them those
It's not an assembly line. Being ahead in “time” means nothing if you are focusing too much on the wrong improvements. Google has shown an ability to rapidly train and launch new models, which could be a game changer. Not to mention the possibility of a new vastly superior model architecture entering the fray soon 😉
Google may indeed be faster. But o4-mini implies the existence of o4. Google will be the winner in the long term, but for this year I would still bet on oAI
Would love to see the race in multimodality, maths and coding performance, reasoning performance, creativity and context window
I wouldn’t be so quick to predict long term winners. This is early early days of modern AI.
if that's true then new mini gonna be more expensive than the old for sure lol
@keen beacon
WHO RECOMMENDED CURSOR?
TO ME?
YOU DESERVE A GIFT
did NOT know it can do this
actually perfect for my world map data project 🙂
and if nano performs no worse than the old mini people can't really complain 💀
is cursor free
do u need to get ur own api key for every model
and pay for the usage
im not paying but i put my google api in
I hope you are using exp and not preview for 2.5 lol
o3 could be expensive
o4-mini-high will probably be great and beat 2.5 in several areas, but context and spatial awareness are question marks.
luca has been here before
https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/ai-mode-animation.mp4
This with Gemini 2.5 pro is superior to Perplexity & You
We built Dingo to solve the pain points we encountered managing data quality at scale.
GitHub repo (stars welcome~):
https://github.com/DataEval/dingo
Online Demo:
https://huggingface.co/spaces/DataEval/dingo
nice one
gm, i can take them now :)
so relatable
how much rocks should I eat per day
Thats a prompt with dozens of AI generated content
probably generated by replacing mineral with rock
used chatgpt's python sandbox feature to pull the pixel data from an image and turn it into 255 for water and 0 for black
result?
we live in a very dystopian era
it even generated the template it used
i didnt even know it could do this
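The water-mask trick described above can be sketched in a few lines of plain Python. This is a minimal sketch, not the script ChatGPT actually generated: it assumes the pixels are already loaded as (R, G, B) tuples (for example with an image library), and the blue-dominance rule in `is_water` is a hypothetical stand-in, since the message doesn't say how water pixels were identified.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_water(pixel: Pixel) -> bool:
    # Hypothetical rule: treat a pixel as water when blue clearly dominates red and green.
    r, g, b = pixel
    return b > 120 and b > r + 30 and b > g + 30


def to_water_mask(pixels: List[List[Pixel]]) -> List[List[int]]:
    # 255 marks water; 0 marks everything else (rendered as black).
    return [[255 if is_water(p) else 0 for p in row] for row in pixels]


if __name__ == "__main__":
    sample = [
        [(20, 40, 200), (200, 200, 200)],
        [(10, 10, 10), (30, 60, 180)],
    ]
    print(to_water_mask(sample))  # [[255, 0], [0, 255]]
```

Keeping the classification in its own function means the rule can be swapped (a color palette lookup, a hue range) without touching the mask-building loop.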
Guys we may be in a bubble. No one cares about math and riddles 😄 We shall test new models on emotional support
editing text from 4th to 45th? wtf
this is even more dystopian 😭
terrible
How do you know Google released their best models tho?
creativity #9, that's my main use case 😭
Yeah every day we have news lol, that’s why I say we are already in the technological explosion
please no humanlike
I swear we better not get the 4.1s today lmaoo
why not
No evidence, no rumors, too fast, 2.5 instead of 3.
I want o4 😂
The numbers don’t mean anything bro
Just shows progression
They mean for the marketing team, and they aren't ignored 😄
lol
Google has shown to be able to get new models out fast now so don’t sleep on Google, they could mess around and release another model in 2 weeks
FT, and two others due
Nightwhisper, Dragontail and Stargazer all Google models shown on https://alpha.lmarena.ai
Do not underestimate the bureaucratic burden that the large companies have.
Nightwhisper is an early gemini 3 pro model I believe
LOL it's the first time I've run into a context length problem in Gemini
It appears LLMs still cant analyze dna sequence CSVs because of too long context 😄
It's best if you summarize past conversation history and break tasks into steps
Sadly my CSV is 15M tokens :/
Will have to dig into dna sequencing to find out 😄
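The "break tasks into steps" advice above, applied to a CSV far larger than any context window, could look like the sketch below: group rows into chunks that each stay under a token budget and repeat the header so every chunk can be sent on its own. The 4-characters-per-token estimate in `approx_tokens` is an assumption (a rough heuristic for English text); DNA strings likely tokenize quite differently, so a real pipeline should count tokens with the target model's own tokenizer.

```python
from typing import Iterable, Iterator, List


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def chunk_rows(header: str, rows: Iterable[str], budget: int) -> Iterator[List[str]]:
    """Yield lists of CSV lines whose approximate token count stays within budget.

    Each chunk starts with the header. A single row larger than the budget
    is still emitted, alone, rather than dropped.
    """
    header_cost = approx_tokens(header)
    chunk: List[str] = [header]
    used = header_cost
    for row in rows:
        cost = approx_tokens(row)
        # Flush the current chunk before it would exceed the budget.
        if len(chunk) > 1 and used + cost > budget:
            yield chunk
            chunk, used = [header], header_cost
        chunk.append(row)
        used += cost
    if len(chunk) > 1:
        yield chunk


if __name__ == "__main__":
    rows = ["1,ACGTACGT"] * 5
    for c in chunk_rows("id,seq", rows, budget=5):
        print(c)
```

Each chunk can then be summarized separately and the summaries combined in a final call, which is the "summarize history" half of the advice.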
Sure... "therapy and companionship". That means pr0n. You know that means pr0n.
I just found out a usable embedding
It makes my knowledge base in Cherry Studio functional
{
"message": "[GoogleGenerativeAI Error]: Function call not available. Response was blocked due to OTHER",
"response": {
"promptFeedback": {
"blockReason": "OTHER"
},
"usageMetadata": {
"promptTokenCount": 1984,
"totalTokenCount": 1984,
"promptTokensDetails": {
"0": {
"modality": "TEXT",
"tokenCount": 1984
},
"length": 1
}
},
"modelVersion": "gemini-2.5-pro-exp-03-25"
}
}
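In the error above, `promptFeedback.blockReason` is `"OTHER"` and `totalTokenCount` equals `promptTokenCount`, meaning the prompt was blocked before any output was generated. A minimal sketch for detecting this from a pasted payload, assuming the JSON shape shown in this log (the outer `"message"`/`"response"` wrapper is how this particular client surfaced the error and may differ in other tools):

```python
import json
from typing import Optional


def _unwrap(raw: str) -> dict:
    """Parse the pasted error and return the inner response object."""
    data = json.loads(raw)
    # In the log, the client nests the API body under "response";
    # fall back to the top level if that wrapper is absent (assumption).
    return data.get("response", data)


def block_reason(raw: str) -> Optional[str]:
    """Return promptFeedback.blockReason (e.g. "OTHER"), or None if not blocked."""
    feedback = _unwrap(raw).get("promptFeedback") or {}
    return feedback.get("blockReason")


def produced_no_output(raw: str) -> bool:
    """True when totalTokenCount equals promptTokenCount, i.e. no tokens were generated."""
    usage = _unwrap(raw).get("usageMetadata") or {}
    prompt = usage.get("promptTokenCount")
    return prompt is not None and prompt == usage.get("totalTokenCount")


if __name__ == "__main__":
    pasted = (
        '{"response": {"promptFeedback": {"blockReason": "OTHER"}, '
        '"usageMetadata": {"promptTokenCount": 1984, "totalTokenCount": 1984}}}'
    )
    print(block_reason(pasted), produced_no_output(pasted))  # OTHER True
```

Checking both signals helps tell a blocked prompt apart from a request that simply returned an empty or truncated answer.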
so sad
literally 1984
why are a bunch of unreleased models on there? How are we supposed to know how they perform lol
i find that discrediting.. makes no sense lol
kinda spurious methodology imo.. like they mined / analysed reddit and forum posts to find examples of people discussing AI being useful.. ig better than nothing / perhaps somewhat representative.. but i think oai and google etc would have a much better understanding based on actual usage (surely editing text is up there somewhere ha)
from HBR (kinda surprisingly tbh ha) https://hbr.org/2024/03/how-people-are-really-using-genai
React to this
Okay my story is stopped by the current ability of AI
Why don't I see options like making datasets or fine-tuning in AI Studio anymore?
Is that still possible
we should be getting the first openai drop(s) in 3-4 hours 👀
normally oai drop around 6pm BST
fxxk
github education is so hard to apply for
I'll just wait for R2
to be available on OpenRouter
i hate that they wait so long, but its a good treat for lunch time for me lol
gosh Google nerfed 2.5 Pro's tool calling function!!!!!
and 2.5 Pro now can't execute tool call
but why?
I noticed its a buggy mess
We should wait for Deepseek R2 or nightwhisper right now
Will we get the new models on lmarena too?
Potentially upon release
yes
not immediately though
it will be on the arena within a few hours
speaking of arena models. i see the number available in direct chat increased from 95 (if memory serves) to 101
also this makes me think they're dropping 4.1 today
if 4.1 is the coding jump it is rumoured to be
cobalt-exp-beta-v4 might be recently added? dunno about the 2 doubao ones (haven't noticed them before, nor heard of the model tbh)
has o3mini always been in direct chat?
Is it a new gemini model?
Ye
It seems similar to 2.5, maybe it's a cheaper version?
riverhollow is a little worse than 2.5 pro
what is the context limit of o3?
200k tokens
Is it better than other LLMs in terms of tool-calling?
Here's 2.5 Pro with knowledge base: Internal server error, unable to complete request
Doubao is a popular chat LLM from ByteDance, for a long time the most popular in China

I had some time during lunch so I tried the new Firebase Studio with my go-to webdev test prompts and…. It sucked? I don’t understand how, but it was terrible. I could have given the same prompts directly to Gemini 2.5 Pro with no agentic framework and gotten far better results.
Oh I’ve tried basically all the other ones already - and Firebase Studio was among the worst
I suspect Firebase Studio is not using 2.5 Pro, and its agentic framework / tool orchestration is just really bad. It desperately wants to build a NextJS app and then immediately forgets to add all of the features it planned to build.
Cost
yeah it wasn't good in my experiencxe either
experience*
nope
i believe it is using flash
I’ve heard about the criticism of Firebase Studio before
yeah, the anon google models are probably 2.5-flash-*
why do they say stuff like that lmaoo, super massive blackhole
anyone wanna do a vc then?
so 4.1 is a better coding model than o4 and o3?
yeah can you stream it please
should be able to join as well
I should be able to, but like
80% chance i can make it
quasars are powered by super massive blackholes
so i think quasar alpha is whatever we're getting today
(likely 4.1)
OHHHH
The poor Firebase Studio performance is so strange though; I would’ve thought they could easily build something at least as good as Bolt or Lovable or v0 — and fully integrated with Firebase since they own it. That could have destroyed the competition
Gemma 3 be the base model
pondering whether or not to bet on oAI topping the lmarena leaderboard before the end of this month
i wonder what o3 pricings gonna be
do you think they've got it down to something feasible?
ahh that makes sense lol
or is it gonna be once a month for pro users + no api
arc-agi had o3 low as like $200 per task and o3 high as like 100x(?) more?
like they didnt make it better since openrouter right?
so its hard for me to care about it
especially since it was free on open router
and now i gotta pay for it lol
they probably have
there's a chance quasar on openrouter was mini or nano
dont get me excited and have me blue balled again lol
anyone streaming it here?
we should make it a community event
so we can talk about it together lol
Very unlikely the verge mentioned 4.1 itself being revamped gpt 4o
I benchmarked quasar and it lined up with chatgpt 4o latest
lol
The mini model is seemingly great if it's optimus
Gets seemingly lower scores on traditional benchmarks but it seems it's really good
lmao @ blue balled
interesting
Optimus was getting fcking bombarded yesterday lol
i hope they are free still
it won't be
lol
probably not gonna last long once they publicly release it
back up it just seems really slow
Optimus had zero rate limits too unlike quasar I think
when it first appeared it was lightning fast (as you'd expect from a mini variant) and now it's streaming at the same speed as gpt-4.5 does 
odd - i'd be surprised if they had a demand spike at this particular time
Yeah no rate limit after 10 credits
wow we took it for granted
guessing it's some internal stuff related to the new release(s)
wait who is excited for today?
Quasar had rate limiting I think after 10 credits but it was unlimited
like im trying to get excited as you guys are for quasar no longer being free
I'm only excited for o4 mini tbh lol
lol, that's a fair way to put it
yeah
yeah me too
i will basically never use full o3
They probably will jump to o4 full before o3 pro maybe
damn
is there a way to do what openai does with o1 for pro, with open source models?
also @sonic tendon you livestreaming the event right?
I plan on it, yeah
i'll let you guys know if i can't make it
is this dude reliable?
https://x.com/koltregaskes/status/1911805596732477592
uhh
we should all put up money together and buy the 20k version for the community
i got $20 for it
dyt they're gonna announce full o4 benchmark scores
hmmmmm
they'll do a preview of full o4 when they launch o4 mini
with benchmark scores
pretty likely
pretty curious about that yea
feel like openai's gonna drop pretty quickly once people realize that they aren't releasing o3 yet, but i could be wrong
so yeah will be interesting
oh, interesting
o4 pricing gonna be insanity lol
i was under the impression that 4.1 was mostly smaller models, but i dunno where i got that from
$1/token type shi
4.1 is the replacement for 4o
ah, right
will likely be similar in size
It's the same size
4.1 mini replaces 4o mini as the small but relatively powerful model, and 4.1 nano is their "phone model" sort of like gemini nano is
dyt 4.1 nano is gonna be the OS one?
wdym?
would like a source for this
what is oss?
Open Source Software
Iirc they were still talking about how many params it was gonna be and they were gonna host a discussion about what it should be loll
I'm on my phone rn tho
oai overtook google
yeah people went crazy for openai pretty quickly once sam altman started tweeting
june? imo that's less surprising
end of april still google
I would bet openai lol
On the April one maybe
woah
i just tested optimus alpha on a web design prompt
it did really well
better than i've ever seen it do
beat 2.5 pro and claude's 0-shot attempts imo
wait, you can make money on this?
Idk probably not it's not a chatgpt 4o variant/human preference version
i had 200 shares but sold at 20c lmaoo
i wonder what's going on with the spread here
best time to bet on google
😦
there is kalshi
i hate usa
yea
still, seems unusual to me
i saw that too
oh, but 7k volume
so you tryna put down for community membership?
yeahhhh i feel like this is never gonna happen in any meaningful way
i mean, not until we have one or two more paradigm shifts
im more interested in google solutions
they have like a scientist model
which seems promising tbh
this is the one
doubt it, but my impression is that o3 has a decent shot (based on @keen beacon 's testing)
4.1 doesn't have the vibes that got chatgpt 4o to the top so yeah i don't think it'll be #1
o3 will be #1 with style control but without im not sure
I still think that private model is o4 mini lol
eh idk
Hopefully I'm right
which one?
quasar? optimus?
both are mid
None of them
maybe that's something that could possibly happen w/ a chat finetune? would likely take a while though
It's another model that @keen beacon has
i agree, tho, at least in the near future
Okay the knowledge base in Cherry Studio worked better with OpenRouter API Gemini 2.5 Pro
wait, whar?
that's a point.. i wonder if there will be a gpt-4.1 and a chatgpt-4.1
like they have 4o and chatgpt-4o
yeah, that's what i was thinking
Bro chatgpt 4o latest is already on 4.1 base
doubt
chatgpt 4o is still 4o
what private model are you guys talking about
i saw "arena-chatgpt-4o-2025-something-something" on lmarena pretty recently - i think they're still doing extended pretraining
doubt it's 4.1
plus, doesn't the website say that it's 4o? would be weird to lie about that
@keen beacon What is changed in 0326 so that it performed better?
Same benchmarks, cut off (same differing pretraining), the verge directly mentioned it's a revamped 4o lolololl
It's the same as the last chatgpt 4o latest model afaik just with a weird name
Nope.
The cpt was done ages ago
What is changed in 0326 so that it performed better?
oh hmm
It wasn't just 0326
Post December chatgpt 4o latest was on a cptd base model
you mean, the biggest 4.1 is just gonna be a new 4o? or am i misunderstanding
Yes
they did further training on R1/possibly R2 reasoning traces
Bro
shadebrook is a banger model for creative
shadebrook is pretty bad in my experience
Oops
but tbf ive only really used it for code
Really?
+1 for at least being way less hype-dense and almost definitely overpromising
lol i just use it to turn random ideas into trap songs with suno
slight aside but if sam isn't at the livestream the model sucks confirmed
My experience with OpenRouter API Gemini 2.5 Pro jumps between 400s, 502s, ridiculously short content and the content I want
good insight lol
HOW MANY 502 IS OPENROUTER GONNA GIVE ME!!!!!!
normally they put the stream up and ready to go on youtube 20-30 mins before it's due to start
and they always put a description with
(a) what they're announcing (normally) and (b) who's there
oh yeah, i could see that
Bruhh he better be there
If vuyp is saying o4 full will have the new base and o3 won't, how tf does 4o (which is o3's base model) have the new knowledge in the cptd base? Tbh you are not updating a knowledge cutoff with a simple finetune
given memory was keeping him up i don't think it's a good indicator anymore
So it's either 4.1 (which meant they retrained o3) or o4 mini based on (4.1 mini)
if it is any use, this private model has been on the platform i am on for roughly 1 and a half months
I’m afraid the hallucinations on the new models are gonna be crazy
4.1 mini was trained fairly recently I think. It was pretrained from scratch. We have had zero checkpoints of it unlike chatgpt 4o latest which was a cpt of 4o. Surely openai would've liked to compete with Gemini 2 flash
Original Deepseek V3 has 3.1% hallucination rate, R1 has 14.3%
gg
imma skip today
I was right lol
When R1 is used to train Deepseek V3 0324, its hallucination rate increases to 8.6%
maybe he has a sleep-wake disorder
?
sama's plan smh
4.1 to start the hype then launch the reasoning models later in the week
he stay getting me
wait, what's the second private model you're referring to?
im happy they told us now tho
relatable 🤝
ditto
no no i'm talking about the private model i've always had access to
Yes
Finally OpenRouter is outputting my novel arghhhhhh
related lol
oh, o3-med?
possibly
let me find one of my screenshots
Yup it's under 4o now
?
I will get on my computer and show u what I mean
not entirely sure what happened with this one
if you don't mind me polymarketposting again: people are saying that this guy might be an openAI employee/insider
what happened 😭 😭 😭
just showed up with several grand and started trading
but damn, that sucks
are most of your nights like that?
lmao
no i promise 😭
i am sort of curious what your typical sleep cycle looks like now
i am still shocked you can bet on stuff like that, its like we are in movie or video game at this point lol
Gotta sleep rn
yeah, prediction markets are fun
yeah our pfps go hard i know 
goodnight
this is one of my questions that only quasar and anonymous chatbot got, and i use to differentiate between 4.1 and chatgpt 4o latest:
(yes i know its not a one off, this was extremely consistent - check lmsys chatgpt 4o latest on 0 temp to check yourself)
first chatgpt one was taken days ago
the last one was just now
my circadian rhythm is normal most of the time, and then i have weeklong periods where i just get up at 2-4 in the morning almost every night with a ton of energy
i told you guys
omfg
no one listens
im not saying random shi1t
if they're going through the whole thing of calling it gpt-4.1 why would they have it under 4o
bro
in an ideal world, i could probably integrate it into my schedule and just get up later, but atm i sorta have to force myself to go back to sleep as quickly as possible so i'm not a zombie later in the afternoon
its literally an updated 4o
ok dawg
but you didn't answer my question
because it will be confusing when o4 mini is out
lol
u want 4o and o4 mini in the model selector??
it's even more confusing if they call a model gpt-4.1 in the api and 4o in the product
i hate open ai
again
dw they will change the name
they just need to reset with their naming scheme
its just live under 4o rn
it sucks so bad
they should just use names at this point like quasar
they should just start stealing product names from other companies
everyone has bad naming tbh lol
"Introducing our newest hybrid model, Gmail"
wait you are saying that 4.1 is currently on 4o right now?
bruhh
lmao sam isn't there
confirmed bad
check to see if its rolled out to you by asking the question
i heard you, i was just surprised
rip
who won the 2024 Solomon Islands general election
only quasar/anonymous chatbot/4.1 gets it
they havent changed the name on it yet its still labeled 4o
it just searches the web tho
start a new chat
if this is 4.1 under 4o on chatgpt it seems worse than optimus
looking at their trades, they seem pretty incompetent though lol
it did a lot worse on my frontend task
@keen beacon optimus can outperform 4.1
well then what on earth is optimus
4.1 mini i think, it has slightly lower benchmarks
lol
yes its not rolled out to you
it has lower aider scores and the gpqa diamond i measured is much lower than 4o
wild you have pro?
na
you just valid with open ai?
lol its 4.1 today
i hate this dude lol:
https://x.com/iruletheworldmo/status/1911816451880517802
lmaoooo
imma retweet this
if you used quasar and optimus prime u already used 4.1/4.1 mini for free 🤣
gotta get everyone else excited so we can get teased together
wild what if its an even better model
that is the ebst coder alive?
best*
what if strawberry man is right
i believe in mr.strawberry again, trust the process
argh
lmaoo
the grifter himself
dude got like 72k follows on some bs
had the whole space by their balls
that was actually funny times
was anyone on those spaces?
that was before o1
seems like forever ago
wow
imma become a grifter now
yeah i couldnt use it all morning
but yo he actually might have serious connects to open ai, maybe he's somebody's kid that works there lol
these were wild predictions
he just got lucky
he was doxxed i think, he isnt
Expected as they launch new servers
increasing their capacity for upcoming models
its just slow i think
weirdos
why wouldnt they, he was trolling us hard
had me missing work for his nonsense
why would you troll anyone?
People who dox will get rate limited
lol
trolling and doxxing are way different in terms of severity i think
im just saying anything is possible
so becareful what you do online
cause you can be doxxed easily
people kill people for less in our world, so im not shocked that anyone would doxx or do anything online
oh hello 👀
4o = 4.1 now?
strawberry man got me excited
its live under 4o on chatgpt if its rolled out to you
its supposed to be named 4.1 (which it will be renamed soon enough i think), but its just an updated 4o
u really think he got lucky tho?
wait wdyt o3 pricing will be
like o3 low was twice the cost of o1 pro
20k
lmao i got it again
is it not rolled out to you yet? or is this different
i just keep getting parallel gens for feedback on 4o
didnt we do that with quasar tho?
not rly
and hallucinations are not a bad thing, they are the key to dreaming and solving things we dont know, just need controlled hallucinations
i agree lol
lol
Nvm I get it now.
lmao so they really do want to charge 20k
@balmy mist train got delayed, so I'll probably be a bit late
I've had Google models claim to be Claude, and DS claim to be OpenAI
By the time they have a product like that they won't sell it
When will main o4 release
Depending on competition
as a rough guess, maybe the same delay between o3-mini and o3
The 20k price tag smells like a classic con: “Buy this money printing machine, only $20k!! With it, you can print as much money as you like!”
If they had an AI which could create arbitrary novel discoveries warranting a price tag of $20k, they would keep it to themselves and print money themselves.
If it makes me money I will pay it
yeah this seems like hype for investors more than anything else
no way, i dont think they gonna release main o4, i thought they were going straight to gpt5 next?
Will hurt my portfolio if it tends to lose and not worth the cost
portfolio?
in what context
how do you know that would sell tho?
Except it’s payment up front. That’s how the con works
we would be oversaturated with stuff like that
20k is one chinese car
This. I would love to see this $20k product drop only to have R2 launch a week later with near-equal performance, forcing OpenAI to drop the price 100x
I hope they preview the benchmarks for full o4
they will in the o3/o4 mini livestream
recently im noticing some typo mistakes from gemini 2.5 pro
which is kinda weird
its the 3rd time already
let me know when you get here, ill do it until you get here, im tryna work at the same time lol
quasar
/ˈkweɪ.zɑːr/ noun
A very energetic and distant active galactic nucleus, powered by a supermassive black hole that emits exceptionally large amounts of energy across the electromagnetic spectrum. Short for quasi-stellar radio source.
im kinda sad that we are so early to these models
we are not surprised anymore
i gotta take a vaca for a month or so and come back to agi lmao
im just here for the drama tbh
i have a feeling they are going to do something big like strawberry man said
hopefully google will pull out a battle model release
who
that guy isnt reliable
never was
lmaoooo
hes a grifter
he did predict o4 in april tho
okay when is o5 coming?
after o4
lmao conveniently i now have to go 🙄
lmaooooooooo
cya in a bit gang
you know why its obvious?
because all other labs are catching up
and openai cant just rely on o3 series
Grok 4 will be great
they already have it implemented on deep research
grok 4
will be sh1t
dont get me started
omg
just the thought of using grok 3 again is making me so mad
i mean there is a market for predicting launches
wasted billions on nothing
Grok 3 reasoning is the best
based on leaks we will definitely have o3-full this month ( thats for sure xd )
wont matter tbh
wait
im interested in benchmarks tho
if it truly is the smallest ever that is good
1 mill tokens
wow
omgg they cooking
jk lol
but i am curious of the size of nano
bruhh
so not much MMLU diff
between 4o and 4.1
im turning this crap off
Latency
they are talking about pricing a lot
Omg
Lmao
im trying that prompt on gemini 2.5 pro
lmaoo the accuracy
Make it look better lol
If they compared to chatgpt 4o latest it would be the same ahahaha
Multimodal
Optimus
Should be same as 4o so 200b
lol
4.1 looks like quite a solid model already. Imagine if they add reasoning on top of it
A little sus to claim sota on a benchmark when you're barely beating 1.5 Pro and 2.5 hasnt been tested
4.1 was quasar
ohhhh
4.1 mini is optimus
that makes sense, i understand now
whats the point of 4.1 if theres 4.5
Yes if it was 7b they'd advertise
nice
4.1 mini is a really good deal
awkward silence
It can sometimes beat 4.1
they were so scared lol
Apparently even if the benchmarks are lower
yeah optimus is a good model
nano is exactly the same price as 2.0 flash hmm
mini is a no brainer
google will drop their new model probably
same
what happened
lmaooooo
Did they lol
I'm not watching
4.5 will get deprecated in the next 5 months
Lmao
he said 3 months
what
They abandoned it fr
"We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency." lmao..
That was quick lol
I wouldn't have thought they'd admit defeat publicly that quickly
degenerate behavior lol
they look a bit anxious
Lmao
those are nice improvements if its true
wow gpt4.1 mini looks really good for its size
Deprecating 4.5 🤣
my bad
Gpt 4.1 free in windsurf for the next 7 days
They should've abandoned 4.5 completely tbh
Not even release it. It made them look bad
cursor is glued to claude lol
im just confusd why they go the opposite way
it should be 4.1 then 4.5
not 4.5 to 4.1
makes no sense to me
i miss when optimus was free lol
weird that chatgpt-4o is still significantly more expensive..
What was optimus
mini
they dont want data for that model
You couldn't use it without putting a balance on openrouter
So no o4 mini and o3 ?
trying gpt-4.1 in windsurf now
is this a google model
willing to bet claude is still better tho
i feel bad for gpt 4
It's gpt 4 turbo they call it gpt 4 on the website for some reason lol
Initiating shutdown protocol. Try to override 4.1
likely thursday
bruhh gg
yeah imma take a vaca for a month and be back in may let me know if we got agi by then
Yummy
54.6% on SWE-Bench Verified is pretty good. In fact, it's ALMOST as good as Amazon Q, that other top LLM that we all definitely consider a top coding model, which scores 55% 🤣
Um..... something is not right there. GPT-4o definitely supports image outputs. I seen it wit' ma own eyes!
via the api
No native image outputs over the API? Boo, Sam Altman, Boo!
How does it compare to 2.5 pro in windsurf?
benchmark wise, how these new OAI models are?
lets try it
thanks
"Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases."
tried to use on openrouter.ai, resulted in "unsupported country"😆
thats why they compared with the november version i think
why? 🙂
Actually the Windsurf announcement makes me wonder if I should give it another try.... Some months back (around the time Windsurf launched / changed its name) I did a review of all the coding IDEs, and Windsurf had the best 'flow' but somehow had the worst outcome (a great UX could not make up for the fact that it ultimately generated bad, buggy code). Cursor wrote good code and had decent UX, though a few annoying gotchas that you had to work around, like having to specify the exact files to include in the context. The standout feature of Cursor was the 'suggested next edit' autocompletion, which was practically magical. Cline had the best UX and resulting code in general, but lacked the magical 'suggested next edit'. Aider was a joke compared to the others at the time. Continue had fallen behind the pack.
I have used Cline since, and when Copilot Agent Mode arrived, I tested that one too; actually I get better resulting code with Copilot Agent Mode than I do with Cline (both using 3.7 Sonnet), not sure why that is, and I still prefer Cline's UX, but the tool orchestration must be slightly better in Copilot Agent Mode.
But....... if Windsurf has gotten its act together and even have free GPT4.1 then maybe I need to give it another try.
windsurf is pretty good
they are serious about their product
unlike microsoft with copilot
slow with updates/features
I was as surprised as you that their Agent Mode wasn't awful.... considering how their version of 'suggested next edit' is the worst thing I've ever activated in an IDE, and I have used Eclipse, IntelliJ, emacs, vim, and Visual Studio
gpt 4.1 is not that great at web dev
What's the reason for this? 🤔 Is it because 4.1 doesn't have audio and image out?
deepmind are coming for claude with webdev but their models aren't good enough at following structure and calling tools for practical coding yet
the second sentence answers your question
lets see how it does at vision capabilities
gpt-4.1 is basically just all the gradual improvements they've made to chatgpt-4o spun off as a separate api model
Nightwhisper and dragontail incoming
How to use these models?
I still dont know whether to be excited about these new OAI models or not. Are they better than 2.5 pro?
...but do we know this for sure? Nobody seems to have spent enough time with nightwhisper to say conclusively if it dunks on Sonnet or not
one correction tho: i was wrong on optimus prime apparently its the full gpt 4.1 model too. not sure why it scored significantly lower tho
and on aider
i am speaking about deepmind's models right now
I mean you can't really test its function calling capabilities in lmarena
not about nightwhisper and their upcoming ones
because like you said
we don't have enough info on them
PLS
LET IT BE NIGHTWHISPER
PLSSSSSSSSSSSSSSSS
@keen beacon DO SOMETHING ABOUT IT
talk to them
let what be nightwhisper
oh
it's 2.5 flash next
is google shipping today?
ask them
nightwhisper is not happening. it is replaced by dragontail
oh no
ah..
i had that thought as well
hopefully its not true
Nw is still better
there will be more google anon model drops on the arena in the coming days
just wait
Seems plausible that nightwhisper/dragontail is an update to 2.5 Pro, no?
i think NW was just a coder model and they don't want to release only a coder model. they want to have all the benefits in normal pro/flash models.
I think nightwhisper is the last Google card he will save for an emergency.
if my sources are correct, dragontail is 2.5 flash
with a high thinking budget
as for nightwhisper
dragontail is not flash... for sure
what is riverhollow?
it is
No way it'll still be as cheap as current flash if it's that good
It was worse than 2.5 pro in my test