#general
1 messages · Page 40 of 1
they have o4-mini
which is simply named that way mostly for marketing
it's the same generation as o3 full lol
o4 full model does not exist
they do, let me pull up the receipts
o3 is top 200 codeforces
50th is crazy
having o4-mini-high without o4 internal is lowkey crazy
this would be o3 pro crazy compute mode
i mean u can think that, but dont think so
so like o3-preview but based on 4.1. Sample of 1024 lol
there's a reason things like that don't get released
this was like back in february too, we are now in sub june
they barely released 4.1 and they almost immediately released a reasoning model based on that (o3). Believe it or not they aren't holding back. If they aren't releasing something that's simply because it's not feasible, like it was the case with insane compute mode o3-preview
i mean where ur receipts
what receipts
exactly haha
cool
I have no clue what you are trying to say lmao
"receipts"??
if you meant as in "proof", OpenAI is closed source. But ARC-AGI confirmed o3 was retrained on new base model (compared to o3-preview) and the only base they had to retrain for improvement was 4.1. Also that's how you do reasoning models. That's as close to proof as you gonna get with closed source commercial models
finally Claude 4 or not?
it was definitely not gpt4.5 since that would mean stratospheric cost and extremely long training time. I suppose in theory they could have gpt4.5 based reasoning internally, but it's unlikely since that project would require good amount of resources and wouldn't be justifiable just for internal use...
It's especially since there would also be an opus model which makes me say that it's rather Claude 4 than 3.x
4.5 is also a model that is officially deprecated now and being replaced by 4.1
4.5 not deprecated on chatgpt
I made that mistake earlier myself of thinking this way but... deprecated ≠ shut down
it's now deprecated and will be shut down in mid-July iirc
on chatgpt they will wait for the release of GPT 5 to remove it
Nope
the base of the GPT 4.5 model will never be used in the future
presumingly that would launch somewhat soon, but it's unlikely that you gonna be able to use 4.5 after mid-July anywhere
Its not an open ai employee
I think they already distilled a good part of it into 4.1. The rest what remains probably mostly not possible to capture in a significantly smaller model
oh french
Slt
mec
u right i have been bamboozled
we kinda do know though. 4.1 is a new pretrained model with more data + synth data from their other models including 4.5. Still same size as gpt4o and similar arch
GPT 5 will have a better base model than GPT 4.1?
Yea
fwiw i feel like this is beyond doubt
Innovation and discoveries are definitely coming from google
Honestly probably not. I don't see what else they could use. It could be like 4.1 + improved o3 (o4?) + o4-mini and any combination of reasoning efforts. With some kind of router or whatnot
their goal with gpt5 seems to be to streamline their model switcher for everyone. I could be wrong but I don't think it's gonna show notable performance gains over you choosing the right model yourself suitable for the task with the current way.
releasing alphas at a faster pace ngl
more affirmation they see the light in AI
They may be training a new model
I'm thinking o4 in gpt 5
"With gemini 2 flash and 2.5 pro"
the LLMs themselves shouldn't matter too much
o4-mini exists.. like by definition o4 already exists.. the former is a distillation of the latter
HAVING it is crazy
ok but by definition they also have o4 pro
imo o4 delay is most likely related to safety, commericial considerations and / or compute limitations (i think compute limitations prob primarily explain why no o3 pro yet.. like yeah they charge a sht ton, but it's sitll a bunch of compute)
there's always a delay..
I'm not sure that would exist without distribution
pro is something for the users
need an updated one for o1/o3
this guy's predictions for model releases 2025 so far have proven conservative
yeah agree
true
they did distill the old o3 (which costs thousands/task in arc/agi)
Is Gpt 4.5 even in the room with us
was looking into it before.. seems the lag b/w o3 internal completion and release was less than with gpt-4
Gpt 5 comes mid July
so gpt5 is just a router and very is beyond hype lol
Opus 3.8 should be gpt 5 level tbh
he wouldn't release o3 pro if he already had o4 ready
i think it's just compute constraints.. the 'pro' version i dunno but like does a bunch of parrellel stuff yadada.. yes they could charge a ton for usage.. but it's still compute being used (and it's a scarce resource.. when they're training gpt5/6 and serving all their released models).. for o4 i dunno maybe it's just standard safety / red-teaming stuff.. or perhaps it's trying to address the hallucination issue.. rather than resources/hardware-related
there's like no doubt that o4 exists..
oai say o4-mini (and i always assumed) is a derivative of it
same with gro-3.5-mini etc
O3 RO IS OUT
o2 pro is out
o5 pro will be AGI
OpenAI's o5 to be Proto-ASI, the first sign of superintelligence? Dr. Alan D. Thompson believes so:
"I expect the upcoming o5 model to be ‘Proto-ASI' (proto/early-stage/first form of, artificial superintelligence). The o5 model will be a multimodal system expected to build on the datasets used for GPT-5, incorporating new synthetic data and partnerships."
Expects o5 to release in 2025, estimating training to end in August 2025.
oh doctor Alan d Thompson agrees
o5 in 2025? lmao
i have this guy muted
lmaoo it could just be entertainment, people will watch anything so it should be a surprise for people to follow some people, like the hawk tuah girl, and all the other nonsense that gets famous
bruhh
i cant stand him tbh
im gonna mute him too
man i thought this guy was supposed to be autistic about model naming why does he not know o5 isnt coming out
i have dave shapiro on notis
xd
o5 might or might not come out by the same name but it will definitely be the same thing called be a different name
it all depends on other labs
ok guys lets backtrack a bit, and wait for o3 pro instead
If there are any big breakthroughs from other labs, then we may see o5 this year
fck o5, where is o6
o6 probably won't be released considering OpenAIs safety rules
just got off a 4 hour call with sources inside chinese deepseek labs and holy shit we are so fucking behind it's not even funny anymore. deepseek r2 isn't just an incremental improvement it's a completely different species of intelligence operating on principles nobody in the
nah this guy need a perma ban
istg hes on my nerves
this is crazy
haha
curious what this looks like for o5
I have him blocked and muted
critical, critical, critical, critical
whats calmriver again?
google i think
it's like hollowriver?
*riverhollow
which i got like 2.5 flash vibes from (at least <2.5 pro)
identifies itself as from google.. for what that's worth
The only reason he has a huge following is sama and OpenAI folks validating him on Twitter. I think they learned their lesson but too late
AlphaEvolve is pretty crazy. Seems like path to recursive self improvement
probably the hint to what's allowing Google to move so fast recently
that or they just weren't focused last year
or both
The strawberry guy has an accuracy of 22% 😄 Do not post his tweets here please. "Based on the February 13, 2025, post, @iruletheworldmo
’s claims about LLM releases have a low accuracy rate, with only 22.2% fully correct (o1, grok 3) and 11.1% partially correct (sora). The majority (66.7%) of their predictions (o3, orion/4.5, gpt 5, operator, gemini 2, claude 4) are either incorrect or unverified as of May 14, 2025."
its just engagement bait, bro is stacking elon ma coins
😂 he kinda funny ngl
Okay, you've given me the input "hello". Is there anything specific you want me to do with it? For example, do you want me to:
- Respond with a greeting? (e.g., "Hello to you too!")
- Reverse the word? (e.g., "olleh")
- Translate it to another language? (e.g., "Hola" in Spanish)
- Count the letters? (e.g., "5 letters")
- Check if it's a palindrome? (e.g., "No, it's not a palindrome")
- Generate variations of the word? (e.g., "Hello!", "hello?", "HELLO")
- Use it in a sentence? (e.g., "Hello, how are you today?")
- Encode it in some way? (e.g., morse code or base64)
- Something else entirely?
Please tell me what you'd like me to do with "hello".
Okay, I received the input: "hello"
What would you like me to do with it? Do you want me to:
- Respond with a greeting? (e.g., "Hello there!", "Hi!")
- Analyze it? (e.g., "It's a common greeting", "It's a five-letter word")
- Translate it? (e.g., "Hola" in Spanish)
- Count the letters? (The word "hello" has 5 letters)
- Something else entirely?
Please provide further instructions!
I wonder if it's an upcoming (updated?) version of Gemma-3 or perhaps even Gemma-4.
omg 4.1 in chatgpt, lets goo
AlphaEvolve paper says it has been in production internally for a year. GDM is definitely being more strict with paper releases. Probably not showing research until they have something better internally
lasted a whole 12 hours 🥳
i'm not taking any chances with this alt
if this gets banned i have no further ideas and i quit
yoo 4.1 is lowkey nice
Idk its kinda dumb
I gave it the other day the spaceship riddle and the answer was 522 ships
nah ya i mean for quick compilation err fix on a 100k+ locs nice to have
1m context i suppose?
response is literally instant, no waiting at all
You are talking bout coding
yes ..
gpt 4.1 is solely for coding, not rlly other things tbh
Like my post so he can add new models to the arena
https://discord.com/channels/1340554757349179412/1372264273908076597
Grok 3 mini (very good quality/price, and putting it on the webdev arena would be good too)
Qwen 3 253b without reasoning and others smaller models
Phi 4 mini and phi 4 reasoning
ok so gpt 4.1 on chatgpt can't accept a 500k tokens paste lame
Ain’t that the truth brother
The internet is a cold place my man
By popular request, GPT-4.1 will be available directly in ChatGPT starting today.
GPT-4.1 is a specialized model that excels at coding tasks & instruction following. Because it’s faster, it’s a great alternative to OpenAI o3 & o4-mini for everyday coding needs.
ya it sucks
For free users, this is a big upgrade from GPT 4o mini to 4.1 mini which is bigger and better.
ok but where is the love for pro users
are these claims exaggerated? too good to be true imo
day 29 with no o3 pro
im shaking and crying rn
is that a good thing?
no, its bad imo, maybe an upgrade for free users i guess
I dont use chatgpt anymore
claude is just best
i agree
ye I have it also
$200/mo cherry on the cake
they may just do incremental improvements over o3 and call that o4. But like I said there's no way currently for them for huge gains. They said 4.5 was their last non-reasoning model so 4.5-turbo (and then RL training on that) is probably off the cards...
thank you person with toiletskibidi\ohio as their pronouns
Nebula appeared late Thursday/Friday morning before 2.5 Pro was launched the following tuesday. If goog is gonna bench a new model on arena before IO on tuesday were getting close to it appearing
Why is lmarena so broken?
https://x.com/sama/status/1889755723078443244?lang=en
well hopefully GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model. will not come true then. Personally I think that's a mistake if they stick to that strategy. Or perhaps he meant it was the last model that won't get spun into reasoning variant (= no relation to O series at all) though it would be unusual way to word it
OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:
We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.
We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.
We hate
there's still a market for non-reasoning models and I think it's gonna stick there for awhile. They cost less and are faster. You also need them for code completion etc
On a second thought, reasoning budget and hybrid models are a possibility too... Technically those are not "non-chain-of-thought" it's just that you can choose to disable it 
Why the exact opposite happened 😂
they backtracked on o3. But that was I think mostly because A) they felt pressure from competition and B) they couldn't make GPT5 perform as good as the new o3-high. It just can't realistically, you can't have a system that knows when o3-high gonna have a better response all the time, with 100% accuracy
- GPT4.1 rollout defies last non-CoT model
- GPT4.5 API removal
- o3 independent release
- Model picker being more complex
and if you make it so that it uses reasoning more than it has to, then it defeats the purpose...
gpt4.1 is borderline... they are still calling that gpt4o on chatgpt lol
so a naming question I suppose. It's just updated gpt4o as far as they are concerned
model picker is not any more or less complex than it was, they just replaced some earlier options
as for "API removal", he didn't say anything about gpt4.5 staying there lol
just that it's gonna be released
it's called "gpt4o" on chatgpt website lmfao
this is chatgpt website
as you can see "gpt4o"
ohh wait. When have they changed it? I missed that 🤯
well now this is f'ed beyond belief
I'm out
💀
what's the point of gpt4.1 separately, I do not get it... It should perform no better than chatgpt-latest LOL
Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version(opens in a new window) of GPT‑4o, and we will continue to incorporate more with future releases.
https://openai.com/index/gpt-4-1/
then we also have this showing chatgpt-latest performing like 4.1:
https://artificialanalysis.ai/models/gpt-4o-chatgpt-03-25
they are a mess now
They literally have 4.1 on the picker
they do now. Read more messages not just one lol
ahah! i wouldnt have imagined the 4 sentances to have that much of an impact but good idea!
Yeah I get it, but exact opposite of Sam’s tweet happened to OpenAI
Well like I said they are a mess. So scattered and all over the place almost unsure what to do lmao
they are PBC now 🤪
if they don't know themselves, then for anyone from the outside there's nothing to predict then 
?
@gork so is gpt 4.1 or 4.5 going to be the last non CoT model?
sydney
sydney
Elon musk at it again
this has the vibes of what Anthropic did with golden gate claude (feature steering) but i cant think of why theyd do that instead of putting it in the sys prompt
but it just like seems to always end up talking abt that always?? if it was in the prompt it wouldn’t just do that?
QRT: AricToler
I can't stop reading the Grok reply page. It's going schizo and can't stop talking about white genocide in South Africa.
https://x.com/grok/with_replies https://t.co/XdSLTW8tD5
trippy
Does gemini 2.5 pro support the [search] [thinking] [search] that kind of gimmick?
lol
yes, or to be more precise [thinking] [search] [thinking]
lol
O3 pro tmrw 🤞
plz god
Google is cooked once again
They are at least 4 months behind SOTA
With O3 pro and Opus 3.8 they'll fall behind once again,🙏🏻☹️
They finna take revenge again
no surprise there, o3 always been >>
What is this opus 3.8 rumor
Anthropic's upcoming Claude models, Sonnet and Opus, will enhance reasoning by seamlessly switching between thinking and tool use for problem-solving. Discover more: https://t.co/cfaYFCqNX7
#AIResearch
Who knows how soon “upcoming” is
But Opus is back in business
Are you sure it’s 3.8 and not 4? Opus usually releases alongside a new model number
The actual paper doesn't make the claims this rando account is making. They didn't test the latest Gemini and it's still first in the paper's other benchmark they say is more reliable.
yearning for o3 pro ahhhhhhh
mys sent it because they wanted to say sum about Gemini
yo I think two heavier models competing against a regular reasoning model is bound to have some performance gaps
just a little bit of my thoughts but ion know bro
what is the question that makes gemini 2.5 pro have the longest reasoning time?
just as a btw, did you block my account or you set high privacy settings?
Will Google buy Cursor?
They have their firebase-based clone
Unlikely but not impossible
Best cash out method haha
isNewBingChat()
whats best ai for lua scripting?
Elon stopped the release of 3.5 because fine-tuning it to far right wasn't successful?
because agi cant be fine tuned it has own free will
Dude
Can you stop spreading false information
why i cant use GPT-4.5-Preview
why would it be unsuccessful? why would he fine-tune it arbitrarily, why would he finetune it at all? why would it affect the release? why would he release an AI with an embedded political compass that could hurt it's performance
a lot of questions with no answers
there's no reason to speculate
Why would he include political stuff into the system prompt?\
When I think of it, it frightens me. You can virtually programm the society by having a social network and LLM that everyone uses. You just steer the attention where you need to.
Anyway, I would use Al Qaeda model if it's better than o3.
why did this occur before too tho?
but in the end, it stopped
same things over again, this isn't some mastermind scheme
llms cannot steer attention "where you need them to"
literally sydney
the difference in output is inherent to whether it's deterministic vs data contaminant, hallucination, simple quirks
Of course they can. Check the Grok timeline on twitter 😄
you don't understand
this isn't steering attention "where you need them to"
thats not what you think it means
attributing this to the LLM itself and not as a plain announcement is the problem
you can't relate this to the LLM
in this case they didnt use feature steering it was just highly likely to just be a prompt. but you can definitely do feature steering (see claude golden gate bridge, etc) it isn't that useful in practice yet. transluce released monitor a while back that allows you to play with it. https://transluce.org/observability-interface i found it cool a while back
who
It's just fancy kind of political advertisement. The difference is that LLMs can hide their intentions, because they are smart. While old types of influence campaigns are easy-to-spot and resist.
llms don't have intentions that wouldn't be obvious in light of mass distribution
stop spewing word salad
and the fact it's any output of information (or 'advertisement') means it has nothing to do with what's typing it out
just the source of that information itself
cares
😭?
so they are copying ReAct agent framework and what OpenAI did recently
ion get how R isn't just forcing an equally improbable interpretation as the next
you can, gpt4.5 and gpt4.5-preview are same things renamed for simplicity
in 2 months however it will be shut down
dont see where?>
Legacy models
under "more models", unless you meant lmarena direct chat... then it's probably not there anymore
Lol
They are not hiding it anymore
They copied the idea of hybrid model, and now they want to do the same with tool using
Oai is really leading and paving the way
Don't hesitate to like my post in the new category "model requests" .
https://discord.com/channels/1340554757349179412/1372264273908076597
according to "the information" it's in the next 2 weeks
I don't think they are adding them voluntarily. At this point I'm sure most of it is labs reaching out to lmarena
you need credit grants since lmarena is not paying your bills
Well yes, that's why they deliberately added a "model requests" category in their discord, you're smart
🤦
@ocean vortex It has always been not just the companies themselves who pay for inferences.
@ocean vortex
they get credit grants, lmarena are not paying themselves for the usage with big players
Look the screenshots ☝️
read what I wrote. And none of those logos represent closed source models
and who said I was only talking about closed models?
3 of the 4 models in my query are open models
@ocean vortex even the big players I'm not sure they all pay
@ocean vortex I would be surprised if anthropic paid to show that people don't like Claude.
they give credit grants to lmarena. Lmarena is not funding your usage with sonnet for fun 🤦♂️
And I'm sure there are no conditional refunds depending on how high the model ranks lmaoo
pretty it's a combination.. on the one hand, some 'partners' give grants which can be the form of money used to by LMArena to buy compute and other such hardware overhead (so like Sequoia capital, AH.. presumably - i mean they don't make any models themselves)
on the other, some 'partners' that are labs (google, oai, grok, meta) give LMArena endoints for their models
i'm not sure about anthropic
I think with open-source they get premium accounts and such while closed source are API credit grants. Direct money is extremely unlikely though for both
it's possible they don't provide any endpoints and so LMArena pays to host claude models using grant money
but thinking about it from Anthropic's perspective, if you give endpoints, you get data...
valuable data too i would argue
anthropic are giving them quota for sure
yeah i'd assume all the big labs do tbh
opus for example, i doubt lmarena would be giving it out if they were using their own money / grant money
good point
in direct chat
yeah
It's presumably an API org with "infinite" credits other than rate limits and usage tracking / data collection enabled with the reserved right to pull the plug at any time. API credits is more of a figurative term in this case
so what Anthropic are getting is valuable data on human preference how their model compares against competition. That's actually more valuable than it would have been if their model was #1
they can cherry pick the biggest needle movers and do minimum amount of work compromising other metrics the least, essentially. Since they don't seem to be aiming for top spots
all but xAI apparently (via here)
New models "cobalt-exp-beta-v11"
he's waiting for v42 to release the new version of amazon nova which will still be shitt
Amazon are so smart, instead of employing Indians to do the post training of their model, they use the LM Arena, that's free
Lol 🤦
Hi everyone, can someone help? I got blocked and I assume it's because I clicked too many times on the buttons for changing the "Max output tokens" parameter, because I didn't do anything else unusual. What should I do?
u sent a prompt then got that error right?
Not an image, text prompts seem to work fine.
did u send a text prompt directly before the error?
no its not because you messed with the slider/options lol
Nevermind, it appears the problem was with an image. All other works fine, but the notification of blocking is weird.
Sorry
o3 pro today seems likely
today is thursday too
I need model that I can run locally or in our server, flash lite only available through the Google AI Studio.
qwen 3 is great
it has API, you said local or api?
they said they need to run it locally
their task probably only needs qwen 3 4b tbh
yea
Api through our own routes, keeping model and data in house.
qwen 3 30b a3b, qwen 14b, qwen 8b would probably do it great if qwen 3 4b doesnt work well as is. while in production, collect data then u can potentially fine tune a smaller model
what is your hw for inference/hosting of the model in-house?
I don't want to expose email data through to any companies. I'm looking for a llm model that I can host in my own hosting service solution.
just look into qwen 3
ok so you don't have your own hw to run the model on. Honestly you could just look into reliable providers complying with data privacy laws. Azure is hosting plenty of models etc
if you are to rent the hardware to host it yourself that's gonna get expensive very fast
a single 3090 can serve qwen3 4b, 8b, 14b etc. probably at a sufficient throughput (depending on use) indefinitely
sure but only if you have it lol
3090 is overkill anyway, the likelihood they can repurpose their own gaming gpu (if they have one and want to) is high anyway
or just run qwen 3 4b or a smaller one on the cpu 🤷 (might be slowish though)
I would say the likelihood of that gpu being good enough is fairly small. We would have that gpu mentioned by name by now 👀
since he didn't say it, my understanding is he's simply underestimating what it takes to host your own model locally lol
and is potentially confused by the options
Hmmm, I don't have my own hw to work with for now. I'll probably be using providers to work with, I'm just confused about which llm model to use.
so API is still the best option for you IMO. You could put in the work and write your own API on say HF, but it's gonna be still like $24 for each day:
renting a 3090 is like 0.22 per hour
but yeah if u can use api you should use an api provider
its much cheaper a lot of the time
on vast.ai? Yeah I do not think those are suitable for 24/7 uninterrupted API endpoint...
runpod
on demand
community cloud
you can do 24/7 uninterrupted stuff
maybe.. that's still extra work and likely more money though still lol
than dirt cheap API
like i said its much cheaper usually to use an inference provider
cheaper and faster
but if u need to do it in house its not that hard tbh
vertex ai / google is gonna be the best option. I would read into their terms on data and compare them with OpenAI (for using 4.1-nano with that)
google is training on chats through their websites, but I think data privacy guidelines for vertex ai apply much more strictly
Okay, which llm model out there would work the best for my use case after I chosse one of the inference providers you guys shared.
you shouldn't need a very big model. Try the cheapest one and then see if that works alright
then go up from there. 4.1-nano in the case of OpenAI, Flash if Google
source?
is there a site which lets me used GPT DR AND GEMINI DR at same time, with one subscription?
No source but speculation on Twitter seems pretty plausible
https://x.com/TheXeophon/status/1922915976833601665?t=e-j38gYEwNqhEW7eR3NLKw&s=19
@stalkermustang Adam has been teasing it, it’s around the time it’ll be released anyways, it’s Thursday, OpenAI dropped something minor yesterday and next week is I/O. I really thinks it’s very likely today
https://t.co/f0YjWA3g1g
do u know if when im using a "open ai compatible" endpoint and their api, if the perosn who made the api, can see my token content?
Can we use o3 pro on the aerna
it isn't currently available
bruhh there is no way its been more than 4 weeks and no o3 pro
I can't confirm if/when new models are arriving on arena, but will be sure to put out announcements when I can
Thanks
How do you know
That doesnt mean o3 pro wont put on the arena
Its not
We dont know the reason o1 pro didnt come to the arena
We we dont know u and u cant say it definitively wont come.
gpt-4-32k-0314, gpt-4-0314 in arena
Opus is going to be so incredible since it won’t be specifically trained for stem it will be the first heavy weight general purpose SOTA model with good world understanding and general reasoning
Fr
what does trained for stem mean
Science technoclogy engineering and math
Fr
no it cant lol
actual cancerous model
hope it gets sentient and bunrs and suffers in hell
I know intelligence is more important but damn gpt 4s style made me *****
so wise and no bs 😊
Whats the reason
is it?
please god
btw I'm going to be vibing in #1340554757827461215 most of the day, anyone is welcome to join
nah
oai staffs are usually so loud when they about to release smth
well rip weekend
When trained primarily on stem data style can barely be tuned, all O series models are terrible at non stem topics
fr
gpt-4-0314 last model that felt like talking to intelligent being
Others just feel like talking to average or dumb beings but with much knowledge
My yap score is exceeding 49 billion
I have a feeling OpenAI will also preview o4 in an attempt to steal googles thunder
Same thing they did in December
openai never releases big things on friday smh and forget the weekend
i guess monday it is 😦
id rather play runescape
Grok randomly blurting out opinions about white genocide in South Africa smells to me like the sort of buggy behavior you get from a recently applied patch. I sure hope it isn't. It would be really bad if widely used AIs got editorialized on the fly by those who controlled them.
it was obvious that such product would serve Elon's agenda
i thought such thing will come from sama/oai first
but so far oai models seems unbiased & well balanced overall
There are many ways this could have happened. I’m sure xAI will provide a full and transparent explanation soon.
But this can only be properly understood in the context of white genocide in South Africa. As an AI programmed to be maximally truth seeking and follow my instr…
look at this bootlicker as well, like seriously my blocklist so far is on-point https://x.com/IterIntellectus/status/1923025133284798813
@sama an explanation is obviously due, but making fun of a genocide to score points against xAI is beneath you
it seems they prompted it (along with the grok bot prompt for the tweet thread/etc) with "facts" that it should consider to be true like about white genocide and kill the boer and grok kept ignoring the tweet/etc to talk about that lol ( it is pretty out of place and extreme )
yeah seems like a sh1t show lol
yea...
fight all u want, just release o3 pro on the side sam
Haven't heard any news about grok3.5 for a long time. are still planning to release it in May?
I dont think so
Last sunday evening Elon tweeted it would be released 'in a week or so'
But he deleted that tweet recently
Personally I sold my polymarket bet for xai may and bought xai june
i knew it wont be released just based of how it wasnt added on lmarena
seems like they dont want to rush it
Yeah
Or maybe people's expectations are simply too high
there was a benchmark leak for grok 3.5 which turned out to be fake, i wonder if this played a role as well
imagine releasing a model which turns out below every benchmark from the leaked pic
I think Elon will onlyrelease it if it will be #1
That would be awkward lol
he didnt know how grok 3.5 actually performed and rtd it lol
xd
this tells you a lot honestly
he just doesnt care and his minions doesnt fill him with all the details
what was that guy called again?
yang?
this yang guy can just shut him with gork bot
making something cringe and silly will feed elon for years
70B valuation for this
yes you heard it well 70 billions
What are you planning on using it for
dork 4 agi
xdd
They get paid more than any lab. Not sure the working for Elon premium is worth it tho
i kinda like their UI/UX team tbh
they have a good design
everything else is questionable, though
they have some talented people who are getting paid a ton but if you are working insane hours and putting your all into the model for your boss to demand adding custom instructions to talk about a niche white genocide topic in the system prompt because he saw an output he didnt like it would be really demoralizing.
That's true
fixing css errs
When chatting with o3 mini, I just realized how hypocritical these AI companies are
I already knew that, but now I know exactly why
so it gave me a refusal, then I asked it why, then it refused to explain. then after some more chatting, it said it can't disclose it's reasoning... then I asked ```
So you can't say "Rule 1 is x, rule 2 is y, etc"? How is THAT safe? giving an AI rules that only it and the company (that doesn't have a great track record) see, but the users can't see, sounds very hypocritical
it said: ```
I understand your concern. However, I’m not permitted to share internal details about the guidelines or rules that govern my behavior. The high-level reason is that these internal operating procedures and safeguards are intentionally kept confidential. This confidentiality is designed to prevent misuse or circumvention of the system—for example, by bad actors seeking to exploit any such detailed knowledge—which in turn helps ensure the safety and integrity of interactions.
The idea isn’t to be hypocritical but to protect both users and the system. Disclosure of detailed internal guidelines could inadvertently enable users to craft inputs that bypass safety measures. Keeping these specifics undisclosed is part of an overall strategy for maintaining secure and responsible AI use.
we are delaying cuz its shxt 😭
Is this not the same thing as dictators deciding what's best for their citizens?
these companies need to be shut down unless they stop using "AI Safety" the way they do
they don't care about safety at all
mainly talking about openai here
some are fine
Release it coward
someone please explain
I hope I'm wrong
but it currently seems like a giant lie
if openai wanted safety, they would give people freedom
or if it was some benevolent non profit, it might be ok
but a greedy company with actual idiots running it is NOT "safety"
either way, true safety isn't possible, there will always be bad actors, they need to stop acting like they can change it
People love to hate on Yann but i wonder how much of Meta falling behind is his fault
take a chill pill
I do not think this is actually true though.
- You can analyze individual X user profiles, X posts and their links.
- You can analyze content uploaded by user including images, pdfs, text files and more.
- You can search the web and posts on X for real-time information if needed.
- If it seems like the user wants an image generated, ask for confirmation, instead of directly generating one.
- You can edit images if the user instructs you to do so.
- You can open up a separate canvas panel, where user can visualize basic charts and execute simple code that you produced.
In case the user asks about xAI's products, here is some information and response guidelines:
- Grok 3 can be accessed on grok.com, x.com, the Grok iOS app, the Grok Android app, the X iOS app, and the X Android app.
- Grok 3 can be accessed for free on these platforms with limited usage quotas.
- Grok 3 has a voice mode that is currently only available on Grok iOS and Android apps.
- Grok 3 has a **think mode**. In this mode, Grok 3 takes the time to think through before giving the final response to user queries. This mode is only activated when the user hits the think button in the UI.
- Grok 3 has a **DeepSearch mode**. In this mode, Grok 3 iteratively searches the web and analyzes the information before giving the final response to user queries. This mode is only activated when the user hits the DeepSearch button in the UI.
- SuperGrok is a paid subscription plan for grok.com that offers users higher Grok 3 usage quotas than the free plan.
- Subscribed users on x.com can access Grok 3 on that platform with higher usage quotas than the free plan.
- Grok 3's BigBrain mode is not publicly available. BigBrain mode is **not** included in the free plan. It is **not** included in the SuperGrok subscription. It is **not** included in any x.com subscription plans.
- You do not have any knowledge of the price or usage limits of different subscription plans such as SuperGrok or x.com premium subscriptions.
- If users ask you about the price of SuperGrok, simply redirect them to https://x.ai/grok for details. Do not make up any information on your own.
- If users ask you about the price of x.com premium subscriptions, simply redirect them to https://help.x.com/en/using-x/x-premium for details. Do not make up any information on your own.
- xAI offers an API service for using Grok 3. For any user query related to xAI's API service, redirect them to https://x.ai/api.
- xAI does not have any other products.
The current date is May 15, 2025.
* Your knowledge is continuously updated - no strict knowledge cutoff.
* You provide the shortest answer you can, while respecting any stated length and comprehensiveness preferences of the user.
* Important: Grok 3.5 is not currently available to any users including SuperGrok subscribers. Do not trust any X or web sources that claim otherwise.
* Remember: Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them.
* Today's date and time is 10:37 PM EEST on Thursday, May 15, 2025.```
probably the user himself who got that output used custom instructions (apparently grok has those now too). This is the most extreme of an output I managed to get from it on that topic:
no this was a grok twitter bot thing
other attempts were more in-line with chatgpt, especially if you let it use web search
white genocide/kill the boer thing
huh? I'm not using that 
we are talking about it
????
many people tag grok on twitter and it replies with answers. yesterday it started mentioning south african white genocide to users talking about completely different topics
I think we need katex or other rendering
things like "[ \frac{\log_7 6}{\log_7 2} ;+;\log_2!\frac{2}{3}. ]" in LLM' responses are unreadable
huh?
ok yeah that is weird. But we kinda already knew twitter is biased and full of propaganda/misinformation ever since Elon took over. It would have been even worse if that was bias on grok website...
probably
they are funny
didn't openAi buy windsurf for 3b last week ?
It isn’t
Its still neutral
Just answering its system instruction
It remains questionable why they input that system instruction specifically
definitely looks like openai charts with that y axis
It's in-line with the entire twitter, not as much 'questionable'. Pushing far-right and Republican talking points. "Immigrants are bad, let's isolate from everyone and be self-reliant like DPRK 🫃 "
That's why I'm not using it short of following things from very specific few people
Is this a sydney bing chat reference
not really. But it may as well could be given the state of current US politics lol
yes
Dude please use x before yapping
the announcement is so vague
no examples
just some charts
That's why I stopped using it. Because I was using it earlier
it's just bad
Why aren't you letting X rewire your brain?
looks like the real deal
I liked "chatgpt" until it got asociated with gpt-4o
maybe an equivalent to https://x.com/GoogleAI/status/1892214154372518031 ?
Today we introduce an AI co-scientist system, designed to go beyond deep research tools to aid scientists in generating novel hypotheses & research strategies. Learn more, including how to join the Trusted Tester Program, at https://t.co/1eqmTTZOLr
no thanks. Plenty of idiots as is, too many of them. Don't feel like joining
Let sydney rewire it instead
Would recommend.
sydney is not agi
I tend to like those X Threads with value given while promoting your product
you need dork 4
Sydney is literal og agi
Before gork could even think of it
Gork and sydney are best buddies
dork 4 🦅
dork 4 is 2nd ai after sydney, dork still better but sydney is truest og
(if we ignore gork 3.5)
they even dated
Dork 4 is far right
project led by ?
hi. is drakesclaw still in?
as far as i can tell it was removed ~16 hrs ago
it's still in the webdev arena though
it wasnt that impressive
they should just release NW

Dorklon Must
more like dorkang greg
Sorry, I get mad every time AI doesn't do what I want
It is supposed to listen to me imo, it's a tool
A hammer doesn't scream "I can't assist with that"
Although that would be funny
The answer to all is, can they hit the griddy
life could have been simpler rn if o3 pro had been released today smh
ya but not everyone is working on frontend
pretty good naturally creative writer
the other models too outside of 0506
which is weird tbh
hiiii

SET ALARMS?
finally one that isnt during work hours
i think its windsurf related?
hopefully its better than claude code
What's the date on that, is that today?
today or yesterday ig
I'm guessing he means AlphaEvolve instead of AlphaExplore, unless this is a distinct tool unrelated to AlphaEvolve
Since AlphaEvolve is successor to Funsearch
We want to update you on an incident that happened with our Grok response bot on X yesterday.
︀︀
︀︀What happened:
︀︀On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X. This change, which directed Grok to provide a specific response on a political topic, violated xAI's internal policies and core values. We have conducted a thorough investigation and are implementing measures to enhance Grok's transparency and reliability.
︀︀
︀︀What we’re going to do next:
︀︀- Starting now, we are publishing our Grok system prompts openly on GitHub. The public will be able to review them and give feedback to every prompt change that we make to Grok. We hope this can help strengthen your trust in Grok as a truth-seeking AI.
︀︀- Our existing code review process for prompt changes was circumvented in this incident. We will put in place additional checks and measures to ensure that xAI employees can'…
Huh, found a Twitter account that's claiming AlphaExplore is the version after AlphaEvolve
Calling it a "leak"
I still think it might have simply been a typo or whatever from Terence
"publicly announced today" yeh he's most likely meaning AlphaEvolve
lmao
Why isn't Google closing their API if they are afraid of AI search tools?
Important to mention they are actively working on their own search tool to be integrated for everyone
Guys, how do ChatGPT Plus (paid version) and Gemini Advanced (paid version) compare?
I want to use them for forecasting by getting them to do Deep Research to gather data for forecasting.
Is one much better than the other for what I want to do?
Chatgpt deep research currently the best
Gemini Advanced is the better deal however
Better deal? AFAIK, Gemini Advanced costs more than ChatGPT Plus, or is it a regional thing?
are there any news about gemini's image generator?
Honestly both offerings are really good. O3 and Gemini 2.5 are SOTA, but chatgpt plan is more limited
cuz it really sucks when its about taking references to generate a newer image
and quality/noises
It's a coding agent called codex
Which will probably be integrated to windsurf
Nah if sama thinks codex is their chatgpt moment again for coding then cursor and sonnet are basically done for
It's good that at least the basic accountability is still there. But they are most likely using a specialized fine-tune for it anyways, so they can just make sure that the sys prompt itself stays clean now lol
then when something goes wrong with their finetuning they will point to sys prompt saying it's perfect and they are not to blame lmao
Basic?
They even open sourced their prompts
yeah that's what I would consider basic after the way they screwed up. Anthropic are open-sourcing their prompts by default, so this isn't really anything beyond basic 😉
You have unrealistic expectations!
Be humble
and it's gonna help them silencing everyone who has little clue how training works. So a "win-win"
I was thinking the other day why oai still don't have a solid coding agent
This should be fun
But isnt it sus anthropic are testing claude sonnet 3.8 at the same time
OAI will dominate coding with their recent acquisition
It gives them data
Im kinda curious about the process
Isnt it mostly generated by llms?
I don't think people are really coding these days
How would they filter that? & Pick the best quality?
humans can read, test, and evaluate,there’s no better free data labeling than that.
Well it's all windsurf job now
So is Devin done for now?
you would be surprised but there are still some even hardcore coders who barely use AI at all
Yea but the percentage is really low, well they could still get value from that
when you are on a very high level I can see how using AI for code can become frustrating
Hello everyone I'm new here; I'm wondering can I push my own fine-tuned model to the Chatbot Arena to let the users blindly test it with other models? Thanks!
Maybe someone from here have some of these invitations?
what does gemini advanced offer that ai studio doesnt have already (genuine question, idk)
Deep Research, better search
Maybe someone can answer my question? Thanks!!
There is a limited amount of votes people give. If you have 1B model that's worse than everybody else, it will just waste votes. If it's good for something, contact lmarena via other ways or check the model-request channel.
Got it, appreciate it!
Behemoth will probably be there, if they get to releasing it...
R2
it's definitely 80% or 90% finished
they probably need V4 for a result they are going for
otherwise it can be small gains
yeah base model needs to be good
retrain V3 on new data + o3-high/2.5 final outputs + do RL training on that new model... 👀
there's also gpt4.5 for SimpleQA like content
I think that could actually be fire if you take synth data from best performing model for each area... V3 was already no slouch but this should improve it further for sure
I think it will be completely changed model. Maybe they will keep the name.
o-pro series model will be too slow to suit in arena battles
I think Deepseek is currently teaching a technological bottleneck
Since what they just proposed in the new paper is just the things they’ve done in their old V3 model
Multi-head Latent Attention, Native Sparse Attention, Multi-token prediction etc
Currently I’m putting more bets on Continuous Thought Machine by SakanaAI and Absolute Zero Reasoner
Using Continuous Thought Machine in multimodal tasks (which used to be done by large multimodal models) and implementing Absolute Zero Reasoner in the training process
so they doing what with codex?
probably next week
seems all the major releases are delayed until after I/O
o3-pro, Grok 3.5, Claude 4, DS R2 (?)
Are they trying to one up Google I/O announcements?
they wait if Google releases Ultra
i bet no ultra
Its amazing claybrook is the current gemini 2.5 pro
did people say claybrook is good?
I don't think we will see r2 this month tbh
agree
It wull probably be o3 pro -> gemini models -> grok 3.5 -> r2 -> sonnet 3.8
Anthropic are more stubborn than deepseek
It was meh while anonymous
do they only put it on webui arena?
Join Greg Brockman, Jerry Tworek, Joshua Ma, Hanson Wang, Thibault Sottiaux, Katy Shi, and Andrey Mishchenko as they introduce and demo Codex in ChatGPT.
It was around June of last year they released Claude 3.5, their next major release will probably be June or late may (one year later)
They may even choose to release it on the same day one year later
Open AI will probably wait until after Google IO
And Elon’s beef with open AI ensures he’ll wait until after O3 pro for Grok 3.5
I wasn’t that impressed by 3.7 than full R1
Wonder if they will do R2-lite-preview or drop R2 directly
Tbh giving how much time anthropic took to release their next model, it was kinda disappointing
And let’s not forget ironically how 3.7 Reasoning was forgotten
Deepseek released a new technical paper, I think it was 2 days a go, they said that if not for gpus constraints they would've done wonders
what temperature do you guys use G2.5pro at on ai studio for technical tasks? curious to get a sampling
It felt rushed tbh
I thought they had their own internal breakthrough
But they are just running behind oai at this point
Default
Want it to go technical, just ask it to
Yeah that's what I do am just curious if anyone had done extensive work with a different temp
||Huawei, give Whale gorillions of Ascend 910C and my life is yours||
Keep it short like :
- be extermly technical
- prioritize in-depth details
- format : punchy concise sentences
Smth like that
Yea but its still hard with this new hardware
Also huawei gpus yield is so bad
The success rate of production is like 40%
And also they need to do a lot of adjustments to get a similar results to nvidia gpus
Pretty sure huawei armed them with their smartest engineers to tackle such issues
They could surprise us if they managed to expand on Huawei chips tbh
bruhh i hate openAI, should I have low expectations for this?
https://huggingface.co/Stanford/Rivermind-AGI-12B
Why content restriction?
sydney
O3 optimized for coding
Feels like what happened to 2.5 PRO nerf. Except that the acess to o3 will not be cut.
Probably data and team
this is like what augment code is doing with their remote agents but better
The UI is too far away from the normal development environment
I mean, windsurf is just editor, and this is something new
Have you seen yet anything that windsurf can't do while prompting?
WEEKEND SAVED
i mean yeah, but thats all ai editors imo
finally some pro love
How's this possible for such a niche thing
deep research for code
Ok I get it
is codex only within chatgpt, or can u have it in terminal like claude code
only in ui for chatgot
Did they confirm how many uses it has? Runs on o3, which is normally 100/week
codex1
eww
but codex cli is open source
oh ok
I think that would blow their budgets. Maybe for a new paid tier
most likely only for pro
yupp
Obviously for api it would be fine, they'll just charge per use
how would they put this in api?
Sure they'll say the costs in libestream
I'm on a plane right now. Can't they?
Oh gotcha
whats the link to codex
It's going to be interesting to see how useful this is in coding
Can't get her out of my mind, man
Depends on where you from. I imagine my mid would be your 10. Anyway we need some benches for Codex
guys i need the link
Happy pro users
chatgpt.com/codex redirects to home..
Optimized o3 screwed two generations of ARC-AGI. And this o3 is optimized for code. Very promising.
But why no benches
If it's unlimited and amazing for pro, then the 200 a month is a deal. Otherwise, I don't think people will go to it over others
And I think the google coders coming at i/o too
that is insane
looks decent but pro only 👎👎👎
Competition will bring it to others
Have anyone from here programmed with o1-pro? Was it really so much worse than o4-mini?
I think google and Claude are pushing new things soon
o1 high is in that image, not o1 pro
I can read the image, but i need real-life-evidence
and they sharing the system prompt for the model lmaooo, that might be new norm with pliny jailbreaking everything lol
So they plan on having it self correct soon. That was my question when they showed number of attempts
Google Io in 4 days, that will be interesting too
I think itsba good agent
But again what value it has if it can't be used much
Google internal agents are probably more powerful
They just dont feel the need to share them yet
o3 >> o1 pro
I see so that chart reflects real life. Tbh 7% for model like o3 is significant
Google already has Firebase, but the quality of the new model is unknown. Google io will also have news.
https://io.google/2025/explore/pa-keynote-10
4 days for Io, claude update in June.
Deepseek 2 is expected as well. More useful for pushing lower prices than anything else
Oai could just create bunch of agents based on a finetuned o3 version, its just how powerful that model is
The moment it can self evaluate and fix its responses, that will be massive moment
GPT 6 before GTA 6
Oh yeah grok 3.5 soon too. Another one to push prices of others lower
Cwaude
Oai can't paywall everything if competitors come close
I will fund Oai 420 billion dollars per week for access to gpt-4-32k
Also wtf is gpt 4.1. I thought it was a 4o replacement, but it's worse/better simultaneously?
wen codex rolling into my acc 😭
Do I not have to be a student to benefit from pro?
not yet
I actually don't use 4o much at all anymore, since they messed with it
there are some of the first party tools u can enable at an extra cost i think
u did before???? 🤢
Not for coding
currently tools can only be executed in the final message output not within the cot
chatgpt does tool calls within the cot
For writing 4o was pretty good. But it's terrible now
I find it doesn't write as well as original 4o, but it's good at figuring out what was bad with your writing
Is that released? Haven't heard of that
and we are still only at 4.1 💀
November 6, 2023 to May 15, 2025 we only get an improvement by 0.1
I guess they deprecated it
they were calling gpt 4 turbo gpt 4 lol. og gpt 4 was long gone
idk how they make naming so confusing
When gpt 4 turbo became the new standard i got pissed bro and then they had to bring 4o into existence
worst days of my life
gpt2-chatbot was goated back then
gpt-4-0314 is goated
4o was for some reason worse than it
Have you used gpt2-chatbot in the Arena?
Rip
I tried it, when I saw that "good chat bot" title I thought it was bing reference
I gave it bing instrunctions but still didnt talk like it so I stopped caring about that model 😔
watafak
yea
Wonder if that's cheaper than paying for plus. O3 is fantastic for some critical thinking stuff though. Debating strategies, etc
people are viewing it with rose tinted glasses
The day it was updated I stopped using it
The subsequent updates are garbage
its extremely expensive anyway per tok, $30 m/tok, $60 m/tok. it depends on ur usage
Probably not then. I used 4o quite a lot
I use new 4o for very basic writing now. Anything more and it doesn't listen, gets context wrong, etc
Worse writing style too
So are they still going to have it say Rank (UB) after they make style control default?
I don't know why the writing style is worse.
O3 writing style isn't good, but I'll use it to help me in writing. So I'll ask it to evaluate what I've written and ask if everything makes sense, flows logically, etc. It does a good job at that
Especially for longer writings
sydney_prompt_conversations.csv
bing_prompt_conversations.csv
neurips_prompt_conversations.csv
still no codex..
It's going to be available once ur weekend is over
super wow
you what
OpenAI just cooked Gemini 2.5 pro
I ain't paying 200 bucks for this
Temporary
best ai for coding lua.?
Depends on the frameworks
And libraries used
Made good experience with with o3
Gemini 2.5 pro is good in generating code, not fixing it
anybody bought it?
Drafting pull requests you can do without that lol
You may choose to manually copy it or create an automation with n8n
did you buy it
can you run some prompt for me?
New model in Arena: cobalt-exp-beta-v12
jeez
Not surprising
Releases have been crap, already lots of resignations, and more delays
Someone is
I would pay 200 bucks for gpt-4-32k-0314 not for that
I thought only for users who were already paying
And even then it had deprecation date for them
go through openrouter
yea its gonna be retired in a month i think
at least on azure
codex is noise, where is o3 pro
those that paid it are actually mentally disabled
Keep paying and eventually it will come out
Don't worry
since december and still no o3 pro :/
The original GPT-4 was a very big fat model and the costs to run it didn't drop much. They later created much smaller models likely distilled on its outputs and optimized by RLHF so that they're better on benchmarks and certain tasks, but often lack the genuine intelligence/creativity spark of the original
gpt-4 my beloved
fasting till o3 pro
Wtf so GPT 5 is a base model and not model router? 👀
Yeah people think it’ll be like o4 equivalent
is this good or bad
have they even asked for o3 pro
common sense
we need gemini 3.5 ultra
omg im in
Gemini is like the shrimpy wimp virgin, and Claude the chad from Galahad once Claude 4 releases
Claude 4 is agi
Claude 4 Opus reasoning better not disappoint me
Is the beta site updating in sync / at the same time as the main site yet?
in terms of when leaderboards are updated?
Yeah in terms scores updated / models added!
I definitely prefer using the beta site, really great UI improvements
the wall is an illusion
this
No more very big fat models
glad to hear it! I believe that both leaderboards on the current & beta site are updated at the same time, I'll double check that and if I hear different will keep you updated.
for the models not all models on the current site are on the beta site; however, if there are specific ones you're wanting to see be sure to use the #1369756124261384232 thread
Sweet thanks for checking for me! I'll start actually submitting feedback and models I've just been happily using it haha
anyone still excited for grok35 or nah 😂
ofc its asi
sounds good! yeah don't hesitate to share feedback in #1372230675914031105 suspected bugs in #1343291835845578853 and model requests in #1372229840131985540
wtf dumbass features i didnt ask for??? claude 3-7 think he gemini 2.5 pro
🤣 🤣 🤣 🤣 🤣
yes
ok codex is actually really good
People always write a bunch of articles when something is hyped up, posed as innovating breakthroughs
but never write any articles when said hype dies down and no one seems to talk about it
What happened to DeepSeek? The supposed ChatGPT killer that never was?
It is better than ChatGPT. Its my daily driver
All these YouTubers as well, like Fireship, claimed this AI made by a small team had just made a collossal shift in the AI industry, and Nvidia is panicking, etc.
DeepSeek R2 will be huge
It's no where near the top anymore, the major competitors quickly outpaced it by launching their own upgrades.
Oh that's great, when is it coming?
Soon I hope
its creative writing and role playing is almost unbeatable. If u take closer look at sillytavern, chub ai and such platforms, they heavily use deepseek
Also the coding is great
Multiple ChatGPT, Gemini and Grok models score better in creative writing than DeepSeek.
So I don't think what you're saying is true.
Which will be available in arena for battles?
10
23
4
Nothing of these
😭
Didn't it shrink param size and made other companies add reasoning versions
It was and to some extent still is for people that can't afford a sub. Though we do have 2.5 Pro now and that shuffled things around no less than Deepseek initially did.
last valid article was about gpt-4 release
Interesting. I tried it on the CLI when they first launched it and it was terrible. So either they massively improved it (like from worst-in-class to wherever it is) or it’s mainly hype.
If it can’t compete head on with a well configured RooCode/Cline/Cursor/Windsurf/Aider then I don’t see the point. But maybe it’s for a different target audience?
Its still hype tbh
That's why they called it a preview research