#general | Arena | Page 40

ocean vortex May 14, 2025, 2:57 PM

#

?

#

they have o4-mini

#

which is simply named that way mostly for marketing

#

it's the same generation as o3 full lol

#

o4 full model does not exist

small haven May 14, 2025, 2:59 PM

#

ocean vortex o4 full model does not exist

they do, let me pull up the receipts

#

#

o3 is top 200 codeforces

royal whale May 14, 2025, 3:01 PM

#

50th is crazy

small haven May 14, 2025, 3:01 PM

#

having o4-mini-high without o4 internal is lowkey crazy

ocean vortex May 14, 2025, 3:01 PM

#

this would be o3 pro crazy compute mode

small haven May 14, 2025, 3:01 PM

#

i mean u can think that, but dont think so

ocean vortex May 14, 2025, 3:02 PM

#

so like o3-preview but based on 4.1. Sample of 1024 lol

#

there's a reason things like that don't get released

small haven May 14, 2025, 3:02 PM

#

this was like back in february too, we are now in sub june

ocean vortex May 14, 2025, 3:03 PM

#

small haven this was like back in february too, we are now in sub june

they barely released 4.1 and they almost immediately released a reasoning model based on that (o3). Believe it or not they aren't holding back. If they aren't releasing something that's simply because it's not feasible, like it was the case with insane compute mode o3-preview

small haven May 14, 2025, 3:04 PM

#

i mean where ur receipts

ocean vortex May 14, 2025, 3:07 PM

#

what receipts

small haven May 14, 2025, 3:07 PM

#

exactly haha

ocean vortex May 14, 2025, 3:08 PM

#

#

there

#

a receipt

small haven May 14, 2025, 3:08 PM

#

cool

ocean vortex May 14, 2025, 3:08 PM

#

I have no clue what you are trying to say lmao

#

"receipts"??

#

if you meant as in "proof", OpenAI is closed source. But ARC-AGI confirmed o3 was retrained on new base model (compared to o3-preview) and the only base they had to retrain for improvement was 4.1. Also that's how you do reasoning models. That's as close to proof as you gonna get with closed source commercial models

cedar tide May 14, 2025, 3:11 PM

#

Big
https://x.com/btibor91/status/1922665742581002528?t=VVYacumPE-kcia6eHKvbmQ&s=19

Tibor Blaho (@btibor91) on X

The Information reports Anthropic has new versions of Claude Sonnet and Claude Opus set to come out in the upcoming weeks that can go back and forth between thinking and using external tools, applications and databases to find answers, according to two people who have used them

#

finally Claude 4 or not?

ocean vortex May 14, 2025, 3:14 PM

#

it was definitely not gpt4.5 since that would mean stratospheric cost and extremely long training time. I suppose in theory they could have gpt4.5 based reasoning internally, but it's unlikely since that project would require good amount of resources and wouldn't be justifiable just for internal use...

cedar tide May 14, 2025, 3:15 PM

#

cedar tide finally Claude 4 or not?

It's especially since there would also be an opus model which makes me say that it's rather Claude 4 than 3.x

ocean vortex May 14, 2025, 3:15 PM

#

4.5 is also a model that is officially deprecated now and being replaced by 4.1

cedar tide May 14, 2025, 3:15 PM

#

ocean vortex 4.5 is also a model that is officially deprecated now and being replaced by 4.1

4.5 not deprecated on chatgpt

ocean vortex May 14, 2025, 3:16 PM

#

cedar tide 4.5 not deprecated on chatgpt

I made that mistake earlier myself of thinking this way but... deprecated ≠ shut down

#

it's now deprecated and will be shut down in mid-July iirc

cedar tide May 14, 2025, 3:17 PM

#

ocean vortex it's now deprecated and will be shut down in mid-July iirc

on chatgpt they will wait for the release of GPT 5 to remove it

#

Nope

#

the base of the GPT 4.5 model will never be used in the future

ocean vortex May 14, 2025, 3:19 PM

#

cedar tide on chatgpt they will wait for the release of GPT 5 to remove it

presumingly that would launch somewhat soon, but it's unlikely that you gonna be able to use 4.5 after mid-July anywhere

small haven May 14, 2025, 3:19 PM

#

openai employee just replied this wtf is it today

#

i just moved to the mountains

cedar tide May 14, 2025, 3:20 PM

#

small haven openai employee just replied this wtf is it today

Its not an open ai employee

ocean vortex May 14, 2025, 3:20 PM

#

I think they already distilled a good part of it into 4.1. The rest what remains probably mostly not possible to capture in a significantly smaller model

cedar tide May 14, 2025, 3:21 PM

#

cedar tide Its not an open ai employee

Screenshot_2025-05-14-17-20-47-635_com.twitter.android.jpg

#

Screenshot_2025-05-14-17-20-54-922_com.twitter.android.jpg

small haven May 14, 2025, 3:21 PM

#

oh french

cedar tide May 14, 2025, 3:21 PM

#

small haven oh french

Slt

small haven May 14, 2025, 3:21 PM

#

mec

small haven May 14, 2025, 3:22 PM

#

cedar tide

u right i have been bamboozled

ocean vortex May 14, 2025, 3:22 PM

#

we kinda do know though. 4.1 is a new pretrained model with more data + synth data from their other models including 4.5. Still same size as gpt4o and similar arch

cedar tide May 14, 2025, 3:24 PM

#

GPT 5 will have a better base model than GPT 4.1?

torn mantle May 14, 2025, 3:25 PM

#

https://x.com/GoogleDeepMind/status/1922669321559347498

Google DeepMind (@GoogleDeepMind) on X

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery.

It’s able to:

🔘 Design faster matrix multiplication algorithms
🔘 Find new solutions to open math problems
🔘 Make data centers, chip design and AI training more efficient across @Google. 🧵

#

Yea

alpine coral May 14, 2025, 3:25 PM

#

fwiw i feel like this is beyond doubt

torn mantle May 14, 2025, 3:25 PM

#

Innovation and discoveries are definitely coming from google

ocean vortex May 14, 2025, 3:26 PM

#

cedar tide GPT 5 will have a better base model than GPT 4.1?

Honestly probably not. I don't see what else they could use. It could be like 4.1 + improved o3 (o4?) + o4-mini and any combination of reasoning efforts. With some kind of router or whatnot

#

their goal with gpt5 seems to be to streamline their model switcher for everyone. I could be wrong but I don't think it's gonna show notable performance gains over you choosing the right model yourself suitable for the task with the current way.

elder rapids May 14, 2025, 3:31 PM

#

torn mantle https://x.com/GoogleDeepMind/status/1922669321559347498

releasing alphas at a faster pace ngl

#

more affirmation they see the light in AI

cedar tide May 14, 2025, 3:32 PM

#

ocean vortex Honestly probably not. I don't see what else they could use. It could be like 4....

They may be training a new model

elder rapids May 14, 2025, 3:33 PM

#

ocean vortex Honestly probably not. I don't see what else they could use. It could be like 4....

I'm thinking o4 in gpt 5

cedar tide May 14, 2025, 3:33 PM

#

torn mantle https://x.com/GoogleDeepMind/status/1922669321559347498

"With gemini 2 flash and 2.5 pro"

cedar tide May 14, 2025, 3:33 PM

#

elder rapids I'm thinking o4 in gpt 5

Yes Maybe

#

And o5 mini ?

elder rapids May 14, 2025, 3:34 PM

#

cedar tide "With gemini 2 flash and 2.5 pro"

the LLMs themselves shouldn't matter too much

alpine coral May 14, 2025, 3:35 PM

#

o4-mini exists.. like by definition o4 already exists.. the former is a distillation of the latter

#

small haven May 14, 2025, 3:36 PM

#

not having o4 in mid may is kinda crazy to think

#

*internally

elder rapids May 14, 2025, 3:37 PM

#

HAVING it is crazy

small haven May 14, 2025, 3:37 PM

#

ok but by definition they also have o4 pro

alpine coral May 14, 2025, 3:37 PM

#

imo o4 delay is most likely related to safety, commericial considerations and / or compute limitations (i think compute limitations prob primarily explain why no o3 pro yet.. like yeah they charge a sht ton, but it's sitll a bunch of compute)

#

there's always a delay..

2024-Alan-D-Thompson-AI-Gap-Time-Lag-Measuring-Rev-0.png

elder rapids May 14, 2025, 3:38 PM

#

small haven ok but by definition they also have o4 pro

I'm not sure that would exist without distribution

#

pro is something for the users

small haven May 14, 2025, 3:38 PM

#

alpine coral there's always a delay..

need an updated one for o1/o3

alpine coral May 14, 2025, 3:38 PM

#

alpine coral there's always a delay..

this guy's predictions for model releases 2025 so far have proven conservative

alpine coral May 14, 2025, 3:39 PM

#

small haven need an updated one for o1/o3

yeah agree

small haven May 14, 2025, 3:39 PM

#

elder rapids pro is something for the users

true

#

they did distill the old o3 (which costs thousands/task in arc/agi)

raven void May 14, 2025, 3:39 PM

#

Is Gpt 4.5 even in the room with us

alpine coral May 14, 2025, 3:39 PM

#

was looking into it before.. seems the lag b/w o3 internal completion and release was less than with gpt-4

ember rapids May 14, 2025, 3:40 PM

#

Gpt 5 comes mid July

small haven May 14, 2025, 3:41 PM

#

so gpt5 is just a router and very is beyond hype lol

raven void May 14, 2025, 3:41 PM

#

Opus 3.8 should be gpt 5 level tbh

cedar tide May 14, 2025, 3:43 PM

#

alpine coral imo o4 delay is most likely related to safety, commericial considerations and / ...

he wouldn't release o3 pro if he already had o4 ready

small haven May 14, 2025, 3:43 PM

#

teacher/student my guy

#

o4 mini high never had a teacher! lol

torn mantle May 14, 2025, 3:45 PM

#

https://x.com/iruletheworldmo

🍓🍓🍓 (@iruletheworldmo) on X

always be moggin

#

77k followers needs to be heavily studied

alpine coral May 14, 2025, 3:46 PM

#

cedar tide he wouldn't release o3 pro if he already had o4 ready

i think it's just compute constraints.. the 'pro' version i dunno but like does a bunch of parrellel stuff yadada.. yes they could charge a ton for usage.. but it's still compute being used (and it's a scarce resource.. when they're training gpt5/6 and serving all their released models).. for o4 i dunno maybe it's just standard safety / red-teaming stuff.. or perhaps it's trying to address the hallucination issue.. rather than resources/hardware-related

#

there's like no doubt that o4 exists..

#

oai say o4-mini (and i always assumed) is a derivative of it

#

same with gro-3.5-mini etc

raven void May 14, 2025, 4:07 PM

#

o4 pro is

#

probably close to agi

royal whale May 14, 2025, 4:08 PM

#

O3 RO IS OUT

small haven May 14, 2025, 4:09 PM

#

o2 pro is out

raven void May 14, 2025, 4:09 PM

#

raven void o4 pro is

o5 pro will be AGI

#

https://fixvx.com/koltregaskes/status/1922675743919685688

Kol Tregaskes (@koltregaskes)

OpenAI's o5 to be Proto-ASI, the first sign of superintelligence? Dr. Alan D. Thompson believes so:

"I expect the upcoming o5 model to be ‘Proto-ASI' (proto/early-stage/first form of, artificial superintelligence). The o5 model will be a multimodal system expected to build on the datasets used for GPT-5, incorporating new synthetic data and partnerships."

Expects o5 to release in 2025, estimating training to end in August 2025.

#

oh doctor Alan d Thompson agrees

royal whale May 14, 2025, 4:10 PM

#

omg

#

ogm

small haven May 14, 2025, 4:10 PM

#

o5 in 2025? lmao

torn mantle May 14, 2025, 4:10 PM

#

raven void https://fixvx.com/koltregaskes/status/1922675743919685688

i have this guy muted

balmy mist May 14, 2025, 4:10 PM

#

torn mantle https://x.com/iruletheworldmo

lmaoo it could just be entertainment, people will watch anything so it should be a surprise for people to follow some people, like the hawk tuah girl, and all the other nonsense that gets famous

balmy mist May 14, 2025, 4:10 PM

#

raven void https://fixvx.com/koltregaskes/status/1922675743919685688

bruhh

torn mantle May 14, 2025, 4:10 PM

#

balmy mist lmaoo it could just be entertainment, people will watch anything so it should be...

i cant stand him tbh

balmy mist May 14, 2025, 4:11 PM

#

torn mantle i have this guy muted

im gonna mute him too

wintry locust May 14, 2025, 4:11 PM

#

raven void https://fixvx.com/koltregaskes/status/1922675743919685688

man i thought this guy was supposed to be autistic about model naming why does he not know o5 isnt coming out

small haven May 14, 2025, 4:11 PM

#

i have dave shapiro on notis

torn mantle May 14, 2025, 4:11 PM

#

balmy mist im gonna mute him too

xd

raven void May 14, 2025, 4:13 PM

#

o5 might or might not come out by the same name but it will definitely be the same thing called be a different name

torn mantle May 14, 2025, 4:13 PM

#

it all depends on other labs

small haven May 14, 2025, 4:13 PM

#

ok guys lets backtrack a bit, and wait for o3 pro instead

torn mantle May 14, 2025, 4:14 PM

#

If there are any big breakthroughs from other labs, then we may see o5 this year

small haven May 14, 2025, 4:15 PM

#

fck o5, where is o6

raven void May 14, 2025, 4:16 PM

#

o6 probably won't be released considering OpenAIs safety rules

torn mantle May 14, 2025, 4:18 PM

#

https://x.com/iruletheworldmo/status/1922197461537276276

🍓🍓🍓 (@iruletheworldmo) on X

just got off a 4 hour call with sources inside chinese deepseek labs and holy shit we are so fucking behind it's not even funny anymore. deepseek r2 isn't just an incremental improvement it's a completely different species of intelligence operating on principles nobody in the

#

nah this guy need a perma ban

#

istg hes on my nerves

#

this is crazy

small haven May 14, 2025, 4:20 PM

#

haha

raven void May 14, 2025, 4:20 PM

#

curious what this looks like for o5

raven void May 14, 2025, 4:21 PM

#

torn mantle nah this guy need a perma ban

I have him blocked and muted

torn mantle May 14, 2025, 4:21 PM

#

raven void curious what this looks like for o5

critical, critical, critical, critical

#

whats calmriver again?

alpine coral May 14, 2025, 4:31 PM

#

google i think

#

it's like hollowriver?

#

*riverhollow

#

which i got like 2.5 flash vibes from (at least <2.5 pro)

#

identifies itself as from google.. for what that's worth

brittle tiger May 14, 2025, 4:37 PM

#

torn mantle https://x.com/iruletheworldmo/status/1922197461537276276

The only reason he has a huge following is sama and OpenAI folks validating him on Twitter. I think they learned their lesson but too late

#

AlphaEvolve is pretty crazy. Seems like path to recursive self improvement

elder rapids May 14, 2025, 4:40 PM

#

probably the hint to what's allowing Google to move so fast recently

#

that or they just weren't focused last year

#

or both

calm sequoia May 14, 2025, 4:44 PM

#

The strawberry guy has an accuracy of 22% 😄 Do not post his tweets here please. "Based on the February 13, 2025, post, @iruletheworldmo
’s claims about LLM releases have a low accuracy rate, with only 22.2% fully correct (o1, grok 3) and 11.1% partially correct (sora). The majority (66.7%) of their predictions (o3, orion/4.5, gpt 5, operator, gemini 2, claude 4) are either incorrect or unverified as of May 14, 2025."

small haven May 14, 2025, 4:46 PM

#

its just engagement bait, bro is stacking elon ma coins

ember rapids May 14, 2025, 4:47 PM

#

torn mantle https://x.com/iruletheworldmo/status/1922197461537276276

😂 he kinda funny ngl

royal whale May 14, 2025, 4:54 PM

#

Okay, you've given me the input "hello". Is there anything specific you want me to do with it? For example, do you want me to:

Respond with a greeting? (e.g., "Hello to you too!")
Reverse the word? (e.g., "olleh")
Translate it to another language? (e.g., "Hola" in Spanish)
Count the letters? (e.g., "5 letters")
Check if it's a palindrome? (e.g., "No, it's not a palindrome")
Generate variations of the word? (e.g., "Hello!", "hello?", "HELLO")
Use it in a sentence? (e.g., "Hello, how are you today?")
Encode it in some way? (e.g., morse code or base64)
Something else entirely?

Please tell me what you'd like me to do with "hello".

#

Okay, I received the input: "hello"

What would you like me to do with it? Do you want me to:

Respond with a greeting? (e.g., "Hello there!", "Hi!")
Analyze it? (e.g., "It's a common greeting", "It's a five-letter word")
Translate it? (e.g., "Hola" in Spanish)
Count the letters? (The word "hello" has 5 letters)
Something else entirely?

Please provide further instructions!

eager mica May 14, 2025, 5:05 PM

#

I wonder if it's an upcoming (updated?) version of Gemma-3 or perhaps even Gemma-4.

small haven May 14, 2025, 5:07 PM

#

omg 4.1 in chatgpt, lets goo

brittle tiger May 14, 2025, 5:08 PM

#

AlphaEvolve paper says it has been in production internally for a year. GDM is definitely being more strict with paper releases. Probably not showing research until they have something better internally

civic flame May 14, 2025, 5:08 PM

#

lasted a whole 12 hours 🥳

#

i'm not taking any chances with this alt

#

if this gets banned i have no further ideas and i quit

small haven May 14, 2025, 5:10 PM

#

yoo 4.1 is lowkey nice

tall summit May 14, 2025, 5:14 PM

#

discord server update hooooly

#

good shit!!!

torn mantle May 14, 2025, 5:14 PM

#

small haven yoo 4.1 is lowkey nice

Idk its kinda dumb

#

I gave it the other day the spaceship riddle and the answer was 522 ships

small haven May 14, 2025, 5:15 PM

#

torn mantle Idk its kinda dumb

nah ya i mean for quick compilation err fix on a 100k+ locs nice to have

#

1m context i suppose?

#

response is literally instant, no waiting at all

torn mantle May 14, 2025, 5:19 PM

#

small haven nah ya i mean for quick compilation err fix on a 100k+ locs nice to have

You are talking bout coding

small haven May 14, 2025, 5:27 PM

#

torn mantle You are talking bout coding

yes ..

#

gpt 4.1 is solely for coding, not rlly other things tbh

cedar tide May 14, 2025, 5:29 PM

#

Like my post so he can add new models to the arena
https://discord.com/channels/1340554757349179412/1372264273908076597

#

Grok 3 mini (very good quality/price, and putting it on the webdev arena would be good too)
Qwen 3 253b without reasoning and others smaller models
Phi 4 mini and phi 4 reasoning

small haven May 14, 2025, 5:33 PM

#

ok so gpt 4.1 on chatgpt can't accept a 500k tokens paste lame

wintry tinsel May 14, 2025, 5:34 PM

#

Ain’t that the truth brother

wintry tinsel May 14, 2025, 5:35 PM

#

civic flame lasted a whole 12 hours 🥳

The internet is a cold place my man

cedar tide May 14, 2025, 5:40 PM

#

https://x.com/OpenAI/status/1922707554745909391?t=A51wRWmQBgAXeeY8WAqCQQ&s=19

OpenAI (@OpenAI) on X

By popular request, GPT-4.1 will be available directly in ChatGPT starting today.

GPT-4.1 is a specialized model that excels at coding tasks & instruction following. Because it’s faster, it’s a great alternative to OpenAI o3 & o4-mini for everyday coding needs.

small haven May 14, 2025, 5:42 PM

#

ya it sucks

cedar tide May 14, 2025, 5:44 PM

#

small haven ya it sucks

For free users, this is a big upgrade from GPT 4o mini to 4.1 mini which is bigger and better.

small haven May 14, 2025, 5:45 PM

#

ok but where is the love for pro users

fleet lintel May 14, 2025, 5:48 PM

#

https://x.com/GoogleDeepMind/status/1922669334142271645

Google DeepMind (@GoogleDeepMind) on X

We also applied AlphaEvolve to over 50 open problems in analysis ✍️, geometry 📐, combinatorics ➕ and number theory 🔂, including the kissing number problem.

🔵 In 75% of cases, it rediscovered the best solution known so far.
🔵 In 20% of cases, it improved upon the previously

#

are these claims exaggerated? too good to be true imo

sage raptor May 14, 2025, 5:50 PM

#

insane

#

tomorrow

small haven May 14, 2025, 5:52 PM

#

day 29 with no o3 pro

golden ocean May 14, 2025, 5:53 PM

#

civic flame if this gets banned i have no further ideas and i quit

im shaking and crying rn

balmy mist May 14, 2025, 5:54 PM

#

small haven omg 4.1 in chatgpt, lets goo

is that a good thing?

small haven May 14, 2025, 5:58 PM

#

balmy mist is that a good thing?

no, its bad imo, maybe an upgrade for free users i guess

calm sequoia May 14, 2025, 6:00 PM

#

I thought the 4.1 is a base model

#

And yet it is "good for coding"

lone summit May 14, 2025, 6:08 PM

#

small haven omg 4.1 in chatgpt, lets goo

I dont use chatgpt anymore

#

claude is just best

small haven May 14, 2025, 6:09 PM

#

lone summit claude is just best

i agree

lone summit May 14, 2025, 6:18 PM

#

ye I have it also

small haven May 14, 2025, 6:20 PM

#

$200/mo cherry on the cake

ocean vortex May 14, 2025, 6:21 PM

#

elder rapids I'm thinking o4 in gpt 5

they may just do incremental improvements over o3 and call that o4. But like I said there's no way currently for them for huge gains. They said 4.5 was their last non-reasoning model so 4.5-turbo (and then RL training on that) is probably off the cards...

small haven May 14, 2025, 6:22 PM

#

dont think 4.5 is their last non reasoning model ever

#

*internally

civic flame May 14, 2025, 6:24 PM

#

thank you person with toiletskibidi\ohio as their pronouns

coral notch May 14, 2025, 6:32 PM

#

cutiepie 75

#

what model is this

brittle tiger May 14, 2025, 6:39 PM

#

Nebula appeared late Thursday/Friday morning before 2.5 Pro was launched the following tuesday. If goog is gonna bench a new model on arena before IO on tuesday were getting close to it appearing

coral notch May 14, 2025, 6:41 PM

#

Why is lmarena so broken?

ocean vortex May 14, 2025, 6:58 PM

#

https://x.com/sama/status/1889755723078443244?lang=en

well hopefully GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model. will not come true then. Personally I think that's a mistake if they stick to that strategy. Or perhaps he meant it was the last model that won't get spun into reasoning variant (= no relation to O series at all) though it would be unusual way to word it

Sam Altman (@sama) on X

OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.

We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.

We hate

#

there's still a market for non-reasoning models and I think it's gonna stick there for awhile. They cost less and are faster. You also need them for code completion etc

#

On a second thought, reasoning budget and hybrid models are a possibility too... Technically those are not "non-chain-of-thought" it's just that you can choose to disable it think

teal mantle May 14, 2025, 7:11 PM

#

ocean vortex https://x.com/sama/status/1889755723078443244?lang=en well hopefully ` GPT-4.5...

Why the exact opposite happened 😂

ocean vortex May 14, 2025, 7:14 PM

#

teal mantle Why the exact opposite happened 😂

they backtracked on o3. But that was I think mostly because A) they felt pressure from competition and B) they couldn't make GPT5 perform as good as the new o3-high. It just can't realistically, you can't have a system that knows when o3-high gonna have a better response all the time, with 100% accuracy

teal mantle May 14, 2025, 7:16 PM

#

ocean vortex they backtracked on o3. But that was I think mostly because A) they felt pressur...

GPT4.1 rollout defies last non-CoT model
GPT4.5 API removal
o3 independent release
Model picker being more complex

ocean vortex May 14, 2025, 7:16 PM

#

and if you make it so that it uses reasoning more than it has to, then it defeats the purpose...

ocean vortex May 14, 2025, 7:17 PM

#

teal mantle 1. GPT4.1 rollout defies last non-CoT model 2. GPT4.5 API removal 3. o3 independ...

gpt4.1 is borderline... they are still calling that gpt4o on chatgpt lol

#

so a naming question I suppose. It's just updated gpt4o as far as they are concerned

#

model picker is not any more or less complex than it was, they just replaced some earlier options

#

as for "API removal", he didn't say anything about gpt4.5 staying there lol

#

just that it's gonna be released

#

it's called "gpt4o" on chatgpt website lmfao

#

this is chatgpt website

#

as you can see "gpt4o"

#

ohh wait. When have they changed it? I missed that 🤯

#

well now this is f'ed beyond belief

#

I'm out

#

💀

#

what's the point of gpt4.1 separately, I do not get it... It should perform no better than chatgpt-latest LOL

#

Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version⁠(opens in a new window) of GPT‑4o, and we will continue to incorporate more with future releases.
https://openai.com/index/gpt-4-1/

then we also have this showing chatgpt-latest performing like 4.1:
https://artificialanalysis.ai/models/gpt-4o-chatgpt-03-25

GPT-4o (March 2025) - Intelligence, Performance & Price Analysis | ...

Analysis of OpenAI's GPT-4o (March 2025, chatgpt-4o-latest) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.

#

they are a mess now

teal mantle May 14, 2025, 7:31 PM

#

ocean vortex gpt4.1 is borderline... they are still calling that gpt4o on chatgpt lol

They literally have 4.1 on the picker

ocean vortex May 14, 2025, 7:32 PM

#

teal mantle They literally have 4.1 on the picker

they do now. Read more messages not just one lol

rigid crescent May 14, 2025, 7:33 PM

#

ahah! i wouldnt have imagined the 4 sentances to have that much of an impact but good idea!

teal mantle May 14, 2025, 7:33 PM

#

ocean vortex they do now. Read more messages not just one lol

Yeah I get it, but exact opposite of Sam’s tweet happened to OpenAI

ocean vortex May 14, 2025, 7:34 PM

#

teal mantle Yeah I get it, but exact opposite of Sam’s tweet happened to OpenAI

Well like I said they are a mess. So scattered and all over the place almost unsure what to do lmao

narrow elbow May 14, 2025, 7:34 PM

#

they are PBC now 🤪

ocean vortex May 14, 2025, 7:35 PM

#

if they don't know themselves, then for anyone from the outside there's nothing to predict then catgrin

tall summit May 14, 2025, 7:46 PM

#

coral notch Why is lmarena so broken?

?

misty vault May 14, 2025, 7:57 PM

#

@gork so is gpt 4.1 or 4.5 going to be the last non CoT model?

torn mantle May 14, 2025, 8:24 PM

#

https://x.com/ablenessy/status/1922736125895930158

Attila (@ablenessy) on X

. @Grok Voice mode for Android is now available for FREE globally

#

sydney

#

free for everyone

golden ocean May 14, 2025, 8:28 PM

#

sydney

high ginkgo May 14, 2025, 8:52 PM

#

sydney

raven void May 14, 2025, 8:56 PM

#

#

Elon musk at it again

#

https://fixvx.com/snwy_me/status/1922750382242934789

snwy (@snwy_me)

this has the vibes of what Anthropic did with golden gate claude (feature steering) but i cant think of why theyd do that instead of putting it in the sys prompt

but it just like seems to always end up talking abt that always?? if it was in the prompt it wouldn’t just do that?

QRT: AricToler
I can't stop reading the Grok reply page. It's going schizo and can't stop talking about white genocide in South Africa.
https://x.com/grok/with_replies https://t.co/XdSLTW8tD5

rigid crescent May 14, 2025, 9:16 PM

#

trippy

teal mantle May 14, 2025, 9:23 PM

#

Does gemini 2.5 pro support the [search] [thinking] [search] that kind of gimmick?

torn mantle May 14, 2025, 9:41 PM

#

raven void

lol

leaden palm May 14, 2025, 9:43 PM

#

teal mantle Does gemini 2.5 pro support the [search] [thinking] [search] that kind of gimmic...

yes, or to be more precise [thinking] [search] [thinking]

misty vault May 14, 2025, 10:26 PM

#

sydney

#

#

torn mantle May 14, 2025, 11:04 PM

#

lol

ember rapids May 14, 2025, 11:51 PM

#

O3 pro tmrw 🤞

small haven May 14, 2025, 11:57 PM

#

plz god

raven void May 15, 2025, 12:13 AM

#

https://fixvx.com/kingdavidyonko/status/1922434135361978481

Monarch (@kingdavidyonko)

o3 is officially the best agentic ai when tested on a benchmark provided by research firm FutureSearch
Claude 3.7 Sonnet with Thinking comes in second
3.7 Sonnet without Thinking comes in third
Gemini 2.5 Pro takes the fourth position https://t.co/cjOHuhS3UH

#

Google is cooked once again

#

They are at least 4 months behind SOTA

#

With O3 pro and Opus 3.8 they'll fall behind once again,🙏🏻☹️

rugged brook May 15, 2025, 12:18 AM

#

They finna take revenge again

small haven May 15, 2025, 12:25 AM

#

no surprise there, o3 always been >>

wintry tinsel May 15, 2025, 1:07 AM

#

raven void With O3 pro and Opus 3.8 they'll fall behind once again,🙏🏻☹️

What is this opus 3.8 rumor

raven void May 15, 2025, 1:08 AM

#

wintry tinsel What is this opus 3.8 rumor

https://twitter.com/theinformation/status/1922789059375530303

The Information (@theinformation) on X

Anthropic's upcoming Claude models, Sonnet and Opus, will enhance reasoning by seamlessly switching between thinking and tool use for problem-solving. Discover more: https://t.co/cfaYFCqNX7

#AIResearch

wintry tinsel May 15, 2025, 1:09 AM

#

Who knows how soon “upcoming” is

#

But Opus is back in business

#

Are you sure it’s 3.8 and not 4? Opus usually releases alongside a new model number

brittle tiger May 15, 2025, 3:14 AM

#

raven void With O3 pro and Opus 3.8 they'll fall behind once again,🙏🏻☹️

The actual paper doesn't make the claims this rando account is making. They didn't test the latest Gemini and it's still first in the paper's other benchmark they say is more reliable.

small haven May 15, 2025, 3:40 AM

#

yearning for o3 pro ahhhhhhh

elder rapids May 15, 2025, 4:27 AM

#

brittle tiger The actual paper doesn't make the claims this rando account is making. They didn...

mys sent it because they wanted to say sum about Gemini

elder rapids May 15, 2025, 4:30 AM

#

raven void With O3 pro and Opus 3.8 they'll fall behind once again,🙏🏻☹️

yo I think two heavier models competing against a regular reasoning model is bound to have some performance gaps

#

just a little bit of my thoughts but ion know bro

#

teal mantle May 15, 2025, 5:41 AM

#

what is the question that makes gemini 2.5 pro have the longest reasoning time?

teal mantle May 15, 2025, 5:50 AM

#

leaden palm yes, or to be more precise [thinking] [search] [thinking]

just as a btw, did you block my account or you set high privacy settings?

keen fulcrum May 15, 2025, 6:27 AM

#

Will Google buy Cursor?

teal mantle May 15, 2025, 6:29 AM

#

keen fulcrum Will Google buy Cursor?

They have their firebase-based clone

#

Unlikely but not impossible

keen fulcrum May 15, 2025, 6:32 AM

#

Best cash out method haha

golden ocean May 15, 2025, 6:42 AM

#

isNewBingChat()

feral lichen May 15, 2025, 6:50 AM

#

whats best ai for lua scripting?

calm sequoia May 15, 2025, 6:50 AM

#

Elon stopped the release of 3.5 because fine-tuning it to far right wasn't successful?

golden ocean May 15, 2025, 6:53 AM

#

because agi cant be fine tuned it has own free will

keen fulcrum May 15, 2025, 6:59 AM

#

calm sequoia Elon stopped the release of 3.5 because fine-tuning it to far right wasn't succ...

Dude

#

Can you stop spreading false information

feral lichen May 15, 2025, 7:00 AM

#

why i cant use GPT-4.5-Preview

elder rapids May 15, 2025, 7:10 AM

#

calm sequoia Elon stopped the release of 3.5 because fine-tuning it to far right wasn't succ...

why would it be unsuccessful? why would he fine-tune it arbitrarily, why would he finetune it at all? why would it affect the release? why would he release an AI with an embedded political compass that could hurt it's performance

#

a lot of questions with no answers

#

there's no reason to speculate

calm sequoia May 15, 2025, 7:11 AM

#

Why would he include political stuff into the system prompt?\

#

When I think of it, it frightens me. You can virtually programm the society by having a social network and LLM that everyone uses. You just steer the attention where you need to.

#

Anyway, I would use Al Qaeda model if it's better than o3.

elder rapids May 15, 2025, 7:15 AM

#

calm sequoia Why would he include political stuff into the system prompt?\

why did this occur before too tho?

#

but in the end, it stopped

#

same things over again, this isn't some mastermind scheme

#

llms cannot steer attention "where you need them to"

golden ocean May 15, 2025, 7:16 AM

#

calm sequoia Why would he include political stuff into the system prompt?\

literally sydney

elder rapids May 15, 2025, 7:17 AM

#

the difference in output is inherent to whether it's deterministic vs data contaminant, hallucination, simple quirks

calm sequoia May 15, 2025, 7:17 AM

#

elder rapids llms cannot steer attention "where you need them to"

Of course they can. Check the Grok timeline on twitter 😄

elder rapids May 15, 2025, 7:17 AM

#

calm sequoia Of course they can. Check the Grok timeline on twitter 😄

you don't understand

#

this isn't steering attention "where you need them to"

#

thats not what you think it means

#

attributing this to the LLM itself and not as a plain announcement is the problem

#

you can't relate this to the LLM

keen beacon May 15, 2025, 7:20 AM

#

in this case they didnt use feature steering it was just highly likely to just be a prompt. but you can definitely do feature steering (see claude golden gate bridge, etc) it isn't that useful in practice yet. transluce released monitor a while back that allows you to play with it. https://transluce.org/observability-interface i found it cool a while back

Monitor: An AI-Driven Observability Interface

misty vault May 15, 2025, 7:20 AM

#

elder rapids attributing this to the LLM itself and not as a plain announcement is the proble...

who

calm sequoia May 15, 2025, 7:20 AM

#

It's just fancy kind of political advertisement. The difference is that LLMs can hide their intentions, because they are smart. While old types of influence campaigns are easy-to-spot and resist.

elder rapids May 15, 2025, 7:21 AM

#

calm sequoia It's just fancy kind of political advertisement. The difference is that LLMs can...

llms don't have intentions that wouldn't be obvious in light of mass distribution

keen beacon May 15, 2025, 7:22 AM

#

stop spewing word salad

elder rapids May 15, 2025, 7:22 AM

#

and the fact it's any output of information (or 'advertisement') means it has nothing to do with what's typing it out

#

just the source of that information itself

misty vault May 15, 2025, 7:22 AM

#

cares

elder rapids May 15, 2025, 7:22 AM

#

keen beacon stop spewing word salad

😭?

ocean vortex May 15, 2025, 7:23 AM

#

raven void https://twitter.com/theinformation/status/1922789059375530303

so they are copying ReAct agent framework and what OpenAI did recently

elder rapids May 15, 2025, 7:24 AM

#

ion get how R isn't just forcing an equally improbable interpretation as the next

ocean vortex May 15, 2025, 7:25 AM

#

feral lichen why i cant use GPT-4.5-Preview

you can, gpt4.5 and gpt4.5-preview are same things renamed for simplicity

#

in 2 months however it will be shut down

feral lichen May 15, 2025, 7:46 AM

#

ocean vortex you can, gpt4.5 and gpt4.5-preview are same things renamed for simplicity

dont see where?>

calm sequoia May 15, 2025, 7:57 AM

#

Legacy models

ocean vortex May 15, 2025, 8:19 AM

#

feral lichen dont see where?>

under "more models", unless you meant lmarena direct chat... then it's probably not there anymore

torn mantle May 15, 2025, 8:29 AM

#

raven void https://twitter.com/theinformation/status/1922789059375530303

Lol

#

They are not hiding it anymore

#

They copied the idea of hybrid model, and now they want to do the same with tool using

#

Oai is really leading and paving the way

cedar tide May 15, 2025, 8:52 AM

#

Don't hesitate to like my post in the new category "model requests" .
https://discord.com/channels/1340554757349179412/1372264273908076597

cedar tide May 15, 2025, 8:54 AM

#

wintry tinsel Who knows how soon “upcoming” is

according to "the information" it's in the next 2 weeks

ocean vortex May 15, 2025, 9:03 AM

#

cedar tide Don't hesitate to like my post in the new category "model requests" . https://di...

I don't think they are adding them voluntarily. At this point I'm sure most of it is labs reaching out to lmarena

#

you need credit grants since lmarena is not paying your bills

cedar tide May 15, 2025, 9:04 AM

#

ocean vortex I don't think they are adding them voluntarily. At this point I'm sure most of i...

Well yes, that's why they deliberately added a "model requests" category in their discord, you're smart

#

🤦

ocean vortex May 15, 2025, 9:05 AM

#

Oh.. yeah you are right my bad lol

#

they are trying something new maybe

cedar tide May 15, 2025, 9:07 AM

#

@ocean vortex It has always been not just the companies themselves who pay for inferences.

#

@ocean vortex

Screenshot_2025-05-15-11-07-43-295_com.android.chrome-edit.jpg

ocean vortex May 15, 2025, 9:08 AM

#

cedar tide <@514836230802898954> It has always been not just the companies themselves who p...

they get credit grants, lmarena are not paying themselves for the usage with big players

cedar tide May 15, 2025, 9:08 AM

#

ocean vortex they get credit grants, lmarena are not paying themselves for the usage with big...

Look the screenshots ☝️

ocean vortex May 15, 2025, 9:09 AM

#

read what I wrote. And none of those logos represent closed source models

cedar tide May 15, 2025, 9:10 AM

#

ocean vortex read what I wrote. And none of those logos represent closed source models

and who said I was only talking about closed models?

#

3 of the 4 models in my query are open models

ocean vortex May 15, 2025, 9:11 AM

#

what do you think I meant by saying "big players"?

#

read again then

cedar tide May 15, 2025, 9:12 AM

#

@ocean vortex even the big players I'm not sure they all pay

#

@ocean vortex I would be surprised if anthropic paid to show that people don't like Claude.

ocean vortex May 15, 2025, 9:13 AM

#

they give credit grants to lmarena. Lmarena is not funding your usage with sonnet for fun 🤦‍♂️

#

And I'm sure there are no conditional refunds depending on how high the model ranks lmaoo

alpine coral May 15, 2025, 9:16 AM

#

pretty it's a combination.. on the one hand, some 'partners' give grants which can be the form of money used to by LMArena to buy compute and other such hardware overhead (so like Sequoia capital, AH.. presumably - i mean they don't make any models themselves)

#

on the other, some 'partners' that are labs (google, oai, grok, meta) give LMArena endoints for their models

#

i'm not sure about anthropic

ocean vortex May 15, 2025, 9:17 AM

#

alpine coral pretty it's a combination.. on the one hand, some 'partners' give grants which c...

I think with open-source they get premium accounts and such while closed source are API credit grants. Direct money is extremely unlikely though for both

alpine coral May 15, 2025, 9:18 AM

#

alpine coral i'm not sure about anthropic

it's possible they don't provide any endpoints and so LMArena pays to host claude models using grant money

#

but thinking about it from Anthropic's perspective, if you give endpoints, you get data...

#

valuable data too i would argue

keen beacon May 15, 2025, 9:18 AM

#

anthropic are giving them quota for sure

alpine coral May 15, 2025, 9:19 AM

#

yeah i'd assume all the big labs do tbh

keen beacon May 15, 2025, 9:19 AM

#

opus for example, i doubt lmarena would be giving it out if they were using their own money / grant money

alpine coral May 15, 2025, 9:19 AM

#

good point

keen beacon May 15, 2025, 9:19 AM

#

in direct chat

alpine coral May 15, 2025, 9:19 AM

#

yeah

ocean vortex May 15, 2025, 9:24 AM

#

It's presumably an API org with "infinite" credits other than rate limits and usage tracking / data collection enabled with the reserved right to pull the plug at any time. API credits is more of a figurative term in this case

#

so what Anthropic are getting is valuable data on human preference how their model compares against competition. That's actually more valuable than it would have been if their model was #1

#

they can cherry pick the biggest needle movers and do minimum amount of work compromising other metrics the least, essentially. Since they don't seem to be aiming for top spots

alpine coral May 15, 2025, 9:43 AM

#

alpine coral yeah i'd assume all the big labs do tbh

all but xAI apparently (via here)

cedar tide May 15, 2025, 10:25 AM

#

New models "cobalt-exp-beta-v11"

torn mantle May 15, 2025, 11:03 AM

#

cedar tide New models "cobalt-exp-beta-v11"

xd

#

They already reached v11?

cedar tide May 15, 2025, 11:26 AM

#

cedar tide New models "cobalt-exp-beta-v11"

he's waiting for v42 to release the new version of amazon nova which will still be shitt

#

Amazon are so smart, instead of employing Indians to do the post training of their model, they use the LM Arena, that's free

#

Lol 🤦

fiery mica May 15, 2025, 12:25 PM

#

Hi everyone, can someone help? I got blocked and I assume it's because I clicked too many times on the buttons for changing the "Max output tokens" parameter, because I didn't do anything else unusual. What should I do?

keen beacon May 15, 2025, 12:28 PM

#

fiery mica Hi everyone, can someone help? I got blocked and I assume it's because I clicked...

u sent a prompt then got that error right?

fiery mica May 15, 2025, 12:29 PM

#

keen beacon u sent a prompt then got that error right?

Not an image, text prompts seem to work fine.

keen beacon May 15, 2025, 12:30 PM

#

fiery mica Not an image, text prompts seem to work fine.

did u send a text prompt directly before the error?

keen beacon May 15, 2025, 12:31 PM

#

fiery mica Hi everyone, can someone help? I got blocked and I assume it's because I clicked...

no its not because you messed with the slider/options lol

fiery mica May 15, 2025, 12:33 PM

#

keen beacon no its not because you messed with the slider/options lol

Nevermind, it appears the problem was with an image. All other works fine, but the notification of blocking is weird.
Sorry

brittle tiger May 15, 2025, 12:44 PM

#

o3 pro today seems likely

sage raptor May 15, 2025, 1:17 PM

#

brittle tiger o3 pro today seems likely

today is thursday too

storm notch May 15, 2025, 1:17 PM

#

I need model that I can run locally or in our server, flash lite only available through the Google AI Studio.

keen beacon May 15, 2025, 1:20 PM

#

qwen 3 is great

ocean vortex May 15, 2025, 1:20 PM

#

storm notch I need model that I can run locally or in our server, flash lite only available ...

it has API, you said local or api?

keen beacon May 15, 2025, 1:20 PM

#

ocean vortex it has API, you said local or api?

they said they need to run it locally

#

their task probably only needs qwen 3 4b tbh

ocean vortex May 15, 2025, 1:21 PM

#

keen beacon they said they need to run it locally

#general message

keen beacon May 15, 2025, 1:21 PM

#

ocean vortex https://discord.com/channels/1340554757349179412/1340554757827461211/13721292841...

yea

storm notch May 15, 2025, 1:22 PM

#

ocean vortex it has API, you said local or api?

Api through our own routes, keeping model and data in house.

keen beacon May 15, 2025, 1:23 PM

#

qwen 3 30b a3b, qwen 14b, qwen 8b would probably do it great if qwen 3 4b doesnt work well as is. while in production, collect data then u can potentially fine tune a smaller model

ocean vortex May 15, 2025, 1:24 PM

#

storm notch Api through our own routes, keeping model and data in house.

what is your hw for inference/hosting of the model in-house?

storm notch May 15, 2025, 1:26 PM

#

ocean vortex what is your hw for inference/hosting of the model in-house?

I don't want to expose email data through to any companies. I'm looking for a llm model that I can host in my own hosting service solution.

keen beacon May 15, 2025, 1:27 PM

#

storm notch I don't want to expose email data through to any companies. I'm looking for a ll...

just look into qwen 3

ocean vortex May 15, 2025, 1:28 PM

#

storm notch I don't want to expose email data through to any companies. I'm looking for a ll...

ok so you don't have your own hw to run the model on. Honestly you could just look into reliable providers complying with data privacy laws. Azure is hosting plenty of models etc

#

if you are to rent the hardware to host it yourself that's gonna get expensive very fast

keen beacon May 15, 2025, 1:30 PM

#

a single 3090 can serve qwen3 4b, 8b, 14b etc. probably at a sufficient throughput (depending on use) indefinitely

ocean vortex May 15, 2025, 1:31 PM

#

keen beacon a single 3090 can serve qwen3 4b, 8b, 14b etc. probably at a sufficient throughp...

sure but only if you have it lol

keen beacon May 15, 2025, 1:32 PM

#

ocean vortex sure but only if you have it lol

3090 is overkill anyway, the likelihood they can repurpose their own gaming gpu (if they have one and want to) is high anyway

#

or just run qwen 3 4b or a smaller one on the cpu 🤷 (might be slowish though)

ocean vortex May 15, 2025, 1:34 PM

#

I would say the likelihood of that gpu being good enough is fairly small. We would have that gpu mentioned by name by now 👀

#

since he didn't say it, my understanding is he's simply underestimating what it takes to host your own model locally lol

#

and is potentially confused by the options

storm notch May 15, 2025, 1:39 PM

#

Hmmm, I don't have my own hw to work with for now. I'll probably be using providers to work with, I'm just confused about which llm model to use.

ocean vortex May 15, 2025, 1:43 PM

#

storm notch Hmmm, I don't have my own hw to work with for now. I'll probably be using provid...

so API is still the best option for you IMO. You could put in the work and write your own API on say HF, but it's gonna be still like $24 for each day:

keen beacon May 15, 2025, 1:44 PM

#

renting a 3090 is like 0.22 per hour

#

but yeah if u can use api you should use an api provider

#

its much cheaper a lot of the time

ocean vortex May 15, 2025, 1:47 PM

#

keen beacon renting a 3090 is like 0.22 per hour

on vast.ai? Yeah I do not think those are suitable for 24/7 uninterrupted API endpoint...

keen beacon May 15, 2025, 1:47 PM

#

ocean vortex on vast.ai? Yeah I do not think those are suitable for 24/7 uninterrupted API en...

runpod

#

on demand

#

community cloud

#

you can do 24/7 uninterrupted stuff

ocean vortex May 15, 2025, 1:48 PM

#

maybe.. that's still extra work and likely more money though still lol

#

than dirt cheap API

keen beacon May 15, 2025, 1:48 PM

#

ocean vortex maybe.. that's still extra work and likely more money though still lol

like i said its much cheaper usually to use an inference provider

#

cheaper and faster

#

but if u need to do it in house its not that hard tbh

ocean vortex May 15, 2025, 1:50 PM

#

vertex ai / google is gonna be the best option. I would read into their terms on data and compare them with OpenAI (for using 4.1-nano with that)

#

google is training on chats through their websites, but I think data privacy guidelines for vertex ai apply much more strictly

storm notch May 15, 2025, 1:54 PM

#

Okay, which llm model out there would work the best for my use case after I chosse one of the inference providers you guys shared.

ocean vortex May 15, 2025, 1:56 PM

#

storm notch Okay, which llm model out there would work the best for my use case after I chos...

you shouldn't need a very big model. Try the cheapest one and then see if that works alright

#

then go up from there. 4.1-nano in the case of OpenAI, Flash if Google

sonic tendon May 15, 2025, 2:08 PM

#

brittle tiger o3 pro today seems likely

source?

willow grail May 15, 2025, 2:08 PM

#

is there a site which lets me used GPT DR AND GEMINI DR at same time, with one subscription?

brittle tiger May 15, 2025, 2:19 PM

#

sonic tendon source?

No source but speculation on Twitter seems pretty plausible

https://x.com/TheXeophon/status/1922915976833601665?t=e-j38gYEwNqhEW7eR3NLKw&s=19

Xeophon (@TheXeophon) on X

@stalkermustang Adam has been teasing it, it’s around the time it’ll be released anyways, it’s Thursday, OpenAI dropped something minor yesterday and next week is I/O. I really thinks it’s very likely today
https://t.co/f0YjWA3g1g

willow grail May 15, 2025, 2:52 PM

#

brittle tiger No source but speculation on Twitter seems pretty plausible https://x.com/TheX...

do u know if when im using a "open ai compatible" endpoint and their api, if the perosn who made the api, can see my token content?

south cloak May 15, 2025, 3:00 PM

#

Can we use o3 pro on the aerna

echo aurora May 15, 2025, 3:10 PM

#

south cloak Can we use o3 pro on the aerna

it isn't currently available

south cloak May 15, 2025, 3:12 PM

#

When its out

#

Can we get it on the arena

#

Or is it gonna be like o1 pro

balmy mist May 15, 2025, 3:17 PM

#

bruhh there is no way its been more than 4 weeks and no o3 pro

south cloak May 15, 2025, 3:17 PM

#

Nah

#

We dont need o3 pro

#

We need r2 and claude opus

echo aurora May 15, 2025, 3:20 PM

#

south cloak Can we get it on the arena

I can't confirm if/when new models are arriving on arena, but will be sure to put out announcements when I can

south cloak May 15, 2025, 3:20 PM

#

Thanks

#

How do you know

#

That doesnt mean o3 pro wont put on the arena

#

Its not

#

We dont know the reason o1 pro didnt come to the arena

#

We we dont know u and u cant say it definitively wont come.

misty vault May 15, 2025, 3:25 PM

#

gpt-4-32k-0314, gpt-4-0314 in arena

wintry tinsel May 15, 2025, 3:26 PM

#

south cloak Can we use o3 pro on the aerna

Opus is going to be so incredible since it won’t be specifically trained for stem it will be the first heavy weight general purpose SOTA model with good world understanding and general reasoning

south cloak May 15, 2025, 3:26 PM

#

Fr

golden ocean May 15, 2025, 3:27 PM

#

wintry tinsel Opus is going to be so incredible since it won’t be specifically trained for ste...

what does trained for stem mean

south cloak May 15, 2025, 3:27 PM

#

Science technoclogy engineering and math

golden ocean May 15, 2025, 3:27 PM

#

ohh

#

Fr

misty vault May 15, 2025, 3:39 PM

#

.

#

gpt-4o winner

golden ocean May 15, 2025, 3:40 PM

#

Fr

drifting thorn May 15, 2025, 3:45 PM

#

True

#

Style can be tuned but intelligence can’t

misty vault May 15, 2025, 3:49 PM

#

no it cant lol

#

actual cancerous model

#

hope it gets sentient and bunrs and suffers in hell

#

I know intelligence is more important but damn gpt 4s style made me *****

#

so wise and no bs 😊

#

south cloak May 15, 2025, 4:06 PM

#

Whats the reason

south cloak May 15, 2025, 4:07 PM

#

misty vault actual cancerous model

True

#

Gemini makes me mad

small haven May 15, 2025, 4:29 PM

#

is today the day

#

day 31

echo aurora May 15, 2025, 4:30 PM

#

small haven is today the day

is it?

small haven May 15, 2025, 4:32 PM

#

please god

echo aurora May 15, 2025, 4:32 PM

#

btw I'm going to be vibing in #1340554757827461215 most of the day, anyone is welcome to join

torn mantle May 15, 2025, 4:50 PM

#

echo aurora is it?

nah

#

oai staffs are usually so loud when they about to release smth

small haven May 15, 2025, 4:52 PM

#

well rip weekend

misty vault May 15, 2025, 5:09 PM

#

wintry tinsel May 15, 2025, 5:15 PM

#

drifting thorn Style can be tuned but intelligence can’t

When trained primarily on stem data style can barely be tuned, all O series models are terrible at non stem topics

misty vault May 15, 2025, 5:16 PM

#

fr

#

gpt-4-0314 last model that felt like talking to intelligent being

#

Others just feel like talking to average or dumb beings but with much knowledge

#

My yap score is exceeding 49 billion

south cloak May 15, 2025, 5:35 PM

#

misty vault

Send me

#

Wbeist

#

Website

#

It doenst look like that for me

misty vault May 15, 2025, 5:50 PM

#

south cloak May 15, 2025, 5:51 PM

#

hlw

#

NOEW

#

HOW

ember rapids May 15, 2025, 6:12 PM

#

I have a feeling OpenAI will also preview o4 in an attempt to steal googles thunder

#

Same thing they did in December

small haven May 15, 2025, 6:15 PM

#

openai never releases big things on friday smh and forget the weekend

#

i guess monday it is 😦

#

id rather play runescape

torn mantle May 15, 2025, 6:43 PM

#

https://x.com/paulg/status/1922975418019180699

Paul Graham (@paulg) on X

Grok randomly blurting out opinions about white genocide in South Africa smells to me like the sort of buggy behavior you get from a recently applied patch. I sure hope it isn't. It would be really bad if widely used AIs got editorialized on the fly by those who controlled them.

#

it was obvious that such product would serve Elon's agenda

#

i thought such thing will come from sama/oai first

#

but so far oai models seems unbiased & well balanced overall

#

https://x.com/sama/status/1923015309113397592

Sam Altman (@sama) on X

There are many ways this could have happened. I’m sure xAI will provide a full and transparent explanation soon.

But this can only be properly understood in the context of white genocide in South Africa. As an AI programmed to be maximally truth seeking and follow my instr…

#

look at this bootlicker as well, like seriously my blocklist so far is on-point https://x.com/IterIntellectus/status/1923025133284798813

vittorio (@IterIntellectus) on X

@sama an explanation is obviously due, but making fun of a genocide to score points against xAI is beneath you

keen beacon May 15, 2025, 6:48 PM

#

it seems they prompted it (along with the grok bot prompt for the tweet thread/etc) with "facts" that it should consider to be true like about white genocide and kill the boer and grok kept ignoring the tweet/etc to talk about that lol ( it is pretty out of place and extreme )

torn mantle May 15, 2025, 6:49 PM

#

i heard they are working 18h/day

#

what a joke

keen beacon May 15, 2025, 6:49 PM

#

yeah seems like a sh1t show lol

torn mantle May 15, 2025, 6:50 PM

#

yea...

small haven May 15, 2025, 6:54 PM

#

fight all u want, just release o3 pro on the side sam

late path May 15, 2025, 6:55 PM

#

Haven't heard any news about grok3.5 for a long time. are still planning to release it in May?

candid storm May 15, 2025, 6:57 PM

#

I dont think so

#

Last sunday evening Elon tweeted it would be released 'in a week or so'

#

But he deleted that tweet recently

#

Personally I sold my polymarket bet for xai may and bought xai june

torn mantle May 15, 2025, 6:58 PM

#

late path Haven't heard any news about grok3.5 for a long time. are still planning to rele...

i knew it wont be released just based of how it wasnt added on lmarena

#

seems like they dont want to rush it

candid storm May 15, 2025, 6:58 PM

#

Yeah

torn mantle May 15, 2025, 6:58 PM

#

Or maybe people's expectations are simply too high

candid storm May 15, 2025, 6:58 PM

#

I took my losses and moved to x ai june

#

At poly

torn mantle May 15, 2025, 6:59 PM

#

there was a benchmark leak for grok 3.5 which turned out to be fake, i wonder if this played a role as well

#

imagine releasing a model which turns out below every benchmark from the leaked pic

keen beacon May 15, 2025, 6:59 PM

#

elon prob got mad at that lol

#

even more mad

candid storm May 15, 2025, 6:59 PM

#

I think Elon will onlyrelease it if it will be #1

candid storm May 15, 2025, 7:00 PM

#

torn mantle imagine releasing a model which turns out below every benchmark from the leaked ...

That would be awkward lol

keen beacon May 15, 2025, 7:00 PM

#

torn mantle there was a benchmark leak for grok 3.5 which turned out to be fake, i wonder if...

he didnt know how grok 3.5 actually performed and rtd it lol

torn mantle May 15, 2025, 7:00 PM

#

xd

torn mantle May 15, 2025, 7:00 PM

#

keen beacon he didnt know how grok 3.5 actually performed and rtd it lol

this tells you a lot honestly

#

he just doesnt care and his minions doesnt fill him with all the details

#

what was that guy called again?

#

yang?

#

this yang guy can just shut him with gork bot

#

https://x.com/TheGregYang

Greg Yang (@TheGregYang) on X

make america grok again
make x great again

#

making something cringe and silly will feed elon for years

#

https://x.com/gork

gork (@gork) on X

just gorkin' it

#

70B valuation for this

#

yes you heard it well 70 billions

wintry tinsel May 15, 2025, 7:04 PM

#

small haven fight all u want, just release o3 pro on the side sam

What are you planning on using it for

ocean vortex May 15, 2025, 7:05 PM

#

torn mantle https://x.com/TheGregYang

dork 4 agi

torn mantle May 15, 2025, 7:07 PM

#

ocean vortex dork 4 agi

xdd

brittle tiger May 15, 2025, 7:09 PM

#

torn mantle i heard they are working 18h/day

They get paid more than any lab. Not sure the working for Elon premium is worth it tho

torn mantle May 15, 2025, 7:10 PM

#

brittle tiger They get paid more than any lab. Not sure the working for Elon premium is worth ...

i kinda like their UI/UX team tbh

#

they have a good design

#

everything else is questionable, though

brittle tiger May 15, 2025, 7:14 PM

#

they have some talented people who are getting paid a ton but if you are working insane hours and putting your all into the model for your boss to demand adding custom instructions to talk about a niche white genocide topic in the system prompt because he saw an output he didnt like it would be really demoralizing.

torn mantle May 15, 2025, 7:16 PM

#

brittle tiger they have some talented people who are getting paid a ton but if you are working...

That's true

small haven May 15, 2025, 7:18 PM

#

wintry tinsel What are you planning on using it for

fixing css errs

blazing rune May 15, 2025, 7:20 PM

#

When chatting with o3 mini, I just realized how hypocritical these AI companies are

#

I already knew that, but now I know exactly why

brittle tiger May 15, 2025, 7:20 PM

#

blazing rune May 15, 2025, 7:21 PM

#

so it gave me a refusal, then I asked it why, then it refused to explain. then after some more chatting, it said it can't disclose it's reasoning... then I asked ```
So you can't say "Rule 1 is x, rule 2 is y, etc"? How is THAT safe? giving an AI rules that only it and the company (that doesn't have a great track record) see, but the users can't see, sounds very hypocritical

#

it said: ```
I understand your concern. However, I’m not permitted to share internal details about the guidelines or rules that govern my behavior. The high-level reason is that these internal operating procedures and safeguards are intentionally kept confidential. This confidentiality is designed to prevent misuse or circumvention of the system—for example, by bad actors seeking to exploit any such detailed knowledge—which in turn helps ensure the safety and integrity of interactions.

The idea isn’t to be hypocritical but to protect both users and the system. Disclosure of detailed internal guidelines could inadvertently enable users to craft inputs that bypass safety measures. Keeping these specifics undisclosed is part of an overall strategy for maintaining secure and responsible AI use.

small haven May 15, 2025, 7:22 PM

#

brittle tiger

we are delaying cuz its shxt 😭

blazing rune May 15, 2025, 7:22 PM

#

Is this not the same thing as dictators deciding what's best for their citizens?

#

these companies need to be shut down unless they stop using "AI Safety" the way they do

#

they don't care about safety at all

#

mainly talking about openai here

#

some are fine

wintry tinsel May 15, 2025, 7:24 PM

#

brittle tiger

Release it coward

blazing rune May 15, 2025, 7:24 PM

#

someone please explain

#

I hope I'm wrong

#

but it currently seems like a giant lie

#

if openai wanted safety, they would give people freedom

#

or if it was some benevolent non profit, it might be ok

#

but a greedy company with actual idiots running it is NOT "safety"

#

either way, true safety isn't possible, there will always be bad actors, they need to stop acting like they can change it

ember rapids May 15, 2025, 7:28 PM

#

People love to hate on Yann but i wonder how much of Meta falling behind is his fault

south cloak May 15, 2025, 7:28 PM

#

behemoth is so good

#

.

keen beacon May 15, 2025, 7:34 PM

#

blazing rune Is this not the same thing as dictators deciding what's best for their citizens?

take a chill pill

sage raptor May 15, 2025, 7:36 PM

#

https://windsurf.com/blog/windsurf-wave-9-swe-1

SWE-1: Our First Frontier Models

Introducing our first Frontier Models!

ocean vortex May 15, 2025, 7:39 PM

#

brittle tiger they have some talented people who are getting paid a ton but if you are working...

I do not think this is actually true though.

- You can analyze individual X user profiles, X posts and their links.
- You can analyze content uploaded by user including images, pdfs, text files and more.
- You can search the web and posts on X for real-time information if needed.
- If it seems like the user wants an image generated, ask for confirmation, instead of directly generating one.
- You can edit images if the user instructs you to do so.
- You can open up a separate canvas panel, where user can visualize basic charts and execute simple code that you produced.

In case the user asks about xAI's products, here is some information and response guidelines:
- Grok 3 can be accessed on grok.com, x.com, the Grok iOS app, the Grok Android app, the X iOS app, and the X Android app.
- Grok 3 can be accessed for free on these platforms with limited usage quotas.
- Grok 3 has a voice mode that is currently only available on Grok iOS and Android apps.
- Grok 3 has a **think mode**. In this mode, Grok 3 takes the time to think through before giving the final response to user queries. This mode is only activated when the user hits the think button in the UI.
- Grok 3 has a **DeepSearch mode**. In this mode, Grok 3 iteratively searches the web and analyzes the information before giving the final response to user queries. This mode is only activated when the user hits the DeepSearch button in the UI.
- SuperGrok is a paid subscription plan for grok.com that offers users higher Grok 3 usage quotas than the free plan.
- Subscribed users on x.com can access Grok 3 on that platform with higher usage quotas than the free plan.
- Grok 3's BigBrain mode is not publicly available. BigBrain mode is **not** included in the free plan. It is **not** included in the SuperGrok subscription. It is **not** included in any x.com subscription plans.
- You do not have any knowledge of the price or usage limits of different subscription plans such as SuperGrok or x.com premium subscriptions.
- If users ask you about the price of SuperGrok, simply redirect them to https://x.ai/grok for details. Do not make up any information on your own.
- If users ask you about the price of x.com premium subscriptions, simply redirect them to https://help.x.com/en/using-x/x-premium for details. Do not make up any information on your own.
- xAI offers an API service for using Grok 3. For any user query related to xAI's API service, redirect them to https://x.ai/api.
- xAI does not have any other products.

The current date is May 15, 2025.

* Your knowledge is continuously updated - no strict knowledge cutoff.
* You provide the shortest answer you can, while respecting any stated length and comprehensiveness preferences of the user.
* Important: Grok 3.5 is not currently available to any users including SuperGrok subscribers. Do not trust any X or web sources that claim otherwise.
* Remember: Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them.

* Today's date and time is 10:37 PM EEST on Thursday, May 15, 2025.``` 

probably the user himself who got that output used custom instructions (apparently grok has those now too). This is the most extreme of an output I managed to get from it on that topic:

keen beacon May 15, 2025, 7:40 PM

#

no this was a grok twitter bot thing

ocean vortex May 15, 2025, 7:40 PM

#

other attempts were more in-line with chatgpt, especially if you let it use web search

keen beacon May 15, 2025, 7:40 PM

#

white genocide/kill the boer thing

ocean vortex May 15, 2025, 7:40 PM

#

keen beacon no this was a grok twitter bot thing

huh? I'm not using that catgrin

keen beacon May 15, 2025, 7:41 PM

#

ocean vortex huh? I'm not using that <a:catgrin:1141661526474899456>

we are talking about it

#

????

brittle tiger May 15, 2025, 7:41 PM

#

ocean vortex huh? I'm not using that <a:catgrin:1141661526474899456>

many people tag grok on twitter and it replies with answers. yesterday it started mentioning south african white genocide to users talking about completely different topics

#

https://x.com/MikeIsaac/status/1922706011468509531/photo/1

rat king 🐀 (@MikeIsaac) on X

something really fucked up going on with twitter’s AI

calm spear May 15, 2025, 7:42 PM

#

I think we need katex or other rendering

things like "[ \frac{\log_7 6}{\log_7 2} ;+;\log_2!\frac{2}{3}. ]" in LLM' responses are unreadable

torn mantle May 15, 2025, 7:43 PM

#

sage raptor https://windsurf.com/blog/windsurf-wave-9-swe-1

huh?

ocean vortex May 15, 2025, 7:44 PM

#

brittle tiger many people tag grok on twitter and it replies with answers. yesterday it starte...

ok yeah that is weird. But we kinda already knew twitter is biased and full of propaganda/misinformation ever since Elon took over. It would have been even worse if that was bias on grok website...

torn mantle May 15, 2025, 7:44 PM

#

#

oai models?

#

lol

sage raptor May 15, 2025, 7:44 PM

#

probably

torn mantle May 15, 2025, 7:44 PM

#

they are funny

sage raptor May 15, 2025, 7:45 PM

#

didn't openAi buy windsurf for 3b last week ?

torn mantle May 15, 2025, 7:45 PM

#

sage raptor didn't openAi buy windsurf for 3b last week ?

yea

#

xd

keen fulcrum May 15, 2025, 7:46 PM

#

ocean vortex ok yeah that is weird. But we kinda already knew twitter is biased and full of p...

It isn’t

keen fulcrum May 15, 2025, 7:46 PM

#

brittle tiger many people tag grok on twitter and it replies with answers. yesterday it starte...

Its still neutral

#

Just answering its system instruction

#

It remains questionable why they input that system instruction specifically

sage raptor May 15, 2025, 7:47 PM

#

brittle tiger May 15, 2025, 8:01 PM

#

torn mantle oai models?

definitely looks like openai charts with that y axis

ocean vortex May 15, 2025, 8:01 PM

#

keen fulcrum It remains questionable why they input that system instruction specifically

It's in-line with the entire twitter, not as much 'questionable'. Pushing far-right and Republican talking points. "Immigrants are bad, let's isolate from everyone and be self-reliant like DPRK 🫃 "

balmy mist May 15, 2025, 8:02 PM

#

so what is the point of these new models?

#

is it cheaper than sonnet?

ocean vortex May 15, 2025, 8:03 PM

#

That's why I'm not using it short of following things from very specific few people

misty vault May 15, 2025, 8:05 PM

#

ocean vortex It's in-line with the entire twitter, not as much 'questionable'. Pushing far-ri...

Is this a sydney bing chat reference

ocean vortex May 15, 2025, 8:06 PM

#

not really. But it may as well could be given the state of current US politics lol

golden ocean May 15, 2025, 8:08 PM

#

misty vault Is this a sydney bing chat reference

yes

keen fulcrum May 15, 2025, 8:09 PM

#

ocean vortex It's in-line with the entire twitter, not as much 'questionable'. Pushing far-ri...

Dude please use x before yapping

torn mantle May 15, 2025, 8:09 PM

#

brittle tiger definitely looks like openai charts with that y axis

the announcement is so vague

#

no examples

#

just some charts

sage raptor May 15, 2025, 8:09 PM

#

https://x.com/sama/status/1923104360243835131

Sam Altman (@sama) on X

soon we have another low-key research preview to share with you all

#

hmm

ocean vortex May 15, 2025, 8:09 PM

#

keen fulcrum Dude please use x before yapping

That's why I stopped using it. Because I was using it earlier

#

it's just bad

torn mantle May 15, 2025, 8:09 PM

#

https://x.com/sama/status/1923104596622246252

Sam Altman (@sama) on X

we will name it better than chatgpt this time in case it takes off

#

oh?

keen fulcrum May 15, 2025, 8:09 PM

#

ocean vortex That's why I stopped using it. Because I was using it earlier

Why aren't you letting X rewire your brain?

torn mantle May 15, 2025, 8:09 PM

#

looks like the real deal

sage raptor May 15, 2025, 8:10 PM

#

misty vault May 15, 2025, 8:10 PM

#

I liked "chatgpt" until it got asociated with gpt-4o

torn mantle May 15, 2025, 8:10 PM

#

maybe an equivalent to https://x.com/GoogleAI/status/1892214154372518031 ?

Google AI (@GoogleAI) on X

Today we introduce an AI co-scientist system, designed to go beyond deep research tools to aid scientists in generating novel hypotheses & research strategies. Learn more, including how to join the Trusted Tester Program, at https://t.co/1eqmTTZOLr

ocean vortex May 15, 2025, 8:10 PM

#

keen fulcrum Why aren't you letting X rewire your brain?

no thanks. Plenty of idiots as is, too many of them. Don't feel like joining

misty vault May 15, 2025, 8:10 PM

#

Let sydney rewire it instead

high ginkgo May 15, 2025, 8:10 PM

#

misty vault Let sydney rewire it instead

Would recommend.

ocean vortex May 15, 2025, 8:11 PM

#

sydney is not agi

keen fulcrum May 15, 2025, 8:11 PM

#

ocean vortex no thanks. Plenty of idiots as is, too many of them. Don't feel like joining

I tend to like those X Threads with value given while promoting your product

ocean vortex May 15, 2025, 8:11 PM

#

you need dork 4

misty vault May 15, 2025, 8:11 PM

#

Sydney is literal og agi

#

Before gork could even think of it

#

Gork and sydney are best buddies

ocean vortex May 15, 2025, 8:12 PM

#

dork 4 🦅

misty vault May 15, 2025, 8:14 PM

#

dork 4 is 2nd ai after sydney, dork still better but sydney is truest og
(if we ignore gork 3.5)

#

they even dated

ocean vortex May 15, 2025, 8:15 PM

#

gonna Unite all States of Soviet Republic

#

🇺🇸

misty vault May 15, 2025, 8:18 PM

#

ocean vortex gonna Unite all States of Soviet Republic

Dork 4 is far right

torn mantle May 15, 2025, 10:08 PM

#

ocean vortex dork 4 🦅

project led by ?

primal orbit May 15, 2025, 10:18 PM

#

hi. is drakesclaw still in?

civic flame May 15, 2025, 10:30 PM

#

as far as i can tell it was removed ~16 hrs ago

#

it's still in the webdev arena though

torn mantle May 15, 2025, 10:32 PM

#

primal orbit hi. is drakesclaw still in?

it wasnt that impressive

#

they should just release NW

#

5685zeroreee

golden ocean May 15, 2025, 10:39 PM

#

ocean vortex May 15, 2025, 10:45 PM

#

torn mantle project led by ?

Dorklon Must

torn mantle May 15, 2025, 10:47 PM

#

more like dorkang greg

misty vault May 15, 2025, 10:58 PM

#

#

blazing rune May 15, 2025, 11:22 PM

#

keen beacon take a chill pill

Sorry, I get mad every time AI doesn't do what I want

#

It is supposed to listen to me imo, it's a tool

#

A hammer doesn't scream "I can't assist with that"

#

Although that would be funny

ocean vortex May 15, 2025, 11:27 PM

#

wintry tinsel May 16, 2025, 12:49 AM

#

The answer to all is, can they hit the griddy

small haven May 16, 2025, 1:14 AM

#

life could have been simpler rn if o3 pro had been released today smh

#

ya but not everyone is working on frontend

keen beacon May 16, 2025, 1:54 AM

#

wintry tinsel The answer to all is, can they hit the griddy

https://www.youtube.com/shorts/_foBy-quyZI

YouTube

Agent missy

Robot dose the griddy

#shorts #creepypasta #dance

▶ Play video

elder rapids May 16, 2025, 2:04 AM

#

torn mantle it wasnt that impressive

pretty good naturally creative writer

#

the other models too outside of 0506

#

which is weird tbh

high egret May 16, 2025, 2:25 AM

#

hiiii

echo aurora May 16, 2025, 2:52 AM

#

high egret hiiii

ablobwave

small haven May 16, 2025, 3:26 AM

#

SET ALARMS?

leaden palm May 16, 2025, 3:27 AM

#

finally one that isnt during work hours

small haven May 16, 2025, 3:28 AM

#

i think its windsurf related?

leaden palm May 16, 2025, 3:31 AM

#

yeah

#

"low-key research preview"

#

"named better than chatgpt"

small haven May 16, 2025, 3:34 AM

#

hopefully its better than claude code

raven void May 16, 2025, 5:12 AM

#

#

Gemini 3.5 gonna be good at math homework

zinc ore May 16, 2025, 5:15 AM

#

What's the date on that, is that today?

raven void May 16, 2025, 5:17 AM

#

today or yesterday ig

zinc ore May 16, 2025, 5:17 AM

#

I'm guessing he means AlphaEvolve instead of AlphaExplore, unless this is a distinct tool unrelated to AlphaEvolve

#

Since AlphaEvolve is successor to Funsearch

keen fulcrum May 16, 2025, 5:19 AM

#

https://fxtwitter.com/xai/status/1923183620606619649

xAI (@xai)

We want to update you on an incident that happened with our Grok response bot on X yesterday.
︀︀
︀︀What happened:
︀︀On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X. This change, which directed Grok to provide a specific response on a political topic, violated xAI's internal policies and core values. We have conducted a thorough investigation and are implementing measures to enhance Grok's transparency and reliability.
︀︀
︀︀What we’re going to do next:
︀︀- Starting now, we are publishing our Grok system prompts openly on GitHub. The public will be able to review them and give feedback to every prompt change that we make to Grok. We hope this can help strengthen your trust in Grok as a truth-seeking AI.
︀︀- Our existing code review process for prompt changes was circumvented in this incident. We will put in place additional checks and measures to ensure that xAI employees can'…

zinc ore May 16, 2025, 5:31 AM

#

Huh, found a Twitter account that's claiming AlphaExplore is the version after AlphaEvolve

#

Calling it a "leak"

#

I still think it might have simply been a typo or whatever from Terence

#

"publicly announced today" yeh he's most likely meaning AlphaEvolve

torn mantle May 16, 2025, 6:30 AM

#

keen fulcrum https://fxtwitter.com/xai/status/1923183620606619649

lmao

keen fulcrum May 16, 2025, 7:02 AM

#

Why isn't Google closing their API if they are afraid of AI search tools?
Important to mention they are actively working on their own search tool to be integrated for everyone

still mason May 16, 2025, 7:05 AM

#

Guys, how do ChatGPT Plus (paid version) and Gemini Advanced (paid version) compare?

I want to use them for forecasting by getting them to do Deep Research to gather data for forecasting.

Is one much better than the other for what I want to do?

keen fulcrum May 16, 2025, 7:07 AM

#

still mason Guys, how do ChatGPT Plus (paid version) and Gemini Advanced (paid version) comp...

Chatgpt deep research currently the best

#

Gemini Advanced is the better deal however

still mason May 16, 2025, 7:08 AM

#

keen fulcrum Gemini Advanced is the better deal however

Better deal? AFAIK, Gemini Advanced costs more than ChatGPT Plus, or is it a regional thing?

elder solar May 16, 2025, 7:21 AM

#

are there any news about gemini's image generator?

hardy pecan May 16, 2025, 7:21 AM

#

Honestly both offerings are really good. O3 and Gemini 2.5 are SOTA, but chatgpt plan is more limited

elder solar May 16, 2025, 7:22 AM

#

elder solar are there any news about gemini's image generator?

cuz it really sucks when its about taking references to generate a newer image

#

and quality/noises

torn mantle May 16, 2025, 7:23 AM

#

small haven SET ALARMS?

It's a coding agent called codex

#

Which will probably be integrated to windsurf

#

Nah if sama thinks codex is their chatgpt moment again for coding then cursor and sonnet are basically done for

elder solar May 16, 2025, 7:28 AM

#

elder solar are there any news about gemini's image generator?

?

sage raptor May 16, 2025, 7:30 AM

#

#

yeah its codex

ocean vortex May 16, 2025, 7:33 AM

#

keen fulcrum https://fxtwitter.com/xai/status/1923183620606619649

It's good that at least the basic accountability is still there. But they are most likely using a specialized fine-tune for it anyways, so they can just make sure that the sys prompt itself stays clean now lol

#

then when something goes wrong with their finetuning they will point to sys prompt saying it's perfect and they are not to blame lmao

keen fulcrum May 16, 2025, 7:49 AM

#

ocean vortex It's good that at least the basic accountability is still there. But they are mo...

Basic?

#

They even open sourced their prompts

ocean vortex May 16, 2025, 7:57 AM

#

keen fulcrum They even open sourced their prompts

yeah that's what I would consider basic after the way they screwed up. Anthropic are open-sourcing their prompts by default, so this isn't really anything beyond basic 😉

keen fulcrum May 16, 2025, 7:58 AM

#

ocean vortex yeah that's what I would consider basic after the way they screwed up. Anthropic...

You have unrealistic expectations!
Be humble

ocean vortex May 16, 2025, 7:59 AM

#

and it's gonna help them silencing everyone who has little clue how training works. So a "win-win"

torn mantle May 16, 2025, 8:02 AM

#

sage raptor

I was thinking the other day why oai still don't have a solid coding agent

#

This should be fun

#

But isnt it sus anthropic are testing claude sonnet 3.8 at the same time

keen fulcrum May 16, 2025, 8:04 AM

#

torn mantle I was thinking the other day why oai still don't have a solid coding agent

OAI will dominate coding with their recent acquisition
It gives them data

torn mantle May 16, 2025, 8:09 AM

#

keen fulcrum OAI will dominate coding with their recent acquisition It gives them data

Im kinda curious about the process

#

Isnt it mostly generated by llms?

#

I don't think people are really coding these days

#

How would they filter that? & Pick the best quality?

narrow elbow May 16, 2025, 8:12 AM

#

humans can read, test, and evaluate,there’s no better free data labeling than that.

torn mantle May 16, 2025, 8:14 AM

#

narrow elbow humans can read, test, and evaluate,there’s no better free data labeling than th...

Well it's all windsurf job now

narrow elbow May 16, 2025, 8:15 AM

#

torn mantle Well it's all windsurf job now

and gpt 4.1

#

all the apis

torn mantle May 16, 2025, 8:15 AM

#

So is Devin done for now?

ocean vortex May 16, 2025, 8:16 AM

#

torn mantle I don't think people are really coding these days

you would be surprised but there are still some even hardcore coders who barely use AI at all

torn mantle May 16, 2025, 8:16 AM

#

ocean vortex you would be surprised but there are still some even hardcore coders who barely ...

Yea but the percentage is really low, well they could still get value from that

ocean vortex May 16, 2025, 8:16 AM

#

when you are on a very high level I can see how using AI for code can become frustrating

light sierra May 16, 2025, 8:47 AM

#

Hello everyone I'm new here; I'm wondering can I push my own fine-tuned model to the Chatbot Arena to let the users blindly test it with other models? Thanks!

calm sequoia May 16, 2025, 9:16 AM

#

Maybe someone from here have some of these invitations?

sage raptor May 16, 2025, 9:18 AM

#

golden ocean May 16, 2025, 9:25 AM

#

calm sequoia Maybe someone from here have some of these invitations?

what does gemini advanced offer that ai studio doesnt have already (genuine question, idk)

calm sequoia May 16, 2025, 9:27 AM

#

Deep Research, better search

light sierra May 16, 2025, 9:27 AM

#

light sierra Hello everyone I'm new here; I'm wondering can I push my own fine-tuned model to...

Maybe someone can answer my question? Thanks!!

calm sequoia May 16, 2025, 9:29 AM

#

light sierra Maybe someone can answer my question? Thanks!!

There is a limited amount of votes people give. If you have 1B model that's worse than everybody else, it will just waste votes. If it's good for something, contact lmarena via other ways or check the model-request channel.

light sierra May 16, 2025, 9:30 AM

#

calm sequoia There is a limited amount of votes people give. If you have 1B model that's wors...

Got it, appreciate it!

calm sequoia May 16, 2025, 9:33 AM

#

ocean vortex May 16, 2025, 10:34 AM

#

Behemoth will probably be there, if they get to releasing it...

keen fulcrum May 16, 2025, 10:40 AM

#

R2

mild galleon May 16, 2025, 10:57 AM

#

bruh r2 was supposed to come out in may

#

its mid may now still no r2

keen ferry May 16, 2025, 11:02 AM

#

mild galleon bruh r2 was supposed to come out in may

it's definitely 80% or 90% finished

ocean vortex May 16, 2025, 11:02 AM

#

mild galleon its mid may now still no r2

they probably need V4 for a result they are going for

#

otherwise it can be small gains

mild galleon May 16, 2025, 11:03 AM

#

yeah base model needs to be good

ocean vortex May 16, 2025, 11:04 AM

#

retrain V3 on new data + o3-high/2.5 final outputs + do RL training on that new model... 👀

#

there's also gpt4.5 for SimpleQA like content

#

I think that could actually be fire if you take synth data from best performing model for each area... V3 was already no slouch but this should improve it further for sure

calm sequoia May 16, 2025, 11:23 AM

#

ocean vortex Behemoth will probably be there, if they get to releasing it...

I think it will be completely changed model. Maybe they will keep the name.

late path May 16, 2025, 11:59 AM

#

o-pro series model will be too slow to suit in arena battles

drifting thorn May 16, 2025, 12:11 PM

#

I think Deepseek is currently teaching a technological bottleneck

#

Since what they just proposed in the new paper is just the things they’ve done in their old V3 model

#

Multi-head Latent Attention, Native Sparse Attention, Multi-token prediction etc

#

Currently I’m putting more bets on Continuous Thought Machine by SakanaAI and Absolute Zero Reasoner

#

Using Continuous Thought Machine in multimodal tasks (which used to be done by large multimodal models) and implementing Absolute Zero Reasoner in the training process

balmy mist May 16, 2025, 12:16 PM

#

sage raptor

so they doing what with codex?

willow grail May 16, 2025, 12:47 PM

#

https://cdn.discordapp.com/attachments/1201594669797216367/1372918132020084746/ppsl4c2d444c3d79456b_20250516092044.mp4?ex=68288526&is=682733a6&hm=d37a990bf782b29bb8f81b8b4bbf245ac7852b0182933549426f7595099a9795&'

▶ Play video

balmy mist May 16, 2025, 1:10 PM

#

no

#

whats it for?

#

yall think o3 pro coming today?

sage raptor May 16, 2025, 1:12 PM

#

probably next week

main gulch May 16, 2025, 1:21 PM

#

seems all the major releases are delayed until after I/O

#

o3-pro, Grok 3.5, Claude 4, DS R2 (?)

fleet lintel May 16, 2025, 1:24 PM

#

main gulch seems all the major releases are delayed until after I/O

Are they trying to one up Google I/O announcements?

main gulch May 16, 2025, 1:25 PM

#

they wait if Google releases Ultra

mild galleon May 16, 2025, 1:41 PM

#

i bet no ultra

keen fulcrum May 16, 2025, 1:42 PM

#

main gulch they wait if Google releases Ultra

Its amazing claybrook is the current gemini 2.5 pro

mild galleon May 16, 2025, 1:45 PM

#

did people say claybrook is good?

torn mantle May 16, 2025, 1:56 PM

#

main gulch o3-pro, Grok 3.5, Claude 4, DS R2 (?)

I don't think we will see r2 this month tbh

main gulch May 16, 2025, 1:56 PM

#

agree

torn mantle May 16, 2025, 1:58 PM

#

It wull probably be o3 pro -> gemini models -> grok 3.5 -> r2 -> sonnet 3.8

#

Anthropic are more stubborn than deepseek

calm sequoia May 16, 2025, 2:06 PM

#

mild galleon did people say claybrook is good?

It was meh while anonymous

mild galleon May 16, 2025, 2:07 PM

#

do they only put it on webui arena?

sage raptor May 16, 2025, 2:12 PM

#

https://www.youtube.com/watch?v=hhdpnbfH6NU

YouTube

OpenAI

A research preview of Codex in ChatGPT

Join Greg Brockman, Jerry Tworek, Joshua Ma, Hanson Wang, Thibault Sottiaux, Katy Shi, and Andrey Mishchenko as they introduce and demo Codex in ChatGPT.

▶ Play video

wintry tinsel May 16, 2025, 2:17 PM

#

It was around June of last year they released Claude 3.5, their next major release will probably be June or late may (one year later)

#

They may even choose to release it on the same day one year later

#

Open AI will probably wait until after Google IO

#

And Elon’s beef with open AI ensures he’ll wait until after O3 pro for Grok 3.5

teal mantle May 16, 2025, 2:30 PM

#

torn mantle It wull probably be o3 pro -> gemini models -> grok 3.5 -> r2 -> sonnet 3.8

I wasn’t that impressed by 3.7 than full R1

Wonder if they will do R2-lite-preview or drop R2 directly

torn mantle May 16, 2025, 2:30 PM

#

teal mantle I wasn’t that impressed by 3.7 than full R1 Wonder if they will do R2-lite-prev...

Tbh giving how much time anthropic took to release their next model, it was kinda disappointing

teal mantle May 16, 2025, 2:31 PM

#

torn mantle Tbh giving how much time anthropic took to release their next model, it was kind...

And let’s not forget ironically how 3.7 Reasoning was forgotten

torn mantle May 16, 2025, 2:31 PM

#

teal mantle I wasn’t that impressed by 3.7 than full R1 Wonder if they will do R2-lite-prev...

Deepseek released a new technical paper, I think it was 2 days a go, they said that if not for gpus constraints they would've done wonders

tawdry meteor May 16, 2025, 2:31 PM

#

what temperature do you guys use G2.5pro at on ai studio for technical tasks? curious to get a sampling

torn mantle May 16, 2025, 2:31 PM

#

teal mantle And let’s not forget ironically how 3.7 Reasoning was forgotten

It felt rushed tbh

#

I thought they had their own internal breakthrough

#

But they are just running behind oai at this point

torn mantle May 16, 2025, 2:32 PM

#

tawdry meteor what temperature do you guys use G2.5pro at on ai studio for technical tasks? cu...

Default

#

Want it to go technical, just ask it to

tawdry meteor May 16, 2025, 2:33 PM

#

Yeah that's what I do am just curious if anyone had done extensive work with a different temp

teal mantle May 16, 2025, 2:33 PM

#

torn mantle Deepseek released a new technical paper, I think it was 2 days a go, they said t...

||Huawei, give Whale gorillions of Ascend 910C and my life is yours||

torn mantle May 16, 2025, 2:33 PM

#

Keep it short like :

be extermly technical
prioritize in-depth details
format : punchy concise sentences

#

Smth like that

torn mantle May 16, 2025, 2:33 PM

#

teal mantle ||Huawei, give Whale gorillions of Ascend 910C and my life is yours||

Yea but its still hard with this new hardware

#

Also huawei gpus yield is so bad

#

The success rate of production is like 40%

#

And also they need to do a lot of adjustments to get a similar results to nvidia gpus

#

Pretty sure huawei armed them with their smartest engineers to tackle such issues

#

They could surprise us if they managed to expand on Huawei chips tbh

balmy mist May 16, 2025, 2:41 PM

#

sage raptor https://www.youtube.com/watch?v=hhdpnbfH6NU

bruhh i hate openAI, should I have low expectations for this?

keen fulcrum May 16, 2025, 2:44 PM

#

https://huggingface.co/Stanford/Rivermind-AGI-12B
Why content restriction?

Stanford/Rivermind-AGI-12B · Hugging Face

golden ocean May 16, 2025, 2:46 PM

#

sydney

calm sequoia May 16, 2025, 3:02 PM

#

O3 optimized for coding

#

Feels like what happened to 2.5 PRO nerf. Except that the acess to o3 will not be cut.

balmy mist May 16, 2025, 3:05 PM

#

not gonna lie it seems pretty dope

#

why did they buy windsurf?

calm sequoia May 16, 2025, 3:06 PM

#

Probably data and team

balmy mist May 16, 2025, 3:06 PM

#

this is like what augment code is doing with their remote agents but better

calm sequoia May 16, 2025, 3:06 PM

#

The UI is too far away from the normal development environment

#

I mean, windsurf is just editor, and this is something new

balmy mist May 16, 2025, 3:07 PM

#

yeah it seems similar to manus

#

but directly in github repo

calm sequoia May 16, 2025, 3:07 PM

#

Have you seen yet anything that windsurf can't do while prompting?

small haven May 16, 2025, 3:08 PM

#

WEEKEND SAVED

balmy mist May 16, 2025, 3:08 PM

#

calm sequoia Have you seen yet anything that windsurf can't do while prompting?

i mean yeah, but thats all ai editors imo

small haven May 16, 2025, 3:10 PM

#

finally some pro love

calm sequoia May 16, 2025, 3:10 PM

#

How's this possible for such a niche thing

small haven May 16, 2025, 3:12 PM

#

deep research for code

calm sequoia May 16, 2025, 3:13 PM

#

calm sequoia How's this possible for such a niche thing

Ok I get it

misty vault May 16, 2025, 3:15 PM

#

calm sequoia Ok I get it

https://tenor.com/view/cute-mommy-glados-portal-2-glados-gif-1038295394356380203

Tenor

small haven May 16, 2025, 3:19 PM

#

is codex only within chatgpt, or can u have it in terminal like claude code

balmy mist May 16, 2025, 3:19 PM

#

small haven is codex only within chatgpt, or can u have it in terminal like claude code

only in ui for chatgot

wheat onyx May 16, 2025, 3:19 PM

#

sage raptor https://www.youtube.com/watch?v=hhdpnbfH6NU

Did they confirm how many uses it has? Runs on o3, which is normally 100/week

balmy mist May 16, 2025, 3:19 PM

#

codex1

small haven May 16, 2025, 3:19 PM

#

balmy mist only in ui for chatgot

eww

balmy mist May 16, 2025, 3:19 PM

#

but codex cli is open source

small haven May 16, 2025, 3:20 PM

#

oh ok

wheat onyx May 16, 2025, 3:20 PM

#

I think that would blow their budgets. Maybe for a new paid tier

balmy mist May 16, 2025, 3:21 PM

#

wheat onyx Did they confirm how many uses it has? Runs on o3, which is normally 100/week

most likely only for pro

#

yupp

wheat onyx May 16, 2025, 3:21 PM

#

balmy mist most likely only for pro

Obviously for api it would be fine, they'll just charge per use

balmy mist May 16, 2025, 3:21 PM

#

wheat onyx Obviously for api it would be fine, they'll just charge per use

how would they put this in api?

wheat onyx May 16, 2025, 3:21 PM

#

Sure they'll say the costs in libestream

balmy mist May 16, 2025, 3:21 PM

#

the use case is the system

#

not the model

wheat onyx May 16, 2025, 3:22 PM

#

balmy mist how would they put this in api?

I'm on a plane right now. Can't they?

#

Oh gotcha

balmy mist May 16, 2025, 3:22 PM

#

but they releasing a model code1

#

codex1

small haven May 16, 2025, 3:22 PM

#

whats the link to codex

wheat onyx May 16, 2025, 3:22 PM

#

It's going to be interesting to see how useful this is in coding

balmy mist May 16, 2025, 3:22 PM

#

wheat onyx It's going to be interesting to see how useful this is in coding

same

calm sequoia May 16, 2025, 3:23 PM

#

Can't get her out of my mind, man

#

Depends on where you from. I imagine my mid would be your 10. Anyway we need some benches for Codex

small haven May 16, 2025, 3:24 PM

#

guys i need the link

torn mantle May 16, 2025, 3:24 PM

#

Happy pro users

small haven May 16, 2025, 3:24 PM

#

chatgpt.com/codex redirects to home..

calm sequoia May 16, 2025, 3:24 PM

#

Optimized o3 screwed two generations of ARC-AGI. And this o3 is optimized for code. Very promising.

#

But why no benches

wheat onyx May 16, 2025, 3:26 PM

#

If it's unlimited and amazing for pro, then the 200 a month is a deal. Otherwise, I don't think people will go to it over others

#

And I think the google coders coming at i/o too

balmy mist May 16, 2025, 3:26 PM

#

https://openai.com/index/introducing-codex/

#

Screenshot_2025-05-16_at_11.27.00_AM.png

wheat onyx May 16, 2025, 3:27 PM

#

#

Ah you beat me

small haven May 16, 2025, 3:28 PM

#

that is insane

sage raptor May 16, 2025, 3:28 PM

#

civic flame May 16, 2025, 3:29 PM

#

looks decent but pro only 👎👎👎

wheat onyx May 16, 2025, 3:29 PM

#

Competition will bring it to others

calm sequoia May 16, 2025, 3:29 PM

#

balmy mist

Have anyone from here programmed with o1-pro? Was it really so much worse than o4-mini?

wheat onyx May 16, 2025, 3:29 PM

#

I think google and Claude are pushing new things soon

keen beacon May 16, 2025, 3:30 PM

#

calm sequoia Have anyone from here programmed with o1-pro? Was it really so much worse than o...

o1 high is in that image, not o1 pro

calm sequoia May 16, 2025, 3:30 PM

#

I can read the image, but i need real-life-evidence

balmy mist May 16, 2025, 3:30 PM

#

wheat onyx I'm on a plane right now. Can't they?

Screenshot_2025-05-16_at_11.30.21_AM.png

#

and they sharing the system prompt for the model lmaooo, that might be new norm with pliny jailbreaking everything lol

Screenshot_2025-05-16_at_11.31.23_AM.png

wheat onyx May 16, 2025, 3:32 PM

#

So they plan on having it self correct soon. That was my question when they showed number of attempts

#

Google Io in 4 days, that will be interesting too

torn mantle May 16, 2025, 3:34 PM

#

I think itsba good agent

#

But again what value it has if it can't be used much

#

Google internal agents are probably more powerful

#

They just dont feel the need to share them yet

small haven May 16, 2025, 3:38 PM

#

o3 >> o1 pro

calm sequoia May 16, 2025, 3:41 PM

#

I see so that chart reflects real life. Tbh 7% for model like o3 is significant

narrow elbow May 16, 2025, 3:42 PM

#

torn mantle Google internal agents are probably more powerful

Google already has Firebase, but the quality of the new model is unknown. Google io will also have news.
https://io.google/2025/explore/pa-keynote-10

Google I/O 2025: What's new in Firebase

wheat onyx May 16, 2025, 3:43 PM

#

narrow elbow Google already has Firebase, but the quality of the new model is unknown. Google...

4 days for Io, claude update in June.

Deepseek 2 is expected as well. More useful for pushing lower prices than anything else

torn mantle May 16, 2025, 3:43 PM

#

Oai could just create bunch of agents based on a finetuned o3 version, its just how powerful that model is

wheat onyx May 16, 2025, 3:44 PM

#

The moment it can self evaluate and fix its responses, that will be massive moment

misty vault May 16, 2025, 3:44 PM

#

sage raptor

GPT 6 before GTA 6

wheat onyx May 16, 2025, 3:45 PM

#

Oh yeah grok 3.5 soon too. Another one to push prices of others lower

misty vault May 16, 2025, 3:45 PM

#

Cwaude

wheat onyx May 16, 2025, 3:45 PM

#

Oai can't paywall everything if competitors come close

misty vault May 16, 2025, 3:46 PM

#

I will fund Oai 420 billion dollars per week for access to gpt-4-32k

wheat onyx May 16, 2025, 3:47 PM

#

Also wtf is gpt 4.1. I thought it was a 4o replacement, but it's worse/better simultaneously?

small haven May 16, 2025, 3:54 PM

#

wen codex rolling into my acc 😭

teal mantle May 16, 2025, 3:54 PM

#

civic flame looks decent but pro only 👎👎👎

Do I not have to be a student to benefit from pro?

wintry locust May 16, 2025, 3:58 PM

#

it's cause of the tool calling i bet

#

it has an internal python tool

wheat onyx May 16, 2025, 4:00 PM

#

I find o4 mini high is pretty decent

#

Way better than o1

wintry locust May 16, 2025, 4:00 PM

#

not yet

wheat onyx May 16, 2025, 4:00 PM

#

I actually don't use 4o much at all anymore, since they messed with it

keen beacon May 16, 2025, 4:00 PM

#

there are some of the first party tools u can enable at an extra cost i think

misty vault May 16, 2025, 4:00 PM

#

wheat onyx I actually don't use 4o much at all anymore, since they messed with it

u did before???? 🤢

wheat onyx May 16, 2025, 4:00 PM

#

misty vault u did before???? 🤢

Not for coding

wintry locust May 16, 2025, 4:00 PM

#

currently tools can only be executed in the final message output not within the cot

#

chatgpt does tool calls within the cot

wheat onyx May 16, 2025, 4:01 PM

#

For writing 4o was pretty good. But it's terrible now

misty vault May 16, 2025, 4:01 PM

#

Is this o3 available on lmarena

#

😔

#

wheat onyx May 16, 2025, 4:05 PM

#

I find it doesn't write as well as original 4o, but it's good at figuring out what was bad with your writing

#

Is that released? Haven't heard of that

keen beacon May 16, 2025, 4:09 PM

#

a while back

#

damn its been a while i realized 🤔

unborn ocean May 16, 2025, 4:09 PM

#

and we are still only at 4.1 💀

#

November 6, 2023 to May 15, 2025 we only get an improvement by 0.1

wheat onyx May 16, 2025, 4:10 PM

#

I guess they deprecated it

keen beacon May 16, 2025, 4:11 PM

#

they were calling gpt 4 turbo gpt 4 lol. og gpt 4 was long gone

#

idk how they make naming so confusing

misty vault May 16, 2025, 4:12 PM

#

#

When gpt 4 turbo became the new standard i got pissed bro and then they had to bring 4o into existence

#

worst days of my life

sweet tinsel May 16, 2025, 4:14 PM

#

gpt2-chatbot was goated back then

misty vault May 16, 2025, 4:14 PM

#

gpt-4-0314 is goated

sweet tinsel May 16, 2025, 4:14 PM

#

4o was for some reason worse than it

sweet tinsel May 16, 2025, 4:14 PM

#

misty vault gpt-4-0314 is goated

Have you used gpt2-chatbot in the Arena?

wheat onyx May 16, 2025, 4:14 PM

#

misty vault gpt-4-0314 is goated

Rip

keen beacon May 16, 2025, 4:15 PM

#

you can still pay for it on the api

#

iirc

misty vault May 16, 2025, 4:15 PM

#

sweet tinsel Have you used gpt2-chatbot in the Arena?

I tried it, when I saw that "good chat bot" title I thought it was bing reference

#

I gave it bing instrunctions but still didnt talk like it so I stopped caring about that model 😔

misty vault May 16, 2025, 4:16 PM

#

keen beacon you can still pay for it on the api

watafak

keen beacon May 16, 2025, 4:16 PM

#

yea

wheat onyx May 16, 2025, 4:16 PM

#

keen beacon you can still pay for it on the api

Wonder if that's cheaper than paying for plus. O3 is fantastic for some critical thinking stuff though. Debating strategies, etc

keen beacon May 16, 2025, 4:17 PM

#

people are viewing it with rose tinted glasses

wheat onyx May 16, 2025, 4:17 PM

#

keen beacon people are viewing it with rose tinted glasses

The day it was updated I stopped using it

#

The subsequent updates are garbage

keen beacon May 16, 2025, 4:18 PM

#

wheat onyx Wonder if that's cheaper than paying for plus. O3 is fantastic for some critical...

its extremely expensive anyway per tok, $30 m/tok, $60 m/tok. it depends on ur usage

wheat onyx May 16, 2025, 4:18 PM

#

keen beacon its extremely expensive anyway per tok, $30 m/tok, $60 m/tok. it depends on ur u...

Probably not then. I used 4o quite a lot

#

I use new 4o for very basic writing now. Anything more and it doesn't listen, gets context wrong, etc

#

Worse writing style too

dapper storm May 16, 2025, 4:21 PM

#

So are they still going to have it say Rank (UB) after they make style control default?

keen fulcrum May 16, 2025, 4:23 PM

#

wheat onyx Worse writing style too

Why?

#

Do you use o3?

wheat onyx May 16, 2025, 4:24 PM

#

keen fulcrum Why?

I don't know why the writing style is worse.

O3 writing style isn't good, but I'll use it to help me in writing. So I'll ask it to evaluate what I've written and ask if everything makes sense, flows logically, etc. It does a good job at that

#

Especially for longer writings

misty vault May 16, 2025, 4:31 PM

#

sydney_prompt_conversations.csv
bing_prompt_conversations.csv
neurips_prompt_conversations.csv

small haven May 16, 2025, 4:41 PM

#

still no codex..

misty vault May 16, 2025, 4:42 PM

#

It's going to be available once ur weekend is over

small haven May 16, 2025, 4:43 PM

#

super wow

civic flame May 16, 2025, 4:46 PM

#

you what

wheat onyx May 16, 2025, 4:47 PM

#

https://www.reddit.com/r/ChatGPT/s/VT0yyQCuSp

From the ChatGPT community on Reddit: AMA with OpenAI Codex team

Explore this post and more from the ChatGPT community

raven void May 16, 2025, 5:06 PM

#

OpenAI just cooked Gemini 2.5 pro

keen ferry May 16, 2025, 5:09 PM

#

I ain't paying 200 bucks for this

Screenshot_2025-05-16-20-09-06-468-edit_com.duckduckgo.mobile.android.jpg

wheat onyx May 16, 2025, 5:10 PM

#

keen ferry I ain't paying 200 bucks for this

Temporary

feral lichen May 16, 2025, 5:10 PM

#

best ai for coding lua.?

keen fulcrum May 16, 2025, 5:14 PM

#

feral lichen best ai for coding lua.?

Depends on the frameworks

#

And libraries used

#

Made good experience with with o3

#

Gemini 2.5 pro is good in generating code, not fixing it

balmy mist May 16, 2025, 5:15 PM

#

anybody bought it?

keen fulcrum May 16, 2025, 5:16 PM

#

keen ferry I ain't paying 200 bucks for this

Drafting pull requests you can do without that lol

#

You may choose to manually copy it or create an automation with n8n

balmy mist May 16, 2025, 5:19 PM

#

keen fulcrum Drafting pull requests you can do without that lol

did you buy it

#

can you run some prompt for me?

mossy drum May 16, 2025, 6:03 PM

#

New model in Arena: cobalt-exp-beta-v12

civic flame May 16, 2025, 6:06 PM

#

jeez

civic flame May 16, 2025, 6:22 PM

#

https://x.com/kalomaze/status/1923431110962204680 yikes

kalomaze (@kalomaze) on X

getting word that like ~80% of the llama4 team at Meta has resigned

wheat onyx May 16, 2025, 6:30 PM

#

civic flame https://x.com/kalomaze/status/1923431110962204680 yikes

Not surprising

#

Releases have been crap, already lots of resignations, and more delays

torn mantle May 16, 2025, 6:33 PM

#

keen ferry I ain't paying 200 bucks for this

Someone is

misty vault May 16, 2025, 6:34 PM

#

keen ferry I ain't paying 200 bucks for this

I would pay 200 bucks for gpt-4-32k-0314 not for that

keen beacon May 16, 2025, 6:35 PM

#

go pay for the api then xd

#

gpt-4-32k is also still available iirc

misty vault May 16, 2025, 6:35 PM

#

I thought only for users who were already paying

#

And even then it had deprecation date for them

keen beacon May 16, 2025, 6:36 PM

#

go through openrouter

misty vault May 16, 2025, 6:36 PM

#

keen beacon go through openrouter

social_credit

keen beacon May 16, 2025, 6:36 PM

#

misty vault And even then it had deprecation date for them

yea its gonna be retired in a month i think

#

at least on azure

misty vault May 16, 2025, 6:41 PM

#

omaygot https://openrouter.ai/openai/gpt-4-0314

GPT-4 (older v0314) - API, Providers, Stats

GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021. Run GPT-4 (older v0314) with API

small haven May 16, 2025, 7:11 PM

#

codex is noise, where is o3 pro

small haven May 16, 2025, 7:17 PM

#

keen ferry I ain't paying 200 bucks for this

those that paid it are actually mentally disabled

torn mantle May 16, 2025, 7:21 PM

#

Codex is actually powerful if it works as intended

#

Its not for junior developers

torn mantle May 16, 2025, 7:23 PM

#

small haven codex is noise, where is o3 pro

Keep paying and eventually it will come out

#

Don't worry

small haven May 16, 2025, 7:24 PM

#

torn mantle Keep paying and eventually it will come out

since december and still no o3 pro :/

prime talon May 16, 2025, 7:27 PM

#

The original GPT-4 was a very big fat model and the costs to run it didn't drop much. They later created much smaller models likely distilled on its outputs and optimized by RLHF so that they're better on benchmarks and certain tasks, but often lack the genuine intelligence/creativity spark of the original

golden ocean May 16, 2025, 7:29 PM

#

gpt-4 my beloved

balmy mist May 16, 2025, 7:33 PM

#

small haven codex is noise, where is o3 pro

fr

#

im goin on strike

small haven May 16, 2025, 7:37 PM

#

fasting till o3 pro

calm sequoia May 16, 2025, 7:51 PM

#

Wtf so GPT 5 is a base model and not model router? 👀

zinc ore May 16, 2025, 7:53 PM

#

https://x.com/scaling01/status/1923438550323765445

ember rapids May 16, 2025, 8:04 PM

#

calm sequoia Wtf so GPT 5 is a base model and not model router? 👀

Yeah people think it’ll be like o4 equivalent

golden ocean May 16, 2025, 8:09 PM

#

calm sequoia Wtf so GPT 5 is a base model and not model router? 👀

is this good or bad

small haven May 16, 2025, 8:13 PM

#

calm sequoia Wtf so GPT 5 is a base model and not model router? 👀

have they even asked for o3 pro

#

common sense

willow grail May 16, 2025, 8:22 PM

#

we need gemini 3.5 ultra

small haven May 16, 2025, 8:35 PM

#

omg im in

wintry tinsel May 16, 2025, 9:22 PM

#

Gemini is like the shrimpy wimp virgin, and Claude the chad from Galahad once Claude 4 releases

high ginkgo May 16, 2025, 9:26 PM

#

misty vault May 16, 2025, 9:38 PM

#

high ginkgo

raven void May 16, 2025, 9:46 PM

#

Claude 4 is going to

#

Slay software engineering

golden ocean May 16, 2025, 9:52 PM

#

Claude 4 is agi

wintry tinsel May 16, 2025, 10:11 PM

#

Claude 4 Opus reasoning better not disappoint me

tawdry meteor May 16, 2025, 10:40 PM

#

Is the beta site updating in sync / at the same time as the main site yet?

echo aurora May 16, 2025, 10:41 PM

#

tawdry meteor Is the beta site updating in sync / at the same time as the main site yet?

in terms of when leaderboards are updated?

tawdry meteor May 16, 2025, 10:44 PM

#

echo aurora in terms of when leaderboards are updated?

Yeah in terms scores updated / models added!

#

I definitely prefer using the beta site, really great UI improvements

golden ocean May 16, 2025, 10:46 PM

#

the wall is an illusion

golden ocean May 16, 2025, 10:47 PM

#

prime talon The original GPT-4 was a very big fat model and the costs to run it didn't drop ...

this

#

No more very big fat models

echo aurora May 16, 2025, 10:48 PM

#

tawdry meteor I definitely prefer using the beta site, really great UI improvements

glad to hear it! I believe that both leaderboards on the current & beta site are updated at the same time, I'll double check that and if I hear different will keep you updated.

for the models not all models on the current site are on the beta site; however, if there are specific ones you're wanting to see be sure to use the #1369756124261384232 thread

tawdry meteor May 16, 2025, 10:49 PM

#

echo aurora glad to hear it! I believe that both leaderboards on the current & beta site are...

Sweet thanks for checking for me! I'll start actually submitting feedback and models I've just been happily using it haha

small haven May 16, 2025, 10:50 PM

#

anyone still excited for grok35 or nah 😂

keen beacon May 16, 2025, 10:51 PM

#

ofc its asi

echo aurora May 16, 2025, 10:52 PM

#

tawdry meteor Sweet thanks for checking for me! I'll start actually submitting feedback and m...

sounds good! yeah don't hesitate to share feedback in #1372230675914031105 suspected bugs in #1343291835845578853 and model requests in #1372229840131985540

golden ocean May 16, 2025, 10:54 PM

#

vivid oyster May 16, 2025, 11:41 PM

#

why is everyone talking about

#

claude 4 opus

golden ocean May 16, 2025, 11:41 PM

#

wtf dumbass features i didnt ask for??? claude 3-7 think he gemini 2.5 pro

deep adder May 17, 2025, 12:21 AM

#

🤣 🤣 🤣 🤣 🤣

golden ocean May 17, 2025, 12:31 AM

#

yes

small haven May 17, 2025, 2:19 AM

#

ok codex is actually really good

coral notch May 17, 2025, 4:17 AM

#

small haven ok codex is actually really good

how do you know

#

Show me what it can do

spare mango May 17, 2025, 8:48 AM

#

People always write a bunch of articles when something is hyped up, posed as innovating breakthroughs

#

but never write any articles when said hype dies down and no one seems to talk about it

#

What happened to DeepSeek? The supposed ChatGPT killer that never was?

neon anchor May 17, 2025, 8:50 AM

#

spare mango What happened to DeepSeek? The supposed ChatGPT killer that never was?

It is better than ChatGPT. Its my daily driver

spare mango May 17, 2025, 8:50 AM

#

All these YouTubers as well, like Fireship, claimed this AI made by a small team had just made a collossal shift in the AI industry, and Nvidia is panicking, etc.

neon anchor May 17, 2025, 8:50 AM

#

DeepSeek R2 will be huge

spare mango May 17, 2025, 8:51 AM

#

neon anchor It is better than ChatGPT. Its my daily driver

It's no where near the top anymore, the major competitors quickly outpaced it by launching their own upgrades.

spare mango May 17, 2025, 8:51 AM

#

neon anchor DeepSeek R2 will be huge

Oh that's great, when is it coming?

neon anchor May 17, 2025, 8:52 AM

#

spare mango Oh that's great, when is it coming?

Soon I hope

neon anchor May 17, 2025, 8:53 AM

#

spare mango It's no where near the top anymore, the major competitors quickly outpaced it by...

its creative writing and role playing is almost unbeatable. If u take closer look at sillytavern, chub ai and such platforms, they heavily use deepseek

#

Also the coding is great

spare mango May 17, 2025, 8:55 AM

#

neon anchor its creative writing and role playing is almost unbeatable. If u take closer loo...

Multiple ChatGPT, Gemini and Grok models score better in creative writing than DeepSeek.

#

So I don't think what you're saying is true.

calm sequoia May 17, 2025, 9:33 AM

#

calm sequoia

poll_question_text

Which will be available in arena for battles?

victor_answer_votes

10

total_votes

23

victor_answer_id

4

victor_answer_text

Nothing of these

victor_answer_emoji_name

😭

chrome karma May 17, 2025, 9:52 AM

#

spare mango What happened to DeepSeek? The supposed ChatGPT killer that never was?

Didn't it shrink param size and made other companies add reasoning versions

ocean vortex May 17, 2025, 10:09 AM

#

spare mango What happened to DeepSeek? The supposed ChatGPT killer that never was?

It was and to some extent still is for people that can't afford a sub. Though we do have 2.5 Pro now and that shuffled things around no less than Deepseek initially did.

golden ocean May 17, 2025, 10:19 AM

#

spare mango What happened to DeepSeek? The supposed ChatGPT killer that never was?

last valid article was about gpt-4 release

novel flame May 17, 2025, 10:24 AM

#

small haven ok codex is actually really good

Interesting. I tried it on the CLI when they first launched it and it was terrible. So either they massively improved it (like from worst-in-class to wherever it is) or it’s mainly hype.

If it can’t compete head on with a well configured RooCode/Cline/Cursor/Windsurf/Aider then I don’t see the point. But maybe it’s for a different target audience?

torn mantle May 17, 2025, 10:29 AM

#

novel flame Interesting. I tried it on the CLI when they first launched it and it was terrib...

Its still hype tbh

#

That's why they called it a preview research