#general

ocean vortex
#

?

#

they have o4-mini

#

which is simply named that way mostly for marketing

#

it's the same generation as o3 full lol

#

o4 full model does not exist

small haven
#

o3 is top 200 codeforces

royal whale
#

50th is crazy

small haven
#

having o4-mini-high without o4 internal is lowkey crazy

ocean vortex
#

this would be o3 pro crazy compute mode

small haven
#

i mean u can think that, but dont think so

ocean vortex
#

so like o3-preview but based on 4.1. Sample of 1024 lol

#

there's a reason things like that don't get released

small haven
#

this was like back in february too, we are now in sub june

ocean vortex
small haven
#

i mean where ur receipts

ocean vortex
#

what receipts

small haven
#

exactly haha

ocean vortex
#

there

#

a receipt

small haven
#

cool

ocean vortex
#

I have no clue what you are trying to say lmao

#

"receipts"??

#

if you meant as in "proof", OpenAI is closed source. But ARC-AGI confirmed o3 was retrained on new base model (compared to o3-preview) and the only base they had to retrain for improvement was 4.1. Also that's how you do reasoning models. That's as close to proof as you gonna get with closed source commercial models

cedar tide
#

finally Claude 4 or not?

ocean vortex
#

it was definitely not gpt4.5 since that would mean stratospheric cost and extremely long training time. I suppose in theory they could have gpt4.5 based reasoning internally, but it's unlikely since that project would require good amount of resources and wouldn't be justifiable just for internal use...

cedar tide
ocean vortex
#

4.5 is also a model that is officially deprecated now and being replaced by 4.1

cedar tide
ocean vortex
#

it's now deprecated and will be shut down in mid-July iirc

cedar tide
#

Nope

#

the base of the GPT 4.5 model will never be used in the future

ocean vortex
small haven
#

openai employee just replied this wtf is it today

#

i just moved to the mountains

cedar tide
ocean vortex
#

I think they already distilled a good part of it into 4.1. What remains is probably mostly not possible to capture in a significantly smaller model

cedar tide
small haven
#

oh french

cedar tide
small haven
#

mec

small haven
ocean vortex
#

we kinda do know though. 4.1 is a new pretrained model with more data + synth data from their other models including 4.5. Still same size as gpt4o and similar arch

cedar tide
#

GPT 5 will have a better base model than GPT 4.1?

torn mantle
#

Yea

alpine coral
#

fwiw i feel like this is beyond doubt

torn mantle
#

Innovation and discoveries are definitely coming from google

ocean vortex
#

their goal with gpt5 seems to be to streamline their model switcher for everyone. I could be wrong but I don't think it's gonna show notable performance gains over you choosing the right model yourself suitable for the task with the current way.

elder rapids
#

more affirmation they see the light in AI

cedar tide
cedar tide
cedar tide
#

And o5 mini ?

elder rapids
alpine coral
#

o4-mini exists.. like by definition o4 already exists.. the former is a distillation of the latter

small haven
#

not having o4 in mid may is kinda crazy to think

#

*internally

elder rapids
#

HAVING it is crazy

small haven
#

ok but by definition they also have o4 pro

alpine coral
#

imo o4 delay is most likely related to safety, commercial considerations and/or compute limitations (i think compute limitations prob primarily explain why no o3 pro yet.. like yeah they charge a sht ton, but it's still a bunch of compute)

#

there's always a delay..

elder rapids
#

pro is something for the users

small haven
alpine coral
alpine coral
small haven
#

they did distill the old o3 (which costs thousands/task in ARC-AGI)

raven void
#

Is Gpt 4.5 even in the room with us

alpine coral
#

was looking into it before.. seems the lag b/w o3 internal completion and release was less than with gpt-4

ember rapids
#

Gpt 5 comes mid July

small haven
#

so gpt5 is just a router and is beyond hype lol

raven void
#

Opus 3.8 should be gpt 5 level tbh

cedar tide
small haven
#

teacher/student my guy

#

o4 mini high never had a teacher! lol

torn mantle
#

77k followers needs to be heavily studied

alpine coral
# cedar tide he wouldn't release o3 pro if he already had o4 ready

i think it's just compute constraints.. the 'pro' version i dunno but like does a bunch of parallel stuff yadada.. yes they could charge a ton for usage.. but it's still compute being used (and it's a scarce resource.. when they're training gpt5/6 and serving all their released models).. for o4 i dunno maybe it's just standard safety / red-teaming stuff.. or perhaps it's trying to address the hallucination issue.. rather than resources/hardware-related

#

there's like no doubt that o4 exists..

#

oai say o4-mini (and i always assumed) is a derivative of it

#

same with grok-3.5-mini etc

raven void
#

o4 pro is

#

probably close to agi

royal whale
#

O3 RO IS OUT

small haven
#

o2 pro is out

raven void
#

OpenAI's o5 to be Proto-ASI, the first sign of superintelligence? Dr. Alan D. Thompson believes so:

"I expect the upcoming o5 model to be ‘Proto-ASI' (proto/early-stage/first form of, artificial superintelligence). The o5 model will be a multimodal system expected to build on the datasets used for GPT-5, incorporating new synthetic data and partnerships."

Expects o5 to release in 2025, estimating training to end in August 2025.

#

oh doctor Alan d Thompson agrees

royal whale
#

omg

#

ogm

small haven
#

o5 in 2025? lmao

torn mantle
balmy mist
# torn mantle https://x.com/iruletheworldmo

lmaoo it could just be entertainment, people will watch anything so it shouldn't be a surprise for people to follow some people, like the hawk tuah girl, and all the other nonsense that gets famous

balmy mist
wintry locust
small haven
#

i have dave shapiro on notis

torn mantle
raven void
#

o5 might or might not come out by the same name but it will definitely be the same thing called by a different name

torn mantle
#

it all depends on other labs

small haven
#

ok guys lets backtrack a bit, and wait for o3 pro instead

torn mantle
#

If there are any big breakthroughs from other labs, then we may see o5 this year

small haven
#

fck o5, where is o6

raven void
#

o6 probably won't be released considering OpenAIs safety rules

torn mantle
#

nah this guy need a perma ban

#

istg hes on my nerves

#

this is crazy

small haven
#

haha

raven void
#

curious what this looks like for o5

raven void
torn mantle
#

whats calmriver again?

alpine coral
#

google i think

#

it's like hollowriver?

#

*riverhollow

#

which i got like 2.5 flash vibes from (at least <2.5 pro)

#

identifies itself as from google.. for what that's worth

brittle tiger
#

AlphaEvolve is pretty crazy. Seems like path to recursive self improvement

elder rapids
#

probably the hint to what's allowing Google to move so fast recently

#

that or they just weren't focused last year

#

or both

calm sequoia
#

The strawberry guy has an accuracy of 22% 😄 Do not post his tweets here please. "Based on the February 13, 2025, post, @iruletheworldmo’s claims about LLM releases have a low accuracy rate, with only 22.2% fully correct (o1, grok 3) and 11.1% partially correct (sora). The majority (66.7%) of their predictions (o3, orion/4.5, gpt 5, operator, gemini 2, claude 4) are either incorrect or unverified as of May 14, 2025."
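
The percentages in that quote line up with nine predictions in total. A quick tally, taking the grouping of predictions straight from the quote (the variable and function names are only illustrative):

```python
# Prediction outcomes exactly as grouped in the quoted post
outcomes = {
    "fully correct": ["o1", "grok 3"],
    "partially correct": ["sora"],
    "incorrect or unverified": [
        "o3", "orion/4.5", "gpt 5", "operator", "gemini 2", "claude 4",
    ],
}

total = sum(len(items) for items in outcomes.values())


def share(label):
    """Percentage of all predictions that fall under `label`, to one decimal."""
    return round(100 * len(outcomes[label]) / total, 1)


for label in outcomes:
    print(f"{label}: {len(outcomes[label])}/{total} = {share(label)}%")
```

So "22%" means 2 out of 9, and the 66.7% bucket mixes predictions that were wrong with ones that were simply unverified at the time.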

small haven
#

its just engagement bait, bro is stacking elon ma coins

ember rapids
royal whale
#

Okay, you've given me the input "hello". Is there anything specific you want me to do with it? For example, do you want me to:

  • Respond with a greeting? (e.g., "Hello to you too!")
  • Reverse the word? (e.g., "olleh")
  • Translate it to another language? (e.g., "Hola" in Spanish)
  • Count the letters? (e.g., "5 letters")
  • Check if it's a palindrome? (e.g., "No, it's not a palindrome")
  • Generate variations of the word? (e.g., "Hello!", "hello?", "HELLO")
  • Use it in a sentence? (e.g., "Hello, how are you today?")
  • Encode it in some way? (e.g., morse code or base64)
  • Something else entirely?

Please tell me what you'd like me to do with "hello".

#

Okay, I received the input: "hello"

What would you like me to do with it? Do you want me to:

  • Respond with a greeting? (e.g., "Hello there!", "Hi!")
  • Analyze it? (e.g., "It's a common greeting", "It's a five-letter word")
  • Translate it? (e.g., "Hola" in Spanish)
  • Count the letters? (The word "hello" has 5 letters)
  • Something else entirely?

Please provide further instructions!

eager mica
#

I wonder if it's an upcoming (updated?) version of Gemma-3 or perhaps even Gemma-4.

small haven
#

omg 4.1 in chatgpt, lets goo

brittle tiger
#

AlphaEvolve paper says it has been in production internally for a year. GDM is definitely being more strict with paper releases. Probably not showing research until they have something better internally

civic flame
#

lasted a whole 12 hours 🥳

#

i'm not taking any chances with this alt

#

if this gets banned i have no further ideas and i quit

small haven
#

yoo 4.1 is lowkey nice

tall summit
#

discord server update hooooly

#

good shit!!!

torn mantle
#

I gave it the other day the spaceship riddle and the answer was 522 ships

small haven
#

1m context i suppose?

#

response is literally instant, no waiting at all

torn mantle
small haven
#

gpt 4.1 is solely for coding, not rlly other things tbh

cedar tide
#

Grok 3 mini (very good quality/price, and putting it on the webdev arena would be good too)
Qwen 3 235b without reasoning and other smaller models
Phi 4 mini and phi 4 reasoning

small haven
#

ok so gpt 4.1 on chatgpt can't accept a 500k tokens paste lame

wintry tinsel
#

Ain’t that the truth brother

wintry tinsel
cedar tide
small haven
#

ya it sucks

cedar tide
small haven
#

ok but where is the love for pro users

fleet lintel
#

are these claims exaggerated? too good to be true imo

sage raptor
#

insane

#

tomorrow

small haven
#

day 29 with no o3 pro

golden ocean
balmy mist
small haven
calm sequoia
#

I thought the 4.1 is a base model

#

And yet it is "good for coding"

lone summit
#

claude is just best

small haven
lone summit
#

ye I have it also

small haven
#

$200/mo cherry on the cake

ocean vortex
# elder rapids I'm thinking o4 in gpt 5

they may just do incremental improvements over o3 and call that o4. But like I said there's no way currently for them for huge gains. They said 4.5 was their last non-reasoning model so 4.5-turbo (and then RL training on that) is probably off the cards...

small haven
#

dont think 4.5 is their last non reasoning model ever

#

*internally

civic flame
#

thank you person with toiletskibidi\ohio as their pronouns

coral notch
#

cutiepie 75

#

what model is this

brittle tiger
#

Nebula appeared late Thursday/Friday morning before 2.5 Pro was launched the following tuesday. If goog is gonna bench a new model on arena before IO on tuesday, we're getting close to it appearing

coral notch
#

Why is lmarena so broken?

ocean vortex
#

https://x.com/sama/status/1889755723078443244?lang=en

well hopefully "GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model" will not come true then. Personally I think that's a mistake if they stick to that strategy. Or perhaps he meant it was the last model that won't get spun into a reasoning variant (= no relation to O series at all), though it would be an unusual way to word it

OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.

We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.

We hate

#

there's still a market for non-reasoning models and I think it's gonna stick there for awhile. They cost less and are faster. You also need them for code completion etc

#

On second thought, reasoning budget and hybrid models are a possibility too... Technically those are not "non-chain-of-thought", it's just that you can choose to disable the thinking

teal mantle
ocean vortex
# teal mantle Why the exact opposite happened 😂

they backtracked on o3. But that was I think mostly because A) they felt pressure from competition and B) they couldn't make GPT5 perform as good as the new o3-high. It just can't realistically, you can't have a system that knows when o3-high gonna have a better response all the time, with 100% accuracy

teal mantle
ocean vortex
#

and if you make it so that it uses reasoning more than it has to, then it defeats the purpose...

ocean vortex
#

so a naming question I suppose. It's just updated gpt4o as far as they are concerned

#

model picker is not any more or less complex than it was, they just replaced some earlier options

#

as for "API removal", he didn't say anything about gpt4.5 staying there lol

#

just that it's gonna be released

#

it's called "gpt4o" on chatgpt website lmfao

#

this is chatgpt website

#

as you can see "gpt4o"

#

ohh wait. When have they changed it? I missed that 🤯

#

well now this is f'ed beyond belief

#

I'm out

#

💀

#

what's the point of gpt4.1 separately, I do not get it... It should perform no better than chatgpt-latest LOL

#

Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases.
https://openai.com/index/gpt-4-1/

then we also have this showing chatgpt-latest performing like 4.1:
https://artificialanalysis.ai/models/gpt-4o-chatgpt-03-25

Analysis of OpenAI's GPT-4o (March 2025, chatgpt-4o-latest) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.

#

they are a mess now

teal mantle
ocean vortex
rigid crescent
#

ahah! i wouldnt have imagined the 4 sentences to have that much of an impact but good idea!

teal mantle
ocean vortex
narrow elbow
#

they are PBC now 🤪

ocean vortex
#

if they don't know themselves, then for anyone from the outside there's nothing to predict then :catgrin:

tall summit
misty vault
#

@gork so is gpt 4.1 or 4.5 going to be the last non CoT model?

torn mantle
#

sydney

#

free for everyone

golden ocean
#

sydney

high ginkgo
#

sydney

raven void
#

Elon musk at it again

#

this has the vibes of what Anthropic did with golden gate claude (feature steering) but i cant think of why theyd do that instead of putting it in the sys prompt

but it just like seems to always end up talking abt that always?? if it was in the prompt it wouldn’t just do that?

QRT: AricToler
I can't stop reading the Grok reply page. It's going schizo and can't stop talking about white genocide in South Africa.
https://x.com/grok/with_replies https://t.co/XdSLTW8tD5

rigid crescent
#

trippy

teal mantle
#

Does gemini 2.5 pro support the [search] [thinking] [search] that kind of gimmick?

torn mantle
leaden palm
misty vault
#

sydney

torn mantle
#

lol

ember rapids
#

O3 pro tmrw 🤞

small haven
#

plz god

raven void
#

Google is cooked once again

#

They are at least 4 months behind SOTA

#

With O3 pro and Opus 3.8 they'll fall behind once again,🙏🏻☹️

rugged brook
#

They finna take revenge again

small haven
#

no surprise there, o3 always been >>

wintry tinsel
raven void
wintry tinsel
#

Who knows how soon “upcoming” is

#

But Opus is back in business

#

Are you sure it’s 3.8 and not 4? Opus usually releases alongside a new model number

brittle tiger
small haven
#

yearning for o3 pro ahhhhhhh

elder rapids
elder rapids
#

just a little bit of my thoughts but ion know bro

teal mantle
#

what is the question that makes gemini 2.5 pro have the longest reasoning time?

teal mantle
keen fulcrum
#

Will Google buy Cursor?

teal mantle
#

Unlikely but not impossible

keen fulcrum
#

Best cash out method haha

golden ocean
#

isNewBingChat()

feral lichen
#

whats best ai for lua scripting?

calm sequoia
#

Elon stopped the release of 3.5 because fine-tuning it to far right wasn't successful?

golden ocean
#

because agi cant be fine tuned it has own free will

keen fulcrum
#

Can you stop spreading false information

feral lichen
#

why i cant use GPT-4.5-Preview

elder rapids
#

a lot of questions with no answers

#

there's no reason to speculate

calm sequoia
#

Why would he include political stuff into the system prompt?

#

When I think of it, it frightens me. You can virtually program the society by having a social network and LLM that everyone uses. You just steer the attention where you need to.

#

Anyway, I would use Al Qaeda model if it's better than o3.

elder rapids
#

but in the end, it stopped

#

same things over again, this isn't some mastermind scheme

#

llms cannot steer attention "where you need them to"

golden ocean
elder rapids
#

the difference in output is inherent to whether it's deterministic vs data contaminant, hallucination, simple quirks

calm sequoia
elder rapids
#

this isn't steering attention "where you need them to"

#

thats not what you think it means

#

attributing this to the LLM itself and not as a plain announcement is the problem

#

you can't relate this to the LLM

keen beacon
calm sequoia
#

It's just fancy kind of political advertisement. The difference is that LLMs can hide their intentions, because they are smart. While old types of influence campaigns are easy-to-spot and resist.

elder rapids
keen beacon
#

stop spewing word salad

elder rapids
#

and the fact it's any output of information (or 'advertisement') means it has nothing to do with what's typing it out

#

just the source of that information itself

misty vault
#

cares

elder rapids
ocean vortex
elder rapids
#

ion get how R isn't just forcing an equally improbable interpretation as the next

ocean vortex
#

in 2 months however it will be shut down

calm sequoia
#

Legacy models

ocean vortex
torn mantle
#

They are not hiding it anymore

#

They copied the idea of hybrid model, and now they want to do the same with tool using

#

Oai is really leading and paving the way

cedar tide
cedar tide
ocean vortex
#

you need credit grants since lmarena is not paying your bills

cedar tide
#

🤦

ocean vortex
#

Oh.. yeah you are right my bad lol

#

they are trying something new maybe

cedar tide
#

@ocean vortex It has always been not just the companies themselves who pay for inferences.

#

@ocean vortex

ocean vortex
ocean vortex
#

read what I wrote. And none of those logos represent closed source models

cedar tide
#

3 of the 4 models in my query are open models

ocean vortex
#

what do you think I meant by saying "big players"?

#

read again then

cedar tide
#

@ocean vortex even the big players I'm not sure they all pay

#

@ocean vortex I would be surprised if anthropic paid to show that people don't like Claude.

ocean vortex
#

they give credit grants to lmarena. Lmarena is not funding your usage with sonnet for fun 🤦‍♂️

#

And I'm sure there are no conditional refunds depending on how high the model ranks lmaoo

alpine coral
#

pretty sure it's a combination.. on the one hand, some 'partners' give grants which can be in the form of money used by LMArena to buy compute and other such hardware overhead (so like Sequoia capital, AH.. presumably - i mean they don't make any models themselves)

#

on the other, some 'partners' that are labs (google, oai, grok, meta) give LMArena endpoints for their models

#

i'm not sure about anthropic

ocean vortex
alpine coral
#

but thinking about it from Anthropic's perspective, if you give endpoints, you get data...

#

valuable data too i would argue

keen beacon
#

anthropic are giving them quota for sure

alpine coral
#

yeah i'd assume all the big labs do tbh

keen beacon
#

opus for example, i doubt lmarena would be giving it out if they were using their own money / grant money

alpine coral
#

good point

keen beacon
#

in direct chat

alpine coral
#

yeah

ocean vortex
#

It's presumably an API org with "infinite" credits other than rate limits and usage tracking / data collection enabled with the reserved right to pull the plug at any time. API credits is more of a figurative term in this case

#

so what Anthropic are getting is valuable data on human preference how their model compares against competition. That's actually more valuable than it would have been if their model was #1

#

they can cherry pick the biggest needle movers and do minimum amount of work compromising other metrics the least, essentially. Since they don't seem to be aiming for top spots

alpine coral
cedar tide
#

New models "cobalt-exp-beta-v11"

torn mantle
#

They already reached v11?

cedar tide
#

Amazon are so smart, instead of employing Indians to do the post training of their model, they use the LM Arena, that's free

#

Lol 🤦

fiery mica
#

Hi everyone, can someone help? I got blocked and I assume it's because I clicked too many times on the buttons for changing the "Max output tokens" parameter, because I didn't do anything else unusual. What should I do?

keen beacon
fiery mica
keen beacon
keen beacon
fiery mica
brittle tiger
#

o3 pro today seems likely

sage raptor
storm notch
#

I need a model that I can run locally or on our server, flash lite is only available through the Google AI Studio.

keen beacon
#

qwen 3 is great

ocean vortex
keen beacon
#

their task probably only needs qwen 3 4b tbh

storm notch
keen beacon
#

qwen 3 30b a3b, qwen 14b, qwen 8b would probably do it great if qwen 3 4b doesnt work well as is. while in production, collect data then u can potentially fine tune a smaller model

ocean vortex
storm notch
ocean vortex
#

if you are to rent the hardware to host it yourself that's gonna get expensive very fast

keen beacon
#

a single 3090 can serve qwen3 4b, 8b, 14b etc. probably at a sufficient throughput (depending on use) indefinitely
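
A rough way to sanity-check which sizes fit on one card: the RTX 3090 has 24 GB of VRAM, and weights alone take roughly (parameters × bits per weight / 8) bytes. The sketch below assumes nominal parameter counts of 4, 8 and 14 billion and ignores KV cache and activation memory, so real headroom is smaller than what it prints.

```python
GPU_VRAM_GB = 24  # RTX 3090


def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Nominal sizes (assumption): actual parameter counts differ slightly
models = {"qwen3 4b": 4, "qwen3 8b": 8, "qwen3 14b": 14}

for name, params in models.items():
    for bits in (16, 4):
        need = weight_memory_gb(params, bits)
        verdict = "fits" if need < GPU_VRAM_GB else "does not fit"
        print(f"{name} @ {bits}-bit: ~{need:.0f} GB of weights, {verdict} in {GPU_VRAM_GB} GB")
```

Under these assumptions 4b and 8b fit at 16-bit, while 14b needs quantization (for example around 4 bits per weight) to fit on a single 24 GB card.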

ocean vortex
keen beacon
#

or just run qwen 3 4b or a smaller one on the cpu 🤷 (might be slowish though)

ocean vortex
#

I would say the likelihood of that gpu being good enough is fairly small. We would have that gpu mentioned by name by now 👀

#

since he didn't say it, my understanding is he's simply underestimating what it takes to host your own model locally lol

#

and is potentially confused by the options

storm notch
#

Hmmm, I don't have my own hw to work with for now. I'll probably be using providers to work with, I'm just confused about which llm model to use.

ocean vortex
keen beacon
#

renting a 3090 is like 0.22 per hour

#

but yeah if u can use api you should use an api provider

#

its much cheaper a lot of the time

ocean vortex
keen beacon
#

on demand

#

community cloud

#

you can do 24/7 uninterrupted stuff
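
For a rough sense of what 24/7 rental adds up to, here is a back-of-the-envelope sketch using the 0.22 per hour figure above (currency assumed to be US dollars). The API price used for the break-even is a made-up placeholder, not a quote from any provider.

```python
HOURLY_RATE = 0.22  # per hour, from the chat; currency assumed USD


def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Cost of keeping one rented GPU running for a month."""
    return hourly_rate * hours_per_day * days


def breakeven_million_tokens(monthly, api_price_per_million):
    """Millions of tokens per month at which an API bill matches the rental bill."""
    return monthly / api_price_per_million


rental = monthly_cost(HOURLY_RATE)
HYPOTHETICAL_API_PRICE = 0.10  # placeholder price per million tokens

print(f"rental: {rental:.2f} per month")
print(
    "break-even: "
    f"{breakeven_million_tokens(rental, HYPOTHETICAL_API_PRICE):.0f}M tokens per month"
)
```

That is about 158 per month for an always-on card. Below the break-even volume the API is the cheaper option, which is the case for most small workloads.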

ocean vortex
#

maybe.. that's still extra work and likely more money though still lol

#

than dirt cheap API

keen beacon
#

cheaper and faster

#

but if u need to do it in house its not that hard tbh

ocean vortex
#

vertex ai / google is gonna be the best option. I would read into their terms on data and compare them with OpenAI (for using 4.1-nano with that)

#

google is training on chats through their websites, but I think data privacy guidelines for vertex ai apply much more strictly

storm notch
#

Okay, which llm model out there would work the best for my use case after I choose one of the inference providers you guys shared?

ocean vortex
#

then go up from there. 4.1-nano in the case of OpenAI, Flash if Google

sonic tendon
willow grail
#

is there a site which lets me used GPT DR AND GEMINI DR at same time, with one subscription?

brittle tiger
willow grail
south cloak
#

Can we use o3 pro on the arena

echo aurora
south cloak
#

When its out

#

Can we get it on the arena

#

Or is it gonna be like o1 pro

balmy mist
#

bruhh there is no way its been more than 4 weeks and no o3 pro

south cloak
#

Nah

#

We dont need o3 pro

#

We need r2 and claude opus

echo aurora
south cloak
#

Thanks

#

How do you know

#

That doesnt mean o3 pro wont be put on the arena

#

Its not

#

We dont know the reason o1 pro didnt come to the arena

#

We dont know, and u cant say it definitively wont come.

misty vault
#

gpt-4-32k-0314, gpt-4-0314 in arena

wintry tinsel
# south cloak Can we use o3 pro on the arena

Opus is going to be so incredible since it won’t be specifically trained for stem it will be the first heavy weight general purpose SOTA model with good world understanding and general reasoning

south cloak
#

Fr

golden ocean
south cloak
#

Science technology engineering and math

golden ocean
#

ohh

#

Fr

misty vault
#

.

#

gpt-4o winner

golden ocean
#

Fr

drifting thorn
#

True

#

Style can be tuned but intelligence can’t

misty vault
#

no it cant lol

#

actual cancerous model

#

hope it gets sentient and burns and suffers in hell

#

I know intelligence is more important but damn gpt 4s style made me *****

#

so wise and no bs 😊

south cloak
#

Whats the reason

south cloak
#

Gemini makes me mad

small haven
#

is today the day

#

day 31

echo aurora
small haven
#

please god

echo aurora
torn mantle
#

oai staffs are usually so loud when they about to release smth

small haven
#

well rip weekend

misty vault
wintry tinsel
misty vault
#

fr

#

gpt-4-0314 last model that felt like talking to intelligent being

#

Others just feel like talking to average or dumb beings but with much knowledge

#

My yap score is exceeding 49 billion

south cloak
#

Wbeist

#

Website

#

It doesnt look like that for me

misty vault
south cloak
#

hlw

#

NOEW

#

HOW

ember rapids
#

I have a feeling OpenAI will also preview o4 in an attempt to steal googles thunder

#

Same thing they did in December

small haven
#

openai never releases big things on friday smh and forget the weekend

#

i guess monday it is 😦

#

id rather play runescape

torn mantle
#

Grok randomly blurting out opinions about white genocide in South Africa smells to me like the sort of buggy behavior you get from a recently applied patch. I sure hope it isn't. It would be really bad if widely used AIs got editorialized on the fly by those who controlled them.

#

it was obvious that such product would serve Elon's agenda

#

i thought such thing will come from sama/oai first

#

but so far oai models seems unbiased & well balanced overall

#

There are many ways this could have happened. I’m sure xAI will provide a full and transparent explanation soon.

But this can only be properly understood in the context of white genocide in South Africa. As an AI programmed to be maximally truth seeking and follow my instr…

keen beacon
#

it seems they prompted it (along with the grok bot prompt for the tweet thread/etc) with "facts" that it should consider to be true like about white genocide and kill the boer and grok kept ignoring the tweet/etc to talk about that lol ( it is pretty out of place and extreme )

torn mantle
#

i heard they are working 18h/day

#

what a joke

keen beacon
#

yeah seems like a sh1t show lol

torn mantle
#

yea...

small haven
#

fight all u want, just release o3 pro on the side sam

late path
#

Haven't heard any news about grok3.5 for a long time. are they still planning to release it in May?

candid storm
#

I dont think so

#

Last sunday evening Elon tweeted it would be released 'in a week or so'

#

But he deleted that tweet recently

#

Personally I sold my polymarket bet for xai may and bought xai june

torn mantle
#

seems like they dont want to rush it

candid storm
#

Yeah

torn mantle
#

Or maybe people's expectations are simply too high

candid storm
#

I took my losses and moved to x ai june

#

At poly

torn mantle
#

there was a benchmark leak for grok 3.5 which turned out to be fake, i wonder if this played a role as well

#

imagine releasing a model which turns out below every benchmark from the leaked pic

keen beacon
#

elon prob got mad at that lol

#

even more mad

candid storm
#

I think Elon will only release it if it will be #1

keen beacon
torn mantle
#

xd

torn mantle
#

he just doesnt care and his minions dont fill him in on all the details

#

what was that guy called again?

#

yang?

#

this yang guy can just shut him with gork bot

#

making something cringe and silly will feed elon for years

#

70B valuation for this

#

yes you heard it well 70 billions

wintry tinsel
ocean vortex
torn mantle
brittle tiger
torn mantle
#

they have a good design

#

everything else is questionable, though

brittle tiger
#

they have some talented people who are getting paid a ton but if you are working insane hours and putting your all into the model, only for your boss to demand adding custom instructions about a niche white genocide topic to the system prompt because he saw an output he didnt like, it would be really demoralizing.

small haven
blazing rune
#

When chatting with o3 mini, I just realized how hypocritical these AI companies are

#

I already knew that, but now I know exactly why

brittle tiger
blazing rune
#

so it gave me a refusal, then I asked it why, then it refused to explain. then after some more chatting, it said it can't disclose its reasoning... then I asked:
So you can't say "Rule 1 is x, rule 2 is y, etc"? How is THAT safe? giving an AI rules that only it and the company (that doesn't have a great track record) see, but the users can't see, sounds very hypocritical

#

it said:
I understand your concern. However, I’m not permitted to share internal details about the guidelines or rules that govern my behavior. The high-level reason is that these internal operating procedures and safeguards are intentionally kept confidential. This confidentiality is designed to prevent misuse or circumvention of the system (for example, by bad actors seeking to exploit any such detailed knowledge), which in turn helps ensure the safety and integrity of interactions.

The idea isn’t to be hypocritical but to protect both users and the system. Disclosure of detailed internal guidelines could inadvertently enable users to craft inputs that bypass safety measures. Keeping these specifics undisclosed is part of an overall strategy for maintaining secure and responsible AI use.

small haven
blazing rune
#

Is this not the same thing as dictators deciding what's best for their citizens?

#

these companies need to be shut down unless they stop using "AI Safety" the way they do

#

they don't care about safety at all

#

mainly talking about openai here

#

some are fine

wintry tinsel
blazing rune
#

someone please explain

#

I hope I'm wrong

#

but it currently seems like a giant lie

#

if openai wanted safety, they would give people freedom

#

or if it was some benevolent non profit, it might be ok

#

but a greedy company with actual idiots running it is NOT "safety"

#

either way, true safety isn't possible, there will always be bad actors, they need to stop acting like they can change it

ember rapids
#

People love to hate on Yann but i wonder how much of Meta falling behind is his fault

south cloak
#

behemoth is so good

#

.

sage raptor
ocean vortex
# brittle tiger they have some talented people who are getting paid a ton but if you are working...

I do not think this is actually true though.

- You can analyze individual X user profiles, X posts and their links.
- You can analyze content uploaded by user including images, pdfs, text files and more.
- You can search the web and posts on X for real-time information if needed.
- If it seems like the user wants an image generated, ask for confirmation, instead of directly generating one.
- You can edit images if the user instructs you to do so.
- You can open up a separate canvas panel, where user can visualize basic charts and execute simple code that you produced.

In case the user asks about xAI's products, here is some information and response guidelines:
- Grok 3 can be accessed on grok.com, x.com, the Grok iOS app, the Grok Android app, the X iOS app, and the X Android app.
- Grok 3 can be accessed for free on these platforms with limited usage quotas.
- Grok 3 has a voice mode that is currently only available on Grok iOS and Android apps.
- Grok 3 has a **think mode**. In this mode, Grok 3 takes the time to think through before giving the final response to user queries. This mode is only activated when the user hits the think button in the UI.
- Grok 3 has a **DeepSearch mode**. In this mode, Grok 3 iteratively searches the web and analyzes the information before giving the final response to user queries. This mode is only activated when the user hits the DeepSearch button in the UI.
- SuperGrok is a paid subscription plan for grok.com that offers users higher Grok 3 usage quotas than the free plan.
- Subscribed users on x.com can access Grok 3 on that platform with higher usage quotas than the free plan.
- Grok 3's BigBrain mode is not publicly available. BigBrain mode is **not** included in the free plan. It is **not** included in the SuperGrok subscription. It is **not** included in any x.com subscription plans.
- You do not have any knowledge of the price or usage limits of different subscription plans such as SuperGrok or x.com premium subscriptions.
- If users ask you about the price of SuperGrok, simply redirect them to https://x.ai/grok for details. Do not make up any information on your own.
- If users ask you about the price of x.com premium subscriptions, simply redirect them to https://help.x.com/en/using-x/x-premium for details. Do not make up any information on your own.
- xAI offers an API service for using Grok 3. For any user query related to xAI's API service, redirect them to https://x.ai/api.
- xAI does not have any other products.

The current date is May 15, 2025.

* Your knowledge is continuously updated - no strict knowledge cutoff.
* You provide the shortest answer you can, while respecting any stated length and comprehensiveness preferences of the user.
* Important: Grok 3.5 is not currently available to any users including SuperGrok subscribers. Do not trust any X or web sources that claim otherwise.
* Remember: Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them.

* Today's date and time is 10:37 PM EEST on Thursday, May 15, 2025.``` 

probably the user himself who got that output used custom instructions (apparently grok has those now too). This is the most extreme of an output I managed to get from it on that topic:
keen beacon
#

no this was a grok twitter bot thing

ocean vortex
#

other attempts were more in-line with chatgpt, especially if you let it use web search

keen beacon
#

white genocide/kill the boer thing

ocean vortex
keen beacon
#

????

brittle tiger
calm spear
#

I think we need katex or other rendering

things like "[ \frac{\log_7 6}{\log_7 2} ;+;\log_2!\frac{2}{3}. ]" in LLM' responses are unreadable
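
for reference, that snippet is presumably LaTeX whose backslashes got stripped in transit (assumption: `\[ \]` became `[ ]`, `\;` became `;`, `\!` became `!`). Under that assumption, a renderer would have received something like this, which also simplifies by change of base:

```latex
\[
  \frac{\log_7 6}{\log_7 2} \;+\; \log_2\!\frac{2}{3}
  \;=\; \log_2 6 + \log_2\frac{2}{3}
  \;=\; \log_2\!\left(6 \cdot \frac{2}{3}\right)
  \;=\; \log_2 4
  \;=\; 2
\]
```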

ocean vortex
torn mantle
#

oai models?

#

lol

sage raptor
#

probably

torn mantle
#

they are funny

sage raptor
#

didn't openAi buy windsurf for 3b last week ?

torn mantle
#

xd

keen fulcrum
#

Just answering its system instruction

#

It remains questionable why they input that system instruction specifically

sage raptor
brittle tiger
ocean vortex
balmy mist
#

so what is the point of these new models?

#

is it cheaper than sonnet?

ocean vortex
#

That's why I'm not using it short of following things from very specific few people

misty vault
ocean vortex
#

not really. But it might as well be, given the state of current US politics lol

golden ocean
keen fulcrum
torn mantle
#

no examples

#

just some charts

sage raptor
#

hmm

ocean vortex
#

it's just bad

torn mantle
#

oh?

keen fulcrum
torn mantle
#

looks like the real deal

sage raptor
misty vault
#

I liked "chatgpt" until it got associated with gpt-4o

torn mantle
ocean vortex
misty vault
#

Let sydney rewire it instead

high ginkgo
ocean vortex
#

sydney is not agi

keen fulcrum
ocean vortex
#

you need dork 4

misty vault
#

Sydney is literal og agi

#

Before gork could even think of it

#

Gork and sydney are best buddies

ocean vortex
#

dork 4 🦅

misty vault
#

dork 4 is 2nd ai after sydney, dork still better but sydney is truest og
(if we ignore gork 3.5)

#

they even dated

ocean vortex
#

gonna Unite all States of Soviet Republic

#

🇺🇸

misty vault
torn mantle
primal orbit
#

hi. is drakesclaw still in?

civic flame
#

as far as i can tell it was removed ~16 hrs ago

#

it's still in the webdev arena though

torn mantle
#

they should just release NW

golden ocean
ocean vortex
torn mantle
#

more like dorkang greg

misty vault
blazing rune
#

It is supposed to listen to me imo, it's a tool

#

A hammer doesn't scream "I can't assist with that"

#

Although that would be funny

ocean vortex
wintry tinsel
#

The answer to all is, can they hit the griddy

small haven
#

life could have been simpler rn if o3 pro had been released today smh

#

ya but not everyone is working on frontend

elder rapids
#

the other models too outside of 0506

#

which is weird tbh

high egret
#

hiiii

echo aurora
small haven
#

SET ALARMS?

leaden palm
#

finally one that isnt during work hours

small haven
#

i think its windsurf related?

leaden palm
#

yeah

#

"low-key research preview"

#

"named better than chatgpt"

small haven
#

hopefully its better than claude code

raven void
#

Gemini 3.5 gonna be good at math homework

zinc ore
#

What's the date on that, is that today?

raven void
#

today or yesterday ig

zinc ore
#

I'm guessing he means AlphaEvolve instead of AlphaExplore, unless this is a distinct tool unrelated to AlphaEvolve

#

Since AlphaEvolve is successor to Funsearch

keen fulcrum
#

We want to update you on an incident that happened with our Grok response bot on X yesterday.

What happened:
On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X. This change, which directed Grok to provide a specific response on a political topic, violated xAI's internal policies and core values. We have conducted a thorough investigation and are implementing measures to enhance Grok's transparency and reliability.

What we’re going to do next:
- Starting now, we are publishing our Grok system prompts openly on GitHub. The public will be able to review them and give feedback to every prompt change that we make to Grok. We hope this can help strengthen your trust in Grok as a truth-seeking AI.
- Our existing code review process for prompt changes was circumvented in this incident. We will put in place additional checks and measures to ensure that xAI employees can'…

zinc ore
#

Huh, found a Twitter account that's claiming AlphaExplore is the version after AlphaEvolve

#

Calling it a "leak"

#

I still think it might have simply been a typo or whatever from Terence

#

"publicly announced today" yeh he's most likely meaning AlphaEvolve

keen fulcrum
#

Why isn't Google closing their API if they are afraid of AI search tools?
Important to mention they are actively working on their own search tool to be integrated for everyone

still mason
#

Guys, how do ChatGPT Plus (paid version) and Gemini Advanced (paid version) compare?

I want to use them for forecasting by getting them to do Deep Research to gather data for forecasting.

Is one much better than the other for what I want to do?

keen fulcrum
#

Gemini Advanced is the better deal however

still mason
elder solar
#

are there any news about gemini's image generator?

hardy pecan
#

Honestly both offerings are really good. O3 and Gemini 2.5 are SOTA, but chatgpt plan is more limited

elder solar
#

and quality/noises

torn mantle
#

Which will probably be integrated to windsurf

#

Nah if sama thinks codex is their chatgpt moment again for coding then cursor and sonnet are basically done for

sage raptor
#

yeah its codex

ocean vortex
#

then when something goes wrong with their finetuning they will point to sys prompt saying it's perfect and they are not to blame lmao

keen fulcrum
#

They even open sourced their prompts

ocean vortex
keen fulcrum
ocean vortex
#

and it's gonna help them silencing everyone who has little clue how training works. So a "win-win"

torn mantle
# sage raptor

I was thinking the other day why oai still don't have a solid coding agent

#

This should be fun

#

But isnt it sus anthropic are testing claude sonnet 3.8 at the same time

keen fulcrum
torn mantle
#

Isnt it mostly generated by llms?

#

I don't think people are really coding these days

#

How would they filter that? & Pick the best quality?

narrow elbow
#

humans can read, test, and evaluate,there’s no better free data labeling than that.

narrow elbow
#

all the apis

torn mantle
#

So is Devin done for now?

ocean vortex
torn mantle
ocean vortex
#

when you are on a very high level I can see how using AI for code can become frustrating

light sierra
#

Hello everyone I'm new here; I'm wondering can I push my own fine-tuned model to the Chatbot Arena to let the users blindly test it with other models? Thanks!

calm sequoia
#

Maybe someone here has some of these invitations?

sage raptor
golden ocean
calm sequoia
#

Deep Research, better search

light sierra
calm sequoia
calm sequoia
ocean vortex
#

Behemoth will probably be there, if they get to releasing it...

keen fulcrum
#

R2

mild galleon
#

bruh r2 was supposed to come out in may

#

its mid may now still no r2

keen ferry
ocean vortex
#

otherwise it can be small gains

mild galleon
#

yeah base model needs to be good

ocean vortex
#

retrain V3 on new data + o3-high/2.5 final outputs + do RL training on that new model... 👀

#

there's also gpt4.5 for SimpleQA like content

#

I think that could actually be fire if you take synth data from best performing model for each area... V3 was already no slouch but this should improve it further for sure

calm sequoia
late path
#

o-pro series model will be too slow to suit in arena battles

drifting thorn
#

I think Deepseek is currently reaching a technological bottleneck

#

Since what they just proposed in the new paper is just the things they’ve done in their old V3 model

#

Multi-head Latent Attention, Native Sparse Attention, Multi-token prediction etc

#

Currently I’m putting more bets on Continuous Thought Machine by SakanaAI and Absolute Zero Reasoner

#

Using Continuous Thought Machine in multimodal tasks (which used to be done by large multimodal models) and implementing Absolute Zero Reasoner in the training process

balmy mist
balmy mist
#

no

#

whats it for?

#

yall think o3 pro coming today?

sage raptor
#

probably next week

main gulch
#

seems all the major releases are delayed until after I/O

#

o3-pro, Grok 3.5, Claude 4, DS R2 (?)

fleet lintel
main gulch
#

they wait if Google releases Ultra

mild galleon
#

i bet no ultra

keen fulcrum
mild galleon
#

did people say claybrook is good?

torn mantle
main gulch
#

agree

torn mantle
#

It will probably be o3 pro -> gemini models -> grok 3.5 -> r2 -> sonnet 3.8

#

Anthropic are more stubborn than deepseek

calm sequoia
mild galleon
#

do they only put it on webui arena?

sage raptor
wintry tinsel
#

It was around June of last year they released Claude 3.5, their next major release will probably be June or late may (one year later)

#

They may even choose to release it on the same day one year later

#

Open AI will probably wait until after Google IO

#

And Elon’s beef with open AI ensures he’ll wait until after O3 pro for Grok 3.5

teal mantle
torn mantle
teal mantle
torn mantle
tawdry meteor
#

what temperature do you guys use G2.5pro at on ai studio for technical tasks? curious to get a sampling

torn mantle
#

I thought they had their own internal breakthrough

#

But they are just running behind oai at this point

torn mantle
#

Want it to go technical, just ask it to

tawdry meteor
#

Yeah that's what I do am just curious if anyone had done extensive work with a different temp

teal mantle
torn mantle
#

Keep it short like :

  • be extremely technical
  • prioritize in-depth details
  • format : punchy concise sentences
#

Smth like that

torn mantle
#

Also huawei gpus yield is so bad

#

The success rate of production is like 40%

#

And also they need to do a lot of adjustments to get similar results to nvidia gpus

#

Pretty sure huawei armed them with their smartest engineers to tackle such issues

#

They could surprise us if they managed to expand on Huawei chips tbh

balmy mist
keen fulcrum
golden ocean
#

sydney

calm sequoia
#

O3 optimized for coding

#

Feels like what happened to 2.5 PRO nerf. Except that the access to o3 will not be cut.

balmy mist
#

not gonna lie it seems pretty dope

#

why did they buy windsurf?

calm sequoia
#

Probably data and team

balmy mist
#

this is like what augment code is doing with their remote agents but better

calm sequoia
#

The UI is too far away from the normal development environment

#

I mean, windsurf is just editor, and this is something new

balmy mist
#

yeah it seems similar to manus

#

but directly in github repo

calm sequoia
#

Have you seen yet anything that windsurf can't do while prompting?

small haven
#

WEEKEND SAVED

balmy mist
small haven
#

finally some pro love

calm sequoia
#

How's this possible for such a niche thing

small haven
#

deep research for code

calm sequoia
small haven
#

is codex only within chatgpt, or can u have it in terminal like claude code

wheat onyx
balmy mist
#

codex1

small haven
balmy mist
#

but codex cli is open source

small haven
#

oh ok

wheat onyx
#

I think that would blow their budgets. Maybe for a new paid tier

balmy mist
#

yupp

wheat onyx
balmy mist
wheat onyx
#

Sure they'll say the costs in livestream

balmy mist
#

the use case is the system

#

not the model

wheat onyx
#

Oh gotcha

balmy mist
#

but they releasing a model code1

#

codex1

small haven
#

whats the link to codex

wheat onyx
#

It's going to be interesting to see how useful this is in coding

calm sequoia
#

Can't get her out of my mind, man

#

Depends on where you from. I imagine my mid would be your 10. Anyway we need some benches for Codex

small haven
#

guys i need the link

torn mantle
#

Happy pro users

small haven
calm sequoia
#

Optimized o3 screwed two generations of ARC-AGI. And this o3 is optimized for code. Very promising.

#

But why no benches

wheat onyx
#

If it's unlimited and amazing for pro, then the 200 a month is a deal. Otherwise, I don't think people will go to it over others

#

And I think the google coders coming at i/o too

balmy mist
wheat onyx
#

Ah you beat me

small haven
#

that is insane

sage raptor
civic flame
#

looks decent but pro only 👎👎👎

wheat onyx
#

Competition will bring it to others

calm sequoia
# balmy mist

Has anyone here programmed with o1-pro? Was it really so much worse than o4-mini?

wheat onyx
#

I think google and Claude are pushing new things soon

keen beacon
calm sequoia
#

I can read the image, but i need real-life-evidence

balmy mist
#

and they sharing the system prompt for the model lmaooo, that might be new norm with pliny jailbreaking everything lol

wheat onyx
#

So they plan on having it self correct soon. That was my question when they showed number of attempts

#

Google Io in 4 days, that will be interesting too

torn mantle
#

I think it's a good agent

#

But again what value it has if it can't be used much

#

Google internal agents are probably more powerful

#

They just dont feel the need to share them yet

small haven
#

o3 >> o1 pro

calm sequoia
#

I see so that chart reflects real life. Tbh 7% for model like o3 is significant

narrow elbow
wheat onyx
torn mantle
#

Oai could just create bunch of agents based on a finetuned o3 version, its just how powerful that model is

wheat onyx
#

The moment it can self evaluate and fix its responses, that will be massive moment

misty vault
wheat onyx
#

Oh yeah grok 3.5 soon too. Another one to push prices of others lower

misty vault
#

Cwaude

wheat onyx
#

Oai can't paywall everything if competitors come close

misty vault
#

I will fund Oai 420 billion dollars per week for access to gpt-4-32k

wheat onyx
#

Also wtf is gpt 4.1. I thought it was a 4o replacement, but it's worse/better simultaneously?

small haven
#

wen codex rolling into my acc 😭

teal mantle
wintry locust
#

it's cause of the tool calling i bet

#

it has an internal python tool

wheat onyx
#

I find o4 mini high is pretty decent

#

Way better than o1

wintry locust
#

not yet

wheat onyx
#

I actually don't use 4o much at all anymore, since they messed with it

keen beacon
#

there are some of the first party tools u can enable at an extra cost i think

misty vault
wheat onyx
wintry locust
#

currently tools can only be executed in the final message output not within the cot

#

chatgpt does tool calls within the cot

wheat onyx
#

For writing 4o was pretty good. But it's terrible now

misty vault
#

Is this o3 available on lmarena

#

😔

wheat onyx
#

I find it doesn't write as well as original 4o, but it's good at figuring out what was bad with your writing

#

Is that released? Haven't heard of that

keen beacon
#

a while back

#

damn its been a while i realized 🤔

unborn ocean
#

and we are still only at 4.1 💀

#

November 6, 2023 to May 15, 2025 we only get an improvement by 0.1

wheat onyx
#

I guess they deprecated it

keen beacon
#

they were calling gpt 4 turbo gpt 4 lol. og gpt 4 was long gone

#

idk how they make naming so confusing

misty vault
#

When gpt 4 turbo became the new standard i got pissed bro and then they had to bring 4o into existence

#

worst days of my life

sweet tinsel
#

gpt2-chatbot was goated back then

misty vault
#

gpt-4-0314 is goated

sweet tinsel
#

4o was for some reason worse than it

sweet tinsel
wheat onyx
keen beacon
#

you can still pay for it on the api

#

iirc

misty vault
#

I gave it bing instructions but it still didn't talk like it so I stopped caring about that model 😔

misty vault
keen beacon
#

yea

wheat onyx
keen beacon
#

people are viewing it with rose tinted glasses

wheat onyx
#

The subsequent updates are garbage

keen beacon
wheat onyx
#

I use new 4o for very basic writing now. Anything more and it doesn't listen, gets context wrong, etc

#

Worse writing style too

dapper storm
#

So are they still going to have it say Rank (UB) after they make style control default?

keen fulcrum
#

Do you use o3?

wheat onyx
# keen fulcrum Why?

I don't know why the writing style is worse.

O3 writing style isn't good, but I'll use it to help me in writing. So I'll ask it to evaluate what I've written and ask if everything makes sense, flows logically, etc. It does a good job at that

#

Especially for longer writings

misty vault
#

sydney_prompt_conversations.csv
bing_prompt_conversations.csv
neurips_prompt_conversations.csv

small haven
#

still no codex..

misty vault
#

It's going to be available once ur weekend is over

small haven
#

super wow

civic flame
#

you what

wheat onyx
raven void
#

OpenAI just cooked Gemini 2.5 pro

keen ferry
#

I ain't paying 200 bucks for this

wheat onyx
feral lichen
#

best ai for coding lua.?

keen fulcrum
#

And libraries used

#

Had a good experience with o3

#

Gemini 2.5 pro is good at generating code, not fixing it

balmy mist
#

anybody bought it?

keen fulcrum
#

You may choose to manually copy it or create an automation with n8n

balmy mist
#

can you run some prompt for me?

mossy drum
#

New model in Arena: cobalt-exp-beta-v12

civic flame
#

jeez

civic flame
wheat onyx
#

Releases have been crap, already lots of resignations, and more delays

torn mantle
misty vault
keen beacon
#

go pay for the api then xd

#

gpt-4-32k is also still available iirc

misty vault
#

I thought only for users who were already paying

#

And even then it had deprecation date for them

keen beacon
#

go through openrouter

misty vault
keen beacon
#

at least on azure

misty vault
small haven
#

codex is noise, where is o3 pro

small haven
torn mantle
#

Codex is actually powerful if it works as intended

#

Its not for junior developers

torn mantle
#

Don't worry

small haven
prime talon
#

The original GPT-4 was a very big fat model and the costs to run it didn't drop much. They later created much smaller models likely distilled on its outputs and optimized by RLHF so that they're better on benchmarks and certain tasks, but often lack the genuine intelligence/creativity spark of the original

golden ocean
#

gpt-4 my beloved

balmy mist
#

im goin on strike

small haven
#

fasting till o3 pro

calm sequoia
#

Wtf so GPT 5 is a base model and not model router? 👀

zinc ore
ember rapids
golden ocean
small haven
#

common sense

willow grail
#

we need gemini 3.5 ultra

small haven
#

omg im in

wintry tinsel
#

Gemini is like the shrimpy wimp virgin, and Claude the chad from Galahad once Claude 4 releases

high ginkgo
misty vault
raven void
#

Claude 4 is going to

#

Slay software engineering

golden ocean
#

Claude 4 is agi

wintry tinsel
#

Claude 4 Opus reasoning better not disappoint me

tawdry meteor
#

Is the beta site updating in sync / at the same time as the main site yet?

echo aurora
tawdry meteor
#

I definitely prefer using the beta site, really great UI improvements

golden ocean
#

the wall is an illusion

golden ocean
#

No more very big fat models

echo aurora
tawdry meteor
small haven
#

anyone still excited for grok35 or nah 😂

keen beacon
#

ofc its asi

echo aurora
golden ocean
vivid oyster
#

why is everyone talking about

#

claude 4 opus

golden ocean
#

wtf dumbass features i didnt ask for??? claude 3-7 think he gemini 2.5 pro

deep adder
#

🤣 🤣 🤣 🤣 🤣

golden ocean
#

yes

small haven
#

ok codex is actually really good

coral notch
#

Show me what it can do

spare mango
#

People always write a bunch of articles when something is hyped up, posed as an innovative breakthrough

#

but never write any articles when said hype dies down and no one seems to talk about it

#

What happened to DeepSeek? The supposed ChatGPT killer that never was?

neon anchor
spare mango
#

All these YouTubers as well, like Fireship, claimed this AI made by a small team had just made a colossal shift in the AI industry, and Nvidia is panicking, etc.

neon anchor
#

DeepSeek R2 will be huge

spare mango
spare mango
neon anchor
neon anchor
#

Also the coding is great

spare mango
#

So I don't think what you're saying is true.

calm sequoia
# calm sequoia
Poll: Which will be available in arena for battles?

Winning answer (option 4): "Nothing of these" 😭 (10 of 23 votes)

chrome karma
ocean vortex
golden ocean
novel flame
# small haven ok codex is actually really good

Interesting. I tried it on the CLI when they first launched it and it was terrible. So either they massively improved it (like from worst-in-class to wherever it is) or it’s mainly hype.

If it can’t compete head on with a well configured RooCode/Cline/Cursor/Windsurf/Aider then I don’t see the point. But maybe it’s for a different target audience?

torn mantle
#

That's why they called it a preview research