#general | Arena | Page 29

keen beacon Apr 21, 2025, 2:32 PM

#

but you'll see

balmy mist Apr 21, 2025, 2:32 PM

#

omgg im getting so excited

keen beacon Apr 21, 2025, 2:32 PM

#

i'm not completely sure what their plans for it on chat.deepseek.com are

balmy mist Apr 21, 2025, 2:33 PM

#

i never wanted another country to win so bad before lol

keen beacon Apr 21, 2025, 2:33 PM

#

but it's very possible they finally introduce a subscription

fleet lintel Apr 21, 2025, 2:33 PM

#

keen beacon but you'll see

where are you getting this info from?

balmy mist Apr 21, 2025, 2:33 PM

#

all i want is that api

balmy mist Apr 21, 2025, 2:33 PM

#

fleet lintel where are you getting this info from?

he got birdies everywhere

keen beacon Apr 21, 2025, 2:33 PM

#

fleet lintel where are you getting this info from?

i guess you could say i have some connections

#

not many at deepseek tho

balmy mist Apr 21, 2025, 2:33 PM

#

liike that dude from GOT

keen beacon Apr 21, 2025, 2:33 PM

#

mainly just word of mouth

oblique flint Apr 21, 2025, 2:33 PM

#

I just hope it wont go in circles as much as R1

balmy mist Apr 21, 2025, 2:34 PM

#

its funny cause r1 crashed the market a lil

#

and rn the market is trash

keen beacon Apr 21, 2025, 2:34 PM

#

oblique flint I just hope it wont go in circles as much as R1

it doesn't

balmy mist Apr 21, 2025, 2:34 PM

#

imagine if r2 comes out this week lmaoo

keen beacon Apr 21, 2025, 2:34 PM

#

part of the training involved making the model better at determining how many reasoning tokens to use

#

although apparently it's still not as good at that as some other models

#

deepseek has always been a bit more

#

brute force-y

oblique flint Apr 21, 2025, 2:36 PM

#

you got any info on qwen 3?

keen beacon Apr 21, 2025, 2:36 PM

#

nope

#

unfortunately lmao

#

i do expect it in the next 2 weeks or so

#

but that's about it

balmy mist Apr 21, 2025, 2:39 PM

#

i aint gonna lie this channel might be one of the leading spaces for ai news lmaoo

sonic tendon Apr 21, 2025, 2:39 PM

#

i mean, that's probably mostly leo lol

sonic tendon Apr 21, 2025, 2:40 PM

#

keen beacon not many at deepseek tho

not many?

balmy mist Apr 21, 2025, 2:40 PM

#

sonic tendon i mean, that's probably mostly leo lol

true lol, but its also the convos we have

keen beacon Apr 21, 2025, 2:40 PM

#

sonic tendon not many?

yeah again the chinese labs are very strict with this stuff

sonic tendon Apr 21, 2025, 2:40 PM

#

tbh i'd be surprised if it doesn't outperform

balmy mist Apr 21, 2025, 2:40 PM

#

sonic tendon tbh i'd be surprised if it doesn't outperform

nahh thats pushing it

sonic tendon Apr 21, 2025, 2:41 PM

#

keen beacon yeah again the chinese labs are very strict with this stuff

not many?

keen beacon Apr 21, 2025, 2:41 PM

#

well yes i have some but it's very much in the single digits!

balmy mist Apr 21, 2025, 2:41 PM

#

dont forget deepseek dont got the same funds as OA

sonic tendon Apr 21, 2025, 2:41 PM

#

keen beacon well yes i have some but it's very much in the single digits!

how

keen beacon Apr 21, 2025, 2:41 PM

#

balmy mist dont forget deepseek dont got the same funds as OA

the chinese government is absolutely willing to bankroll them

#

you'd be surprised

oblique flint Apr 21, 2025, 2:41 PM

#

if r2 manages to outperform o3 at a way lower cost, I'd start to believe deepseek is actually the leading AI lab

fleet lintel Apr 21, 2025, 2:41 PM

#

balmy mist nahh thats pushing it

i'd be super surrprised if r2 > o3

keen beacon Apr 21, 2025, 2:41 PM

#

sonic tendon ***how***

🤐

balmy mist Apr 21, 2025, 2:41 PM

#

keen beacon the chinese government is absolutely willing to bankroll them

oh yeah i forgot

sonic tendon Apr 21, 2025, 2:41 PM

#

keen beacon the chinese government is absolutely willing to bankroll them

yeah, that's the main reason

balmy mist Apr 21, 2025, 2:42 PM

#

after it crashed US markets they saw the vision lol

keen beacon Apr 21, 2025, 2:42 PM

#

the people over at deepseek also have insane work ethic

#

as you might expect

#

most of them barely sleep

#

especially since the release date for R2 was changed to "ASAP"

balmy mist Apr 21, 2025, 2:42 PM

#

they should def call it R3 tho

keen beacon Apr 21, 2025, 2:42 PM

#

fleet lintel Apr 21, 2025, 2:42 PM

#

keen beacon most of them barely sleep

not sure how much this help with research. you absolutely do need clear mind to do research related things.

balmy mist Apr 21, 2025, 2:42 PM

#

troll OA a lil

keen beacon Apr 21, 2025, 2:42 PM

#

fleet lintel not sure how much this help with research. you absolutely do need clear mind to...

tell a CCP official that

#

😭

sonic tendon Apr 21, 2025, 2:43 PM

#

keen beacon the people over at deepseek also have insane work ethic

sometimes in tech i feel like people mistake passion for being "hardworking" but yeah for DS I'm not surprised lol

sonic tendon Apr 21, 2025, 2:43 PM

#

fleet lintel not sure how much this help with research. you absolutely do need clear mind to...

depends what you're doing

balmy mist Apr 21, 2025, 2:43 PM

#

fleet lintel not sure how much this help with research. you absolutely do need clear mind to...

when you have ai assisting you then you can push your limits a bit more

sonic tendon Apr 21, 2025, 2:43 PM

#

i imagine a lot of LM development is just throwing stuff at the wall rather than trying to imagine the next breakthrough

fleet lintel Apr 21, 2025, 2:43 PM

#

sonic tendon depends what you're doing

true... i do think american companies software engineers are getting lazy and they barely do enough to keep the job (most of them if not all)

balmy mist Apr 21, 2025, 2:43 PM

#

they prob have an insanse workflow now, look at google's workflow

sonic tendon Apr 21, 2025, 2:44 PM

#

keen beacon tell a CCP official that

also like

#

the entire CN education system, I think

balmy mist Apr 21, 2025, 2:45 PM

#

sonic tendon also like

you live in china right?

sonic tendon Apr 21, 2025, 2:45 PM

#

balmy mist you live in china right?

nope

oblique flint Apr 21, 2025, 2:45 PM

#

btw I wonder, why didnt the other labs adopt deepseeks MLA (or did they?). I guess that was one of the main cost saving measures they implemented right?

sonic tendon Apr 21, 2025, 2:45 PM

#

have a few friends there

fleet lintel Apr 21, 2025, 2:45 PM

#

oblique flint btw I wonder, why didnt the other labs adopt deepseeks MLA (or did they?). I gue...

MLA?

sonic tendon Apr 21, 2025, 2:45 PM

#

but not an expert

brittle tiger Apr 21, 2025, 2:45 PM

#

They are insanely cracked and I'm not certain it's happening with AI companies yet but western big tech firms don't have the advantage of NSA coming to them saying "hey check out the new methods they're working on across the ocean". Instead us govt is trying to break up big tech. Cooperation with govt is major slept on advantage over there. They're insanely talented regardless tho

balmy mist Apr 21, 2025, 2:45 PM

#

sonic tendon have a few friends there

ahh okay, i was wondering how their education system treated ai

oblique flint Apr 21, 2025, 2:46 PM

#

fleet lintel MLA?

https://medium.com/data-science/deepseek-v3-explained-1-multi-head-latent-attention-ed6bee2a67c4

from my understanding it allows them to save a lot of memory that would otherwise be needed for k/v cache

Medium

DeepSeek-V3 Explained 1: Multi-head Latent Attention

Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference

sonic tendon Apr 21, 2025, 2:47 PM

#

brittle tiger They are insanely cracked and I'm not certain it's happening with AI companies y...

i feel like ai development is a lot less conspiratorial than people here seem to think it is

#

but, the chinese govt is absolutely bankrolling the hell out of AI development rn, from what I've heard

brittle tiger Apr 21, 2025, 2:48 PM

#

sonic tendon i feel like ai development is a lot less conspiratorial than people here seem to...

Yeah I don't think they've had any benefits like this yet but it is an advantage they will def get eventually

fleet lintel Apr 21, 2025, 2:50 PM

#

oblique flint https://medium.com/data-science/deepseek-v3-explained-1-multi-head-latent-attent...

part of it makes sense to me 🙂 . I think all the frontier companies will be doing lot of simialr things

#

is R2 on LMArena? Are they testing it here?

keen beacon Apr 21, 2025, 2:52 PM

#

no

alpine coral Apr 21, 2025, 2:52 PM

#

brittle tiger Yeah I don't think they've had any benefits like this yet but it is an advantage...

at the very least it's been a big boost for morale, from what i can tell

#

and a welcome one.. China hasn't bounced back since covid.. it hasn't been firing on all cylinders

keen beacon Apr 21, 2025, 2:53 PM

#

this race is a huge opportunity for china

#

if they win the ai war it will be truly be the beginning of the end for the idea of the US as the leader of the world

alpine coral Apr 21, 2025, 2:53 PM

#

but i think it's a false hope of sorts.. like deepseek did innovate, but they more so iterated

#

yeah i mean it's kinda wild and seemingly hyperbolic.. but i thnk it could plausibly said that the country dominates AI (/first achieves AGI whatvever that means) will have an advantage across everything, military included

fleet lintel Apr 21, 2025, 2:55 PM

#

keen beacon if they win the ai war it will be truly be the beginning of the end for the idea...

Why would US companies use China models?
I think Chinesse companies will win in China and Western companies will win in West

balmy mist Apr 21, 2025, 2:55 PM

#

yo this would be nuts:
https://x.com/testingcatalog/status/1914319517175460327

TestingCatalog News 🗞 (@testingcatalog) on X

BREAKING 🚨: Google AI Studio will get Apps rendering feature soon. Devs will be able to use Gemini to build apps that use Gemini.

Bonus 👀
A full system prompt used behind Apps on AI Studio (in the post below)

fleet lintel Apr 21, 2025, 2:56 PM

#

balmy mist yo this would be nuts: https://x.com/testingcatalog/status/1914319517175460327

how is it difference from Canvas feature?

balmy mist Apr 21, 2025, 2:56 PM

#

studio is free

keen beacon Apr 21, 2025, 2:56 PM

#

fleet lintel Why would US companies use China models? I think Chinesse companies will win ...

the US public already willingly use chinese models.. as for US businesses, if a model from china significantly outperforms any american model at tasks they use LLMs for, and match/beat it in value, plenty of companies would make the switch

#

although yes, enterprise will definitely be something chinese labs struggle with in the west

balmy mist Apr 21, 2025, 2:57 PM

#

fleet lintel how is it difference from Canvas feature?

canva is only available in gemini app right?

keen beacon Apr 21, 2025, 2:57 PM

#

currently there is a significant chance the US government bans deepseek

#

so it'll be interesting to see if that happens

balmy mist Apr 21, 2025, 2:57 PM

#

keen beacon currently there is a significant chance the US government bans deepseek

how can they do that?

keen beacon Apr 21, 2025, 2:57 PM

#

because normally whatever the US government do (annoyingly) much of the west follows

#

like in the case of huawei

balmy mist Apr 21, 2025, 2:58 PM

#

like providers will stop provding lol

#

what about if i download it locally?

keen beacon Apr 21, 2025, 2:58 PM

#

well, it'll be illegal for providers to provide deepseek services in the US

#

so they won't if they don't want to break the law

alpine coral Apr 21, 2025, 2:58 PM

#

keen beacon currently there is a significant chance the US government bans deepseek

i could see the ccp also banning deepseek lol

balmy mist Apr 21, 2025, 2:58 PM

#

alpine coral i could see the ccp also banning deepseek lol

what??

#

why

fleet lintel Apr 21, 2025, 2:58 PM

#

balmy mist canva is only available in gemini app right?

true.. may be they are bringing it to AIstudio. not a big deal

keen beacon Apr 21, 2025, 2:58 PM

#

balmy mist what about if i download it locally?

if you already have it then there's not much they can do

alpine coral Apr 21, 2025, 2:58 PM

#

balmy mist why

LLMs are a censor's nightmare

keen beacon Apr 21, 2025, 2:59 PM

#

but if you didn't, downloading deepseek models would also be hard because serving it would be illegal

fleet lintel Apr 21, 2025, 2:59 PM

#

balmy mist how can they do that?

with today's govt, anything can happen

balmy mist Apr 21, 2025, 2:59 PM

#

keen beacon if you already have it then there's not much they can do

bet as soon as r2 drops i am going to buy a new computer and download it, but what about fine tuned versions of deepseek models? like you cant just ban a model

#

if i fine tune r2 5 times over, is it the same model?

fleet lintel Apr 21, 2025, 3:00 PM

#

balmy mist bet as soon as r2 drops i am going to buy a new computer and download it, but wh...

you kind of can. If govt issues a ban, most medium and large size companies will stop using it. And it doesn't really part if fringe group continues to use it

alpine coral Apr 21, 2025, 3:00 PM

#

but the race to AI supremecy i think gives space.. but still.. the current AI tech seems incompatible with a one-party authortarian state that censors everything (or tries to)

balmy mist Apr 21, 2025, 3:00 PM

#

fleet lintel you kind of can. If govt issues a ban, most medium and large size companies wil...

but what about the fine tuning stuff i said?

#

a company can provide a model that was fine tuned from r2

#

or will that also be banned?

#

and how would they even know it was fine tuned from r2?

fleet lintel Apr 21, 2025, 3:01 PM

#

balmy mist but what about the fine tuning stuff i said?

again, fringe groups will continue to do it. but any serious company wont. It's popularity and usage will decrease over time if not instantneaously.

balmy mist Apr 21, 2025, 3:02 PM

#

idk man, its like the ban on pirating, its illegal but its so common

#

and you cant prove it

#

banning it would just be a political statement

#

nothing more

#

people will still be using their models

#

even big companies

#

cause they will fine tune on it

fleet lintel Apr 21, 2025, 3:03 PM

#

balmy mist idk man, its like the ban on pirating, its illegal but its so common

companies dont pirate things :). only individuals . For models, individuals are not the main consumers

balmy mist Apr 21, 2025, 3:04 PM

#

okay but you cant prove you have a deepseek model tbh, and if a company wants to save money they will use deepseek cheap models, and fine tune on them, models hallucinate a lot so you cant trust what a models says if you try and figure out the base model, and a lot of new companies are being made because of ai and for ai

#

they are most likely going to use deep seek models, and soon become mid level companies

#

its not going to just be individuals, even if it is just individuals, those people can make milliona dollar companies

#

bc of ai

ocean vortex Apr 21, 2025, 3:05 PM

#

fleet lintel Why would US companies use China models? I think Chinesse companies will win ...

Cause if the model is open source there is no risk of your data leaking. Educated people in west mostly do not give a sht where the model comes from as long as it performs, why should we?

fleet lintel Apr 21, 2025, 3:05 PM

#

may be.. but as soon as they become big, they need to pivot to something legal.

I

balmy mist Apr 21, 2025, 3:06 PM

#

fleet lintel may be.. but as soon as they become big, they need to pivot to something legal. ...

bro lol, they cant prove they are using deepseek tho

fleet lintel Apr 21, 2025, 3:06 PM

#

ocean vortex Cause if the model is open source there is no risk of your data leaking. Educate...

what is models are advanced enough that you can program models to only share data in a very secret way? I dont think that's the case today. But these models could do that in future

#

i think govt ban is unlikely but a ban IMO would undoubtly kill the business in that country.

tall summit Apr 21, 2025, 3:07 PM

#

fleet lintel companies dont pirate things :). only individuals . For models, individuals are ...

what are ya on about

narrow elbow Apr 21, 2025, 3:08 PM

#

fleet lintel what is models are advanced enough that you can program models to only share dat...

yea, a legal ban just like banning TikTok😆

ocean vortex Apr 21, 2025, 3:08 PM

#

fleet lintel what is models are advanced enough that you can program models to only share dat...

This is not the case for now. If that changes then the approach will be different

balmy mist Apr 21, 2025, 3:11 PM

#

but you are right a ban would kill the business bc they would be just copying secrets of deepseek and applying to other models and no longer using deepseek models, just at first to distill the knowledge, and deepseek is opensource to

fleet lintel Apr 21, 2025, 3:13 PM

#

narrow elbow yea, a legal ban just like banning TikTok😆

India banned TikTok like 5 years back some users found way to use it. but slowly it died down.
Currently it is non-existent in India.
Govt ban always results in slow death

brittle tiger Apr 21, 2025, 3:14 PM

#

https://x.com/_mcbench/status/1914283889394073972?t=J_kuVh4ohdZANbXKho4Z6g&s=19

Minecraft Benchmark (@_mcbench) on X

Current internal leaderboard when only the models currently present in the leaderboard (Without DEPRECATED models) play vs. each other.

Gemini 2.5 pro is still the king @OfficialLoganK, closely followed by the latest Checkpoint Version of GPT-4.1 that was available on Openrouter

#

4.1 showing impressive here

narrow elbow Apr 21, 2025, 3:25 PM

#

fleet lintel India banned TikTok like 5 years back some users found way to use it. but slo...

so Indians come to America in pursuit of freedom and upward mobility, seeking opportunities they may lack in their homeland.

#

and US is also going to ban TikTok🥲

fleet lintel Apr 21, 2025, 3:31 PM

#

narrow elbow so Indians come to America in pursuit of freedom and upward mobility, seeking op...

yup. . And for more money 🙂

narrow elbow Apr 21, 2025, 3:31 PM

#

so sad

ember rapids Apr 21, 2025, 3:57 PM

#

keen beacon the people over at deepseek also have insane work ethic

It’s scary how cracked they are

I’ve heard stories

#

7 fig comp tho

fleet lintel Apr 21, 2025, 4:16 PM

#

ember rapids 7 fig comp tho

1Million+ ? In China? Wow

balmy mist Apr 21, 2025, 4:44 PM

#

man if deepseek can launch this week i will be so happy man

#

as soon as yall see news about it please send here 🙂

cedar tide Apr 21, 2025, 5:16 PM

#

neither cobalt nor apricot are nova premier 🥴

ocean vortex Apr 21, 2025, 5:19 PM

#

narrow elbow yea, a legal ban just like banning TikTok😆

banning anything is a dangerous game and ideally nothing at all should be banned in this context. But at the same time I kinda understand the arguments for banning tiktok. It has been used more than once for political campaigns now of questionable origins and the fact it is controlled by China's government makes it high risk

deep adder Apr 21, 2025, 5:20 PM

#

country

ocean vortex Apr 21, 2025, 5:22 PM

#

Imagine an app developed in US or any other western country blowing up in popularity in China. Impossible because all of them are banned by default

narrow elbow Apr 21, 2025, 5:24 PM

#

ocean vortex banning anything is a dangerous game and ideally nothing at all should be banned...

concerning national security,MAGA

ocean vortex Apr 21, 2025, 5:24 PM

#

narrow elbow concerning national security,MAGA

MAGA itself is a biggest threat to US national security now LOL

gentle plinth Apr 21, 2025, 5:34 PM

#

https://www.rumidocs.com/newsroom/new-chatgpt-models-seem-to-leave-watermarks-on-text

New ChatGPT Models Seem to Leave Watermarks on Text

The newer GPT-o3 and GPT-o4 mini models appear to be embedding special character watermarks in generated text. However, removing these watermarks is relatively simple, making this seem more like a short-term measure than a long-term solution

narrow elbow Apr 21, 2025, 5:37 PM

#

No, Americans are smart, you are racist

balmy mist Apr 21, 2025, 5:44 PM

#

gentle plinth https://www.rumidocs.com/newsroom/new-chatgpt-models-seem-to-leave-watermarks-on...

this is very interesting

#

@gentle plinth have you tested this?

#

i just tested it and doesnt i didnt see this

brittle tiger Apr 21, 2025, 6:01 PM

#

gentle plinth https://www.rumidocs.com/newsroom/new-chatgpt-models-seem-to-leave-watermarks-on...

Really doubt this is watermarking and more result of rushing to get model out when they weren't originally planning to after 2.5 surprised. Watermarking text is much more sophisticated

sage raptor Apr 21, 2025, 6:26 PM

#

ocean vortex Apr 21, 2025, 6:26 PM

#

brittle tiger Really doubt this is watermarking and more result of rushing to get model out wh...

this doesn't just randomly happen. They simply replaced some spaces with " " in their dataset

#

just tried it and can confirm o4-mini does this 👀

ember rapids Apr 21, 2025, 6:30 PM

#

I thought they always did that

brittle tiger Apr 21, 2025, 6:31 PM

#

ocean vortex this doesn't just randomly happen. They simply replaced some spaces with " " in ...

I just can't think of a plausible rational. Break a bunch of code to watermark and piss ppl off? Much more elegant ways to do it

ocean vortex Apr 21, 2025, 6:32 PM

#

brittle tiger I just can't think of a plausible rational. Break a bunch of code to watermark a...

that's not how it works, there's no code involved to break. You simply add these symbols throughout your fine-tuning dataset and the model is gonna use them

#

you could even fine-tune your own gpt with their official tools to embed any hidden symbols you choose into it's responses

#

as far as the model is concerned, there's no big difference whichever space it uses anyway, in this case

#

but it saw enough of those special characters to use them some of the time

#

and the rationale is quite solid I think. Most people do not gonna bother to sanitize the text, so this makes it easy to detect the text generated by openai.

balmy mist Apr 21, 2025, 6:40 PM

#

wow codex being open source is really game changing:
https://x.com/dnak0v/status/1914380050931105894

Daniel Nakov (@dnak0v) on X

anon-codex is now merged to openai/codex. I didn't think they'd allow this, but they did. Kudos to @OpenAI on this one

rapid merlin Apr 21, 2025, 6:56 PM

#

This is not really related, but has anyone tried o4 in chatgpt? it seems LOBOTOMIZED into saving space. I asked it five times directly to give me a full script, and it "omits it for brevity", like what?

#

give me the full code
"omitted for brevity" (parts of it commented as that)
hm. did you miss something here? pastes the script it gave me
yes, i missed this, this and this
then what are you doing?? i told you to give me the full code!
"ommited for brevity" (parts of it commented as that)

#

to be fair, the code was indeed pretty long, but the fact it just starts completely ignoring commands is annoying

#

Yes

willow grail Apr 21, 2025, 7:28 PM

#

which discord server has this bot?

balmy mist Apr 21, 2025, 7:37 PM

#

willow grail which discord server has this bot?

its a paid server

#

well he aint release it yet

#

but it will be paid soon

willow grail Apr 21, 2025, 7:38 PM

#

who is he @balmy mist 😄

balmy mist Apr 21, 2025, 7:44 PM

#

willow grail who is he <@367710025994731520> 😄

https://x.com/legit_api

ʟᴇɢɪᴛ (@legit_api) on X

AI is pretty cool huh, well APIs are too!

#

he is a dev that is able to get details on new models

#

he has some tool he built that gives him alerts

#

you think you can built it for us?

upper wolf Apr 21, 2025, 8:00 PM

#

rapid merlin This is not really related, but has anyone tried o4 in chatgpt? it seems LOBOTOM...

are you on mobile

#

most models on mobile have a system prompt that tell them to keep it brief

#

chatgpt included

tall summit Apr 21, 2025, 8:02 PM

#

didn't know that

opaque adder Apr 21, 2025, 8:26 PM

#

dude when is nightwhisper comng out

#

i'm tired of 2.5 pro

#

mm tbh

#

it is actually good at things like making animations in HTML
and you can use this in thread/banner designs

#

although what i will say is
no ai is ready for coding outside of web development yet

#

well yea very simple

#

ai right now in terms of programming is a front end specalist

#

go on

#

tell me some projects it can do in c++

#

that isnt a simple snake game

small haven Apr 21, 2025, 8:29 PM

#

opaque adder i'm tired of 2.5 pro

im tired of o1 pro

opaque adder Apr 21, 2025, 8:29 PM

#

lol, you can't even prove a project it did

#

because it can't

opaque adder Apr 21, 2025, 8:29 PM

#

small haven im tired of o1 pro

o1 pro is dogshit

small haven Apr 21, 2025, 8:29 PM

#

ive been limited with o3 and o1 pro already so....

opaque adder Apr 21, 2025, 8:30 PM

#

im asking projects it has built

small haven Apr 21, 2025, 8:30 PM

#

opaque adder Apr 21, 2025, 8:30 PM

#

not hard

#

yes give me a project

#

4th time asking

small haven Apr 21, 2025, 8:31 PM

#

gonna waste a deep research req, idgaf

opaque adder Apr 21, 2025, 8:32 PM

#

no dude
if you are building advanced projects, it simply just won't understand

#

ai isn't there yet

#

lol
go ahead and prompt it to build you a um/km that communicates with eachother

#

wont happen for a good few years

#

usermode
and kernelmode

#

both have interest in one another

#

no

#

no ai will be able do windows internals

#

at a even decent level

#

for a long time

#

yea

#

well thats the truth

#

ai just isnt at that level..

small haven Apr 21, 2025, 8:38 PM

#

i want o3 pro so bad holy fck

opaque adder Apr 21, 2025, 8:39 PM

#

well i work around it

#

why so

#

in what way exactly 😄

#

yes ive used 3.7
used 2.5 pro

tall summit Apr 21, 2025, 8:41 PM

#

craig you sound like 3.7 or 2.5 pro

opaque adder Apr 21, 2025, 8:41 PM

#

its okay

tall summit Apr 21, 2025, 8:41 PM

#

in this current conversation

opaque adder Apr 21, 2025, 8:41 PM

#

this is jsut the start of ai

#

ai in terms of coding has literally just begun

#

so its expected for it to be mid asf

#

no it isnt xdDD

tall summit Apr 21, 2025, 8:42 PM

#

99% of devs is crazy

#

how bad do you think devs are

opaque adder Apr 21, 2025, 8:42 PM

#

this dude..

#

you might aswell just say to people stop learning coding

#

what statement is that

balmy mist Apr 21, 2025, 8:42 PM

#

that levels game i think

opaque adder Apr 21, 2025, 8:42 PM

#

😂

#

now its getting obvious with r bait

#

next thing you'll tell me is that your jewish

#

its way easier to learn

#

ai is at the tip of ur hands for help

tall summit Apr 21, 2025, 8:43 PM

#

opaque adder next thing you'll tell me is that your jewish

woah man

#

because making things is fun

opaque adder Apr 21, 2025, 8:44 PM

#

ai cant do backend for its life

tall summit Apr 21, 2025, 8:44 PM

#

people will still become artists

#

well tbh ai art is still terrible

opaque adder Apr 21, 2025, 8:44 PM

#

just tell people to stop being graphics designers

#

ai is here

tall summit Apr 21, 2025, 8:45 PM

#

"stop doing codeforces, ai can solve them all"

opaque adder Apr 21, 2025, 8:45 PM

#

you were tryna make jokes

tall summit Apr 21, 2025, 8:45 PM

#

bait

opaque adder Apr 21, 2025, 8:45 PM

#

prety funny if u were to be jewish

tall summit Apr 21, 2025, 8:45 PM

#

thats the most boring joke of all time

opaque adder Apr 21, 2025, 8:46 PM

#

exactly

tall summit Apr 21, 2025, 8:46 PM

#

oh there we go @alpine pasture

#

deleted message moment

opaque adder Apr 21, 2025, 8:46 PM

#

you have common traits with them

#

restriction of speech

worthy thunder Apr 21, 2025, 8:47 PM

#

OpenAI-MRCR results on Llama 4: https://x.com/DillonUzar/status/1914415635582607770

Llama 4 Scout performs similar to GPT-4.1 Nano at higher context lengths.
Llama 4 Maverick is similar to (but slightly underperforms) GPT-4.1 Mini.

I ran these just in case ppl needed it. It's probably not a top priority for people, but sharing nonetheless.

Enjoy.

Update to benchmark setup - Noticed various models had some missing test results due to various server errors returned, or oddities in API outputs. Also some endpoints didn't support candidate outputs, so some models were missing multiple runs to smooth the output. Fixed those and reran most models, and confirmed all tests completed successfully except for those that exceeded model limits. Certain models have seen a decent change in results (see tables). Notably Gemini 2.5 Flash (thinking enabled) seemed to have been lucky with the original results, and now more in-line with what I was expecting.

Grok 3 results should be next, and hopefully ready in a few hours. It's been surprisingly difficult to run them without server timeout errors (almost behaves like some kind of throttling).

Any other models people are interested in?

small haven Apr 21, 2025, 8:47 PM

#

lmao everyone just shut up

tall summit Apr 21, 2025, 8:47 PM

#

worthy thunder OpenAI-MRCR results on Llama 4: https://x.com/DillonUzar/status/1914415635582607...

i appreciate it

#

i look forward to your charts now

small haven Apr 21, 2025, 8:48 PM

#

we got unskippable ads in discord

tall summit Apr 21, 2025, 8:48 PM

#

my ass

small haven Apr 21, 2025, 8:49 PM

#

worthy thunder OpenAI-MRCR results on Llama 4: https://x.com/DillonUzar/status/1914415635582607...

can you benchmark o1 pro

worthy thunder Apr 21, 2025, 8:51 PM

#

small haven can you benchmark o1 pro

Always the super expensive models with everyone 😂
I'll see about adding it to the TODO. Have to budget based on what my company is willing to cover

opaque adder Apr 21, 2025, 8:51 PM

#

y?

calm sequoia Apr 21, 2025, 8:51 PM

#

worthy thunder OpenAI-MRCR results on Llama 4: https://x.com/DillonUzar/status/1914415635582607...

Crazy how good 2.5 pro is

small haven Apr 21, 2025, 8:52 PM

#

worthy thunder Always the super expensive models with everyone 😂 I'll see about adding it to ...

i mean its unlimited (kind of) thru chatgpt

balmy mist Apr 21, 2025, 8:53 PM

#

opaque adder y?

he gonna pull up on you

worthy thunder Apr 21, 2025, 8:53 PM

#

small haven i mean its unlimited (kind of) thru chatgpt

Unfortunately the benchmark sets up a history of chat messages between the user and the LLM before asking the benchmark question, so I'd need the ability to set up what was said both from the user and the model.
Plus ChatGPT has some system instructions and tooling behind the scenes which can impact the results. Too many uncontrolled variables :/

small haven Apr 21, 2025, 8:58 PM

#

fair enough

native shoreBOT Apr 21, 2025, 9:21 PM

#

dynoSuccess runo001 has been warned.

rapid merlin Apr 21, 2025, 9:28 PM

#

upper wolf are you on mobile

No, i was doing some debugging on pc

#

cancelled the openai sub for claude, he aced it 0 shot

#

without any laziness

torn mantle Apr 21, 2025, 9:58 PM

#

small haven lmao everyone just shut up

can i say it

#

its a

#

ss

ocean vortex Apr 21, 2025, 10:14 PM

#

worthy thunder OpenAI-MRCR results on Llama 4: https://x.com/DillonUzar/status/1914415635582607...

lmao. This potentially could have something to do with llama4 having so little activated parameters/experts tbh

#

total size offsets it but not nearly enough

ocean vortex Apr 21, 2025, 11:13 PM

#

Probably something from HLE or simple-bench test set. o4-mini had big gains over o3-mini there. Though I don’t have anything specific without testing

balmy mist Apr 21, 2025, 11:16 PM

#

Has anybody used O3 within the API for OpenAI? I'm scared that I might get wrecked by cost.

balmy mist Apr 21, 2025, 11:46 PM

#

Yo, does anybody have any more o3 requests? I just ran out until like two days and I have a pressing request for my root code. Custom modes, I needed to be consolidated and I want to use o3.

#

Oh, wo I just saw your comment. Actually, I might try it. I'm gonna try it and hopefully I don't get wrecked.

#

grok 3 it's booty, bro, I'ma be honest. Like, for complicated, high-level tasks, even Gemini isn't as good, but I would rather use o3 for really high-level reasoning. Like, nothing is on its level.

#

nahh i need the best of the best lmaoo

#

Oh, that's actually smart. I didn't think about that. Instead of paying $200, you basically paying $60 wow. Or not even $60, maybe just $20. Maybe just $40 all you need. You just rotate. When you run out with that one, you just use the other one. That is genius, bro. You are-- oh my gosh. You're playing next level chess.

#

lol i used super whisper to dictate that so it sounds a bit off lmaoo

#

I turned on the data sharing but I don't see the free tokens. Like I said, it was a message for free tokens. I don't see it after I enabled it.

small haven Apr 22, 2025, 1:31 AM

#

Share a chatgpt pro with 3 friends >

keen beacon Apr 22, 2025, 1:43 AM

#

opaque adder ai cant do backend for its life

depends

#

on roblox studio gemini really understands both backend and frontend

#

i'd say it needs more help on frontend because it doesnt fully comprehend 3d shapes and connecting colors to colors etc

#

i always start my prompts with

#

"always use module scripts, start with firstly your module loader, then your modules, create the core aspect of the game highly optimized, keep in mind servers will be full so your script scalability should change, etc"

keen beacon Apr 22, 2025, 2:28 AM

#

cant u disable it?

small haven Apr 22, 2025, 2:46 AM

#

Who tf care about memory loool

#

Its more of a gimmick

languid forum Apr 22, 2025, 3:16 AM

#

Hey everyone, we are xanthorax Ai , Mods please drop me a Pm we want to get bench marked 🦹‍♂️

#

does anyone know how to get darkbert if you dont have a non prof org ? am dying to get my hands on it

balmy mist Apr 22, 2025, 3:54 AM

#

to use o3 in the api yall had to identify your org with the persona app?

#

they doing a lot lol

kind cloud Apr 22, 2025, 4:01 AM

#

Claybrook always returns an API error now.

balmy mist Apr 22, 2025, 4:04 AM

#

kind cloud Claybrook always returns an API error now.

sad day

keen beacon Apr 22, 2025, 4:05 AM

#

ocean vortex that's not how it works, there's no code involved to break. You simply add these...

i dont think this is a fine tuning thing

kind cloud Apr 22, 2025, 4:07 AM

#

kind cloud Claybrook always returns an API error now.

I hope Google makes something happens today

#

I think I saw a similar situation before Google released gemini-2.0-flash or something like that.

#

don't trust me

small haven Apr 22, 2025, 5:30 AM

#

ok yea 50 o3 reqs/week + memory > unlimtied o3 reqs !!

calm sequoia Apr 22, 2025, 6:52 AM

#

Sadly, the pro didn't got into the benchmarks last time due to price issues. I wonder what would be the ELO.

#

Is there any difference in 2.5 PRO performance with and without subscription?

keen beacon Apr 22, 2025, 7:07 AM

#

the gemini product has a super long degrading sys prompt i believe

#

u should use it on aistudio

#

it doesnt matter at all on aistudio whether ur subbed or not

ocean vortex Apr 22, 2025, 7:15 AM

#

calm sequoia Sadly, the pro didn't got into the benchmarks last time due to price issues. I w...

it was tested on several benchmarks. On most of them it wasn't impressive lol

calm sequoia Apr 22, 2025, 7:17 AM

#

ocean vortex it was tested on several benchmarks. On most of them it wasn't impressive lol

You mean the o1-pro? I guess it is for specialized logic tasks which is not what most of the lmarena user's need

keen beacon Apr 22, 2025, 7:17 AM

#

keen beacon it doesnt matter at all on aistudio whether ur subbed or not

on aistudio its free and basically unlimited they train on ur data tho just a disclaimer just in case lol

calm sequoia Apr 22, 2025, 7:18 AM

#

The free version of chatgpt seems nerfed, that's why I'm asking for gemini

keen beacon Apr 22, 2025, 7:18 AM

#

calm sequoia The free version of chatgpt seems nerfed, that's why I'm asking for gemini

2.5 pro is way better

#

than gpt 4.1 on free chatgpt

calm sequoia Apr 22, 2025, 7:19 AM

#

That's a fact

#

I have subscription on chatgpt though. May switch to gemini if they continue the pace

#

It seems that claybrook likes to draw penises instead of what I'm asking. Very human-like behavior.

keen beacon Apr 22, 2025, 7:21 AM

#

Lmao

#

what did u ask

#

Lmao 🤣

keen beacon Apr 22, 2025, 8:05 AM

#

do u have a link

#

im interested

cedar tide Apr 22, 2025, 9:45 AM

#

Tomay good ?

hardy pecan Apr 22, 2025, 10:11 AM

#

its ~gemini 2.5 level i felt, like dragontail etc

leaden thunder Apr 22, 2025, 10:23 AM

#

Is there an API to get leaderboard data?

balmy mist Apr 22, 2025, 10:35 AM

#

https://x.com/btibor91/status/1914628025175495051

Tibor Blaho (@btibor91) on X

GPT-4.1 models - "Nano is obviously a new pre-train. We also have a new pre-train for Mini. And then the larger version is a new mid-train."

Improved ChatGPT memory (Moonshine - "the dreaming feature") - "Right now, the dreaming feature, we kind of have some of these memories

#

wild its funny they are calling it dreaming

keen beacon Apr 22, 2025, 10:38 AM

#

balmy mist https://x.com/btibor91/status/1914628025175495051

4.1 mini seems to have haziness wrt recent events after oct 2023, i guess they didnt curate the pretraining dataset on more recent events for 4.1 mini

balmy mist Apr 22, 2025, 10:40 AM

#

yeah i stopped using mini, its a good model but flash is just a better model

#

https://x.com/altryne/status/1914421814455099680

Alex Volkov (Thursd/AI) (@altryne) on X

HOLY CRAP, a new super tiny 1.6B param voice model just dropped that seems to.. outperform 11labs!? 😵‍💫

From Nari-labs, Dia is an Apache 2.0 voice model, that can generate laughs, sniffs and emotions, copy an existing voice and is effectively real time on larger GPUs:

#

srry for spam but this is kinda crazy

#

i could be late tho

tall summit Apr 22, 2025, 10:54 AM

#

balmy mist https://x.com/altryne/status/1914421814455099680

hahahaha what the hell

cedar tide Apr 22, 2025, 11:27 AM

#

I gave Gemini the opportunity to think in the middle of his answers

Screenshot_2025-04-22-13-26-10-805_com.android.chrome-edit.jpg

brittle tiger Apr 22, 2025, 11:49 AM

#

cedar tide I gave Gemini the opportunity to think in the middle of his answers

What was prompt to do this?

calm sequoia Apr 22, 2025, 12:39 PM

#

tall summit Apr 22, 2025, 1:01 PM

#

calm sequoia

is this asking "What will the highest ARC-AGI score be at the end of 2025?"

calm sequoia Apr 22, 2025, 1:01 PM

#

Obviously

tall summit Apr 22, 2025, 1:02 PM

#

calm sequoia Obviously

it's just 3 words

#

nothing is obvious about it

#

you removed almost all context

#

so i wanted to clarify

calm sequoia Apr 22, 2025, 1:02 PM

#

Answer "Skibidi" if you lack context

tall summit Apr 22, 2025, 1:03 PM

#

context as in what the question is saying

#

i know what arc-agi is

oblique flint Apr 22, 2025, 1:03 PM

#

objectively 15+ should be the most correct answer lol, I mean 90% is still 15+

tall summit Apr 22, 2025, 1:04 PM

#

lol

tall summit Apr 22, 2025, 2:14 PM

#

nobody knows what the poll really is about...

brittle tiger Apr 22, 2025, 2:25 PM

#

Anyone know why epoch never posted 2.5 scores for FrontierMath?

narrow elbow Apr 22, 2025, 3:16 PM

#

OpenAI Five is also interesting

leaden meteor Apr 22, 2025, 3:28 PM

#

Are we expecting any new announcment today from Google? I remember seeing someone posting about APRIL 22 placeholder for something...

cedar tide Apr 22, 2025, 3:40 PM

#

leaden meteor Are we expecting any new announcment today from Google? I remember seeing someon...

"leaker as featured in techcrunch"
https://x.com/chatgpt21/status/1914556100906545248?t=kmNaTJqXwI2ROk3y-Q2EoQ&s=19

Chris (@chatgpt21) on X

If you have no reason to be excited Google is shipping today 🙂

novel flame Apr 22, 2025, 3:54 PM

#

Possibly hot take: Gemini 2.5 Pro is very good, but it’s not as good as 3.7 Sonnet or indeed o3. You might have had a better time (and a lighter wallet) with Sonnet

#

Correct. We have magnificently good yappers, but not actual intelligence.

#

I mean, technically you can pay with crypto on OpenRouter

oblique flint Apr 22, 2025, 4:04 PM

#

you dont have a credit card?

novel flame Apr 22, 2025, 4:04 PM

#

No argument there; as someone who bought BTC in 2013 the scheme worked out well for me. But you don’t have to buy crypto to speculate or gamble with your life savings, you can also just buy it to pay for goods and services as an alternative to PayPal.

oblique flint Apr 22, 2025, 4:05 PM

#

soon enough I think all the free goodies will cease to exist. Stuff like google ai studio, cursor, windsurf etc are just burning money right now

oblique flint Apr 22, 2025, 4:24 PM

#

what is that list of countries lol

#

I am as well, in nl. I got a 'student' credit card a couple years ago just to make subscriptions for stuff like ai services easier. Although atm Im only subbed to cursor

novel flame Apr 22, 2025, 4:26 PM

#

Me neither. Scandinavia here 👋

oblique flint Apr 22, 2025, 4:27 PM

#

I went from gpt 4 turbo subscription to claude 3 opus / 3.5 sonnet, then I went to cursor since ai studio is good enough for general non coding use, and just having ai integrated in the ide directly makes coding so much faster

novel flame Apr 22, 2025, 4:31 PM

#

Don’t play much; don’t have too much time between work and kids and secret plans for world domination

#

Played or built 😉

balmy mist Apr 22, 2025, 4:42 PM

#

Is it true that Google is shipping today?

keen beacon Apr 22, 2025, 4:44 PM

#

balmy mist Is it true that Google is shipping today?

coding model maybe , or image gen

balmy mist Apr 22, 2025, 4:44 PM

#

lol idk bro, im finding tweet

novel flame Apr 22, 2025, 4:46 PM

#

I toyed with building something similar to this: https://youtu.be/1dSJ1oIBWCw?si=W01tllbd9lQvlAFU

YouTube

OrcDev

Next.js project Textual Games to Visual Novel: Implementation of DA...

⚔️ Join The Horde

Discord: https://discord.com/invite/uFB5YzH9YG
Github: https://github.com/TheOrcDev

In this video, we'll explore the exciting world of textual games and how they are being evolved into visual novels. We'll also dive into the technical aspects of implementing DALL-E-3, a new AI technology that allows for more realistic ...

▶ Play video

balmy mist Apr 22, 2025, 4:46 PM

#

this:
https://x.com/kimmonismus/status/1914645157384798279

Chubby♨️ (@kimmonismus) on X

Ok what? Google is shipping today? :O

#

i been waiting for have SOTA models in games so badly

#

im not buying a game until we have that

#

cant we use deepseek models in games?

fleet lintel Apr 22, 2025, 4:47 PM

#

balmy mist this: https://x.com/kimmonismus/status/1914645157384798279

Trolling? nothing i have seeen from Logan or other Google folks ?

balmy mist Apr 22, 2025, 4:47 PM

#

dont say that 😦

#

i was so excited

#

i started my morning real good bc of it

fleet lintel Apr 22, 2025, 4:49 PM

#

i think nothing interesting will be announced by google for couple of weeks..

#

i am waitin for deepseek r2

keen beacon Apr 22, 2025, 4:49 PM

#

balmy mist this: https://x.com/kimmonismus/status/1914645157384798279

when do they drop and where ? live stream ?

balmy mist Apr 22, 2025, 4:56 PM

#

in our dreams

#

everybody saying the same thing:
https://x.com/apples_jimmy/status/1914719405801660864

Jimmy Apples 🍎/acc (@apples_jimmy) on X

Plus users need higher o3 limits.

Tool use + search with o3 feels like magic for my use cases, coding is still 2.5 pro for me but everything else it’s o3.

#

im really gonna just get another account

#

so that 100 rpw fr $40

#

thats actually not bad, yall know if we have token limits for o3 on plus?

#

also whats the context window and token limit per message?

unborn ocean Apr 22, 2025, 5:15 PM

#

Really interesting paper (and also a quick read)

#

📎 The_Era_of_Experience_Paper.pdf

keen beacon Apr 22, 2025, 5:29 PM

#

balmy mist so that 100 rpw fr $40

i suggest sharing pro plan with 1-2 others, 65-100$

cedar tide Apr 22, 2025, 5:58 PM

#

https://x.com/lmarena_ai/status/1914737052144558512?t=jzsl4iZrOB6z-pnUOBG2oA&s=19

lmarena.ai (formerly lmsys.org) (@lmarena_ai) on X

New Arena launch: Sentiment Control - decoupling the impact of tone and emotion from response quality in human evaluation💗

How much do emojis, enthusiasm, and positive sentiment affect human preference? How can we adjust the leaderboard to counteract the effect of

balmy mist Apr 22, 2025, 6:03 PM

#

unborn ocean Really interesting paper (and also a quick read)

lmaoo quick read

#

quick read is a paragraph my guy

#

but its okay, i got ai to tdlr

balmy mist Apr 22, 2025, 6:03 PM

#

keen beacon i suggest sharing pro plan with 1-2 others, 65-100$

oh yeah you did say that yesterday

#

we should make one for this chat, if we can get 20 people we each put in $10 and make a new gmail that a mod can control or sum

#

this will really test how unlimited the requests are lol

novel flame Apr 22, 2025, 6:11 PM

#

cedar tide https://x.com/lmarena_ai/status/1914737052144558512?t=jzsl4iZrOB6z-pnUOBG2oA&s=1...

You know, I had completely forgot about Gemma 3 until yesterday, when I did a quick benchmark of the cheapest non-free models on OpenRouter and it came second after Gemini 2.0 Flash!

cedar tide Apr 22, 2025, 6:13 PM

#

novel flame You know, I had completely forgot about Gemma 3 until yesterday, when I did a qu...

Have you tried llama Maverick? and gemini 2.5 flash without reasoning?

unborn ocean Apr 22, 2025, 6:27 PM

#

balmy mist quick read is a paragraph my guy

its like 8 pages of some text in a decent font size

#

read so many papers that reading this takes like 3 min at most (although I am arguably not very thorough)

keen beacon Apr 22, 2025, 6:31 PM

#

balmy mist we should make one for this chat, if we can get 20 people we each put in $10 and...

Cmon that will get flagged , but 3 people i would say should be in the clear

mellow frigate Apr 22, 2025, 6:40 PM

#

Are there any benchmark results for gemini 2.5 flash non thinking?

cedar tide Apr 22, 2025, 6:50 PM

#

Here

cedar tide Apr 22, 2025, 6:50 PM

#

mellow frigate Are there any benchmark results for gemini 2.5 flash non thinking?

☝️

mellow frigate Apr 22, 2025, 6:51 PM

#

Ah nice find

cedar tide Apr 22, 2025, 6:53 PM

#

mellow frigate Are there any benchmark results for gemini 2.5 flash non thinking?

artificial analysis is also coming

keen fulcrum Apr 22, 2025, 6:57 PM

#

Any new models this week
r2 and qwen 3?

keen ferry Apr 22, 2025, 6:57 PM

#

keen fulcrum Any new models this week r2 and qwen 3?

i hope for r2

#

deepseek servers are gonna be dead forever

novel flame Apr 22, 2025, 6:58 PM

#

cedar tide Have you tried llama Maverick? and gemini 2.5 flash without reasoning?

I only tested Scout in this one because I was going for the cheapest (though TBF Maverick is cheap enough that it should have been included). Gem2.5 Flash failed on one of the tests so it ended up in 4th place.

cedar tide Apr 22, 2025, 6:59 PM

#

novel flame I only tested Scout in this one because I was going for the cheapest (though TBF...

2.0 better ?

#

You have you classement ?

keen fulcrum Apr 22, 2025, 6:59 PM

#

It will be quiet a while when google drops gemini 3.

novel flame Apr 22, 2025, 6:59 PM

#

cedar tide You have you classement ?

I'm putting together a sheet now, will share after

cedar tide Apr 22, 2025, 7:00 PM

#

novel flame I only tested Scout in this one because I was going for the cheapest (though TBF...

It would be good if you would try maverick

cedar tide Apr 22, 2025, 7:00 PM

#

novel flame I'm putting together a sheet now, will share after

Thx

harsh flume Apr 22, 2025, 7:05 PM

#

Little bit of a tangent

#

But I was thinking on how LLama gamed lmarena

#

by optimizing for human stylistic preference, which solved for a lot of emoji use

#

and wondered if the same would then apply let's say on dating apps chats

#

I mean, that a higher emoji style of conversation would gather favor since that can be inferred on human preference when it comes to voting ai

#

hah that's interesting

#

ive often avoided it under the impression that it'd give a 'boyish' feeling to the message instead of a man's one

#

I personally prefer non emoji responses from AI but I don't think at all my preferences fit the norm

#

Yea it's a good point

#

I mean the result speaks for itself

#

I rather disagree on the agreeable part

#

no pun intended lol

#

Yea I know what you mean but from anecdotal experience I see the opposite. There's a fine line

#

Being rather disagreeable can create some tension that then sets up for a release when the tension is diffused and that emotional roller coaster is enticing

#

that's the whole banter play

#

esp in male-female dynamics

#

but I think it extends besides that as well

#

I read the study you sent and the problem is that on face value it doesn't mean much. It could always be that people higher in extroversion would communicate with more emoji and these people would also be the ones naturally engaging in more interpersonal connection

blazing rune Apr 22, 2025, 7:19 PM

#

@ocean vortex What are your thoughts on GPT-4.1? I'm wondering about nano, Mini, and the full one, but especially the full one

ocean vortex Apr 22, 2025, 7:20 PM

#

blazing rune <@514836230802898954> What are your thoughts on GPT-4.1? I'm wondering about nan...

full one is very good

blazing rune Apr 22, 2025, 7:20 PM

#

because it's decently close to sonnet even in areas that sonnet is best at, at least according to benchmarks

#

and it's considerably cheaper, right?

ocean vortex Apr 22, 2025, 7:20 PM

#

don't do web design or visuals with it (coding visuals)

#

but other than that it\s great

blazing rune Apr 22, 2025, 7:21 PM

#

the per token price is a lot better and I think it has a more efficient tokenizer than Claude, but I could be wrong

blazing rune Apr 22, 2025, 7:21 PM

#

ocean vortex don't do web design or visuals with it (coding visuals)

oh?

#

I thought they made it way better at that

keen beacon Apr 22, 2025, 7:22 PM

#

cute

ocean vortex Apr 22, 2025, 7:22 PM

#

blazing rune I thought they made it way better at that

better but it's still nowhere near 3.7 sonnet in this specific aspect tbh

#

it's much improved but still the same size as gpt4o

#

@blazing rune 4.1:

#

3.7 sonnet:

novel flame Apr 22, 2025, 7:28 PM

#

A quick benchmark of small (cheap) models only. Note: the last 3 tests I only ran on the top models.

I was extremely surprised to see Gemma 3 punch so far above its weight (cost) here.

ocean vortex Apr 22, 2025, 7:33 PM

#

as for o3-high vs 3.7 sonnet-thinking....

o3-high:

#

3.7-thinking:

N2zUaAAAECBAgQIECAAAECBAoKiB0Fl25kAgQIECBAgAABAgQIECCQWUDsyLxdsxEgQIAAAQIECBAgQIAAgYICYkfBpRuZAAECBAgQIECAAAECBAhkFvgXOfXECGKYS6wAAAAASUVORK5CYII.png

#

there are still differences and the winner for this is clear but it looks like reasoning helps quite a bit with openai models

keen beacon Apr 22, 2025, 7:35 PM

#

novel flame A quick benchmark of small (cheap) models only. Note: the last 3 tests I only ra...

love to see self made benchmarks.

balmy mist Apr 22, 2025, 7:35 PM

#

https://x.com/arcprize/status/1914758993882562707

ARC Prize (@arcprize) on X

o3 and o4-mini on ARC-AGI's Semi Private Evaluation

* o3-medium scores 53% on ARC-AGI-1
* o4-mini shows state-of-the-art efficiency
* ARC-AGI-2 remains virtually unsolved (<3%)

Through analysis we highlight differences from o3-preview and other model behavior

#

dont know fi yall saw yet

keen beacon Apr 22, 2025, 7:35 PM

#

novel flame A quick benchmark of small (cheap) models only. Note: the last 3 tests I only ra...

hope you can extend to o3 and o4 too

zinc ore Apr 22, 2025, 7:36 PM

#

ARC 1 will soon be saturated

balmy mist Apr 22, 2025, 7:36 PM

#

weird they didnt do the high versions

thorny drum Apr 22, 2025, 7:37 PM

#

yeah are they still testing the high versions?

#

or are they just not planning on testing them at all

balmy mist Apr 22, 2025, 7:37 PM

#

i dont think so

#

but they will test o3 pro tho

#

why not give us an o4-mini pro as well?

thorny drum Apr 22, 2025, 7:39 PM

#

mini pro 😭

keen beacon Apr 22, 2025, 7:40 PM

#

balmy mist https://x.com/arcprize/status/1914758993882562707

OpenAI is so full of bs in their marketing. They showed the o3-preview benchmarks only to deliver much weaker o3

#

my guess is that only o3-pro will get 80%+ on ARC-1

#

and from someone working on the ARC-2 benchmark , he stated that ARC-2 score for o3-pro should be anywhere from 10% to 20%

#

Pretty good considering o3 preview high costed thousands per task

balmy mist Apr 22, 2025, 7:42 PM

#

keen beacon Pretty good considering o3 preview high costed thousands per task

https://x.com/arcprize/status/1912567067024453926

ARC Prize (@arcprize) on X

Clarifying o3’s ARC-AGI Performance

OpenAI has confirmed:

* The released o3 is a different model from what we tested in December 2024

* All released o3 compute tiers are smaller than the version we tested

* The released o3 was not trained on ARC-AGI data, not even the train

#

i still wanted to be able to play with that version

balmy mist Apr 22, 2025, 7:43 PM

#

keen beacon my guess is that only o3-pro will get 80%+ on ARC-1

hopefully

sage raptor Apr 22, 2025, 7:43 PM

#

100+ dollars for each question

keen beacon Apr 22, 2025, 7:43 PM

#

balmy mist https://x.com/arcprize/status/1912567067024453926

Ya ik

ocean vortex Apr 22, 2025, 7:44 PM

#

keen beacon OpenAI is so full of bs in their marketing. They showed the o3-preview benchmark...

I'm kinda surprised it scored as high as it did to be completely honest

#

Like I do not see improvement in spatial reasoning equivalent to this. But they probably hyperfocused exactly on what is being tested there

keen beacon Apr 22, 2025, 7:46 PM

#

ocean vortex Like I do not see improvement in spatial reasoning equivalent to this. But they ...

They didn't really train for it tho

ocean vortex Apr 22, 2025, 7:46 PM

#

as for o3-preview... we already knew majority voting and similar systems help a ton in this benchmark which that model was (close to pro). And yeah I completely agree that they were misleading lol

keen beacon Apr 22, 2025, 7:47 PM

#

They included the train set for o3 preview where you're specifically allowed to do that. It's a small number of questions. They didn't for the released o3

#

It was mainly the compute that got that score

ocean vortex Apr 22, 2025, 7:50 PM

#

keen beacon They included the train set for o3 preview where you're specifically allowed to ...

I think it's safe to say everyone includes everything in training data they can get away with. But they do not overfit for everything though - that wouldn't be possible

small haven Apr 22, 2025, 7:53 PM

#

day 7 and still no o3 pro

ember rapids Apr 22, 2025, 7:54 PM

#

balmy mist https://x.com/arcprize/status/1914758993882562707

not bad given the cost reduction lmao. i wonder how o3 pro will do

keen beacon Apr 22, 2025, 7:58 PM

#

keen beacon Pretty good considering o3 preview high costed thousands per task

It wasn't thousands for v1

#

Per task. The benchmark costed thousands to run tho

novel flame Apr 22, 2025, 8:00 PM

#

cedar tide It would be good if you would try maverick

I tried Maverick and it scored 15, a joint 7th spot, despite being the most expensive model of them all. Maverick hallucinates a LOT.

ocean vortex Apr 22, 2025, 8:00 PM

#

ocean vortex better but it's still nowhere near 3.7 sonnet in this specific aspect tbh

I probably said this a bit too strong looking more into it now... it's somewhat worse but not "nowhere near" anymore, quite significant improvements in webdev arena:

keen beacon Apr 22, 2025, 8:01 PM

#

keen beacon Per task. The benchmark costed thousands to run tho

I read somewhere that ARC AGI V2 costed thousands per task on o3 high preview tho. Confused it with that 🤔

ocean vortex Apr 22, 2025, 8:02 PM

#

then arc-agi and simple-bench notable improvements for o3 as well which is using the new base model. Both of these benchmarks need spatial awareness too, even though in different ways

balmy mist Apr 22, 2025, 8:02 PM

#

claude is so good

#

like its actually crushing the rest lol

#

but where is our baby NW?

#

no love the true champ

novel flame Apr 22, 2025, 8:03 PM

#

A quick benchmark of small (cheap)

ocean vortex Apr 22, 2025, 8:04 PM

#

balmy mist like its actually crushing the rest lol

it's the best for spatial awareness and relating coding tasks, but other than that... it's behind competition for most other tasks (including coding not based on visuals) tbh

keen fulcrum Apr 22, 2025, 8:20 PM

#

ocean vortex I probably said this a bit too strong looking more into it now... it's somewhat ...

It would be great if it gets access to supabase

torn mantle Apr 22, 2025, 8:24 PM

#

balmy mist https://x.com/arcprize/status/1914758993882562707

pretty good

#

tbh

#

as i said before, o3 felt like a huge leap over other models

#

its really different

balmy mist Apr 22, 2025, 8:28 PM

#

im about to just buy pro now

#

i was using 4.5 recently and its better than i thought lol

#

and its emotional iq is good

#

been using it for some of my convos with people

#

and way better any other model

#

still does some cringe stuff sometimes but for the most part its valid

#

lmaoo

#

true but if you getting o3 unlimited and o3 pro and unlimited image gen and 4.5

#

and sora

#

even tho sora kinda cheeks

#

$200 is a fair price

#

they are forcing you to pay that at this point

#

i just wish 4.5 was faster

warped estuary Apr 22, 2025, 8:39 PM

#

balmy mist true but if you getting o3 unlimited and o3 pro and unlimited image gen and 4.5

Is that with the o3 api ? For coding with cline for example

calm sequoia Apr 22, 2025, 8:39 PM

#

calm sequoia

poll_question_text

ARC AGI 2025

victor_answer_votes

5

total_votes

14

victor_answer_id

1

victor_answer_text

90% +

victor_answer_emoji_name

👀

small haven Apr 22, 2025, 8:46 PM

#

at least +1 million ppl have a pro membership, so ur losing against them by not having it

#

its a fact tho

torn mantle Apr 22, 2025, 8:47 PM

#

@keen beacon what type of reasoning effort does o3 has in lmarena?

#

is it o3-low?

small haven Apr 22, 2025, 8:49 PM

#

one of them is me

#

ok o3 says 75k; i have been debunked

balmy mist Apr 22, 2025, 8:56 PM

#

https://x.com/btibor91/status/1914785011771040098

Tibor Blaho (@btibor91) on X

According to The Information, OpenAI executive Nick Turley testified that OpenAI would consider buying Google’s Chrome browser if the court forces Google to sell it as part of remedies proposed by the DOJ against Google's alleged illegal search monopoly

- Turley testified that

brittle tiger Apr 22, 2025, 9:01 PM

#

small haven ok o3 says 75k; i have been debunked

I wonder how many will sign up for the 20k per month offering

small haven Apr 22, 2025, 9:02 PM

#

brittle tiger I wonder how many will sign up for the 20k per month offering

maybe me, if im sharing the acc with 100 other people

tall summit Apr 22, 2025, 9:12 PM

#

it doesnt know anything regarding strategy about so many games besides using the internet

keen beacon Apr 22, 2025, 9:14 PM

#

brittle tiger I wonder how many will sign up for the 20k per month offering

how many companies ..

keen beacon Apr 22, 2025, 9:14 PM

#

torn mantle <@456226577798135808> what type of reasoning effort does o3 has in lmarena?

I assume it's set to the default which is medium but I'm not sure

elder rapids Apr 22, 2025, 9:44 PM

#

balmy mist true but if you getting o3 unlimited and o3 pro and unlimited image gen and 4.5

I don't think it's a fair price, if I can get 2.5 pro unlimited, the best video gen currently for free for limited requests, unlimited image gen (although not as good), and an AI like 2.5 that can imitate 4.5, all for free

#

then I'd use Google

#

what I would buy though

#

it's all the cumulative tools and search things

#

combined with o3

#

convenience to use 4.5

elder rapids Apr 22, 2025, 9:45 PM

#

torn mantle <@456226577798135808> what type of reasoning effort does o3 has in lmarena?

if it's not specified I'm pretty sure its gonna be medium

viral sky Apr 22, 2025, 9:49 PM

#

hey all, maybe a weird question, but I just did an arena battle and really liked the output of one of the models and might be interested in using it in one of my projects. after voting, it tells me the model is called "claybrook" but I can't find any model by that name listed anywhere on the leaderboard, on hugging face, or even Google except for a reddit post from 3 days ago referring to it as an "experimental Google model," but not providing any link to more information. does anyone know where to find this?

balmy mist Apr 22, 2025, 9:49 PM

#

elder rapids I don't think it's a fair price, if I can get 2.5 pro unlimited, the best video ...

Not 4.5 or o3

#

And best image gen is OpenAI

#

And then o3 pro

elder rapids Apr 22, 2025, 9:50 PM

#

balmy mist Not 4.5 or o3

I can successfully mimick 4.5 with 2.5 pro, and 2.5 pro >> o3 for general tasks, no risk of hallucinations either

balmy mist Apr 22, 2025, 9:50 PM

#

I have been testing the models recently and o3 is very useful if areas Gemini can’t come close

elder rapids Apr 22, 2025, 9:50 PM

#

i haven't come across this

#

o3 is often way worse in general tasks

balmy mist Apr 22, 2025, 9:51 PM

#

It’s good at reasoning

elder rapids Apr 22, 2025, 9:51 PM

#

and when they're closer 2.5 pro seems to gap when you ask it to take a step back, evaluate, and move farther

balmy mist Apr 22, 2025, 9:51 PM

#

And creativity, if you combo 4.5 and o3 you get very interesting results

elder rapids Apr 22, 2025, 9:51 PM

#

whereas o3 doesn't seem to really understand beyond its initial comprehension

balmy mist Apr 22, 2025, 9:52 PM

#

Yeah you can’t

#

Especially with the search and tooling in o3

elder rapids Apr 22, 2025, 9:52 PM

#

balmy mist And creativity, if you combo 4.5 and o3 you get very interesting results

ye but rarely substantive in my case, 4.5 struggles with genuine philosophical insight

balmy mist Apr 22, 2025, 9:52 PM

#

Gemini just doesn’t have that

elder rapids Apr 22, 2025, 9:52 PM

#

ie, doesn't know the relationship between established facts and progressive inquiry

#

and how to relate that to the current situation

#

I can lmao

#

you don't know what that means

#

go ahead and ask me

#

tho

balmy mist Apr 22, 2025, 9:53 PM

#

Hallucinations are good imo

#

Good creativity

elder rapids Apr 22, 2025, 9:53 PM

#

basic established philosophical concepts, or set theory, or category theory

balmy mist Apr 22, 2025, 9:53 PM

#

Gemini is a good general model that’s it, o3 is just a different type of model

elder rapids Apr 22, 2025, 9:53 PM

#

axioms

#

it's not external knowledge like literal facts

balmy mist Apr 22, 2025, 9:54 PM

#

I think you are promoting incorrectly

elder rapids Apr 22, 2025, 9:54 PM

#

that you think I'm talking about

balmy mist Apr 22, 2025, 9:54 PM

#

How many tests have you ran?

elder rapids Apr 22, 2025, 9:54 PM

#

balmy mist How many tests have you ran?

as much as finding any reason to analyze a chat or passage, so a lot

#

and btw I started this stuff with o1 and 4o

#

not the Gemini family

#

I know their personality very well

#

and I can definitely get a lot out of them

#

but this is why I like the Gemini models so much

balmy mist Apr 22, 2025, 9:56 PM

#

Do you have plus?

elder rapids Apr 22, 2025, 9:56 PM

#

ye

balmy mist Apr 22, 2025, 9:56 PM

#

So you tried with 50 attempts?

elder rapids Apr 22, 2025, 9:56 PM

#

lmarena too lol

balmy mist Apr 22, 2025, 9:56 PM

#

On OpenAI platform it’s diff

#

You more than 50 bro

elder rapids Apr 22, 2025, 9:56 PM

#

?

balmy mist Apr 22, 2025, 9:57 PM

#

I used 2.5 pro for at least 2000 requests, how can you judge o3 from your factual prompts that only are 50

elder rapids Apr 22, 2025, 9:57 PM

#

because it's not only 50

balmy mist Apr 22, 2025, 9:57 PM

#

o3 behaves differently on the app

#

It’s obvious

#

Gemini is a good model, very solid

elder rapids Apr 22, 2025, 9:58 PM

#

this is exactly the opposite in my case lol

balmy mist Apr 22, 2025, 9:58 PM

#

But they are just diff

elder rapids Apr 22, 2025, 9:59 PM

#

ask Gemini to create its own philosophy of design and apply it

balmy mist Apr 22, 2025, 9:59 PM

#

I guess some people are better with o3 and some are better with Gemini

elder rapids Apr 22, 2025, 9:59 PM

#

and then ask o3 to create its own philosophy of design and apply it

#

to whatever you think should have sophistication

#

o3 just doesn't comprehend as much as 2.5 pro

balmy mist Apr 22, 2025, 10:00 PM

#

And also you have memory in chatgpt that’s another reason why using it on their app matters

elder rapids Apr 22, 2025, 10:00 PM

#

ye memory is fine

#

can't find a use for it tho

balmy mist Apr 22, 2025, 10:00 PM

#

Memory plus o3 plus tooling is cracked

elder rapids Apr 22, 2025, 10:00 PM

#

but I know eventually it's gonna be nice

#

but regardless

#

deadass I think 2.5 pro is just better

#

gpt models seem to be capped at their initial presentation

balmy mist Apr 22, 2025, 10:01 PM

#

I guess that’s your opinion

elder rapids Apr 22, 2025, 10:01 PM

#

same problem with 4o

balmy mist Apr 22, 2025, 10:01 PM

#

I like both

elder rapids Apr 22, 2025, 10:01 PM

#

you cant use the context as much to your advantage

balmy mist Apr 22, 2025, 10:01 PM

#

It’s worth the price to me and my use case

#

Yes you can

elder rapids Apr 22, 2025, 10:01 PM

#

and build its personality nearly as much

balmy mist Apr 22, 2025, 10:01 PM

#

That’s why memory matters lol

#

U gotta get creative with it

elder rapids Apr 22, 2025, 10:02 PM

#

balmy mist That’s why memory matters lol

that's not what I'm saying

#

ye

balmy mist Apr 22, 2025, 10:02 PM

#

Like I said that’s why memory matters

#

Yes it can

#

You guys are using it differently from me

elder rapids Apr 22, 2025, 10:02 PM

#

balmy mist Like I said that’s why memory matters

how is that relevant to its capacity

#

of doing so

#

ye

balmy mist Apr 22, 2025, 10:03 PM

#

Its damn near like micro fine tuning with the outputs that i am getting

elder rapids Apr 22, 2025, 10:03 PM

#

baseline o3 is better than 2.5

#

this is a fact

#

but after a few prompts, 2.5 pro can be improved like no other

balmy mist Apr 22, 2025, 10:04 PM

#

I guess we can agree to disagree, memory is amazing imo and it compliments o3 nicely for me

elder rapids Apr 22, 2025, 10:04 PM

#

ye, but with such a strong baseline like 2.5 pro, even being slightly weaker than o3

#

can be improved so much more

#

and that's so great

#

like in writing, how Claude can find nuances and balances

elder rapids Apr 22, 2025, 10:05 PM

#

elder rapids like in writing, how Claude can find nuances and balances

2.5 pro sees this

#

and can actively iterate through why and when it should do this

#

ye

elder rapids Apr 22, 2025, 10:06 PM

#

balmy mist I guess we can agree to disagree, memory is amazing imo and it compliments o3 ni...

I think as an ecosystem

#

gpt has to be better

#

for now

#

because veo 2 and 2.5 flash and stuff, and then canvas

#

but the tool usage

#

is just

#

godly in chatgpt

#

ye

#

but damn

#

having all that for free

#

for Gemini

balmy mist Apr 22, 2025, 10:07 PM

#

I was just saying why the $200 is fair based on what they providing

elder rapids Apr 22, 2025, 10:07 PM

#

ion know about that, deepmind prolly peeking the tool usage

tall summit Apr 22, 2025, 10:08 PM

#

elder rapids ye but rarely substantive in my case, 4.5 struggles with genuine philosophical i...

gemini 2.5 is shockingly good at this

elder rapids Apr 22, 2025, 10:08 PM

#

like apple vs Android

#

Android is just better now, but apple basically stole ts

balmy mist Apr 22, 2025, 10:08 PM

#

It technically cannot be completely reproduced based on what OpenAI has built in their app, especially it for unlimited usages

elder rapids Apr 22, 2025, 10:08 PM

#

it's the ecosystem

#

the design

#

the aesthetic of openAI

#

transcends Google

#

ye that's what I mean

#

that's what enables the analogy

balmy mist Apr 22, 2025, 10:09 PM

#

And what did Gemini score on arc?

#

I don’t even think they tested it

elder rapids Apr 22, 2025, 10:09 PM

#

def not

#

check any stat

#

ecosystem dependent

elder rapids Apr 22, 2025, 10:10 PM

#

tall summit gemini 2.5 is shockingly good at this

yep

#

the more you discuss with 2.5 pro

#

the more you realize the generalizing ability

#

is just so far ahead in it

#

it's crazy

#

makes me feel like deepmind truly understands "general intelligence"

tall summit Apr 22, 2025, 10:11 PM

#

i mean i asked it like one philosophical question but it found the main crux of it and related it to the actual mainstream philosophical ideas

elder rapids Apr 22, 2025, 10:11 PM

#

ye this is a consequence of that general ability

tall summit Apr 22, 2025, 10:12 PM

#

what a game changer for philosophy study because keeping up with every single thing is kind of impossible

#

thats why philosophy changes so much with culture

elder rapids Apr 22, 2025, 10:12 PM

#

ye but it's like, just pop that bubble of roboticness with a single prompt

tall summit Apr 22, 2025, 10:12 PM

#

i mean you should still read first party texts if its the kind of texts that benefit from being read

elder rapids Apr 22, 2025, 10:12 PM

#

whereas for gpt models you have to deadass go step by step

#

how it should act

#

examples for a respond

#

etc etc

balmy mist Apr 22, 2025, 10:13 PM

#

With a system prompt you can make any model act any way

#

That’s how I do it for Gemini and any other model I use

elder rapids Apr 22, 2025, 10:13 PM

#

balmy mist With a system prompt you can make any model act any way

not what I mean

balmy mist Apr 22, 2025, 10:13 PM

#

It’s gives you the same behavior

elder rapids Apr 22, 2025, 10:13 PM

#

this is why I gave the example of 4.5 tho

#

you said 2.5 pro can't act that way

#

I know what you mean

balmy mist Apr 22, 2025, 10:14 PM

#

I got Gemini to think it was human based on a system prompt

elder rapids Apr 22, 2025, 10:14 PM

#

but that's why you're wrong

elder rapids Apr 22, 2025, 10:14 PM

#

tall summit thats why philosophy changes so much with culture

ye

#

philosophy + AI becomes crazy

#

before, AI used to try hard to be inoffensive

#

to philosophical ideas

balmy mist Apr 22, 2025, 10:15 PM

#

System prompts are the key to llms

elder rapids Apr 22, 2025, 10:15 PM

#

and not try to touch things

elder rapids Apr 22, 2025, 10:15 PM

#

balmy mist System prompts are the key to llms

let me explain what I mean

#

given the analogy of a "wall"

#

bubble vs iron wall

#

there's a difference in how easy you can get through to something

#

whether it's any attempt to change at all

#

doesn't matter

#

it's how effortless and what entails that ease of pass

#

if 2.5 pro is the bubble that can be popped, little resistance, and gpt models can be the iron wall, that should be explanatory

balmy mist Apr 22, 2025, 10:17 PM

#

any model can be popped tho

elder rapids Apr 22, 2025, 10:17 PM

#

that's not the point

balmy mist Apr 22, 2025, 10:17 PM

#

thats why pliny can crack them all

#

that is the point, its all about the prompting at the end of the day

elder rapids Apr 22, 2025, 10:18 PM

#

balmy mist that is the point, its all about the prompting at the end of the day

yeah, so that's not the point lmao

balmy mist Apr 22, 2025, 10:18 PM

#

if you cant get a model to do something it does not mean someone else can tho

elder rapids Apr 22, 2025, 10:18 PM

#

if it's about prompting, and any model can be prompted to a certain way

#

then its an entirely different discussion here

balmy mist Apr 22, 2025, 10:18 PM

#

but you are saying you cant get certain behavior from gpt models

elder rapids Apr 22, 2025, 10:18 PM

#

balmy mist but you are saying you cant get certain behavior from gpt models

no?

balmy mist Apr 22, 2025, 10:19 PM

#

what is your argument?

elder rapids Apr 22, 2025, 10:19 PM

#

I can get o3 to act like gpt 4.5, with the help of 2.5 pro prompting it

balmy mist Apr 22, 2025, 10:19 PM

#

how are system prompts not important?

elder rapids Apr 22, 2025, 10:19 PM

#

but what if I wanted to get 2.5 pro to act like gpt 4.5

balmy mist Apr 22, 2025, 10:20 PM

#

you use prompting again

elder rapids Apr 22, 2025, 10:20 PM

#

yep

balmy mist Apr 22, 2025, 10:20 PM

#

so why isnt prompting the most importatn thing?

elder rapids Apr 22, 2025, 10:20 PM

#

no the difference is

balmy mist Apr 22, 2025, 10:20 PM

#

system prompt is just a type of prompt

elder rapids Apr 22, 2025, 10:20 PM

#

why do I need 2.5 pro to prompt o3 with me to give enough insight

#

for it to act a certain way

balmy mist Apr 22, 2025, 10:20 PM

#

bc prompting is a skill

elder rapids Apr 22, 2025, 10:20 PM

#

no no no

#

but why doesn't it go both ways

#

why can I prompt 2.5 pro to be like 4.5

#

without any help

balmy mist Apr 22, 2025, 10:21 PM

#

you can tho

#

you are chosing to

elder rapids Apr 22, 2025, 10:21 PM

#

choosing to what?

balmy mist Apr 22, 2025, 10:22 PM

#

to not use prompt without help, you can prompt with help and without help, it might be easier with help but the reality it goes both ways

elder rapids Apr 22, 2025, 10:22 PM

#

thats not what I'm saying tho

balmy mist Apr 22, 2025, 10:22 PM

#

and its all about how well yo ucan prompt

elder rapids Apr 22, 2025, 10:22 PM

#

eventually you'll get to a point with any LLM, so that it's favorable

#

but I'm talking about how it receives any quality of information

#

and how much it absorbs it

#

if I essentially need the help of 2.5 pro, to prompt engineer o3 with me, to act like 4.5

but I only need a single prompt for 2.5 pro to act like 4.5, especially WITHOUT the help of anything at all

#

what does that tell you

#

about o3, and about 2.5 pro

balmy mist Apr 22, 2025, 10:25 PM

#

i mean who cares? the magic is being able to morph and shape it with your prompts

#

i do not want it to be easy

#

i like that prompting is a skill

elder rapids Apr 22, 2025, 10:25 PM

#

but it is easy, with 2.5 pro

#

and more effective

#

yes?

balmy mist Apr 22, 2025, 10:25 PM

#

and its not easy to get anything you want from the model

elder rapids Apr 22, 2025, 10:25 PM

#

it is

#

with 2.5 pro

balmy mist Apr 22, 2025, 10:25 PM

#

thats gonna be the difference with people making money, it kinda is right now already

#

yeah but thats because you can easily insert a system prompt, its built to be shaped, o3 and openai models are not built for that

#

if that was the case they would allow us to insert system prompts

elder rapids Apr 22, 2025, 10:26 PM

#

balmy mist yeah but thats because you can easily insert a system prompt, its built to be sh...

yeah but that's my premise

#

you're acting like that's irrelevant

#

when that's the whole entire discussion

balmy mist Apr 22, 2025, 10:26 PM

#

im saying why cry about it

#

its still a good model

#

and top teir in reasoning

elder rapids Apr 22, 2025, 10:27 PM

#

yes

#

but when I can make 2.5 pro better

#

do you still not understand

brittle tiger Apr 22, 2025, 10:27 PM

#

I can't remember the last time i used a prompt for something important without an llm writing it for me

balmy mist Apr 22, 2025, 10:27 PM

#

and wen o3 pro drops the $200 will be fair like i said originally

balmy mist Apr 22, 2025, 10:27 PM

#

brittle tiger I can't remember the last time i used a prompt for something important without a...

fr

#

i have a system prompt that mkaes prompts or shapes my prompts for me

#

i already understand that talking to llms takes a lil skill

elder rapids Apr 22, 2025, 10:28 PM

#

balmy mist and wen o3 pro drops the $200 will be fair like i said originally

I disagree because that's not what makes pro valuable

balmy mist Apr 22, 2025, 10:28 PM

#

like a random person cant hop on any llm(with no experience) and get exactly what they want, usually there is misalignment

#

im saying all of it does

#

o3 pro, 4.5, 4o image, sora(which is ehh) and the ecosystem

elder rapids Apr 22, 2025, 10:29 PM

#

I'll get more out of a free unlimited 2.5 pro with 1m context, better answers, more intelligence, more flexibility

#

than o3 pro

balmy mist Apr 22, 2025, 10:29 PM

#

thats bc of how you prompt

elder rapids Apr 22, 2025, 10:29 PM

#

because, I won't just use the base model

balmy mist Apr 22, 2025, 10:29 PM

#

its higher reasoning, thats what you are getting

#

gemini is capped at reasoning

elder rapids Apr 22, 2025, 10:29 PM

#

hn

#

o1 pro is very good at initial context retention

#

by all means

#

but not progressive instructions

balmy mist Apr 22, 2025, 10:30 PM

#

no matter what you cant force higher reasoning, yeah there are some tricks, but you are not going to be able to mimic o3 pro from gemini no matter how you prompt bro

elder rapids Apr 22, 2025, 10:30 PM

#

balmy mist no matter what you cant force higher reasoning, yeah there are some tricks, but ...

deadass you can

balmy mist Apr 22, 2025, 10:30 PM

#

what

elder rapids Apr 22, 2025, 10:30 PM

#

not improving the reasoning process tho

balmy mist Apr 22, 2025, 10:30 PM

#

aii show me

#

i wanna learn

#

cause that is worth money

#

and if that is the case, why aren't more people doing that?

elder rapids Apr 22, 2025, 10:31 PM

#

I'll give an example

balmy mist Apr 22, 2025, 10:31 PM

#

please, Imma take notes lol

elder rapids Apr 22, 2025, 10:31 PM

#

let's say we ask a philosophical question

#

or actually nah

#

just a loaded question

#

"why is life painful"

#

what would you expect from the model

#

to respond with

#

"that's a loaded question, it has yadda yadda yadda"

#

right

#

but if you know other problems that you can actively tell it to demonstrate in the initial query

#

"'why is life painful' what kind of category error is this? is this a meaningful question 2.5 pro?"

#

ok now you've shifted it from an unintelligible premise and gave it enough context to respond with the level you're asking it for

#

baseline, it wouldn't have introduced the fact it's a category error to me

#

but informed me it was a loaded question

#

with 2.5 pro in my case, it's so good at improving with these hints and shifting its direction

#

it's able to apply its own developed philosophy/approach to these questions

brittle tiger Apr 22, 2025, 10:38 PM

#

it's not really a winnable debate. both are better for different things. o3 + tools is nuts for quick research. I get most value from 2.5 for my projects but it's lots of data and does better with good guidance. I think o3 is better for less crafted queries.

#

that's fair but the highest iq people I know wouldn't touch gemini before 2.5 and prefer it over everything now

elder rapids Apr 22, 2025, 10:39 PM

#

buying chrome? or successfully competing with it

#

they can't buy chrome

balmy mist Apr 22, 2025, 10:39 PM

#

my thing is why waste time trying to force that out of a model when you can just give it a system prompt prior? i noticed a huge improvement with models whne you apply a specific system prompt prior to guide/ground the convo, i still dont see how that is getting to o3 pro levels like on an arc test? if you prompt it in a certain way trying to get the answer you want defeats the purpose, o3 pro is prob gonna 0 shot prompts without the extra context and fluff

elder rapids Apr 22, 2025, 10:39 PM

#

that's impossible

#

but ye open AI is becoming a competitor

#

no? chrome isn't a subsidiary

#

lmao

brittle tiger Apr 22, 2025, 10:40 PM

#

openai would be extremely dumb to pay whatever chrome would cost

elder rapids Apr 22, 2025, 10:40 PM

#

huh?

#

chrome isn't a subsidiary

#

it's a product division of alphabet

#

Google literally could not allow that

#

it's fundemental to alphabet itself lmao

#

this won't happen

#

deadass, read the specific claims being made

brittle tiger Apr 22, 2025, 10:41 PM

#

it's the judges call. the DOJ case makes very little sense. it had nothing to do with chrome. but judge could force it eventually years from now after final appeals

elder rapids Apr 22, 2025, 10:41 PM

#

they never proved it

#

and that likely won't be the main focus anymore

brittle tiger Apr 22, 2025, 10:42 PM

#

most likely outcome is google doesn't pay $20B a year for search priority anymore

#

to apple

elder rapids Apr 22, 2025, 10:42 PM

#

and btw openAI wouldn't be able to, since it's not even fully independent

elder rapids Apr 22, 2025, 10:47 PM

#

balmy mist my thing is why waste time trying to force that out of a model when you can just...

take what I said for literally anything you ask it, o3 pro is going to eventually build the sufficient reasoning to do tasks, that's inherent to the reasoning process itself

#

and with not much effort myself for tons of gains, I can force 2.5 pro to "gain" intelligence, field specific ofc

#

and that's me building it's reasoning for tasks

#

this isnt with models in general due to what was prioritized in training

#

but just so happens 2.5 pro is just, insanely good at this

#

receiving MY reasoning

#

comes from the product itself

#

didn't choose a side

#

I have plus

#

in chatgpt

#

but goddamn

#

it's crazy

#

ong

#

basically

#

I just don't think Google is gonna take that much losses with this

balmy mist Apr 22, 2025, 10:52 PM

#

elder rapids take what I said for literally anything you ask it, o3 pro is going to eventuall...

i see what you mean, maybe this is google's niche, we are going to have multiple models in end game, so maybe we will have a usecase for each of their models, wether it be gpt5, claude 4, or gemini 3

elder rapids Apr 22, 2025, 10:52 PM

#

if any at all

elder rapids Apr 22, 2025, 10:52 PM

#

balmy mist i see what you mean, maybe this is google's niche, we are going to have multiple...

gpt 5, Claude 4 and Gemini 3 is just so cold

ocean vortex Apr 22, 2025, 10:52 PM

#

ok o3-high is really really good

#

I'm impressed with what it can do crunching numbers manually and decoding, breaking things down and doing it consistently 👀

balmy mist Apr 22, 2025, 10:53 PM

#

ocean vortex ok o3-high is really really good

which o3 level is in the ChatGPT app?

elder rapids Apr 22, 2025, 10:53 PM

#

balmy mist which o3 level is in the ChatGPT app?

medium

elder rapids Apr 22, 2025, 10:54 PM

#

ocean vortex I'm impressed with what it can do crunching numbers manually and decoding, break...

yep

balmy mist Apr 22, 2025, 10:54 PM

#

Wait, isn't R2 supposed to come out this week?

elder rapids Apr 22, 2025, 10:54 PM

#

rumors

#

if r2 is actually any good

#

I'm gonna be surprised

#

ye

balmy mist Apr 22, 2025, 10:54 PM

#

so there's no way to use o3 high right now? is o3 high just gonna be pro?

elder rapids Apr 22, 2025, 10:55 PM

#

especially compared with o3 o4 mini and 2.5 pro

brittle tiger Apr 22, 2025, 10:55 PM

#

balmy mist so there's no way to use o3 high right now? is o3 high just gonna be pro?

API i believe

elder rapids Apr 22, 2025, 10:55 PM

#

balmy mist so there's no way to use o3 high right now? is o3 high just gonna be pro?

if you used o3 pro with expectations of o3 high

#

you'd be surprised

ocean vortex Apr 22, 2025, 10:56 PM

#

balmy mist so there's no way to use o3 high right now? is o3 high just gonna be pro?

only API

balmy mist Apr 22, 2025, 10:57 PM

#

wow

elder rapids Apr 22, 2025, 10:57 PM

#

o3 pro is gonna be better

balmy mist Apr 22, 2025, 10:58 PM

#

damn so this whole time i was using o3 medium

brittle tiger Apr 22, 2025, 10:59 PM

#

love o3 but it's much more accessible than 2.5. I'd bet anything the top 10 researchers at OAI are using 2.5 than than o3

#

they are with 2.5

small haven Apr 22, 2025, 11:01 PM

#

guys stop complaining about usage and pay $200/mo, ur welcome

ocean vortex Apr 22, 2025, 11:28 PM

#

brittle tiger love o3 but it's much more accessible than 2.5. I'd bet anything the top 10 rese...

All OpenAI employees have grants for free usage I'm fairly sure lol

#

even some IT firms / startups give access to their employees to openai org/keys, effectively paying for them

#

openai are behind on model sizing and arch planning (too much on the small side with reasoning and probably too big to make sense with gpt4.5), but they seem to be well ahead on RL training and fine-tuning and even just training in general I think

brittle tiger Apr 22, 2025, 11:36 PM

#

ocean vortex All OpenAI employees have grants for free usage I'm fairly sure lol

I dont follow. Pricing isn't part of the convo when talking about model preference of the best ai researchers in the world.

ocean vortex Apr 22, 2025, 11:37 PM

#

brittle tiger I dont follow. Pricing isn't part of the convo when talking about model preferen...

then what did you mean by "more accesible"? I assumed you made a typo too and meant to say 2.5 is more accessible

brittle tiger Apr 22, 2025, 11:40 PM

#

i mean o3 is way more wow factor to majority of ppl. elite ppl in niche fields get more out of 2.5

small haven Apr 22, 2025, 11:40 PM

#

so o3 can handle less context than o1 pro, weird

ocean vortex Apr 22, 2025, 11:43 PM

#

brittle tiger i mean o3 is way more wow factor to majority of ppl. elite ppl in niche fields g...

o3 has some needless hype to it for sure, and me personally I wasn't all that impressed with o3 on chatgpt at first. But if you look at o3-high it really is much better at reasoning and breaking things down / doing a lot - more so than 2.5. Even o3-medium is marginally better than 2.5 if you just look at their metrics... It has some flaws but overall it is a slightly better model

#

then you also have the tools that can help for sure on cgpt website

brittle tiger Apr 22, 2025, 11:44 PM

#

I really really like o3. all i was saying is i bet best openai ppl are using 2.5 more

#

without context and speed i don't think that would be the case tho

ocean vortex Apr 22, 2025, 11:45 PM

#

brittle tiger I really really like o3. all i was saying is i bet best openai ppl are using 2.5...

I mean I don't see how you can get more out of 2.5. For niche specific things sure, but generally speaking... it outputs less than o3

#

so if you need for it to analyze something exhaustively and do it reliably, I would bet on o3 with price not being an issue

#

more test-time compute (longer outputs) and less likely to hallucinate

#

2.5 pro sometimes does this thing where it simply takes a shortcut and guesses the final answer whichever sounds plausible enough at the time lol

#

but yeah that's not to say it is not close. Still very performant and very close, it's just that I wouldn't bet on it getting things done over o3

warped estuary Apr 23, 2025, 12:25 AM

#

Can someone confirm that openai pro gives you unlimited api calls to use? For example with codex or cline

balmy mist Apr 23, 2025, 12:36 AM

#

warped estuary Can someone confirm that openai pro gives you unlimited api calls to use? For ex...

who tols you this?

#

told*

#

you giving him the gems lmaoo

keen fulcrum Apr 23, 2025, 12:41 AM

#

No

#

We are moving there anyway and it will be beneficial to have a deeper understanding

#

It will begin with novel discoveries

#

Because in immoral hands its an effective weapon
especially uncensored ones

#

Lets hope humans won't destroy themselves

warped estuary Apr 23, 2025, 1:18 AM

#

balmy mist told*

No one did I'm just trying to determine because I mainly use llms for coding

small haven Apr 23, 2025, 1:27 AM

#

brittle tiger I really really like o3. all i was saying is i bet best openai ppl are using 2.5...

bruh loool they surely are using a year ahead internal model as we speak, at least o4 pro

fringe carbon Apr 23, 2025, 1:40 AM

#

small haven bruh loool they surely are using a year ahead internal model as we speak, at lea...

year ahead seems a bit much

brittle tiger Apr 23, 2025, 1:40 AM

#

U can 5x your money betting on it if you believe that

small haven Apr 23, 2025, 1:43 AM

#

fringe carbon year ahead seems a bit much

deep research was started last year and got released two months ago

brittle tiger Apr 23, 2025, 1:44 AM

#

How much did you put down?

small haven Apr 23, 2025, 1:45 AM

#

ur better off betting on whatever meta is releasing, they love overfitting lmarena lmao

wintry tinsel Apr 23, 2025, 1:51 AM

#

Imagine having a nuclear reactor, some Vietnamese slaves, to work the nuclear reactor, and your own 1 million GPU mega cluster, you could “locally” run O3 an unlimited amount, that’s the life

fringe carbon Apr 23, 2025, 1:51 AM

#

i mean there is some guy on fiverr right now looking for ppl to vote for o3

#

so like solid bet ig

#

ggs for me if it works out for him

#

could be u ig

balmy mist Apr 23, 2025, 2:26 AM

#

How?

fringe carbon Apr 23, 2025, 2:30 AM

#

balmy mist How?

he’s simultaneously betting on oai and paying ppl to vote in the arena

small haven Apr 23, 2025, 2:30 AM

#

i know theres a lmarena intern sneaking in a bet prerelease 😭

fringe carbon Apr 23, 2025, 2:30 AM

#

pretty easy trade

balmy mist Apr 23, 2025, 2:50 AM

#

fringe carbon he’s simultaneously betting on oai and paying ppl to vote in the arena

lmaooo

tall summit Apr 23, 2025, 5:20 AM

#

fringe carbon he’s simultaneously betting on oai and paying ppl to vote in the arena

LOL

kind cloud Apr 23, 2025, 6:33 AM

#

I think claybrook is gone from arena.

kind cloud Apr 23, 2025, 6:50 AM

#

I've played lmarena for 45 minutes and I've not seen it yet.

small haven Apr 23, 2025, 6:52 AM

#

top workflow rn is asking o3 for git diffs and o1 pro to apply it in full

keen beacon Apr 23, 2025, 6:54 AM

#

small haven top workflow rn is asking o3 for git diffs and o1 pro to apply it in full

im curious about this ..

golden ocean Apr 23, 2025, 6:55 AM

#

Any way to make gemini not touch code that i didnt tell it to or thats just how gemini 2.5 is

#

most annoying model ive worked with

keen beacon Apr 23, 2025, 6:56 AM

#

atm i ask o3 for code and at the top to note down the file path , for example # project/app/scr/main.tsx
then i have a script that copies that output and writes it directly to that file
then i also have ngrok set up so that o3 can visit my site and see the result of what it wrote

that way it writes code, visits site to review, then modifies code again if need be, all in 1 prompt

small haven Apr 23, 2025, 6:56 AM

#

keen beacon im curious about this ..

actually; o3 for git diffs, apply with gemini 2.5 in cursor (as a test) while waiting for o1 pro to craft the fully applied file.

calm sequoia Apr 23, 2025, 6:59 AM

#

o3 mini-high performed better on ARC AGI 2 than the Gemini 2.5 Pro 😄 What a humiliation

golden ocean Apr 23, 2025, 7:00 AM

#

#

keen beacon Apr 23, 2025, 7:00 AM

#

lol

small haven Apr 23, 2025, 7:01 AM

#

keen beacon atm i ask o3 for code and at the top to note down the file path , for example # ...

thats lowkey op for frontend esp for debugging, wonder if it checks the site code content or just text.. does it

keen beacon Apr 23, 2025, 7:04 AM

#

small haven thats lowkey op for frontend esp for debugging, wonder if it checks the site cod...

ive made endpoitns for it to view existing code, for example site/app/src/main returns the code for main.tsx

#

atm i want to make it so that o3 can write just the difference, not the whole file, will probably do it via git diff, but still learning how

#

also I need to have some logs for the bash commands that o3 runs in an endpoint. after that maybe magic can happen

#

But i find o3 lazy as hell, once i asked to iterrate getting data from a site until it got the right result
for instance try and get the names of the top models in lmarea, iterate until they are = ['gemini-2.5-pro-exp-03-25',
'chatgpt-4o-latest-20250326',
'grok-3-preview-02-24',
.....

code by o3

try:
some code that didnt work
except:
fallback_list = ['gemini-2.5-pro-exp-03-25',
'chatgpt-4o-latest-20250326',
'grok-3-preview-02-24',
'gpt-4.5-preview-2025-02-27',
'gemini-2.5-flash-preview-04-17',
'gemini-2.0-pro-exp-02-05',
'gemini-2.0-flash-thinking-exp-01-21',
'deepseek-v3-0324',
'deepseek-r1',
'gemini-2.0-flash-001']
return fallback_list

And I thought the code was working, i didnt know this mf cheated

torn mantle Apr 23, 2025, 7:20 AM

#

cobalt-exp-beta-v6

#

a lot of versions already

calm sequoia Apr 23, 2025, 7:36 AM

#

Considering the number of models that are being tested as anonymous but not released, the lmarena is going into direction of becoming RLHF platform instead of a benchmark.

alpine coral Apr 23, 2025, 7:42 AM

#

lol yeah kinda feels a bit like that doesn't it

#

i preferred it as an academic project..

#

now it's a sequoia-backed start up...

kind cloud Apr 23, 2025, 7:50 AM

#

tomay seems to be connected to the internet

fleet lintel Apr 23, 2025, 8:06 AM

#

kind cloud I think claybrook is gone from arena.

looks like it

#

unfortunately there is nothing in Arena right now that excites me.
After NW, all models are just meh

sage raptor Apr 23, 2025, 9:09 AM

#

Dayhush is not meh

cedar tide Apr 23, 2025, 9:46 AM

#

grok 3 preview is new to the WebdevArena?

plain zinc Apr 23, 2025, 10:17 AM

#

kind cloud tomay seems to be connected to the internet

Has he started appearing more often?

#

I've NEVER had it before.

ornate stump Apr 23, 2025, 10:41 AM

#

Usually, the image models, like Imagen, they gonna be tested somewhere?

cedar tide Apr 23, 2025, 11:34 AM

#

But there are just the early version in the leaderboard

keen fulcrum Apr 23, 2025, 11:35 AM

#

https://www.androidauthority.com/google-one-ai-premium-pro-plus-plans-apk-teardown-3547130/

Android Authority

Google could soon introduce more Google One AI plans to take on Cha...

Google One could soon introduce "AI Premium Plus" and "AI Premium Pro" tiers, possibly spreading AI features across more price points.

calm spear Apr 23, 2025, 12:34 PM

#

current LMArena UI will is going to be fully replaced with new one?

calm spear Apr 23, 2025, 12:34 PM

#

fleet lintel unfortunately there is nothing in Arena right now that excites me. After NW, a...

NW?

#

o3 doesn't look much different in terms of performance from Gemini-2.5 pro exp

balmy mist Apr 23, 2025, 12:39 PM

#

they raised the usage for o3?

plain zinc Apr 23, 2025, 12:40 PM

#

cedar tide I gave Gemini the opportunity to think in the middle of his answers

Give it to me, prompt, please

#

Bro @cedar tide, please 🙏🙏🙏

#

That's what I needed too.

cedar tide Apr 23, 2025, 12:41 PM

#

plain zinc That's what I needed too.

one it doesn't always work and two I haven't noticed any improvement in performance and three what would your use be?

balmy mist Apr 23, 2025, 12:44 PM

#

its 50 per day now

#

for plus

#

wym?

#

for students

#

you found a work around for everyone else?

#

not sure, but it should be good at it

#

i really hope they surprise us and release o3 pro this week

#

that would make me so happy

#

bruhh

#

so like 50 per week lol

#

yeah that will be the last time i pay anything for openai

#

this is why google will win

plain zinc Apr 23, 2025, 12:48 PM

#

cedar tide one it doesn't always work and two I haven't noticed any improvement in performa...

For example, so that the model immediately analyzes exactly when writing the code, and not when it is finished.

balmy mist Apr 23, 2025, 12:48 PM

#

you just said they out of gpus

plain zinc Apr 23, 2025, 12:48 PM

#

Because 2.5 has syntax errors.

balmy mist Apr 23, 2025, 12:48 PM

#

thats never going to be an issue with google

plain zinc Apr 23, 2025, 12:48 PM

#

And they need to be fixed somehow.

keen beacon Apr 23, 2025, 12:48 PM

#

plain zinc For example, so that the model immediately analyzes exactly when writing the cod...

its out of distribution it will probably harm performance. but it isnt hard to do. claude is way more fun since u can actually break the chat template

balmy mist Apr 23, 2025, 12:49 PM

#

bro its not chatgpt vs gemini, its google vs openai which is what you are missing

#

and you just said openai running out of gpus

sage raptor Apr 23, 2025, 12:49 PM

#

no

balmy mist Apr 23, 2025, 12:50 PM

#

google has way more money and are releasing their models for free already

#

and more ppl know about google than openai

#

more peopl trust google then openai

keen beacon Apr 23, 2025, 12:50 PM

#

chatgpt is ai to most people

balmy mist Apr 23, 2025, 12:50 PM

#

ask people 40 year old and up

#

they dont know about chatgpt

fleet lintel Apr 23, 2025, 12:51 PM

#

O3 pro is not on Arena. Atleast I haven't encountered it

balmy mist Apr 23, 2025, 12:51 PM

#

but they know google

keen beacon Apr 23, 2025, 12:51 PM

#

4o native image gen was prob much bigger than gemini 2.5 pro to the public

balmy mist Apr 23, 2025, 12:51 PM

#

what browser are yall on right now?

#

openai is even trying to buy chrome lol

cedar tide Apr 23, 2025, 12:51 PM

#

plain zinc Give it to me, prompt, please

just in the prompt system explain to him exactly that he can think several times in the middle of his response, and explain to him very clearly when to do it how to do it and that's it, if it doesn't work try to improve the prompt each time

balmy mist Apr 23, 2025, 12:51 PM

#

bc they cant beat google

#

no company can tbh, elon and sam founded openai trying to do that and look at them now? two opposing companies instead of being united against google

#

isnt openai losing money?

keen beacon Apr 23, 2025, 12:52 PM

#

mindshare doesnt really matter in the end tbh. whoever achieves agi first will matter the most

balmy mist Apr 23, 2025, 12:52 PM

#

thats why they charge $200?

fleet lintel Apr 23, 2025, 12:53 PM

#

I think most peolpe dont want to pay for GPT .. only business folks wants to pay

balmy mist Apr 23, 2025, 12:53 PM

#

agi is subjective

#

agi is based on the time period, to some we already reached agi

#

why do people keep leaving open ai?

fleet lintel Apr 23, 2025, 12:53 PM

#

If Google AI overview becomes good enough for most queries, why individuals will ever pay for chatgpt or even gemini.google.com
only freelancers and companies will pay for higher quality agentic workflows