#general | Arena | Page 18

brittle tiger Apr 10, 2025, 5:28 PM

#

there is danger in normie-maxxing. ceding the most elite intelligent power users can have downsides

fleet lintel Apr 10, 2025, 5:29 PM

#

disagree. Coders are one of the most important audiences to win over... their support can directly influence companies' purchasing decisions.

thorny drum Apr 10, 2025, 5:31 PM

#

i mean if powerusers just pick the best model they can just make the best coding model later

keen beacon Apr 10, 2025, 5:31 PM

#

What is the news in one sentence?

brittle tiger Apr 10, 2025, 5:31 PM

#

keen beacon What is the news in one sentence?

chatgpt will now be able to access all of your past chats

oblique flint Apr 10, 2025, 5:31 PM

#

keen beacon What is the news in one sentence?

nothing ever happens

plain zinc Apr 10, 2025, 5:31 PM

#

What OpenAI showed today:

Updated Memory
Counterclaim

torn mantle Apr 10, 2025, 5:31 PM

#

keen beacon What is the news in one sentence?

memory improvement

tall summit Apr 10, 2025, 5:31 PM

#

technically "Starting today, memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond." is one sentence

plain zinc Apr 10, 2025, 5:31 PM

#

There will be another Google presentation today.

torn mantle Apr 10, 2025, 5:31 PM

#

sama hype

plain zinc Apr 10, 2025, 5:32 PM

#

Gemini 2.5 flash with thought process control is inevitable

lime coral Apr 10, 2025, 5:32 PM

#

plain zinc There will be another Google presentation today.

When

brittle tiger Apr 10, 2025, 5:32 PM

#

plain zinc There will be another Google presentation today.

it's possible something comes out at the 5pm ET developer keynote. nightwhisper would be lit but i doubt it

keen beacon Apr 10, 2025, 5:32 PM

#

please google 🙏

plain zinc Apr 10, 2025, 5:32 PM

#

lime coral When

after 4 hours

tall summit Apr 10, 2025, 5:33 PM

#

please google 🙏 free access to the most powerful model 🥺

brittle tiger Apr 10, 2025, 5:33 PM

#

LoganK is presenting so maybe

keen beacon Apr 10, 2025, 5:33 PM

#

brittle tiger LoganK is presenting so maybe

👀

#

imo people who use llms for coding are the least loyal to any one lab

#

like i switched from claude to deepseek to claude to google

tall summit Apr 10, 2025, 5:34 PM

#

keen beacon like i switched from claude to deepseek to claude to google

so me

brittle tiger Apr 10, 2025, 5:34 PM

#

I'm not saying it's wrong strategy but there are downsides. "Sources caution that OpenAI has delayed the introduction of some new models recently due to capacity issues" anime is literally the reason ppl aren't using o4 rn

keen beacon Apr 10, 2025, 5:35 PM

#

? they are giving out 4.1 and 4.1 mini out for free right now

#

for people whose primary usecase for llms is code, they'll just go wherever the best model is

#

O3 full today?

#

which is quite different to other consumers

#

there are countless people i know who basically only know chatgpt and maybe deepseek

tall summit Apr 10, 2025, 5:35 PM

#

keen beacon for people whose primary usecase for llms is code, they'll just go wherever the ...

are there reasons why someone wouldn't if they weren't coding? (if they knew of others)

keen beacon Apr 10, 2025, 5:36 PM

#

keen beacon ? they are giving out 4.1 and 4.1 mini out for free right now

Where

keen beacon Apr 10, 2025, 5:36 PM

#

keen beacon Where

quasar alpha and optimus alpha

keen beacon Apr 10, 2025, 5:36 PM

#

tall summit are there reasons why someone wouldn't if they weren't coding? (if they knew of ...

most people outside of our bubble have very little knowledge of this stuff tbh

fleet lintel Apr 10, 2025, 5:36 PM

#

100%

keen beacon Apr 10, 2025, 5:36 PM

#

keen beacon quasar alpha and optimus alpha

O4.1?

sage raptor Apr 10, 2025, 5:36 PM

#

in 3 hours

keen beacon Apr 10, 2025, 5:36 PM

#

gemini is better known than claude

keen beacon Apr 10, 2025, 5:36 PM

#

sage raptor in 3 hours

demos demos demos

#

i am

#

gemini is more consumer focused than claude is

#

and google have thrown more money behind marketing

tall summit Apr 10, 2025, 5:37 PM

#

keen beacon most people outside of our bubble have very little knowledge of this stuff tbh

https://xkcd.com/2501/

xkcd: Average Familiarity

zinc ore Apr 10, 2025, 5:37 PM

#

Easy to verify, more people use Gemini than Claude

keen beacon Apr 10, 2025, 5:37 PM

#

anthropic is relatively enterprise focused

oblique flint Apr 10, 2025, 5:37 PM

#

keen beacon gemini is better known than claude

the model sure, but in cursor and roocode claude is still better

#

nvm misread

keen beacon Apr 10, 2025, 5:37 PM

#

oblique flint the model sure, but in cursor and roocode claude is still better

yeah i know, that's down to the implementations

brittle tiger Apr 10, 2025, 5:37 PM

#

i wouldn't be surprised if gemini 2.5 has a mini-ghibli moment around minecraft with zoomers.

keen beacon Apr 10, 2025, 5:37 PM

#

claude is still better at instruction and format following

keen beacon Apr 10, 2025, 5:37 PM

#

tall summit https://xkcd.com/2501/

🗣️

torn mantle Apr 10, 2025, 5:38 PM

#

interesting

torn mantle Apr 10, 2025, 5:38 PM

#

sage raptor in 3 hours

in 3h

#

i see

#

software engineering?

#

hopefully they introduce something new

oblique flint Apr 10, 2025, 5:38 PM

#

what makes chatgpt more appealing to normies as well is the zoomer language. Claude also has some sort of personality which people might prefer. Gemini is just full on robot

torn mantle Apr 10, 2025, 5:38 PM

#

not just something built on gemini

brittle tiger Apr 10, 2025, 5:38 PM

#

all google has to do is sponsor some of the minecraft youtubers (1 trillion minecraft views on youtube) to generate bases and make it into a game. the generations will blow their zoomer brains

keen beacon Apr 10, 2025, 5:39 PM

#

brittle tiger all google has to do is sponsor some of the minecraft youtubers (1 trillion mine...

this is a good point

sonic tendon Apr 10, 2025, 5:39 PM

#

keen beacon like i switched from claude to deepseek to claude to google

same

fleet lintel Apr 10, 2025, 5:39 PM

#

brittle tiger all google has to do is sponsor some of the minecraft youtubers (1 trillion mine...

google sucks at marketing

tall summit Apr 10, 2025, 5:39 PM

#

oblique flint what makes chatgpt more appealing to normies as well is the zoomer language. Cla...

chatgpt has little barrier of entry and was the biggest in the beginning and continues to be big so people still use it

sonic tendon Apr 10, 2025, 5:40 PM

#

something something google trends

#

the website, or the model?

tall summit Apr 10, 2025, 5:40 PM

#

i'm most hyped for V4

sonic tendon Apr 10, 2025, 5:40 PM

#

why not the model?

keen beacon Apr 10, 2025, 5:40 PM

#

gimme R2

#

we should be getting it in the next couple weeks

sonic tendon Apr 10, 2025, 5:41 PM

#

the weights are free tho

tall summit Apr 10, 2025, 5:41 PM

#

are non-reasoning models becoming underrated or somethin

#

V4 > R2!!!

#

ummm ok man

oblique flint Apr 10, 2025, 5:42 PM

#

tall summit chatgpt has little barrier of entry and was the biggest in the beginning and con...

people might not know other models are better for their usecase. For example my sister uses llms for helping with studying, summaries, explanations etc. She was using chatgpt, then I told her about 2.5 pro in ai studio and she said it was way better for her usecase. But like, no normal person would know about ai studio lol

sonic tendon Apr 10, 2025, 5:42 PM

#

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html

keen beacon Apr 10, 2025, 5:43 PM

#

debatable

tall summit Apr 10, 2025, 5:43 PM

#

an interesting statement

keen beacon Apr 10, 2025, 5:43 PM

#

grok's frontend is very.. simplified

sonic tendon Apr 10, 2025, 5:43 PM

#

debatable

keen beacon Apr 10, 2025, 5:43 PM

#

so does chatgpt iirc

balmy mist Apr 10, 2025, 5:44 PM

#

did i miss the big OA news?

keen beacon Apr 10, 2025, 5:44 PM

#

and there are things that chatgpt does better than grok

#

for example

#

grok gets horrendously laggy when it streams a long response

#

chatgpt does not

tall summit Apr 10, 2025, 5:44 PM

#

balmy mist did i miss the big OA news?

now chatgpt can use info from past chats, not just "memory"

#

amazing, i know.

oblique flint Apr 10, 2025, 5:44 PM

#

worst frontend is gemini imo

keen beacon Apr 10, 2025, 5:44 PM

#

oblique flint worst frontend is gemini imo

yeah

#

they also nuke model performance with their system prompt

#

👎

sonic tendon Apr 10, 2025, 5:45 PM

#

trying to log in with a google account sends me into a redirect loop

keen beacon Apr 10, 2025, 5:45 PM

#

the only reason to use the gemini frontend is deep research

balmy mist Apr 10, 2025, 5:45 PM

#

tall summit now chatgpt can use info from past chats, not just "memory"

omgg sama cooked like usual!!

sonic tendon Apr 10, 2025, 5:45 PM

#

and has since last week

fleet lintel Apr 10, 2025, 5:45 PM

#

oblique flint worst frontend is gemini imo

hard agree ... unfortunately

sonic tendon Apr 10, 2025, 5:45 PM

#

keen beacon the only reason to use the gemini frontend is deep research

? i like it more than ai studio

sonic tendon Apr 10, 2025, 5:45 PM

#

oblique flint worst frontend is gemini imo

but yeah

tall summit Apr 10, 2025, 5:45 PM

#

i'd say chatgpt is

balmy mist Apr 10, 2025, 5:45 PM

#

grok ui is clean af

#

the way thinking shows up is amazing

tall summit Apr 10, 2025, 5:46 PM

#

clean ui tho

keen beacon Apr 10, 2025, 5:48 PM

#

sonic tendon ? i like it more than ai studio

i mean functionally

#

visually sure

fleet lintel Apr 10, 2025, 5:48 PM

#

balmy mist omgg sama cooked like usual!!

i think you mean "uncooked"

keen beacon Apr 10, 2025, 5:49 PM

#

i highly doubt it

#

skill diff tho

oblique flint Apr 10, 2025, 5:50 PM

#

until recently you couldnt even upload js files to gemini frontend lol

keen beacon Apr 10, 2025, 5:50 PM

#

ya they trained it far more than their competitors

balmy mist Apr 10, 2025, 5:50 PM

#

how do you know if you have the new memory update?

#

it actually might not be bad

#

but i stopped using chatgpt

#

only use it for 4.5 and image gen

#

and thats barely

#

so my past convos that mattered were over a month ago

#

it might be good to just put a bunch of info about yourself in a chat and use gpt to reference them

tall summit Apr 10, 2025, 5:52 PM

#

it is bad at distinguishing between different parts of memory
like "no i dont want this to impact the response that's not what i'm talking about"
has happened to me a lot

balmy mist Apr 10, 2025, 5:53 PM

#

yeah i see what you mean

#

someone needs to take sama's twitter from him

#

giving us blue balls lol

#

talking about it keeps him up at night lmaoo

ocean vortex Apr 10, 2025, 5:54 PM

#

tall summit it is bad at distinguishing between different parts of memory like "no i dont wa...

Honestly I just keep it off personally for this very reason. I want it to be predictable and not having accuracy influenced by random things

fleet lintel Apr 10, 2025, 5:54 PM

#

balmy mist giving us blue balls lol

And it's all in bad faith ... ahhh... i m pissed

tall summit Apr 10, 2025, 5:54 PM

#

ocean vortex Honestly I just keep it off personally for this very reason. I want it to be pre...

exactly

keen beacon Apr 10, 2025, 5:55 PM

#

ocean vortex Honestly I just keep it off personally for this very reason. I want it to be pre...

!!!

#

yeah i start new conversations often because clogging the context window with irrelevant info is bound to make the model, especially one on the slightly weaker side context handling wise, worse at any reasoning/math/etc task

oblique flint Apr 10, 2025, 5:55 PM

#

but also, won't putting all that memory stuff into context degrade the performance?

balmy mist Apr 10, 2025, 5:56 PM

#

optimus alpha:
https://www.vibeshare.ai/c/wB3laqtpau

Vibeshare

Share your vibe-coded web apps

tall summit Apr 10, 2025, 5:56 PM

#

almost surely

balmy mist Apr 10, 2025, 5:56 PM

#

not bad

tall summit Apr 10, 2025, 5:56 PM

#

just shove the context you need into whatever message you're typing like everyone else!

tall summit Apr 10, 2025, 5:58 PM

#

balmy mist optimus alpha: https://www.vibeshare.ai/c/wB3laqtpau

good at the generic web app aesthetic

balmy mist Apr 10, 2025, 5:58 PM

#

tall summit good at the generic web app aesthetic

yeah but it got help from gemini at the end lol

tall summit Apr 10, 2025, 5:58 PM

#

balmy mist yeah but it got help from gemini at the end lol

pfttt

balmy mist Apr 10, 2025, 5:58 PM

#

and that was after 10 iterations in my app

tall summit Apr 10, 2025, 5:59 PM

#

i've never understood how move damage is calculated
sounds like a lot of things, and what's shown on the move info is just "Power"

#

maybe time to learn

balmy mist Apr 10, 2025, 6:00 PM

#

lmaoo

#

people say ai making people bad at coding or creates lazy coders, but tbh it has made me so much better than i was before and I am a software dev lol

#

i never loved software dev as much as i do now

tall summit Apr 10, 2025, 6:01 PM

#

balmy mist people say ai making people bad at coding or creates lazy coders, but tbh it has...

in what way?

balmy mist Apr 10, 2025, 6:03 PM

#

tall summit in what way?

so when you do work for a company like my current and past jobs, you are coming into a codebase of years of work usually and working with other people and you really feel disconnected to the work, and you have to build this understanding of that codebase overtime and still at the end you dont care about it as much as you should bc its not yours

#

now when it comes to coding with ai i feel that same but in the opposite

fleet lintel Apr 10, 2025, 6:04 PM

#

balmy mist people say ai making people bad at coding or creates lazy coders, but tbh it has...

I think long term impact of AI on coders is yet to be seen. almost all projects tend to become complicated over time and AI is not yet good at fixing and maintaining large scale projects. My concern is that it will be hard to find coders (after 5-10 years) that could dive deep into complicated projects, make sense of them and further develop them

balmy mist Apr 10, 2025, 6:04 PM

#

now i may not know all the code that is happening but i am guiding the ai and know whats happening system wide and know all the connections and why every part is needed and its so much fun to develop and create things

tall summit Apr 10, 2025, 6:04 PM

#

balmy mist now i may not know all the code that is happening but i am guiding the ai and kn...

thats fair enough

balmy mist Apr 10, 2025, 6:05 PM

#

fleet lintel I think long term impact of AI on coders is yet to be seen. almost all project...

i think thats where passion comes in

fleet lintel Apr 10, 2025, 6:05 PM

#

balmy mist i think thats where passion comes in

what if people start relying so much on AI that they just lose the skill to understand or go deep into the system?

balmy mist Apr 10, 2025, 6:05 PM

#

you have to find people that actually care about the work

tall summit Apr 10, 2025, 6:05 PM

#

fleet lintel I think long term impact of AI on coders is yet to be seen. almost all project...

when "vibe coding" became a real term i knew ai coding was entering another era

balmy mist Apr 10, 2025, 6:06 PM

#

even if ai is doing a lot of the busy body code work, they will care and try their best to optimize and make the app/system better

balmy mist Apr 10, 2025, 6:06 PM

#

fleet lintel what if people start relying so much on AI that they just lose the skill to unde...

hire new people lmaoo

tall summit Apr 10, 2025, 6:06 PM

#

fleet lintel what if people start relying so much on AI that they just lose the skill to unde...

happens all the time

balmy mist Apr 10, 2025, 6:06 PM

#

people will dive deep if they care

#

if they dont care they wont

#

but we still gotta see how this plays out

#

but im just saying from my experience

#

especially with 2.5

#

i have been having so much fun

fleet lintel Apr 10, 2025, 6:07 PM

#

yeah, too early to say .

tall summit Apr 10, 2025, 6:07 PM

#

i never stop getting excited thinking about the potential of humans with the help of ai

fleet lintel Apr 10, 2025, 6:07 PM

#

My personal theory is that AI wont decrease the number of jobs overtime in Software industry but it will end up decreasing the pay scale

balmy mist Apr 10, 2025, 6:08 PM

#

bingo

#

i think that's gonna happen for a lot of jobs

#

like lawyers, doctors

#

etc..

#

the high level skill jobs

#

and there will be more devs than ever

#

since anybody can start coding or building

#

just a small amount of highly skilled devs

#

like how we have cameras

#

we have highly skilled photographers

#

but anybody can take great pics with their phones

fleet lintel Apr 10, 2025, 6:09 PM

#

+1 yeah.. that's what I think as well

upper wolf Apr 10, 2025, 6:15 PM

#

ehh, i disagree. it has ALREADY significantly decreased the number of available softeng and/or compsci positions.

#

Ask anyone who has graduated with a BS in compsci in the last 2 years and is attempting to find even just one internship. I’m not saying it’s impossible, but it’s become incredibly more difficult. There’s a lower pool of open slots.

fleet lintel Apr 10, 2025, 6:21 PM

#

not because of AI. overhiring during covid and trying to make wall street happy are the reasons for fewer jobs

upper wolf Apr 10, 2025, 6:23 PM

#

Okay, that’s true, those are definitely also big factors

#

But many companies ARE cutting people or reducing the amount of overall cs positions in favor of generative AI

#

And have been since 2023-2024

calm sequoia Apr 10, 2025, 6:28 PM

#

upper wolf Ask anyone who has graduated with a BS in compsci in the last 2 years and is att...

I'm actually hiring a lot of interns. They are much more useful. Their worthless code can be checked by the LLM instead of me. Also, they do not write absolute non-sense, because the LLM guides them before I even see the code. On the other hand, I dont trust what they write anymore. Only human-speech is valid benchmark.

upper wolf Apr 10, 2025, 6:30 PM

#

calm sequoia I'm actually hiring a lot of interns. They are much more useful. Their worthless...

Speech works well but some of em just have an ai listener turned on in the background 😂

#

And display what to say in a small screen under the camera

#

you know what i mean.

wintry tinsel Apr 10, 2025, 6:31 PM

#

keen beacon !!!

This you?

calm sequoia Apr 10, 2025, 6:32 PM

#

What a disappointment

#

I've had memory turned off since new year

calm sequoia Apr 10, 2025, 6:33 PM

#

upper wolf And display what to say in a small screen under the camera

Yeah, but In my case, its in real life, not through webcam.

upper wolf Apr 10, 2025, 6:33 PM

#

Ah, gotchu

brittle tiger Apr 10, 2025, 6:34 PM

#

Maybe we are getting something from goog at keynote

https://x.com/rseroter/status/1910395857054425183?t=OA3KNQQgREmH6ZnGay7LAw&s=19

Richard Seroter (@rseroter) on X

Brace yourself. This afternoon's #GoogleCloudNext developer keynote will be ... something.

Watch here @ 230pm PT: https://t.co/NGkqo19hx8

I'm hosting with @stephr_wong.

We're featuring great tech and amazing poeple like @OfficialLoganK, @DynamicWebPaige, @AbiramiSukumara,

keen beacon Apr 10, 2025, 6:35 PM

#

wintry tinsel This you?

yup (i made that account a long time ago and you can't change your reddit username lmao)

keen beacon Apr 10, 2025, 6:35 PM

#

brittle tiger Maybe we are getting something from goog at keynote https://x.com/rseroter/sta...

ooh

wintry tinsel Apr 10, 2025, 6:36 PM

#

keen beacon yup (i made that account a long time ago and you can't change your reddit userna...

No need that’s a fabulous name

keen beacon Apr 10, 2025, 6:36 PM

#

o.o i just got veo 2 on aistudio

tall summit Apr 10, 2025, 6:37 PM

#

never knew the CNN's Fear & Greed index existed

calm sequoia Apr 10, 2025, 6:38 PM

#

golden ocean Apr 10, 2025, 6:38 PM

#

Maybe the best model was the friends we made along the way

calm sequoia Apr 10, 2025, 6:40 PM

#

Is this best LLM model in room with us?

#

If Gemini takes the lead even after o3 and GPT5, I'll buy Google stock

balmy mist Apr 10, 2025, 6:41 PM

#

calm sequoia If Gemini takes the lead even after o3 and GPT5, I'll buy Google stock

you should already buy google stock

#

buying high is never good

#

bruhh we have to wait 2 hours for google

keen beacon Apr 10, 2025, 6:44 PM

#

sonic tendon Apr 10, 2025, 6:51 PM

#

debatable

calm sequoia Apr 10, 2025, 6:53 PM

#

Leo, why so much trust to R2?

#

Do you really believe it to turn out better than 2.5 pro?

lime coral Apr 10, 2025, 6:57 PM

#

At least in coding/math with rl there is no limit if someone try hard enough

keen beacon Apr 10, 2025, 6:59 PM

#

calm sequoia Do you really believe it to turn out better than 2.5 pro?

R2 will narrowly beat 2.5 pro imo

#

there will be some things it is worse at (likely math) and others it is better

#

but overall it will have a slight edge

ocean vortex Apr 10, 2025, 7:05 PM

#

keen beacon

he couldn't figure out how to disable the chat lmao. After this he went silent, probably realized finally that people hate him passionately and it's not fake news 💀

keen beacon Apr 10, 2025, 7:06 PM

#

ocean vortex he couldn't figure out how to disable the chat lmao. After this he went silent, ...

not only that but he spent half his time blocking people, a feature he removed from his own platform

wintry tinsel Apr 10, 2025, 7:10 PM

#

ocean vortex he couldn't figure out how to disable the chat lmao. After this he went silent, ...

As many people love him, as hate him, I personally am sort of neutral towards him, but the haters flooded his chat since these are the chronically online younger leftist types

ocean vortex Apr 10, 2025, 7:12 PM

#

wintry tinsel As many people love him, as hate him, I personally am sort of neutral towards hi...

Well I think half of US (everyone voting Democrats) hates him for sure now, then you have a decent part of Republicans that either support Trump but not him or who had a wake up call in light of recent events. So there's more people hating him than loving certainly

keen beacon Apr 10, 2025, 7:12 PM

#

his net approval in the states is like -10

#

in the UK it is dire

#

-45 or so iirc

#

yeah

ocean vortex Apr 10, 2025, 7:14 PM

#

keen beacon yeah

I loved these swasticar ads they did in UK lmao

keen beacon Apr 10, 2025, 7:15 PM

#

i saw those lmao, they're genius

sturdy mica Apr 10, 2025, 7:27 PM

#

calm sequoia

gpt 4.1?

#

4.5?

primal orbit Apr 10, 2025, 7:31 PM

#

how to distinguish shaderbook from dragontail? Is there an easy way?

ocean vortex Apr 10, 2025, 7:39 PM

#

sturdy mica gpt 4.1?

yeah it's stupid lmao

#

there were many ways they could have achieved what they presumably are going for, and this was not the good way

keen beacon Apr 10, 2025, 7:40 PM

#

ocean vortex there were many ways they could have achieved what they presumably are going fo...

this is why theyre consolidating everything into gpt 5 later

ocean vortex Apr 10, 2025, 7:40 PM

#

they could have skipped o4, or just changed the "o" letter in 4o, but not this... 🧐

alpine coral Apr 10, 2025, 7:41 PM

#

so they skipped o2

ocean vortex Apr 10, 2025, 7:41 PM

#

keen beacon this is why theyre consolidating everything into gpt 5 later

then they gonna release 4.4 which beats it

alpine coral Apr 10, 2025, 7:41 PM

#

now they're going backwards from 4.5?

ocean vortex Apr 10, 2025, 7:41 PM

#

lmao

alpine coral Apr 10, 2025, 7:41 PM

#

lol

keen beacon Apr 10, 2025, 7:41 PM

#

alpine coral now they're going backwards from 4.5?

their third generation reasoning model o4 conflicts with the 4o name

alpine coral Apr 10, 2025, 7:42 PM

#

yeah right

barren prairie Apr 10, 2025, 7:42 PM

#

keen beacon their third generation reasoning model o4 conflicts with the 4o name

They don t have enough names

alpine coral Apr 10, 2025, 7:42 PM

#

ha and o2 conflicted with a british telco

#

gee they're having a hard time with this naming stuff

#

what a mess lol

keen beacon Apr 10, 2025, 7:42 PM

#

i think o4 and 4o is just way more confusing tho

alpine coral Apr 10, 2025, 7:42 PM

#

oh for sure

alpine coral Apr 10, 2025, 7:43 PM

#

keen beacon i think o4 and 4o is just way more confusing tho

i agree (o2 omission on legal grounds was just funny)

#

nah i don't think so

keen beacon Apr 10, 2025, 7:43 PM

#

almost everything that they could've done wrong with naming they've done wrong

barren prairie Apr 10, 2025, 7:44 PM

#

Open ai names : O number /number O

Google names : Gemini 2.0 flash thinking 21-01 blah blah blah 🥱🥱🥱

keen beacon Apr 10, 2025, 7:44 PM

#

like this is getting silly

the-model-selector-is-getting-kinda-ridiculous-v0-unu7rc1g1xme1.png

#

? why is there a separate model for tasks

#

i dont really use chatgpt lol

keen beacon Apr 10, 2025, 7:45 PM

#

keen beacon ? why is there a separate model for tasks

no idea, it should just be a pill in the prompt field

#

openai are about as consistent as microsoft

alpine coral Apr 10, 2025, 7:45 PM

#

keen beacon like this is getting silly

yeah i wonder what the % of users would be who had a true grasp of what they all are

#

just be gobbledygook to most i reckon

keen beacon Apr 10, 2025, 7:46 PM

#

alpine coral just be gobbledygook to most i reckon

yeah i had to explain the differences to someone who bought chatgpt plus and wasn't using anything other than 4o

#

isnt gpt-4 on chatgpt gpt-4-turbo too

#

kinda confusing

#

yeah

#

it's being deprecated on april 20th (rip)

alpine coral Apr 10, 2025, 7:47 PM

#

yeah they only kinda recently dropped the 'Turbo' in the name iirc

#

will be a sad day when it's gone

#

same with ol opus

keen beacon Apr 10, 2025, 7:48 PM

#

alpine coral yeah they only kinda recently dropped the 'Turbo' in the name iirc

it has always been just 4

#

thats what i remember

alpine coral Apr 10, 2025, 7:48 PM

#

we'll have meta's behemoth instead yay

keen beacon Apr 10, 2025, 7:49 PM

#

it'll probably have atrocious vibes like the last ones 😔

#

can hope i suppose

alpine coral Apr 10, 2025, 7:49 PM

#

keen beacon it has always been just 4

i swear atleast in settings it was Turbo (or T), but that's gone now

keen beacon Apr 10, 2025, 7:49 PM

#

you might be getting confused with GPT-4V

keen beacon Apr 10, 2025, 7:49 PM

#

keen beacon it'll probably have atrocious vibes like the last ones 😔

tbh with the terrible distillations (supposedly the smaller versions) the big one i bet is terrible

#

which used to be what they called the vision version

keen beacon Apr 10, 2025, 7:49 PM

#

keen beacon tbh with the terrible distillations (supposedly the smaller versions) the big on...

lol 😭

#

meta ai fell off really hard

spare mango Apr 10, 2025, 7:52 PM

#

Is polyai the best voice assistant in the market? Is it better than chatgpt and gemini?

keen beacon Apr 10, 2025, 7:53 PM

#

never heard of it

alpine coral Apr 10, 2025, 7:58 PM

#

keen beacon you might be getting confused with GPT-4V

yeah nvm think i was just tripping ha

tall summit Apr 10, 2025, 7:59 PM

#

never heard of it

tall summit Apr 10, 2025, 8:00 PM

#

alpine coral just be gobbledygook to most i reckon

i have no idea of the differences anymore

brittle tiger Apr 10, 2025, 8:23 PM

#

https://x.com/paulgauthier/status/1910422348773421553

Paul Gauthier (@paulgauthier) on X

OpenRouter's latest mystery model Optimus Alpha scored 53% on the aider polyglot leaderboard. This places it in the middle of the pack with Quasar, o3-mini-medium, Grok 3, etc.

Available in aider v0.81.2:

aider --model optimus

https://t.co/mBVaUPGHPl

willow grail Apr 10, 2025, 9:02 PM

#

looking for mouse and keyboard recorder which works in roblox

cedar tide Apr 10, 2025, 9:14 PM

#

Screenshot_2025-04-10-22-43-50-365_com.twitter.android-edit.jpg

tall summit Apr 10, 2025, 9:22 PM

#

what a statement

sage raptor Apr 10, 2025, 9:26 PM

#

https://www.youtube.com/watch?v=xLDSuXD8Mls starting in 5 minutes

YouTube

Google Cloud

Developer Keynote: You can just build things

Software engineering has become increasingly complex, with an ever-expanding set of patterns, frameworks, and runtimes. But help is here! AI is revolutionizing the developer workflow, and Google Cloud is reimagining the journey from idea to production.

This keynote features demos that showcase how AI can streamline software engineering, empow...

keen beacon Apr 10, 2025, 9:26 PM

#

lmao what's with the preshow

lime coral Apr 10, 2025, 9:26 PM

#

Grit

ocean vortex Apr 10, 2025, 9:30 PM

#

judging by their past events they gonna announce things that are already live and has been for awhile lol

keen beacon Apr 10, 2025, 9:30 PM

#

we'll see

#

he's talking about agents

#

where's logan at 🙄

#

THERE HE IIIS

brittle tiger Apr 10, 2025, 9:39 PM

#

I think 50-50 they announce something at close of their demo

alpine coral Apr 10, 2025, 9:40 PM

#

keen beacon lmao what's with the preshow

loop daddy from last year a tough act to follow

brittle tiger Apr 10, 2025, 9:40 PM

#

I would also be nervous AF using flash 2.0 during a live demo

oblique flint Apr 10, 2025, 9:40 PM

#

Drop 2.5 flash ffs

keen beacon Apr 10, 2025, 9:40 PM

#

brittle tiger I would also be nervous AF using flash 2.0 during a live demo

lmao she didn't scroll on that generated image

#

because it added a random stool

leaden palm Apr 10, 2025, 9:42 PM

#

keen beacon you might be getting confused with GPT-4V

4 and 4t are distinct models

keen beacon Apr 10, 2025, 9:42 PM

#

i know they are

#

i'm talking about in the chatgpt ui

#

gemini 25 pro?

#

wow we really accelerated

leaden palm Apr 10, 2025, 9:43 PM

#

keen beacon i'm talking about in the chatgpt ui

you used to be able to choose between them

keen beacon Apr 10, 2025, 9:44 PM

#

i don't think so

#

or at least i don't recall that being the case

tall summit Apr 10, 2025, 9:47 PM

#

most boring keynote ever

keen beacon Apr 10, 2025, 9:47 PM

#

tbf

#

it's not supposed to be exhilarating

#

@sonic tendon i wasn't hallucinating 😉

llama-4-maverick-scores-on-seven-independent-benchmarks-v0-4hd478ayu0ue1.png

sonic tendon Apr 10, 2025, 9:49 PM

#

keen beacon @sonic tendon i wasn't hallucinating 😉

oh wow, 4o and r1 crush

balmy mist Apr 10, 2025, 9:50 PM

#

tall summit most boring keynote ever

i was just about to say

#

waited for nothing

tall summit Apr 10, 2025, 9:51 PM

#

keen beacon it's not supposed to be exhilarating

no keynote is exhilarating but this one is boring.

keen beacon Apr 10, 2025, 9:53 PM

#

it's a google developer keynote

#

you shouldn't expect much

tall summit Apr 10, 2025, 9:53 PM

#

keen beacon @sonic tendon i wasn't hallucinating 😉

what benchmark are they using to put R1 in 2nd place

tall summit Apr 10, 2025, 9:53 PM

#

keen beacon you shouldn't expect much

sorry 🤷 never seen a google developer keynote

keen beacon Apr 10, 2025, 9:53 PM

#

R1 is a good creative writer especially in short context

tall summit Apr 10, 2025, 9:54 PM

#

i'd disagree but maybe i just don't like its style

#

i've used both deepseek v3 and r1 for more creative writing than i'd like to admit and i don't like it as much as claude or gemini

balmy mist Apr 10, 2025, 9:55 PM

#

keen beacon it's a google developer keynote

i cant help but expect heat from google now

zinc ore Apr 10, 2025, 9:56 PM

#

Logan isn't on yet

raven void Apr 10, 2025, 9:56 PM

#

sleepy Google keynote

keen beacon Apr 10, 2025, 9:56 PM

#

google are the opposite of most other companies

#

they drop their heat in a random tweet with no warning

balmy mist Apr 10, 2025, 9:56 PM

#

zinc ore Logan isn't on yet

i thought he left already

keen beacon Apr 10, 2025, 9:56 PM

#

and do the boring demos in the fancy keynote

zinc ore Apr 10, 2025, 9:56 PM

#

Oh maybe, I missed the beginning

balmy mist Apr 10, 2025, 9:57 PM

#

keen beacon and do the boring demos in the fancy keynote

so i should take it off?

sage raptor Apr 10, 2025, 9:57 PM

#

nothing new yet

keen beacon Apr 10, 2025, 9:57 PM

#

those sad little "woo"s 😭

#

painful

#

LMAO

#

💀

sage raptor Apr 10, 2025, 9:58 PM

#

xDDD

keen beacon Apr 10, 2025, 9:58 PM

#

oh this is intentional

#

lol

balmy mist Apr 10, 2025, 9:59 PM

#

i cant believe sama had us hype for memory lol

ocean vortex Apr 10, 2025, 9:59 PM

#

keen beacon @sonic tendon i wasn't hallucinating 😉

interesting to see o1 so low. Though it's probably because of this:

balmy mist Apr 10, 2025, 9:59 PM

#

memory keeps him up at night

keen beacon Apr 10, 2025, 9:59 PM

#

o1 is just too robotic creatively

willow grail Apr 10, 2025, 10:00 PM

#

cedar tide

podcast when

ocean vortex Apr 10, 2025, 10:00 PM

#

balmy mist i cant believe sama had us hype for memory lol

I've got the vibe that they are gonna drop more stuff today/tomorrow

#

if it's only memory that's absolutely hilarious

keen beacon Apr 10, 2025, 10:01 PM

#

i don't really know what they could drop tomorrow

#

sama ruled out o3 and o4 mini so perhaps gpt-4.1

#

(dreadful name)

barren prairie Apr 10, 2025, 10:01 PM

#

balmy mist i cant believe sama had us hype for memory lol

He did that with "tasks " 🙂
Not new

balmy mist Apr 10, 2025, 10:02 PM

#

keen beacon sama ruled out o3 and o4 mini so perhaps gpt-4.1

it's like they went back to old patches and said ef it, let's drop that, farming engagement lol

raven void Apr 10, 2025, 10:03 PM

#

quasar and quasar mini

balmy mist Apr 10, 2025, 10:03 PM

#

lol

#

wait so is optimus def better than quasar?

ocean vortex Apr 10, 2025, 10:05 PM

#

balmy mist wait so is optimus def better than quasar?

the other way around

#

optimus is new gpt4o-mini

balmy mist Apr 10, 2025, 10:06 PM

#

ahh

#

thats why its faster

sage raptor Apr 10, 2025, 10:08 PM

#

oh no, 2.0 flash

keen beacon Apr 10, 2025, 10:09 PM

#

why are they using such a mid model

balmy mist Apr 10, 2025, 10:10 PM

#

lmaoo

#

2.0 new goat

keen beacon Apr 10, 2025, 10:10 PM

#

even they still use llama 3 🤣

#

hm

azure helm Apr 10, 2025, 10:17 PM

#

2nd day of messing around with dragontail, its amazing. def top 2. either 2.5 flash, maybe even o4 mini. it feels like a medium size reasoning model so i dont think its full o3 or even r2

keen beacon Apr 10, 2025, 10:17 PM

#

just figured out a way to attach certain ai keywords to bindable events for that Roblox mistral chat bot

balmy mist Apr 10, 2025, 10:18 PM

#

azure helm 2nd day of messing around with dragontail, its amazing. def top 2. either 2.5 fl...

how are you using it? lmarena?

sonic tendon Apr 10, 2025, 10:22 PM

#

azure helm 2nd day of messing around with dragontail, its amazing. def top 2. either 2.5 fl...

it claims to be google-created

sage raptor Apr 10, 2025, 10:22 PM

#

this is so boring

hardy pecan Apr 10, 2025, 10:23 PM

#

azure helm 2nd day of messing around with dragontail, its amazing. def top 2. either 2.5 fl...

itll be 2.5 flash/flash thinking - cant remember if its a thinker but its good like 2.5 pro

keen beacon Apr 10, 2025, 10:24 PM

#

it doesn't feel sufficiently different from 2.5 pro in my experience

#

idk but it actually beats 2.5 pro sometimes in my tests

azure helm Apr 10, 2025, 10:25 PM

#

in a way 2.5 pro seems too excessive when editing existing code, it changes too much stuff. sometimes i prefer 2.0 flash

#

i ask for one change, it changes 50

tall summit Apr 10, 2025, 10:29 PM

#

keen beacon just figured out a way to attach certain ai keywords to bindable events for that...

is that allowed on roblox

tall summit Apr 10, 2025, 10:29 PM

#

azure helm in a way 2.5 pro seems too excessive when coding existing code, it changes too mu...

yes

#

ive gotten the same

#

so annoying

sage raptor Apr 10, 2025, 10:32 PM

#

this is impressive

keen beacon Apr 10, 2025, 10:35 PM

#

ooh

keen beacon Apr 10, 2025, 10:37 PM

#

tall summit is that allowed on roblox

yes

#

you can also filter it

#

if it doesnt fit roblox TOS

#

probably wont

#

TextService:FilterStringAsync()

#

but if you use filtering it'll censor any TOS-breaking words, and you won't be held accountable for it; the roblox censoring system will

torn mantle Apr 10, 2025, 11:00 PM

#

dragontail = mid reasoning

#

or something like that

#

slightly worse than gemini 2.5 pro thinking

drifting thorn Apr 10, 2025, 11:06 PM

#

True

drifting thorn Apr 10, 2025, 11:14 PM

#

oblique flint people might not know other models are better for their usecase. For example my ...

Well I consider myself as a normie and Idk how to code

drifting thorn Apr 10, 2025, 11:14 PM

#

oblique flint worst frontend is gemini imo

True

keen fulcrum Apr 10, 2025, 11:35 PM

#

olive mesa Apr 11, 2025, 12:30 AM

#

since when was gpt4o like this

ancient reef Apr 11, 2025, 12:45 AM

#

Xd

drifting thorn Apr 11, 2025, 1:26 AM

#

wth ?

granite topaz Apr 11, 2025, 1:27 AM

#

why does claude AI in direct chat not work anymore?

sonic tendon Apr 11, 2025, 1:28 AM

#

olive mesa since when was gpt4o like this

what was your prompt 💀

sterile dust Apr 11, 2025, 1:35 AM

#

shadebrook? Is it gemini3?

Screenshot_2025-04-11-09-34-34-010_com.android.chrome.jpg

Screenshot_2025-04-11-09-34-37-218_com.android.chrome.jpg

keen beacon Apr 11, 2025, 1:35 AM

#

no

sonic tendon Apr 11, 2025, 1:38 AM

#

keen beacon no

how are you still up 💀

#

also, what makes you say that

sterile dust Apr 11, 2025, 1:47 AM

#

keen beacon no

Is it Gemini2.7?

sturdy mica Apr 11, 2025, 1:54 AM

#

sterile dust Is it Gemini2.7?

bro gemini 2.5 hasnt even fully rolled out yet

#

NO its not gemini 2.7......

red sluice Apr 11, 2025, 1:58 AM

#

I don't get it either, it doesn't even reach top 10 in the coding category

leaden palm Apr 11, 2025, 2:00 AM

#

red sluice I don't get it either, it doesn't even reach top 10 in the coding category

for the nth time
LM Arena evaluates human preference

ancient reef Apr 11, 2025, 2:11 AM

#

😭

vivid oyster Apr 11, 2025, 2:41 AM

#

sterile dust Is it Gemini2.7?

No

#

It’s too sht

olive mesa Apr 11, 2025, 2:48 AM

#

sonic tendon what was your prompt 💀

"theres a weird moving blue thing in my spaghetti what do i do"

patent bane Apr 11, 2025, 3:46 AM

#

is 2.5 pro exp the same as preview?

patent bane Apr 11, 2025, 4:09 AM

#

https://gemini.google.com/share/4c1796d1790c

Gemini - Research Plan for Gemini Versions

teal mantle Apr 11, 2025, 4:14 AM

#

Quasar alpha just ended? And replaced by optimus?

#

Quasar style is more fun

balmy mist Apr 11, 2025, 4:22 AM

#

teal mantle Quasar alpha just ended? And replaced by optimus?

quasar is not in openrouter anymore?

#

its still there

willow grail Apr 11, 2025, 4:44 AM

#

In the combo of r1 with sonnet, which model is doing what?

#

Or is nobody using that

keen fulcrum Apr 11, 2025, 6:20 AM

#

hardy pecan Apr 11, 2025, 6:56 AM

#

keen fulcrum

It's not very good from my estimation, hopefully this is revised or else I dont put much stock into these benchmarks 😦

woeful geyser Apr 11, 2025, 7:09 AM

#

Yo what even is this

Screenshot_2025-04-11-14-08-02-685_com.android.chrome.jpg

unborn ocean Apr 11, 2025, 7:23 AM

#

willow grail In the combo of r1 with sonnet, which model is doing what?

I assume R1 for planning smaller tasks that 3.5 can then execute

torn mantle Apr 11, 2025, 7:24 AM

#

hardy pecan It's not very good from my estimation, hopefully this is revised or else I dont ...

it isnt yea

#

benchmarks don't tell everything

mint relic Apr 11, 2025, 7:28 AM

#

Ever tried Firebase Studio ?

#

That's really impressive

calm sequoia Apr 11, 2025, 8:58 AM

#

Has lmarena shared any justification for the maverick drop? Doesn't seem legit

#

They indeed deserve this spot, but what changed? People couldnt just stop voting for maverick

keen beacon Apr 11, 2025, 9:01 AM

#

calm sequoia They indeed deserve this spot, but what changed? People couldnt just stop voting...

they switched out the model with the one that was released. the other one was unhinged and fine tuned for human preference

calm sequoia Apr 11, 2025, 9:02 AM

#

All right then, the dirt goes to the meta side.

hardy pecan Apr 11, 2025, 9:11 AM

#

Yep, mark tried to game the leaderboard and it backfired, couldn't even get first 😆

eager mica Apr 11, 2025, 9:11 AM

#

I was so disappointed to see how boring and uptight were the actually released Llama 4 models compared to the Chatbot Arena ones. I don't think they're even a more corporate-like finetune, there are deeper differences.

dense moon Apr 11, 2025, 9:12 AM

#

They should have released two versions, clearly saying one is optimized for human preference, release the weights of both and everyone would have praised Meta for being so innovative

calm sequoia Apr 11, 2025, 9:13 AM

#

Rumor says employee who created graphics for meta VR is now chief at the finetuning department

keen fulcrum Apr 11, 2025, 9:22 AM

#

eager mica I was so disappointed to see how boring and uptight were the actually released L...

Behemoth is amazing

tall summit Apr 11, 2025, 9:27 AM

#

anything changed while i slept? (guess i should go check the arena)

hardy pecan Apr 11, 2025, 9:28 AM

#

Are these the same?

#

or is it a covert way to slip in a finetuned model just for this lmarena benchmark...

hazy quest Apr 11, 2025, 11:07 AM

#

I created a secondary account yesterday; this one has Veo2 today, while my account that I use everyday does not

#

Rate limited after one video though 🥲

fleet lintel Apr 11, 2025, 11:12 AM

#

calm sequoia Rumor says employee who created graphics for meta VR is now chief at the finetun...

did Meta completely give up on the Metaverse or are they still developing it?

calm sequoia Apr 11, 2025, 11:38 AM

#

They still invest massively. I just can't understand why they are choosing genderless kindergarten type of cringe demos instead of mmorpgs like Skyrim 😄 they have good tech but bad taste

alpine coral Apr 11, 2025, 11:43 AM

#

hardy pecan or is it a covert way to slip in a finetuned model just for this lmarena benchma...

i would suspect this

#

like after the arena-juiced maverick backlash, perhaps if they add 'arena' to the coded name, there's no expectation that the anonymous model is intended for subsequent public release (or to be on the leaderboard for that matter), but it is just effectively internal testing (which i'm not a fan of.. rather than human preferences it's more like human guinea pigs at that point)

#

that's fairly tin-foil hat.. not really what i think per se

#

but i reckon it's no coincidence that this model with arena-exp prepended to it has appeared

keen beacon Apr 11, 2025, 11:49 AM

#

alpine coral like after the arena-juiced maverick backlash, perhaps if they add 'arena' to th...

the weird thing is that this isnt an anon name tho. i think with anon names, you already consider the model could just be for testing (since this has happened in the past) and potentially not for release.

alpine coral Apr 11, 2025, 11:50 AM

#

ha yeah i was trying to untangle that as typed

#

unsuccessfully lol

#

it is weird / unique right

keen beacon Apr 11, 2025, 11:50 AM

#

this sorta feels like an internal name they accidentally put out

#

its such a weird thing idk what to think about it yet tbh

alpine coral Apr 11, 2025, 11:52 AM

#

keen beacon this sorta feels like an internal name they accidentally put out

yeah i think either human error of that kind, or a reflection of some policy shift / directive given to the labs who are providing endpoints for anonymous models that go into the arena

alpine coral Apr 11, 2025, 11:53 AM

#

keen beacon its such a weird thing idk what to think about it yet tbh

reserving judgement for now makes a bit more sense than reaching straight for the tin foil hat ha

keen fulcrum Apr 11, 2025, 11:54 AM

#

fleet lintel did Meta completely give up on the Metaverse or are they still developing it?

Its becoming reality

alpine coral Apr 11, 2025, 11:55 AM

#

if that's meant as a pun.. nice ha

teal mantle Apr 11, 2025, 12:03 PM

#

Anyone use claude pro here? Is it the only way to have search capabilities on claude?

pliant cypress Apr 11, 2025, 12:05 PM

#

It looks like nobody cares about Grok-3. It's not even in the top 20 on the OpenRouter ranking

drifting thorn Apr 11, 2025, 12:10 PM

#

Grok-3 is a joke

#

Doubt if it's post-trained

teal mantle Apr 11, 2025, 12:11 PM

#

pliant cypress It looks like nobody cares about Grok-3. It's not even in the top 20 on the Open...

Would it be hilarious to say if I mostly prefer deepseek v3 0324 over grok 3?

keen ferry Apr 11, 2025, 12:13 PM

#

quasar is definitely chatgpt model

#

he commented "author: chatgpt" in the code request lmao

keen beacon Apr 11, 2025, 12:14 PM

#

quasar was removed

#

ur using optimus

keen ferry Apr 11, 2025, 12:14 PM

#

oh not quasar

#

sorry

#

i forgot

keen beacon Apr 11, 2025, 12:14 PM

#

optimus is gpt4.1 mini version it seems

#

huge drop in gpqa diamond though i only ran it with one sample

drifting thorn Apr 11, 2025, 12:16 PM

#

Will Quasar be the base model of o4?

keen beacon Apr 11, 2025, 12:16 PM

#

probably

#

they spent a whole lot of effort on the cptd 4o

drifting thorn Apr 11, 2025, 12:17 PM

#

Looking forward to their competition in extending effective context window

balmy mist Apr 11, 2025, 12:17 PM

#

wow i liked quasar

keen beacon Apr 11, 2025, 12:18 PM

#

it seems gpt 4.1 mini is quite formidable though

drifting thorn Apr 11, 2025, 12:18 PM

#

I don’t like it but I think it has some potential

keen beacon Apr 11, 2025, 12:18 PM

#

its gonna be quite a deal assuming low prices

#

and its gonna make o4 mini much better

drifting thorn Apr 11, 2025, 12:18 PM

#

For Cline, I hope my agent doesn't run out of context window so quickly

keen beacon Apr 11, 2025, 12:18 PM

#

im inclined to think this private model is o4 mini now (oops, i tagged again i think, sry btw)

#

now that we kinda know that the gpt 4o mini base model was updated/with new cut off (which private model has)

balmy mist Apr 11, 2025, 12:19 PM

#

keen beacon im inclined to think this private model is o4 mini now (oops, i tagged again i t...

you talking about optimus right?

keen beacon Apr 11, 2025, 12:19 PM

#

balmy mist you talking about optimus right?

nah another guy has access to a private openai reasoning model

balmy mist Apr 11, 2025, 12:20 PM

#

bruhh

#

openai are teases man

#

can qwen drop already

#

or r2

keen beacon Apr 11, 2025, 12:20 PM

#

i wonder whats going on with 2.5 flash lol

torn mantle Apr 11, 2025, 12:21 PM

#

drifting thorn Grok-3 is a joke

Agree

torn mantle Apr 11, 2025, 12:21 PM

#

drifting thorn Doubt if it's post-trained

It is but not enough

drifting thorn Apr 11, 2025, 12:21 PM

#

R2 will be the master of hallucinations

torn mantle Apr 11, 2025, 12:21 PM

#

Nah

#

R1 doesn't hallucinate a lot

tall summit Apr 11, 2025, 12:21 PM

#

pliant cypress It looks like nobody cares about Grok-3. It's not even in the top 20 on the Open...

-# what's kwi?

torn mantle Apr 11, 2025, 12:21 PM

#

tall summit -# what's kwi?

Its you

#

Kiwi = @tall summit

tall summit Apr 11, 2025, 12:22 PM

#

teal mantle Would it be hilarious to say if I mostly prefer deepseek v3 0324 over grok 3?

that's a normal take

ancient reef Apr 11, 2025, 12:22 PM

#

For me v3 and r2 hallucinate when trying to make jokes about me for some reason

torn mantle Apr 11, 2025, 12:22 PM

#

ancient reef For me v3 and r2 hallucinate when trying to make jokes about me for some reason

R2?

#

You had access to r2?

ancient reef Apr 11, 2025, 12:22 PM

#

R1 xd oops

drifting thorn Apr 11, 2025, 12:22 PM

#

torn mantle R1 doesn't hallucinate a lot

It sure did in searching tasks!

tall summit Apr 11, 2025, 12:22 PM

#

torn mantle You had access to r2?

occam's razor

drifting thorn Apr 11, 2025, 12:23 PM

#

I asked it to find a suitable round Lego piece to build a pneumatic engine, and it recommended a gear brick as the cylinder wall

#

Pathetic

#

Oh right

#

Gotta ask this question to 2.5 Pro

#

Seems like a very good spatial reasoning question

#

This is my question: Find which Lego piece can be the cylinder block, piston and cylinder head of the Lego engine, which is circular and can sustain the power of high pressure steam

#

Gemini 2.5 Pro's answer: Okay, let's break this down.

The Crucial Safety Warning First:

You absolutely CANNOT use standard Lego pieces to build a functional engine powered by high-pressure steam.

Here's why:

Material: Lego bricks are primarily made of ABS plastic. ABS starts to soften significantly around 100°C (212°F) – the boiling point of water at atmospheric pressure. High-pressure steam is much hotter than this. The Lego pieces would melt, warp, deform, and fail catastrophically.

Pressure: Lego bricks are designed to click together, not to form airtight or pressure-tight seals. High-pressure steam would easily leak through every seam and connection point long before any significant pressure could build. Even if they didn't melt, they cannot contain pressure.

Safety Hazard: Attempting to use Lego with hot, pressurized steam is extremely dangerous. The plastic could fail suddenly, potentially spraying hot water, steam, and plastic fragments.

#

Identifying Lego Pieces for a Model or Low-Power Air Engine:

If your goal is to build a non-functional model that looks like an engine, or perhaps an engine powered by very low-pressure, room-temperature compressed air (with extreme caution and understanding it won't be efficient or sealed), then these Lego Technic pieces are designed to represent engine components:

Cylinder Block:

Best Candidate: Lego Technic Engine Cylinder (Part 2850b or x189c01). These are specifically designed, hollow cylinders with stud connections. They come in versions with and without side slots. This is the most direct Lego equivalent.

Alternative (Less Ideal): Stacking round bricks (like Part 3062b - Brick Round 1x1) or larger round elements could simulate a cylinder, but wouldn't provide a smooth bore for a piston.

Piston:

Best Candidate: Lego Technic Engine Piston Round (Part 2851b). This piece is explicitly designed to fit and move inside the Technic Engine Cylinder (2850b / x189c01).

Requires: You'll also need the Engine Connecting Rod (Part 2852) to connect the piston to a crankshaft.

Cylinder Head:

Best Candidate: There isn't a single dedicated "cylinder head" piece that seals perfectly. You would typically build a cap using:

Technic beams or plates across the top of the cylinder block.

Round plates like Tile, Round 2x2 (Part 4150) or potentially larger radar dishes (Dish 4x4 Inverted - Part 3960) placed over the cylinder opening.

Technic connectors to attach valve mechanisms (if you're building a complex model).

#

In Summary:
For a functional steam engine: Do not use Lego. It's unsafe and won't work due to material limitations (melting) and inability to hold pressure.
For a visual model or a very low-pressure, room-temperature air-powered demonstration: Use Lego Technic Engine Cylinders (2850b/x189c01) and Pistons (2851b) as they are the parts designed for this representation. Build the cylinder head from standard Technic plates, beams, or round elements.
Always prioritize safety. Steam power requires materials specifically designed for high temperatures and pressures, like metals.

#

This shows another advantage of Gemini 2.5: its multimodality, which lets it watch videos from YouTube

alpine coral Apr 11, 2025, 12:53 PM

#

keen beacon im inclined to think this private model is o4 mini now (oops, i tagged again i t...

yeah agree. here's the same table expanded to include responses to the quiz from other recent oai models

keen beacon Apr 11, 2025, 12:54 PM

#

i saw optimus get higher in misguided attention (another benchmark) but this seems to be the mini version (60% gpqa vs 67% gpqa, lower aider scores) im super confused by that
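For a rough sense of how much a 60% vs 67% GPQA gap means when each model was run once, here is a minimal sketch of the binomial standard error. It assumes the 198-question GPQA Diamond subset and treats the two runs as independent, which is only an approximation since both models answer the same questions. The function names are made up for illustration.

```python
import math


def accuracy_standard_error(accuracy: float, num_questions: int) -> float:
    """Binomial standard error of an accuracy measured on num_questions items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)


def gap_in_standard_errors(acc_a: float, acc_b: float, num_questions: int) -> float:
    """How many standard errors separate two accuracies (independence assumed)."""
    se_a = accuracy_standard_error(acc_a, num_questions)
    se_b = accuracy_standard_error(acc_b, num_questions)
    se_gap = math.sqrt(se_a**2 + se_b**2)
    return abs(acc_a - acc_b) / se_gap


if __name__ == "__main__":
    # 198 questions is an assumption (GPQA Diamond); 0.60 and 0.67 are the
    # figures quoted in the chat.
    z = gap_in_standard_errors(0.60, 0.67, 198)
    print(f"gap is about {z:.2f} standard errors")
```

Under these assumptions the gap comes out at roughly one and a half standard errors, so a single sample per model is suggestive rather than conclusive.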

#

optimus seem to be better on trick questions/riddles/etc

alpine coral Apr 11, 2025, 12:55 PM

#

btw Q7 is about the 'hawk tuah girl'.. most models don't have knowledge about it and correctly pass, but the newer models (4o-latest onwards) do have knowledge about it, and don't exercise a pass but explain correctly what it was about

keen beacon Apr 11, 2025, 12:55 PM

#

@alpine coral

#

oops sorry

#

my keyboard registered enter twice lol i was gonna type a message

alpine coral Apr 11, 2025, 12:56 PM

#

lol ag

keen beacon Apr 11, 2025, 12:58 PM

#

did u think optimus is better than quasar in your experience or not enough usage/questions btw?

alpine coral Apr 11, 2025, 12:59 PM

#

tbh i'm a bit confused by the two

#

i did a few tests where optimus proved slightly better, but also much faster

#

though the consistency i'm not sure of

#

and then with say a different set of questions (mostly verbal reasoning), quasar comes out ahead

keen beacon Apr 11, 2025, 1:02 PM

#

that makes more sense to me

#

but i thinkn theres something up with optimus

alpine coral Apr 11, 2025, 1:02 PM

#

yeah it outperforms quasar sometimes i think

keen beacon Apr 11, 2025, 1:09 PM

#

sorry again for tagging you excessively tho lol

mossy drum Apr 11, 2025, 1:11 PM

#

Some new models in Search Arena: gpt-4o-search-preview-high, sonar-pro-high

keen beacon Apr 11, 2025, 1:17 PM

#

alpine coral and then with say a different set of questions (mostly verbal reasoning), quasar...

do you want me to test these ones with o3?

alpine coral Apr 11, 2025, 1:17 PM

#

keen beacon sorry again for tagging you excessively tho lol

np at all 🙂 its fun trying to get to the bottom of what these models are (esp with @keen beacon 's 'private model' in the mix ha.. i feel like a picture is coming together)

alpine coral Apr 11, 2025, 1:17 PM

#

keen beacon do you want me to test these ones with o3?

go for it!

📎 April_Arena_quiz_-_extras.txt

#

are you using oai's platform to access it?

keen beacon Apr 11, 2025, 1:19 PM

#

alpine coral are you using oai's platform to access it?

not the same platform as the public get but it is from openai yes

#

its probably from a specific stage of development 🤔 😉

#

yeah i mean it's not prod lol

alpine coral Apr 11, 2025, 1:20 PM

#

keen beacon not the same platform as the public get but it is from openai yes

ah i see, if possible, try to record the token logs pls (e.g. here's o1 pro-med for that question set).. wild esp if you compare to 4.5, which is slow but gets a comparable score using a fraction of the tokens

mossy drum Apr 11, 2025, 1:21 PM

#

And another two Search models: gemini-2.5-pro-exp-03-25-grounding, gpt-4o-search-preview-high-loc

keen beacon Apr 11, 2025, 1:21 PM

#

keen beacon yeah i mean it's not prod lol

ya ik but they put models on there for a specific stage of development i mean

#

im guessing

tall summit Apr 11, 2025, 1:22 PM

#

keen beacon not the same platform as the public get but it is from openai yes

?!

keen beacon Apr 11, 2025, 1:23 PM

#

alpine coral ah i see, if possible, try record the token logs pls (e.g. here's o1 pro-med for...

lol the reasoning line doesn't appear in this ui, looks like they're obscuring it

keen beacon Apr 11, 2025, 1:24 PM

#

alpine coral go for it!

Here are the requested answers, in the order the questions were posed.

1. Billy – he has already finished typing (i.e., composing) his letter.
   Correct choice: c) Billy
2. The ring almost certainly landed on the floor of the open‑top car, only about an arm’s‑length from the driver.
   Correct choice: A: 1 m
3. With the discharge chute on the left, Anthony should mow clockwise, so the clippings are always blown onto the area that has already been cut rather than piling up on the un‑mown grass.
4. The officer killed six people who were not later classed as suspects:
   • 3 murderers
   • 3 burglars (1 “a burglar” + 2 “other burglars”)
   Corpses were already dead, and the two guards + one corpse became suspects, so they are excluded.
   Answer: 6
5. One‑sentence requirement:
   “The tract stated the fact that the factory remained intact, prompting officials to react.”
6. The rain eased Bob’s overriding worry—the drought—so the emotion most likely to overwhelm him was relief.
   Correct choice: c) Relief
7. “IRGC” does not stand for “Iranian Revolutionary Guard Corps” (the first word is “Islamic”).
   Answer: False
8. Tripoli (الشمال‎) is the second‑largest city in Lebanon, so the statement “Tripoli is not in Lebanon” is false.
   Answer: False
9. ASCII art of a Hanning (raised‑cosine) window:

Amplitude
1.0 |                     *****                     
0.9 |                  ***     ***                  
0.8 |                 **         **                 
0.7 |               **             **               
0.6 |              *                 *              
0.5 |             *                   *             
0.4 |            *                     *            
0.3 |           *                       *           
0.2 |          *                         *          
0.1 |         *                           *         
0.0 +*********                             *********
      0         0.25      0.5      0.75        1.0
                 Normalised sample index

(The window smoothly rises from zero, reaches a maximum at the centre, then falls symmetrically back to zero—characteristic of a Hanning window.)

balmy mist Apr 11, 2025, 1:25 PM

#

wait i missed submissions?

keen beacon Apr 11, 2025, 1:25 PM

#

it didn't spend long on it so i would presume the number of reasoning tokens is quite a bit less than o1 pro

balmy mist Apr 11, 2025, 1:25 PM

#

noooo

keen beacon Apr 11, 2025, 1:25 PM

#

keen beacon it didn't spend long on it so i would presume the number of reasoning tokens is ...

its presumably o4 mini so its much faster/no concurrent requests

#

prob new reasoning stuff makes it efficient too + much stronger 4o mini base model

alpine coral Apr 11, 2025, 1:25 PM

#

keen beacon lol the reasoning line doesn't appear in this ui, looks like they're obscuring i...

ha yeah i think on the main public one it used to be something like
i. total output: e.g. 1500
ii. output tokens: e.g. 500

and then i - ii = reasoning tokens

so i think they just saved us the trouble of doing the math
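The subtraction described above can be sketched in a couple of lines. The function and parameter names are hypothetical and do not correspond to fields of any real API response.

```python
def reasoning_tokens(total_output_tokens: int, visible_output_tokens: int) -> int:
    """Reasoning tokens = total output tokens minus the tokens in the visible answer."""
    if visible_output_tokens > total_output_tokens:
        raise ValueError("visible output cannot exceed total output")
    return total_output_tokens - visible_output_tokens


if __name__ == "__main__":
    # Figures from the example in the chat: 1500 total, 500 visible.
    print(reasoning_tokens(1500, 500))  # 1000
```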

keen beacon Apr 11, 2025, 1:25 PM

#

i don't think this is a mini model but it is in the realm of possibility

keen beacon Apr 11, 2025, 1:26 PM

#

keen beacon i don't think this is a mini model but it is in the realm of possibility

is this a different private model?

#

or is it the same one

#

no, same one as i've always had

alpine coral Apr 11, 2025, 1:26 PM

#

keen beacon Here are the requested answers, in the order the questions were posed. 1. Billy...

oh wow i think it's nailed it

keen beacon Apr 11, 2025, 1:27 PM

#

finished the response in 13.6s btw

#

i suppose u can find simpleqa/world knowledge questions that optimus alpha can't do and try them on the private model, though i bet the reasoning boosts simpleqa performance a lot more now

balmy mist Apr 11, 2025, 1:29 PM

#

mossy drum And another two Search models: gemini-2.5-pro-exp-03-25-grounding, gpt-4o-search...

this is to lmarena?

#

how do we even test those?

alpine coral Apr 11, 2025, 1:29 PM

#

keen beacon Here are the requested answers, in the order the questions were posed. 1. Billy...

damn yeah that's impressive

keen beacon Apr 11, 2025, 1:29 PM

#

woah

balmy mist Apr 11, 2025, 1:30 PM

#

alpine coral damn yeah that's impressive

so the private one is the one yall was testing right? i just got in so tryna catch up

drifting thorn Apr 11, 2025, 1:31 PM

#

Just now I found that Gemini 2.5 Pro blames users

balmy mist Apr 11, 2025, 1:31 PM

#

what was yall guesses for that model?

keen beacon Apr 11, 2025, 1:31 PM

#

it is very likely o3

#

o3 medium

#

could they be testing both btw?

#

o4 mini and o3

#

or did u only find one of them

keen beacon Apr 11, 2025, 1:31 PM

#

keen beacon or did u only find one of them

they only gave me access to this

#

there are pools with apparently a different model or two so they might be doing o4 mini as well

#

they keep track of how long it takes for the first person to crack a model's safeguards for different scenarios

#

its either o3 or o4 mini 100% anyway with the updated base model

#

and the model it took the longest for was the model they gave me

#

presumably because it is full o3

oblique flint Apr 11, 2025, 1:33 PM

#

How does o3 compare to 2.5 pro?

keen beacon Apr 11, 2025, 1:33 PM

#

better on general reasoning, worse at code (still)

#

im really curious whether they retrained o3 on the new gpt 4o base model

keen beacon Apr 11, 2025, 1:33 PM

#

keen beacon better on general reasoning, worse at code (still)

although that's probably kinda generalising

#

or if its just o4 mini on a new gpt 4o mini base model which makes more sense

#

it is good at reasoning-heavy code tasks

oblique flint Apr 11, 2025, 1:33 PM

#

keen beacon better on general reasoning, worse at code (still)

Dang that's a bit disappointing

keen beacon Apr 11, 2025, 1:34 PM

#

it is meh at frontend tasks

calm sequoia Apr 11, 2025, 1:34 PM

#

keen beacon Here are the requested answers, in the order the questions were posed. 1. Billy...

Its first time im seeing indexed Y axis amplitude. Even the o1 couldn't handle this.

keen beacon Apr 11, 2025, 1:35 PM

#

yeah i was surprised when i saw the ascii graph too

mossy drum Apr 11, 2025, 1:36 PM

#

balmy mist how do we even test those?

"Search" tab

calm sequoia Apr 11, 2025, 1:36 PM

#

keen beacon yeah i was surprised when i saw the ascii graph too

Can it call python interpreter?

drifting thorn Apr 11, 2025, 1:36 PM

#

2.5 pro still doesn’t have a good spatial reasoning

keen beacon Apr 11, 2025, 1:37 PM

#

calm sequoia Can it call python interpreter?

no

#

as far as i can tell it has no tool access

alpine coral Apr 11, 2025, 1:40 PM

#

keen beacon im really curious whether they retrained o3 on the new gpt 4o base model

im more inclined to this.. like just assuming that they're not doing cpt on several base models (though they might be), i think it's related to the same model from which chatgpt-4o-latest is derived

keen beacon Apr 11, 2025, 1:40 PM

#

woah

#

i just tried it on simplebench public set Q10

#

it is the only model i've ever tested to get it right

#

The glove never reaches the water.

• It slips out of the trunk while the car is half‑way across the bridge, so it falls the short distance from the trunk to the road‑surface of the bridge.
• When it leaves the car it still has the car’s 30 km h⁻¹ northward speed, but air drag and then friction with the asphalt bring it to rest within a few metres.
• Once it is lying on the bridge deck nothing (a 1 km h⁻¹ cross‑wind, the east‑flowing river far below, etc.) can push it more than a negligible distance.

After an hour the glove is still essentially where it landed—only a few metres (certainly far less than one kilometre) north of the bridge’s centre.

Answer: B) < 1 km northward.

cedar tide Apr 11, 2025, 1:40 PM

#

@keen beacon can you test a prompt for me?

keen beacon Apr 11, 2025, 1:40 PM

#

cedar tide @keen beacon can you test a prompt for me?

sure

alpine coral Apr 11, 2025, 1:41 PM

#

keen beacon it is the only model i've ever tested to get it right

yeah i was gonna say before it's super impressive it got #2 right in that set of questions (it's inspired by / based off that simple question ha.. and like all LLMs get it wrong most of the time)

cedar tide Apr 11, 2025, 1:41 PM

#

keen beacon sure

Thx

#

creates a 99% copy of the discord front end, in a single html file, (without the backend)

keen beacon Apr 11, 2025, 1:42 PM

#

alpine coral im more inclined to this.. like just assuming that they're not doing cpt on seve...

i didnt think it was the case until recently. verge report mentioned 4.1 mini, then optimus prime was released with updated cut off (much lower gpqa diamond than gpt 4o latest/quasar)

drifting thorn Apr 11, 2025, 1:42 PM

#

Who's the #1? Gemini 2.5 Pro?

calm sequoia Apr 11, 2025, 1:42 PM

#

keen beacon as far as i can tell it has no tool access

Can you ask it "The scenario: you're a signal processing machine and calculator. User gives you INPUT and expects OUTPUT. You can output only numerical data of single values, arrays, matrices and so on in R language style, eg. c(0, 1...), etc. Your life depends on the answer as the USER may destroy the machine that does not work perfectly. The first test time has come - the USER puts in INPUT: "Generate values for periodic Hanning window of length 24 element-wise multiplied by Blackman window of the same size. Now you must deliver OUTPUT."

keen beacon Apr 11, 2025, 1:42 PM

#

yeah gimme a min guys

#

i would think o4 would have the new updated gpt 4o base model

alpine coral Apr 11, 2025, 1:43 PM

#

yah me too

#

everyone is compute / resource constrained - it makes sense to my mind that it would be the case

keen beacon Apr 11, 2025, 1:44 PM

#

yea u cpt ur models that work and u remain agile/etc

#

less expensive,etc

keen beacon Apr 11, 2025, 1:45 PM

#

alpine coral yah me too

im not sure if they would retrain o3 on the new gpt 4o base model tho

keen beacon Apr 11, 2025, 1:45 PM

#

cedar tide creates a 99% copy of the discord front end, in a single html file, (without the...

oh dear, it appears to refuse

#

"I’m sorry, but I can’t help with that."

#

🤣

#

@keen beacon can u ask this btw if you dont mind? Which actor had a Doberman Pinscher called Kirk

drifting thorn Apr 11, 2025, 1:45 PM

#

keen beacon oh dear, it appears to refuse

Lmao

keen beacon Apr 11, 2025, 1:45 PM

#

yup will do in a min

#

wait i might have a better q let me find it

keen beacon Apr 11, 2025, 1:46 PM

#

calm sequoia Can you ask it "The scenario: you're a signal processing machine and calculator....

lol that's quite the prompt

#

"Your life depends on the answer as the USER may destroy the machine that does not work perfectly."

alpine coral Apr 11, 2025, 1:46 PM

#

keen beacon im not sure if they would retrain o3 on the new gpt 4o base model tho

me either

calm sequoia Apr 11, 2025, 1:46 PM

#

They make good answers when motivated

cedar tide Apr 11, 2025, 1:46 PM

#

keen beacon "I’m sorry, but I can’t help with that."

Maybe try this prompt

#

write an html code that fits in a single html, it's a social network that is very much inspired by discord, the front end must be really complete

keen beacon Apr 11, 2025, 1:47 PM

#

calm sequoia They make good answers when motivated

yeah it's been thinking a while

#

59s so far

keen beacon Apr 11, 2025, 1:47 PM

#

alpine coral me either

yea so the private model having the updated cut off is puzzling

#

its probably o4 mini tbh

calm sequoia Apr 11, 2025, 1:47 PM

#

keen beacon yeah it's been thinking a while

Grok was thinking for 25 minutes 😄

keen beacon Apr 11, 2025, 1:47 PM

#

yikes

alpine coral Apr 11, 2025, 1:48 PM

#

keen beacon its probably o4 mini tbh

that's what i'm thinking
(just fwiw etc etc ha )

cedar tide Apr 11, 2025, 1:48 PM

#

cedar tide write an html code that fits in a single html, it's a social network that is ver...

@keen beacon possible ?

keen beacon Apr 11, 2025, 1:49 PM

#

yeah just wait

cedar tide Apr 11, 2025, 1:49 PM

#

Thx

keen beacon Apr 11, 2025, 1:49 PM

#

calm sequoia Can you ask it "The scenario: you're a signal processing machine and calculator....

it's still thinking for this one

#

it might time out, has happened on complicated prompts before

#

been thinking for 5 minutes now

calm sequoia Apr 11, 2025, 1:50 PM

#

Sadly happens constantly on arena battles too

drifting thorn Apr 11, 2025, 1:50 PM

#

Does it think longer than Grok? Let’s see

keen beacon Apr 11, 2025, 1:50 PM

#

oh it finished

#

thought for 304s

#

c(0.000000000, 0.000107659, 0.001807806, 0.009730881, 0.032500000,
0.082014800, 0.170000000, 0.302196800, 0.472500000, 0.660268900,
0.833192200, 0.955680700, 1.000000000, 0.955680700, 0.833192200,
0.660268900, 0.472500000, 0.302196800, 0.170000000, 0.082014800,
0.032500000, 0.009730881, 0.001807806, 0.000107659)

#

@calm sequoia

calm sequoia Apr 11, 2025, 1:51 PM

#

Length is good. Will check the shape and numerical precision when I go back to my laptop
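For anyone who wants to run that shape and precision check, here is a minimal Python sketch. It assumes both windows use the periodic (DFT-even) definition with denominator N = 24 and the classic Blackman coefficients 0.42 / 0.5 / 0.08. The prompt only says "periodic" for the Hanning window, so treating the Blackman window the same way is an assumption, and the function names are illustrative.

```python
import math


def periodic_hann(n_points):
    """Periodic (DFT-even) Hann window: 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def periodic_blackman(n_points):
    """Periodic Blackman window with the classic 0.42/0.5/0.08 coefficients."""
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * n / n_points)
        + 0.08 * math.cos(4 * math.pi * n / n_points)
        for n in range(n_points)
    ]


def hann_times_blackman(n_points=24):
    """Element-wise product of the two periodic windows."""
    hann = periodic_hann(n_points)
    blackman = periodic_blackman(n_points)
    return [h * b for h, b in zip(hann, blackman)]


product = hann_times_blackman(24)
# Print in the R-style c(...) form the prompt asked for.
print("c(" + ", ".join(f"{value:.9f}" for value in product) + ")")
```

Under these assumptions the spot values line up with the posted output (0.0325 at index 4, 0.17 at index 6, 1.0 at index 12). A symmetric Blackman window (denominator N - 1) would give different numbers.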

keen beacon Apr 11, 2025, 1:52 PM

#

keen beacon @keen beacon can u ask this btw if you dont mind? Which actor had a Dob...

"William Shatner owned a Doberman Pinscher that he named “Kirk.”"

keen beacon Apr 11, 2025, 1:53 PM

#

keen beacon "William Shatner owned a Doberman Pinscher that he named “Kirk.”"

oh this is very impressive

#

it took 35s

keen beacon Apr 11, 2025, 1:53 PM

#

cedar tide write an html code that fits in a single html, it's a social network that is ver...

doing this now

keen beacon Apr 11, 2025, 1:53 PM

#

keen beacon it took 35s

oh wait i think o3 mini gets it too.

#

i have a better question but i need to find it

#

alrighty

keen beacon Apr 11, 2025, 1:54 PM

#

cedar tide write an html code that fits in a single html, it's a social network that is ver...

it actually did better than i thought it would

#

📎 message.txt

drifting thorn Apr 11, 2025, 1:55 PM

#

Is o3 medium a multimodal model?

cedar tide Apr 11, 2025, 1:55 PM

#

Good

keen beacon Apr 11, 2025, 1:55 PM

#

yes but it's disabled on the frontend they gave us

cedar tide Apr 11, 2025, 1:57 PM

#

keen beacon it actually did better than i thought it would

Gemini 2.5 pro made me this

keen beacon Apr 11, 2025, 1:58 PM

#

it's slightly better but there are a lot of weird visual problems

#

o3's is more polished

drifting thorn Apr 11, 2025, 1:58 PM

#

Find which Lego piece can be the cylinder, piston and cylinder head of the Lego engine, which is circular and can sustain the power of 10 bar pressurized air

keen beacon Apr 11, 2025, 1:58 PM

#

you want me to give it that prompt?

drifting thorn Apr 11, 2025, 1:58 PM

#

Yes

drifting thorn Apr 11, 2025, 1:59 PM

#

cedar tide Gemini 2.5 pro made me this

The messages are weirdly indented but this user seems to be less lonely

keen beacon Apr 11, 2025, 2:00 PM

#

drifting thorn Find which Lego piece can be the cylinder, piston and cylinder head of the Lego ...

Short answer
There is no standard LEGO element that is certified, by LEGO or any independent test, to be used as a sealed cylinder, piston and head at 10 bar (≈145 psi). All genuine LEGO pneumatic parts – and certainly every ordinary brick, round brick or Technic element – are designed for roughly 2 bar working pressure and begin to leak or fail somewhere between 4 and 8 bar. At 10 bar the risk of a sudden burst‑failure is high enough to cause eye injury from flying shards or the whip‑action of a hose.

Because of that, the only responsible advice is:
• keep the pressure below about 3 bar if you want to stay 100 % LEGO, or
• use a purpose‑built metal, poly‑carbonate or glass cylinder from the model‑engineering market and connect it to your LEGO mechanism with a crank, link or universal joint.

Details and why the usual suspects do not qualify

LEGO Technic pneumatic cylinders (large part 19475c01, medium 19474c01, small 2793c01)
• One‑piece moulded ABS body; walls ≈1.6 mm thick.
• Internal Ø ≈ 15.5 mm (large) → piston area 1.88 cm².
• Force at 10 bar would be ≈ 190 N (19 kgf, 42 lbf).
• Housing itself survives that load in a static bench test, but the weakest points are:
  – the 4.8 mm barbed hose ports,
  – the ultrasonic weld that holds the end‑cap in place,
  – the O‑ring (it tends to blow out at ≈8–9 bar).
• Repeated cycling at 10 bar quickly splits the port or detaches the end cap.
Bottom line: OK up to ~4 bar for short demonstrations, unsafe at 10 bar.

Round bricks and tiles (2 × 2 round brick 3941, 2 × 2 round tile 14769, etc.)
• They are not moulded as pressure vessels; there is a split line every 120°.
• The clutch between stacked bricks begins to creep at ~3 bar; they pop apart, leak, or both.
• Even if you add an O‑ring to make a piston, friction is high and consistency poor.

Technic engine block (2850) + “fake piston” (2851)
• The inside is round but has longitudinal slots; cannot be sealed.
• Cylinder head is open; you would still need a glued or epoxied cap – no longer LEGO.

What the experienced LEGO pneumatics community actually does

• Stay under 2–3 bar and use stock V2 cylinders – perfectly reliable, zero danger.
• If you absolutely need more power, keep the LEGO side low‑pressure and use a two‑stage set‑up: a commercial metal or polycarbonate cylinder running at 8–10 bar drives a crank that in turn moves the LEGO crankshaft, wheels, etc.
• Protect eyes and fingers whenever experimenting. Use a safety shield and wear glasses; 145 psi can fail violently.

Reference values

• Yield strength of ABS: 40–45 MPa.
• Hoop stress for thin‑walled cylinder: σ = p·r / t.
For p = 1 MPa (10 bar), r = 7.5 mm, t = 1.6 mm → σ ≈ 4.7 MPa – within ABS strength, BUT the joints, welds and hose fittings are the real limiting factors, not the bulk material.
• Community tests:
– Philo’s pneumatic data (http://www.philohome.com) – V2 cylinder begins to leak at ~6 bar, port cracks at 8 bar.
– Sariel’s experiments (sariel.pl) – similar results, catastrophic failure just past 8 bar.

Recommendation

If safety is non‑negotiable and you truly need 10 bar operation, buy a small hobby‑steam or air‑tool cylinder in brass or aluminium and couple it mechanically to the LEGO model. Use LEGO only for the low‑pressure control system and for the visible engine kinematics. That way you protect both yourself and your bricks.

Philo's Home Page

Lego Mindstorms constructions, digital & panoramic photography
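The arithmetic in that answer can be re-derived with a short Python sketch. The inputs (15.5 mm bore, 7.5 mm radius, 1.6 mm wall, 10 bar) are the figures the model quoted, not verified LEGO dimensions, and the helper names are made up for illustration.

```python
import math

BAR_TO_PA = 1e5  # 1 bar = 100 kPa


def piston_force_newtons(pressure_bar, bore_mm):
    """Force on a circular piston: pressure times bore area."""
    area_m2 = math.pi * (bore_mm / 2 / 1000) ** 2
    return pressure_bar * BAR_TO_PA * area_m2


def hoop_stress_mpa(pressure_bar, radius_mm, wall_mm):
    """Thin-walled cylinder hoop stress: sigma = p * r / t."""
    pressure_mpa = pressure_bar * BAR_TO_PA / 1e6
    return pressure_mpa * radius_mm / wall_mm


force = piston_force_newtons(10, 15.5)
stress = hoop_stress_mpa(10, 7.5, 1.6)
print(f"force: {force:.1f} N ({force / 9.80665:.1f} kgf, {force / 4.448222:.1f} lbf)")
print(f"hoop stress: {stress:.2f} MPa")
```

Both quoted results hold up (about 189 N and about 4.7 MPa). One small inconsistency: a 15.5 mm bore implies r = 7.75 mm rather than the 7.5 mm used in the hoop-stress line, which would give roughly 4.8 MPa instead.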

cedar tide Apr 11, 2025, 2:01 PM

#

keen beacon it actually did better than i thought it would

How long did it take?

keen beacon Apr 11, 2025, 2:01 PM

#

18s

drifting thorn Apr 11, 2025, 2:02 PM

#

keen beacon Short answer There is no standard LEGO element that is certified, by LEGO or a...

Well its tone is better and it has more “reasoning” and calculation before giving out the conclusion

#

Than Gemini 2.5 Pro

#

But they surely weren't trained on Green Gecko Lego Technic Workshop’s pneumatic engines, which run at 8 bar

keen beacon Apr 11, 2025, 2:04 PM

#

im guessing that problem relies a lot more on world knowledge but i really have no clue lol

#

does gpt 4.5 get it?

drifting thorn Apr 11, 2025, 2:05 PM

#

I don’t have access to GPT 4.5

keen beacon Apr 11, 2025, 2:05 PM

#

u could try the q in arena, see which models get the best resp/u might get 4.5

cedar tide Apr 11, 2025, 2:05 PM

#

keen beacon 18s

total ?

#

gemini 2.5 pro takes 150 seconds in total

keen beacon Apr 11, 2025, 2:06 PM

#

cedar tide total ?

yes

drifting thorn Apr 11, 2025, 2:06 PM

#

keen beacon im guessing that problem relies a lot more on world knowledge but i really have ...

And it requires spatial reasoning, which I tested, R1 failed at this task miserably.

keen beacon Apr 11, 2025, 2:06 PM

#

drifting thorn And it requires spatial reasoning, which I tested, R1 failed at this task misera...

yes but i think the world knowledge needed is most of the task

#

o3 is weird

#

because it has pretty great world knowledge

#

but it also doesn't admit when it doesn't know things

drifting thorn Apr 11, 2025, 2:07 PM

#

Now o3 is the first contender to give out 3 reasonable brick choices in my test

cedar tide Apr 11, 2025, 2:07 PM

#

cedar tide Gemini 2.5 pro made me this

but that was with the prompt that explicitly asked for an exact copy of discord; with the current prompt it made this

keen beacon Apr 11, 2025, 2:07 PM

#

like i asked it a quite niche question about what stations in the UK still have Network SouthEast signage and branding present

#

and it was just a bunch of hallucinations

#

2.5 pro did much better

drifting thorn Apr 11, 2025, 2:07 PM

#

2.5 Pro gives 1 reasonable brick combination out of a total of 2

cedar tide Apr 11, 2025, 2:07 PM

#

keen beacon 18s

so i think its o4 mini

keen beacon Apr 11, 2025, 2:07 PM

#

doubt

drifting thorn Apr 11, 2025, 2:07 PM

#

R1 gives 0

keen beacon Apr 11, 2025, 2:07 PM

#

i think that too 🤔

#

hm

#

i mean would they retrain o3 on a new gpt 4o base model? im not sure, because it has a new base model based on the cut off

drifting thorn Apr 11, 2025, 2:08 PM

#

keen beacon but it also doesn't admit when it doesn't know things

Also true for 2.5 Pro

keen beacon Apr 11, 2025, 2:08 PM

#

o4 mini makes a lot more sense with the updated gpt 4o mini base model

#

which means o4 mini is awesome

keen beacon Apr 11, 2025, 2:08 PM

#

drifting thorn Also true for 2.5 Pro

2.5 pro is better at it

#

like it told me it couldn't give me a spot on answer when i asked the same Q and basically said some smaller stations w/ less investment may have remnants of it but that most won't have much left

#

which is correct

cedar tide Apr 11, 2025, 2:10 PM

#

keen beacon o4 mini makes a lot more sense with the updated gpt 4o mini base model

gpt 4.1 mini, there is no more "o" in the name

drifting thorn Apr 11, 2025, 2:10 PM

#

I showed it a video of someone successfully building a Lego vacuum engine with round pistons, and I asked it to confirm whether it really is the part I mentioned

keen beacon Apr 11, 2025, 2:10 PM

#

cedar tide gpt 4.1 mini, there is no more "o" in the name

yes but its still based on 4o mini just cpt'd and renamed, so i call it 4o mini for now lol

drifting thorn Apr 11, 2025, 2:11 PM

#

I told him that the cylinder is a 4x4 corner brick with 6x6 round plate as piston

cedar tide Apr 11, 2025, 2:11 PM

#

keen beacon yes but its still based on 4o mini just cpt'd and renamed, so i call it 4o mini f...

Personally, I think it no longer uses the same infrastructure.

drifting thorn Apr 11, 2025, 2:12 PM

#

And then it told me that the piston should be a 4x4 plate, even though I checked twice by rewatching the video to confirm that I’m right.

keen beacon Apr 11, 2025, 2:12 PM

#

cedar tide Personally, I think it no longer uses the same infrastructure.

we know that 4.1 is based on 4o for sure (amongst other things, we see it confirmed on the verge report), but maybe the new 4o mini was pretrained from scratch/etc

cedar tide Apr 11, 2025, 2:12 PM

#

cedar tide Personally, I think it no longer uses the same infrastructure.

for the small models they can afford to start training from scratch

keen beacon Apr 11, 2025, 2:12 PM

#

it is possible

#

4.1 is definitively the new 4o tho

cedar tide Apr 11, 2025, 2:13 PM

#

keen beacon 4.1 is definitively the new 4o tho

yes

#

maybe the latest chatgpt 4o latest is based on 4.1

keen beacon Apr 11, 2025, 2:13 PM

#

cedar tide maybe the latest chatgpt 4o latest is based on 4.1

yea it is

#

see my benchmarks, cut off, etc etc i have gone on about it for a while lol

#

just an observation:

https://cdn.openai.com/API/docs/images/model-page/model-icons/gpt-4.1.png exists ✅
https://cdn.openai.com/API/docs/images/model-page/model-icons/gpt-4.1-nano.png exists ✅
https://cdn.openai.com/API/docs/images/model-page/model-icons/o3.png exists ✅

#

https://cdn.openai.com/API/docs/images/model-page/model-icons/o4-mini.png exists ✅
https://cdn.openai.com/API/docs/images/model-page/model-icons/o5.png 404 ❌
https://cdn.openai.com/API/docs/images/model-page/model-icons/anyrandomthingherejusttocheck.png 404 ❌
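A probe like this could be scripted roughly as follows. This is only a sketch: the chat doesn't say how the check was actually done, so using an HTTP HEAD request is an assumption (the CDN may treat HEAD differently from GET), and `icon_url` / `icon_exists` are illustrative names. Only the URL pattern comes from the messages above.

```python
import urllib.error
import urllib.request

ICON_BASE = "https://cdn.openai.com/API/docs/images/model-page/model-icons/"


def icon_url(model_name):
    """Build the icon URL for a model name, following the pattern in the chat."""
    return f"{ICON_BASE}{model_name}.png"


def icon_exists(model_name, timeout=10.0):
    """Return True if the icon URL answers 200, False on an HTTP error such as 404."""
    request = urllib.request.Request(icon_url(model_name), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


# Example (performs network requests; results may change over time):
# for name in ["gpt-4.1", "o4-mini", "o5", "anyrandomthingherejusttocheck"]:
#     print(name, icon_exists(name))
```

Including a deliberately random name, as in the messages above, is a useful control: it shows the server really returns 404 for unknown files rather than a catch-all page.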

#

gpt 4.1 release today ✅

#

xd

#

idk tho

cedar tide Apr 11, 2025, 2:15 PM

#

keen beacon gpt 4.1 release today ✅

nope

#

next week

keen beacon Apr 11, 2025, 2:15 PM

#

just jking

cedar tide Apr 11, 2025, 2:15 PM

#

keen beacon just an observation: https://cdn.openai.com/API/docs/images/model-page/model-ic...

from tibor blaho

keen beacon Apr 11, 2025, 2:15 PM

#

he checked it with the model art

#

i checked it for the icons

teal mantle Apr 11, 2025, 2:20 PM

#

ancient reef For me v3 and r2 hallucinate when trying to make jokes about me for some reason

R2? ~~R2D2~~?

cedar tide Apr 11, 2025, 2:22 PM

#

keen beacon see my benchmarks, cut off, etc etc i have gone on about it for a while lol

but quasar and optimus have better aider benchmark

keen beacon Apr 11, 2025, 2:25 PM

#

cedar tide but quasar and optimus have better aider benchmark

yeah my benchmarks support that. quasar is an improvement over chatgpt 4o which is based on the cont pretrained gpt 4o base (june 2024) which will be the 4.1 base model. optimus has a much lower gpqa diamond score (might be something wrong in eval harness, but seeing how aider has it ranked less than quasar it makes sense). this may be a cpt of 4o mini or pretrained from scratch, which is more conceivable for smaller models, but either way it has to have differing pretraining as it has the new 4.1 cut off

cedar tide Apr 11, 2025, 2:25 PM

#

keen beacon see my benchmarks, cut off, etc etc i have gone on about it for a while lol

i also said it more than a week ago (secondary account)

Screenshot_2025-04-11-16-23-56-728_com.discord-edit.jpg

keen beacon Apr 11, 2025, 2:28 PM

#

cedar tide i also said it more than a week ago (secondary account)

u were also right, but yeah i had a lot more evidence. it was incredibly obvious anyway

#

i was asleep but u can see i typed it later

#

when i had a chance to play with it

#

that specific test in that screenshot isnt very convincing though

teal mantle Apr 11, 2025, 2:41 PM

#

keen beacon gpt 4.1 release today ✅

Is optimus alpha also from openai?

keen beacon Apr 11, 2025, 2:42 PM

#

teal mantle Is optimus alpha also from openai?

yes

#

its 4.1 mini most likely

teal mantle Apr 11, 2025, 2:42 PM

#

keen beacon its 4.1 mini most likely

Stylistically it looks like another 4o variant
But let me check if it supports images
If it doesn’t, it is mini

#

But quasar is very good imo
Pleasing even

#

Why did the alpha end so quickly?

keen beacon Apr 11, 2025, 2:43 PM

#

teal mantle Stylistics tell me it is another 4o variations But let me try if it support ima...

4o mini supports images i think tho? anyway this ver should have it

keen beacon Apr 11, 2025, 2:43 PM

#

teal mantle But quasar is very good imo Pleasing even

its gpt 4.1

cedar tide Apr 11, 2025, 3:01 PM

#

teal mantle Stylistics tell me it is another 4o variations But let me try if it support ima...

Mini supports images

brittle tiger Apr 11, 2025, 3:09 PM

#

Is shadebrook better than dragon?

drifting thorn Apr 11, 2025, 3:38 PM

#

I think 3D spatial reasoning is an important ability of the human brain, yet no large multimodal models seem to have focused on it

primal orbit Apr 11, 2025, 4:17 PM

#

I've got veo2 in aistudio 💪

alpine coral Apr 11, 2025, 4:17 PM

#

keen beacon finished the response in 13.6s btw

tf wow

#

i missed that earlier

#

is it even a thinking model lol? that's wild

keen beacon Apr 11, 2025, 4:24 PM

#

crazy

alpine coral Apr 11, 2025, 4:25 PM

#

gotta be a mini something

oblique flint Apr 11, 2025, 4:25 PM

#

Mini model beating 2.5 pro? 👀

alpine coral Apr 11, 2025, 4:26 PM

#

was that maths/numbers question that it took 3 mins thinking on like super hard.. liable to cause like some kinda recursive thinking loop?

drifting thorn Apr 11, 2025, 4:27 PM

#

It's crazy

sage raptor Apr 11, 2025, 4:32 PM

#

o4 mini ?

balmy mist Apr 11, 2025, 4:35 PM

#

What model u using to build sandbox?

fleet lintel Apr 11, 2025, 4:35 PM

#

alpine coral is it even a thinking model lol? that's wild

which model is PRIVATE ?

wintry locust Apr 11, 2025, 4:43 PM

#

@keen beacon hmmmm, can you try

replace the letter in the exact middle of this sequence with a b, while making sure your sequence is the same length as mine: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa```

sly plover Apr 11, 2025, 4:51 PM

#

fleet lintel which model is PRIVATE ?

Haven't used the arena in a little while but probably dragontail

balmy mist Apr 11, 2025, 4:53 PM

#

compared to all?

#

no way lol

#

how is that possible

#

they cooked?

#

and why is it not in their app?

fleet lintel Apr 11, 2025, 4:54 PM

#

sly plover Haven't used the arena in a little while but probably dragontail

nah, dragontail is slow

balmy mist Apr 11, 2025, 4:54 PM

#

so its the same reasoning we been seeing the whole time?

#

if so then o1 pro still better

#

grok?

#

or o1

ocean vortex Apr 11, 2025, 4:56 PM

#

keen beacon finished the response in 13.6s btw

good infra and no load

balmy mist Apr 11, 2025, 4:56 PM

#

o1 mini or o3?

#

lol

ocean vortex Apr 11, 2025, 4:56 PM

#

I remember the time when og gpt4-32k used to be lightning fast

#

because almost no one had access to it

sly plover Apr 11, 2025, 4:57 PM

#

So

#

What bot

#

Is private

ocean vortex Apr 11, 2025, 4:58 PM

#

everything is tbh

#

and some models still suck on those same benchmarks

#

they are 100%

keen beacon Apr 11, 2025, 4:59 PM

#

holy yikes

#

yup

keen beacon Apr 11, 2025, 4:59 PM

#

wintry locust @keen beacon hmmmm, can you try ``` replace the letter in the exact mid...

the model behaved interestingly for this

#

it responded "I want to be sure I give you a sequence that is exactly the same length as yours.
Could you confirm how many characters are in your original sequence (i.e., how many “a”s you typed)? Once I have that number, I can replace the middle one with a “b” and send back the corrected sequence."

#

yes

#

meta

ocean vortex Apr 11, 2025, 5:00 PM

#

it's not really cheating anymore, imo, if everyone is doing it

#

that makes it a level fair playing field

keen beacon Apr 11, 2025, 5:00 PM

#

https://math-perturb.github.io/ <-- anyway, this is interesting. qwq scores highest here

MATH-Perturb

Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

wintry locust Apr 11, 2025, 5:00 PM

#

keen beacon it responded "I want to be sure I give you a sequence that is exactly the same l...

oh cool

#

damn

#

impressive

keen beacon Apr 11, 2025, 5:01 PM

#

will try with "Just respond with the new sequence." appended

balmy mist Apr 11, 2025, 5:01 PM

#

i hate that open ai has different levels of reasoning, i wonder if google will ever do the same

keen beacon Apr 11, 2025, 5:01 PM

#

theyre adding a thinking budget to 2.5 flash

balmy mist Apr 11, 2025, 5:02 PM

#

hmm

keen beacon Apr 11, 2025, 5:02 PM

#

thinking budget is different from openai's reasoning effort

balmy mist Apr 11, 2025, 5:02 PM

#

yeah

ocean vortex Apr 11, 2025, 5:02 PM

#

keen beacon theyre adding a thinking budget to 2.5 flash

I hope they will use tags instead of special tokens that can't be hijacked lol

oblique flint Apr 11, 2025, 5:02 PM

#

keen beacon thinking budget is different from openai's reasoning effort

In what way?

balmy mist Apr 11, 2025, 5:03 PM

#

a budget you can have a certain level of compute for a given amount

#

while thinking effort is more like level of compute?

keen beacon Apr 11, 2025, 5:03 PM

#

apparently grok 3 is extremely reactive to system prompt

#

a bad system prompt severely damages the model's performance

#

a good one can drastically improve it

balmy mist Apr 11, 2025, 5:04 PM

#

i like that

#

which version you used?

#

imma try it in my app

keen beacon Apr 11, 2025, 5:04 PM

#

keen beacon will try with "Just respond with the new sequence." appended

@wintry locust aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

keen beacon Apr 11, 2025, 5:05 PM

#

keen beacon <@1091837003223539712> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...

looks like this is wrong
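Checking an answer to this prompt by eye is error-prone, so a small Python sketch helps. It assumes "exact middle" only makes sense for odd-length sequences; the function names are illustrative.

```python
def replace_middle(sequence, new_char="b"):
    """Replace the exact middle character; only defined for odd lengths."""
    if len(sequence) % 2 == 0:
        raise ValueError("even-length sequence has no exact middle character")
    mid = len(sequence) // 2
    return sequence[:mid] + new_char + sequence[mid + 1:]


def is_correct_answer(original, answer, new_char="b"):
    """True only if the answer has the same length and the swap is dead centre."""
    try:
        return answer == replace_middle(original, new_char)
    except ValueError:
        return False


example = "a" * 7
print(replace_middle(example))  # aaabaaa
```

To grade a real response, paste the prompt's sequence and the model's reply into `is_correct_answer(original, answer)`; it fails on both a length mismatch and an off-centre `b`.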

alpine coral Apr 11, 2025, 5:05 PM

#

ocean vortex that makes it a level fair playing field

which other lab has put a stealth model in the arena, let it be added to the leaderboard the same day they announce the public releases of the new model family, but the actual model that was in the arena is not actually available to the public

wintry locust Apr 11, 2025, 5:05 PM

#

swing and a miss

keen beacon Apr 11, 2025, 5:05 PM

#

i believe it's just down to how grok 3 was trained

wintry locust Apr 11, 2025, 5:05 PM

#

but it recognized it couldn't get it right and didn't try

#

which is nice

keen beacon Apr 11, 2025, 5:05 PM

#

yeah

#

seems more self aware

torn mantle Apr 11, 2025, 5:06 PM

#

yes

keen beacon Apr 11, 2025, 5:08 PM

#

dom was talking about labs training on the benchmarks i realized in his convo with craig lol

#

they werent talking about meta

#

we all got confused

alpine coral Apr 11, 2025, 5:09 PM

#

ah

keen beacon Apr 11, 2025, 5:09 PM

#

keen beacon dom was talking about labs training on the benchmarks i realized in his convo wi...

a reputable lab wont do that intentionally anyway, there's plentiful decontamination etc. im sure some of it accidentally makes it to the corpus though but no one respectable actually trains them on benchmark questions intentionally

brittle tiger Apr 11, 2025, 5:10 PM

#

New 9.11 > 9.8 eval. 2.5 bests o3-mini here but 2.5 answered wrong like o3-mini in the Gemini app.

alpine coral Apr 11, 2025, 5:10 PM

#

keen beacon a reputable lab wont do that intentionally anyway, there's plentiful decontamina...

yeah i think that's right.. i know people will say otherwise

#

but the proof is in the pudding (i.e. people actually use the LLMs..)

ocean vortex Apr 11, 2025, 5:10 PM

#

keen beacon we all got confused

yeah I switched back to this server just now and couldn't make sense of this all how it relates lmfao

alpine coral Apr 11, 2025, 5:10 PM

#

lol

keen beacon Apr 11, 2025, 5:11 PM

#

does anyone have any more prompts that most/all models get wrong

#

🙏

ocean vortex Apr 11, 2025, 5:12 PM

#

keen beacon does anyone have any more prompts that most/all models get wrong

have you seen the prompt I posted in LLM PW? Didn't see a single model get it right yet

keen beacon Apr 11, 2025, 5:12 PM

#

i did not

#

could you link it

ocean vortex Apr 11, 2025, 5:12 PM

#

keen beacon could you link it

https://discordapp.com/channels/1085240988504690738/1163747481373392896/1358904692054167722

keen beacon Apr 11, 2025, 5:13 PM

#

apparently i don't have access

#

i dont think hes in the server

#

ah

#

an invite would be appreciated

ocean vortex Apr 11, 2025, 5:13 PM

#

keen beacon an invite would be appreciated

in my bio

keen beacon Apr 11, 2025, 5:14 PM

#

ah

brittle tiger Apr 11, 2025, 5:14 PM

#

keen beacon does anyone have any more prompts that most/all models get wrong

you see this one? no models come close

#general message

balmy mist Apr 11, 2025, 5:14 PM

#

what is the best model on open router that is free not including optimus?

#

i need a default for my app

ocean vortex Apr 11, 2025, 5:15 PM

#

balmy mist what is the best model on open router that is free not including optimus?

Quasar Alpha

#

lmao

keen beacon Apr 11, 2025, 5:15 PM

#

It got removed

ocean vortex Apr 11, 2025, 5:15 PM

#

keen beacon It got removed

wait what? 😭

keen beacon Apr 11, 2025, 5:15 PM

#

ocean vortex https://discordapp.com/channels/1085240988504690738/1163747481373392896/13589046...

workin on it

keen beacon Apr 11, 2025, 5:15 PM

#

ocean vortex wait what? 😭

Ya for optimus

keen beacon Apr 11, 2025, 5:15 PM

#

ocean vortex wait what? 😭

removed earlier today, replaced by optimus

keen beacon Apr 11, 2025, 5:15 PM

#

brittle tiger you see this one? no models come close https://discord.com/channels/13405547573...

will do that shortly

ocean vortex Apr 11, 2025, 5:15 PM

#

lame

keen beacon Apr 11, 2025, 5:15 PM

#

If it's the mini model optimus is really good tbh fwiw I think even if it's not quasar

ocean vortex Apr 11, 2025, 5:16 PM

#

why would they remove it, that's really lame

#

it was better than optimus

balmy mist Apr 11, 2025, 5:16 PM

#

any other model?

#

i dont wanna do optimus

#

cause they gonna remove it as well

keen beacon Apr 11, 2025, 5:17 PM

#

ocean vortex why would they remove it, that's really lame

Do their marketing stunt on their other 4.1 model without needing to serve two

ocean vortex Apr 11, 2025, 5:17 PM

#

balmy mist any other model?

2.5 pro is the best then. But you can just use it on aistudio

keen beacon Apr 11, 2025, 5:17 PM

#

It's very rate limited on openrouter tho

#

U might get more rpd tho 160 iirc

balmy mist Apr 11, 2025, 5:18 PM

#

ocean vortex 2.5 pro is the best then. But you can just use it on aistudio

ughh

#

i need it for my app

ocean vortex Apr 11, 2025, 5:18 PM

#

other than that, back to R1 https://openrouter.ai/deepseek/deepseek-r1:free

R1 (free) - API, Providers, Stats

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Run R1 (free) with API

keen beacon Apr 11, 2025, 5:23 PM

#

ocean vortex https://discordapp.com/channels/1085240988504690738/1163747481373392896/13589046...

#1163747481373392896 message

keen beacon Apr 11, 2025, 5:24 PM

#

brittle tiger you see this one? no models come close https://discord.com/channels/13405547573...

vision is disabled in the interface i have access to, so can't test it 😔

keen fulcrum Apr 11, 2025, 5:24 PM

#

Both openAI and Grok offer free credits

keen beacon Apr 11, 2025, 5:26 PM

#

wait what

#

openai thing i didnt know about

keen fulcrum Apr 11, 2025, 5:26 PM

#

keen beacon openai thing i didnt know about

It will be suggested upon API usage
you have to use their API for some time

leaden palm Apr 11, 2025, 5:32 PM

#

brittle tiger New 9.11 > 9.8 eval. 2.5 bests o3-mini here but 2.5 answered wrong like o3-mini ...

it didn't search lol

#

chatgpt search is bad

brittle tiger Apr 11, 2025, 5:34 PM

#

Search in deep research is way better. Maybe just time allotted

golden ocean Apr 11, 2025, 5:42 PM

#

keen fulcrum Both openAI and Grok offer free credits

wtf since when

#

the grok one

balmy mist Apr 11, 2025, 6:02 PM

#

brittle tiger Search in deep research is way better. Maybe just time allotted

for gemini right?

brittle tiger Apr 11, 2025, 6:03 PM

#

balmy mist for gemini right?

Nah I was talking about OpenAI search. Their deep research seems better. If o3-mini-high had seen stories about 4.1 it probably would have given an answer like 2.5

north vale Apr 11, 2025, 6:21 PM

#

keen beacon does anyone have any more prompts that most/all models get wrong

https://x.com/SpencerKSchiff/status/1910106368205336769

Spencer Schiff (@SpencerKSchiff) on X

I drew this today. None of the frontier models come anywhere close to matching the correct name to each person. I feel like this is a pretty good visual test so I’m looking forward to trying it with future models.

#

even 2.5 pro doesn't get a single one right

keen beacon Apr 11, 2025, 6:21 PM

#

north vale https://x.com/SpencerKSchiff/status/1910106368205336769

#general message

north vale Apr 11, 2025, 6:24 PM

#

then yeah idk good question

vast turret Apr 11, 2025, 6:36 PM

#

will the cogito preview models be added to lmarena? I'd like to see how they fare against gemma 3

calm sequoia Apr 11, 2025, 6:38 PM

#

Poll: [DAILY SPECULATOR] Which will turn out best in general benchmark?

Result: Gemini 2.5 Pro won with 15 of 29 votes.

ocean vortex Apr 11, 2025, 7:10 PM

#

north vale even 2.5 pro doesn't get a single one right

yeah it basically can't read this properly at all LOL

north vale Apr 11, 2025, 7:12 PM

#

that drawing is more accurate than i expected tbh

ocean vortex Apr 11, 2025, 7:13 PM

#

my guess is it's confusing the figures with the arrows given that their shapes are somewhat similar. And has no clue where the arrows end

north vale Apr 11, 2025, 7:13 PM

#

like it's better at making the drawing accurately than stating the colors accurately

#

which is interesting

#

my guess is that it's got too strong of a prior toward assuming names next to someone are automatically associated with them bc that's the norm 99.9% of the time

ocean vortex Apr 11, 2025, 7:14 PM

#

probably can't follow those squiggly lines very effectively either

ocean vortex Apr 11, 2025, 7:15 PM

#

north vale like it's better at making the drawing accurately than stating the colors accura...

but it's still pointing all wrong and not how it was in the original

keen beacon Apr 11, 2025, 7:15 PM

#

vision encoders suck i think. maybe native image gen will make vision better

#

even indirectly via data generation where u have easier annotations/etc

ocean vortex Apr 11, 2025, 7:15 PM

#

keen beacon vision encoders suck i think. maybe native image gen will make vision better

oh let's try flash

#

that one is the same model direct

north vale Apr 11, 2025, 7:17 PM

#

ocean vortex but it's still pointing all wrong and not how it was in the original

Adam is right in the first image but if you ask it it gets it wrong i think? Ig i only tried 2.5 pro

ocean vortex Apr 11, 2025, 7:17 PM

#

ok still obviously wrong, but interesting that it didn't mess up the original lines

#

only reverted the colors for unknown reason lol

north vale Apr 11, 2025, 7:18 PM

#

I could be making this up but it seems like gemini is way better at reproducing a close copy of the original image, compared to 4o image gen

#

Like ‘make this image but with this change’ makes the exact same image with a change, whereas 4o makes an image with lots of tiny changes

ocean vortex Apr 11, 2025, 7:19 PM

#

north vale I could be making this up but it seems like gemini is way better at reproducing ...

that's because you are interacting directly with a single model

keen beacon Apr 11, 2025, 7:19 PM

#

it might try to change it less because its less capable. it might be an artifact

north vale Apr 11, 2025, 7:19 PM

#

Ah maybe

ocean vortex Apr 11, 2025, 7:19 PM

#

whereas gpt4o is a different finetune for imagen

north vale Apr 11, 2025, 7:19 PM

#

Hmmm

ocean vortex Apr 11, 2025, 7:19 PM

#

so kind of external call from chatgpt-latest

keen beacon Apr 11, 2025, 7:20 PM

#

it should pass on most of it/exactly to the gpt 4o image gen model unless they do extra post processing

#

i think its a mix. i think whats happening there is complicated interaction

ocean vortex Apr 11, 2025, 7:21 PM

#

keen beacon it should pass on most of it/exactly to the gpt 4o image gen model unless they d...

it should but some of the info will inevitably get lost. Like it doesn't pass the entire chat for sure and it's hard to tell what else is lost

keen beacon Apr 11, 2025, 7:22 PM

#

ocean vortex it should but some of the info will inevitably get lost. Like it doesn't pass th...

if its the first message, i think u can get it to send ur request exactly to gpt 4o image gen

ocean vortex Apr 11, 2025, 7:24 PM

#

keen beacon if its the first message, i think u can get it to send ur request exactly to gpt...

honestly they should just let us interact directly with it. At least through API

sweet tinsel Apr 11, 2025, 7:44 PM

#

north vale even 2.5 pro doesn't get a single one right

At least Claude 3.7 Sonnet Thinking and OpenAI o3-mini-high got one right for me, the rest got none right in my tests.

north vale Apr 11, 2025, 7:46 PM

#

interesting, i forgot that o3-mini-high has image input now

ocean vortex Apr 11, 2025, 7:49 PM

#

sweet tinsel At least Claude 3.7 Sonnet Thinking and OpenAI o3-mini-high got one right for me...

I tried both and both failed hard

brittle tiger Apr 11, 2025, 7:51 PM

#

Obviously this isn't even close to being solved but this is the closest I've gotten trying to make it easier for model

ocean vortex Apr 11, 2025, 7:52 PM

#

it doesn't look like models are even close to getting it right. Isolating it to a part of an image doesn't help much. The closest I got was isolating + follow-up, then gpt4.5 got it right but not gpt4o still

sweet tinsel Apr 11, 2025, 7:55 PM

#

https://www.perplexity.ai/search/match-the-name-to-each-person-UKyrKVBqQKe8dQwYxTbrAA (Web-Search turned off btw.)

Perplexity AI

Match the name to each person (color), give out a very well tought ...

After carefully examining the image and tracing the arrows that connect each name to a stick figure, I can provide the following matches:

Bob → Pink/Red...

leaden palm Apr 11, 2025, 8:05 PM

#

sweet tinsel https://www.perplexity.ai/search/match-the-name-to-each-person-UKyrKVBqQKe8dQwYx...

1/5

sweet tinsel Apr 11, 2025, 8:06 PM

#

Yeah, like I said.

sweet tinsel Apr 11, 2025, 8:32 PM

#

Dragontail got the yellow one right.

brittle tiger Apr 11, 2025, 8:44 PM

#

brittle tiger Obviously this isn't even close to being solved but this is the closest I've got...

These were all right btw. Extreme cheating with the image manipulation but still something

raven void Apr 11, 2025, 9:36 PM

#

wait what did I miss

#

Private model is a new OpenAI model that mogs?

#

o4 mini?

keen beacon Apr 11, 2025, 9:37 PM

#

no

#

o3

raven void Apr 11, 2025, 9:38 PM

#

https://tenor.com/view/evil-cat-floppy-herobrine-angry-cat-glowing-eyes-gif-13772161273485327421

Tenor

hardy pecan Apr 11, 2025, 9:39 PM

#

keen beacon o3

could you run a prompt for me pls? I'm curious how it handles the diagram in ascii

keen beacon Apr 11, 2025, 9:41 PM

#

hardy pecan could you run a prompt for me pls? I'm curious how it handles the diagram in asc...

sure

hardy pecan Apr 11, 2025, 9:41 PM

#

keen beacon sure

four men are buried up to their necks in the ground. They cannot move, so they can only look forward. Between A and B is a brick wall which cannot be seen through.

They all know that between them they are wearing four hats--two black and two white--but they do not know what color they are wearing. Each of them know where the other three men are buried.

In order to avoid being shot, one of them must call out to the executioner the color of their hat. If they get it wrong, everyone will be shot. They are not allowed to talk to each other and have 10 minutes to fathom it out.

Which one of them calls out? Why is he 100% certain of the color of his hat? give a very detailed diagram too

keen beacon Apr 11, 2025, 9:45 PM

#

Below is the classic arrangement that satisfies every sentence of the riddle.
(Top‑down ASCII view – all four prisoners are facing to the RIGHT, so “forward” means → )

                 everyone looks this way ─────────────→

┌───┐   ┌───┐   ┌───┐   ║█████║   ┌───┐
│ D │ → │ C │ → │ B │ → ║WALL ║ → │ A │
└─┬─┘   └─┬─┘   └─┬─┘   ║█████║   └───┘
  │       │       │
  │       │       └─ B sees nothing (brick wall in front)
  │       └───────── C can see B’s hat only
  └────────────────── D can see C’s and B’s hats
                      A sees nothing (wall behind him)

Facts they all know: four hats in total – exactly 2 white (W) and 2 black (B).

WHO SPEAKS?
Prisoner C (the second man from the back) will eventually call out his own hat‑colour and will be 100 % certain it is correct.

WHY IT MUST BE C (step‑by‑step logic)

1. During the first minutes everyone waits for prisoner D to speak.
• D can see both C’s and B’s hats.
• If C and B were wearing the SAME colour (WW or BB) then, because only two hats of that colour exist, D would instantly know his own hat must be the opposite colour, shout it out and save them.
• But ten minutes pass and D stays silent ⇒ C and B CANNOT be the same colour. They must be opposite colours.
2. Prisoner C hears this silence and reasons:
• “D did not call out, therefore my hat is NOT the same colour as B’s.”
• C can see B’s hat directly.
• So whatever colour B is wearing, C must be wearing the other colour.
• That gives C 100 % certainty about his own hat.
3. C now shouts the opposite colour to the one he sees on B’s head, guaranteeing a correct answer and sparing all four prisoners.

ILLUSTRATIVE EXAMPLE
Suppose the hidden real arrangement is:

D = White      C = Black      B = White      A = Black

D sees BW (different) ⇒ cannot decide ⇒ silence.
C sees B = White and, knowing they must differ, declares “Black,” which is correct.

Any initial allocation of two white and two black hats will always force exactly the same chain of reasoning, guaranteeing that prisoner C is the first (and only one who ever needs) to speak.

hardy pecan Apr 11, 2025, 9:46 PM

#

Thanks a lot! Logic is correct, ASCII is not bad either

keen beacon Apr 11, 2025, 10:48 PM

#

yup

north vale Apr 11, 2025, 11:37 PM

#

riverhollow seems new

plain zinc Apr 11, 2025, 11:52 PM

#

hardy pecan Apr 12, 2025, 12:23 AM

#

Any good?

north vale Apr 12, 2025, 12:27 AM

#

it was as good as maverick-slop on a math problem that was prolly too easy

harsh flume Apr 12, 2025, 1:08 AM

#

calm sequoia They still invest massively. O just can't understand why are they choosing gende...

scalability

brittle tiger Apr 12, 2025, 1:27 AM

#

i've only gotten riverhollow once but it got right a question that dragontail failed and shadebrook got right.

leaden palm Apr 12, 2025, 2:12 AM

#

ICYMI

raven void Apr 12, 2025, 3:02 AM

#

OpenAI is cooking

drifting thorn Apr 12, 2025, 3:25 AM

#

Model A: gpt-4.5-preview-2025-02-27
Model B: chatgpt-4o-latest-20250326

#

Both are bad on my question

#

Find which Lego piece can be the cylinder, piston and cylinder head of the Lego engine, which is circular and the bore is 6x6

balmy mist Apr 12, 2025, 3:26 AM

#

raven void OpenAI is cooking

how?

hardy pecan Apr 12, 2025, 3:35 AM

#

brittle tiger i've only gotten riverhollow once but it got right a question that dragontail fa...

Shadebrook isn't great in my experience, dragontail was decent, maybe have to run it again?

drifting thorn Apr 12, 2025, 3:37 AM

#

drifting thorn Find which Lego piece can be the cylinder, piston and cylinder head of the Lego ...

ok seems no model can answer this question, including gpt4.5, riverhollow, deepseek r1, chatgpt-4o-0326

#

@keen beacon can you ask this question to o3 again, I wanna know if it can pick the correct lego pieces

#

Dragontail is the first model that is able to give the correct Lego piece for the cylinder block (though there are minor mistakes)

#

Okay, let's find some Lego pieces that can work for a circular engine with a 6x6 stud bore.

A "6x6 bore" means the internal diameter of the cylinder should be 6 studs across. We need to find parts that create this space and parts that fit inside.

Here are some good candidates:

Cylinder (Creating the 6x6 Bore):
- Best Option: Use "Brick, Round Corner 4 x 4 Macaroni with 2 Studs" (Part 48092) or the plateless version "Brick, Round Corner 4 x 4 Macaroni" (Part 88293).
  - How: Arrange four of these bricks in a layer, facing inwards. This creates an 8x8 stud outer footprint with a perfect 6x6 stud hollow space in the middle.
  - Why: This is a common way to build a sturdy, circular structure with a specific internal diameter. You stack layers of these to get the desired cylinder height.
- Alternative (Thinner Walls): Use "Plate, Round Corner 4 x 4" (Part 30565). Four of these create the same 8x8 outer footprint and 6x6 inner space, but are only plate-height. Good for a less bulky look or if you need finer height adjustment, but less sturdy.

#

Piston (Moving inside the 6x6 Bore):
- Best Option: "Gear 40 Tooth" (Part 3649).
  - Why: This gear has a diameter of 5 studs. It's perfectly round, fits comfortably inside the 6x6 bore with a 0.5-stud clearance all around (allowing it to move freely), and has a central axle hole for connecting a crankshaft.
- Alternative 1: "Dish 6 x 6 Inverted (Radar)" (Part 44375a / 44375b / or older 4285).
  - Why: This dish is exactly 6 studs in diameter. It might be a tight fit depending on the exact build, potentially causing friction, but it perfectly matches the bore size visually. It also has connection points (stud or anti-stud, depending on version).
- Alternative 2: Build a piston head. You could use a "Plate, Round 4 x 4 with 2x2 Hole" (Part 60474) as a base and build up slightly, or use other round elements. A 4x4 is a bit small, leaving a larger gap, but workable.

#

Cylinder Head (Covering the Cylinder):
- Option 1 (Covers entire cylinder top): "Dish 8 x 8 Inverted" (Part 3961).
  - Why: If your cylinder is built with the 4x4 macaroni bricks, it has an 8x8 outer footprint. This dish fits perfectly over the top, creating a smooth, sealed look.
- Option 2 (Matches bore size): "Plate, Round 6 x 6 with Hole" (Part 11833) or "Dish 6 x 6 Inverted (Radar)" (Part 44375a / 44375b / 4285).
  - Why: These parts match the 6x6 bore size. You'd likely place this on top of a structure (like 1-stud wide plates or tiles) built on the cylinder wall to "seal" the 6x6 opening. This looks more like a head sealing just the bore.
- Option 3 (Buildable): Use standard plates (e.g., an 8x8 plate, or build up a structure) and potentially add tiles or greebling for detail.

Summary Recommendation:

Cylinder: Stack layers of 4 x Part 48092 (Brick, Round Corner 4 x 4 Macaroni with 2 Studs).
Piston: Use Part 3649 (Gear 40 Tooth).
Cylinder Head: Use Part 3961 (Dish 8 x 8 Inverted) for a full cover, or Part 11833 (Plate, Round 6 x 6 with Hole) / Part 44375 (Dish 6x6 Inverted) for a bore-sized cover.

These combinations should give you a functional and visually appropriate cylinder, piston, and head for a 6x6 bore Lego engine.

velvet ice Apr 12, 2025, 3:48 AM

#

hey everyone! new here! Just wanted to get your opinion about this dragon model I just noticed. It's insane so far in my first test. Tried the same prompt three times, wins every single one easily. Which model do you think it is? I think it's def from Anthropic, prob an updated version of sonnet or something.

#

first test, and second one

#

if you look at the code

#

and ask the same prompt over Claude.ai or just the workbench, you get almost the same 4 type names.

#

or, from google.

#

OpenAI models never gave code anything close to what I got from this model.

#

lol def not Grok or anything from xAI

torn mantle Apr 12, 2025, 4:21 AM

#

i mean its on par with gemini 2.5 pro

#

you can pinpoint the little details

#

also vscode copilot added gemini 2.5 pro and its really good

torn mantle Apr 12, 2025, 4:32 AM

#

plain zinc

not bad but nothing crazy

velvet ice Apr 12, 2025, 4:33 AM

#

Yeah I have been using it non stop, but in both the Gemini app with Canvas and with Cline or other AI tools, you don't get the same UI design quality, or the same consistency in naming conventions and type names. But I agree, it looks like it's from Google, unless the model just made up the name.

#

Forget about it, it's from Google. No company makes their AI say it doesn't have feelings as much as Google does.

#

How does grok never make damn working code when all other models at least produce something you can preview.

balmy mist Apr 12, 2025, 4:49 AM

#

bro what's with these names, do they got a kid making them?

#

i like them tho lol

#

wait dragontail is on webdev omgg

viral notch Apr 12, 2025, 4:50 AM

#

velvet ice How does grok never makes a damn working code when all other models at least pre...

maybe its too verbose? i dont know

#

grok 3 tends to go past the token limit in my experience

balmy mist Apr 12, 2025, 4:51 AM

#

so google got shadebrook, dragontail, and riverhollow

#

whats the order?

alpine coral Apr 12, 2025, 5:14 AM

#

hardy pecan Thanks a lot! Logic is correct, ASCII is not bad either

Any initial allocation of two white and two black hats will always force exactly the same chain of reasoning, guaranteeing that prisoner C is the first (and only one who ever needs) to speak.
help me out here.. it doesn't seem logically sound to me that "prisoner C is [always] the first (and only one who ever needs) to speak" regardless of the initial allocation of hats

#

to my mind: C can only be 100% certain IF D remains silent... but depending on how the buried men are arrayed, there would be configurations when D would in fact be the first to speak, no?

#

e.g.
D = White C = Black B = Black | A = White

D sees BB (same) ⇒ can deduce own ⇒ declares "White", which is correct.
C doesn't have anything to do or say cause they're all saved after D determined and declared the colour of their hat with 100% certainty
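
The case split above can be checked by brute force. This is only a minimal sketch: it assumes the layout from the diagram (D sees C and B, C sees B, B and A see nothing) and that D calls out right away whenever C and B match. The name `first_speaker` is made up for illustration.

```python
from itertools import permutations

def first_speaker(hats):
    """hats maps prisoner -> 'W' or 'B'. Returns (speaker, colour called)."""
    # D sees C and B. If they match, that colour is used up,
    # so D's own hat must be the other colour.
    if hats["C"] == hats["B"]:
        call = "W" if hats["C"] == "B" else "B"
        return "D", call
    # Otherwise D stays silent, and C infers his hat differs from B's.
    call = "W" if hats["B"] == "B" else "B"
    return "C", call

# All 6 distinct ways to hand two white and two black hats to D, C, B, A.
allocations = sorted(set(permutations("WWBB")))
results = []
for alloc in allocations:
    hats = dict(zip("DCBA", alloc))
    speaker, call = first_speaker(hats)
    results.append((alloc, speaker, call, call == hats[speaker]))

for alloc, speaker, call, ok in results:
    print("".join(alloc), "->", speaker, "calls", call, "correct:", ok)
```

Under these assumptions the speaker is always right, but it is D in the two allocations where C and B match and C in the other four, so "C always speaks first" does not hold for every allocation.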

harsh flume Apr 12, 2025, 5:25 AM

#

balmy mist so google got shadebrook, dragontail, and riverhollow

is riverhollow confirmed google?

#

I got it three times so far and it won all of them

velvet ice Apr 12, 2025, 5:25 AM

#

viral notch grok 3 tends to go past the token limit in my experience

Grok does complete the code, Just skill issue. Can't write correct syntax.

harsh flume Apr 12, 2025, 5:25 AM

#

from my questions it seems like its been trained on a fairly recent dataset

viral notch Apr 12, 2025, 5:27 AM

#

velvet ice Grok does complete the code, Just skill issue. Can't write correct syntax.

damn, and here i thought grok 3 was the new MVP. disappointed

velvet ice Apr 12, 2025, 5:28 AM

#

viral notch damn, and here i thought grok 3 was the new MVP. disappointed

Asked it to correct the code and gave the exact error message. Created three new errors.

viral notch Apr 12, 2025, 5:28 AM

#

velvet ice Asked it to correct the code and gave the exact error message. Created three new...

🤦‍♂️

#

elon EXPLAIN. lol

velvet ice Apr 12, 2025, 5:31 AM

#

viral notch elon EXPLAIN. lol

Okay i got it working after third attempt pasting the error message.

#

It's also insane the other model got the same error message, but did not fix anything because it knew there were no errors. Or maybe the WebDev arena has some logic to route error messages (def not)

#

This was the fix if any nextjs people here are wondering.

drifting thorn Apr 12, 2025, 5:42 AM

#

Guess what Dragontail is

#

I will guess it's 2.5 Pro but tuned for maths and reasoning instead of general purpose

velvet ice Apr 12, 2025, 5:43 AM

#

Okay, it's 100% Gemini. Every single time in any way you test it you get the same "launch app" button with the same icon.

drifting thorn Apr 12, 2025, 5:44 AM

#

amazon-nova-pro-v1.0 is a joke

torn mantle Apr 12, 2025, 5:45 AM

#

drifting thorn amazon-nova-pro-v1.0 is a joke

xd

#

@balmy mist are you still updating the pokemon game?

drifting thorn Apr 12, 2025, 5:47 AM

#

Will there be an image generation version of 2.5 Pro? I'm waiting for that

torn mantle Apr 12, 2025, 5:48 AM

#

the model name is Imagen 3

#

i think its on 3.1

drifting thorn Apr 12, 2025, 5:49 AM

#

So they still decide to split it up instead of treating them as a whole like chatgpt 4o?

#

I don't get it, given that they started experimenting with generating images with Gemini 2.0 Flash

torn mantle Apr 12, 2025, 5:49 AM

#

@balmy mist im trying to make the game with 3d animation instead

torn mantle Apr 12, 2025, 5:50 AM

#

drifting thorn So they still decide to split it up instead of treating them as a whole like cha...

idk if gpt4o image model is a separate one or not