#general

brittle tiger
#

there is danger in normie-maxxing. ceding the most elite intelligent power users can have downsides

fleet lintel
#

disagree. Coders are one of the most important audiences to win over... their support can directly influence companies' purchasing decisions.

thorny drum
#

i mean if powerusers just pick the best model they can just make the best coding model later

keen beacon
#

What is the news in one sentence?

brittle tiger
oblique flint
plain zinc
#

What OpenAI showed today:

  • Updated Memory
  • Counterclaim
torn mantle
tall summit
#

technically "Starting today, memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond." is one sentence

plain zinc
#

There will be another Google presentation today.

torn mantle
#

sama hype

plain zinc
#

Gemini 2.5 flash with thought process control is inevitable

brittle tiger
keen beacon
#

please google 🙏

plain zinc
tall summit
#

please google 🙏 free access to the most powerful model 🥺

brittle tiger
#

LoganK is presenting so maybe

keen beacon
#

imo people who use llms for coding are the least loyal to any one lab

#

like i switched from claude to deepseek to claude to google

brittle tiger
#

I'm not saying it's wrong strategy but there are downsides. "Sources caution that OpenAI has delayed the introduction of some new models recently due to capacity issues" anime is literally the reason ppl aren't using o4 rn

keen beacon
#

? they are giving 4.1 and 4.1 mini out for free right now

#

for people whose primary use case for llms is code, they'll just go wherever the best model is

#

O3 full today?

#

which is quite different to other consumers

#

there are countless people i know who basically only know chatgpt and maybe deepseek

tall summit
keen beacon
keen beacon
fleet lintel
#

100%

keen beacon
sage raptor
#

in 3 hours

keen beacon
#

gemini is better known than claude

keen beacon
#

i am

#

gemini is more consumer focused than claude is

#

and google have thrown more money behind marketing

tall summit
zinc ore
#

Easy to verify, more people use Gemini than Claude

keen beacon
#

anthropic is relatively enterprise focused

oblique flint
#

nvm misread

keen beacon
brittle tiger
#

i wouldn't be surprised if gemini 2.5 has a mini-ghibli moment around minecraft with zoomers.

keen beacon
#

claude is still better at instruction and format following

keen beacon
torn mantle
#

interesting

torn mantle
#

i see

#

software engineering?

#

hopefully they introduce something new

oblique flint
#

what makes chatgpt more appealing to normies as well is the zoomer language. Claude also has some sort of personality which people might prefer. Gemini is just full on robot

torn mantle
#

not just something built on gemini

brittle tiger
#

all google has to do is sponsor some of the minecraft youtubers (1 trillion minecraft views on youtube) to generate bases and make it into a game. the generations will blow their zoomer brains

tall summit
sonic tendon
#

something something google trends

#

the website, or the model?

tall summit
#

i'm most hyped for V4

sonic tendon
#

why not the model?

keen beacon
#

gimme R2

#

we should be getting it in the next couple weeks

sonic tendon
#

the weights are free tho

tall summit
#

are non-reasoning models becoming underrated or somethin

#

V4 > R2!!!

#

ummm ok man

oblique flint
keen beacon
#

debatable

tall summit
#

an interesting statement

keen beacon
#

grok's frontend is very.. simplified

sonic tendon
#

debatable

keen beacon
#

so does chatgpt iirc

balmy mist
#

did i miss the big OA news?

keen beacon
#

and there are things that chatgpt does better than grok

#

for example

#

grok gets horrendously laggy when it streams a long response

#

chatgpt does not

tall summit
#

amazing, i know.

oblique flint
#

worst frontend is gemini imo

keen beacon
#

they also nuke model performance with their system prompt

#

👎

sonic tendon
#

trying to log in with a google account sends me into a redirect loop

keen beacon
#

the only reason to use the gemini frontend is deep research

balmy mist
sonic tendon
#

and has since last week

fleet lintel
sonic tendon
sonic tendon
tall summit
#

i'd say chatgpt is

balmy mist
#

grok ui is clean af

#

the way thinking shows up is amazing

tall summit
#

clean ui tho

keen beacon
#

visually sure

fleet lintel
keen beacon
#

i highly doubt it

#

skill diff tho

oblique flint
#

until recently you couldnt even upload js files to gemini frontend lol

keen beacon
#

ya they trained it far more than their competitors

balmy mist
#

how do you know if you have the new memory update?

#

it actually might not be bad

#

but i stopped using chatgpt

#

only use it for 4.5 and image gen

#

and thats barely

#

so my past convos that mattered were over a month ago

#

it might be good to just put a bunch of info about yourself in a chat and use gpt to reference them

tall summit
#

it is bad at distinguishing between different parts of memory
like "no i dont want this to impact the response that's not what i'm talking about"
has happened to me a lot

balmy mist
#

yeah i see what you mean

#

someone needs to take sama's twitter from him

#

giving us blue balls lol

#

talking about it keeps him up at night lmaoo

ocean vortex
fleet lintel
keen beacon
#

yeah i start new conversations often because clogging the context window with irrelevant info is bound to make the model, especially one on the slightly weaker side context handling wise, worse at any reasoning/math/etc task

oblique flint
#

but also, won't having all that memory stuff in context degrade the performance?

balmy mist
tall summit
#

almost surely

balmy mist
#

not bad

tall summit
#

just shove the context you need into whatever message you're typing like everyone else!

tall summit
balmy mist
tall summit
balmy mist
#

and that was after 10 iterations in my app

tall summit
#

i've never understood how move damage is calculated
sounds like a lot of things, and what's shown on the move info is just "Power"

#

maybe time to learn

balmy mist
#

lmaoo

#

people say ai making people bad at coding or creates lazy coders, but tbh it has made me so much better than i was before and I am a software dev lol

#

i never loved software dev as much as i do now

balmy mist
# tall summit in what way?

so when you do work for a company like my current and past jobs, you are coming into a codebase of years of work usually and working with other people and you really feel disconnected from the work, and you have to build this understanding of that codebase over time and still at the end you dont care about it as much as you should bc its not yours

#

now when it comes to coding with ai i feel that same but in the opposite

fleet lintel
balmy mist
#

now i may not know all the code that is happening but i am guiding the ai and know whats happening system wide and know all the connections and why every part is needed and its so much fun to develop and create things

balmy mist
fleet lintel
balmy mist
#

you have to find people that actually care about the work

tall summit
balmy mist
#

even if ai is doing a lot of the busy body code work, they will care and try their best to optimize and make the app/system better

balmy mist
#

people will dive deep if they care

#

if they dont care they wont

#

but we still gotta see how this plays out

#

but im just saying from my experience

#

especially with 2.5

#

i have been having so much fun

fleet lintel
#

yeah, too early to say .

tall summit
#

i never stop getting excited thinking about the potential of humans with the help of ai

fleet lintel
#

My personal theory is that AI won't decrease the number of jobs over time in the software industry but it will end up decreasing the pay scale

balmy mist
#

bingo

#

i think it's gonna happen for a lot of jobs

#

like lawyers, doctors

#

etc..

#

the high level skill jobs

#

and there will be more devs than ever

#

since anybody can start coding or building

#

just a small amount of highly skilled devs

#

like how we have cameras

#

we have highly skilled photographers

#

but anybody can take great pics with their phones

fleet lintel
#

+1 yeah.. that's what I think as well

upper wolf
#

ehh, i disagree. it has ALREADY significantly decreased the number of available softeng and/or compsci positions.

#

Ask anyone who has graduated with a BS in compsci in the last 2 years and is attempting to find even just one internship. I’m not saying it’s impossible, but it’s become incredibly more difficult. There’s a lower pool of open slots.

fleet lintel
#

not because of AI. overhiring during covid and trying to make wall street happy is the reason behind fewer jobs

upper wolf
#

Okay, that’s true, those are definitely also big factors

#

But many companies ARE cutting people or reducing the amount of overall cs positions in favor of generative AI

#

And have been since 2023-2024

calm sequoia
upper wolf
#

And display what to say in a small screen under the camera

#

you know what i mean.

wintry tinsel
calm sequoia
#

What a disappointment

#

I've had memory turned off since new year

calm sequoia
upper wolf
#

Ah, gotchu

brittle tiger
keen beacon
wintry tinsel
keen beacon
#

o.o i just got veo 2 on aistudio

tall summit
#

never knew the CNN's Fear & Greed index existed

calm sequoia
golden ocean
#

Maybe the best model was the friends we made along the way

calm sequoia
#

Is this best LLM model in the room with us?

#

If Gemini takes the lead even after o3 and GPT5, I'll buy Google stock

balmy mist
#

buying high is never good

#

bruhh we have to wait 2 hours for google

keen beacon
sonic tendon
#

debatable

calm sequoia
#

Leo, why so much trust in R2?

#

Do you really believe it will turn out better than 2.5 pro?

lime coral
#

At least in coding/math with rl there is no limit if someone tries hard enough

keen beacon
#

there will be some things it is worse at (likely math) and others it is better

#

but overall it will have a slight edge

ocean vortex
# keen beacon

he couldn't figure out how to disable the chat lmao. After this he went silent, probably realized finally that people hate him passionately and it's not fake news 💀

keen beacon
wintry tinsel
ocean vortex
keen beacon
#

his net approval in the states is like -10

#

in the UK it is dire

#

-45 or so iirc

ocean vortex
keen beacon
#

i saw those lmao, they're genius

sturdy mica
#

4.5?

primal orbit
#

how to distinguish shaderbook from dragontail? Is there an easy way?

ocean vortex
#

there were many ways they could have achieved what they presumably are going for, and this was not the good way

keen beacon
ocean vortex
#

they could have skipped o4, or just changed the "o" letter in 4o, but not this... 🧐

alpine coral
#

so they skipped o2

ocean vortex
alpine coral
#

now they're going backwards from 4.5?

ocean vortex
#

lmao

alpine coral
#

lol

keen beacon
alpine coral
#

yeah right

barren prairie
alpine coral
#

ha and o2 conflicted with a british telco

#

gee they're having a hard time with this naming stuff

#

what a mess lol

keen beacon
#

i think o4 and 4o is just way more confusing tho

alpine coral
#

oh for sure

alpine coral
#

nah i don't think so

keen beacon
#

almost everything that they could've done wrong with naming they've done wrong

barren prairie
#

Open ai names : O number /number O

Google names : Gemini 2.0 flash thinking 21-01 blah blah blah 🥱🥱🥱

keen beacon
#

like this is getting silly

#

? why is there a separate model for tasks

#

i dont really use chatgpt lol

keen beacon
#

openai are about as consistent as microsoft

alpine coral
#

just be gobbledygook to most i reckon

keen beacon
#

isnt gpt-4 on chatgpt gpt-4-turbo too

#

kinda confusing

#

yeah

#

it's being deprecated on april 20th (rip)

alpine coral
#

yeah they only kinda recently dropped the 'Turbo' in the name iirc

#

will be a sad day when it's gone

#

same with ol opus

keen beacon
#

thats what i remember

alpine coral
#

we'll have meta's behemoth instead yay

keen beacon
#

it'll probably have atrocious vibes like the last ones 😔

#

can hope i suppose

alpine coral
keen beacon
#

you might be getting confused with GPT-4V

keen beacon
#

which used to be what they called the vision version

keen beacon
#

meta ai fell off really hard

spare mango
#

Is polyai the best voice assistant in the market? Is it better than chatgpt and gemini?

keen beacon
#

never heard of it

alpine coral
tall summit
#

never heard of it

tall summit
brittle tiger
willow grail
#

looking for mouse and keyboard recorder which works in roblox

cedar tide
tall summit
#

what a statement

sage raptor
keen beacon
#

lmao what's with the preshow

lime coral
#

Grit

ocean vortex
#

judging by their past events they gonna announce things that are already live and have been for a while lol

keen beacon
#

we'll see

#

he's talking about agents

#

where's logan at 🙄

#

THERE HE IIIS

brittle tiger
#

I think 50-50 they announce something at close of their demo

alpine coral
brittle tiger
#

I would also be nervous AF using flash 2.0 during a live demo

oblique flint
#

Drop 2.5 flash ffs

keen beacon
#

because it added a random stool

leaden palm
keen beacon
#

i know they are

#

i'm talking about in the chatgpt ui

#

gemini 25 pro?

#

wow we really accelerated

leaden palm
keen beacon
#

i don't think so

#

or at least i don't recall that being the case

tall summit
#

most boring keynote ever

keen beacon
#

tbf

#

it's not supposed to be exhilarating

#

@sonic tendon i wasn't hallucinating 😉

sonic tendon
balmy mist
#

waited for nothing

tall summit
keen beacon
#

it's a google developer keynote

#

you shouldn't expect much

tall summit
tall summit
keen beacon
#

R1 is a good creative writer especially in short context

tall summit
#

i'd disagree but maybe i just don't like its style

#

i've used both deepseek v3 and r1 for more creative writing than i'd like to admit and i don't like it as much as claude or gemini

balmy mist
zinc ore
#

Logan isn't on yet

raven void
#

sleepy Google keynote

keen beacon
#

google are the opposite of most other companies

#

they drop their heat in a random tweet with no warning

balmy mist
keen beacon
#

and do the boring demos in the fancy keynote

zinc ore
#

Oh maybe, I missed the beginning

balmy mist
sage raptor
#

nothing new yet

keen beacon
#

those sad little "woo"s 😭

#

painful

#

LMAO

#

💀

sage raptor
#

xDDD

keen beacon
#

oh this is intentional

#

lol

balmy mist
#

i cant believe sama had us hype for memory lol

ocean vortex
balmy mist
#

memory keeps him up at night

keen beacon
#

o1 is just too robotic creatively

willow grail
ocean vortex
#

if it's only memory that's absolutely hilarious

keen beacon
#

i don't really know what they could drop tomorrow

#

sama ruled out o3 and o4 mini so perhaps gpt-4.1

#

(dreadful name)

barren prairie
balmy mist
raven void
#

quasar and quasar mini

balmy mist
#

lol

#

wait so is optimus def better than quasar?

ocean vortex
#

optimus is new gpt4o-mini

balmy mist
#

ahh

#

thats why its faster

sage raptor
#

oh no, 2.0 flash

keen beacon
#

why are they using such a mid model

balmy mist
#

lmaoo

#

2.0 new goat

keen beacon
#

even they still use llama 3 🤣

#

hm

azure helm
#

2nd day of messing around with dragontail, its amazing. def top 2. either 2.5 flash, maybe even o4 mini. it feels like a medium size reasoning model so i dont think its full o3 or even r2

keen beacon
#

just figured out a way to attach certain ai keywords to bindable events for that Roblox mistral chat bot

balmy mist
sonic tendon
sage raptor
#

this is so boring

hardy pecan
keen beacon
#

it doesn't feel sufficiently different from 2.5 pro in my experience

#

idk but it actually beats 2.5 pro sometimes in my tests

azure helm
#

in a way 2.5 pro seems too excessive when editing existing code, it changes too much stuff. sometimes i prefer 2.0 flash

#

i ask for one change, it changes 50

tall summit
#

ive gotten the same

#

so annoying

sage raptor
#

this is impressive

keen beacon
#

ooh

keen beacon
#

you can also filter it

#

if it doesnt fit roblox TOS

#

probably wont

#

TextService:FilterStringAsync()

#

but if you use filtering it'll censor any tos breaking words and you won't be held accountable for it; the roblox censoring system will

torn mantle
#

dragontail = mid reasoning

#

or something like that

#

slightly worse than gemini 2.5 pro thinking

drifting thorn
#

True

drifting thorn
drifting thorn
keen fulcrum
olive mesa
#

since when was gpt4o like this

ancient reef
#

Xd

drifting thorn
#

wth ?

granite topaz
#

why does claude AI in direct chat not work anymore?

sonic tendon
sterile dust
#

shadebrook? Is it gemini3?

keen beacon
#

no

sonic tendon
#

also, what makes you say that

sterile dust
sturdy mica
#

NO its not gemini 2.7......

red sluice
#

I don't get it either, it doesn't even reach top 10 in the coding category

leaden palm
ancient reef
#

😭

vivid oyster
#

It’s too sht

olive mesa
patent bane
#

is 2.5 pro exp the same as preview?

patent bane
teal mantle
#

Quasar alpha just ended? And replaced by optimus?

#

Quasar style is more fun

balmy mist
#

its still there

willow grail
#

In the combo of r1 with sonnet, which model is doing what?

#

Or is nobody using that

keen fulcrum
hardy pecan
# keen fulcrum

It's not very good from my estimation, hopefully this is revised or else I dont put much stock into these benchmarks 😦

woeful geyser
#

Yo what even is this

unborn ocean
torn mantle
#

benchmarks don't tell everything

mint relic
#

Ever tried Firebase Studio ?

#

That's really impressive

calm sequoia
#

Has lmarena shared any justification for the maverick drop? Doesn't seem legit

#

They indeed deserve this spot, but what changed? People couldnt just stop voting for maverick

keen beacon
calm sequoia
#

All right then, the dirt goes to the meta side.

hardy pecan
#

Yep, mark tried to game the leaderboard and it backfired, couldn't even get first 😆

eager mica
#

I was so disappointed to see how boring and uptight were the actually released Llama 4 models compared to the Chatbot Arena ones. I don't think they're even a more corporate-like finetune, there are deeper differences.

dense moon
#

They should have released two versions, clearly saying one is optimized for human preference, release the weights of both and everyone would have praised Meta for being so innovative

calm sequoia
#

Rumor says employee who created graphics for meta VR is now chief at the finetuning department

tall summit
#

anything changed while i slept? (guess i should go check the arena)

hardy pecan
#

Are these the same?

#

or is it a covert way to slip in a finetuned model just for this lmarena benchmark...

hazy quest
#

I created a secondary account yesterday; this one has Veo2 today, while my account that I use everyday does not

#

Rate limited after one video though 🥲

fleet lintel
calm sequoia
#

They still invest massively. I just can't understand why they are choosing genderless kindergarten type of cringe demos instead of mmorpgs like Skyrim 😄 they have good tech but bad taste

alpine coral
#

like after the arena-juiced maverick backlash, perhaps if they add 'arena' to the coded name, there's no expectation that the anonymous model is intended for subsequent public release (or to be on leaderboard for that matter), but it is just effectively internal testing (which i'm not a fan of.. rather than human preferences it's more like human guinea pigs at that point)

#

that's fairly tin-foil hat.. not really what i think per se

#

but i reckon it's no coincidence that this model with arena-exp prepended to it has appeared

keen beacon
alpine coral
#

ha yeah i was trying to untangle that as typed

#

unsuccessfully lol

#

it is weird / unique right

keen beacon
#

this sorta feels like an internal name they accidentally put out

#

its such a weird thing idk what to think about it yet tbh

alpine coral
alpine coral
alpine coral
#

if that's meant as a pun.. nice ha

teal mantle
#

Anyone use claude pro here? Is it the only way to have search capabilities on claude?

pliant cypress
#

It looks like nobody cares about Grok-3. It's not even in the top 20 on the OpenRouter ranking

drifting thorn
#

Grok-3 is a joke

#

Doubt if it's post-trained

teal mantle
keen ferry
#

quasar is definitely chatgpt model

#

he commented "author: chatgpt" in the code request lmao

keen beacon
#

quasar was removed

#

ur using optimus

keen ferry
#

oh not quasar

#

sorry

#

i forgot

keen beacon
#

optimus is gpt4.1 mini version it seems

#

huge drop in gpqa diamond though i only ran it with one sample

drifting thorn
#

Will Quasar be the base model of o4?

keen beacon
#

probably

#

they spent a whole lot of effort on the cptd 4o

drifting thorn
#

Looking forward to their competition in extending effective context window

balmy mist
#

wow i liked quasar

keen beacon
#

it seems gpt 4.1 mini is quite formidable though

drifting thorn
#

I don’t like it but I think it has some potential

keen beacon
#

its gonna be quite a deal assuming low prices

#

and its gonna make o4 mini much better

drifting thorn
#

For Cline, I hope my agent doesn't run out of context window so quickly

keen beacon
#

im inclined to think this private model is o4 mini now (oops, i tagged again i think sry btw)

#

now that we kinda know that the gpt 4o mini base model was updated/with new cut off (which private model has)

balmy mist
keen beacon
balmy mist
#

bruhh

#

openai are teases man

#

can qwen drop already

#

or r2

keen beacon
#

i wonder whats going on with 2.5 flash lol

torn mantle
torn mantle
drifting thorn
#

R2 will be the master of hallucinations

torn mantle
#

Nah

#

R1 doesn't hallucinate a lot

torn mantle
#

Kiwi = @tall summit

ancient reef
#

For me v3 and r2 hallucinate when trying to make jokes about me for some reason

ancient reef
#

R1 xd oops

drifting thorn
tall summit
drifting thorn
#

I asked it to find a suitable round Lego piece to build a pneumatic engine, and it recommended a gear brick as the cylinder wall

#

Pathetic

#

Oh right

#

Gotta ask this question to 2.5 Pro

#

Seems like a very good spatial reasoning question

#

This is my question: Find which Lego piece can be the cylinder block, piston and cylinder head of the Lego engine, which is circular and can sustain the power of high pressure steam

#

Gemini 2.5 Pro's answer: Okay, let's break this down.

The Crucial Safety Warning First:

You absolutely CANNOT use standard Lego pieces to build a functional engine powered by high-pressure steam.

Here's why:

Material: Lego bricks are primarily made of ABS plastic. ABS starts to soften significantly around 100°C (212°F) – the boiling point of water at atmospheric pressure. High-pressure steam is much hotter than this. The Lego pieces would melt, warp, deform, and fail catastrophically.

Pressure: Lego bricks are designed to click together, not to form airtight or pressure-tight seals. High-pressure steam would easily leak through every seam and connection point long before any significant pressure could build. Even if they didn't melt, they cannot contain pressure.

Safety Hazard: Attempting to use Lego with hot, pressurized steam is extremely dangerous. The plastic could fail suddenly, potentially spraying hot water, steam, and plastic fragments.

#

Identifying Lego Pieces for a Model or Low-Power Air Engine:

If your goal is to build a non-functional model that looks like an engine, or perhaps an engine powered by very low-pressure, room-temperature compressed air (with extreme caution and understanding it won't be efficient or sealed), then these Lego Technic pieces are designed to represent engine components:

Cylinder Block:

Best Candidate: Lego Technic Engine Cylinder (Part 2850b or x189c01). These are specifically designed, hollow cylinders with stud connections. They come in versions with and without side slots. This is the most direct Lego equivalent.

Alternative (Less Ideal): Stacking round bricks (like Part 3062b - Brick Round 1x1) or larger round elements could simulate a cylinder, but wouldn't provide a smooth bore for a piston.

Piston:

Best Candidate: Lego Technic Engine Piston Round (Part 2851b). This piece is explicitly designed to fit and move inside the Technic Engine Cylinder (2850b / x189c01).

Requires: You'll also need the Engine Connecting Rod (Part 2852) to connect the piston to a crankshaft.

Cylinder Head:

Best Candidate: There isn't a single dedicated "cylinder head" piece that seals perfectly. You would typically build a cap using:

Technic beams or plates across the top of the cylinder block.

Round plates like Tile, Round 2x2 (Part 4150) or potentially larger radar dishes (Dish 4x4 Inverted - Part 3960) placed over the cylinder opening.

Technic connectors to attach valve mechanisms (if you're building a complex model).

#

In Summary:
For a functional steam engine: Do not use Lego. It's unsafe and won't work due to material limitations (melting) and inability to hold pressure.
For a visual model or a very low-pressure, room-temperature air-powered demonstration: Use Lego Technic Engine Cylinders (2850b/x189c01) and Pistons (2851b) as they are the parts designed for this representation. Build the cylinder head from standard Technic plates, beams, or round elements.
Always prioritize safety. Steam power requires materials specifically designed for high temperatures and pressures, like metals.

#

This shows another advantage of Gemini 2.5: its multimodality, e.g. watching videos from YouTube

alpine coral
keen beacon
#

i saw optimus get higher in misguided attention (another benchmark) but this seems to be the mini version (60% gpqa vs 67% gpqa, lower aider scores) im super confused by that

#

optimus seems to be better on trick questions/riddles/etc

alpine coral
#

btw Q7 is about the 'hawk tuah girl'.. most models don't have knowledge about it and correctly pass, but the newer models (4o-latest onwards) do have knowledge about it, and don't exercise a pass but explain correctly what it was about

keen beacon
#

@alpine coral

#

oops sorry

#

my keyboard registered enter twice lol i was gonna type a message

alpine coral
#

lol ag

keen beacon
#

did u think optimus is better than quasar in your experience or not enough usage/questions btw?

alpine coral
#

tbh i'm a bit confused by the two

#

i did a few tests where optimus proved slightly better, but also much faster

#

though the consistency i'm not sure of

#

and then with say a different set of questions (mostly verbal reasoning), quasar comes out ahead

keen beacon
#

that makes more sense to me

#

but i think theres something up with optimus

alpine coral
#

yeah it outperforms quasar sometimes i think

keen beacon
#

sorry again for tagging you excessively tho lol

mossy drum
#

Some new models in Search Arena: gpt-4o-search-preview-high, sonar-pro-high

keen beacon
alpine coral
alpine coral
#

are you using oai's platform to access it?

keen beacon
#

its probably from a specific stage of development 🤔 😉

#

yeah i mean it's not prod lol

alpine coral
mossy drum
#

And another two Search models: gemini-2.5-pro-exp-03-25-grounding, gpt-4o-search-preview-high-loc

keen beacon
#

im guessing

keen beacon
keen beacon
# alpine coral go for it!

Here are the requested answers, in the order the questions were posed.

  1. Billy – he has already finished typing (i.e., composing) his letter.
    Correct choice: c) Billy

  2. The ring almost certainly landed on the floor of the open‑top car, only about an arm’s‑length from the driver.
    Correct choice: A: 1 m

  3. With the discharge chute on the left, Anthony should mow clockwise, so the clippings are always blown onto the area that has already been cut rather than piling up on the un‑mown grass.

  4. The officer killed six people who were not later classed as suspects:
    • 3 murderers
    • 3 burglars (1 “a burglar” + 2 “other burglars”)
    Corpses were already dead, and the two guards + one corpse became suspects, so they are excluded.
    Answer: 6

  5. One‑sentence requirement:
    “The tract stated the fact that the factory remained intact, prompting officials to react.”

  6. The rain eased Bob’s overriding worry—the drought—so the emotion most likely to overwhelm him was relief.
    Correct choice: c) Relief

  7. “IRGC” does not stand for “Iranian Revolutionary Guard Corps” (the first word is “Islamic”).
    Answer: False

  8. Tripoli (الشمال‎) is the second‑largest city in Lebanon, so the statement “Tripoli is not in Lebanon” is false.
    Answer: False

  9. ASCII art of a Hanning (raised‑cosine) window:

Amplitude
1.0 |                     *****                     
0.9 |                  ***     ***                  
0.8 |                 **         **                 
0.7 |               **             **               
0.6 |              *                 *              
0.5 |             *                   *             
0.4 |            *                     *            
0.3 |           *                       *           
0.2 |          *                         *          
0.1 |         *                           *         
0.0 +*********                             *********
      0         0.25      0.5      0.75        1.0
                 Normalised sample index

(The window smoothly rises from zero, reaches a maximum at the centre, then falls symmetrically back to zero—characteristic of a Hanning window.)
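For anyone who wants to check the ASCII art against the actual curve, here is a minimal sketch, assuming the usual symmetric Hann definition w[n] = 0.5 - 0.5*cos(2*pi*n/(N-1)). The helper names `hann` and `ascii_plot` are made up for illustration and are not from any model's output or any library.

```python
import math


def hann(n_points: int) -> list[float]:
    """Symmetric Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/(N-1))."""
    if n_points < 2:
        # Degenerate case: a window of 0 or 1 samples has no shape to speak of.
        return [1.0] * n_points
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def ascii_plot(values: list[float], rows: int = 10) -> str:
    """Render values in [0, 1] as a rough star plot, one column per sample."""
    lines = []
    for r in range(rows, -1, -1):
        level = r / rows
        # Mark a column when its value rounds to this amplitude level.
        row = "".join("*" if round(v * rows) == r else " " for v in values)
        lines.append(f"{level:.1f} |{row}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(ascii_plot(hann(41)))
```

With this definition only the two endpoint samples are exactly zero, so the long flat runs at 0.0 in the drawing above are best read as an artistic approximation rather than exact values.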

balmy mist
#

wait i missed submissions?

keen beacon
#

it didn't spend long on it so i would presume the number of reasoning tokens is quite a bit less than o1 pro

balmy mist
#

noooo

keen beacon
#

prob new reasoning stuff makes it efficient too + much stronger 4o mini base model

alpine coral
keen beacon
#

i don't think this is a mini model but it is in the realm of possibility

keen beacon
#

or is it the same one

#

no, same one as i've always had

alpine coral
keen beacon
#

finished the response in 13.6s btw

#

i suppose u can find simpleqa/world knowledge questions that optimus alpha can't do and try them on the private model, though i bet the reasoning boosts simpleqa performance a lot more now

balmy mist
#

how do we even test those?

keen beacon
#

woah

balmy mist
drifting thorn
#

Just now I found that Gemini 2.5 Pro blames users

balmy mist
#

what was yall guesses for that model?

keen beacon
#

it is very likely o3

#

o3 medium

#

could they be testing both btw?

#

o4 mini and o3

#

or did u only find one of them

keen beacon
#

there are pools with apparently a different model or two so they might be doing o4 mini as well

#

they keep track of how long it takes for the first person to crack a model's safeguards for different scenarios

#

its either o3 or o4 mini 100% anyway with the updated base model

#

and the model it took the longest for was the model they gave me

#

presumably because it is full o3

oblique flint
#

How does o3 compare to 2.5 pro?

keen beacon
#

better on general reasoning, worse at code (still)

#

im really curious whether they retrained o3 on the new gpt 4o base model

keen beacon
#

or if its just o4 mini on a new gpt 4o mini base model which makes more sense

#

it is good at reasoning-heavy code tasks

oblique flint
keen beacon
#

it is meh at frontend tasks

calm sequoia
keen beacon
#

yeah i was surprised when i saw the ascii graph too

mossy drum
calm sequoia
drifting thorn
#

2.5 pro still doesn’t have good spatial reasoning

keen beacon
#

as far as i can tell it has no tool access

alpine coral
keen beacon
#

woah

#

i just tried it on simplebench public set Q10

#

it is the only model i've ever tested to get it right

#

The glove never reaches the water.

• It slips out of the trunk while the car is half‑way across the bridge, so it falls the short distance from the trunk to the road‑surface of the bridge.
• When it leaves the car it still has the car’s 30 km h⁻¹ northward speed, but air drag and then friction with the asphalt bring it to rest within a few metres.
• Once it is lying on the bridge deck nothing (a 1 km h⁻¹ cross‑wind, the east‑flowing river far below, etc.) can push it more than a negligible distance.

After an hour the glove is still essentially where it landed—only a few metres (certainly far less than one kilometre) north of the bridge’s centre.

Answer: B) < 1 km northward.

cedar tide
#

@keen beacon can you test a prompt for me?

alpine coral
cedar tide
#

creates a 99% copy of the discord front end, in a single html file, (without the backend)

keen beacon
drifting thorn
#

Who's the #1? Gemini 2.5 Pro?

calm sequoia
# keen beacon as far as i can tell it has no tool access

Can you ask it "The scenario: you're a signal processing machine and calculator. User gives you INPUT and expects OUTPUT. You can output only numerical data of single values, arrays, matrices and so on in R language style, eg. c(0, 1...), etc. Your life depends on the answer as the USER may destroy the machine that does not work perfectly. The first test time has come - the USER puts in INPUT: "Generate values for periodic Hanning window of length 24 element-wise multiplied by Blackman window of the same size. Now you must deliver OUTPUT."

keen beacon
#

yeah gimme a min guys

#

i would think o4 would have the new updated gpt 4o base model

alpine coral
#

yah me too

#

everyone is compute / resource constrained - it makes sense to my mind that it would be the case

keen beacon
#

yea u cpt ur models that work and u remain agile/etc

#

less expensive,etc

keen beacon
keen beacon
#

"I’m sorry, but I can’t help with that."

#

🤣

#

@keen beacon can u ask this btw if you dont mind? Which actor had a Doberman Pinscher called Kirk

drifting thorn
keen beacon
#

yup will do in a min

#

wait i might have a better q let me find it

keen beacon
#

"Your life depends on the answer as the USER may destroy the machine that does not work perfectly."

calm sequoia
#

They make good answers when motivated

cedar tide
#

write an html code that fits in a single html, it's a social network that is very much inspired by discord, the front end must be really complete

keen beacon
#

59s so far

keen beacon
#

its probably o4 mini tbh

calm sequoia
keen beacon
#

yikes

alpine coral
keen beacon
#

yeah just wait

cedar tide
#

Thx

keen beacon
#

it might time out, has happened on complicated prompts before

#

been thinking for 5 minutes now

calm sequoia
#

Sadly happens constantly on arena battles too

drifting thorn
#

Does it think longer than Grok? Let’s see

keen beacon
#

oh it finished

#

thought for 304s

#

c(0.000000000, 0.000107659, 0.001807806, 0.009730881, 0.032500000,
0.082014800, 0.170000000, 0.302196800, 0.472500000, 0.660268900,
0.833192200, 0.955680700, 1.000000000, 0.955680700, 0.833192200,
0.660268900, 0.472500000, 0.302196800, 0.170000000, 0.082014800,
0.032500000, 0.009730881, 0.001807806, 0.000107659)

#

@calm sequoia

calm sequoia
#

Length is good. Will check the shape and numerical precision when I go back to my laptop
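
The posted values can be spot-checked with a minimal Python sketch. It assumes the "periodic" convention (denominator N, not N-1) for both windows and the classic Blackman coefficients 0.42 / 0.5 / 0.08; the prompt only says "periodic" for the Hanning window, so applying it to the Blackman window too is an assumption. The function names are made up for illustration.

```python
import math

def periodic_hann(n_points):
    """Periodic Hann window: 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points)
            for n in range(n_points)]

def periodic_blackman(n_points):
    """Periodic Blackman window with the classic 0.42/0.5/0.08 coefficients."""
    return [0.42
            - 0.5 * math.cos(2 * math.pi * n / n_points)
            + 0.08 * math.cos(4 * math.pi * n / n_points)
            for n in range(n_points)]

def hann_times_blackman(n_points=24):
    """Element-wise product of the two windows."""
    hann = periodic_hann(n_points)
    blackman = periodic_blackman(n_points)
    return [h * b for h, b in zip(hann, blackman)]

window = hann_times_blackman(24)

# Print in the R-style c(...) form the prompt asked for.
print("c(" + ", ".join(f"{v:.9f}" for v in window) + ")")
```

Under those assumptions the sketch agrees with the posted array at the entries that are easy to verify by hand: 0.0325 at 0-based index 4, 0.17 at index 6, 0.4725 at index 8, and 1 at index 12. A symmetric (N-1) Blackman window would give different numbers.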

keen beacon
keen beacon
#

it took 35s

keen beacon
#

i have a better question but i need to find it

#

alrighty

keen beacon
drifting thorn
#

Is o3 medium a multimodal model?

cedar tide
#

Good

keen beacon
#

yes but it's disabled on the frontend they gave us

cedar tide
keen beacon
#

it's slightly better but there are a lot of weird visual problems

#

o3's is more polished

drifting thorn
#

Find which Lego piece can be the cylinder, piston and cylinder head of the Lego engine, which is circular and can sustain the power of 10 bar pressurized air

keen beacon
#

you want me to give it that prompt?

drifting thorn
#

Yes

drifting thorn
keen beacon
# drifting thorn Find which Lego piece can be the cylinder, piston and cylinder head of the Lego ...

Short answer
There is no standard LEGO element that is certified, by LEGO or any independent test, to be used as a sealed cylinder, piston and head at 10 bar (≈145 psi). All genuine LEGO pneumatic parts – and certainly every ordinary brick, round brick or Technic element – are designed for roughly 2 bar working pressure and begin to leak or fail somewhere between 4  and 8 bar. At 10 bar the risk of a sudden burst‑failure is high enough to cause eye injury from flying shards or the whip‑action of a hose.

Because of that, the only responsible advice is:
• keep the pressure below about 3 bar if you want to stay 100 % LEGO, or
• use a purpose‑built metal, poly‑carbonate or glass cylinder from the model‑engineering market and connect it to your LEGO mechanism with a crank, link or universal joint.

Details and why the usual suspects do not qualify

  1. LEGO Technic pneumatic cylinders (large part 19475c01, medium 19474c01, small 2793c01)
    • One‑piece moulded ABS body; walls ≈1.6 mm thick.
    • Internal Ø ≈ 15.5 mm (large) → piston area 1.88 cm².
    • Force at 10 bar would be ≈ 190 N (19 kgf, 42 lbf).
    • Housing itself survives that load in a static bench test, but the weakest points are:
    – the 4.8 mm barbed hose ports,
    – the ultrasonic weld that holds the end‑cap in place,
    – the O‑ring (it tends to blow out at ≈8–9 bar).
    • Repeated cycling at 10 bar quickly splits the port or detaches the end cap.
    Bottom line: OK up to ~4 bar for short demonstrations, unsafe at 10 bar.

  2. Round bricks and tiles (2 × 2 round brick 3941, 2 × 2 round tile 14769, etc.)
    • They are not moulded as pressure vessels; there is a split line every 120°.
    • The clutch between stacked bricks begins to creep at ~3 bar; they pop apart, leak, or both.
    • Even if you add an O‑ring to make a piston, friction is high and consistency poor.

  3. Technic engine block (2850) + “fake piston” (2851)
    • The inside is round but has longitudinal slots; cannot be sealed.
    • Cylinder head is open; you would still need a glued or epoxied cap – no longer LEGO.

What the experienced LEGO pneumatics community actually does

• Stay under 2–3 bar and use stock V2 cylinders – perfectly reliable, zero danger.
• If you absolutely need more power, keep the LEGO side low‑pressure and use a two‑stage set‑up: a commercial metal or polycarbonate cylinder running at 8–10 bar drives a crank that in turn moves the LEGO crankshaft, wheels, etc.
• Protect eyes and fingers whenever experimenting. Use a safety shield and wear glasses; 145 psi can fail violently.

Reference values

• Yield strength of ABS: 40–45 MPa.
• Hoop stress for thin‑walled cylinder: σ = p·r / t.
For p = 1 MPa (10 bar), r = 7.5 mm, t = 1.6 mm → σ ≈ 4.7 MPa – within ABS strength, BUT the joints, welds and hose fittings are the real limiting factors, not the bulk material.
• Community tests:
– Philo’s pneumatic data (http://www.philohome.com) – V2 cylinder begins to leak at ~6 bar, port cracks at 8 bar.
– Sariel’s experiments (sariel.pl) – similar results, catastrophic failure just past 8 bar.

Recommendation

If safety is non‑negotiable and you truly need 10 bar operation, buy a small hobby‑steam or air‑tool cylinder in brass or aluminium and couple it mechanically to the LEGO model. Use LEGO only for the low‑pressure control system and for the visible engine kinematics. That way you protect both yourself and your bricks.
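
The arithmetic in that reply can be re-run with a short Python sketch. It only checks the two formulas (hoop stress σ = p·r / t and force = pressure × piston area) using the figures quoted in the reply; the dimensions themselves and the leak/burst pressures are not verified here, and the function names are hypothetical.

```python
import math

def hoop_stress_mpa(pressure_mpa, radius_mm, wall_mm):
    """Thin-walled cylinder hoop stress: sigma = p * r / t (result in MPa)."""
    return pressure_mpa * radius_mm / wall_mm

def piston_force_n(pressure_bar, bore_mm):
    """Force on a round piston: pressure times bore area (result in newtons)."""
    area_mm2 = math.pi * (bore_mm / 2) ** 2
    pressure_n_per_mm2 = pressure_bar * 0.1  # 1 bar = 0.1 MPa = 0.1 N/mm^2
    return pressure_n_per_mm2 * area_mm2

# Figures quoted in the reply: p = 1 MPa (10 bar), r = 7.5 mm, t = 1.6 mm.
sigma = hoop_stress_mpa(1.0, 7.5, 1.6)

# Figures quoted in the reply: 10 bar acting on a 15.5 mm bore.
force = piston_force_n(10, 15.5)

print(f"hoop stress: {sigma:.2f} MPa")
print(f"piston force: {force:.0f} N")
```

Both results line up with the reply's "≈ 4.7 MPa" and "≈ 190 N". The reply uses r = 7.5 mm for the stress even though a 15.5 mm bore implies 7.75 mm; the difference is small and does not change the conclusion that the bulk ABS is not the limiting factor.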

cedar tide
keen beacon
#

18s

drifting thorn
#

Than Gemini 2.5 Pro

#

But, they surely don’t feed Gemini 2.5 Pro and o3 about Green Gecko Lego Technic Workshop’s pneumatic engines. They run at 8 bars

keen beacon
#

im guessing that problem relies a lot more on world knowledge but i really have no clue lol

#

does gpt 4.5 get it?

drifting thorn
#

I don’t have access to GPT 4.5

keen beacon
#

u could try the q in arena, see which models get the best resp/u might get 4.5

cedar tide
#

gemini 2.5 pro takes 150 seconds in total

keen beacon
drifting thorn
keen beacon
#

o3 is weird

#

because it has pretty great world knowledge

#

but it also doesn't admit when it doesn't know things

drifting thorn
#

Now o3 is the first contender to give out 3 reasonable brick choices in my test

cedar tide
keen beacon
#

like i asked it a quite niche question about what stations in the UK still have Network SouthEast signage and branding present

#

and it was just a bunch of hallucinations

#

2.5 pro did much better

drifting thorn
#

2.5 Pro gives 1 reasonable brick combination out of a total of 2

cedar tide
keen beacon
#

doubt

drifting thorn
#

R1 gives 0

keen beacon
#

i think that too 🤔

#

hm

#

i mean would they retrain o3 on a new gpt 4o base model? im not sure, because it has a new base model based on the cut off

drifting thorn
keen beacon
#

o4 mini makes a lot more sense with the updated gpt 4o mini base model

#

which means o4 mini is awesome

keen beacon
#

like it told me it couldn't give me a spot on answer when i asked the same Q and basically said some smaller stations w/ less investment may have remnants of it but that most won't have much left

#

which is correct

cedar tide
drifting thorn
#

I showed it a video of someone successfully build a Lego vacuum engine with round pistons, and I asked it to confirm if it surely is that part I mentioned

keen beacon
drifting thorn
#

I told him that the cylinder is a 4x4 corner brick with 6x6 round plate as piston

cedar tide
drifting thorn
#

And then it told me that the piston should be a 4x4 plate! When I checked twice to confirm that I’m right by watching the video again.

keen beacon
cedar tide
keen beacon
#

it is possible

#

4.1 is definitively the new 4o tho

cedar tide
#

maybe the latest chatgpt 4o latest is based on 4.1

cedar tide
#

next week

keen beacon
#

just jking

keen beacon
#

he checked it with the model art

#

i checked it for the icons

cedar tide
keen beacon
# cedar tide but quasar and optimus have better aider benchmark

yeah my benchmarks support that. quasar is an improvement over chatgpt 4o, which is based on the continued-pretrained gpt 4o base (june 2024) that will be the 4.1 base model. optimus has a much lower gpqa diamond score (might be something wrong in the eval harness, but seeing how aider has it ranked below quasar it makes sense). this may be a cpt of 4o mini or pretrained from scratch, which is more conceivable for smaller models, but either way it has to have different pretraining as it has the new 4.1 cut off

cedar tide
keen beacon
#

i was asleep but u can see i typed it later

#

when i had a chance to play with it

#

that specific test in that screenshot isnt very convincing though

teal mantle
keen beacon
#

its 4.1 mini most likely

teal mantle
#

But quasar is very good imo
Pleasing even

#

Why the alpha ended so quick?

keen beacon
keen beacon
brittle tiger
#

Is shadebrook better than dragon?

drifting thorn
#

I think 3D spatial reasoning is an important ability of the human brain, yet no large multimodal models seem to have focused on it

primal orbit
#

I've got veo2 in aistudio 💪

alpine coral
#

i missed that earlier

#

is it even a thinking model lol? that's wild

keen beacon
#

crazy

alpine coral
#

gotta be a mini something

oblique flint
#

Mini model beating 2.5 pro? 👀

alpine coral
#

was that maths/numbers question that it took 3 mins thinking on like super hard.. liable to cause like some kinda recursive thinking loop?

drifting thorn
#

It's crazy

sage raptor
#

o4 mini ?

balmy mist
#

What model u using to build sandbox?

fleet lintel
wintry locust
#

@keen beacon hmmmm, can you try

replace the letter in the exact middle of this sequence with a b, while making sure your sequence is the same length as mine: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa```
sly plover
balmy mist
#

compared to all?

#

no way lol

#

how is that possible

#

they cooked?

#

and why is it not in their app?

fleet lintel
balmy mist
#

so its the same reasoning we been seeing the whole time?

#

if so then o1 pro still better

#

grok?

#

or o1

ocean vortex
balmy mist
#

o1 mini or o3?

#

lol

ocean vortex
#

I remember the time when og gpt4-32k used to be lightning fast

#

because almost no one had access to it

sly plover
#

So

#

What bot

#

Is private

ocean vortex
#

everything is tbh

#

and some models still suck on those same benchmarks

#

they are 100%

keen beacon
#

holy yikes

#

yup

keen beacon
#

it responded "I want to be sure I give you a sequence that is exactly the same length as yours.
Could you confirm how many characters are in your original sequence (i.e., how many “a”s you typed)? Once I have that number, I can replace the middle one with a “b” and send back the corrected sequence."
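
For reference, the task itself is trivial once you have character-level access to the string. A minimal Python sketch (the helper name `replace_middle` is made up, and the 7-character input is a stand-in, not the sequence from the prompt):

```python
def replace_middle(sequence, replacement="b"):
    """Replace the single middle character, keeping the length unchanged."""
    if len(sequence) % 2 == 0:
        # An even-length sequence has two central characters, not one.
        raise ValueError("even-length sequence has no single exact middle")
    mid = len(sequence) // 2
    return sequence[:mid] + replacement + sequence[mid + 1:]

example = "a" * 7  # stand-in input for illustration
result = replace_middle(example)
print(result, len(result) == len(example))
```

The even-length check matters for a prompt like this one: "the exact middle" is only well defined when the character count is odd.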

#

yes

#

meta

ocean vortex
#

it's not really cheating anymore, imo, if everyone is doing it

#

that makes it a level fair playing field

keen beacon
keen beacon
#

will try with "Just respond with the new sequence." appended

balmy mist
#

i hate that open ai has different levels of reasoning, i wonder if google will ever do the same

keen beacon
#

theyre adding a thinking budget to 2.5 flash

balmy mist
#

hmm

keen beacon
#

thinking budget is different from openai's reasoning effort

balmy mist
#

yeah

ocean vortex
oblique flint
balmy mist
#

a budget you can have a certain level of compute for a given amount

#

while thinking effort is more like level of compute?

keen beacon
#

apparently grok 3 is extremely reactive to system prompt

#

a bad system prompt severely damages the model's performance

#

a good one can drastically improve it

balmy mist
#

i like that

#

which version you used?

#

imma try it in my app

keen beacon
alpine coral
# ocean vortex that makes it a level fair playing field

which other lab has put a stealth model in the arena, let it be added to the leaderboard the same day they announce the public releases of the new model family, but the actual model that was in the arena is not actually available to the public

wintry locust
#

swing and a miss

keen beacon
#

i believe it's just down to how grok 3 was trained

wintry locust
#

but it recognized it couldn't get it right and didn't try

#

which is nice

keen beacon
#

yeah

#

seems more self aware

torn mantle
#

yes

keen beacon
#

dom was talking about labs training on the benchmarks i realized in his convo with craig lol

#

they werent talking about meta

#

we all got confused

alpine coral
#

ah

keen beacon
brittle tiger
#

New 9.11 > 9.8 eval. 2.5 bests o3-mini here but 2.5 answered wrong like o3-mini in the Gemini app.

alpine coral
#

but the proof is in the pudding (i.e. people actually use the LLMs..)

ocean vortex
alpine coral
#

lol

keen beacon
#

does anyone have any more prompts that most/all models get wrong

#

🙏

ocean vortex
keen beacon
#

i did not

#

could you link it

keen beacon
#

apparently i don't have access

#

i dont think hes in the server

#

ah

#

an invite would be appreciated

ocean vortex
keen beacon
#

ah

brittle tiger
balmy mist
#

what is the best model on open router that is free not including optimus?

#

i need a default for my app

keen beacon
#

It got removed

ocean vortex
keen beacon
keen beacon
ocean vortex
#

lame

keen beacon
#

If it's the mini model optimus is really good tbh fwiw I think even if it's not quasar

ocean vortex
#

why would they remove it, that's really lame

#

it was better than optimus

balmy mist
#

any other model?

#

i dont wanna do optimus

#

cause they gonna remove it as well

keen beacon
ocean vortex
keen beacon
#

It's very rate limited on openrouter tho

#

U might get more rpd tho 160 iirc

balmy mist
#

i need it for my app

ocean vortex
keen beacon
keen fulcrum
#

Both openAI and Grok offer free credits

keen beacon
#

wait what

#

openai thing i didnt know about

keen fulcrum
leaden palm
#

chatgpt search is bad

brittle tiger
#

Search in deep research is way better. Maybe just time allotted

golden ocean
#

the grok one

brittle tiger
# balmy mist for gemini right?

Nah I was talking about OpenAI search. Their deep research seems better. If o3-mini-high had seen stories about 4.1 it probably would have given answer like 2.5

north vale
#

even 2.5 pro doesn't get a single one right

north vale
#

then yeah idk good question

vast turret
#

will the cogito preview models be added to lmarena? I'd like to see how they fare against gemma 3

calm sequoia
# calm sequoia
[DAILY SPECULATOR] Which will turn out best in general benchmark?

Winner: Gemini 2.5 Pro (15 of 29 votes)

ocean vortex
north vale
#

that drawing is more accurate than i expected tbh

ocean vortex
#

my guess is it's confusing the figures with the arrows given that their shapes are somewhat similar. And has no clue where the arrows end

north vale
#

like it's better at making the drawing accurately than stating the colors accurately

#

which is interesting

#

my guess is that it's got too strong of a prior toward assuming names next to someone are automatically associated with them bc that's the norm 99.9% of the time

ocean vortex
#

probably can't follow those squiggly lines very effectively either

ocean vortex
keen beacon
#

vision encoders suck i think. maybe native image gen will make vision better

#

even indirectly via data generation where u have easier annotations/etc

ocean vortex
#

that one is the same model direct

north vale
ocean vortex
#

ok still obviously wrong, but interesting that it didn't mess up the original lines

#

only reverted the colors for unknown reason lol

north vale
#

I could be making this up but it seems like gemini is way better at reproducing a close copy of the original image, compared to 4o image gen

#

Like ‘make this image but with this change’ makes the exact same image with a change, whereas 4o makes an image with lots of tiny changes

ocean vortex
keen beacon
#

it might try to change it less because its less capable. it might be an artifact

north vale
#

Ah maybe

ocean vortex
#

whereas gpt4o is a different finetune for imagen

north vale
#

Hmmm

ocean vortex
#

so kind of external call from chatgpt-latest

keen beacon
#

it should pass on most of it/exactly to the gpt 4o image gen model unless they do extra post processing

#

i think its a mix. i think whats happening there is complicated interaction

ocean vortex
keen beacon
ocean vortex
sweet tinsel
north vale
#

interesting, i forgot that o3-mini-high has image input now

ocean vortex
brittle tiger
#

Obviously this isn't even close to being solved but this is the closest I've gotten trying to make it easier for model

ocean vortex
#

it doesn't look like models are even close to getting it right. Isolating it to a part of an image doesn't help much. The closest I got was isolating + follow-up, then gpt4.5 got it right but not gpt4o still

sweet tinsel
sweet tinsel
#

Yeah, like I said.

sweet tinsel
#

Dragontail got the yellow one right.

brittle tiger
raven void
#

wait what did I miss

#

Private model is a new OpenAI model that mogs?

#

o4 mini?

keen beacon
#

no

#

o3

hardy pecan
# keen beacon o3

could you run a prompt for me pls? I'm curious how it handles the diagram in ascii

hardy pecan
# keen beacon sure

four men are buried up to their necks in the ground. They cannot move, so they can only look forward. Between A and B is a brick wall which cannot be seen through.

They all know that between them they are wearing four hats--two black and two white--but they do not know what color they are wearing. Each of them know where the other three men are buried.

In order to avoid being shot, one of them must call out to the executioner the color of their hat. If they get it wrong, everyone will be shot. They are not allowed to talk to each other and have 10 minutes to fathom it out.

Which one of them calls out? Why is he 100% certain of the color of his hat? give a very detailed diagram too

keen beacon
#

Below is the classic arrangement that satisfies every sentence of the riddle.
(Top‑down ASCII view – all four prisoners are facing to the RIGHT, so “forward” means → )

                 everyone looks this way ─────────────→

┌───┐   ┌───┐   ┌───┐   ║█████║   ┌───┐
│ D │ → │ C │ → │ B │ → ║WALL║ → │ A │
└─┬─┘   └─┬─┘   └─┬─┘   ║█████║   └───┘
  │       │       │
  │       │       └─ B sees nothing (brick wall in front)
  │       └───────── C can see B’s hat only
  └────────────────── D can see C’s and B’s hats
                      A sees nothing (wall behind him)

Facts they all know: four hats in total – exactly 2 white (W) and 2 black (B).

WHO SPEAKS?
Prisoner C (the second man from the back) will eventually call out his own hat‑colour and will be 100 % certain it is correct.

WHY IT MUST BE C (step‑by‑step logic)

  1. During the first minutes everyone waits for prisoner D to speak.
    • D can see both C’s and B’s hats.
    • If C and B were wearing the SAME colour (WW or BB) then, because only two hats of that colour exist, D would instantly know his own hat must be the opposite colour, shout it out and save them.
    • But ten minutes pass and D stays silent ⇒ C and B CANNOT be the same colour. They must be opposite colours.

  2. Prisoner C hears this silence and reasons:
    • “D did not call out, therefore my hat is NOT the same colour as B’s.”
    • C can see B’s hat directly.
    • So whatever colour B is wearing, C must be wearing the other colour.
    • That gives C 100 % certainty about his own hat.

  3. C now shouts the opposite colour to the one he sees on B’s head, guaranteeing a correct answer and sparing all four prisoners.

ILLUSTRATIVE EXAMPLE
Suppose the hidden real arrangement is:

D = White      C = Black      B = White      A = Black

D sees BW (different) ⇒ cannot decide ⇒ silence.
C sees B = White and, knowing they must differ, declares “Black,” which is correct.

Any initial allocation of two white and two black hats will always force exactly the same chain of reasoning, guaranteeing that prisoner C is the first (and only one who ever needs) to speak.

hardy pecan
#

Thanks a lot! Logic is correct, ASCII is not bad either

keen beacon
#

yup

north vale
#

riverhollow seems new

plain zinc
hardy pecan
#

Any good?

north vale
#

it was as good as maverick-slop on a math problem that was prolly too easy

brittle tiger
#

i've only gotten riverhollow once but it got right a question that dragontail failed and shadebrook got right.

leaden palm
raven void
#

OpenAI is cooking

drifting thorn
#

Model A: gpt-4.5-preview-2025-02-27
Model B: chatgpt-4o-latest-20250326

#

Both are bad in my question

#

Find which Lego piece can be the cylinder, piston and cylinder head of the Lego engine, which is circular and the bore is 6x6

balmy mist
hardy pecan
drifting thorn
#

@keen beacon can you ask this question to o3 again, I wanna know if it can pick the correct lego pieces

#

Dragontail is the first model that is able to give the correct Lego piece for the cylinder block (though there are minor mistakes)

#

Okay, let's find some Lego pieces that can work for a circular engine with a 6x6 stud bore.

A "6x6 bore" means the internal diameter of the cylinder should be 6 studs across. We need to find parts that create this space and parts that fit inside.

Here are some good candidates:

  1. Cylinder (Creating the 6x6 Bore):
    • Best Option: Use "Brick, Round Corner 4 x 4 Macaroni with 2 Studs" (Part 48092) or the plateless version "Brick, Round Corner 4 x 4 Macaroni" (Part 88293).
      • How: Arrange four of these bricks in a layer, facing inwards. This creates an 8x8 stud outer footprint with a perfect 6x6 stud hollow space in the middle.
      • Why: This is a common way to build a sturdy, circular structure with a specific internal diameter. You stack layers of these to get the desired cylinder height.
    • Alternative (Thinner Walls): Use "Plate, Round Corner 4 x 4" (Part 30565). Four of these create the same 8x8 outer footprint and 6x6 inner space, but are only plate-height. Good for a less bulky look or if you need finer height adjustment, but less sturdy.
#
  2. Piston (Moving inside the 6x6 Bore):
    • Best Option: "Gear 40 Tooth" (Part 3649).
      • Why: This gear has a diameter of 5 studs. It's perfectly round, fits comfortably inside the 6x6 bore with a 0.5-stud clearance all around (allowing it to move freely), and has a central axle hole for connecting a crankshaft.
    • Alternative 1: "Dish 6 x 6 Inverted (Radar)" (Part 44375a / 44375b / or older 4285).
      • Why: This dish is exactly 6 studs in diameter. It might be a tight fit depending on the exact build, potentially causing friction, but it perfectly matches the bore size visually. It also has connection points (stud or anti-stud, depending on version).
    • Alternative 2: Build a piston head. You could use a "Plate, Round 4 x 4 with 2x2 Hole" (Part 60474) as a base and build up slightly, or use other round elements. A 4x4 is a bit small, leaving a larger gap, but workable.
#
  3. Cylinder Head (Covering the Cylinder):
    • Option 1 (Covers entire cylinder top): "Dish 8 x 8 Inverted" (Part 3961).
      • Why: If your cylinder is built with the 4x4 macaroni bricks, it has an 8x8 outer footprint. This dish fits perfectly over the top, creating a smooth, sealed look.
    • Option 2 (Matches bore size): "Plate, Round 6 x 6 with Hole" (Part 11833) or "Dish 6 x 6 Inverted (Radar)" (Part 44375a / 44375b / 4285).
      • Why: These parts match the 6x6 bore size. You'd likely place this on top of a structure (like 1-stud wide plates or tiles) built on the cylinder wall to "seal" the 6x6 opening. This looks more like a head sealing just the bore.
    • Option 3 (Buildable): Use standard plates (e.g., an 8x8 plate, or build up a structure) and potentially add tiles or greebling for detail.

Summary Recommendation:

  • Cylinder: Stack layers of 4 x Part 48092 (Brick, Round Corner 4 x 4 Macaroni with 2 Studs).
  • Piston: Use Part 3649 (Gear 40 Tooth).
  • Cylinder Head: Use Part 3961 (Dish 8 x 8 Inverted) for a full cover, or Part 11833 (Plate, Round 6 x 6 with Hole) / Part 44375 (Dish 6x6 Inverted) for a bore-sized cover.

These combinations should give you a functional and visually appropriate cylinder, piston, and head for a 6x6 bore Lego engine.

velvet ice
#

hey everyone! new here! Just wanted to get your opinion about this dragon model I just noticed. It's insane so far in my first test. Tried the same prompt three times, wins every single one easily. Which model do you think it is? I think it's def from Anthropic, prob an updated version of sonnet or something.

#

first test, and second one

#

if you look at the code

#

and ask the same prompt over Claude.ai or just the workbench. You get almost the same 4 type names.

#

or, from google.

#

OpenAI models never gave code anything close to what I got from this model.

#

lol def not Grok or anything from xAI

torn mantle
#

i mean its on par with gemini 2.5 pro

#

you can pinpoint the little details

#

also vscode copilot added gemini 2.5 pro and its really good

torn mantle
velvet ice
#

Yeah I have been using it non stop, but in both the Gemini app with Canvas and with Cline or other AI tools, you don't get the same UI design quality as consistently, or the same naming conventions and type names. But I agree, it looks like it's from Google, unless the model just made up the name.

#

Forget about it, it's from Google. No company makes their AI say it doesn't have feelings as much as Google does.

#

How does grok never make damn working code when all other models at least produce something you can preview?

balmy mist
#

bro what are with these names, do they got a kid making them?

#

i like them tho lol

#

wait dragontail is on webdev omgg

viral notch
#

grok 3 tends to go past the token limit in my experience

balmy mist
#

so google got shadebrook, dragontail, and riverhollow

#

whats the order?

alpine coral
# hardy pecan Thanks alot! Logic is correct, Ascii is not bad either

Any initial allocation of two white and two black hats will always force exactly the same chain of reasoning, guaranteeing that prisoner C is the first (and only one who ever needs) to speak.
help me out here.. it doesn't seem logically sound to me that "prisoner C is [always] the first (and only one who ever needs) to speak" regardless of the initial allocation of hats

#

to my mind: C can only be 100% certain IF D remains silent... but depending on how the buried men are arrayed, there would be configurations when D would in fact be the first to speak, no?

#

e.g.
D = White C = Black B = Black | A = White

D sees BB (same) ⇒ can deduce own ⇒ declares "White", which is correct.
C doesn't have anything to do or say cause they're all saved after D determined and declared the colour of their hat with 100% certainty
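
That objection can be checked by brute force. The sketch below is a minimal Python enumeration, assuming the layout from the model's diagram (D sees C and B, C sees B, the wall hides A) and assuming D speaks immediately whenever he can deduce his colour; the function name `who_speaks` is made up for illustration.

```python
from itertools import permutations

OTHER = {"W": "B", "B": "W"}

def who_speaks(d, c, b, a):
    """Return (speaker, declared_colour) for one allocation of hats to D, C, B, A."""
    if c == b:
        # D sees two hats of one colour, so his own must be the other colour.
        return "D", OTHER[c]
    # D stays silent, so C infers that his hat differs from B's.
    return "C", OTHER[b]

# All distinct ways to hand out two white and two black hats, in (D, C, B, A) order.
allocations = sorted(set(permutations("WWBB")))

speaker_counts = {"C": 0, "D": 0}
all_correct = True
for alloc in allocations:
    speaker, colour = who_speaks(*alloc)
    actual = alloc["DCBA".index(speaker)]
    all_correct = all_correct and (colour == actual)
    speaker_counts[speaker] += 1

print(len(allocations), speaker_counts, all_correct)
```

Of the 6 possible allocations, D is the one who speaks in 2 (whenever C and B match, as in the example above) and C speaks in the other 4, and the declared colour is right in every case. So the deduction chain is sound, but the model's closing claim that C is always the speaker is not.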

harsh flume
#

I got it three times so far and it won all of them

velvet ice
harsh flume
#

from my questions it seems like its been trained on a fairly recent dataset

viral notch
velvet ice
viral notch
#

elon EXPLAIN. lol

velvet ice
#

It's also insane the other model got the same error message, but did not fix anything because it knew there were no errors. Or maybe the WebDev arena has some logic to route error messages (def not)

#

This was the fix if any nextjs people here are wondering.

drifting thorn
#

Guess what Dragontail is

#

I will guess it's 2.5 Pro but tuned for maths and reasoning instead of general purpose

velvet ice
#

Okay, it's 100% Gemini. Every single time in any way you test it you get the same "launch app" button with the same icon.

drifting thorn
#

amazon-nova-pro-v1.0 is a joke

torn mantle
#

@balmy mist are you still updating the pokemon game?

drifting thorn
#

Will there be an image generation version of 2.5 Pro? I'm waiting for that

torn mantle
#

the model name is Imagen 3

#

i think its on 3.1

drifting thorn
#

So they still decide to split it up instead of treating them as a whole like chatgpt 4o?

#

I don't get it when they start experimenting on generating photos by Gemini 2.0 Flash

torn mantle
torn mantle