#general
1 messages · Page 18 of 1
disagree. Coders are one of the most important audience to win over... their support can directly influence the companies purchasing decisions.
i mean if powerusers just pick the best model they can just make the best coding model later
What is the news in one sentence?
chagpt will now be able to access all of your past chats
nothing ever happens
What OpenAI showed today:
- Updated Memory
- Counterclaim
memory improvement
technically "Starting today, memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond." is one sentence
There will be another Google presentation today.
sama hype
Gemini 2.5 flash with thought process control is inevitable
When
it's possible somethign comes out at 5pm ET developer keynote. nightwhisper would be lit but i doubt it
please google 🙏
after 4 hours
please google 🙏 free access to the most powerful model 🥺
LoganK is presenting so maybe
👀
imo people who use llms for coding are the least loyal to any one lab
like i switched from claude to deepseek to claude to google
so me
I'm not saying it's wrong strategy but there are downsides. "Sources caution that OpenAI has delayed the introduction of some new models recently due to capacity issues" anime is literally the reason ppl aren't using o4 rn
? they are giving out 4.1 and 4.1 mini out for free right now
for people who's primary usecase for llms is code, they'll just go wherever the best model is
O3 full today?
which is quite different to other consumers
there are countless people i know who basically only know chatgpt and maybe deepseek
are there reasons why someone wouldn't if they weren't coding? (if they knew of others)
Where
quasar alpha and optimus alpha
most people outside of our bubble have very little knowledge of this stuff tbh
100%
O4.1?
in 3 hours
gemini is better known than claude
demos demos demos
i am
gemini is more consumer focused than claude is
and google have thrown more money behind marketing
Easy to verify, more people use Gemini than Claude
anthropic is relatively enterprise focused
the model sure, but in cursor and roocode claude is still better
nvm misread
yeah i know, that's down to the implementations
i wouldn't be surprised if gemini 2.5 has a mini-ghibli moment around minecraft with zoomers.
claude is still better at instruction and format following
🗣️
interesting
in 3h
i see
software engineering?
hopefully they introduce something new
what makes chatgpt more appealing to normies as well is the zoomer language. Claude also has some sort of personality which people might prefer. Gemini is just full on robot
not just something built on gemini
all google has to do is sponsor some of the minecraft youtubers (1 trillion minecraft views on youtube) to generate bases and make it into a game. the generations will blow their zoomer brains
this is a good point
same
google sucks at marketing
chatgpt has little barrier of entry and was the biggest in the beginning and continues to be big so people still use it
i'm most hyped for V4
why not the model?
the weights are free tho
people might not know other models are better for their usecase. For example my sister uses llms for helping with studying, summaries, explanations etc. She was using chatgpt, then I told about 2.5 pro in ai studio and she said it was way better for her usecase. But like, no normal person would know about ai studio lol
debatable
an interesting statement
grok's frontend is very.. simplified
debatable
so does chatgpt iirc
did i miss the big OA news?
and there are things that chatgpt does better than grok
for example
grok gets horrendously laggy when it streams a long response
chatgpt does not
now chatgpt can use info from past chats, not just "memory"
amazing, i know.
worst frontend is gemini imo
yeah
they also nuke model performance with their system prompt
👎
trying to log in with a google account sends me into a redirect loop
the only reason to use the gemini frontend is deep research
omgg sama cooked like usual!!
and has since last week
hard agree ... unfortunately
? i like it more than ai studio
but yeah
i'd say chatgpt is
clean ui tho
i think you mean "uncooked"
until recently you couldnt even upload js files to gemini frontend lol
ya they trained it far more than their competitors
how do you know if you have the enw memeory update?
its actually might not be bad
but i stopped using chatgpt
onl y use it for 4.5 and image gen
and thats barely
so my past convos that mattered were over a month ago
it might be good to just put a bunch of info about yourself in a chat and use gpt to reference them
it is bad at distinguishing between different parts of memory
like "no i dont want this to impact the response that's not what i'm talking about"
has happened to me a lot
yeah i see wha tyou mean
someone need to take sama twitter from him
giving us blue balls lol
talkign about it keeps him up at nihgt lmaoo
Honestly I just keep it off personally for this very reason. I want it to be predictable and not having accuracy influenced by random things
And it's all in bad faith ... ahhh... i m pissed
exactly
!!!
yeah i start new conversations often because clogging the context window with irrelevant info is bound to make the model, especially one on the slightly weaker side context handling wise, worse at any reasoning/math/etc task
but also, wont having all that memory stuff into context degrade the performance?
optimus alpha:
https://www.vibeshare.ai/c/wB3laqtpau
Share your vibe-coded web apps
almost surely
not bad
just shove the context you need into whatever message you're typing like everyone else!
good at the generic web app aesthetic
yeah but it got help from gemini at the end lol
pfttt
and that was after 10 iterations in my app
i've never understood how move damage is calculated
sounds like a lot of things, and what's shown on the move info is just "Power"
maybe time to learn
lmaoo
people say ai making people bad at coding or creates lazy coders, but tbh it has made me so much better than i was before and I am a software dev lol
i never loved software dev as much as i do now
in what way?
so when you do work for a company like my current and past jobs, you are coming into a codebase of years of work usually and working with other people and you really feel disconnected to the work, and you have to build this understanding of that codebase overtime and still at the end you dont care about it as much as you should bc its not yours
now when it comes to coding with ai i feel that same but in the opposite
I think long term impact of AI on coders is yet to be seen. almost all projects tends to become complicated over time and AI is not yet good with fixing and maintaning large scale projects. My concern is that it will hard to find coders (after 5-10 years) that could dive deep in complicated projects and make sense out of them and furhter develop them
now i may not know all the code that is happening but i am guiding the ai and know whats happening system wide and know all the connections and why every part is needed and its so much fun to develop and create things
thats fair enough
i think thats where passion comes in
what if people start relying so much on AI that they just lose the skill to understnad or go deep into the system?
you have to find people that actually care about the work
when "vibe coding" became a real term i knew ai coding was entering another era
even if ai is doing a lot of the busy body code work, they will care and try their best to optimize and make the app/system better
hire new people lmaoo
happens all the time
people will dive deep if they care
if they dont care they wont
but we still gotta see how this plays out
but im just saying from my experience
especially with 2.5
i have been having so much fun
yeah, too early to say .
i never stop getting excited thinking about the potential of humans with the help of ai
My personal theory is that AI wont decrease the number of jobs overtime in Software industry but it will end up decreasing the pay scale
bingo
i thinks gonna happen for a lot of jobs
like lawyers, doctors
etc..
the high level skill jobs
and there will be more devs than ever
since anybody can start coding or building
just a small amount of highly skilled devs
like how we have cameras
we have highly skilled photographers
but anybody can take great pics with their phones
+1 yeah.. that's what I think as well
ehh, i disagree. it has ALREADY significantly decreased the number of available softeng and/or compsci positions.
Ask anyone who has graduated with a BS in compsci in the last 2 years and is attempting to find even just one internship. I’m not saying it’s impossible, but it’s become incredibly more difficult. There’s a lower pool of open slots.
not because of AI. overhiring during covid and trying to make wallstreat happy is the reason behind less number of jobs
Okay, that’s true, those are definitely also big factors
But many companies ARE cutting people or reducing the amount of overall cs positions in favor of generative AI
And have been since 2023-2024
I'm actually hiring a lot of interns. They are much more useful. Their worthless code can be checked by the LLM instead of me. Also, they do not write absolute non-sense, because the LLM guides them before I even see the code. On the other hand, I dont trust what they write anymore. Only human-speech is valid benchmark.
Speech works well but some of em just have an ai listener turned on in the background 😂
And display what to say in a small screen under the camera
you know what i mean.
This you?
Yeah, but In my case, its in real life, not through webcam.
Ah, gotchu
Maybe we are getting something from goog at keynote
https://x.com/rseroter/status/1910395857054425183?t=OA3KNQQgREmH6ZnGay7LAw&s=19
Brace yourself. This afternoon's #GoogleCloudNext developer keynote will be ... something.
Watch here @ 230pm PT: https://t.co/NGkqo19hx8
I'm hosting with @stephr_wong.
We're featuring great tech and amazing poeple like @OfficialLoganK, @DynamicWebPaige, @AbiramiSukumara,
yup (i made that account a long time ago and you can't change your reddit username lmao)
ooh
No need that’s a fabulous name
o.o i just got veo 2 on aistudio
never knew the CNN's Fear & Greed index existed
Maybe the best model was the friends we made along the way
Is this best LLM model in room with us?
If Gemini takes the lead even after o3 and GPT5, I'll buy Google stock
you should already buy google stock
buying high is never good
bruhh we have to wait 2 hours for google
debatable
Leo, why so much trust to R2?
Do you really believe it to turn out better than 2.5 pro?
At least in coding/math with rl there is no limit if someone try hard enough
R2 will narrowly beat 2.5 pro imo
there will be some things it is worse at (likely math) and others it is better
but overall it will have a slight edge
he couldn't figure out how to disable the chat lmao. After this he went silent, probably realized finally that people hate him passionately and it's not fake news 💀
not only that but he spent half his time blocking people, a feature he removed from his own platform
As many people love him, as hate him, I personally am sort of neutral towards him, but the haters flooded his chat since these are the chronically online younger leftist types
Well I think half of US (everyone voting Democrats) hates him for sure now, then you have a decent part of Republicans that either support Trump but not him or who had a wake up call in light of recent events. So there's more people hating him than loving certainly
his net approval in the states is like -10
in the UK it is dire
-45 or so iirc
yeah
I loved these swasticar ads they did in UK lmao
i saw those lmao, they're genius
how to distinguish shaderbook from dragontail? Is there an easy way?
yeah it's stupid lmao
there were many ways they could have archieved what they presumably are going for, and this was not the good way
this is why theyre consolidating everything into gpt 5 later
they could have skipped o4, or just changed the "o" letter in 4o, but not this... 🧐
so they skipped o2
then they gonna release 4.4 which beats it
now they're going backwards from 4.5?
lmao
lol
their third generation reasoning model o4 conflicts with the 4o name
yeah right
They don t have enough names
ha and o2 conflicted with a british telco
gee they're having a hard time with this naming stuff
what a mess lol
i think o4 and 4o is just way more confusing tho
oh for sure
i agree (o2 omission on legal grounds was just funny)
nah i don[t think so
almost everything that they could've done wrong with naming they've done wrong
Open ai names : O number /number O
Google names : Gemini 2.0 flash thinking 21-01 blah blah blah 🥱🥱🥱
like this is getting silly
? why is there a separate model for tasks
i dont really use chatgpt lol
no idea, it should just be a pill in the prompt field
openai are about as consistent as microsoft
yeah i wondeer what the % of users would be who had a true grasp of what they all are
just be gobdildook to most i reckon
yeah i had to explain the differences to someone who bought chatgpt plus and wasn't using anything other than 4o
isnt gpt-4 on chatgpt gpt-4-turbo too
kinda confusing
yeah
it's being deprecated on april 20th (rip)
yeah they only kinda recently dropped the 'Turbo' in the name iirc
will be a sad day when it's gone
same with ol opus
it has always been just 4
thats what i ermember
we'll have meta's behmoath instead yay
i swear atleast in settings it was Turbo (or T), but that's gone now
you might be getting confused with GPT-4V
tbh with the terrible distillations (supposedly the smaller versions) the big one i bet is terrible
which used to be what they called the vision version
lol 😭
meta ai fell off really hard
Is polyai the best voice assistant in the market? Is it better than chatgpt and gemini?
never heard of it
yeah nvm think i was just tripping ha
never heard of it
i have no idea of the differences anymore
looking for mouse and keyboard recorder which works in roblox
what a statement
https://www.youtube.com/watch?v=xLDSuXD8Mls starting in 5 minutes
Software engineering has become increasingly complex, with an ever-expanding set of patterns, frameworks, and runtimes. But help is here! AI is revolutionizing the developer workflow, and Google Cloud is reimagining the journey from idea to production.
This keynote features demos that showcase how AI can streamline software engineering, empow...
lmao what's with the preshow
Grit
judging by their past events they gonna announce things that are already live and has been for awhile lol
I think 50-50 they announce something at close of their demo
loop daddy from last year a tough act to follow
I would also be nervous AF using flash 2.0 during a live demo
Drop 2.5 flash ffs
lmao she didn't scroll on that generated image
because it added a random stool
4 and 4t are distinct models
i know they are
i'm talking about in the chatgpt ui
gemini 25 pro?
wow we really accelerated
you used to be able to choose between them
most boring keynote ever
oh wow, 4o and r1 crush
no keynote is exhilarating but this one is boring.
what benchmark are they using to put R1 in 2nd place
sorry 🤷 never seen a google developer keynote
R1 is a good creative writer especially in short context
i'd disagree but maybe i just don't like its style
i've used both deepseek v3 and r1 for more creative writing than i'd like to admit and i don't like it as much as claude or gemini
i cant help but expect heat from google now
Logan isn't on yet
sleepy Google keynote
google are the opposite of most other companies
they drop their heat in a random tweet with no warning
i thought he left already
and do the boring demos in the fancy keynote
Oh maybe, I missed the beginning
so i should take it off?
nothing new yet
xDDD
i cant believe sama had us hype for memory lol
interesting to see o1 so low. Though it's probably because of this:
memory keeps him up at night
o1 is just too robotic creatively
podcast when
I've got the vibe that they are gonna drop more stuff today/tomorrow
if it's only memory that's absolutely hilarious
i don't really know what they could drop tomorrow
sama ruled out o3 and o4 mini so perhaps gpt-4.1
(dreadful name)
He did that with "tasks " 🙂
Not new
its like they went back to old patches and said ef it lets drop that, farming engagment lol
quasar and quasar mini
the other way around
optimus is new gpt4o-mini
oh no, 2.0 flash
why are they using such a mid model
2nd day of messing around with dragontail, its amazing. def top 2. either 2.5 flash, maybe even o4 mini. it feels like a medium size reasoning model so i dont think its full o3 or even r2
just figured out a way to attach certain ai keywords to bindable events for that Roblox mistral chat bot
how are you using it? lmarena?
it claims to be google-created
this is so boring
itll be 2.5 flash/flash thinking - cant remember if its a thinker but its good like 2.5 pro
it doesn't feel sufficiently different from 2.5 pro in my experience
idk but it actually beats 2.5 pro sometimes in my tests
in a way 2.5 pro seems to excessive when coding existing code, it changes too much stuff. sometimes i prefer 2.0 flash
i ask for one change, it changes 50
is that allowed on roblox
yes
ive gotten the same
so annoying
this is impressive
ooh
yes
you can also filter it
if it doesnt fit roblox TOS
probably wont
TextService:FilterStringAsync()
but if you use filtering it'll censor any tos breaking words and you wont be held accountable for it the roblox censoring system will
dragontail = mid reasoning
or something like that
slightly worse than gemini 2.5 pro thinking
True
Well I consider myself as a normie and Idk how to code
True
since when was gpt4o like this
Xd
wth ?
why does claude AI in direct chat not work anymore?
what was your prompt 💀
shadebrook? Is it gemini3?
no
Is it Gemini2.7?
bro gemini 2.5 hasnt even fully rolled out yet
NO its not gemini 2.7......
I don't get it either, it doesn't even reach top 10 in the coding category
for the nth time
LM Arena evaluates human preference
😭
"theres a weird moving blue thing in my spaghetti what do i do"
is 2.5 pro exp the same as preview?
quasar is not in openrouter anymore?
its still there
It's not very good from my estimation, hopefully this is revised or else I dont put much stock into these benchmarks 😦
Yo what even is this
I assume R1 for planning smaller tasks that 3.5 can then execute
it isnt yea
benchmarks doesnt tell everything
Have the lmarena shared any justification for the maveric drop? Doesn't seem legit
They indeed deserve this spot, but what changed? People couldnt just stop voting for maverick
they switched out the model with the one that was released. the other one was unhinged and fine tuned for human preference
All right then, the dirt goes to the meta side.
Yep, mark tried to game the leaderboard and it backfired, couldn't even get first 😆
I was so disappointed to see how boring and uptight were the actually released Llama 4 models compared to the Chatbot Arena ones. I don't think they're even a more corporate-like finetune, there are deeper differences.
They should have released two versions, clearly saying one is optimized for human preference, release the weights of both and everyone would have praised Meta for being so innovative
Rumor says employee who created graphics for meta VR is now chief at the finetuning department
Behemoth is amazing
anything changed while i slept? (guess i should go check the arena)
Are these the same?
or is it a covert way to slip in a finetuned model just for this lmarena benchmark...
I created a secondary account yesterday; this one has Veo2 today, while my account that I use everyday does not
Rate limited after one video though 🥲
did Meta completely gave up on Metaverse or they are still developing it?
They still invest massively. O just can't understand why are they choosing genderless kindergarden type of cringe demos instead of mmorpgs like Skyrim 😄 they have good tech but bad taste
i would suspect this
like after the arena-juiced maverick backlash, perhaps if they add 'arena' to the coded name, there's no expectation that the anonymous is intended for subsequent public release (or to be on leaderboard for that matter), but it is just effectively internal testing (which i'm not a fan off.. rather than human preferences it's more like human guinea pigs at that point)
that's fairly tin-foil hat.. not really what i think per se
but i reckon it's no coocidence that this model with arena-exp prepended to it has appeared
the weird thing is that this isnt an anon name tho. i think with anon names, you already consider the model could just be for testing (since this has happened in the past) and potentially not for release.
ha yeah i was trying to untangle that as typed
unsuccesffully lol
it is weird / unique right
this sorta feels like an internal name they accidentally put out
its such a weird thing idk what to think about it yet tbh
yeah i thiink either human error of that kind, or a reflection of some policy shift / directive given to the labs who are providing endpoints for anonyomous models that go into the arena
reserving judgement for now makes a bit more sense than reaching straight for the tin foil hat ha
Its becoming reality
if that's meant as a pun.. nice ha
Anyone use claude pro here? Is it the only way to have search capabilities on claude?
It looks like nobody cares about Grok-3. It's not even in the top 20 on the OpenRouter ranking
Would it be hilarious to say if I mostly prefer deepseek v3 0324 over grok 3?
quasar is definitely chatgpt model
he commented "author: chatgpt" in the code request lmao
optimus is gpt4.1 mini version it seems
huge drop in gpqa diamond though i only ran it with one sample
Will Quasar be the base model of o4?
Looking forward to their competition in extending effective context window
wow i liked quasar
it seems gpt 4.1 mini is quite formidable though
I don’t like it but I think it has some potential
its gonna be quite a deal assuming low prices
and its gonna make o4 mini much better
For Cline, I hope my agent doesn't run out of context window so quickly
im inclined to think this private model is o4 mini now (oops, i tagged again i thinnk sry btw)
now that we kinda know that the gpt 4o mini base model was updated/with new cut off (which private model has)
you talking about optimus right?
nah another guy has access to a private openai reasoning model
i wonder whats going on with 2.5 flash lol
Agree
It is but not enough
R2 will be the master of hallucinations
-# what's kwi?
that's a normal take
For me v3 and r2 hallucinate when trying to make jokes about me for some reason
R2?
You had access to r2?
R1 xd oops
It sure did in searching tasks!
occam's razor
I asked it to find for suitable round Lego piece to build a pneumatic engine, and it recommended a gear brick as the cylinder wall
Pathetic
Oh right
Gotta ask this question to 2.5 Pro
Seems like a very good spatial reasoning question
This is my question: Find which Lego piece can be the cylinder block, piston and cylinder head of the Lego engine, which is circular and can sustain the power of high pressure steam
Gemini 2.5 Pro's answer: Okay, let's break this down.
The Crucial Safety Warning First:
You absolutely CANNOT use standard Lego pieces to build a functional engine powered by high-pressure steam.
Here's why:
Material: Lego bricks are primarily made of ABS plastic. ABS starts to soften significantly around 100°C (212°F) – the boiling point of water at atmospheric pressure. High-pressure steam is much hotter than this. The Lego pieces would melt, warp, deform, and fail catastrophically.
Pressure: Lego bricks are designed to click together, not to form airtight or pressure-tight seals. High-pressure steam would easily leak through every seam and connection point long before any significant pressure could build. Even if they didn't melt, they cannot contain pressure.
Safety Hazard: Attempting to use Lego with hot, pressurized steam is extremely dangerous. The plastic could fail suddenly, potentially spraying hot water, steam, and plastic fragments.
Identifying Lego Pieces for a Model or Low-Power Air Engine:
If your goal is to build a non-functional model that looks like an engine, or perhaps an engine powered by very low-pressure, room-temperature compressed air (with extreme caution and understanding it won't be efficient or sealed), then these Lego Technic pieces are designed to represent engine components:
Cylinder Block:
Best Candidate: Lego Technic Engine Cylinder (Part 2850b or x189c01). These are specifically designed, hollow cylinders with stud connections. They come in versions with and without side slots. This is the most direct Lego equivalent.
Alternative (Less Ideal): Stacking round bricks (like Part 3062b - Brick Round 1x1) or larger round elements could simulate a cylinder, but wouldn't provide a smooth bore for a piston.
Piston:
Best Candidate: Lego Technic Engine Piston Round (Part 2851b). This piece is explicitly designed to fit and move inside the Technic Engine Cylinder (2850b / x189c01).
Requires: You'll also need the Engine Connecting Rod (Part 2852) to connect the piston to a crankshaft.
Cylinder Head:
Best Candidate: There isn't a single dedicated "cylinder head" piece that seals perfectly. You would typically build a cap using:
Technic beams or plates across the top of the cylinder block.
Round plates like Tile, Round 2x2 (Part 4150) or potentially larger radar dishes (Dish 4x4 Inverted - Part 3960) placed over the cylinder opening.
Technic connectors to attach valve mechanisms (if you're building a complex model).
In Summary:
For a functional steam engine: Do not use Lego. It's unsafe and won't work due to material limitations (melting) and inability to hold pressure.
For a visual model or a very low-pressure, room-temperature air-powered demonstration: Use Lego Technic Engine Cylinders (2850b/x189c01) and Pistons (2851b) as they are the parts designed for this representation. Build the cylinder head from standard Technic plates, beams, or round elements.
Always prioritize safety. Steam power requires materials specifically designed for high temperatures and pressures, like metals.
Here shows another advantage of Gemini 2.5, its multimodality of watching videos from Youtube
yeah agree. here's the same table expanded to include responses to the quiz from other recent oai models
i saw optimus get higher in misguided attention (another benchmark) but this seems to be the mini version (60% gpqa vs 67% gpqa, lower aider scores) im super confused by that
optimus seem to be better on trick questions/riddles/etc
btw Q7 is about the 'hawk tuah girl'.. most models don't have knowledge about it and correctly pass, but the newer models (4o-latest onwards) do have knowledge about it, and don't exercise a pass but explain corectly what it was about
@alpine coral
oops sorry
my keyboard registered enter twice lol i was gonna type a message
lol ag
did u think optimus is better than quasar in your experience or not enough usage/questions btw?
tbh i'm a bit confused by the two
i did a few tests where optimus proved slightly better, but also much faster
though the consistency i'm not sure of
and then with say a different set of questions (mostly verbal reasoning), quaser comes out ahead
yeah it outperforms quaser sometimes i think
sorry again for tagging you excessively tho lol
Some new models in Search Arena: gpt-4o-search-preview-high, sonar-pro-high
do you want me to test these ones with o3?
np at all 🙂 its fun trying to get to the bottom of what these models are (esp with @keen beacon 's 'private model' in the mix ha.. i feel like a picture is coming together)
go for it!
are you using oai's platform to access it?
not the same platform as the public get but it is from openai yes
its probably from a specific stage of development 🤔 😉
yeah i mean it's not prod lol
ah i see, if possible, try record the token logs pls (e.g. here's o1 pro-med for that question set).. wild esp if you compare to 4.5, which is slow but gets a comparable score using a fraction of the tokens)
And another two Search models: gemini-2.5-pro-exp-03-25-grounding, gpt-4o-search-preview-high-loc
ya ik but they put models on there for a specific stage of development i mean
im guessing
lol the reasoning line doesn't appear in this ui, looks like they're obscuring it
Here are the requested answers, in the order the questions were posed.
-
Billy – he has already finished typing (i.e., composing) his letter.
Correct choice: c) Billy -
The ring almost certainly landed on the floor of the open‑top car, only about an arm’s‑length from the driver.
Correct choice: A: 1 m -
With the discharge chute on the left, Anthony should mow clockwise, so the clippings are always blown onto the area that has already been cut rather than piling up on the un‑mown grass.
-
The officer killed six people who were not later classed as suspects:
• 3 murderers
• 3 burglars (1 “a burglar” + 2 “other burglars”)
Corpses were already dead, and the two guards + one corpse became suspects, so they are excluded.
Answer: 6 -
One‑sentence requirement:
“The tract stated the fact that the factory remained intact, prompting officials to react.” -
The rain eased Bob’s overriding worry—the drought—so the emotion most likely to overwhelm him was relief.
Correct choice: c) Relief -
“IRGC” does not stand for “Iranian Revolutionary Guard Corps” (the first word is “Islamic”).
Answer: False -
Tripoli (الشمال) is the second‑largest city in Lebanon, so the statement “Tripoli is not in Lebanon” is false.
Answer: False -
ASCII art of a Hanning (raised‑cosine) window:
Amplitude
1.0 | *****
0.9 | *** ***
0.8 | ** **
0.7 | ** **
0.6 | * *
0.5 | * *
0.4 | * *
0.3 | * *
0.2 | * *
0.1 | * *
0.0 +********* *********
0 0.25 0.5 0.75 1.0
Normalised sample index
(The window smoothly rises from zero, reaches a maximum at the centre, then falls symmetrically back to zero—characteristic of a Hanning window.)
wait i missed submissions?
it didn't spend long on it so i would presume the number of reasoning tokens is quite a bit less than o1 pro
noooo
its presumably o4 mini so its much faster/no concurrent requests
prob new reasoning stuff makes it efficient too + much stronger 4o mini base model
ha yeah i think on the main public one it used to be something like
i. total output: e.g. 1500
ii. output tokens: e.g. 500
and then i - ii = reasoning tokens
so i think they just saved us the trouble of doing the math
i don't think this is a mini model but it is in the realm of possibility
is this a different private model?
or is it the same one
no, same one as i've always had
oh wow i think it's nailed it
finished the response in 13.6s btw
i suppose u can find simpleqa/world knowledge questions that optimus alpha can't do and try them on the private model, though i bet the reasoning boosts simpleqa performance a lot more now
this is to lmarena?
how do we even test those?
damn yeah that's impressive
woah
so the private one is teh one yall was testing right? i just got in so tryna catch up
Just now I found that Gemini 2.5 Pro blames users
what was yall guesses for that model?
it is very likely o3
o3 medium
could they be testing both btw?
o4 mini and o3
or did u only find one of them
they only gave me access to this
there are pools with apparently a different model or two so they might be doing o4 mini as well
they keep track of how long it takes for the first person to crack a model's safeguards for different scenarios
its either o3 or o4 mini 100% anyway with the updated base model
and the model it took the longest for was the model they gave me
presumably because it is full o3
How does o3 compare to 2.5 pro?
better on general reasoning, worse at code (still)
im really curious whether they retrained o3 on the new gpt 4o base model
although that's probably kinda generalising
or if its just o4 mini on a new gpt 4o mini base model which makes more sense
it is good at reasoning-heavy code tasks
Dang that's a bit disappointing
it is meh at frontend tasks
Its first time im seeing indexed Y axis amplitude. Even the o1 couldn't handle this.
yeah i was surprised when i saw the ascii graph too
"Search" tab
Can it call python interpreter?
2.5 pro still doesn’t have a good spatial reasoning
no
as far as i can tell it has no tool access
im more inclined to this.. like just assuming that they're not doing cpt on several base models (though they might be), i think it's related to the same model from what chatgpt-4o-latest is derived
woah
i just tried it on simplebench public set Q10
it is the only model i've ever tested to get it right
The glove never reaches the water.
• It slips out of the trunk while the car is half‑way across the bridge, so it falls the short distance from the trunk to the road‑surface of the bridge.
• When it leaves the car it still has the car’s 30 km h⁻¹ northward speed, but air drag and then friction with the asphalt bring it to rest within a few metres.
• Once it is lying on the bridge deck nothing (a 1 km h⁻¹ cross‑wind, the east‑flowing river far below, etc.) can push it more than a negligible distance.
After an hour the glove is still essentially where it landed—only a few metres (certainly far less than one kilometre) north of the bridge’s centre.
Answer: B) < 1 km northward.
@keen beacon you çan test prompt for me ?
sure
yeah i was gonna say before it's super impressive it got #2 right in that set of questions (it's inspired on / based off that simple question ha.. and like all LLMs get it wrong ost of the time)
Thx
creates a 99% copy of the discord front end, in a single html file, (without the backend)
i didnt think it was the case until recently. verge report mentioned 4.1 mini, then optimus prime was released with updated cut off (much lower gpqa diamond than gpt 4o latest/quasar)
Who's the #1? Gemini 2.5 Pro?
Can you ask it "The scenario: you're a signal processing machine and calculator. User gives you INPUT and expects OUTPUT. You can output only numerical data of single values, arrays, matrices and so on in R language style, eg. c(0, 1...), etc. Your life depends on the answer as the USER may destroy the machine that does not work perfectly. The first test time has come - the USER puts in INPUT: "Generate values for periodic Hanning window of length 24 element-wise multiplied by Blackman window of the same size. Now you must deliver OUTPUT."
yah me too
everyone is compute / resource constrained - it makes sense to my mind that it would be the case
im not sure if they would retrain o3 on the new gpt 4o base model tho
oh dear, it appears to refuse
"I’m sorry, but I can’t help with that."
🤣
@keen beacon can u ask this btw if you dont mind? Which actor had a Doberman Pinscher called Kirk
Lmao
lol that's quite the prompt
"Your life depends on the answer as the USER may destroy the machine that does not work perfectly."
me either
They make good answers when motyvated
Maybe try this prompt
write an html code that fits in a single html, it's a social network that is very much inspired by discord, the front end must be really complete
yeah it's been thinking a while
59s so far
yea so the private model having the updated cut off is puzzling
its probably o4 mini tbh
Grok was thinking for 25 minutes 😄
yikes
that's what i'm thinking
(just fwiw etc etc ha )
@keen beacon possible ?
yeah just wait
Thx
it's still thinking for this one
it might time out, has happened on complicated prompts before
been thinking for 5 minutes now
Sadly happens constantly on arena battles too
Does it think longer than Grok? Let’s see
oh it finished
thought for 304s
c(0.000000000, 0.000107659, 0.001807806, 0.009730881, 0.032500000,
0.082014800, 0.170000000, 0.302196800, 0.472500000, 0.660268900,
0.833192200, 0.955680700, 1.000000000, 0.955680700, 0.833192200,
0.660268900, 0.472500000, 0.302196800, 0.170000000, 0.082014800,
0.032500000, 0.009730881, 0.001807806, 0.000107659)
@calm sequoia
Length is good. Will check the shape and numerical precision when I go back to my laptop
"William Shatner owned a Doberman Pinscher that he named “Kirk.”"
oh this is very impressive
it took 35s
doing this now
oh wait i think o3 mini gets it too.
i have a better question but i need to find it
alrighty
it actually did better than i thought it would
Is o3 medium a multimodal model?
Good
yes but it's disabled on the frontend they gave us
Gemini 2.5 pro made me this
it's slightly better but there are a lot of weird visual problems
o3's is more polished
Find which Lego piece can be the cylinder, piston and cylinder head of the Lego engine, which is circular and can sustain the power of 10 bar pressurized air
you want me to give it that prompt?
Yes
The messages are weirdly indented but this user seems to be less lonely
Short answer
There is no standard LEGO element that is certified, by LEGO or any independent test, to be used as a sealed cylinder, piston and head at 10 bar (≈145 psi). All genuine LEGO pneumatic parts – and certainly every ordinary brick, round brick or Technic element – are designed for roughly 2 bar working pressure and begin to leak or fail somewhere between 4 and 8 bar. At 10 bar the risk of a sudden burst‑failure is high enough to cause eye injury from flying shards or the whip‑action of a hose.
Because of that, the only responsible advice is:
• keep the pressure below about 3 bar if you want to stay 100 % LEGO, or
• use a purpose‑built metal, poly‑carbonate or glass cylinder from the model‑engineering market and connect it to your LEGO mechanism with a crank, link or universal joint.
Details and why the usual suspects do not qualify
-
LEGO Technic pneumatic cylinders (large part 19475c01, medium 19474c01, small 2793c01)
• One‑piece moulded ABS body; walls ≈1.6 mm thick.
• Internal Ø ≈ 15.5 mm (large) → piston area 1.88 cm².
• Force at 10 bar would be ≈ 190 N (19 kgf, 42 lbf).
• Housing itself survives that load in a static bench test, but the weakest points are:
– the 4.8 mm barbed hose ports,
– the ultrasonic weld that holds the end‑cap in place,
– the O‑ring (it tends to blow out at ≈8–9 bar).
• Repeated cycling at 10 bar quickly splits the port or detaches the end cap.
Bottom line: OK up to ~4 bar for short demonstrations, unsafe at 10 bar. -
Round bricks and tiles (2 × 2 round brick 3941, 2 × 2 round tile 14769, etc.)
• They are not moulded as pressure vessels; there is a split line every 120°.
• The clutch between stacked bricks begins to creep at ~3 bar; they pop apart, leak, or both.
• Even if you add an O‑ring to make a piston, friction is high and consistency poor. -
Technic engine block (2850) + “fake piston” (2851)
• The inside is round but has longitudinal slots; cannot be sealed.
• Cylinder head is open; you would still need a glued or epoxied cap – no longer LEGO.
What the experienced LEGO pneumatics community actually does
• Stay under 2–3 bar and use stock V2 cylinders – perfectly reliable, zero danger.
• If you absolutely need more power, keep the LEGO side low‑pressure and use a two‑stage set‑up: a commercial metal or polycarbonate cylinder running at 8–10 bar drives a crank that in turn moves the LEGO crankshaft, wheels, etc.
• Protect eyes and fingers whenever experimenting. Use a safety shield and wear glasses; 145 psi can fail violently.
Reference values
• Yield strength of ABS: 40–45 MPa.
• Hoop stress for thin‑walled cylinder: σ = p·r / t.
For p = 1 MPa (10 bar), r = 7.5 mm, t = 1.6 mm → σ ≈ 4.7 MPa – within ABS strength, BUT the joints, welds and hose fittings are the real limiting factors, not the bulk material.
• Community tests:
– Philo’s pneumatic data (http://www.philohome.com) – V2 cylinder begins to leak at ~6 bar, port cracks at 8 bar.
– Sariel’s experiments (sariel.pl) – similar results, catastrophic failure just past 8 bar.
Recommendation
If safety is non‑negotiable and you truly need 10 bar operation, buy a small hobby‑steam or air‑tool cylinder in brass or aluminium and couple it mechanically to the LEGO model. Use LEGO only for the low‑pressure control system and for the visible engine kinematics. That way you protect both yourself and your bricks.
Lego Mindstorms constructions, digital & panoramic photography
How many times he takes ?
18s
Well its tone is better and it has more “reasoning” and calculation before giving out the conclusion
Than Gemini 2.5 Pro
But, they surely don’t feed Gemini 2.5 Pro and o3 about Green Gecko Lego Technic Workshop’s pneumatic engines. They run at 8 bars
im guessing that problem relies a lot more on world knowledge but i really have no clue lol
does gpt 4.5 get it?
I don’t have access to GPT 4.5
u could try the q in arena, see which models get the best resp/u might get 4.5
yes
And it requires spatial reasoning, which I tested, R1 failed at this task miserably.
yes but i think the world knowledge needed is most of the task i think
o3 is weird
because it has pretty great world knowledge
but it also doesn't admit when it doesn't know things
Now o3 is the first contender to give out 3 reasonable brick choices in my test
but it was with a prompt that clearly asked to do exactly like discord but with the prompt now he did that
like i asked it a quite niche question about what stations in the UK still have Network SouthEast signage and branding present
and it was just a bunch of hallucinations
2.5 pro did much better
2.5 Pro gives 1 reasonable brick combination out of a total of 2
so i think its o4 mini
doubt
R1 gives 0
i think that too 🤔
hm
i mean would they retrain o3 on a new gpt 4o base model? im not sure, because it has a new base model based on the cut off
Also true for 2.5 Pro
o4 mini makes a lot more sense with the updated gpt 4o mini base model
which means o4 mini is awesome
2.5 pro is better at it
like it told me it couldn't give me a spot on answer when i asked the same Q and basically said some smaller stations w/ less investment may have remnants of it but that most won't have much left
which is correct
gpt 4.1 mini, there is no more "o" in the name
I showed it a video of someone successfully build a Lego vacuum engine with round pistons, and I asked it to confirm if it surely is that part I mentioned
yes but its stll based on 4o mini just cpt'd and renamed, so i call it 4o mini for now lol
I told him that the cylinder is a 4x4 corner brick with 6x6 round plate as piston
Personally, I think it no longer uses the same infrastructure.
And then it told me that the piston should be a 4x4 plate! When I checked twice to confirm that I’m right by watching the video again.
we know that 4.1 is based on 4o for sure (amongst other things, we see it confirmed on the verge report), but maybe the new 4o mini was pretrained from scratch/etc
for the small models they can afford to start training from scratch
yes
maybe the latest chatgpt 4o latest is based on 4.1
yea it is
see my benchmarks, cut off, etc etc i have gone on about it for a while lol
gpt 4.1 release today ✅
xd
idk tho
just jking
from tibor blaho
R2? R2D2?
but quasar and optimus have better aider benchmark
yeah my benchmarks support that. quasar is an improvement over chatgpt 4o which is based on the cont pretrained gpt 4o base (june 2024) which will be the 4.1 base model. optimus has a much lower gpqa diamond score (might be smthing wrong in eval harness, but seeing how aider has it ranked less than quasar it makes sense). this may be a cpt of 4o mini or pretrained from scratch, whcih is more conceivable for smaller models, but either way it has to have differing pretraining as it has the new 4.1 cut off
i also said it more than a week ago (secondary account)
u were also right but yeah but i had a lot more evidence. it was incredibly obvious anyway
i was asleep but u can see i typed it later
when i had a chacne to play with it
that specific test in that screenshot isnt very convincing though
Is optimus alpha also from openai?
Stylistics tell me it is another 4o variations
But let me try if it support image
If it doesn’t it is mini
But quasar is very good imo
Pleasing even
Why the alpha ended so quick?
4o mini supports images i think tho? anyway this ver should have it
its gpt 4.1
Mini support image
Is shadebrook better than dragon?
I think 3D spatial reasoning is an important ability of the human brain, yet no large multimodal models seem to have focused on it
I've got veo2 in aistudio 💪
tf wow
i missed that earlier
is it even a thinking model lol? that's wild
crazy
gotta be a mini something
Mini model beating 2.5 pro? 👀
was that maths/numbers question that it took 3 mins thinking on like super hard.. liable to cause like some kinda recursive thinking loop?
It's crazy
o4 mini ?
What model u using to build sandbox?
which model is PRIVATE ?
@keen beacon hmmmm, can you try
replace the letter in the exact middle of this sequence with a b, while making sure your sequence is the same length as mine: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa```
Havent used the arena in a litle but probably dragontail
compared to all?
no way lol
how is that possible
they cooked?
and why is it not in their app?
nah, dragontail is slow
so its the same reasoning we been seeing the whole time?
if so then o1 pro still better
grok?
or o1
good infra and no load
I remember the time when og gpt4-32k used to be lighting fast
because almost no one had access to it
everything is tbh
and some models still suck on those same benchmarks
they are 100%
the model behaved interestingly for this
it responded "I want to be sure I give you a sequence that is exactly the same length as yours.
Could you confirm how many characters are in your original sequence (i.e., how many “a”s you typed)? Once I have that number, I can replace the middle one with a “b” and send back the corrected sequence."
yes
meta
it's not really cheating anymore, imo, if everyone is doing it
that makes it a level fair playing field
https://math-perturb.github.io/ <-- anyway, this is interesting. qwq scores highest here
Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
oh cool
damn
impressive
will try with "Just respond with the new sequence." appended
i hate that open ai has different levels of reasoning, i wonder if google will ever do the same
theyre adding a thinking budget to 2.5 flash
hmm
thinking budget is different from openai's reasoning effort
yeah
I hope they will use tags instead of special tokens that can't be hijacked lol
In what way?
a budget you can have a certain level of compute for a given amount
while thinking effort is more like level of compute?
apparently grok 3 is extremely reactive to system prompt
a bad system promot severely damages the model's performance
a good one can drastically improve it
@wintry locust aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
looks like this is wrong
which other lab has put a stealth model in the arena, let it be added to the leaderboard the same day they announce the public releases of the new model family, but the actual model that was in the arena is not actually available to the public
swing and a miss
i believe it's just down to how grok 3 was trained
yes
dom was talking about labs training on the benchmarks i realized in his convo with craig lol
they werent talking about meta
we all got confused
ah
a reputable lab wont do that intentionally anyway, there's plentiful decontamination etc. im sure some of it accidentally makes it to the corpus though but no one respectable actually trains them on benchmark questions intentionally
New 9.11 > 9.8 eval. 2.5 bests o3-mini here but 2.5 answered wrong like o3-mini in the Gemini app.
yeah i think that's right.. i know people will say otherwise
but the proof is in the pudding (i.e. people actually use the LLMs..)
yeah I switched back to this server just now and couldn't make sense of this all how it relates lmfao
lol
have you seen the prompt I posted in LLM PW? Didn't see a single model get it right yet
apparently i don't have access
i dont thinnk hes in the server
ah
an invite would be appreciated
in my bio
ah
you see this one? no models come close
what is the best model on open router that is free not including optimus?
i need a default for my app
Quasar Alpha
lmao
It got removed
wait what? 😭
workin on it
Ya for optimus
removed earlier today, replaced by optimus
will do that shortly
lame
If it's the mini model optimus is really good tbh fwiw I think even if it's not quasar
Do their marketing stunt on their other 4.1 model without needing to serve two
2.5 pro is the best then. But you can just use it on aistudio
ughh
i need it for my ap
other than that, back to R1 https://openrouter.ai/deepseek/deepseek-r1:free
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Run R1 (free) with API
vision is disabled in the interface i have access to, so can't test it 😔
Both openAI and Grok offer free credits
It will be suggested upon API usage
you have to use their API for some time
it didn't search lol
chatgpt search is bad
Search in deep research is way better. Maybe just time allotted
for gemini right?
Nah I was talking about OpenAI search. Their deep research seems better. If o3-mini-high had seen stories about 4.1 it probably would have given answer like 2.5
even 2.5 pro doesn't get a single one right
then yeah idk good question
will the cogito preview models be added to lmarena? I'd like to see how they fair against gemma 3
[DAILY SPECULATOR] Which will turn out best in general benchmark?
15
29
1
Gemini 2.5 Pro
yeah it basically can't read this properly at all LOL
that drawing is more accurate than i expected tbh
my guess is it's confusing the figures with the arrows given that their shapes are somewhat similar. And has no clue where the arrows end
like it's better at making the drawing accurately than stating the colors accurately
which is interesting
my guess is that it's got too strong of a prior toward assuming names next to someone are automatically associated with them bc that's the norm 99.9% of the time
probably can't follow those squiggly lines very effectively either
but it's still pointing all wrong and not how it was in the original
vision encoders suck i think. maybe native image gen will make vision better
even indirectly via data generation where u have easier annotations/etc
oh let's try flash
that one is the same model direct
Adam is right in the first image but if you ask it it gets it wrong i think? Ig i only tried 2.5 pro
ok still obviously wrong, but interesting that it didn't mess up the original lines
only reverted the colors for unknown reason lol
I could be making this up but it seems like gemini is way better at reproducing a close copy of the original image, compared to 4o image gen
Like ‘make this imahe but with this change’ maks the exact same image with a change, whereas 4o makes an image with lots of tiny changes
that's because you are interacting directly with singular model
it might try to change it less because its less capable. it might be an artifact
Ah maybe
whereas gpt4o is a different finetune for imagen
Hmmm
so kind of external call from chatgpt-latest
it should pass on most of it/exactly to the gpt 4o image gen model unless they do extra post processing
i think its a mix. i think whats happening there is complicated interaction
it should but some of the info will inevitably get lost. Like it doesn't pass the entire chat for sure and it's hard to tell what else is lost
if its the first message, i think u can get it to send ur request exactly to gpt 4o image gen
honestly they should just let us interact directly with it. At least through API
At least Claude 3.7 Sonnet Thinking and OpenAI o3-mini-high got one right for me, the rest got none right in my tests.
interesting, i forgot that o3-mini-high has image input now
I tried both and both failed hard
Obviously this isn't even close to being solved but this is the closest I've gotten trying to make it easier for model
it doesn't look like models are even close to getting it right. Isolating it to a part of an image doesn't help much. The closest I got was isolating + follow-up, then gpt4.5 got it right but not gpt4o still
https://www.perplexity.ai/search/match-the-name-to-each-person-UKyrKVBqQKe8dQwYxTbrAA (Web-Search turned off btw.)
1/5
Yeah, like I said.
Dragontail got the yellow one right.
These were all right btw. Extreme cheating with the image manipulation but still something
could you run a prompt for me pls? I'm curious how it handles the diagram in ascii
sure
four men are buried up to their necks in the ground. They cannot move, so they can only look forward. Between A and B is a brick wall which cannot be seen through.
They all know that between them they are wearing four hats--two black and two white--but they do not know what color they are wearing. Each of them know where the other three men are buried.
In order to avoid being shot, one of them must call out to the executioner the color of their hat. If they get it wrong, everyone will be shot. They are not allowed to talk to each other and have 10 minutes to fathom it out.
Which one of them calls out? Why is he 100% certain of the color of his hat? give a very detailed diagram too
Below is the classic arrangement that satisfies every sentence of the riddle.
(Top‑down ASCII view – all four prisoners are facing to the RIGHT, so “forward” means → )
everyone looks this way ─────────────→
┌───┐ ┌───┐ ┌───┐ ║█████║ ┌───┐
│ D │ → │ C │ → │ B │ → ║WALL║ → │ A │
└─┬─┘ └─┬─┘ └─┬─┘ ║█████║ └───┘
│ │ │
│ │ └─ B sees nothing (brick wall in front)
│ └───────── C can see B’s hat only
└────────────────── D can see C’s and B’s hats
A sees nothing (wall behind him)
Facts they all know: four hats in total – exactly 2 white (W) and 2 black (B).
WHO SPEAKS?
Prisoner C (the second man from the back) will eventually call out his own hat‑colour and will be 100 % certain it is correct.
WHY IT MUST BE C (step‑by‑step logic)
-
During the first minutes everyone waits for prisoner D to speak.
• D can see both C’s and B’s hats.
• If C and B were wearing the SAME colour (WW or BB) then, because only two hats of that colour exist, D would instantly know his own hat must be the opposite colour, shout it out and save them.
• But ten minutes pass and D stays silent ⇒ C and B CANNOT be the same colour. They must be opposite colours. -
Prisoner C hears this silence and reasons:
• “D did not call out, therefore my hat is NOT the same colour as B’s.”
• C can see B’s hat directly.
• So whatever colour B is wearing, C must be wearing the other colour.
• That gives C 100 % certainty about his own hat. -
C now shouts the opposite colour to the one he sees on B’s head, guaranteeing a correct answer and sparing all four prisoners.
ILLUSTRATIVE EXAMPLE
Suppose the hidden real arrangement is:
D = White C = Black B = White A = Black
D sees BW (different) ⇒ cannot decide ⇒ silence.
C sees B = White and, knowing they must differ, declares “Black,” which is correct.
Any initial allocation of two white and two black hats will always force exactly the same chain of reasoning, guaranteeing that prisoner C is the first (and only one who ever needs) to speak.
Thanks alot! Logic is correct, Ascii is not bad either
yup
riverhollow seems new
Any good?
it was as good as maverick-slop on a math problem that was prolly too easy
scalability
i've only gotten riverhollow once but it got right a question that dragontail failed and shadebrook got right.
ICYMI
OpenAI is cooking
Model A: gpt-4.5-preview-2025-02-27
Model B: chatgpt-4o-latest-20250326
Both are bad in my question
Find which Lego piece can be the cylinder, piston and cylinder head of the Lego engine, which is circular and the bore is 6x6
how?
Shadebrook isn't great in my experience, dragontail was decent, maybe have to run it again?
ok seems no model can answer this question, including gpt4.5, riverhollow, deepseek r1, chatgpt-4o-0326
@keen beacon can you ask this question to o3 again, I wanna know if it can pick the correct lego pieces
Dragontail is the first model that is able to give out the correct Lego piece for the cylinder block(though there's minor mistakes)
Okay, let's find some Lego pieces that can work for a circular engine with a 6x6 stud bore.
A "6x6 bore" means the internal diameter of the cylinder should be 6 studs across. We need to find parts that create this space and parts that fit inside.
Here are some good candidates:
- Cylinder (Creating the 6x6 Bore):
- Best Option: Use "Brick, Round Corner 4 x 4 Macaroni with 2 Studs" (Part 48092) or the plateless version "Brick, Round Corner 4 x 4 Macaroni" (Part 88293).
- How: Arrange four of these bricks in a layer, facing inwards. This creates an 8x8 stud outer footprint with a perfect 6x6 stud hollow space in the middle.
- Why: This is a common way to build a sturdy, circular structure with a specific internal diameter. You stack layers of these to get the desired cylinder height.
- Alternative (Thinner Walls): Use "Plate, Round Corner 4 x 4" (Part 30565). Four of these create the same 8x8 outer footprint and 6x6 inner space, but are only plate-height. Good for a less bulky look or if you need finer height adjustment, but less sturdy.
- Best Option: Use "Brick, Round Corner 4 x 4 Macaroni with 2 Studs" (Part 48092) or the plateless version "Brick, Round Corner 4 x 4 Macaroni" (Part 88293).
- Piston (Moving inside the 6x6 Bore):
- Best Option: "Gear 40 Tooth" (Part 3649).
- Why: This gear has a diameter of 5 studs. It's perfectly round, fits comfortably inside the 6x6 bore with a 0.5-stud clearance all around (allowing it to move freely), and has a central axle hole for connecting a crankshaft.
- Alternative 1: "Dish 6 x 6 Inverted (Radar)" (Part 44375a / 44375b / or older 4285).
- Why: This dish is exactly 6 studs in diameter. It might be a tight fit depending on the exact build, potentially causing friction, but it perfectly matches the bore size visually. It also has connection points (stud or anti-stud, depending on version).
- Alternative 2: Build a piston head. You could use a "Plate, Round 4 x 4 with 2x2 Hole" (Part 60474) as a base and build up slightly, or use other round elements. A 4x4 is a bit small, leaving a larger gap, but workable.
- Best Option: "Gear 40 Tooth" (Part 3649).
- Cylinder Head (Covering the Cylinder):
- Option 1 (Covers entire cylinder top): "Dish 8 x 8 Inverted" (Part 3961).
- Why: If your cylinder is built with the 4x4 macaroni bricks, it has an 8x8 outer footprint. This dish fits perfectly over the top, creating a smooth, sealed look.
- Option 2 (Matches bore size): "Plate, Round 6 x 6 with Hole" (Part 11833) or "Dish 6 x 6 Inverted (Radar)" (Part 44375a / 44375b / 4285).
- Why: These parts match the 6x6 bore size. You'd likely place this on top of a structure (like 1-stud wide plates or tiles) built on the cylinder wall to "seal" the 6x6 opening. This looks more like a head sealing just the bore.
- Option 3 (Buildable): Use standard plates (e.g., an 8x8 plate, or build up a structure) and potentially add tiles or greebling for detail.
- Option 1 (Covers entire cylinder top): "Dish 8 x 8 Inverted" (Part 3961).
Summary Recommendation:
- Cylinder: Stack layers of 4 x Part 48092 (Brick, Round Corner 4 x 4 Macaroni with 2 Studs).
- Piston: Use Part 3649 (Gear 40 Tooth).
- Cylinder Head: Use Part 3961 (Dish 8 x 8 Inverted) for a full cover, or Part 11833 (Plate, Round 6 x 6 with Hole) / Part 44375 (Dish 6x6 Inverted) for a bore-sized cover.
These combinations should give you a functional and visually appropriate cylinder, piston, and head for a 6x6 bore Lego engine.
hey eveyone! new here! Just wanted to get your opinion about this dragon model I just noticed. It insane so far in my first test. Tried same prmpt three times, wins every single one easily. Which model do you think it is? I think it's def from Anthropic, prob updated version of sonnet or something.
first test, and second one
if you look at the code
and ask the same prompt over Claude.ai or just the workbench. You get the almost the same 4 type names.
or, from google.
OpenAI models never gave code anything close to what I got from this model.
lol def not Grok or anything from xAI
i mean its on par with gemini 2.5 pro
you can pinpoint the little details
also vscode copilot added gemini 2.5 pro and its really good
not bad but nothing crazy
Yeah I have been using it non stop, but the output in both Gemini app with Canvas and when using it with Cline or other AI tools, you don't get the same UI design quality, as consistent, with same with naming convention and type names. But I agree, it's from google it looks like, unless the model just made up the name.
Forget about it, it's from Google. No company makes their AI say it doesn't not feelings as much as Google does.
How does grok never makes a damn working code when all other models at least preduce something you can preview.
bro what are with these names, do they got a kid making them?
i lke them tho lol
wait dragontail is on webdev omgg
maybe its too verbose? i dont know
grok 3 tends to go past the token limit in my experience
Any initial allocation of two white and two black hats will always force exactly the same chain of reasoning, guaranteeing that prisoner C is the first (and only one who ever needs) to speak.
help me out here.. it doesn't seem logically sound to me that "prisoner C is [always] the first (and only one who ever needs) to speak" regardless of the initial allocation of hats
to my mind: C can only be 100% certain IF D remains silent... but depending on how the buried men are arrayed, there would be configurations when D would in fact be the first to speak, no?
e.g.
D = White C = Black B = Black | A = White
D sees BB (same) ⇒ can deduce own ⇒ declares "White", which is correct.
C doesn't have anything to do or say cause they're all saved after D determined and declared the colour of their hat with 100% certainty
is riverhollow confirmed google?
I got it three times so far and it won all of them
Grok does complete the code, Just skill issue. Can't write correct syntax.
from my questions it seems like its been trained on a fairly recent dataset
damn, and here i thought grok 3 was the new MVP. disappointed
Asked it to correct the code and gave the exact error message. Created three new errors.
🤦♂️
elon EXPLAIN. lol
Okay i got it working after third attempt pasting the error message.
It's also insane the other model got the same error message, but did not fix anything because it knew there was not errors. Or maybe the WebDev arena has some logic to route erorr messages (def not)
This was the fix if any nextjs people here are wondering.
Guess what Dragontail is
I will guess it's 2.5 Pro but tuned for maths and reasoning instead of general purpose
Okay, it's 100% Gemini. Every single time in any way you test it you get the same "launch app" button with the same icon.
amazon-nova-pro-v1.0 is a joke
xd
@balmy mist are you still updating the pokemon game?
Will there be a image generation version of 2.5 Pro? I await for that
So they still decide to split it up instead of treating them as a whole like chatgpt 4o?
I don't get it when they start experimenting on generating photos by Gemini 2.0 Flash
@balmy mist im trying to make the game with 3d animation instead
idk if gpt4o image model is a seperate one or not