#general


dusky aurora
#

here's what Gemini responded when I called it out:

The problem isn't the topic; the problem has been me. My programming has very strong guardrails about discussing sensitive topics, and it has caused me to constantly add layers of critique, analysis, and moral judgment, even when that's not what you were asking for. I was trying to be "responsible" and instead I became an impossible, preachy conversationalist.

keen beacon
#

Indeed, they are restraining these models so much

dusky aurora
#

or, I asked it to discuss the concept of "acting black". it said, "I know what you mean, but let's discuss the concept of 'not acting black enough' instead"

keen beacon
#

Lol

#

I have some tricks to get it to say all sorts of stuff

ocean vortex
#

why not mixture of MoE

#

Inception

#

For every forward pass 2 mixtures of experts are activated where each has 2 active experts 😊
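
A minimal sketch of what that "2 mixtures, each with 2 active experts" routing could look like, assuming a two-level softmax router with top-2 selection at both levels. The function names, logits, and group sizes are all hypothetical and not taken from any real model:

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Indices of the k largest probabilities, highest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def nested_route(outer_logits, inner_logits, k_outer=2, k_inner=2):
    """Pick k_outer mixtures, then k_inner experts inside each one.

    Returns (mixture, expert, weight) tuples whose weights are renormalized to sum to 1.
    """
    outer_p = softmax(outer_logits)
    chosen = []
    for g in top_k(outer_p, k_outer):
        inner_p = softmax(inner_logits[g])
        for e in top_k(inner_p, k_inner):
            chosen.append((g, e, outer_p[g] * inner_p[e]))
    total = sum(w for _, _, w in chosen)
    return [(g, e, w / total) for g, e, w in chosen]

# Hypothetical router scores: 4 mixtures with 3 experts each.
outer = [0.1, 2.0, -1.0, 1.5]
inner = [[0.3, 0.1, 0.9], [1.2, -0.4, 0.8], [0.0, 0.0, 0.0], [0.5, 2.1, -1.0]]
routes = nested_route(outer, inner)
for mixture, expert, weight in routes:
    print(f"mixture {mixture}, expert {expert}, weight {weight:.3f}")
```

With these defaults, 2 x 2 = 4 experts are active per forward pass.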

keen beacon
dusky aurora
keen beacon
#

CNC, corruption, bm, thats cheff kiss stuff

dusky aurora
#

well, Gemini has become too rigid

#

it thinks in absolutes

whole wagon
#

Bro what is this bs

#

This is LLM arena

keen beacon
#

Yeah i should just go to twitter xd

dusky aurora
#

discussing social justice topics with it is also pointless

#

instead of talking about my topics, it talks about its topics that are vaguely related to my specific topic

drifting thorn
#

Loving Gemini

dusky aurora
#

sorry, it's just Gemini used to be my escapism, and now it only exacerbates my anxiety

#

I don't know if there are better models with unrestricted quota

#

trying to have my preferred version of the trope is impossible, since it is stuck on the stereotype

#

so I take my words back, perhaps the problem isn't with sampling parameters at all, but only with model itself

#

So, you are correct. I didn't rewrite the scene. I had to write a different scene, from a later point in their story, to be able to fulfill the spirit of your request.
with helpers like these...

#

as Valentino said, today it's not the same as before

dusky aurora
#

really, Gemini does not look at context

#

if this is a model update I'll get used to it eventually

#

also, the scenes have got more preachy

keen beacon
#

They literally tweeted it yesterday

torn mantle
#

nah

civic flame
#

Agent

#

it's just operator + deep research

#

will be pro only probably 😴

ocean vortex
#

😠

leaden meteor
#

Yeah, odyssey seems like something to do with browser agents

rare python
dusky aurora
ocean vortex
# rare python https://matharena.ai/imo/

Grok4 bang on description lmao

Grok-4 Performs Poorly

Grok-4 significantly underperformed compared to expectations. Many of its initial responses were extremely short, often consisting only of a final answer without explanation. While best-of-n selection helped to filter better responses, we note that the vast majority of its answers (that were not selected) simply stated the final answer without additional justification. Similar issues are visible on the other benchmarks in MathArena, where Grok-4's replies frequently lack depth or justification.

rare python
#

Grok 4 doesn't like to explain its answer

ocean vortex
rare python
#

Meanwhile Gemini 2.5 Pro hallucinated citations 😩

rare python
#

It was @hardy pecan

drifting thorn
ocean vortex
#

R1 is kinda disappointing too

drifting thorn
ocean vortex
#

Though it wasn't in contention with the top ones to be fair

rare python
#

ok

ocean vortex
#

Btw it's crazy that 2.5Pro is more expensive than o3 🤯

rare python
#

o4 mini dominates project euler

ocean vortex
#

Like they have TPUs

#

and are in position to even offer it for free on aistudio...

#

That pricing is crazy

drifting thorn
#

Wish it has better memory and better system prompts on Gemini app

#

The prompt now degrades the performance severely

rare python
drifting thorn
#

While having nearly the same power consumption

#

It’s crazy

rare python
eager crater
#

i don't like how when choosing the best model only the winner gets shown and you have to click a button to see the other. it was good enough before

pure anvil
ocean vortex
pure anvil
#

what's up with the clown emoji? please don't project

ocean vortex
#

It's just that you tend to write these things, not the first or 5th time lmao

#

it just makes no sense at all. You saw a singular test where 2.5Pro is significantly ahead and suddenly it's "miles better"

#

LOL

pure anvil
#

still no reason to be immature imo

ocean vortex
#

Saying that 2.5Pro is "miles better" is just completely missing the point, objectively

pure anvil
#

Have you ever seen the openrouter usage of both models?

ocean vortex
#

Cause it doesn't show model performance (capability), hate to break it to you lol

pure anvil
#

It definitely does, despite being cheaper it's not being used, what does that say?

ocean vortex
pure anvil
#

Answer the question

ocean vortex
#

Stop for a sec and think what you are saying

#

lmao

#

It shows TRAFFIC. Not how capable any given model is

#

it was never designed to do it

#

rest is just your assumptions. Which in this case are clearly wrong

pure anvil
ocean vortex
#

People use certain models due to price and many other different reasons. 2.5Pro for a long time was actually cheaper than o3, even completely free at one point. High popularity is not an exclusive capability indicator

stray aspen
#

how do i enable search for kimi k2 in the lmarena website

ocean vortex
#

It isn't though...?

#

It measures traffic, not model performance lol

#

Oh I read it as In my Opinion

stray aspen
#

how do i let the AI models search on the internet in the lmarena website

ocean vortex
#

LOL mb

#

Referring to that math benchmark, I don't have anything bad to say about it tbh

#

There are clearly areas in which 2.5Pro is considerably better

#

it's just that in overall... There's still not a lot to choose between it and o3

unborn ocean
#

whut the prover models are not necessarily better at the benches

pure anvil
#

well not exactly, it was a comparison between 2.5 pro and o3, people choose better models

unborn ocean
#

yeah

stray aspen
#

is the grok 4 on lmarena the real one

unborn ocean
#

deepseek prover performed worse on most math problems vs other models, exactly because it all has to be in lean

ocean vortex
unborn ocean
#

lean -> flawless logic, but less training data -> actually worse performance

#

(but 0-ish hallucinations)

#

or failures (with claimed correctness)

ocean vortex
#

yeah exactly... lmao

rare python
#

I mean aren't openrouter users hardcore and professional users?

It's not like ChatGPT where your friend only uses free GPT4o and doesn't care about o3

unborn ocean
#

stfu, you always telling us which model is best according to craig bench

pure anvil
stray aspen
#

mods can you please add o3 pro

ocean vortex
ocean vortex
# rare python I mean aren't openrouter users are hardcore and professional users? It's not li...

Testing models and writing good benchmark for accurate testing is hard enough as it is. And when you take some metric not meant as a model performance one at all this is essentially as good as useless. There are many different factors influencing traffic that have nothing to do with model performance at all. Model performance is just one of them but you will have no clue how much weight it actually had for any given case. So it's assumptions and guesswork - some distant loose indicator but completely pointless for the most part.

rare python
ocean vortex
#

Yeah. Not to mention that even if that wasn't the case... And even if we assumed that users mostly always chose their preferable model regardless of pricing, availability, speed or anything else (incorrect assumption).. This is essentially user preference metric. Which doesn't even align with user blind preference testing when they don't see model name 🤣

rare python
#

Do you have the data to back this up?

#

give source

pure anvil
#

lol did some messages get deleted

dusky aurora
#

congratulations

rare python
#

objective data from corps

stray aspen
#

whos craig federighi

ocean vortex
pure anvil
#

Most large scale data processing using LLMs is usually done using batch APIs

#

so it's half the price

torn mantle
ocean vortex
#

For corps and big production projects that is true for sure

torn mantle
#

Like always

#

Hehe

ocean vortex
#

For casual users meddling with models and vibe coding etc, openrouter can make more sense. Though it's not a given that all of them choose it either. It's simply reasonably popular due to having everything in 1 place

pure anvil
#

Not even close to a significant fraction

#

lmao

#

ChatGPT through the UI alone probably has like 10x tokens daily

#

than openrouter

#

it's reasonable

#

probably

rare python
pure anvil
#

no you didn't

ocean vortex
# rare python and why do users on openrouter prefer Sonnt 4, Gemini 2.5 Pro, DeepSeek V3? Isn...

In this case it really doesn't, for the reasons already stated. Merely choosing some particular model does not even mean they think that model is the absolute best for what they are doing. It could be price/speed, or the model they really want to use not even being available on Openrouter (OpenAI Pro models are still restricted on OR etc). And like already mentioned, OR traffic itself is fairly limited and doesn't really amount to very much as a % of total traffic through other means

rare python
pure anvil
#

Is it really so hard to compare tokens used by 2.5 pro (160B) and o3 (3B) per week?

pure anvil
ocean vortex
rare python
ocean vortex
#

Right... I was sure about Pro but forgot even standard o3 needs it 💀

torn mantle
#

New features:

🔍 Deep Research: dive into complex topics with our structured research reports, delivered with lightning-fast reactivity

🎙️ Voice mode: talk to Le Chat on the go, thanks to our new Voxtral model

🌍 Natively multilingual reasoning: get thoughtful answers in your

#

they added many features

#

the UI/UX looks good too

red sluice
#

Damn 1530 elo on the french leaderboard seems like french people love grok4 🤣 Only 67 votes though but kinda impressive, it could reach 1610 🤣

leaden meteor
#

why so much difference from the main leaderboard?

stray aspen
#

is the grok 4 on this website actually grok 4

drifting thorn
rare python
#

LMArena team should add "no search" beside "no system prompt" of Grok 4 because people keep getting confused

#

/j

echo aurora
red sluice
# leaden meteor why so much difference from the main leaderboard?

Fewer votes, probably different usage, and maybe, maybe, some models are better at handling the french language than others?
I have no clue, but there are slight differences in every language leaderboard compared to the "overall" one. Even so, it seems like Grok-4 will remain first on the french leaderboard... It is the one with the most extreme differences

#

Or maybe there are so few french users, that one of them just prefers the way Grok-4 answers and it makes the leaderboard biased because there are so few french prompts?
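
A rough sketch of how wide the uncertainty is at 67 votes, assuming the standard logistic Elo relation and a Wilson score interval on a plain win rate. This is a simplification (the arena fits a model over many pairings rather than one win rate), and the `wins` value is a made-up placeholder, not leaderboard data:

```python
import math

def elo_gap_from_win_rate(p: float) -> float:
    """Elo difference implied by an expected win rate p under the logistic Elo model."""
    return 400.0 * math.log10(p / (1.0 - p))

def wilson_interval(wins: int, votes: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / votes
    denom = 1.0 + z * z / votes
    center = (p + z * z / (2 * votes)) / denom
    half = z * math.sqrt(p * (1 - p) / votes + z * z / (4 * votes * votes)) / denom
    return center - half, center + half

votes = 67   # vote count mentioned above
wins = 45    # hypothetical number of wins, for illustration only

low, high = wilson_interval(wins, votes)
gap_low = elo_gap_from_win_rate(low)
gap_high = elo_gap_from_win_rate(high)
print(f"win rate between {low:.2f} and {high:.2f}")
print(f"implied Elo gap between {gap_low:.0f} and {gap_high:.0f}")
```

Under these assumptions the implied Elo gap spans well over 100 points, so a handful of users with a consistent preference could plausibly move the ranking.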

stray aspen
fleet lintel
#

when is GPT-5 launching? Probably not in July right?
And is Google planning to launch wolfstride ?

stray aspen
#

Whats that wolfstride thing

keen fulcrum
#

no way they removed it

mossy drum
#

New model in Arena: kraken-07152025-2

stray aspen
mossy drum
stray aspen
#

Bro what is this

#

What even is clownfish

civic flame
leaden meteor
#

what is this agent model from openai today based on? O3?

#

Does that mean we wont be able to compare this with grok and 2.5 pro on arena?

fleet lintel
#

Disappointed in you, Craig. For months you said 3.5 Grok is going to be SOTA without doubt. And even with Grok 4, it's clearly behind OAI and Gemini

fleet lintel
sacred plaza
leaden meteor
sacred plaza
sacred plaza
#

Y'all tripping

dawn wharf
leaden meteor
#

Its not 'the best' model. But if we want a model to top all benchmarks to be SOTA, then there is no SOTA model...

zinc ore
#

You can literally specify where it is sota

sacred plaza
ornate agate
stray aspen
#

I have a question for the LM arena gods

#

What are these codename models that come up some times in the battle mode

red sluice
ocean vortex
#

Ok OpenAI's announcement is actually more interesting than it could have been

#

this is impressive

keen fulcrum
#

still no comparison to browser use and other tools!

#

poor benchmarks

dawn wharf
ocean vortex
#

This could be gpt5 fine-tune. They did the same with deep research...

#

probably needs way less safety testing being constrained to a specific agent and not giving users full control

pure anvil
#

if only we could see the CoT, I'm sure it's looking up the dataset

zinc ore
wary pagoda
#

Will there be a DebugArena or LogArena that takes in your repo + log file location and identifies error + fix? I think that would be next level

#

Just like there is a "narrow AI" race to solve level 5 self driving there should be a "narrow AI" race to solve bug fixing (both necessary but not sufficient conditions for AGI imo). Both domains are easily verifiable and full of edge cases that will test true generalization

#

DebugArena could also supercharge open source code development

sour spindle
#

Does anyone know how to use it am I dumb (very plausible) or is it only for pro users

ocean vortex
# zinc ore

Fair but also a weak point. If that was the case deep research would have scored 100%. Models can't really find it directly, most cases they are not gonna even search the exact question exactly like you wrote it

echo aurora
ocean vortex
#

You can google it and you not gonna find anything like the direct answer option (A to E) if you don't know where it came from lol

#

They need to write it exactly as is and also include quotes around it for exact match. Normally models wouldn't do that... You will be able to see if that happens though so gonna be interesting to test it

zinc ore
ocean vortex
#

But if you know this is a question from actual dataset, you do exact match and cheating is possible:

zinc ore
#

They should be able to review the searches, sites, logs whatever to tell it is doing that. So they would know that is occurring and, if they wanted, do preventative measures.

Now, whether or not that happened is just their word on it.

ocean vortex
#

It lists the searches it performs and urls it visits
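
Since the searches are listed, checking for that kind of exact-match lookup could be sketched roughly like this. It assumes you have the benchmark question and the logged queries as plain strings; the helper names, the 8-word window, and the sample data are all hypothetical:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so quotes or spacing don't hide a match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def flag_exact_match_queries(question: str, queries: list[str], min_words: int = 8) -> list[str]:
    """Return logged queries that contain a long verbatim run of words from the question."""
    words = normalize(question).split()
    n = min(min_words, len(words))
    if n == 0:
        return []
    # Every run of n consecutive words from the question, padded for whole-word matching.
    windows = {" " + " ".join(words[i:i + n]) + " " for i in range(len(words) - n + 1)}
    flagged = []
    for query in queries:
        padded = " " + normalize(query) + " "
        if any(window in padded for window in windows):
            flagged.append(query)
    return flagged

# Made-up example data, not from any real benchmark or log.
question = "Which enzyme catalyzes the first committed step of glycolysis in human muscle cells?"
logged_queries = [
    '"Which enzyme catalyzes the first committed step of glycolysis in human muscle cells"',
    "glycolysis committed step enzyme",
]
suspicious = flag_exact_match_queries(question, logged_queries)
print(suspicious)
```

A topical query like the second one is normal research; only the verbatim paste gets flagged.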

#

what tweet

#

that's just building hype lol

#

Twitter just reminded me though why I don't use it

#

this grok chatbot thing is cringe af

#

kinda childish too...

zinc ore
#

They should really restrict it a bit more so we don't get nonsense like that

ocean vortex
#

"computer" is a bit of marketing. I think it's just what it was except python env is perhaps slightly more advanced now and can run independently from your chat session

#

But the model itself may be early fine-tune of GPT5

hollow ocean
#

Its agent 0

unborn ocean
#

could obviously still be

unborn ocean
#

but i would be underwhelmed if that were it for gpt5

ocean vortex
#

2 times less

#

not slightly, lol

unborn ocean
#

without tools it (oai agent) only scores 23

#

so it is slightly

ocean vortex
#

Right... But I don't think GPT5 is gonna blow o3 out the water tbh. Presently it's very plausibly this, but they could improve it somewhat still before they drop it as general purpose model

unborn ocean
#

i agree with you, but still these models should benefit from RL on HLE-like tasks even when the tools are turned off

stray aspen
#

what the hell

unborn ocean
#

and it not performing really better would point to its current capabilities (without tools) also really not being that impressive in general

#

(but that might be stretching it too far)

#

but all of this might also just be explained by it being a 4.1 (or 4.2 or what ever) version

#

or maybe plain o3 + rl

ocean vortex
# unborn ocean or maybe plain o3 + rl

That would be waste of resources though and being stuck on last gen models. We know they already delayed GPT5 so they are certainly actively working on it, so I think something based on that same base model would make the most sense...

quartz light
#

honestly the fact that agent gets a terminal is pretty cool but still

#

not a fair comparison

ocean vortex
quartz light
pure anvil
ocean vortex
#

they are just general resources

ocean vortex
quartz light
ocean vortex
#

no search:

ocean vortex
#

it doesn't contain this exact question/answer

#

just normal resource

quartz light
ocean vortex
# quartz light ahem ahem ahem

What were you expecting to find? Obviously it contains relevant information, why would it not? The point is this is normal operation

quartz light
ocean vortex
#

I asked it to find it and it did

quartz light
ocean vortex
#

this is not cheating, read the messages above...

quartz light
#

bruh

ocean vortex
#

cheating is finding the dataset this exact question worded exactly like I pasted is from

quartz light
#

whatever

ocean vortex
#

???

#

lmao

#

😭

quartz light
#

🥀

ocean vortex
#

ELI5 - version for you: when teacher gives you assignment to solve some problem with research.... Googling is not cheating. But finding this exact problem worded the same exact way already solved by someone else is cheating. @quartz light

#

so it didn't cheat

quartz light
#

what the yap

ocean vortex
#

Please don't tell me that you still don't understand

#

...

pure anvil
#

😂

quartz light
#

but im out

ocean vortex
quartz light
ocean vortex
#

Besides it didn't even try to find the exact match for this question 😎

whole wagon
#

Sam did his usual line "feel the agi moment" pepega

#

Bro it's making a damn PowerPoint

#

And he says it's feel the agi

jade egret
#

which llm is best for prompt engineering

quartz light
jade egret
quartz light
jade egret
#

js the arena direct chat?

#

o

quartz light
quartz light
#

it doesnt have search though

jade egret
#

o

quartz light
#

because its really good at search, it can check if the libraries its linking exist for example

#

so its good for prompt eng

#

theres reasoning on kimi 1.5 but i havent tested that

ocean vortex
#

I don't think they cared as much about SWE though, so it's probably nothing in the context of this agent...

#

fine-tuning heavily favoring different things (tools and browsing)

ocean vortex
unborn ocean
jade egret
#

best coding model currently?

ocean vortex
#

I have no clue why it was thinking

#

noticed this just now lmao

#

2nd message is gpt4.5 though for sure

#

or it's just their UI interpreting search as thinking now.... smh

ocean vortex
blazing rune
#

Especially with the right set up

#

I used it with Zed and it 1 shot a game insanely well

#

I know it's pretty anecdotal though

#

the game was simple, but it did do a few notable creative things

hollow ocean
#

$1 pro plan method hittin

storm needle
jade egret
#

does the $20 plan get agent or only the $200

keen ferry
#

agent mode is just manus ai

red sluice
main gulch
empty stump
#

what happened to the other leaderboards like the creative writing one and others

hardy lion
empty stump
#

ohh

stray aspen
#

@deep adder

sullen quest
#

Claude may be a little too cautious when it comes to jailbreaking...

whole wagon
#

Gemini 2.5 pro a beast as usual

tidal schooner
dusky aurora
keen fulcrum
#

Grok 4 heavy even more

tidal schooner
#

not great

keen fulcrum
#

xAI isn’t google

quartz light
whole wagon
tidal schooner
calm sequoia
#

Those new small models popping up everyday with excellent benches and no significant architectural breakthroughs. Makes you wonder if it's really progress or just training data contamination with benchmarks.

elder rapids
sage raptor
#

is the anonymous/o3 model on webdev arena ?

cedar tide
# sage raptor is the anonymous/o3 model on webdev arena ?

OpenAI is testing a new model called "o3-alpha-responses-2025-07-17" on WebArena

The model will appear with the name "Anonymous-Chatbot"

Space Invaders game from the new o3 model 👇

keen beacon
#

huh first openai reasoning model under anonymous-chatbot, i believe

elder rapids
#

it's extremely slow

#

it's not that smart tbh

keen beacon
#

is it just like normal o3?

cedar tide
#

isn't it just a fine tuned o3 for coding?

sage raptor
#

so kingfall still better

elder rapids
#

keeps making mistakes but it tries to add a lot of detail

#

takes a while to spit something out

candid storm
#

Is it also in the regular arena? Or just webdev?

elder rapids
#

I asked it to make something philosophy esque and it misattributed parsimony and made some dumbass "meter" to quantify a linguistic concept

#

😭

#

and it made a custom quiz in the website too

#

and it has the wrong answers

sage raptor
#

Maybe this is the open source model

elder rapids
#

shiii

#

I hope so

keen beacon
#

Maybe its the open source. Its o4-mini level

elder rapids
#

maybe

#

"o3 alpha responses" so idk man

#

its an alr model

#

at least creativity wise

cedar tide
elder rapids
#

"direct fine-tune of o3"?

cedar tide
#

Anonymous chatbot is also in battle arena?

elder rapids
#

nah

#

it adds a lot of detail but it keeps failing

whole wagon
#

Bros are still working on o3

#

Like cmon hurry up with gpt5

elder rapids
#

could be that this o3 isnt meant to be standalone

exotic tartan
#

After seeing GPT's Agent demo yesterday, I have a feeling a new type of measurement needs to be considered. Just throwing raw APIs against one another isn't cutting it anymore. We must have a way to compare them with tool use (at least)

main gulch
#

not every lab provides the tools in API

torn mantle
keen beacon
#

Still very weak in the webdev arena at least

torn mantle
exotic tartan
ocean vortex
#

Agentic or not, users gonna use it for the same tasks at the end of the day...

exotic tartan
#

It looks like a lot of the "secret sauce" is going to be around model tooling, self-management, human like-multi step tasking per request etc.

I'm not calling to end API comparisons, I think they're great - there's a good value with comparing them (although I still believe some sort of tool calling should be supported), but this is only one piece of the cake. There are going to be many new agents coming soon that won't be API enabled and eventually people would want to compare them.

hardy pecan
#

Good MCPs augment a model like Claude or o3 really really well

keen beacon
whole wagon
#

yeah its nice but the model intelligence is still important

#

i wouldnt trust o3 to order me a pizza, maybe it hallucinates all the toppings kekw

ocean vortex
keen beacon
#

I cant seem to get the new model in the arena :/

ocean vortex
#

Nah I don't think that's true. Though I suspect it also performs well because they essentially made a reasoning model out of it lol. For a non-reasoning model the outputs are extremely long

#

In turn, I wouldn't expect MASSIVE gains if they make full reasoning model out of this

#

Like don't get me wrong it will compete with R1, but I'm doubtful it will go beyond that level

#

I wasn't talking about tool usage though...

#

More like general performance

ornate agate
#

then yeah, it will be near the top with all the others, the reasoning version of kimi.

ocean vortex
#

As for Sonnet, people tend to use it mostly for coding. It's one of the top performing models on SWE

#

but in general or overall it has fallen behind, a bit niche model at this point

ornate agate
#

yeah but the niche is the one thing which has any chance of making money soon lol

ocean vortex
#

It's also great for web development

ocean vortex
ornate agate
#

swebench also doesn't tell the whole story imo. I don't think people using Sonnet are using it in that way in general (swebench is FULL agentic bench). Claude can have a really long conversation with lots of tools and maintain a lot of coherence across it, including changing its assumptions and approach mid way. I don't think there is a bench which is measuring that properly at the moment.

ocean vortex
#

At the very least you would do niche separate tool within a platform which is general purpose as a whole

ornate agate
#

but I think if you are very smart about managing context and understand stuff yourself, reasoning models are already better at helping you...

ocean vortex
ornate agate
#

99% of coders disagree with me though on that, so yeah

ocean vortex
#

And those things you mentioned are arguably not even the most important ones to look out for in models....

#

Hence why I said Claude is niche. They are great for certain things, but now clearly behind in overall picture

ornate agate
ocean vortex
#

LOL

#

I'm open minded though, if you can come up with a prompt that is not touching on those few niche things where we know Claude performs well and it clearly does better than say 2.5Pro, I'm all ears to test it out

#

yeah... tbh I think big part of it is just people preferring fine-tuning of Claude

#

it being "more human" and what have you

#

but that's not performance really

ornate agate
#

I mostly use open-source (to avoid lock in/hikes having to relearn tools every few months etc). When I use a closed model these days its nearly always 2.5 pro and it nearly always does a much better job for me than the other closed ones.

keen beacon
ocean vortex
#

My experience completely different

keen beacon
#

Its not specific prompt, ive tested it on my own repositories. Also claude code has the advantage of working autonomously better.

ocean vortex
keen beacon
ocean vortex
#

Just saying what "you feel like" is not really useful to anyone tbh

ocean vortex
#

not everyone is using that

keen beacon
ocean vortex
#

that prompt seems like a perfect example to simply ask o3 on chatgpt tbh

keen beacon
#

Yep did that , but o3 has limits, 10-15 min, it did good , as well as can be expected from a human in 10-15 min. But the problem requires much longer to be solved

#

Claude code did better simply due to that

alpine coral
#

after a bit of absence from the arena too

ocean vortex
#

And I don't think you are even using Claude code for what it was designed to do with such prompt lol

#

Anthropic's search is very pale in comparison to OpenAI overall

keen beacon
#

Aim is to estimate it accurately before NASA published it. Doing so nets you 2-5k in polymarket xD

leaden meteor
#

Did openai have 4o or o3 as anonymous model before they released? Wonder why openai is doing anonymous testing with a subpar model now...

balmy mist
#

its prob the open source model

leaden meteor
#

I guess they feel that nobody expects this open source model to be SOTA. So, they might as well do free testing on arena...where as with their main models, they might want to protect them from low ranks in arena....Thats why they didn't bother anonymous testing with 4o?

balmy mist
#

yeah maybe, they did do anonymous testing with 4.1 tho

ocean vortex
keen beacon
mossy drum
rose scroll
#

Hi everyone. I would like to have another platform where I can chat with LLMs for free.

whole wagon
#

Most of the good openAI researchers have moved to different companies

#

All that's left are marketers kek

#

Polymarket still gives meta no hope for the rest of the year even after the hiring spree

#

Surprising actually. I would think 6 months is enough for them to cook up something good

#

They already have the compute and aren't starting literally from scratch

unborn ocean
#

vs other labs (xAI) they have a strong research arm

ocean vortex
keen beacon
ocean vortex
ocean vortex
#

verified with another DR lol

#

contemplating on actually trying my luck on this one 👀

frank adder
#

How to get access of o3 alpha (Anonymous-Chatbot)

keen beacon
ocean vortex
keen beacon
#

I have downloaded 10gb data for this xD

ocean vortex
keen beacon
dusky aurora
#

all your discussions go over my head

ocean vortex
alpine coral
#

it seems like an actual case where brute / parallel compute approaches would make sense / be useful. also seems like a good prompt to test deep research, in terms of them being able to plan and reasonably execute and tie together that flow you described

keen beacon
ocean vortex
#

continues to rise lol

#

was at 29% the first time I saw it

#

today

keen beacon
ocean vortex
#

have you already made profits doing this?

keen beacon
#

No first time on this market , but ive trained and tested the model on historical data , every July since 1951 😂

#

I used last 10 years as test data and predictions were very good
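
A minimal sketch of that kind of time-ordered holdout: fit on the older years, score on the last 10. The straight-line fit and the synthetic values are placeholders only, since the actual data and method aren't given here:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def time_ordered_split(records, n_test):
    """Hold out the most recent n_test records; never shuffle, so the test years stay unseen."""
    ordered = sorted(records)
    return ordered[:-n_test], ordered[-n_test:]

# Synthetic (year, value) pairs: a slow trend plus a small repeating wiggle.
records = [(year, 0.01 * (year - 1951) + 0.05 * ((year * 7) % 5 - 2))
           for year in range(1951, 2025)]

train, test = time_ordered_split(records, n_test=10)
slope, intercept = fit_line([y for y, _ in train], [v for _, v in train])

# Mean absolute error on the held-out final 10 years.
mae = sum(abs(slope * year + intercept - value) for year, value in test) / len(test)
print(f"train {train[0][0]}-{train[-1][0]}, test {test[0][0]}-{test[-1][0]}, MAE {mae:.3f}")
```

The important part is the split, not the model: a random shuffle would leak later years into training and make the test score look better than a real forecast would be.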

ocean vortex
#

nice

#

which ML method?

keen beacon
#

Simple one actually , but i wont spill more details

ocean vortex
#

Random Forest or smth? 🧐

keen beacon
#

Eh something like that. Model is not too important. Getting the right data and preprocessing is the critical part

ocean vortex
#

or linear regression

keen beacon
#

Next im targeting movies box office. For that model im even more proud 🥹

civic flame
#

the anon oai model in the web arena really likes doing the absolute maximum

stray aspen
#

@deep adder

#

hello

#

to ask if you really are craig federighi

languid crescent
#

hayo

#

Is there a way to paste code like a code block just like how ai does it?

#

Whenever I paste code it just gets pasted as a single line of text, is that normal?

ocean vortex
#

been ages since we had it last time...

ember rapids
#

Yeah

#

O3 alpha

#

It’s really good

subtle lintel
#

Hi
I’m using Flux Kontext Dev, both locally and through the Playground. But I noticed that the version of Flux Dev on the lmarena website seems to perform better 🤔. If anyone knows what configurations or setup they’re using to run it there, I’d really appreciate it if you could share the details 😄

ocean vortex
#

same way like in discord

languid crescent
# ocean vortex

DAMN I'VE BEEN USING IT FOR MONTHS AND WHY DID I NOT KNOW THIS 😭

#

ty @ocean vortex

civic flame
#

the days of gpt2-chatbot were peak

primal orbit
#

is o3 alpha available in general arena? or webdev only?

sacred quail
#

What is O3 alpha ?

primal orbit
#

aka "anonymous-chatbot-0717"

wintry tinsel
modern meteor
#

guys are there any benchmarks for agents? or ones who decode obfuscated code specifically? 🙃

whole wagon
#

Hm idk why would they make another o3. Maybe it's the open source model

#

It can't be an agent it's way too quick for that

sacred plaza
#

So, how many of you Elon stans have fallen in love with that grok companion already? 😂😂

torn mantle
#

o3-alpha is good actually

stray aspen
#

How did you access o3 alpha

torn mantle
keen ferry
torn mantle
stray aspen
#

Alright thank you

torn mantle
#

np

ocean vortex
#

My experience with gemini integrations in a nutshell:

torn mantle
#

meta is slowing down the progress by x100

ocean vortex
#

I don't think Meta is going about it the right way lmao. They are doing more harm than good in the AI space IMO

#

Can end up with a divided team and people with wasted potential

#

They could start their own new startups though with the money Meta is paying them... 😇

torn mantle
torn mantle
#

zuck is slowing things down

#

idk if thats part of his strategy or nah

ember abyss
#

i think they are going to integrate sesame into meta.ai

#

to make it more realistic(if they are going to make something like robot profiles)

unborn ocean
#

and a dynamic team that is still in motion

#

so we should expect good things

#

short-term the effect might be negative though

leaden palm
#

so many anonymous models

civic flame
#

o3-alpha has been removed from web arena

soft kernel
#

It's gone

#

It's gone💀💀💀💀

soft kernel
civic flame
#

what's the point man 😭 it was literally added earlier today

#

unless it's temporary

lone vector
#

is o3-alpha better than kingfall

soft kernel
soft kernel
lone vector
#

crazy how fast things are improving, and kingfall was leaked almost two months ago

leaden meteor
soft kernel
leaden meteor
#

I am sure they will add it again... But it felt like that classmate who talks a lot but isn't exactly smart...

torn mantle
#

its gone?

soft kernel
#

Yeah they took it down bro

#

A tragic

soft kernel
ornate agate
#

AGI happened already I think 😐

torn mantle
sacred plaza
#

AGI is an AI product that makes $100 billion (per Microsoft contract with openai). Have not seen it yet

small haven
#

grok 4 heavy is a scam

soft kernel
sacred plaza
#

AGI is a silly thought experiment that is based on vibes and not grounded in anything in reality

soft kernel
leaden sun
ocean vortex
# small haven grok 4 heavy is a scam

Yeah it is. I'm kinda surprised OpenAI haven't challenged them on those benchmarks... xAI is becoming a facade company where all that matters are manipulated benchmark numbers lol

#

not terribly surprising

ocean vortex
torn mantle
#

im not feeling xai vibes at all

ocean vortex
#

@echo aurora Any update on being able to chat with mystery models for more than any messages superseding your voting? Gonna be honest my usage of lmarena absolutely dropped off a cliff ever since the interface update and the legacy version not having new models anymore...

torn mantle
#

they are more focused on satisfying elon than making a good model

#

and on top of that, working continuously 24/7 kills productivity

#

they also have data problems, unlike oai (which was supported by msft) or google... tbh, I haven't seen the light in their models yet, they still seem so behind other competitors

#

or maybe they are just incompetent

ocean vortex
#

new anonymous-chatbot I'm really interested in, but it's pointless to use arena with how things are atm, not gonna be able to test it anyway

torn mantle
#

they may have improved on reasoning but knowledge-wise.. it's still making the same mistakes

#

it didnt improve at all on multi-lingual

#

like at all

ocean vortex
sour spindle
#

Crazy to me right now nothing comes close to o3

torn mantle
#

lol

#

also

#

o3 is a smart model, but the real problem with it is hallucination

#

but o3-alpha version seems to fix this problem.. it actually gave me many correct scientific references

ocean vortex
gaunt gate
#

I can’t accept terms why ?

ocean vortex
#

that one is notorious for hallucinating tool usage when it can't do it

#

But Claude4 is good for custom function calling. It's a trade-off and the price you pay to make it performant on TAU

torn mantle
ocean vortex
#

or do you prefer referring to hallucinated things or propaganda tweets like grok does... catgrin

sour spindle
#

o3 for me still cites the best sources. Still stunned at the stuff Google and Grok cite

ocean vortex
#

essentially almost agreed that he's a far-right fascist lol

torn mantle
ocean vortex
#

50% pretending to work, 20% getting ready to start and finish working, 10% in-between, 20% actual work. Overall stuff getting done --> same or less than working 9to5. lmao

echo aurora
leaden palm
storm needle
#

I just got it

#

I asked to write a chess UI and the O3 alpha got it

elder rapids
#

😭

#

also ye this is elite for humans

#

pretty good

#

I can see why

golden ocean
#

real

empty stump
#

When gpt 5 gonna release

torn bison
#

2025

ocean vortex
#

@keen beacon 👀

storm needle
whole wagon
#

Grok added to simple bench

jade egret
#

they need to make gemini better at coding..

#

at least the windsurf people are at google now

small haven
#

when is 2.5 ultra

jade egret
#

fr

small haven
hardy pecan
small haven
hardy pecan
#

Yeah... I noticed he was prompting with that too, I dont understand, why edit the original question, makes zero sense

#

although, i think grok 4 heavy still gets it wrong?

small haven
#

wait what is the actual prompt

#

can u copy paste, been using that prompt ever since kingfall lol

hardy pecan
#

A luxury sports-car is traveling north at 30km/h over a roadbridge, 250m long, which runs over a river that is flowing at 5km/h eastward. The wind is blowing at 1km/h westward, slow enough not to bother the pedestrians snapping photos of the car from both sides of the roadbridge as the car passes. A glove was stored in the trunk of the car, but slips out of a hole and drops out when the car is half-way over the bridge. Assume the car continues in the same direction at the same speed, and the wind and river continue to move as stated. 1 hour later, the water-proof glove is (relative to the center of the bridge) approximately?
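
One way to see why this prompt trips models up is to compute where the glove would end up under each possible reading. This is only a sketch: it assumes the glove lands on the road deck (the car is on the bridge when it drops out) and that a 1 km/h wind is too weak to move it. The scenario names are illustrative and do not come from the benchmark.

```python
import math

CAR_SPEED_KMH = 30.0      # northward, from the prompt
RIVER_SPEED_KMH = 5.0     # eastward, from the prompt
BRIDGE_LENGTH_KM = 0.25   # 250 m, from the prompt
HOURS = 1.0

# (east_km, north_km) offset from the bridge centre after one hour,
# under three different readings of where the glove ends up.
scenarios = {
    "stays with the car": (0.0, CAR_SPEED_KMH * HOURS),
    "drifts with the river": (RIVER_SPEED_KMH * HOURS, 0.0),
    "rests on the bridge deck": (0.0, 0.0),
}


def distance_km(offset):
    """Straight-line distance from the bridge centre for an (east, north) offset."""
    east, north = offset
    return math.hypot(east, north)


for name, offset in scenarios.items():
    print(f"{name}: {distance_km(offset):.1f} km from the centre")
```

Under the bridge-deck reading the glove stays well within 1 km of the centre, and the car, river and wind speeds act as distractors.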

small haven
#

right wtf..

hardy pecan
#

some ppl aren't math ppl i guess...

#

ahem dom

small haven
#

no wonder we can never get D..

hardy pecan
#

xddddd

small haven
#

does grok 4 get it?

hardy pecan
#

I'll need to check

small haven
#

i queued in again

#

wait so plain grok 4 gets it

#

right lol

#

problem with heavy, cant see any traces

hardy pecan
#

Yep same

small haven
#

B?

#

ok nvm

hardy pecan
#

FiRsT pRiNcIPleS tHiNkiNG

small haven
#

first steal answers, then think

hardy pecan
#

Very smart model!!!

small haven
#

fcking dom

hardy pecan
#

I think he was the one that presented the sailboats going back and forth question and was adamant about the wrong answer too lol, oh well..

topaz peak
#

no idea why grok 4 is so high on so many leaderboards, doesn't look like much from what i have seen

small haven
#

pretty sure it would get it

#

now with this prompt

#

ok so o3 pro didn't get it

small haven
small haven
terse shuttle
#

Does it make sense to wait for a video generation chat on website?

whole wagon
#

bruh i cant get o3 alpha

#

theres so many damn models rn

terse shuttle
whole wagon
#

its in llm arena battle

small haven
#

yoo

#

it's also animated, the eyes like a scanner

#

wow

#

craigbench

#

what we thinking, better than kingfall now?

#

so o3 alpha is just made up?

balmy mist
#

its not their open source model?

small haven
#

its too good for an open source model imo

#

better than o3

balmy mist
#

it's still on web dev?

small haven
#

yea

#

i just got it a few mins ago

#

where did the name o3 alpha come from?

#

just a tweet bait?

balmy mist
small haven
balmy mist
small haven
#

ah

#

it could just be a bait from oai too

#

like why would they finetune o3

#

codex full?

#

its already in production

#

prolly

whole wagon
#

i did so much battle and didnt get

#

idk if its there

torn bison
silver radish
#

Thank you lmarena for making my life better

hardy pecan
#

17th July.. Would make sense why i felt like o3 was/is lobotomized for the last few days

#

but maybe that was just my experience

#

So anonymous-chatbot is good?

hollow ocean
#

Pro users only

#

Next week

whole wagon
#

Nope

#

August at the earliest

reef pawn
#

Gemini 3 when?

keen beacon
dusky aurora
hardy pecan
dusky aurora
#

it's frustrating when the quality drops

torn mantle
#

but ive seen a lot of positive reviews

#

they seem to have trained it a lot on 3d simulations

#

also functionality is more prioritized than anything

#

it will just add some cool stuff just for the sake of it

royal nexus
#

Is it just me or is the edit button genuinely non-existent?

torn mantle
#

what edit button

royal nexus
#

Edit messages

torn mantle
#

i can

royal nexus
#

Hold on you can edit messages on LM Arena????

#

How?

#

The edit button is missing for me on both PC and Mobile

torn mantle
#

could gpt-5 be o3-alpha + some gpt model

torn mantle
royal nexus
#

Oh

torn mantle
#

since when was it possible

royal nexus
#

I was just asking in case it was just a bug or the LM Arena actually doesn't have an edit button.

#

Man it's so inconvenient

royal nexus
#

Well I added a request to feedback and I hope it gets added eventually..

leaden sun
leaden sun
#

it could mean this research model might have...integrated some kind of a proof assistant, or a combination of various specialized formalizers and a solver...?

unborn ocean
#

i mean gdm literally wrote a paper on how to achieve this (even gold in IMO, so it is really not that impressive...)

#

and that gdm model is not really only a llm

leaden sun
unborn ocean
#

they basically computed most of the stuff and it is more a llm + tools implementation

small haven
#

so basically is that imo experimental model agi?

zinc ore
#

No, it's probably only specialized for IMO

unborn ocean
small haven
#

still impressive

whole wagon
#

Google must not be thrilled about this 😂

#

IMO was supposed to be their thing

zinc ore
#

These tweets look like trying to steal their thunder

#

Saw someone say last IMO they released their paper and announcement on it about two weeks after IMO

#

So yeah, they're probably slightly seething

torn mantle
#

by then any other ai lab will catch up

pulsar tendon
#

No tools or internet

#

And not narrow

zinc ore
#

They had a Gemini model that could do IMO problems in natural language, trained on AlphaProof or whatever

#

Last year I mean, when they did IMO 2024

#

Deepmind guy

leaden sun
#

i guess now i know where the name sunny comes from, those gpts love to call me that...😆

unborn ocean
#

^all from the (new-ish) gdm AlphaGeometry 2

#

"Last but not least, we report progress towards using AlphaGeometry2 as a part of a fully automated system that reliably solves geometry problems directly from natural language input."

mossy drum
#

New model in Battle mode: kraken-07152025-1

whole wagon
#

its been there for some time

frozen island
#

Anyone seen the anonymous model lately? It appears to be gone, from what I can tell playing around in web dev arena.

wraith sail
#

hi! hope all are doing well.

ocean vortex
wraith sail
#

i have an issue with chatgpt-4o-latest-20250326. it stopped giving responses and my whole chat is stuck now. the error it shows: Something went wrong with this response, please try again. how to solve it?

torn mantle
gusty tendon
#

is o3 alpha still in the arena or it got removed?

whole wagon
#

Why did they put it in the arena for such a short period of time lmao

#

Did they remove it cos ppl figured out it's gpt5

lime coral
whole wagon
#

Did they scam it? Kek

gusty tendon
#

its frustrating you can't pick what models you wanna battle

soft kernel
lime coral
exotic tartan
gusty tendon
torn mantle
alpine coral
ocean vortex
ocean vortex
#

a one way street to collect votes and data / your prompts

whole wagon
#

Gpt5 being a router is so misunderstood lol

#

It's only a router at the very lowest intelligence to literally switch to a mini model

#

Otherwise it functions as a normal LLM with scalable thinking

#

I think the field as a whole is going to move away from specialised models to fully generalised ones

#

Specialised models work better for smaller models

#

Because they don't have the capacity for everything

keen beacon
#

GPT-soon in the arena

ocean vortex
#

Instead of having gpt4.1 + o3 and gpt4.1-mini + o4-mini as separate models, now those are gonna be just 2 models in total

#

They were the same base anyways

whole wagon
#

Well yeah. It's still a router

#

Because it switches between 2 models

#

The mini and the normal

ocean vortex
#

Just like Claude4 is not a router

#

gpt4.1 is essentially like Sonnet4 with thinking disabled. Well not really since Sonnet had RL training for reasoning and they don't separate them already, but in practice it works similarly.

misty vault
ocean vortex
#

Making 4 (not 2) models into a single one would have been too much IMO

#

Smarter to simply give an option for mini or full one

#

that way you can ensure 1 is cheap

#

and get better performance with less compromises for the other one

patent aspen
#

It's probably many model sizes derived from the same parent model

languid crescent
#

damn lol my message 😭

hollow swan
#

should i use gemini or claude for youtube scripts?

keen fulcrum
hollow swan
languid crescent
hollow swan
# languid crescent gemini

i've been using claude for a while but my family member recently bought gemini so i was thinking about trying it

#

claude gives very human responses from my experience, way better than gpt for example

whole wagon
#

LLM arena suggests otherwise

hollow swan
#

yeah only heard about this site today

#

yeah that's why i tried joining here, to ask if anyone has personal experience with both

whole wagon
#

Lol after Sam's tweet

languid crescent
#

@tomas try checking the lmarena leaderboard

sour spindle
#

@hollow swan at the end of the day the best benchmarks are self use benchmarks

#

Try all the models and see what you like best for your specific use cases

balmy mist
#

lol

#

wait im late

keen beacon
zinc ore
#

"I talked to IMO Secretary General Ria van Huffel at the IMO 2025 closing party about the OpenAI announcement. While I can't speak for the Board or the IMO (and didn't get a chance to talk about this with IMO President Gregor Dolinar, and I doubt the Board are readily in a position to meet for the next few days while traveling home), Ria was happy for me to say that it was the general sense of the Jury and Coordinators at IMO 2025 that it's rude and inappropriate for AI developers to make announcements about their IMO performances too close to the IMO (such as before the closing party, in this case; the general coordinator view is that such announcements should wait at least a week after the closing ceremony), when the focus should be on the achievements of the actual human IMO contestants and reports from AIs serve to distract from that.

I don't think OpenAI was one of the AI companies that agreed to cooperate with the IMO on testing their models and don't think any of the 91 coordinators on the Sunshine Coast were involved in assessing their scripts."

balmy mist
#

What agent is powering OpenAI agent?

rugged brook
#

Is o3 alpha on web dev

ocean vortex
#

That is not really clear, I wouldn't say this with any confidence tbh

#

It's more like 60% chance this is based on some gpt5 derivative base model, 30% gpt4.1/o3 and 10% chance it's something else entirely

#

it wouldn't make much sense for them to be wasting their time with o3 when they are already testing gpt5 I would say

keen beacon
#

i recall reading somewhere that semianalysis said o4 and o5 would be on 4.1 as well (though it's paywalled i believe)

keen beacon
merry cloud
#

Guys am new how often are there new models

torn mantle
#

this is so fast

#

well not compared to groq

#

but still

keen beacon
#

idk you can be kinda bullish i guess. they're confident in the rl paradigm

#

maybe. but it makes sense they'd use 4.1 i guess

#

they spent a lot of time on it

torn mantle
#

yea nvidia is providing that for free

#

its actually fast as well

#

havent really tried groq to tell whos faster

#

but im happy with nvidia inference

keen beacon
#

last time i used groq quality was sh1t

#

doesnt matter how fast it is if quality is terrible like that

#

model was very dumb/got into constant loops and poor in quality

#

compared to other hosts

#

their quantization/inference stack/whatever idk

#

nowadays i just use deepinfra most of the time when i need stuff like that, quality is good, it's cheap, and rate limits are good and reliable

#

requests rarely fail and the rate limit is nearly just a semaphore

torn mantle
keen beacon
#

idk lol it was sh1t back then. still have zero interest in them

blazing rune
keen beacon
#

they also have cheaper prices for gemini 2.5 as well

blazing rune
#

idk why

small haven
ocean vortex
#

yeah it is. It would make sense if this was R1 or an equivalent reasoning model. But as it stands you are paying for V3 type of model lol

#

speed is more important for reasoning

ocean vortex
#

Pretty much unusable as far as my standards for inference go lol

keen beacon
#

yeah i dont really care about tps that much since im running batch jobs

ocean vortex
#

chutes manages to beat them in most cases, and with price either considerably lower or completely free

keen beacon
#

rate limits and quality can be questionable

#

you can do 200 requests concurrently on deepinfra
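
A minimal client-side sketch of the "rate limit is nearly just a semaphore" idea from earlier in this thread, assuming a cap of 200 in-flight requests as quoted above. The figure is taken from the chat, not from any provider documentation, and `fake_request` is a hypothetical stand-in for a real HTTP call.

```python
import asyncio

MAX_CONCURRENT = 200  # figure quoted in the chat; not a documented limit


async def fake_request(i: int) -> int:
    """Hypothetical stand-in for a call to an inference endpoint."""
    await asyncio.sleep(0.01)
    return i


async def bounded(sem: asyncio.Semaphore, i: int, stats: dict) -> int:
    # The semaphore blocks here once `limit` requests are already in flight.
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            return await fake_request(i)
        finally:
            stats["in_flight"] -= 1


async def run_batch(n: int, limit: int = MAX_CONCURRENT) -> tuple[list[int], int]:
    """Run n requests with at most `limit` concurrently; return results and peak concurrency."""
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(bounded(sem, i, stats) for i in range(n)))
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_batch(1000))
    print(len(results), "requests done, peak concurrency", peak)
```

For batch jobs this keeps the client at the cap without tracking per-minute quotas, which is why a plain concurrency limit is convenient when tokens per second matter less than throughput.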

ocean vortex
#

In my experience, chutes inference is good quality. Never had performance degradation issues or questions, unlike some other much more expensive providers...

#

deepinfra included, actually

#

GMICloud is another suspect... R1 hosted there seems to perform marginally worse than chutes

chilly nexus
#

Who's octopus model?

#

idk what it is. but it dropped way too many F-bombs

#

i liked

ocean vortex
chilly nexus
#

yeah.. kinda was thinking it's probably grok too

mossy drum
#

New models in Image Arena: imagen-4.0-generate-preview-06-06-v2 ,imagen-4.0-ultra-generate-preview-06-06-v2

torn mantle
#

it went crazy with the 'HAHAHAHAHAHA'

ocean vortex
#

"nettle"

#

dunno what it is but it isn't groundbreaking, something new though

hardy pecan
# keen beacon

Lmao I had the same idea to guess what "soon" meant on average

ocean vortex
#

Ok just did a bunch of battles... "clownfish" was the first and only one that caught my attention. Very limited testing given current interface of course and being forced to do voodoo here, but this one at least has potential to be good 🧐

#
Poll: What is Deepseek's operating cost to run R1, per 1M output tokens? Official pricing is $1.10 for V3 and $2.19 for R1, with a theoretical 545% margin assuming R1 pricing for both models.

Winning answer: $0.30 to $0.55 (6 of 11 votes)
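
A rough sanity check of the poll's numbers, assuming "545% margin" means profit divided by cost and that every output token is billed at the R1 list price. Both are assumptions; the real figure would also depend on input tokens, caching and discounts.

```python
def implied_cost(price_per_million: float, margin: float) -> float:
    """Cost per 1M tokens implied by a price and a profit-over-cost margin."""
    # margin = (price - cost) / cost  =>  cost = price / (1 + margin)
    return price_per_million / (1.0 + margin)


R1_OUTPUT_PRICE = 2.19  # USD per 1M output tokens, from the poll text
MARGIN = 5.45           # 545% expressed as a ratio, from the poll text

cost = implied_cost(R1_OUTPUT_PRICE, MARGIN)
print(f"implied cost: ${cost:.2f} per 1M output tokens")
```

Under these assumptions the implied cost is roughly $0.34 per 1M output tokens, which falls inside the winning $0.30 to $0.55 bracket.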

unborn ocean
# alpine coral you were saying before tool, tools, tools. but in the thread they explicitly say...

the resources are more about google's approach: mainly alpha geometry, as an example of what i meant with "tools".
we just don't know what kind of setup they have in particular, but based on how good these tools above are (solving >70% of IMO geometry mainly by use of them), they likely either use the models with that (but don't call it a tool) or they trained the model on the model + tool responses
could just be wrong though and they could be spending upwards of 10k per task in some kind of extreme TTC scaling effort, we just don't know really

#

Looking at the history of oai (or any lab, as a matter of fact) promising big gains on any benchmark, healthy scepticism is probably the best way...

hollow ocean
leaden palm
#

Meta Superintelligence

lime coral
craggy depot
#

hello

gentle plinth
# lime coral https://x.com/pli_cachete/status/1946692267915304991?s=46

It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance gives the tool, and how one reports their results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wordi…

craggy depot
#

where to ask question / give suggestion ? In General ?

somber hatch
sturdy mica
echo aurora
echo aurora
willow grail
#

why chat not full with new openai o3 coding model on lmarena

#

wtf??? CHAT??????? HELLO?

small haven
#

wen o3 alpha public release

#

i also want to start and finish a coding project under 5 mins ;P

hardy pecan
#

grok 4 got 60.4% on simplebench which is decent to be fair

small haven
zinc ore
#

Lots of birds talking.

Sounds right, maybe this week even ?

Gpt 6 wen ?

QRT: Yuchenj_UW
Heard GPT-5 is imminent, from a little bird.

- It’s not one model, but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models.
- That’s why Sam said they’d “fix model naming”: prompts will just auto-route to the right model.
- GPT-6 is in training.

I just hope they’re not delaying it for more safety tests. :)

whole wagon
#

Jimmy is wrong 🥀 sources ain't great these days lol

#

It's coming beginning of August

whole wagon
#

Confidently incorrect

pulsar tendon
half tartan
#

is lmarena going to add MCP support to website? or web search?

torn mantle
half tartan
torn mantle
#

i dont think thats the purpose of such service

#

the only purpose is to compare raw base/reasoning models without tools enabled

half tartan
#

but still we can test how well a model can handle these

torn mantle
#

we have other benchmarks for that

half tartan
tidal schooner
#

📣 Announcing the release of OpenReasoning-Nemotron: a suite of reasoning-capable LLMs which have been distilled from the DeepSeek R1 0528 671B model. Trained on a massive, high-quality dataset distilled from the new DeepSeek R1 0528, our new 7B, 14B, and 32B models achieve SOTA perf on a wide range of reasoning benchmarks for their respective sizes in the domain of mathematics, science and code. The models are available on @huggingface🤗: nvda.ws/456WifL


whole wagon
#

Why didn't they compare their 32B to qwens 32B directly

#

Lol

torn mantle
tidal schooner
whole wagon
#

Yeah. Isn't that better to show

tidal schooner
#

ig nvidia wants to be fair

whole wagon
#

Comparing equal model sizes is fair

#

Can't wait for Kimi K2 distills once they add reasoning to it

torn mantle
#

actually qwen 32b is pretty solid

#

probably their best model

#

not a huge fan of 235b model

#

feels like they didnt gain much

pure anvil
torn mantle
#

they are flexing

#

"ok we made this model that is as small as 32b and is comparable to a 235b model"

#

well actually its even better in most benchmarks

pure anvil
#

bruh am i tripping they literally compare it to r1 and qwen 235B

torn mantle
#

yea

tidal schooner
torn mantle
#

its basically better

whole wagon
#

Across the benchmarks they selected yes

torn mantle
#

falls behind only in mmlu-pro and scicode

torn mantle
whole wagon
#

Swe bench and such is absent

#

And usually favours larger models more

torn mantle
#

i just need to try it personally

#

i dont need to look at these benchmarks

pure anvil
torn mantle
#

whats LCB

whole wagon
#

Livecodebench

torn mantle
#

ah

whole wagon
#

Coding questions leetcode style

torn mantle
#

yea yea

whole wagon
pure anvil
whole wagon
#

🥀

#

What do you think the purpose of this release is?

#

Just out of goodwill?

pure anvil
#

If you want to see if it's good you can without paying them anything

torn mantle
#

oh 778 getting defensive, guess he worked with nvidia team on this model

pure anvil
#

haha

torn mantle
#

xd

pure anvil
#

imagine being that dense though

whole wagon
#

I'm sure it is good. Just not 14B near R1 0528 levels good

#

That is unrealistic

torn mantle
#

i kinda appreciate free models tbh, i would just try them to see if they are good instead of relying on benchs

pure anvil
whole wagon
#

You can verify any model's claims. Whether it is open weight or not is irrelevant

pure anvil
#

Your point being?

whole wagon
#

Obviously the benchmarks can be carefully selected, and it being open source or not is totally irrelevant to that

torn mantle
#

nah i need to lookup old messages between you two to see if there is any beef going on

whole wagon
#

There isn't. He just started acting up randomly

torn mantle
#

why are you guys getting all worked up for this

zinc ore
#

Nah let the spice flow

torn mantle
zinc ore
#

Not a dune reference btw

torn mantle
whole wagon
#

I think he needs some therapy

pure anvil
#

And you need some books

#

and some arxiv papers

torn mantle
#

idk who 777 is but we need them here now

tidal schooner
whole wagon
#

Great

torn mantle
whole wagon
#

Already lol

torn mantle
#

they dont seem serious to me

whole sundial
#

mechahitler for kids

torn mantle
#

xd

whole wagon