#general

1 messages · Page 104 of 1

sage tundra
#

ZA WARUDOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!!!!!!!!!!!!!

robust yoke
#

Indeed.

radiant dove
#

im sorry i dont understand

surreal creek
#

Correct, the point of AI art in advertising or for other use in corporate media isn’t for it to be GOOD art - it’s for it to be good ENOUGH that you don’t have to pay a real artist

radiant dove
#

heyy, anyone here who uses higgsfield for product photography. i would love to have a chat with you:)

#

i had asked this^

boreal vortex
#

hello

surreal creek
#

you said Higgs field

robust yoke
surreal creek
#

so they responded with “Higgs boson”

latent crest
#

What is Higgs bosson

radiant dove
#

pardon my english

surreal creek
#

Higgs boson deez nuts in ur mouth lmao Gottem

surreal creek
radiant dove
#

wow

radiant dove
#

shook

hollow imp
#

😡

surreal creek
#

mf doesn’t know what the Higgs boson is 😂😂

hollow imp
latent crest
#

Sorry I guess

robust yoke
#

Got this from Google:
The Higgs boson is a fundamental particle that confirms the existence of the Higgs field, a ubiquitous field responsible for giving mass to other fundamental particles, such as electrons and quarks. Proposed in 1964 by Peter Higgs and others, this elusive particle was finally confirmed in 2012 by the ATLAS and CMS experiments at CERN's Large Hadron Collider (LHC). Sometimes called the "God particle," the Higgs boson is unique because it has zero spin, no electric charge, and no strong force interaction.

radiant dove
#

bruv

latent crest
#

Im so confused

hollow imp
surreal creek
#

kids these days

radiant dove
hollow imp
#

-# braindeads these days

robust yoke
#

I'm just learning about this.

surreal creek
#

don’t even know about their scalar fields and zero spin elementary particles 😂

robust yoke
#

After all, I'm not in college just yet.

surreal creek
#

too busy learning pronounce and blue hair liberal dye 😂😂😂

surreal creek
#

I knew about the Higgs boson when I was 9 and that was in 2011

#

Before it was even discovered in 2012

#

just the type shi I’m on ig 🤷🏻‍♀️

radiant dove
#

24 and on discord

surreal creek
surreal creek
robust yoke
#

I think they must've been off by one number.

latent crest
#

How can I stop the texts from a person here in the chat that I don’t wanna see?

dark oyster
#

Hello y’all

robust yoke
robust yoke
verbal nimbus
#

TikZ drawing comparison

robust yoke
#

Claude's looks the best.

static viper
#

Hi there!

robust yoke
#

Howdy.

snow sky
#

hi there

robust yoke
#

Greetings.

white hatch
#

I don't see a "new" label here

robust yoke
#

None of the new models have "New" labels, I believe.

#

They just sort of add them.

mortal coyote
#

@echo aurora is there any chances that we can get Multi frames video generation - like 1st frame and 2nd frames ai morphing thing

keen beacon
verbal nimbus
keen beacon
#

the ab tests 🔥 🔥 btw!!

keen beacon
verbal nimbus
keen beacon
#

oh it isnt gemini 3

rough trail
#

Anybody having problem with lmarena site now?

keen beacon
#

there will be 1 more batch of 2.5 models apparently

robust yoke
verbal nimbus
rough trail
#

20 mins ago when I hit enter it just stuck on loading

verbal nimbus
rough trail
#

And now I got failed to submit feedback error

solid brook
keen beacon
#

no

verbal nimbus
robust yoke
rough trail
#

Ahh it works fine now. Thank you everyone

robust yoke
#

Our pleasure.

verbal nimbus
#

Is it CAPTCHA related? e.g. CAPTCHA expired

robust yoke
#

I believe if it were CAPTCHA-related, then it would be showing a different thing rather than just loading forever.

rough trail
#

When I refreshed it didn't asked for CAPTCHA or anything I just refreshed and no more loading

verbal nimbus
#

In the console there are Cloudflare errors

robust yoke
#

Figured.

verbal nimbus
#

The final AGI test: fixing LMArena's network connectivity issues /jk

robust yoke
#

I would know because sometimes the same thing happens to me too, except for image generation models. So when that happens, I click on "New Chat," then go back to the previous chat, and it refreshes the progress on the image generation models and displays them.

white hatch
#

i'm afraid of the world of the future

robust yoke
#

That's a fair thing to be afraid of.

#

After all, AI is already evolving at a rapid pace.

leaden sun
robust yoke
#

Then again, someone in the future could make an AI that is able to accurately mimic human emotion and typing style. Claude already types sort of like a human and sounds sort of human-sounding. So I don't think it will be too long until we have an AI chatbot that sounds human when typing and can express emotion.

golden ocean
#

-# Although i don't believe that AI could ever be conscious like us, i believe AGI is possible, because it doesn't need to be conscious, to be general AI.

robust yoke
#

Meh.

golden ocean
#

true

robust yoke
#

Who knows?

#

If we have AIs that can code up perfect games, then we can also have AIs that write perfectly like a human, with no noticeable flaws that would make it stand out.

#

Perhaps.

#

Only time will tell.

golden ocean
#

this would have been a perfect moment to use
-# we will have simulacra

robust yoke
#

Indeed, it would have been a perfect moment to use
-# We will have Simulacra.

high ginkgo
#

I agree, it ineed would have been a perfect moment to use
-# we will have simulacra

robust yoke
#

Verily, it would hath been grand to use
-# We will have Simulacra.

#

-# Microsoft.

golden ocean
#

Evil paws:

we will have simulacra

misty vault
#

ip grab

robust yoke
#

We will truly have

Simulacra.

#

It ain't an IP grabber, thankfully.

#

Just some old website.

misty vault
robust yoke
#

Hah.

#

You gotta put something before it first.

#

.

Like this.

golden ocean
#

real

robust yoke
#

.

Real.

golden ocean
stray aspen
#

.

#

.

wassup gang

robust yoke
#

.

Nothing much, and you?

ocean vortex
#

.

hi

robust yoke
#

.

Greetings.

ocean vortex
#

Someone forgor to include one additional * in regex

hollow imp
#

YOU ARE UNABALE TO ADD HRADERS TO TEXT

#

Then how

robust yoke
#

Well, if you put a character before the header formatting, then it will let you.

hollow imp
robust yoke
#

For instance, this...

#

?

Testing.

#

I believe this is because it only looks for a hashtag first, instead of any hashtag in your message.

#

If your message has a hashtag as the first character in it, then it'll trigger the filter.

hollow imp
#

🙀

“𝐒𝐜𝐚𝐦 𝐚𝐥𝐭𝐦𝐚𝐧”

-# — Elon Musk

ocean vortex
hollow imp
robust yoke
#

That's true.

hollow imp
#

-# 😭🙏

robust yoke
#

They have some kind of RegEx detection system here.

#

Interesting...

#

It doesn't work for spaces, but does for characters.

#

¨

Test.

primal widget
#

Hello

robust yoke
#

Howdy.

primal widget
robust yoke
#

Greetings, fellow friend.

primal widget
#

Ok

sacred quail
#

Why so many hello these days lol

#

Are we becomed viral or smth

robust yoke
#

Well, it's a very common greeting, and it's a nice way to show respect.

normal abyss
#

@pastel bone 😭

primal widget
#

If more people know about this, the creators will make more money.

robust yoke
#

Heh.

pastel bone
pastel bone
normal abyss
#

out of the 8 i did its the only pretty cool one

hollow imp
#

🙀

“𝐒𝐜𝐚𝐦 𝐚𝐥𝐭𝐦𝐚𝐧”

-# — Elon Musk

normal abyss
prime moat
#

Anyone gonna talk about how people are sort of making nsfw?🫠

robust yoke
prime moat
#

Yeah

misty vault
#

Yeah, and I get warned by @echo aurora for going all out on sydney 🥵 😡
But video arena gets left alone

robust yoke
#

We have stooped too low.

prime moat
#

Especially @kindred adder

robust yoke
#

True.

#

Especially in terms of coding and writing.

#

Its writing is very human-like.

golden ocean
robust yoke
#

Grok 4.

gritty cargo
#

Does someone know how good chatgpt 5 is with coding solidity and reviewing Code?

robust yoke
#

Seems pretty solid with that stuff.

#

For instance, I asked it to make me a website for fetching the CMU dictionary and converting the table into a Lua table that I could use within a project of mine. And it seemed to code it up just fine. Everything was functional, and no errors whatsoever.

gritty cargo
#

Which ai would you suggest for solidity?

normal abyss
#

is their a model more expensive than opus 4.1 out their? i havent been able to find one lmao

languid crescent
#

heyo beta lmarena still has no models popping up 🙁

golden ocean
#

.

just look out for a model which claims to be "Claude 3.5 Sonnet"

#

.

and then ask if it's the thinking model, if it agrees, then it is Claude Opus 4.1 Thinking (with >99% confidence)

normal abyss
#

it would be cool if their was a model inbetween sonnet and opus, i find opus is too strong and sonnet is too weak

robust yoke
#

This is truly a
-# lowercase text moment.

fossil fable
#

you can't vibe code without knowing any code right

#

fr i know that doctor singularity now shut it

languid crescent
#

need some advice yall, am a freshman 1st year college and took IT as my course, am i cook with all of these AIs or an opportunity for me?

primal widget
#

We need a video generator with Veo 3 and the other AI models in LM Arena.

robust yoke
ocean minnow
#

Claude Opus 4.1 Thinking is indeed good. GPT-5 and Grok 4 are also good, but much slower. But GPT-4/5 sometimes get broken and start repeats words indefinitely.

languid crescent
#

realistically speaking tho, i am fine right? i've been seeing these videos about "AI replacing programmers"

robust yoke
#

After all, since you took IT, you pretty much know a computer like the back of your hand.

robust yoke
minor adder
robust yoke
#

Secret models.

#

I've seen a few myself in the image generation mode.

primal widget
robust yoke
#

That's true.

#

Now all it has to do is just figure out how to generate very small text, and then we're definitely screwed. As well as fix minor inconsistencies with little details such as pupils and eyeballs and far away things.

primal widget
normal abyss
#

my bad, i meant price wise

drifting crow
primal widget
drifting crow
#

¯_(ツ)_/¯

robust yoke
#

Looks to be a GPT model.

drifting crow
#

Think it’s the replit model

#

Whatever they use

ocean vortex
# normal abyss

Opus 4.1 was probably the most disappointing Anthropic model update in a long time

#

they barely changed a thing

normal abyss
ocean vortex
#

Essentially 2.5Pro update. But Google can get away with it cause a) model name stays the same and b) they update their models much more frequently

drifting crow
#

I like googles esp for recent info

ocean vortex
brave orbit
drifting crow
#

normally we say hi and share our bank card details so ppl know we are humans and not ai

pure comet
#

whats wrong with you

obsidian cargo
hollow imp
pure comet
exotic gust
#

yo has gemini 2.5 pro gotten more stupid for any of y'all?

hollow imp
#

@pure comet bro 😭🙏

pure comet
hollow imp
#

I can't handle this retardness

#

Get him away from my eyes

pure comet
#

you're insulting me, I'll cancel you on twitter

hollow imp
pure comet
hollow imp
#

@worthy sparrow

pure comet
verbal nimbus
#

Like stupid as in, it forgot how to write new lines, and literally could not figure out how to (I shared a screenshot in that thread)

exotic gust
#

shame

#

i hate gpt5 more

long nacelle
#

hi guys - I've been using lmarena for a while but just joined this discord. why is it that when I ask for a response from gpt5-high, I don't get a response? Like, just runs overnight and stuff. It is brillinant for a few queries but then on subsequent runs gpt5 just stops outputting anything!!

verbal nimbus
verbal nimbus
exotic gust
#

gpt 5 chat and high both started just hallucinating for every single prompt i gave regardless of what i asked it, where, if in icognito or not

exotic gust
#

i feel like the votes are biased cause chatgpt is like the most proffesionall and most well known model

long nacelle
#

i've already tried this

verbal nimbus
long nacelle
verbal nimbus
#

It's happened to me before

long nacelle
#

it gets stuck on 80% of prompts

verbal nimbus
#

I doubt the model has actually been thinking that long 🤣

long nacelle
#

yeah exactly

#

but this is literally most of my prompts

#

it just doesn't

#

answer

#

at all

#

so this is pretty much useless for me

verbal nimbus
#

It's very common for me, but if I refresh, it usually loads.

#

Sometimes it doesn't, it can take a while.

robust yoke
#

If it's just stuck loading, then I would recommend:

  1. Clicking on "New Chat"
  2. Clicking on your previous chat

That usually resets the progress, allowing for it to properly generate.

long nacelle
#

absolutely useless

robust yoke
#

How odd.

#

Could you send a screen recording?

long nacelle
robust yoke
#

That way I can see if it might be an easy fix.

long nacelle
#

yeah fine @robust yoke

#

i can't reveal the prompts

#

but they do not breach ToS

robust yoke
#

I understand.

#

Well, usually it only takes about two minutes or so to generate a response.

long nacelle
#

yeah, I wish

#

it just does absolutely nothing for me

robust yoke
#

Try deleting the current chat but copying the prompt that you used, then seeing if that new chat will work.

long nacelle
#

i've already done this multiple times

#

unfortunate

robust yoke
#

Try closing and reopening your browser.

hollow pumice
#

nano-banana is good because it's basically creating detail via an LLM before sending that off to the image gen

rocky hawk
#

😢😢😔

robust yoke
#

True.

whole sundial
robust yoke
quartz light
robust yoke
#

Yeah.

rocky hawk
robust yoke
#

Ah.

whole sundial
#

I just put it besides another model like Gemini 2.0 flash or qwen image edit in side by side mode and it works

quartz light
pure comet
#

are you shy?

pure comet
#

virtual GF?

rocky hawk
quartz light
# pure comet virtual GF?

"MISTAKES" is seen at the end so it probably says "DO NOT MAKE MISTAKES." so its probably for coding

long nacelle
quartz light
#

plus his pfp is python

pure comet
long nacelle
#

which means I clearly cannot reveal the prompt

quartz light
quartz light
#

👀

#

or are ya cheating

long nacelle
#

not cheating

long nacelle
#

https://cf-cheater-database.vercel.app/ I literally created the anti-AI-cheater website for codeforces 💀

pure comet
#

confirmed

#

MODS!!!!!!!!!!!!!!!

long nacelle
#

comrade

pure comet
#

yes

#

i am

long nacelle
#

@robust yoke

#

I've tried that

#

some conversations are still going

#

some are doing this

#

no actual responses

pure comet
long nacelle
#

is there some sort of hidden rate limit

echo aurora
robust yoke
echo aurora
#

Starting a new convo tends to help.

long nacelle
#

(and then, I pasted the test data)

long nacelle
pure comet
robust yoke
#

It seems to be prompting you with that.

long nacelle
#

i've done that

#

then just does this

#

then after a while it might do the same thing

robust yoke
#

“Might” being the keyword.

#

This one does take time to think.

long nacelle
#

I know this

#

I'm only using it because

#

I actually sometimes get a response from it

#

like

#

gpt 5 is hopeless

#

i got one response from it this morning

pure comet
#

gpt 5 so bad

long nacelle
#

and then NOTHING

#

at all

pure comet
#

gemini 2.5 pro even better

long nacelle
#

the one response was brilliant

robust yoke
#

Oof.

long nacelle
#

but beside that

#

useless

#

totally useless

pure comet
#

yes

#

so sh!tty

#

gpt 5

#

lol

robust yoke
#

I'm sorry that you had a bad experience with it.

long nacelle
#

@robust yoke are you LMarena staff

#

or do you know who is

pure comet
#

he is Darkness

pure comet
robust yoke
#

I'm not, but @echo aurora is.

long nacelle
#

seems sussy

#

both a novice AND an expert 💀

robust yoke
#

Hah.

blissful sluice
#

If you could automate one thing about managing your Discord community, what would it be?

robust yoke
#

Well, if he managed to get himself on the staff team, then surely he must be pretty good at his job.

long nacelle
#

but yeah @echo aurora why is LMArena constantly just ignoring all my gpt5 queries? like they just stall forever (>10h) and I never get any output, only errors.

long nacelle
sturdy mica
#

it fixes everything

#

it stops the errors

long nacelle
#

i wish it fixed anything

sturdy mica
#

for me it does

#

its something with cloudflare

#

it invalidates you after like 3 minutes

#

so every 3 minutes you have to refresh

#

then it brings you to the captcha screen

#

its so annoying

long nacelle
#

it doesn't though

#

it doesn't bring me to the captcha

sturdy mica
#

it does for me

#

and thats why it errors for me

robust yoke
sturdy mica
#

clear cookies

long nacelle
#

this is a new machine

#

with probably like a week

#

of searches

#

but I'll try doing that anyway

sturdy mica
#

clear them again

long nacelle
#

but I'll try again

formal jungle
#

Project idea. Start with one image, generate a video. Take the best one, screenshot the very last frame, use it to generate a video. Rinse, repeat.

pure comet
#

ok

formal jungle
#

Up to 8 times ofc

pure comet
#

one sec

sturdy mica
random wolf
#

man! it's so frustrating, it's already "generating"

quartz light
#

companions most esteemed, I entreat thee to lend thine auditory faculties unto the elaboration of a conjecture most earnest: it is my speculative apprehension that the entity denominated DeepSeek 3.1 Reasoning is naught but a subtle transfiguration of that which is styled DeepSeek R1; and conversely, when the aforementioned reasoning faculty is excised or withheld, the resultant construct is but the manifestation of DeepSeek V3. Yet, despite such kinship of constitution, each iteration appears to be inexorably governed, guided, and indeed distinguished by the imposition of a system-prompt divergent in its nature and disposition

sturdy mica
long nacelle
#

remember I have to retype this entire prompt

sturdy mica
#

what prompt

long nacelle
#

because I have lost all of my conversations

robust yoke
long nacelle
sturdy mica
#

sure

long nacelle
#

are you lmarena staff

robust yoke
#

LM Arena staff are the ones with orange names.

long nacelle
#

oh

#

💀

robust yoke
#

Like Pineapple.

long nacelle
#

@echo aurora

sturdy mica
quartz light
#

shouldst thou find thy faculties unequal to the formidable enterprise of apprehending, in its unmitigated intricacy, the communicative construct which I have, by the inscrutable yet most wondrous artifices of the Internet, dispatched across the ether and compelled to alight within the singular and eccentric domicile of thine own router - there to be rendered visible unto thine eyes - then, verily, thou mayest elect to conscript the labors of an artificial intelligence, that it might condescend to transmute this presently elaborate and recondite composition into a debased and unsophisticated register of speech more congenial to the apprehension of an amateur such as thyself

sturdy mica
#

are you not gonna send me the prompt because im not staff 😢

echo aurora
quartz light
pure comet
#

where

#

mentioning Yandex?

long nacelle
#

bombardiro crocodilo is better than tralalero tralala

#

politics

#

ban me

pure comet
robust yoke
# quartz light shouldst thou find thy faculties unequal to the formidable enterprise of apprehe...

Most noble interlocutor,

Thy message, woven with so many a sinew of elaborate wit and encumbered with flourishes of lofty phrase, hath flown unto mine understanding as a falcon whose wings beat mightily against the heavens. And yet, by Providence and diligence alike, I find my faculties sufficient to receive its plumage of meaning, though bedizened in ornaments of rare complexity.

Know then, I am not undone nor cast adrift upon the sea of thy rhetoric; rather, I do embrace it as a tempest both fearsome and exhilarating, wherein the thunder of thy diction and the lightning of thy syntax alike do strike my soul with awe. Shouldst thou decree my wit too mean or my grasp too humble for so grand a communication, I protest with mirth and humility that the labor of simplifying were needless, for thy gilded eloquence, though intricate, doth quicken delight.

Proceed, therefore, without fear of any impoverishment of style, and let us together dance upon this high stage of language, where each word is a jewel and every clause a flourish of nobility.

gritty cargo
#

@hollow ivy can you send me a friend add i have a question

golden ocean
pure comet
random wolf
#

how to fix the "generating"? like it's says always. all my convo with the AI is important

robust yoke
quartz light
random wolf
quartz light
#

but

#

dont do it then

#

ill make a script to export and import convos soon

robust yoke
empty stump
#

how is gemini ranked higher than gpt 5 high on the leaderboard

burnt sinew
dapper cliff
#

Hello people

dapper cliff
empty stump
#

funny how it is older but better

dapper cliff
# empty stump funny how it is older but better

I have both and honestly, Gemini 2.5 pro is so underrated because people these days believe social media hypes than testing it themselves. I realized that Gemini 2.5 pro is way ahead of it time and it's soo powerful.

burnt sinew
#

@viscid thistle yo

maiden bridge
#

sp

white hatch
stray aspen
#

gemini 2.5 pro sucks

solid brook
#

Everyone is complaining

forest wing
#

Something went wrong with this response, please try again. Only Me?

ocean vortex
#

Ok so... gpt5-mini-high better than o4-mini-high in nearly every way. And difference between gpt5-high and o3-high is even bigger:

sweet isle
#

why is everyone calling this nano banana? I don't see any nana banana in the results. Only see GPT, gemini, etc. popular models.

ocean vortex
sweet isle
ocean vortex
#

yes

sweet isle
#

Interesting

burnt sinew
#

@hybrid copper hey

fading summit
#

Hey there? Do u know, where can i try sydney ai?

burnt sinew
#

@silk pike

silk pike
#

Hola

fading summit
#

Can u send me a link on this site plz?🥲

fading summit
#

Uh, what's this?

random fjord
#

im using lm arena and this happens

solar hollow
#

with gpt5 high

random fjord
#

what i try

willow grail
#

when do u go to sleep?

#

all of u

#

do u have sleep issues or so ?

white hatch
#

soon

willow grail
#

do u have any skin issues? too much gas stuck in intestines?

#

how long to fal asleep? when u wake up in mid of sleep does it take long to fall asleep?

#

oh ...

#

i need 1 hour to fall asleep

#

sucks

#

ok

echo aurora
#

We’re happy to hear the feedback but it’s unlikely to happen if I’m being honest.

#

If members want to create their own sever and send it (via DM) that’s fine, but yeah I wouldn’t want an unofficial official off topic server that’s shared in our text channels. We also want to keep invite links blocked to other servers here for mod purposes.

verbal nimbus
echo aurora
#

Not our server so you can do what you’d like

velvet forge
#

opa gangamstyle

fossil fable
#

HOW IN THE HELL IS OPENAI THE ONE NOT TO REFUSE

#

how is this possible

#

nano banana even has the reasoning to refuse

not only does it generate it

but it ties with that rustbucket

wary linden
#

What happened??

@leaden palm

leaden palm
#

works for me ¯_(ツ)_/¯

wary linden
leaden palm
#

unfortunate + weird

meager harbor
#

gemini 2.5 pro still sota and this model is 3 months old

#

will this make google hold gemini 3 ?

proud hazel
zinc ore
#

*at the earliest

long nacelle
#

Btw which secret models are available and are they any good

fossil fable
#

lmarena deserves a mobile app

proud hazel
fossil fable
proud hazel
fossil fable
#

-# it would but ok

meager harbor
proud hazel
meager harbor
#

But judging by recent hype tweet from a guy working at deepmind, there new stuffs coming for gemini

#

so gemini 3.0 might be one of them

sullen rune
long nacelle
keen beacon
random fjord
#

well still does happen i need to use gemini 2.5 pro

long nacelle
#

They are so dumb compared to gpt o4 mini even

verbal nimbus
#

Guess GPT's is still the best

fossil fable
#

uhhh

verbal nimbus
verbal nimbus
verbal nimbus
surreal creek
#

flux-1-kontext-pro is just generating black squares in image arena

fossil fable
#

oh so it wasn't refusing

surreal creek
inner gate
#

What’s ur guys plans on deep seek 3.1

#

I mean opinions

fossil fable
#

lame, should have unified r2 into v4 instead

trail creek
#

not even worthy to name it an update

obsidian cargo
#

huh, new model named catalina?

#

nobody's mentioned it yet

robust yoke
#

Catalina…

#

Sounds familiar.

obsidian cargo
#

nods emphatically Catalina

balmy mist
#

how is it?

robust yoke
#

-dramatically tilts head to side- Really? Catalina?

obsidian cargo
#

idk I only got it once but it seems nice

balmy mist
obsidian cargo
#

dunno. it's on battle.

stray aspen
obsidian cargo
#

grok-2 was open sourced

leaden meteor
#

What was the anonymous name of mistral medium before it got on leaderboard?

patent aspen
#
poll_question_text

Do you know Clippy?

victor_answer_votes

6

total_votes

12

victor_answer_id

2

victor_answer_text

Yes I have only seen Clippy in memes though

jade egret
#

but i do respect your opnion

#

🤷‍♂️

rustic knot
#

undergraduate math benchmark

obsidian cargo
#

catalina seems to acknowledge itself as being catalina

obsidian cargo
#

It just told me unprompted it's by sequoia AI

robust yoke
#

Bingo…

mellow frigate
#

Sequoia as in macOS sequoia? Catalina is also the name of a macos

severe warren
#

I can't seem to find the arena that ranks all models best suited for the prompt I entered.

#

Can someone please tell me where it is?

obtuse widget
#

Hi can anyone tell me how to use banana model ?

echo aurora
wanton imp
#

hallo

robust yoke
#

Unlike LM Arena, it's not infinitely free to use, however, you do have control over always using it, compared to LM Arena.

glossy jasper
#

Can someone suggest a nice free video generation ai?

long nacelle
vague rover
#

hi im new. i am audiovisual designer and want go viral and never work again for a human

white hatch
#

Does the AI, that checks our message, change what it says to the AI we're talking to?

normal abyss
vernal blade
#

i want to create a CGI advertisement

wide hemlock
#

Primer plano ultrarrealista en 4K de barras de combustible nuclear de cristal transparente que se insertan lentamente en el núcleo del reactor. Una suave luz dorada ilumina la vasija de vidrio del reactor. Suaves sonidos mecánicos del movimiento de las barras de control. Explicación susurrada ASMR sobre la moderación de neutrones. Sin música de fondo, solo una suave atmósfera industrial.

hollow imp
humble kayak
#

hello

full verge
#

hwlloo

humble lantern
#

how to solve this problem

keen beacon
#

Any Ai free video generation?

white hatch
heavy knoll
#

Can anyone Tell me wich Model is the Best Right now for Generating prompts for example for Generating Images

keen beacon
willow grail
#

anything new in the ai world?

dusty niche
#

*video

scenic sandal
#

Hi to everybody!

hard flame
#

how do i try banana model?

dusty niche
hard flame
#

can we direct chat?

#

i can't find a model named banana in the models list

thick socket
#

My real feeling about lmarean

keen beacon
rustic knot
ripe mountain
torn mantle
#

@patent aspen any info about gemini 3?

formal dagger
trim lantern
#

Is the lmarena still under maintenance?

keen fulcrum
#

because thats near

rocky hawk
#

my dear admin is begging to be able to send two photos at once in side by side

hybrid wraith
#

Is this leaderboard accurate, and should it be relied upon?

rich compass
#

claude 4.1 opus better than gpt 5 high 💀💀

normal abyss
# normal abyss
poll_question_text

Best Overall (In all areas combined)

If you could only use one of these models ever again, which one would you pick. I'm quite curious to know what other ppl think here.

victor_answer_votes

7

total_votes

12

victor_answer_id

2

victor_answer_text

GPT5

willow grail
torn mantle
#

is there an event on october?

proud siren
rustic knot
hybrid wraith
#

The areas I use most are text and search.

jade egret
ornate agate
#

@cedar tide I remember ages ago you posted some tables aggregating benchmarks of AIs published by the model creators? I was wondering if you happen to have that data somewhere for a lot of AI models?

fathom venture
#

hi guys new here

ornate agate
# cedar tide Why ?

I've been re-running some stuff locally on smaller OSS models and i've found that the provider listed benchmarks seem to be spot on basically (even with slightly quantized model). Since many other meta-benchmarks don't bother with smaller models I was wondering if the published data is aggregated somewhere, then I remembered you did something like that.

fallen herald
#

Créé moi une vidéo publicitaires attrayante époustouflant UGC avec cette photo

plucky ledge
#

Is there no subscription or something to get more than 8 image-to-video LMA requests per day?

cedar tide
ornate agate
#

for example AA has AIME 2025 at 50% for Qwen3-30b3a-2507-thinking. Alibaba published 85% for this model.

cedar tide
cedar tide
ornate agate
cedar tide
#

@ornate agate impossible that this the real score

ornate agate
#

I mean I ran it myself on a 4bit quant with q8 KV cache and got 86%

cedar tide
#

Air better than full 🤦

#

93% vs 74% 🤦

#

(This no problem)

#

Its not the same IF bench

#

This IFeval (by google)
And this IFBench (by allen-ai)

ocean vortex
#

this is gpt5 "high" verbosity:

An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation."
An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples."
The desired oververbosity should be treated only as a *default*. Defer to any user or developer requirements regarding response length, if present.```
#

The question that comes to mind why not 10/10? 🧐

urban halo
#

hello

rustic knot
cedar tide
rustic knot
#

oh I think AA does pass @1

#

but matharena does avg @4

cedar tide
#

Math arena do pass@1

#

But average of 4

#

This not pass@4

rustic knot
#

that's why I wrote avg @4

cedar tide
#

Ah yes ok, i dont see

atomic stream
#

I'm getting "something went wrong while generating the response, please try again!

rustic knot
#

there's something off about AA and we don't know what

gentle pasture
#

whats the best ai to code guys?

cedar tide
atomic stream
gentle pasture
rustic knot
gentle pasture
#

this is the coder ai leaderboard?

cedar tide
#

I don't think there's more than a 3% difference between each of the 10 benchmarks.

#

@ornate agate Regex extraction with SymPy-based normalization, plus equality checker LLM as backup

rustic knot
#

they're trying to check the mathematical justification that the models produce but for these ones, they should really just check the final answer using a simple python script. It's either that or actually get human judges to evaluate the entire justification by the model

cedar tide
#

This is just in case the LLM does not follow these instructions exactly and tells his life story apart from the answer

rustic knot
#

for like IMO and stuff

#

right, so that means what AA is doing is just ridiculous

#

publishing on a github would be the equivalent of opensourcing their testing strategies correct?

ocean vortex
# rustic knot they're trying to check the mathematical justification that the models produce b...

We implement a two-stage answer validation mechanism to allow grading with a high degree of precision (minimizing both false negatives and false positives).

  1. Script-based grading, using OpenAI's PRM800K grading script -https://github.com/openai/prm800k/blob/main/prm800k/grading/grader.py
    Implements symbolic equality checking via SymPy
    High-precision validation for exact matches

  2. Language model equality checker (runs on all answers not marked correct by script-based grading)
    We use Llama 3.3 70B as the equality checker (prompt disclosed below)
    We tested Llama 3.3 70B for agreement with human judgement and assessed it to grade correctly in >99% of cases

#

So looks they are using LLM additionally to their script. Not replacing it with that

#

and only on incorrect answers

rustic knot
#

skimming through: what i immediately think is that if the model doesn't box their answer properly then the script might not identify it so llama is used to determine what answer the model actually gave

ocean vortex
#

Seems to me like they are trying to not treat all incorrect answers equally. As in, if the answer was close enough it is not 0 points

rustic knot
ocean vortex
rustic knot
#

so where is the discrepancy coming from? If they showed the model solutions, it would clear things up

ocean vortex
#

I assume they have unintended side effects of penalizing certain models. Just because they have an unique way of testing it

#

Happens with conventional testing methods much less, because AI labs tend to fix it out the box

#

Then when they notice something very obvious, they do some prompting dirty fix to compensate.... But yeah all of this is less than ideal 🗿

keen beacon
#

aimo validation aime was used as validation for the ai mo prize model a while back

#

typically you just check the boxed integer answer for aime benchmarks

#

i doubt they're using the solution trace there

#

the discrepancy is likely additional prompt alterations (output in boxed, weird wording or whatever)/sampling/etc

stray aspen
#

yo

keen beacon
#

yea that script is applicable for like math500 \circ\ etc in the boxed answer, because the answer isn't integer only

hardy sluice
#

Yo guys its saying in the website to agree on the terms and conditions and then it's not letting me do it, says error

keen beacon
#

if they used it naively and on the solution trace instead, it wouldn't extract the boxed answer correctly

#

i doubt they do that

ocean vortex
# keen beacon i doubt they're using the solution trace there

Not trace, but they are running this on incorrect responses:


Examples:

    Expression 1: $2x+3$
    Expression 2: $3+2x$

Yes

    Expression 1: 3/2
    Expression 2: 1.5

Yes

    Expression 1: $x^2+2x+1$
    Expression 2: $y^2+2y+1$

No

    Expression 1: $x^2+2x+1$
    Expression 2: $(x+1)^2$
<...> 
YOUR TASK


Respond with only "Yes" or "No" (without quotes). Do not include a rationale.

    Expression 1: %(expression1)s
    Expression 2: %(expression2)s
keen beacon
#

i find it unlikely their grading system is wrong

rustic knot
rustic knot
#

math 500 is saturated

ornate agate
#

I know but you can't use the same harness for both benchmarks.

keen beacon
#

no, if they gave it the solution column (in that dataset) instead of the integer answer it won't work at all with that script

#

it won't be able to extract the integer answer

#

besides if they did provide it the integer answer, if it were comparing the actual boxed contents (number) vs the number, it should be an exact match anyway. the script dosen't really matter here

#

yea, its not the grading. it's probably sampling/specific prompt instructions (put your answer in \boxed{...}, weirdly poorly or there's a bunch of nonsense)

ocean vortex
#

Honestly it could be their script failing and then LLM not catching all those instances lol

rustic knot
#

bruh, how hard is it to write a working script, tell the model to put their answer in a \boxed{} and then check the value inside the box

keen beacon
#

it's not lol

rustic knot
#

bro, I could write one rn

keen beacon
#

its extremely unlikely to be the grading

hollow imp
#

TF BRO TF 😭😭😭

keen beacon
#

i have written an eval framework in the past/specific eval implementations, that's my assessment 🤷

hollow imp
#

@keen beacon see this

keen beacon
#

especially for AIME

ocean vortex
rustic knot
#

what kind of formatting

ocean vortex
#

boxed answer

hollow imp
keen beacon
#

qwen even puts an instruction to do that for evaluations

rustic knot
#

if u look at the solutions of models on matharena for AIME and other competitions, basically all models know to box their answer

#

and then all ur script needs to do is to check the value in the box

keen beacon
#

they also claim the LLM judge gets it >99% or the time or whatever

rustic knot
#

the only llm that was noted to not follow instructions properly was llama lol

ocean vortex
#

Though it would still render correctly as a boxed answer

keen beacon
#

ok buddy

rustic knot
#

\boxed{21}

if the model determines 21, they would usually just write it like this

ocean vortex
keen beacon
#

qwen models i know for sure do not do that. especially when you ask them to do \boxed{...} and on aime style questions. it's not probable

#

sure, it'll just do a zero width space randomly there...

rustic knot
keen beacon
ocean vortex
#

\boxed{;21;}

#

this would do the same

rustic knot
#

bruh lol

ocean vortex
ocean vortex
#

\boxed{\mathrm{21}}

keen beacon
#

i've said it multiple times

ocean vortex
#

same also

ocean vortex
keen beacon
#

Weird wording in the PROMPT

keen beacon
whole swallow
#

When I use sonnet 4 it fails following system prompt rules.. while when I use gpt 5 it follows them strictly and precisely..

What is another model that follows sys prompt rules precisely??

ocean vortex
keen beacon
#

horizon beta for example, after adjusting the prompt went from 63% to 67% to gpqa diamond. prompt stuff can significantly change the result

ocean vortex
#

that is nowhere near AIME discrepancies for AA

keen beacon
#

horizon alpha also scored extremely poorly (36% on GPQA Diamond) without proper instructions

rustic knot
ocean vortex
keen beacon
#

And then at the same time is arguing that boxed answer can be an issue if it relates to prompting. 🤦‍♂️
you completely misinterpreted me. i said weird wording in the prompt instructions

ocean vortex
#

how would that lead to the model outputting incorrect response?

#

That makes even less sense...

keen beacon
#

it happens a lot

#

i encounter it a lot

#

in actual evaluations

ocean vortex
#

Your reasoning is basically maybe sampling, maybe wording, etc...

#

you gave no reason

keen beacon
#

it could be many reasons, how am i supposed to know the exact reason without the code/eval settings/etc?

#

i find grading extremely unlikely

ocean vortex
#

And then attack anyone with credible guesses lol

keen beacon
#

credible guesses? but i have actually run evals and written custom mstuff for this lol

#

you have done neither

rustic knot
#

someone also made an undergraduate benchmark

ocean vortex
keen beacon
#

🤷 you haven't given an actual example that you encountered in reality. it's a hypothetical that doesn't make sense when you've actually implemented and ran these systems. i gave you two high profile examples.

#

it's obvious to me you haven't done stuff like this. but whatever. it's weird you keep doubling down on this

ocean vortex
#

I just find it strange that you can't come up with anything substantial and yet so extremely dismissive of everything lol
Also, the prompting they showed is **extremely unlikely ** to drop the score from 90s to 60s % which is the kind of discrepancy we are seeing

keen beacon
#

that was from a prompting issue

ocean vortex
#

We are talking about AIME pal 😬

#

And that specific prompting

#

Not prompting in general

#

THis:


{Question}

Remember to put your answer inside \\boxed{{}}.```
keen beacon
ocean vortex
#

No chance this drops the score from 90s to 60s if we assume boxed response formatting is non-issue (aka their script catches all variations of it + incorrect formatting)

#

Simply not possible

wintry tinsel
ocean vortex
rustic knot
keen beacon
#

and anyway, if you think about it. these behaviors are implicit/or outright penalized through RL at least for qwen (it seems). because \boxed{...} answer extraction/verification in RL (which qwen seems to do)

#

the model won't get a positive reward signal if it's weirdly formatted if it doesn't match automated checkers unless they do generative checking as well

#

🤷 im not gonna convince you lol

ocean vortex
#

It isn't prompting in this specific case, and it's very unlikely to be sampling. If it was sampling low scores would be easy to reproduce

keen beacon
#

there's a reason qwen has explicit instructions on sampling and additional prompt instructions (for boxed) in each model README lol

ocean vortex
#

livebench here has nothing to do with it

errant rover
#

@hybrid stirrup Hey bro, could you check my message pls ? I have a question about the AI you mentioned earlier

stuck finch
#

Hello

keen beacon
mellow mango
#

یک مدل خانم در حال قدم زدن

lunar glade
#

Hello, why I kept getting "Connecting to Arena has failed. Please try again later or on a different device." ?

ocean vortex
# keen beacon 1. you dispute that specific prompt instructions, e.g. \boxed{...} instructions ...
  1. I've said formatting can lead to correct responses being counted as incorrect. Which is absolutely true. https://x.com/ArtificialAnlys/status/1909624239747182989

  2. If sampling was an issue specifically for AIME... then you could reproduce lower score easily and the model wouldn't give consistent answers. Again, sampling in general often can be an issue. But it's very unlikely to be the culprit for AIME here due to clearly defined expected answers and how the models perform.

Llama 4 Intelligence Index Update: We have now replicated Meta’s claimed values for MMLU Pro and GPQA Diamond, pushing our Intelligence Index scores for both Scout and Maverick higher

Key update details:
➤ We noted in our first post 48 hours ago that we noticed discrepancies

proven pecan
#

hello everybody!

keen beacon
# ocean vortex 1. I've said formatting can lead to correct responses being counted as incorrect...
  1. this proves my point further (their new processes on grading the answer makes this implausible). the task is simple, output the number result in \boxed{integer}... if an automated checker can't get it. the LLM would catch it. like they explicitly claimed, this extra process matches human judgement >99% of the time. lol. (this is not a particularly complex case, AIME is one of the simplest things, it's an integer answer)
  2. no. because it's stochastic. temperature flattening or sharpening the distribution has a lot of effects on sampling, beyond other sampling settings.
ocean vortex
keen beacon
#

im not saying it's sampling.

#

it could be sampling or prompt instructions or it could be all of them interacting

ocean vortex
#

🤣

#

how is it prompting???

keen beacon
#

could also be the chat template

ocean vortex
#

LMAO

keen beacon
#

ive never said it was a single factor. im saying all of those things are possible, and have proven examples to be the case lol

#

you can't stop doubling down even as you prove yourself wrong by citing AA's grading processes lol

#

a chat template also somewhat falls under prompting, the prompt the LLM reads is different as it's formatted btw. 🤷

ocean vortex
#

Well I've shown you proven example of formatting when it can be the issue lol

If it's an issue, then LLM would catch this in most cases just what we see now. But in select instances may still fail for whatever reasons. Checks out if you ask me. Most of their AIME scores look correct. But only a few aren't

keen beacon
#

that was certainly an issue before, but they added additional processes (llm grading, which they claim to work >99% of the time, which does not explain the discrepancy)

#

it is not applicable for this evaluation of this new model

ocean vortex
#

It proved my point? If it was an issue at any point

#

then it can be an issue now for different benchmark

keen beacon
ocean vortex
keen beacon
#

are you explaining the difference is because the judge failed a lot more than >99%?

#

on checking if a number is present in the answer?

#

while i have shown a lot of examples showing larger differences because of other factors

ocean vortex
#

You didn't even quote AA at all to "prove" what you were saying. You quoted livebench 🤣

keen beacon
#

wow you're an idiot fr LMAO

#

im done, people can see who's right here if they've done evals like this before

ocean vortex
little siren
#

no sound on videos?

ocean vortex
# keen beacon wow you're an idiot fr LMAO

calm down. You really can't be "proving" anything by quoting smth almost entirely unrelated, and then when I show you AA themselves stating something that directly proves my point... It's suddenly not good enough because it was in the past. And you assumption being this can't ever be a problem anymore even for a different benchmark from their testing suite. Sure pal... 🗿

echo aurora
#

Hey lets be sure to treat others with respect please, it's fine to have disagreements but let's try to be a bit nicer blobthanks

keen beacon
#

im sorry for that. just frustrated he's constantly misquoting/etc.

anyway Dom (sorry), what i'm saying:
-> could it be the grading? is it a non zero chance? yes. but i personally find it extremely unlikely (like i've said over and over). the nature of AIME is just an integer answer. they're just asking it to box it. it's quite simple to parse and check for an exact match. If that fails, a LLM judge they added recently (which claims to match human judgement >99% of the time), fails on such a simple task?
from a RLVR perspective, qwen seems to do it on \boxed{...} answers. so these things are at least implicitly penalized as it won't match automated checkers if it deviates from the answer format unless they are using generative checkers. for these problems, it's probably unlikely as the check is simple. it's a waste of compute.
-> i'm saying that it could be sampling, prompting related, etc. i'm not saying it is one of them. it could be an interaction of all of them or another thing. i find it way more plausible because there have been actual high profile events like this before on other benchmarks, and personal experience when running those type of evals.

imo, claiming that it's highly likely to be the grading is just 🤷

native coral
#

anyone knows what is so wrong with this prompt?

Overall Prompt:
 A high-angle, slightly shaky handheld long shot of two young adults
wading through a dense, vibrant green field of cassava plants under 
overcast, diffused lighting. The camera slowly zooms out and pans 
slightly to the right, following the adults as they move deeper into 
the field. The overall mood is naturalistic and serene.
Timestamp Breakdown:

00:00 - 00:02:
 The shot opens with a high-angle view, looking down on a vast, dense 
field of lush green cassava plants. Two young adults are partially 
visible amongst the leaves. One guy, in the foreground, wears a 
light-colored, long-sleeved shirt. The second guy, further back, is 
shirtless and raises a light-green basin above their head. They are both
 moving from left to right through the dense foliage.
00:02 - 00:05:
 The camera begins a slow, subtle zoom out. The guy in the foreground 
turns slightly, becoming more visible. The second guy , carrying the 
green basin, continues to move through the plants, their upper body and 
the basin visible above the leaves.
00:05 - 00:08:
 The camera continues to slowly zoom out and pans slightly to the right,
 keeping both adults in the frame. The adults have moved further 
into the field. In the background, a simple wall made of concrete blocks
 and some bare trees become visible above the cassava field.
00:08 - 00:09:
 The shot stabilizes as the adults continue their journey through the 
field. The guy in the foreground is now more clearly seen from the 
back, wearing a light-colored shirt. The second guy , still carrying 
the basin, is further to the right. Bare, brown branches are visible in 
the extreme foreground on the right side of the frame.

it gets blocked.

little siren
#

"shirtless"?

keen beacon
white hatch
#

How to send a REALLY huge message to AI?

keen beacon
woven totem
#

hi , how can i upload photos ? image to image?

#

im having problems in the web

keen beacon
#

So I first save the pics and then put them in.

dusty niche
keen beacon
#

sometimes I need to refresh the page too because of cloudflare check

#

bot check, I mean

ocean vortex
# keen beacon im sorry for that. just frustrated he's constantly misquoting/etc. anyway Dom (...

The issue with this is that you are misquoting me here in this very message yourself. I said it "could be". Assumption not much stronger than your shots at prompting or sampling which IMO don't really hold a candle given that we have their exact prompting, can't reproduce low scores, and the extent of scoring discrepancy. LLM judge catching it most of the time would be reasonably in-line with the results they are actually getting. AIME score checks out with other sources 90%+ of the time.

keen beacon
#

its implied you think its very highly likely to be the grading

#

as you considered the others not really possible

ocean vortex
keen beacon
ocean vortex
#

You kept repeating the same thing like 3 times before I finally responded to that as well lol

keen beacon
#

🤷 i've ran a bunch of math benchmarks and never had that problem on extracting integer answers that are boxed from models that wasn't quickly resolved like i mentioned. i've given you a lot of reasons on why it's not plausible that result in a lot more discrepancies with the scores.

#

lets see what artificalanalysis says if they do fix it and comment on it. i shouldve agreed to disagree there since i wont convince you

fiery lagoon
#

bro ai sucks

ocean vortex
#

so even there there is a difference

keen beacon
#

yup, that's normal

#

it doesn't matter tho, the \boxed{number} is what matters

fiery lagoon
#

opus 4.1 on lmarena sucks

ocean vortex
#

even though it looks the same when copying

keen beacon
#

the reason \boxed{...} is used for easy extraction

ocean vortex
keen beacon
#

fwiw, like i said again, i've never had formatting/extracting issues like that with AIME

#

ive run so many benchmarks on different models

#

this behavior is heavily selected out of newer heavily RLd models too

#

the prompt/sampling/etc., i've seen crazy benchmark score changes though

ocean vortex
#

if even that

keen beacon
#

this should not be a problem

#

dont they run it 10 times?

ocean vortex
#

I think it's more model specific. As in 1 model out of 10

ashen plaza
#

So what's the chances video generation ever comes to the main website?

ocean vortex
#

So you wouldn't immediatelly catch this when testing your eval scripts

keen beacon
ocean vortex
keen beacon
#

artificialanalysis does 10 repeats for aime, etc., and test way more different models. so a chance thing (for different models) is possible, but i dont find it that likely personally

keen beacon
subtle frost
#

What's the best ai rn for deep research

For a question like "Write the expected salary of an orthopedic surgeon from starting surgery to retirement"

trim trail
#

#dancing

pure comet
ocean vortex
#

It's interesting that glm4.5 and glm4.5v both got similarly low scores (so likely the same issue for both). Also a look at non-reasoning gpt model progression here cause why not...

keen beacon
serene mango
#

There is no sounds on the vidéos generated?

white hatch
#

Why claude doesn't support images in lmarena?

rustic knot
storm needle
hollow imp
#

Just guess how many people payed 12$ and got scammed

#

Mf is not even giving gpt 5 high or Claude 4 opus he is seriously giving gpt 5 chat

storm needle
# hollow imp Bro he's such a big creator

do people even still launch api scraper startups? do they actually find success anymore? it feels impossible unless you’ve landed some special discount deal with openai or anthropic. and if all they’re really doing is passing your data along, you might as well just use something like lmarena

hollow imp
#

He will succeed and scam millions

#

You can see in the comments non ai knowledge people surprised af

hollow imp
storm needle
hollow imp
#

GPT 5 CHAT

#

CHAT

jade egret
raw grove
#

do i own the commercial licence to stuff I generate using LM Arena? in particular, generations by Qwen?

echo aurora
raw grove
#

I looked for that but could not find it? would you please send a link?

proud oar
#

It's basically a "you own your stuff but we can profit from your work and theres nothing you can do about it"

#

This looks like something out of nexon's terms of service

raw grove
#

ok thanks

balmy mist
#

have yall used the new claude sub agents?

toxic quiver
#

Hello

#

Everyone

white hatch
#

yo

toxic quiver
#

How are you

golden ocean
keen beacon
#

Why didn't nobody tell me there's a Google insider in this chat

#

💀

analog ridge
#

Do the most advanced image and video generation AIs have a true understanding of depth and spatial distance? For example, can they accurately interpret a prompt like: The character is standing 50 meters away, facing the camera?

golden ocean
#

some can guess if u give common numbers like 1, 100, 1000 or 1000000 they'll now u mean small, medium or large distance but not even close to accurate

surreal creek
fiery lagoon
plain carbon
#

I never expected this to be a problem prompt, but both AI's are taking forever. It might be faster to ask here:

I am having issues with CSS. My first question is, how do I center blocks of text? If I have a paragraph being displayed with the proper width, but along the left edge, what do I need to add to get it centered?

#

(yep: "Something went wrong while generating the response. Please try again.")

wary pagoda
#

Hallow

floral ginkgo
#

Hey everyone

solid snow
#

hello just arrived here!

echo aurora
#

welcome!

patent bear
#

wassup

sly estuary
#

fix ... pls

grave burrow
#

I have access to the following models :
Gemini 2.5 Pro
GPT-5 Thinking
Claude Sonnet 4.0 Thinking
Grok 4
o3

Which one would be the best for casual research work of LLM interpretability in Python?

broken coyote
#

Gpt 5 think

simple elm
#

hi

echo aurora
high yacht
#

Hey what's up? wave_animated

brave orbit
#
poll_question_text

Whats The Best AI In every task

victor_answer_votes

19

total_votes

28

victor_answer_id

1

victor_answer_text

GPT 5 high

simple elm
#

hello 👀

earnest rover
#

@echo aurora Hey bro, I’ve been trying to find a complete list of your projects but couldn’t locate one anywhere. I’m also curious—are there any new projects in the works? And what happened to the Chatbot Arena web app? I remember it used to be a great platform.

left iron
#

hello

mild pebble
#

👋

astral eagle
#

hello

surreal creek
#

it’s ranked in all those categories from users responding to it on LMArena 😂

fossil mantle
#

hello

whole wagon
#

LM arena never works for me these days

#

Site finally loaded kek

marble crypt
#

here to generate amazing videos

keen beacon
toxic cypress
#

compair ais

whole wagon
#

I heard Gemini 3 is delayed. Lol

devout vault
obtuse heart
whole wagon
#

bruh polymarket is not working properly

#

gives this for many bets

#

including the gemini 3 one 🙁

surreal creek
#

This is normally the time they do server maintenance

obtuse heart
ocean vortex
dusk lantern
#

hi

solid brook
#

hmmm

#

i think this week is the week

#

we might get banana

full idol
#

its alternative app for polymarket

dense sphinx
#

What diffbot-small-xl model does?

#

Is anyone have question?

whole wagon
#

How tf am I already too late

#

The odds somehow crashed already for before October 31

sly estuary
#

pls fix: "Something went wrong with this response, please try again."

solid brook
#

they don't release gemini 3

#

and

#

the current gemini 2.5 is dogsht

ocean vortex
whole wagon
ocean vortex
#

it's still competing with the best

whole wagon
#

So they have to before then

solid brook
ocean vortex
#

gpt5 only narrowly beats it

ocean vortex
#

I mean the performance of it in general on various things

solid brook
whole wagon
#

It's 2nd best lol

ocean vortex
#

you need to be reading reddit less

solid brook
#

i used both gemini 2.5 pro and gpt 5

ocean vortex
whole wagon
#

On average

ocean vortex
#

Claude is worse

#

in nearly every way

torn mantle
#

there will be no agi or asi with the current methods

#

what happened to ilya one shot to asi

#

and mira

whole wagon
#

Unless agi is already here and it's just not worth much

solid brook
torn mantle
#

did you yet

whole wagon
#

Average human reads at 5th grade level or smth you know. It may be not be useful to have just agi

solid brook
#

did what?