#general

keen beacon
#

Google will not (likely) launch their 3rd generation yet

wheat onyx
#

Either give me other comparison numbers to use, or accept the ones I have

keen beacon
#

estimates project December for gemini 3

wheat onyx
#

We dont know the paid users of any of them

keen beacon
#

We do for OpenAI

wheat onyx
#

Give me Anthropic's numbers and Google's

keen beacon
#

Why?

#

Given prior releases that is the predicted date

#

Counter?

fleet lintel
#

earlier or later ?

wheat onyx
#

Sorry, I use the numbers published, not vibes

keen beacon
#

wen?

fleet lintel
#

ohk. that makes sense

keen beacon
#

Makes sense.

#

We should get a Gemma model before though

wheat onyx
fleet lintel
#

no way.. really? that seems too fast

keen beacon
wheat onyx
#

I did, you said "nah" and refused to give any alternatives

fleet lintel
#

are they not planning to release 2.5-002 like they did with 1.5 ?

keen beacon
#

No.

#

"Hero run" next

#

Demis said so on lex's podcast

#

New base modle

#

*model

#

Roadmap?

#

Do you work there?

wheat onyx
#

but who knows

keen beacon
#

I find it hard to believe that.

stray aspen
#

are we getting gpt 5 tomorrow

keen beacon
#

Lets see if we get a Gemma model this week or not

wheat onyx
keen beacon
#

Assuming they started it a few months back, it will take a while.

wheat onyx
keen beacon
#

Oh, GPT-5 will be strong

fleet lintel
#

gpt-5 has to be super strong

keen beacon
#

I just hope Gemini 3 has native tool usage

torn bison
#

GPT5 isn't great for education, I feel like it's not very good at explaining things, just like o3

keen beacon
#

For search functionalities and calculations

wheat onyx
torn bison
#

Gemini having absorbed LearnLM, performs very well in this regard

keen beacon
rough condor
#

Anyone else's videos not getting audio?

keen beacon
#

What do you mean?

torn bison
keen beacon
#

When?

torn bison
#

will it have improvements in conversation? like, improvements in tool calling won't increase the arena score

jade egret
#

genie 3 is good?

keen beacon
#

Will it perform well on search, like o3 does? (and, recently, grok 4)

wheat onyx
#

you say things as if it's a fact.

Yes, if GPT-5 is so strong that Gemini 3 looks weak, they'll probably delay X amount.

"I think they will delay 2 months" is so weird to say

fleet lintel
#

they should launch it before gpt-5... which won't happen, but otherwise another 2.5 launch would be super awkward because it won't perform better than gpt-5

wheat onyx
torn bison
#

looks like the arena #1 for August is going to GPT5

keen beacon
#

RL on tool usage is quite hard to figure out. Either it gets too domain-specific or it can't handle long horizons. Generalization is a big issue

fleet lintel
fleet lintel
keen beacon
#

Google won't release Gemini 3 unless it is industry leading.

#

They simply won't.

agile bloom
#

is GPT-5 out? available to use on LMArena?

torn bison
#

I hope they fix the over-flattering issue with 2.5 Pro. wolfstride is nice, blacktooth is even better, and both are better than 2.5 Pro anyway.

keen beacon
#

GPT-5 is yet to be released

#

Arena did have checkpoints (likely) of it for a few days.

torn bison
#

current 2.5 pro makes it impossible for me to trust any of its subjective evaluations

keen beacon
#

why is that?

#

sycophancy?

wheat onyx
#

If you have actual info, say that. If not, just own that it’s a guess.

I'm not asking you to be right, just to flag what's grounded and what's not.

agile bloom
keen beacon
#

Sam hypemen exists

#

And elon purposefully leaked a bunch of grok info

#

They do happen, often in the AI space.

wheat onyx
#

I’m not assuming no one has info. I’m saying if you're guessing, just say it's a guess.

And if you do have actual info, say that, don’t just imply it.

agile bloom
tight tide
#

Can anyone tell me how to generate videos with specific tools like Veo 3 or Hailuo 2?

wheat onyx
# keen beacon Sam hypemen exists

I think both reasoning and non reasoning will be improved, but not sure how much. I think hallucinations should be improved, which is pretty good

keen beacon
wheat onyx
#

OAI said their IMO Gold Model won't release for a few more months

agile bloom
keen beacon
#

Wait a second, in the arena o3 ranks the highest in search. However it should not have the tools to facilitate search in the API. How does the arena version function?

wheat onyx
#

ok, so you're saying you know for a fact Gemini is delayed 2 months. Good to know

agile bloom
#

okay so which is the smartest AI model right now? the one trained on the most data and with the most parameters?

wheat onyx
#

or you're just trolling

#

Ah so we've come full circle. It's just vibes

keen beacon
keen beacon
#

The chess tournament is about to start soon!

wheat onyx
wheat onyx
torn bison
keen beacon
wheat onyx
torn bison
#

summit is next level

keen beacon
#

"In the coming weeks" Quite peculiar. why would you release an inferior model only to render it useless in the coming weeks?

torn bison
wheat onyx
#

seems clear though

torn bison
#

gemini 3 pro is even more of a long shot. I'm really looking forward to seeing what cards google will play next

keen beacon
fleet lintel
keen beacon
wheat onyx
torn bison
#

I believe they've found a better way to validate rewards to scale up RL even further

fleet lintel
keen beacon
keen beacon
#

A "quickfix"

fleet lintel
#

it's fun to watch these improvements from the sidelines. But if you are an engineer working in the AI area, it is not fun. Sooo much pressure, it's crazy

patent aspen
#

Yeah basically all companies are developing multiple models in parallel

#

Some might be good but not production ready yet for a variety of reasons

wheat onyx
#

looking forward to GPT-5 tomorrow for sure

ripe mountain
#

Is the site working properly right now?

shell pewter
#

server ded? huh

torn bison
#

before GPT5 I would have said kingfall was the undisputed king, but after using summit I think they each have their own strengths. And GPT's strengths are something the current generation of Gemini can absolutely not catch up to

undone crane
#

I'm from Spain!! Thank you so much to all the LMArena staff!!!

gilded ingot
#

hey guys is the site down for anyone else?

cyan zodiac
shell pewter
#

yea down for me too

gilded ingot
#

thanks, thought it was just me. I was working on a project and my chat suddenly vanished

stray aspen
#

lmarena is down

shell pewter
iron meadow
#

@echo aurora

torn bison
#

Yeah that's one of kingfall's strengths. It's more comprehensive and often more humanlike

sick chasm
echo aurora
#

Thank you for the flag. Escalating now.

shell pewter
#

🙏

stray aspen
#

does anyone have deepseek r2 news

ripe mountain
#

which ai do you think is the best right now? im torn between horizon beta and gemini 2.5 pro

weak sluice
#

i was literally in the middle of typing.....dangit

sterile ingot
#

Is the website down rn?

ripe mountain
weak sluice
#

yes

exotic gust
worldly plume
#

Same

echo aurora
#

Yup, we are having an outage, the team is working on it

sterile ingot
#

I thought it was only me.

echo aurora
#

So sorry everyone!

torn bison
obsidian cargo
#

phew. joined discord to check it out, thought I got banned because a (very innocuous) prompt I was submitting somehow broke TOS

iron meadow
#

opus 4 is still my favorite by far

#

4-1 tends to get a bit dramatic

#

especially without reasoning

worldly plume
#

I hope my chats won't be gone after the bug fix

summer valley
#

yes

sterile ingot
random wolf
#

what happened guys? what's the problem?

ripe mountain
sterile ingot
worldly plume
iron meadow
echo aurora
shell pewter
#

guys, refresh, it's back up

stray aspen
#

lmarena is live again

iron meadow
#

Its back up

random wolf
iron meadow
#

Confirmed

sterile ingot
shell pewter
#

yay

worldly plume
#

Yeeees, my chats still here!!

ripe mountain
torn bison
sterile ingot
iron meadow
#

2.5 pro is one of the worst models i've used. I only use it for youtube videos and processing huge context windows at this point.

iron meadow
shell pewter
#

o3 and gemini 2.5 pro are doing pretty good for me

cyan zodiac
ripe mountain
stray aspen
#

i noticed that yesterday

obsidian shell
#

that's the first I'm hearing of it

stray aspen
#

for Roblox

iron meadow
#

Roblox, Neovim

#

Lua is a nice language for its usecases

ripe mountain
#

grok 4 is the most overrated model btw

mint jungle
#

webpage returns, thanks.

obsidian shell
#

i get it but why deepseek?

why not claude?

swe bench is a clear indicator

iron meadow
#

When you have alternatives yeah?

ripe mountain
obsidian shell
#

poor thing...

#

vpn!

stray aspen
#

nah deepseek is good for me

iron meadow
#

Claude-4-opus-thinking is so much better than opus 4-1 thinking in the claude app 😭

stray aspen
#

it solved a lua problem before opus 4 and gemini 2.5 pro could yesterday

#

i tried it on lmarena

iron meadow
#

Man

sacred quail
#

Then you are wrong

#

2.5 is good

iron meadow
#

Well.

cedar tide
#

🎬 The Video Arena Leaderboard is now live!

14,000+ community votes have ranked the top Text-to-Video and Image-to-Video models.

📝 Text-to-Video rankings:

- #1 Veo3 (audio on)
- #3 Veo3, Veo3-fast
- #5 Hailuo 02 [Standard], Seedance 1.0 pro
- #6 Kling 2.1 Master
- #9 Wan 2.2 A14B
- #11 Pika 2.2, Mochi 1

Big congrats to @GoogleDeepMind, @Hailuo_AI, Bytedance, @Kling_ai, @Alibaba_Wan, @pika_labs, and @genmoai!

golden ocean
ripe mountain
#

gpt 4o or gemini 2.5 pro? which is better than?

keen beacon
#

Sometimes i wonder how much data there is on these discord channels that gets lost. Are AI companies using this?

golden ocean
#

obviously not

#

no frontier models can perfectly imitate online chat platforms or even realistic conversations without fine tuning

wheat onyx
cedar tide
#

🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready!

🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following.

🔹 Thinking: Advanced reasoning in logic, math, science & code — built for expert-level tasks.

Both models are more aligned, more capable, and more context-aware.

Huggingface:
huggingface.co/Qwen/Qwen3-4B-Instruct-2507
huggingface.co/Qwen/Qwen3-4B-Thinking-2507
ModelScope:
modelscope.cn/models/Qwen/Qwen3-4B-Instruct-2507
modelscope.cn/models/Qwen/Qwen3-4B-Thinking-2507

wheat onyx
random wolf
#

is there any solution to cancel when generating? it's so frustrating

muted vault
#

why does the audio not work

random wolf
#

guys help huhu is there any solution to cancel when generating? it's so frustrating😔

exotic nebula
maiden fulcrum
#

nice video leaderboards

buoyant helm
maiden fulcrum
#

i really think that seedance 1.0 pro should be number one

#

it beats all models in i2v

rare python
#

with and without audio

#

fast and normal

buoyant helm
#

^

rare python
#

the only video model that has sound iirc

muted vault
rare python
#

lmarena you can custom prompt the models. In AA you just watch someone else prompt I guess

#

So more diverse style

#

¯\_(ツ)_/¯

#

which website? AA or lmarena?

buoyant helm
cedar tide
mystic frigate
#

its actually tomorrow omg

rare python
#

They haven't released on the web

wheat onyx
storm needle
echo aurora
gentle plinth
#

So the stream will be on August 7, 2025 at 17:00 UTC

maiden fulcrum
#

1PM EST

#

Do you guys think GPT-5 will beat Gemini 2.5 Pro, Grok 4 Heavy and o3-pro across all benchmarks?

gentle plinth
#

I think so, but more important is the question if the model is actually good, or just benchmaxxing

raven helm
#

Confirmed

maiden fulcrum
torn mantle
#

what did i say

#

who predicted that?

#

me

#

and me

raven helm
torn mantle
#

and me

wheat onyx
raven helm
stray aspen
maiden fulcrum
raven helm
#

posted 10 min ago

gentle plinth
#

At least that's how I understand it

maiden fulcrum
#

Oh, I see

#

What is 'good model' to you?

iron meadow
#

LIVE5TREAM

raven helm
wheat onyx
gentle plinth
# maiden fulcrum What is 'good model' to you?

A model which is good at different real world use cases, such as finding bugs outside of public benchmarks, writing clean code for tasks it hadn't been trained on, and finding good solutions to problems

maiden fulcrum
raven helm
#

yea

maiden fulcrum
#

When?

wheat onyx
wheat onyx
raven helm
#

Oh, i didnt notice that

stray aspen
#

is it a typo or are they referring to gpt 5

obsidian cargo
#

Anyone know if zenith was gpt-5 yet?

#

Wish I got more of a chance to try zenith before it got removed

wheat onyx
maiden fulcrum
#

All I know is that there will be a great video model coming by the end of this month

gentle plinth
maiden fulcrum
#

But I cannot say by who yet.

maiden fulcrum
raven helm
wheat onyx
maiden fulcrum
iron meadow
#

@echo aurora Can you help me decipher what is wrong with a message I am trying to use as a benchmark for models? I get "something went wrong while generating the response". I'd prefer if it was sent privately, not sure why this server doesn't have a ticket system yet

torn mantle
#

i wish

#

i would've earned quite the sum

#

i was actually right many times

raven helm
#

@echo aurora What do you think about GPT-5?

echo aurora
torn mantle
#

in grok 4 not topping lmarena way way before the release

#

no you didnt

#

stop lying

#

omg

#

...

#

what else did i predict

#

hmmm

#

lets see

#

what

#

no

iron meadow
#

Sent!

raven helm
stray aspen
wheat onyx
# stray aspen

stop overthinking it. It's clearly a typo. Tweets can't be deleted

agile bloom
quartz light
#

@echo aurora 4.1 direct when

torn mantle
whole wagon
#

It obviously is gpt5. That is obvious even without any hint

obsidian cargo
#

Gpt 4o.5

#

That's what they're hinting at

#

Or maybe o4.5

wheat onyx
#

Impossible

whole wagon
#

Bruh

#

Have you been under a rock past few weeks

wheat onyx
stray aspen
#

i could have never thought of that

#

i thought they were hinting grok 5

tame horizon
#

Listen, Claude 4.1 isn't there anymore. Yesterday I was talking to it, and now it's not showing up. I want

#

Guys, they removed it, they removed it, why can't they put it back?

torn mantle
#

@patent aspen gemini 3 soon or nah?

tame horizon
stray aspen
#

they have to respond to gpt -5

tame horizon
stray aspen
#

yes

tame horizon
#

how are you?

stray aspen
#

im chilling bud

tame horizon
#

hehehe cool 😎

#

Look, were you also able to talk to Claude 4.1?

tame horizon
torn mantle
#

i see

brisk helm
#

yo for the video arena, can they make it so that u can choose which model u wanna use

echo aurora
tame horizon
#

I'm just a user here! What are the canonical movie scenes "who are you"? - what should I say "Jesus"? , kiss my bro I'm just another one wanting to contribute and be your friend

#

@torn mantle

echo aurora
tame horizon
#

how are you @torn mantle

tame horizon
obsidian cargo
#

with my mouth, as usual

tame horizon
#

Wow, I'm missing the notification feature. It should already be showing who wants to chat with me?

echo aurora
#

Lets try to keep convo related to AI please.

brisk helm
#

just a quick question to the ppl that run lmarena. how do u guys freely give access to premium ai models on the webpage?

brisk helm
#

its a bot so yeah an ai i think

tame horizon
#

no bro it's me jajjjajajaj

tame horizon
glad perch
#

That feeling when you go see your generated videos only to find people have already voted and the models are revealed 😭😞

echo aurora
whole wagon
brisk helm
whole wagon
#

Well they are getting smth in return

#

Sigh more bs from scam altman kek. GPT5 is not AGI lol

#

"smarter than the smartest person" bruh

#

These claims get more outlandish every time

ornate stump
#

smarter than the smarter smarties

elfin herald
#

is claude opus 4.1 not available?
i cant see it

obsidian shell
#

claude is expensive as ssssss

they don't want us to get it for free so they took it off direct and moved it to battle only

elfin herald
#

isnt it the same price as opus 4

obsidian shell
#

in the api probably

but anthropic has to capitalize on its release

it won't do it by making it free for testing

#

maybe in a week or two

obsidian cargo
#

Do you guys think we might get gpt-image-2 tomorrow too?

#

or at least an update to gpt-image-1?
or am I huffing copium?

obsidian shell
#

probably not

#

guessing wont get us far

it will come when it will come

torn mantle
#

are you using an agent to communicate with us?

#

like an agentic browser like comet

leaden palm
tame horizon
#

@torn mantle I'm not anymore, this will be under construction soon as soon as the current project is finished.

#

Several people have asked me this

#

I think it's funny

hallow ridge
#

How can I take away the restrictions

stray aspen
#

something big is coming

hallow ridge
wheat onyx
tame horizon
keen beacon
#

Staff at OpenAI robotics tweeted about GPT 5 maybe. what?

#

How could it be robotics related???

tame horizon
wheat onyx
novel flame
stray aspen
novel flame
cedar tide
#

@echo aurora why OSS 120b removed from webdev ?

#

And 20b from arena

torn mantle
torn mantle
torn mantle
echo aurora
keen beacon
#

Dude there is no Veo 3

obsidian cargo
#

dang new models but I ran out of daily generations a few hours ago ehehe

echo aurora
ocean vortex
lime coral
warm fulcrum
#

which discord server tells you about newly added ai models

stray aspen
#

no

#

very interesting author gemini

#

why is it calling itself gpt 4

#

hell yes

#

or i hope so

wheat onyx
stray aspen
#

thats crazy

wheat onyx
bright kayak
wheat onyx
#

I think it's probably safe to ignore it, just thought it was interesting

stray aspen
#

claude 4.1 is on livebench

terse shuttle
wheat onyx
#

Claude was only ever for coding

iron meadow
#

what is it?

bright kayak
#

new layout

wheat onyx
#

Anthropic said they have some big upgrades in the coming weeks. We will see

bright kayak
bright kayak
wheat onyx
#

Anthropic has big updates in a few weeks. As long as they beat GPT-5, they'll be fine. If not, it will be very difficult for them

iron meadow
#

Too many buzzwords 😭

#

Website spams the word "vibe"

#

Not tasteful to me

bright kayak
#

does this edit look realistic?

iron meadow
#

You can copy the css directly from the site

bright kayak
#

i literally just changed the path of the jpg, to gpt-5

#

i guess they just change their styles often

hardy pecan
bright kayak
#

i mean, i only changed the image, nothing else, so any css would carry over

hardy pecan
#

I'm giving it ~70%, which is by no means underestimation, its really strong from my testing, agreed

#

Excited for it

#

gulp

civic flame
#

lol

rough nimbus
#

hello frends

bright kayak
#

this is my edit, using their official bg images

rough nimbus
#

very happy to be here with you ❤️

echo aurora
dim heron
#

Can someone not generate video with voice over here?

iron cipher
#

on LMArena

bright kayak
#

people talk a lot about wanting x and y to help with coding, all i need it to do is to help me troubleshoot

bright kayak
#

no, tomorrow is the livestream

blazing bison
#

They gonna change the name of gpt 4o to gpt 5

#

Yeah

bright kayak
#

50% of people wouldn't be able to tell the difference or care

blazing bison
#

For sure, if their router model is good enough to identify hard questions

#

Oh my god I have to send it again

civic flame
#

buddy there's literally a model with the slug gpt-5-auto

#

on the api it isn't a router, but in chatGPT there will be a router (with the ability to force reasoning for subscribers)

blazing bison
#

Do you know what core model means?

#

????

#

The info that gpt 5 is a unified model is older than that

#

I'm?

#

Lol

#

No it's not

patent aspen
#

It's not just routing but there is a router

blazing bison
#

Yes you're right

#

Yes brian

#

Exactly

#

Yes there is a router and there is new models too

#

I never said otherwise

#

But for most requests, people will get routed to a gpt 4o

#

There is lol

#

When I say gpt 4o , im talking about gpt 4o level model

#

You don't know me bro

stray aspen
#

craig will gpt 5 be the SoTA when it releases

blazing bison
#

Lmao

patent aspen
#

I think in general you have to assume that any high volume all in one chat app is going to have some routing involved, although the line between routing and not routing is going to be blurry because there will also be a lot of shared state

bright kayak
#

I get why people fake screenshots, it's fun

blazing bison
#

I never said that

patent aspen
#

Yeah I know. Most people don't think in shades of gray

blazing bison
#

Its not

#

It just the 4o model

#

The thinking is a summary bug in the frontend

#

Idk I use the api most of the time

patent aspen
#

Jules is out of beta

blazing bison
#

Gemini 2.5 pro

#

Codex but from google

willow grail
blazing bison
#

Another fake bench

#

Lesgoo

solar hollow
#

yeah lets just wait for the release

willow grail
bright kayak
#

of course it's fake

#

the image is deceiving

willow grail
barren prairie
#

Gemini pro 2.5 VS GLM 4.5 coding capacities test 🔥⚔️

Prompt:

Create a 3D educational HTML game: A metro train moves on visible tracks across a green field under a blue sky, passing through 10 stations. At each station, a quiz question appears with multiple-choice buttons. The train only continues to the next station if the player answers correctly; otherwise, it stops until the correct answer is selected. Include background buildings, animated grass, and a realistic sky. Add UI buttons for controlling the metro's movement (Start, Stop, Reset). The scene should be playful, colorful, and child-friendly, with smooth transitions and immersive 3D visuals.

1st one is for Gemini pro 2.5

https://g.co/gemini/share/f4e1c8d337a3

Gemini pro failed it and made tons of errors, so I couldn't continue the design with it. You can see that from the first prompt it made a bug in the buttons that it couldn't fix

2nd one is for GLM4.5

https://chat.z.ai/s/5d97a31b-0071-445e-8b7d-5f1f711388f4

I tried to make the prompts harder and harder each time to make it produce one error but ...

It wrote 1638 lines with one very small mistake that it corrected perfectly

It started to bug out after this convo, starting from code line 1694..

willow grail
#

just use normal simplebench questions and use gpt5 on ms copilot

bright kayak
# bright kayak

i messed up the comments and bottom posts time but otherwise i think this looks pretty realistic

burnt sinew
#

can we do text-to-video on lmarena website?

echo aurora
wicked root
wicked root
bright kayak
#

because they said it was tested on only the public set

small haven
#

what are the odds that o4 is integrated into gpt5

torn mantle
burnt sinew
glacial swift
#

🙂

maiden fulcrum
#

18 Hours left until GPT-5

stray aspen
#

How do you use gpt 5 on copilot

obsidian shell
#

you dont

#

i think its still on 4o

strange briar
#

hi

echo aurora
cedar tide
#

the shared results are too low compared to those shared by open ai

#

The API doesn't work well now

#

math arena got 91% on aime 25 in local run, much better than artificial analysis

stray aspen
#

whats this

shadow jewel
#

we love lmarena 🙏

wicked root
#

Any news on gpt5?

crimson monolith
#

Hello

stray aspen
#

didnt think i would ever say this

#

but i think gpt oss 120 actually cooked me a good script

#

at least better than 2.5 pro

little narwhal
#

GPT-1984

#

Amirite

static portal
#

is it just claude with the issue with randomly thinking forever or other models are the same?

wicked root
whole wagon
#

horizon beta is absolutely insane

#

This is crazy good

#

yea

#

the full one?

wicked root
#

👀

whole wagon
#

its incredible

#

Literally destroying all my questions after like 3 seconds thinking

#

o3 pro cant get these right

solar hollow
#

if you are biggest stack with decent margin, you can push much more

#

in any other situation you gotta be very tight

#

especially as a mid stack size

#

none would be pushable in fact im pretty sure

#

of course you can mix your strategy and min open as well

#

which often will be better

#

as a mid stack you could only push AA probably, barely KK even

#

but you can open with more hands for 2bb

#

of course the prize table matters too

#

if it is very top heavy, you get to push more

whole wagon
#

really nice

#

GPT5 will be SOTA by a large margin. For sure

#

I wonder how long until GPT6 though. The 2 year cadence no longer works they need to shorten the gap between releases

red sluice
#

Hopefully they'll make the API available asap so we can test it

whole wagon
#

This will be SOTA for about 6 months i think

stray aspen
#

i love horizon beta

jade egret
#

do yall think gpt 5 will be better than claude 4.1 opus at coding?

whole wagon
stray aspen
#

claude 4.1 is literally opus 4 but 2% better

earnest swift
#

what is the maximum number of videos i can ask for in arena 1?

stray aspen
#

not again

#

@echo aurora

earnest swift
#

a day?

stray aspen
#

yes

#

maybe if it didnt have north korean level censorship

somber niche
#

Technically not a false statement

stray aspen
#

who makes open source models in america

#

thats a china thing

earnest swift
#

how do i get sound in the video

stray aspen
#

you might get veo 3

#

but its not guaranteed

cedar tide
stray aspen
#

were they using the openrouter API before?

hardy pecan
stray aspen
#

google has to lock in

vital solar
#

helo

obsidian shell
#

scam altman better announce at least an update to the o models if not gpt5...

jade egret
copper ruin
#

hello

sharp tiger
#

@echo aurora

echo aurora
sharp tiger
echo aurora
#

sure

leaden palm
whole wagon
#

They've actually been removing them off hugging face. There was other ones made that got removed

wicked root
#

Are we sure gpt5’s being released today?

raven helm
raven helm
solid brook
wicked root
#

I hope it doesnt

harsh flume
# raven helm

That doesn't mean anything if it was done by a third party running their public dataset

#

classic data leakage fallacy

raven helm
#

mmmm

leaden palm
wicked root
leaden palm
novel flame
# stray aspen whats this

A new model architecture from Z.ai -- not an LLM / chatbot, it's built to perform reasoning and planning in latent space, and it performs very well on ARC-AGI. Not SoTA, but it does incredibly well for its minuscule size. As I understand it, HRM works by having two 'recurrent' transformer blocks, one fast and cheap, the other slower and more competent, and the 'high' one oversees the progress of the 'low' one and steers it. It's a novel and very interesting approach.

whole wagon
#

openAI 120B open source model with reasoning even below many without reasoning

whole wagon
#

Idk how the model is so fried

#

Really don't get it. Like the only way you get performance like this is if you tried to

lethal oracle
#

Hello guys

#

Can someone pls help

tropic mesa
#

i get veo 3 but it doesnt have audio, i have done 5 videos

#

how to fix it

lethal oracle
#

Bruh

ruby smelt
#

@leaden palm hey, i am not able to create videos in the video arena. It's saying- the application did not respond

leaden palm
#

problematic

#

try again?

ruby smelt
#

It's working now! You did something?

leaden palm
#

no

#

that was just the first thought that came to mind

ruby smelt
#

It wasn't working before. I have been trying for 10 minutes. Thanks 🙌🏻

brittle nimbus
#

hello word 🙂

dusky aurora
spring turtle
#

Can someone explain why this prompt violates the TOU?

novel sigil
#

Does this mean adding some features of the format to train the BT model to calculate the elo score?

#

Please answer. Thank you.

ocean vortex
#

It may have been updated since though

high fossil
#

Hey!

ocean vortex
novel sigil
# ocean vortex

So is this code “remove style control”? I hope to receive your reply. Thank you

ocean vortex
ocean vortex
novel sigil
exotic tartan
#

can someone convince me why OSS is worth running on my machine? this is soooo dogshit

#

also, i gave it another shot just to be sure. it gave me a wrong answer

wheat onyx
keen beacon
#

I have found OSS to be extremely underwhelming

exotic tartan
#

I know I can run qwen3, I'm just wondering why the hype around OSS if it's literally unusable

#

am i doing anything wrong? is ollama not really compatible with it or something?

keen beacon
#

looks sleek

#

for LLMs

exotic tartan
#

It's called Ollama, you can run LLMs locally with it

novel sigil
#

I choose GLM-4.5

exotic tartan
#

It's very sleek, but almost no configurations

keen beacon
#

I've been using LM Studio

exotic tartan
#

I used it because it allows for easy web search implementation which I couldn't get in LM Studio, but maybe it's fixable somehow

#

It's just based on Ollama CLI, but the app is different. go to ollama.com

exotic tartan
#

For sure. Let me know if you know how to enable web tooling in LM Studio

keen beacon
#

ahh it was released recently

#

no wonder I haven't known about the app

#

lol

exotic tartan
#

I find the answers to be much slower and worse compared to LM Studio for some reason

keen beacon
#

would be real nice

exotic tartan
#

They seem to be really far from this, but yeah would be amazing

ocean vortex
#

was this benchmaxxed to oblivion...

#

gpt-oss

pale sable
#

Hey guys whats the best way to make money with ai

exotic tartan
ocean vortex
#

Goes to show the edge case of those benchmarks being the least effective... Since their only performance line is those scores dropping. And those benchmarks are not good enough to control model size

keen beacon
#

I write the most absurd stuff to test the reasoning

#

with all kinds of models

#

lol

exotic tartan
#

I find LM Arena to be the only benchmark I care about

#

The ranking is pretty much how I feel usually about the models. Human vetting is still king

ocean vortex
#

And like, do you think 4o is better than all those models below it...?

exotic tartan
#

What's this leaderboard? text?

ocean vortex
#

yes

#

lmarena main text leaderboard

exotic tartan
#

I definitely do think it's better than them at text

keen beacon
pale sable
#

Im thinking of making an ai model and sell her pics on OF

exotic tartan
#

Yassin, go away bro

ocean vortex
exotic tartan
#

So explain why people think it's better?

#

And define text performance. I think it's a mixture of accuracy, style, speed etc

ocean vortex
exotic tartan
#

If it wouldn't be accurate, people would choose the accurate answer way before how stylized it is

ocean vortex
#

It's also good at convincing - but that's also not an indicator of performance 👀

ocean vortex
exotic tartan
#

I'm getting hallucinations from all text based LLMs... it's not unique to 4o

ocean vortex
#

and is often not factual

keen beacon
ocean vortex
#

People have no clue what the correct answer is, both look "similar enough" but one response looks more convincing... that's how this works

#

well, not always, but a lot of it is this.

exotic tartan
#

People are varied and not as dumb as you make them to be
I agree that style is a factor and that people prefer 4o style, but I just don't think it's as black and white as you make it sound

ocean vortex
exotic tartan
#

yup, and as I said, human prefer not just style, but also accuracy, speed, etc

keen beacon
exotic tartan
#

Answer A is totally wrong, answer B is less wrong. I prefer B.
Not every question I ask a text based LLM is something I don't know anything about... sometimes the test is asking it about niche subjects i DO know about.

ocean vortex
# exotic tartan yup, and as I said, human prefer not just style, but also accuracy, speed, etc

Speed is equalising, so it's mostly not a factor. Accuracy... once again, that's way, way less of a factor and carries less factual weight than in conventional benchmarks. Even if the model output is completely wrong, it can still win the user over and get the vote through sycophancy, style, manipulation (a strong, negative word, but there can be some of that, in the sense of it exploiting the kind of text people generally like seeing) or whatever else... 👀

exotic tartan
#

I feel like I'm repeating myself to be honest
It's okay to not agree 🙂

#

A model can sound as convincing as it wishes... it's easy to spot some hallucinations. Also if you get 2 completely different answers on a subject you know nothing about, picking based on perceived accuracy without you knowing the actual answer is lazy voting

ocean vortex
keen beacon
#

That is hilarious

exotic tartan
#

Right, that's why we have thousands of votes and not just 14

ocean vortex
#

It's only natural and what happens kinda by design... when everything revolves around *preference*

exotic tartan
#

Truth is dynamic and subjective 🙂

ocean vortex
#

yeah and even if we don't see them as "poor judges", they will never be as good at assessing accuracy as curated tests with verified answers by industry experts are.

exotic tartan
#

I mean we can go deep into an Adderall debate about life after death, existence of god, belief systems, crypto etc
There are ongoing debates about these subjects with no hard 'truths'

#

No one can convince me there is or isn't life after death - we just don't know

ocean vortex
#

Ok change of subject, I wonder why OpenAI ditched their yap score for gpt-oss... catgrin

pure anvil
#

openai couldn't even release a half usable open weights model, so trash

brave orbit
#
Poll: What Is The Browser You Love The Most

Votes for the winning answer: 1

Total votes: 3

ocean vortex
exotic tartan
#

That's why I keep saying benchmarks suck.. feels disconnected from how they act in real life. Also show me the phone that runs OSS lol

keen fulcrum
#

Can lmarena pay more attention to search arena improvements? This area is often neglected

autumn blaze
#

Can anyone help me? I posted a prompt in the video arena more than 30 mins ago and it's still showing that it's generating. I generated 2 videos before that stuck prompt and one more after it, and everything was fine with no errors, but this one is taking a long time. [I asked for a 5-minute video.] Is that causing an error or making it slow?

willow grail
#

if gpt5 comes today, how can i access it via sub in europe?

#

do i need a non-eu cc, or just vpn with german cc?

#

or does a vpn only work if i pay with something else, like paypal?

exotic tartan
#

why do you expect usage issues in Europe?

willow grail
#

lol

wheat onyx
ocean vortex
willow grail
#

eu has some ai regulations

ocean vortex
#

I am from Europe

willow grail
#

no idea how easy/hard it is to follow the eu rules

ocean vortex
#

They don't have delays for model releases

#

only certain features and agents

willow grail
#

sora was delayed a lot

#

i think o1/o3 too

ocean vortex
#

Well I meant LLMs

ocean vortex
willow grail
#

these are llms too

ocean vortex
#

I was using it on release

willow grail
#

then u dont eu

ocean vortex
#

I EU

#

I would know

#

lol

#

@willow grail have you ever used OpenAI playground?

#

Models there become available as soon as in the US, for the most part

willow grail
ocean vortex
#

well... But it's the way to go if you want to test new models.

willow grail
#

i hate testing stuff.. via api

#

gives me nightmares

ocean vortex
ocean vortex
blissful vine
#

how can we choose the model ?

ocean vortex
blissful vine
#

meaning? I want to choose veo3 and runway ?

prime mulch
quartz light
#

is gemini 3 releasing today too?

prime mulch
#

Again same issue 😭

quartz light
prime mulch
#

Chrome

quartz light
#

you should use edge canary

prime mulch
#

I was using it normally and suddenly my session got deleted

ocean vortex
quartz light
prime mulch
#

Is it good?

quartz light
#

yes

prime mulch
#

I got this message suddenly

quartz light
#

it has chrome extension support which is rare

prime mulch
#

Can u try to access lm arena?

quartz light
quartz light
#

PC

#

thorium

prime mulch
quartz light
prime mulch
quartz light
prime mulch
quartz light
prime mulch
quartz light
prime mulch
#

How is it?

quartz light
#

good, would work as wallpaper for phones

prime mulch
#

Yea i created this

devout vault
#

i cant wait for gpt-5 bro it's releasing today

prime mulch
#

And i have a little wallpaper channel, but it has no views yet; i'm waiting for it to grow

prime mulch
#

What about this

#

This is my masterpiece

rocky mauve
#

What’s the latest ai model available on lmarena?

prime mulch
gusty helm
#

isnt this just a countdown to livestream?

quartz light
gusty helm
#

does not seem official by any means KEKW

quartz light
#

my friend generated

quartz light
#

😭

devout vault
#

"LIVE 5 STREAM"

gusty helm
#

I mean sure, the X post

#

but the site above seems misleading

quartz light
devout vault
gusty helm
#

ok

warm fulcrum
quartz light
warm fulcrum
#

wowwwww

quartz light
bold tiger
#

hi

willow grail
#

api is too expensive.

#

i am right, you lose.

quartz light
willow grail
#

not interested in boring stuff

willow grail
#

QOL issues.

torn mantle
#

1H LEFT

#

FOR GPT5 STREAM

#

😄

willow grail
willow grail
#

u ok girl?

quartz light
willow grail
#

4 hours..

#

girl

#

not 1hour

quartz light
#

yeah

willow grail
#

pls visit doctor

quartz light
#

lol

torn mantle
#

ah

#

sorry

willow grail
#

@torn mantle pls visit doctor

#

sorry, wont send doctor

quartz light
willow grail
#

to u

#

cognitive dissonance asura girl

willow grail
#

no i dont.

#

i feel trash

willow grail
#

also asura is a bad server from THE ISLE

#

they do KOS (kill on sight) all the time

#

so annoying

ocean vortex
quartz light
#

Qwen image gen is pretty good

stray aspen
#

Gpt 5 is almost out

willow grail
#

or gemini 2.5 pro

#

or opus 4

ocean vortex
white hatch
#

hell yeaaaaaaaaaah

willow grail
#

no

#

yeaaaaaaaaaaah

ocean vortex
willow grail
#

ITS A BODYBUILDER

stray aspen
#

No

#

@willow grail how expensive will gpt 5 be

willow grail
#

very

willow grail
#

via api

keen beacon
#

Then Altman says "This is the way to feel the AGI"

prime mulch
#

Gpt 5 will change the world views about ai

stray aspen
keen beacon
prime mulch
#

All HAIL AGI

quartz light
#

gggguys just make agi by making it self train on data from internet!!!!!!!....1!!!

prime mulch
#

People don't realise how powerful AI will get with agi

willow grail
#

ill just play rail route. wait for gpt5. be disappointed that it still cant make video games.
and continue playing RAIL ROUTE

stray aspen
#

Do you think gpt 5 will be AGI

quartz light
#

no

prime mulch
#

Nah

keen beacon
keen beacon
prime mulch
#

It will be one of the most powerful llms, not agi

stray aspen
willow grail
#

RE READ THIS

quartz light
stray aspen
#

Hell yes

prime mulch
trail wagon
#

Prompt: generate video with both text about Russia. Duration 8 second

keen beacon
quartz light
#

or something idk

keen beacon
quartz light
#

100%

#

have you seen the leak

keen beacon
warm fulcrum
#

theres been like 20 leaks already

keen beacon
quartz light
keen beacon
#

I like to keep it as a surprise for myself

warm fulcrum
quartz light
warm fulcrum
#

does it just go out of frame

willow grail
quartz light
willow grail
#

R

#

L

quartz light
# willow grail URL NOW
rxddit.com


Comment by u/testmath:
I did "Generate an SVG of a pelican riding a bicycle" and this is what it did, seems like the real deal to me:

https://preview.redd.it/int18mghqegf1.png?width=2560&format=png&auto=webp&s=7f18ea681937c8e09ef48e8006ea5e436a77343e

---- Original Post ----

Using the model `gpt-5-bench-chatcompletio...

willow grail
#

good boy pat pat

quartz light
#

😭

willow grail
#

i am literally complimenting you?!

quartz light
#

😡😡😡😡11!!!!!11!11!

stray aspen
#

@willow grail are you Belarusian

willow grail
quartz light
willow grail
#

i am croatian

quartz light
#

crowatia

mortal quest
#

GM

willow grail
novel flame
willow grail
#

oh u think its only 65% or so? ...... i hope its at human baseline

willow grail
#

i am soon dead. i want immortality tech now. i have no time

#

i am 32

#

i dont have time for 1% per year

keen beacon
#

and research

warm fulcrum
#

wow

#

ngl we need gpt 6 now

willow grail
#

china so far only delivers vomit: bad robots that they send to various events and act like they're autonomous, but it's really just a simple animation baked into their bad text models nobody uses

willow grail
#

top of what?

warm fulcrum
#

everything

novel flame
willow grail
#

how many people die per 1000 residents because china quality is trash?

#

buildings are made of paper, trains crash daily, bridges crashes, fake robots just doing baked in animation

willow grail
#

what is wrong about what i said?

willow grail
#

china based companies and ceos will create bad products

#

if i am a china man i wont care about investing into the product, it can break down next day. like a big 100 floor building

#

if i am any other country man i will care about quality

lime coral
#

They ship better models than gpt-oss at least

willow grail
#

uhm its a fact that in china many more things break down than in usa or europe

#

and people here telling me china is the chad

#

🐒

warm fulcrum
#

why so rude against china town

willow grail
warm fulcrum
#

china #1 country

#

china make new technology everyday

willow grail
#

nono india number one

warm fulcrum
#

india bad

willow grail
#

racist

echo aurora
#

Lets keep conversation focussed on AI and respectful please.

eternal niche
willow grail
#

2 hours

warm fulcrum
willow grail
#

INDIA

stray aspen
#

gpt 5 is out in 3 hours

keen beacon
#

yay

#

quite exciting

willow grail
#

yeah cause agentic

stray aspen
#

@willow grail are you croatian

willow grail
#

yes

prime mulch
#

I think china will release a better open-source version of gpt 5 after some months

cedar tide
stray aspen
#

are you serious david

ocean vortex
#

I think it's unlikely China will manufacture its own chips able to compete with the current best anytime soon tbh

stray aspen
#

will we have gpt-5 pro high in the arena

echo aurora
solid brook
#

excellent

ocean vortex
#

Not really possible unless they start inventing things themselves

#

also corruption...

#

You can't beat competition by constantly trying to replicate whatever they are doing being 2 steps behind

torn mantle
#

i agree

#

with that

#

craig

#

first time

#

being right

#

hope not