#general | Arena | Page 67

wind moth Jul 10, 2025, 3:53 AM

#

questions to ask

#

they should just premiere

whole wagon Jul 10, 2025, 3:53 AM

#

It sometimes randomly has nazi outbursts it's not just political questions

wind moth Jul 10, 2025, 3:53 AM

#

a livestream

hollow ocean Jul 10, 2025, 3:53 AM

#

guys did it start yet

wind moth Jul 10, 2025, 3:53 AM

#

like what many companies do

empty stump Jul 10, 2025, 3:53 AM

#

hollow ocean guys did it start yet

No

whole wagon Jul 10, 2025, 3:54 AM

#

Like the other day it was flaming literally random Jewish ppl on X. Since they had Jewish surnames

zenith saffron Jul 10, 2025, 3:54 AM

#

yea agree they should just go full hackathon mode and just replay things

#

not actually use it

whole wagon Jul 10, 2025, 3:54 AM

#

Without any political prompt

whole sundial Jul 10, 2025, 3:54 AM

#

supergrok heavy is not sold out anymore

#

(don't worry, i don't have $3,000 to spend. also, "SuperGrokPro"? I thought this was SuperGrok Heavy!)

#

they must have changed the name in the last second

whole wagon Jul 10, 2025, 3:56 AM

#

Bets on it starting on the hour?

north vale Jul 10, 2025, 3:58 AM

#

which hour of the night can it start at to minimize viewership

#

that's where my money is

whole sundial Jul 10, 2025, 3:58 AM

#

meanwhile someone on the grok discord accidentally bought SuperGrok Heavy

#

whole wagon Jul 10, 2025, 4:00 AM

#

You can just downgrade it and it'll refund the diff scaled by days

whole sundial Jul 10, 2025, 4:00 AM

#

wind moth Jul 10, 2025, 4:01 AM

#

https://x.com/i/broadcasts/1lDGLzplWnyxm

xAI

Grok 4 Demo Livestream

whole wagon Jul 10, 2025, 4:01 AM

#

It did start on the hour

#

Ez

north vale Jul 10, 2025, 4:01 AM

#

ggs

whole wagon Jul 10, 2025, 4:01 AM

#

Maybe they really did want it to start at nein

jade egret Jul 10, 2025, 4:01 AM

#

FINALLY

hallow pelican Jul 10, 2025, 4:01 AM

#

it started eventually

empty stump Jul 10, 2025, 4:01 AM

#

We were all wrong

hallow pelican Jul 10, 2025, 4:01 AM

#

🥳

whole sundial Jul 10, 2025, 4:01 AM

#

https://x.com/xai/status/1943158495588815072

xAI (@xai)

Introducing Grok 4, the world's most powerful AI model. Watch the livestream now: https://t.co/59iDX5s2ck

jade egret Jul 10, 2025, 4:02 AM

#

EZ

#

FINALLY

whole wagon Jul 10, 2025, 4:02 AM

#

That's a lot of viewers damn

zenith saffron Jul 10, 2025, 4:02 AM

#

as predicted, one hour and one minute late

jade egret Jul 10, 2025, 4:02 AM

#

whole wagon That's a lot of viewers damn

fr

#

how much r there?

#

21.6?

whole wagon Jul 10, 2025, 4:02 AM

#

It's climbing rapidly

jade egret Jul 10, 2025, 4:02 AM

#

or is that views

whole wagon Jul 10, 2025, 4:02 AM

#

At 60k rn

jade egret Jul 10, 2025, 4:02 AM

#

whole wagon It's climbing rapidly

how do you see how much people are watching

zenith saffron Jul 10, 2025, 4:02 AM

#

now we're being baited

keen beacon Jul 10, 2025, 4:03 AM

#

Loading screen for another hour

zenith saffron Jul 10, 2025, 4:03 AM

#

the livestream began, but the real livestream has still yet to begin

#

😭

jade egret Jul 10, 2025, 4:03 AM

#

oh this?

whole wagon Jul 10, 2025, 4:03 AM

#

80k

echo aurora Jul 10, 2025, 4:03 AM

#

we have cool music now tho

jade egret Jul 10, 2025, 4:03 AM

#

echo aurora we have cool music now tho

: )

whole wagon Jul 10, 2025, 4:03 AM

#

100k

#

What the hell

leaden meteor Jul 10, 2025, 4:03 AM

#

"worlds most powerful ai model" hmm....

jade egret Jul 10, 2025, 4:04 AM

#

whole wagon What the hell

LOL

small haven Jul 10, 2025, 4:04 AM

#

over/under elon musk happy or sad on stream

whole wagon Jul 10, 2025, 4:04 AM

#

This is going to be the most watched livestream on X by far

#

120k

#

140k

echo aurora Jul 10, 2025, 4:05 AM

#

whole wagon This is going to be the most watched livestream on X by far

do you know what the current record is?

elder rapids Jul 10, 2025, 4:05 AM

#

whole wagon This is going to be the most watched livestream on X by far

wym? the most watched is well into the millions

whole wagon Jul 10, 2025, 4:05 AM

#

I mean watched live ofc

elder rapids Jul 10, 2025, 4:05 AM

#

yeah, live

jade egret Jul 10, 2025, 4:05 AM

#

146k for me rn

hardy pecan Jul 10, 2025, 4:05 AM

#

ooh

#

we on

jade egret Jul 10, 2025, 4:05 AM

#

152 now

zinc ore Jul 10, 2025, 4:05 AM

#

140k live will end up being millions of viewers

elder rapids Jul 10, 2025, 4:05 AM

#

do you mean the most watched AI related thing

jade egret Jul 10, 2025, 4:05 AM

#

yooo

zenith saffron Jul 10, 2025, 4:06 AM

#

i wonder who's the voice

elder rapids Jul 10, 2025, 4:06 AM

#

it's AI generated

torn mantle Jul 10, 2025, 4:06 AM

#

Who's talking

elder rapids Jul 10, 2025, 4:06 AM

#

it's deadass AI

empty stump Jul 10, 2025, 4:06 AM

#

When is it gonna be on lmarena leaderboard

whole wagon Jul 10, 2025, 4:06 AM

#

zinc ore 140k live will end up being millions of viewers

210k now

#

wild

zenith saffron Jul 10, 2025, 4:07 AM

#

lol "AI is advancing faster than any human"

small haven Jul 10, 2025, 4:07 AM

#

so this is going to be better than kingfall?

jade egret Jul 10, 2025, 4:07 AM

#

elder rapids it's deadass AI

fr?

zenith saffron Jul 10, 2025, 4:07 AM

#

oh wait what

elder rapids Jul 10, 2025, 4:07 AM

#

zenith saffron lol "AI is advancing faster than any human"

progression wise yeah ofc

jade egret Jul 10, 2025, 4:07 AM

#

238k

whole wagon Jul 10, 2025, 4:08 AM

#

college exams were solved long ago man we are already on phd questions

#

get to the good stuff

zinc ore Jul 10, 2025, 4:08 AM

#

Elon is so low energy rn lol

whole sundial Jul 10, 2025, 4:08 AM

#

"Grok 4 is smarter than almost all graduate students in all disciplines simultaneously."

hardy pecan Jul 10, 2025, 4:08 AM

#

blud needs to sleep

whole wagon Jul 10, 2025, 4:08 AM

#

hardy pecan blud needs to sleep

bullish

hardy pecan Jul 10, 2025, 4:08 AM

#

order of magnitude is big

jade egret Jul 10, 2025, 4:09 AM

#

:0

#

is that good

zenith saffron Jul 10, 2025, 4:09 AM

#

is elon trying to emulate jensen

whole wagon Jul 10, 2025, 4:09 AM

#

jade egret is that good

yes

zenith saffron Jul 10, 2025, 4:09 AM

#

compute used for RL step

hardy pecan Jul 10, 2025, 4:09 AM

#

gwak 4, based

zenith saffron Jul 10, 2025, 4:09 AM

#

they didn't confirm that explicitly

hardy pecan Jul 10, 2025, 4:09 AM

#

oh nvm there is sneaky orange colours to cause confusion

elder rapids Jul 10, 2025, 4:10 AM

#

hardy pecan oh nvm there is sneaky orange colours to cause confusion

how is that confusing

#

?

#

they're highlighting the RL difference

whole wagon Jul 10, 2025, 4:10 AM

#

400k viewers sheesh

jade egret Jul 10, 2025, 4:10 AM

#

Poll: in the next 10 years, will google win the ai race or get defeated?

Winning answer: win : ) (18 of 25 votes)

zenith saffron Jul 10, 2025, 4:10 AM

#

yeah it's not explicit but yeah i see what you mean

#

it's likely it's basically the same base

whole wagon Jul 10, 2025, 4:11 AM

#

#

yeah

zenith saffron Jul 10, 2025, 4:12 AM

#

bro

#

"hebrew source text"

#

i'm dead

jade egret Jul 10, 2025, 4:12 AM

#

444k for me viewers..

#

crazy...

whole wagon Jul 10, 2025, 4:12 AM

#

ngl they are overhyping still. it aint AGI

#

ok he clarified lmao

zenith saffron Jul 10, 2025, 4:13 AM

#

i wonder how it does in math research

#

rly want to see my math phd friend try this

zinc ore Jul 10, 2025, 4:14 AM

#

Elon doesn't even believe himself

whole sundial Jul 10, 2025, 4:14 AM

#

26.9%

hardy pecan Jul 10, 2025, 4:14 AM

#

https://tenor.com/view/elon-musk-elon-musk-twitter-sink-gif-26995403

Tenor

whole sundial Jul 10, 2025, 4:14 AM

#

no tool

whole wagon Jul 10, 2025, 4:14 AM

#

550k viewers

#

its going to hit a million

whole sundial Jul 10, 2025, 4:14 AM

#

41% with tool

elder rapids Jul 10, 2025, 4:15 AM

#

wait

#

aren't the deep researches higher

whole sundial Jul 10, 2025, 4:15 AM

#

"we put the tools in training"

elder rapids Jul 10, 2025, 4:15 AM

#

lmao

jade egret Jul 10, 2025, 4:15 AM

#

whole sundial 41% with tool

is that good?

whole sundial Jul 10, 2025, 4:15 AM

#

i think so?

jade egret Jul 10, 2025, 4:15 AM

#

ooo

whole wagon Jul 10, 2025, 4:15 AM

#

its extremely good HLE yes

#

double the SOTA

elder rapids Jul 10, 2025, 4:16 AM

#

kinda inherent to the size tho, no?

#

what

jade egret Jul 10, 2025, 4:16 AM

#

holy it double

elder rapids Jul 10, 2025, 4:17 AM

#

can this dude talk faster

#

holy sht

#

😭

zinc ore Jul 10, 2025, 4:17 AM

#

Elon is so out of it lol

whole wagon Jul 10, 2025, 4:17 AM

#

he seems depressed

jade egret Jul 10, 2025, 4:17 AM

#

elder rapids can this dude talk faster

fr..

jade egret Jul 10, 2025, 4:17 AM

#

whole wagon he seems depressed

why

whole wagon Jul 10, 2025, 4:18 AM

#

i guess since grok turned into a nazi

#

he is sad

elder rapids Jul 10, 2025, 4:18 AM

#

did he just say "idk"

jade egret Jul 10, 2025, 4:18 AM

#

705k is crazy.......

wind moth Jul 10, 2025, 4:19 AM

#

he needs to stop yapping

whole wagon Jul 10, 2025, 4:19 AM

#

are there more benchmarks or smth

#

or a demo

echo aurora Jul 10, 2025, 4:20 AM

#

I'm sure there will be

jade egret Jul 10, 2025, 4:20 AM

#

is it gonna be on arena?

elder rapids Jul 10, 2025, 4:20 AM

#

ye

jade egret Jul 10, 2025, 4:20 AM

#

:0

elder rapids Jul 10, 2025, 4:21 AM

#

interesting conclusion

#

"yeah.... yeah."

jade egret Jul 10, 2025, 4:21 AM

#

i gtg...

#

im gonna come back soon tho

#

cya

echo aurora Jul 10, 2025, 4:21 AM

#

🍊

jade egret Jul 10, 2025, 4:21 AM

#

echo aurora 🍊

🍊

whole wagon Jul 10, 2025, 4:22 AM

#

they trained it on puzzle type things i bet

zinc ore Jul 10, 2025, 4:22 AM

#

Ik openAI and Google feeling pretty good rn

elder rapids Jul 10, 2025, 4:22 AM

#

this is a pretty bad stream

#

lmao

#

ts buns

zenith saffron Jul 10, 2025, 4:23 AM

#

"closing the loop on reality" makes me worried about paperclips

whole sundial Jul 10, 2025, 4:23 AM

#

50.7% now

#

with test time compute

hardy pecan Jul 10, 2025, 4:23 AM

#

text only subset

#

?

#

misleading!

zenith saffron Jul 10, 2025, 4:24 AM

#

wait, text-only subset?

whole wagon Jul 10, 2025, 4:24 AM

#

zenith saffron wait, text-only subset?

the image stuff isnt ready yet

zenith saffron Jul 10, 2025, 4:24 AM

#

wdym by your question?

whole sundial Jul 10, 2025, 4:24 AM

#

maybe they didn't finish the multimodal part yet?

zenith saffron Jul 10, 2025, 4:24 AM

#

whole wagon the image stuff isnt ready yet

o i see.
how do other models do on text-only subset?

whole sundial Jul 10, 2025, 4:24 AM

#

oh, test-time compute is Grok 4 Heavy

zinc ore Jul 10, 2025, 4:24 AM

#

Consc@1024

hardy pecan Jul 10, 2025, 4:26 AM

#

lol

proper prawn Jul 10, 2025, 4:26 AM

#

They seem to be hiding reasoning

whole wagon Jul 10, 2025, 4:26 AM

#

simple problem

#

i could solve that even lol

#

bro is showing polymarket

zinc ore Jul 10, 2025, 4:26 AM

#

"seeker of truth" "aligns with reality"

#

Maaannn

hardy pecan Jul 10, 2025, 4:26 AM

#

what a dumb example,

#

gRoK wIlL pReDiCt tHe fuTuRe

somber hatch Jul 10, 2025, 4:27 AM

#

test time compute is just using more compute when answering a question. Reasoning tokens are an example of TTC, also running multiple in parallel and voting or collaborating is TTC. It's just using GPUs a lot while answering the question, not training or updating the model at all
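
A minimal sketch of the "run multiple in parallel and vote" form of test-time compute described above. This is an illustration, not how Grok 4 Heavy is known to work: `sample_answer` is a hypothetical stand-in for one model call (here a random guesser that is right about 60% of the time), and `majority_vote` and `answer_with_ttc` are made-up helper names. The model itself never changes; only the number of samples drawn per question does.

```python
import random
from collections import Counter


def sample_answer(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one model call: correct ("42") ~60% of the time."""
    return "42" if rng.random() < 0.6 else rng.choice(["41", "43", "44"])


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def answer_with_ttc(question: str, n_samples: int, seed: int = 0) -> str:
    """Spend more inference compute: draw n_samples answers, then vote."""
    rng = random.Random(seed)
    answers = [sample_answer(question, rng) for _ in range(n_samples)]
    return majority_vote(answers)


if __name__ == "__main__":
    for n in (1, 8, 64, 1024):
        print(n, answer_with_ttc("what is 6 * 7?", n))
```

With a single sample the answer is wrong a fair share of the time; with many samples the vote settles on the most likely answer, which is the general shape of a "more TTC, better score" chart.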

whole wagon Jul 10, 2025, 4:27 AM

#

1M viewers

#

the odds plunged back down

#

53% to 41%

elder rapids Jul 10, 2025, 4:28 AM

#

they tryna hypnotize us lmao

zinc ore Jul 10, 2025, 4:28 AM

#

whole wagon the odds plunged back down

Like I said, Google and openAI feeling pretty good rn

somber hatch Jul 10, 2025, 4:28 AM

#

their graph was showing how adding more TTC increases the performance. OpenAI has shown similar charts when they released O1 and O3

elder rapids Jul 10, 2025, 4:28 AM

#

somber hatch their graph was showing how adding more TTC increases the performance. OpenAI has...

ye

storm needle Jul 10, 2025, 4:28 AM

#

what are the chances of everyone there getting fired

elder rapids Jul 10, 2025, 4:28 AM

#

they're not showing anything impressive

#

what's going on lol

#

they're showing what makes o3 cool

zenith saffron Jul 10, 2025, 4:29 AM

#

"wen grok 5"

elder rapids Jul 10, 2025, 4:29 AM

#

nice

#

😭

#

tool use

proper prawn Jul 10, 2025, 4:30 AM

#

elder rapids they're not showing anything impressive

The HLE numbers are impressive. Just the presentation and likely drugged Elon couldn't show it

elder rapids Jul 10, 2025, 4:30 AM

#

they're saying grok 4 is doing all this research to get the simulation correct

whole wagon Jul 10, 2025, 4:30 AM

#

latex was wrong lel

elder rapids Jul 10, 2025, 4:30 AM

#

proper prawn The HLE numbers are impressive. Just the presentation and likely drugged Elon co...

we haven't seen other labs do what they've done with HLE tho, Google could have even higher numbers and we'd never know

keen beacon Jul 10, 2025, 4:31 AM

#

I wonder if they switched off the qwq cold start lmfao

whole sundial Jul 10, 2025, 4:31 AM

#

so grok 4 is currently able to look at images

proper prawn Jul 10, 2025, 4:31 AM

#

Training on tool use could be big

whole wagon Jul 10, 2025, 4:31 AM

#

its not that wild man

keen beacon Jul 10, 2025, 4:32 AM

#

They have so many resources. They should make their own XD and actually innovate

elder rapids Jul 10, 2025, 4:32 AM

#

no

zenith saffron Jul 10, 2025, 4:32 AM

#

imo there's a lot they're not revealing

whole sundial Jul 10, 2025, 4:33 AM

#

38.6% with tools, 44.4% heavy

empty stump Jul 10, 2025, 4:33 AM

#

Is the gemini 2.5 hle score w/o tools

whole sundial Jul 10, 2025, 4:33 AM

#

25.4% without tools

zenith saffron Jul 10, 2025, 4:33 AM

#

zenith saffron imo there's a lot they're not revealing

all they say is "RL", the devil is in the details

elder rapids Jul 10, 2025, 4:33 AM

#

whole sundial 38.6% with tools, 44.4% heavy

bs why is the discrepancy so low

zenith saffron Jul 10, 2025, 4:33 AM

#

oh i'm just saying like

#

this is not even close to a technical paper

empty stump Jul 10, 2025, 4:34 AM

#

Wonder grok 4 superheavy vs o3 pro vs gemini 2.5 pro deepthink

zenith saffron Jul 10, 2025, 4:34 AM

#

just because they're still using RL doesn't mean there isn't any innovation lol

clever estuary Jul 10, 2025, 4:34 AM

#

reich4 looks interesting

whole wagon Jul 10, 2025, 4:34 AM

#

bro what in a few weeks

zenith saffron Jul 10, 2025, 4:34 AM

#

that's like saying GPT-3 was not innovative because it was just scaling the same method up

whole sundial Jul 10, 2025, 4:34 AM

#

they are training version 7 with vision and image gen

#

grok 4 is version 6

zenith saffron Jul 10, 2025, 4:34 AM

#

(whereas in this case we don't even know what methods they used)

zinc ore Jul 10, 2025, 4:34 AM

#

Gemini no tools does better than grok 4 no tools on HLE

jade egret Jul 10, 2025, 4:35 AM

#

back

whole sundial Jul 10, 2025, 4:35 AM

#

proper prawn Jul 10, 2025, 4:35 AM

#

61.9% in USAMO

jade egret Jul 10, 2025, 4:35 AM

#

what did i miss

echo aurora Jul 10, 2025, 4:35 AM

#

jade egret what did i miss

some demos

jade egret Jul 10, 2025, 4:35 AM

#

echo aurora some demos

is grok 4 very very good?

zinc ore Jul 10, 2025, 4:35 AM

#

Okay finally more benchmarks

elder rapids Jul 10, 2025, 4:36 AM

#

they're not

#

grok 3 mini remember

#

lmao

#

ye

#

prob

keen beacon Jul 10, 2025, 4:36 AM

#

For 10 minutes 😂

elder rapids Jul 10, 2025, 4:36 AM

#

we'll see in practice

#

deadass

jade egret Jul 10, 2025, 4:37 AM

#

but better models gonna come out very soon right

#

like usual?

keen beacon Jul 10, 2025, 4:37 AM

#

Maybe not this month

zinc ore Jul 10, 2025, 4:37 AM

#

I completely didn't retain how the reasoning works with grok heavy

jade egret Jul 10, 2025, 4:37 AM

#

keen beacon Maybe not this month

isnt gpt 5 this month or nah

elder rapids Jul 10, 2025, 4:37 AM

#

zinc ore I completely didn't retain how the reasoning works with grok heavy

just parallel

whole wagon Jul 10, 2025, 4:37 AM

#

zinc ore I completely didn't retain how the reasoning works with grok heavy

its simple man

#

they just do the parallel thing

#

same as o3 pro

#

holy cringe

proper prawn Jul 10, 2025, 4:38 AM

#

lol they know their audience

torn mantle Jul 10, 2025, 4:39 AM

#

Ewwww

#

What the hell

hollow ocean Jul 10, 2025, 4:39 AM

#

only $300 for heavy

#

not bad

whole wagon Jul 10, 2025, 4:39 AM

#

bruh

jade egret Jul 10, 2025, 4:39 AM

#

diet coke

hardy pecan Jul 10, 2025, 4:39 AM

#

my stomach hurts

whole wagon Jul 10, 2025, 4:39 AM

#

how are they not cringing

zinc ore Jul 10, 2025, 4:39 AM

#

😐

whole wagon Jul 10, 2025, 4:39 AM

#

aspartame ambrosia 😂

#

bro what

zenith saffron Jul 10, 2025, 4:39 AM

#

this is so ex machina coded

torn mantle Jul 10, 2025, 4:40 AM

#

Oh it failed

hardy pecan Jul 10, 2025, 4:40 AM

#

that was weary weary cringe

elder rapids Jul 10, 2025, 4:40 AM

#

torn mantle Oh it failed

ye

#

interesting

torn mantle Jul 10, 2025, 4:40 AM

#

Ye

elder rapids Jul 10, 2025, 4:40 AM

#

nice voice tho

echo aurora Jul 10, 2025, 4:40 AM

#

elder rapids nice voice tho

agreed

whole wagon Jul 10, 2025, 4:40 AM

#

it didnt fail he didnt realise it retained the context

empty stump Jul 10, 2025, 4:40 AM

#

now advertising openai

whole wagon Jul 10, 2025, 4:40 AM

#

lol

torn mantle Jul 10, 2025, 4:40 AM

#

Openai?

whole wagon Jul 10, 2025, 4:41 AM

#

RIP

jade egret Jul 10, 2025, 4:41 AM

#

did openai fail

empty stump Jul 10, 2025, 4:41 AM

#

no

hardy pecan Jul 10, 2025, 4:41 AM

#

why does it keep asking questions

#

just say the numbers!

whole wagon Jul 10, 2025, 4:41 AM

#

openai got destroyed

#

lmao

#

thats hilarious

torn mantle Jul 10, 2025, 4:41 AM

#

Lmao

#

That wasnt fair

#

No no

jade egret Jul 10, 2025, 4:41 AM

#

openai die : (

torn mantle Jul 10, 2025, 4:41 AM

#

He was faster

#

Nah

#

That's unfair

hardy pecan Jul 10, 2025, 4:42 AM

#

do they have a hitler voice, based

empty stump Jul 10, 2025, 4:42 AM

#

oh thats what they were showing the speed

elder rapids Jul 10, 2025, 4:42 AM

#

oh nice api

whole sundial Jul 10, 2025, 4:42 AM

#

real grok 4 arc agi 2 score: 15.9%

#

still very much in lead

whole wagon Jul 10, 2025, 4:42 AM

#

👀

zinc ore Jul 10, 2025, 4:42 AM

#

Yeah previous sota was 8%

whole wagon Jul 10, 2025, 4:42 AM

#

sheesh

#

ARC-AGI-2 is getting solved in a year

jade egret Jul 10, 2025, 4:43 AM

#

holy 2x

elder rapids Jul 10, 2025, 4:43 AM

#

I'm ngl I can just dismiss arc agi 2 scores

#

this one is obviously a training thing

empty stump Jul 10, 2025, 4:44 AM

#

I wonder about SWE bench score

torn mantle Jul 10, 2025, 4:44 AM

#

Now we are getting to the real stuff

elder rapids Jul 10, 2025, 4:44 AM

#

its arc agi 2 score isn't proportional

hardy pecan Jul 10, 2025, 4:44 AM

#

i thought it cant see images

#

how did it do arc

torn mantle Jul 10, 2025, 4:44 AM

#

Those benchmarks look crazy ngl

torn mantle Jul 10, 2025, 4:44 AM

#

hardy pecan i thought it cant see images

Just text

torn mantle Jul 10, 2025, 4:44 AM

#

hardy pecan how did it do arc

Text

hardy pecan Jul 10, 2025, 4:44 AM

#

translated to text?

#

ok

torn mantle Jul 10, 2025, 4:44 AM

#

hardy pecan translated to text?

Yea

whole wagon Jul 10, 2025, 4:45 AM

#

1.5M viewers damn

whole sundial Jul 10, 2025, 4:45 AM

#

grok 4 0709 is the current version

whole wagon Jul 10, 2025, 4:45 AM

#

what the hell is vending bench

jade egret Jul 10, 2025, 4:45 AM

#

whole wagon what the hell is vending bench

money : )

elder rapids Jul 10, 2025, 4:45 AM

#

nobody knows

empty stump Jul 10, 2025, 4:46 AM

#

Running a small shop or something

zinc ore Jul 10, 2025, 4:46 AM

#

Grok downloads about to skyrocket

torn mantle Jul 10, 2025, 4:47 AM

#

Please add it on lmarena so i can try it

#

Can you add it?

jade egret Jul 10, 2025, 4:47 AM

#

how long do yall think google and openai will take to catch up

whole wagon Jul 10, 2025, 4:47 AM

#

how is bro gonna add it

jade egret Jul 10, 2025, 4:47 AM

#

cuz gemini 3 and gpt 5 are coming

torn mantle Jul 10, 2025, 4:47 AM

#

Arent you one of lmarena staff team?

elder rapids Jul 10, 2025, 4:47 AM

#

jade egret how long do yall think google and openai will take to catch up

probably by their next releases

hollow ocean Jul 10, 2025, 4:47 AM

#

@deep adder deepthink already dead

jade egret Jul 10, 2025, 4:47 AM

#

elder rapids probably by their next releases

oo

torn mantle Jul 10, 2025, 4:47 AM

#

Okay~

empty stump Jul 10, 2025, 4:48 AM

#

jade egret how long do yall think google and openai will take to catch up

few weeks

jade egret Jul 10, 2025, 4:48 AM

#

empty stump few weeks

dang

small haven Jul 10, 2025, 4:48 AM

#

whole wagon 1.5M viewers damn

views not live viewers

jade egret Jul 10, 2025, 4:48 AM

#

rlly fast ig

tidal schooner Jul 10, 2025, 4:49 AM

#

100% on aime25

#

insane

empty stump Jul 10, 2025, 4:50 AM

#

how does it do in IMO

whole wagon Jul 10, 2025, 4:50 AM

#

small haven views not live viewers

no its live viewers

#

1.7M rn

zinc ore Jul 10, 2025, 4:50 AM

#

It says 1.7m views

#

So probably sub 100k live

empty stump Jul 10, 2025, 4:51 AM

#

so dumb they dont show live count

whole sundial Jul 10, 2025, 4:51 AM

#

tidal schooner Jul 10, 2025, 4:52 AM

#

empty stump how does it do in IMO

~60%+

#

i think

#

which is absurd

wind moth Jul 10, 2025, 4:52 AM

#

thats usamo

#

not imo

whole wagon Jul 10, 2025, 4:53 AM

#

whole sundial

not much coming tbh

frigid phoenix Jul 10, 2025, 4:54 AM

#

Is grok that good? lol
Didnt watch the stream

zinc ore Jul 10, 2025, 4:54 AM

#

Last 20 mins was way better than the beginning

empty stump Jul 10, 2025, 4:55 AM

#

not trying it out until leaderboard results come out

proper prawn Jul 10, 2025, 4:55 AM

#

Is grok 4 in arena?

empty stump Jul 10, 2025, 4:55 AM

#

no

dawn wharf Jul 10, 2025, 4:56 AM

#

Screenshot_2025-07-10-07-39-32-682_com.discord-edit.jpg

#

Deepthink dead before even releasing

tidal schooner Jul 10, 2025, 4:56 AM

#

wind moth thats usamo

ah damn

empty stump Jul 10, 2025, 4:56 AM

#

dawn wharf

hmm no o3 pro

tidal schooner Jul 10, 2025, 4:56 AM

#

don’t think they tested it for imo then

#

still extremely impressive

#

better than i could ever do

zenith saffron Jul 10, 2025, 4:57 AM

#

tidal schooner don’t think they tested it for imo then

doesn't imo literally start today? lol

hardy pecan Jul 10, 2025, 4:57 AM

#

very lacking in the demos, just benchmaxxxxing it seems

#

will need to try ourselves

tidal schooner Jul 10, 2025, 4:57 AM

#

zenith saffron doesn't imo literally start today? lol

july 10 yes

#

ironically

zenith saffron Jul 10, 2025, 4:58 AM

#

i am lowkey a little worried

empty stump Jul 10, 2025, 4:58 AM

#

tidal schooner july 10 yes

wow that is crazy

zenith saffron Jul 10, 2025, 4:58 AM

#

zenith saffron i am lowkey a little worried

i hope we don't get paperclipped

#

within the next 5 years

whole sundial Jul 10, 2025, 5:00 AM

#

grok 4 rate limit: same as grok 3 free, 20 per 2 hours for supergrok subscribers (not available for free rn)

empty stump Jul 10, 2025, 5:00 AM

#

someone try out grok 4 and tell me how good it really is

zenith saffron Jul 10, 2025, 5:00 AM

#

my p(doom) has been climbing

small haven Jul 10, 2025, 5:00 AM

#

dawn wharf Deepthink dead before even releasing

which deepthink?

dawn wharf Jul 10, 2025, 5:00 AM

#

small haven which deepthink?

Gemini

dawn wharf Jul 10, 2025, 5:01 AM

#

dawn wharf

@small haven middle column

small haven Jul 10, 2025, 5:01 AM

#

dawn wharf Gemini

no, as in kingfall or 2.5 pro base model

zinc ore Jul 10, 2025, 5:02 AM

#

https://vxtwitter.com/emollick/status/1943171795894370809

Ethan Mollick (@emollick)

Grok 4 creating the shader (no errors).

QRT: emollick
o3-pro does by far the best so far at my benchmark (scroll quote tweet thread for others): "create a visually interesting shader that can run in twigl app make it like the ocean in a storm"

It did take 21 minutes for o3-pro to think (and another 19 to fix a small shader error) https://t.co/KqzmuHm5Zf

elder rapids Jul 10, 2025, 5:02 AM

#

holy pumped compute

#

we already know the pricing

tidal schooner Jul 10, 2025, 5:03 AM

#

whole sundial grok 4 rate limit: same as grok 3 free, 20 per 2 hours for supergrok subscribers...

grok 4 is free?

elder rapids Jul 10, 2025, 5:03 AM

#

it's like 3$ input 15$ output
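
A rough sketch of what those numbers would mean per request, assuming the "3 in / 15 out" figures quoted in this chat are US dollars per million tokens (as stated a few messages later). The constants and the function name `request_cost_usd` are illustrative, not taken from any official pricing page, and the token counts in the example are made up.

```python
# Prices as quoted in the chat; treat them as assumptions, not official figures.
INPUT_USD_PER_MILLION = 3.0    # "3 in"
OUTPUT_USD_PER_MILLION = 15.0  # "15 out"


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under per-million-token pricing."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 10,000 generated tokens.
    print(f"${request_cost_usd(2_000, 10_000):.3f}")
```

Because output is priced five times higher than input under these assumed rates, a model that "thinks a lot" gets expensive mainly through the tokens it generates, if those are billed as output.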

tidal schooner Jul 10, 2025, 5:03 AM

#

huh

whole sundial Jul 10, 2025, 5:03 AM

#

no

elder rapids Jul 10, 2025, 5:03 AM

#

per million

#

wym?

#

it says 3 in 15 out

#

link

#

yo what

#

are you joking

whole sundial Jul 10, 2025, 5:05 AM

#

keen beacon Jul 10, 2025, 5:05 AM

#

No parallel compute? Lol

#

Must be a mistake

elder rapids Jul 10, 2025, 5:06 AM

#

wait where are we getting this from, I'm only seeing 3 in 15 out

#

if it's actually that expensive then all the hype for me is squashed

keen beacon Jul 10, 2025, 5:06 AM

#

It's a typo

elder rapids Jul 10, 2025, 5:07 AM

#

id thought so

keen beacon Jul 10, 2025, 5:07 AM

#

The pricing below that is correct

zinc ore Jul 10, 2025, 5:07 AM

#

elder rapids Jul 10, 2025, 5:07 AM

#

ye see

keen beacon Jul 10, 2025, 5:08 AM

#

Do they still give you reasoning 'summaries'?

#

Can u show if you don't mind I can't check rn

#

Oh I mean regular grok 4

#

Oh lol they actually summarize now

#

Thanks

keen fulcrum Jul 10, 2025, 5:10 AM

#

Grok 4 achieves SOTA

keen beacon Jul 10, 2025, 5:10 AM

#

Did they update the cut off or is it literally just rl on grok 3 lol

#

Yeah I didn't really watch it lmao

keen fulcrum Jul 10, 2025, 5:12 AM

#

@echo aurora when can we expect grok 4 to be added to lmarena? ArtificialAnalysis received early access

keen beacon Jul 10, 2025, 5:12 AM

#

Ultra should beat it though

keen fulcrum Jul 10, 2025, 5:12 AM

#

It would be great if grok 4 heavy can be tested by us as well

dawn wharf Jul 10, 2025, 5:13 AM

#

#

bro is flying

hollow ocean Jul 10, 2025, 5:14 AM

#

grok 4 heavy is the truth

fleet pine Jul 10, 2025, 5:19 AM

#

hollow ocean grok 4 heavy is the truth

so when grok 4 is coming in arena?

echo aurora Jul 10, 2025, 5:19 AM

#

keen fulcrum @echo aurora when can we expect grok 4 to be added to lmarena? Artifici...

sorry to say I don't have any details to share atm 😦

hollow ocean Jul 10, 2025, 5:20 AM

#

fleet pine so when grok 4 is coming in arena?

https://tenor.com/view/neva-never-marc-homealone-youcantstopme-gif-9816521

Tenor

keen fulcrum Jul 10, 2025, 5:21 AM

#

looks like its available

elder rapids Jul 10, 2025, 5:22 AM

#

yeah

#

@leaden palm gave me a story it wrote

#

and I don't need any other examples

#

ts BUNS at writing

#

💔😢

empty stump Jul 10, 2025, 5:23 AM

#

what llm is best at writing so far

elder rapids Jul 10, 2025, 5:23 AM

#

empty stump what llm is best at writing so far

opus and 2.5 pro

#

think it might be

empty stump Jul 10, 2025, 5:23 AM

#

So gpt 4.5 is not best at anything

elder rapids Jul 10, 2025, 5:24 AM

#

empty stump So gpt 4.5 is not best at anything

if you give it an environment that communicates with the other models it'll do really well

#

but it's not that good by itself

dapper storm Jul 10, 2025, 5:24 AM

#

Why do you think that?

tidal schooner Jul 10, 2025, 5:24 AM

#

keen fulcrum looks like its available

@echo aurora yay

civic flame Jul 10, 2025, 5:26 AM

#

lol ive been using this

#

it's honestly kinda meh..

tidal schooner Jul 10, 2025, 5:27 AM

#

and it’s gone again?

elder burrow Jul 10, 2025, 5:27 AM

#

i only trust scicode for coding

look at 4 opus on livecodebench..
its behind nemotron 💀

#

#

this is accurate

empty stump Jul 10, 2025, 5:27 AM

#

grok 4 gone from arena bruh

elder burrow Jul 10, 2025, 5:28 AM

#

empty stump grok 4 gone from arena bruh

IT WAS THERE?

empty stump Jul 10, 2025, 5:28 AM

#

for a few mins

elder burrow Jul 10, 2025, 5:28 AM

#

bruh

indigo hazel Jul 10, 2025, 5:28 AM

#

empty stump for a few mins

lmao

whole wagon Jul 10, 2025, 5:29 AM

#

keen fulcrum Jul 10, 2025, 5:30 AM

#

empty stump grok 4 gone from arena bruh

why lol

empty stump Jul 10, 2025, 5:30 AM

#

no idea

primal orbit Jul 10, 2025, 5:30 AM

#

I managed to get 1 prompt to grok 4 in lmarena then it disappeared and chat switched to chatgpt

zinc ore Jul 10, 2025, 5:31 AM

#

Yep

fleet lintel Jul 10, 2025, 5:31 AM

#

I tried scrolling 100 comments to understand grok4 is SOTA or not... I am still not sure.

Is it SOTA or not?

elder burrow Jul 10, 2025, 5:31 AM

#

💀

keen fulcrum Jul 10, 2025, 5:31 AM

#

empty stump no idea

I think they aren't paid for users using it yet

zinc ore Jul 10, 2025, 5:31 AM

#

empty stump Jul 10, 2025, 5:32 AM

#

i can do better on ms paint

elder burrow Jul 10, 2025, 5:32 AM

#

zinc ore

@old garden

old garden Jul 10, 2025, 5:32 AM

#

what

#

a

elder burrow Jul 10, 2025, 5:32 AM

#

2 Treys

old garden Jul 10, 2025, 5:32 AM

#

lol

civic flame Jul 10, 2025, 5:32 AM

#

zinc ore

i've noticed currently frontend dev seems really poor. i am giving it the benefit of the doubt because i am only able to use it through X where it has a system prompt and all of that applied, but still..

zinc ore Jul 10, 2025, 5:32 AM

#

Why u tag me twice

civic flame Jul 10, 2025, 5:34 AM

#

elder burrow Jul 10, 2025, 5:34 AM

#

hey at least its 20x cheaper than 4..

#

oh you meant regular 3

#

isnt mini better

zenith saffron Jul 10, 2025, 5:35 AM

#

you think it's contaminated for HLE, USAMO, etc.?

fleet lintel Jul 10, 2025, 5:36 AM

#

SuperGrok Heavy: $300/month (Multi Agent Version)

Dayum! Is it really that good?

elder burrow Jul 10, 2025, 5:36 AM

#

wtf is stonebloom

fleet lintel Jul 10, 2025, 5:36 AM

#

@deep adder time for you to buy $300/month version and tell us the truth

elder burrow Jul 10, 2025, 5:37 AM

#

💀

#

@rare python can we have grok 4 back

zenith saffron Jul 10, 2025, 5:38 AM

#

where's this from

#

what problem is this

whole wagon Jul 10, 2025, 5:38 AM

#

Just tell it not to search

elder burrow Jul 10, 2025, 5:38 AM

#

you can disable search in settings

#

:p

whole wagon Jul 10, 2025, 5:39 AM

#

Tell it in the prompt

elder burrow Jul 10, 2025, 5:39 AM

#

#

oh

#

lmfao

whole wagon Jul 10, 2025, 5:40 AM

#

The API will not randomly call tools

#

Without it being enabled

#

There are benchmark numbers without tools

stuck orchid Jul 10, 2025, 5:43 AM

#

Waiting for Grok 4 on LMArena to vote 👍

hardy pecan Jul 10, 2025, 5:43 AM

#

seems to be bad at coding, cant edit my code without bugging

small haven Jul 10, 2025, 5:43 AM

#

it thinks a lot

dapper storm Jul 10, 2025, 5:45 AM

#

No I hope

stuck orchid Jul 10, 2025, 5:45 AM

#

I think Grok 4 should be something like o3 in terms of reasoning and cognition.
o3 is lazy and also worse at coding than Gemini, especially in web coding, but according to my tests, o3 is the most powerful model for researchers.
Grok 4 should reason "from first principles."

abstract leaf Jul 10, 2025, 5:46 AM

#

stuck orchid I think Grok 4 should be something like o3 in terms of reasoning and cognition. ...

Grok 4 should reason "from first principles."

they hide reasoning

winged wing Jul 10, 2025, 5:46 AM

#

same base model as grok 3?

abstract leaf Jul 10, 2025, 5:46 AM

#

when will you guys update the leaderboard?

winged wing Jul 10, 2025, 5:46 AM

#

really i totally missed that

abstract leaf Jul 10, 2025, 5:47 AM

#

grok-4 should be around 1500.
-# my assumption.

winged wing Jul 10, 2025, 5:47 AM

#

gd damn good thing i got out break even

stuck orchid Jul 10, 2025, 5:47 AM

#

abstract leaf when will you guys update the leaderboard?

When Grok 4 appears in the arena

abstract leaf Jul 10, 2025, 5:47 AM

#

yes

zenith saffron Jul 10, 2025, 5:48 AM

#

did they actually?

#

i didn't literally hear them say that

winged wing Jul 10, 2025, 5:48 AM

#

u manage to catch any of the swings on HLE?

elder rapids Jul 10, 2025, 5:50 AM

#

the model is overall mediocre

#

@civic flame

indigo hazel Jul 10, 2025, 5:50 AM

#

basically like a student who cheats during the test at school

elder rapids Jul 10, 2025, 5:50 AM

#

have you tried that old

#

math prompt

#

with the answer of 3031

civic flame Jul 10, 2025, 5:50 AM

#

elder rapids @civic flame

yes i've gathered

#

but right now i've only tried via Grok on X

elder rapids Jul 10, 2025, 5:50 AM

#

grok on X is different

civic flame Jul 10, 2025, 5:50 AM

#

which has a big annoying system prompt and tools it won't let me disable

#

yes i know

elder rapids Jul 10, 2025, 5:50 AM

#

alr

civic flame Jul 10, 2025, 5:50 AM

#

so i'm waiting for lmarena to add grok 4

elder rapids Jul 10, 2025, 5:50 AM

#

but from my testing

#

it's still buns

winged wing Jul 10, 2025, 5:51 AM

#

It might have some sauce with the scaffolding. Idk I wouldnt be so quick to judge. Its def gonna get blown out of the water by the end of the month tho

elder rapids Jul 10, 2025, 5:51 AM

#

ye ofc

#

not terrible at all

civic flame Jul 10, 2025, 5:51 AM

#

lol we're still waiting on R2

elder rapids Jul 10, 2025, 5:51 AM

#

yep, but nice benchmarks

civic flame Jul 10, 2025, 5:52 AM

#

seems to also suffer from being too succinct when writing code

whole sundial Jul 10, 2025, 5:53 AM

#

trained off of same base model as grok 3, they are just now training the new base model

civic flame Jul 10, 2025, 5:53 AM

#

no wonder it's mid

blazing bison Jul 10, 2025, 5:53 AM

#

Bro stop yapping, for god sake

zinc ore Jul 10, 2025, 5:53 AM

#

They should have kept the 3.5 naming scheme

rare python Jul 10, 2025, 5:53 AM

#

civic flame no wonder it's mid

wolfstride vs stonebloom, which one is better at web design?

whole sundial Jul 10, 2025, 5:54 AM

#

so i guess there will be a Grok 4.5 with image gen? Gemini 3/GPT-5 image gen will be better

blazing bison Jul 10, 2025, 5:54 AM

#

Because you're poor, don't even have access to the model and is yapping around

#

Yes you're

civic flame Jul 10, 2025, 5:54 AM

#

rare python wolfstride vs stonebloom, which one is better at web design?

can barely tell the difference tbh

blazing bison Jul 10, 2025, 5:54 AM

#

So stop yapping

hardy pecan Jul 10, 2025, 5:55 AM

#

children, stop arguing

blazing bison Jul 10, 2025, 5:55 AM

#

This is the mass user model, the true model is api only

small haven Jul 10, 2025, 5:55 AM

#

https://grok.com/share/c2hhcmQtMg%3D%3D_14297bc2-398f-427e-9195-687e89e05e81

Glove's Position After Falling from Car | Shared Grok Conversation

A luxury sports-car is traveling with open windows in the direction opposite of the south at 30km/h

#

what we thinking

civic flame Jul 10, 2025, 5:56 AM

#

oh it's right but

#

are all your tools set to OFF

#

yes you can

#

one sec

small haven Jul 10, 2025, 5:56 AM

#

civic flame are all your tools set to OFF

search was off

hardy pecan Jul 10, 2025, 5:56 AM

#

im thinking 0.00025km long is very small

blazing bison Jul 10, 2025, 5:56 AM

#

If you use o3, sonnet, everything on their frontend is dumber than api

hardy pecan Jul 10, 2025, 5:56 AM

#

the bridge is 25cm

small haven Jul 10, 2025, 5:56 AM

#

check prompt

civic flame Jul 10, 2025, 5:56 AM

#

civic flame Jul 10, 2025, 5:57 AM

#

small haven check prompt

okay maybe it isn't too stupid

hardy pecan Jul 10, 2025, 5:57 AM

#

small haven https://grok.com/share/c2hhcmQtMg%3D%3D_14297bc2-398f-427e-9195-687e89e05e81

is the original question, having the bridge 25cm long?

elder rapids Jul 10, 2025, 5:57 AM

#

civic flame okay maybe it isn't too stupid

nah it's not stupid but it's not smart like 2.5 pro, or even o3

blazing bison Jul 10, 2025, 5:58 AM

#

It's not even secret, just search on X and you gonna see openai team talking about it. You're a yapping machine bro

elder rapids Jul 10, 2025, 5:58 AM

#

doesn't catch anything im saying

#

lmao

#

it just got crushed in a debate with 2.5 pro too

thorny bane Jul 10, 2025, 5:58 AM

#

what makes you think it's very contaminated?

small haven Jul 10, 2025, 5:59 AM

#

oh yea terminator svg benchmark, i forgot

whole sundial Jul 10, 2025, 6:00 AM

#

maybe grok would have even higher bench scores if it didn't use brave search, which has their own index so it kinda sucks. a sample search/deep research across gemini 2.5pro, grok deepsearch, duckduckgo, and perplexity shows that grok is the clear loser. brave's index is so bad that when i asked the question a few days ago, it finally found a relevant source while all previous attempts were just... bad. idk why they can't just license from duckduckgo lol

civic flame Jul 10, 2025, 6:00 AM

#

when are they dropping the code model 💀

#

i forgot about it

#

when is it

whole sundial Jul 10, 2025, 6:00 AM

#

august

civic flame Jul 10, 2025, 6:00 AM

#

i didn't see the stream

#

bruh

small haven Jul 10, 2025, 6:01 AM

#

civic flame when are they dropping the code model 💀

august

civic flame Jul 10, 2025, 6:01 AM

#

yeah this is disappointing

whole sundial Jul 10, 2025, 6:01 AM

#

september for "multimodal agent" a.k.a. new image gen

indigo hazel Jul 10, 2025, 6:02 AM

#

so i was waiting for the best model ever, but it's worse than o3 and 2.5 pro?

whole sundial Jul 10, 2025, 6:02 AM

#

to be fair grok was the first major ai that i know of that had native image gen. all the others did it this spring. they had it last winter.

zenith saffron Jul 10, 2025, 6:02 AM

#

it's just jimmy ba and tony wu the co-founders lol

indigo hazel Jul 10, 2025, 6:02 AM

#

so what

zenith saffron Jul 10, 2025, 6:02 AM

#

just researchers

whole sundial Jul 10, 2025, 6:03 AM

#

i think grok 4 was rushed, it's all hype

elder rapids Jul 10, 2025, 6:03 AM

#

yo

undone hull Jul 10, 2025, 6:03 AM

#

Can't wait to see MechaHitler 4 on the charts

elder rapids Jul 10, 2025, 6:03 AM

#

@civic flame it hallucinates search a ton

#

lmao

winged wing Jul 10, 2025, 6:04 AM

#

ok lets see it

indigo hazel Jul 10, 2025, 6:04 AM

#

indigo hazel so what

@deep adder

wintry tinsel Jul 10, 2025, 6:04 AM

#

whole sundial i think grok 4 was rushed, it's all hype

It was definitely rushed but is it better than 2.5 pro is the question

whole sundial Jul 10, 2025, 6:04 AM

#

wanted to try my "basic" knowledge question that only OpenAI models (o1, o3, GPT-4.5, maybe 4o/4.1?) get right. Claude Opus fails it. Gemini 2.5 Pro, stonebloom (Ultra) gets it wrong. Grok 3 got kinda close, but I don't think it will get it right if it uses same base model. Might get it right with reasoning.

civic flame Jul 10, 2025, 6:05 AM

#

what's the question?

whole sundial Jul 10, 2025, 6:05 AM

#

I'll wait for it to be on arena, thank you though

thorny bane Jul 10, 2025, 6:05 AM

#

grok 5 will invent new physics

civic flame Jul 10, 2025, 6:05 AM

#

whole sundial I'll wait for it to be on arena, thank you though

alright

#

have you tried it with wolfstride?

indigo hazel Jul 10, 2025, 6:05 AM

#

whole sundial I'll wait for it to be on arena, thank you though

then let us know if it answers correctly or not pls

elder rapids Jul 10, 2025, 6:06 AM

#

wintry tinsel It was definitely rushed but is it better than 2.5 pro is the question

nah in my testing

#

if you find different results lmk

whole sundial Jul 10, 2025, 6:06 AM

#

i will

elder rapids Jul 10, 2025, 6:06 AM

#

but it's far from it

#

it's not dumb by any means

dawn wharf Jul 10, 2025, 6:07 AM

#

do models that support web search use it on their own?

#

or they don't use it

elder rapids Jul 10, 2025, 6:07 AM

#

this model has the same issue as grok 3 with Improving through a context

dawn wharf Jul 10, 2025, 6:07 AM

#

direct chat

civic flame Jul 10, 2025, 6:08 AM

#

lol ok

indigo hazel Jul 10, 2025, 6:08 AM

#

so it got that high result in arc agi 2 by using tools?

small haven Jul 10, 2025, 6:09 AM

#

civic flame Jul 10, 2025, 6:09 AM

#

lol am i supposed to be seeing grok 4 in lmarena

#

it's not there

stuck orchid Jul 10, 2025, 6:09 AM

#

Is Grok 4 similar in communication style to o3?

civic flame Jul 10, 2025, 6:09 AM

#

not in direct chat and i haven't seen it in battle in the 10 rounds i've gone thru

elder rapids Jul 10, 2025, 6:09 AM

#

stuck orchid Is Grok 4 similar in communication style to o3?

nah

#

same as grok 3

whole sundial Jul 10, 2025, 6:10 AM

#

grok 4 from what i heard was available on the arena for a few minutes earlier

civic flame Jul 10, 2025, 6:10 AM

#

whole sundial grok 4 from what i heard was available on the arena for a few minutes earlier

yeah but it seems to have disappeared

#

cc @echo aurora

echo aurora Jul 10, 2025, 6:10 AM

#

civic flame yeah but it seems to have disappeared

it's in Battle Mode

civic flame Jul 10, 2025, 6:10 AM

#

i've gone through about 15 battle rounds now and not seen it 😔

#

either i'm unlucky or something's up

echo aurora Jul 10, 2025, 6:11 AM

#

there are issues with putting it in Direct & side-by-side, but we're working on it

stuck orchid Jul 10, 2025, 6:11 AM

#

echo aurora it's in Battle Mode

Is he still in Battle mode?

whole sundial Jul 10, 2025, 6:11 AM

#

it's really just grok 3 but the very smoky Colossus is pumping more power into it to make it reason better

echo aurora Jul 10, 2025, 6:11 AM

#

stuck orchid Is he still in Battle mode?

yeah should be

small haven Jul 10, 2025, 6:11 AM

#

grok 4 sucks

echo aurora Jul 10, 2025, 6:12 AM

#

I haven't heard otherwise but will flag if more folks are saying they're not getting it

whole sundial Jul 10, 2025, 6:12 AM

#

first try

#

it gets it right!

civic flame Jul 10, 2025, 6:12 AM

#

on what platform

whole sundial Jul 10, 2025, 6:12 AM

#

i'm an idiot for doubting Dork 4

#

this was the question lol. Grok 4's response is fully correct. First non-OpenAI model to get it correct!

civic flame Jul 10, 2025, 6:13 AM

#

memphis in general is being screwed over by a bunch of ai-related developments that aren't adequately planned (imo)

stuck orchid Jul 10, 2025, 6:14 AM

#

Why then did Elon say that Grok 4 could rewrite the entire current dataset for AI training?
I think Grok 4 has hidden potential. If it's not very good in other areas, it must be good for some specific type of task. We need to try to find that area.

whole sundial Jul 10, 2025, 6:14 AM

#

grok 3

civic flame Jul 10, 2025, 6:14 AM

#

whole sundial grok 3

oh i have a more niche one of these knowledge Qs

stuck orchid Jul 10, 2025, 6:14 AM

#

Esl

whole sundial Jul 10, 2025, 6:15 AM

#

well i guess all of that pumped up energy did something to Dork 4. Not paying $300 a month to see what "Heavy" is like, though.

dawn wharf Jul 10, 2025, 6:16 AM

#

civic flame oh i have a more niche one of these knowledge Qs

like?

eager mica Jul 10, 2025, 6:16 AM

#

stuck orchid Why then did Elon say that Grok 4 could rewrite the entire current dataset for A...

Current pretraining datasets are crappola. Other AI companies are already thinking of this, but I don't think it's being done at scale yet.

fleet lintel Jul 10, 2025, 6:16 AM

#

whole sundial this was the question lol. Grok 4's response is fully correct. First non-OpenAI ...

AI overview handles it better.. AI overview probably uses like gemini super small model

whole sundial Jul 10, 2025, 6:16 AM

#

this benchmark is for offline models only, no cheating here!

eager mica Jul 10, 2025, 6:16 AM

#

eager mica Current pretraining datasets are crappola. Other AI companies are already thinki...

https://arxiv.org/abs/2506.04689
From Meta.

arXiv.org

Recycling the Web: A Method to Enhance Pre-training Data Quality an...

Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has been relying on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. F...

elder rapids Jul 10, 2025, 6:17 AM

#

after even more testing I can say that it's not smart but it's knowledgeable ASF and can pinpoint things through a context well + good tool usage + brute forces puzzles really well

#

2.5 pro and o3 >>>

civic flame Jul 10, 2025, 6:17 AM

#

dawn wharf like?

nvm it's not as niche to current models apparently

fleet lintel Jul 10, 2025, 6:18 AM

#

elder rapids after even more testing I can say that it's not smart but it's knowledgeable ASF...

how are benchmark numbers so high then? Are they pulling some shady stuff like Meta/Llama?

elder rapids Jul 10, 2025, 6:18 AM

#

fleet lintel how are benchmark numbers so high then? Are they pulling some shady stuff like M...

grok 3 mini benchmarks

#

should tell you everything

#

lmao

whole sundial Jul 10, 2025, 6:18 AM

#

more fails from Claude 4 Opus Thinking and Gemini 2.5 Pro (the Ultra anon models fail here)

fleet lintel Jul 10, 2025, 6:18 AM

#

somehow these CEOs' characters are reflected in these models... shady CEOs == shady models

whole sundial Jul 10, 2025, 6:19 AM

#

the tools they use bump up the numbers, but grok 4 offline is better than grok 3 offline

elder rapids Jul 10, 2025, 6:19 AM

#

yep

blazing bison Jul 10, 2025, 6:19 AM

#

whole sundial the tools they use bump up the numbers, but grok 4 offline is better than grok 3...

this

elder rapids Jul 10, 2025, 6:19 AM

#

it ends there tho tbh

#

it got crushed in a debate with o3 too

#

that's crazy

#

one thing Ive noticed tho

blazing bison Jul 10, 2025, 6:20 AM

#

elder rapids that's crazy

it's a math model, they are using a lot of spacex and tesla math stuff

dawn wharf Jul 10, 2025, 6:20 AM

#

weren't you glazing elon before the announcement?

elder rapids Jul 10, 2025, 6:20 AM

#

is that it's not as dogmatic

#

as other models

#

I have to ask it not to hold back and be efficient etc etc

blazing bison Jul 10, 2025, 6:21 AM

#

dawn wharf weren't you glazing elon before the announcement?

he is just a professional yapper

elder rapids Jul 10, 2025, 6:21 AM

#

even though it still gets crushed

#

it's something I have to remove

keen fulcrum Jul 10, 2025, 6:21 AM

#

grok 4 is such a meme and a legend

haughty siren Jul 10, 2025, 6:22 AM

#

Is the Grok 4 model in the arena with thinking and is it super heavy

keen fulcrum Jul 10, 2025, 6:22 AM

#

haughty siren Is the Grok 4 model in the arena with thinking and is it super heavy

got shut down after 5min

blazing bison Jul 10, 2025, 6:22 AM

#

in my math tests it actually crushed them

keen fulcrum Jul 10, 2025, 6:22 AM

#

believe due to rate limit

winged wing Jul 10, 2025, 6:22 AM

#

blazing bison he is just a professional yapper

professional yapper and good at navigating c suite in startups

wind moth Jul 10, 2025, 6:22 AM

#

hows grok 4 doing for those who have tested it

elder rapids Jul 10, 2025, 6:23 AM

#

blazing bison it's a math model, they are using a lot of spacex and tesla math stuff

yeah but I don't care about that stuff

haughty siren Jul 10, 2025, 6:23 AM

#

keen fulcrum got shut down after 5min

Interesting, I just was able to message it a minute ago.

elder rapids Jul 10, 2025, 6:23 AM

#

that's redundant overall

indigo hazel Jul 10, 2025, 6:23 AM

#

wind moth hows grok 4 doing for those who have tested it

it uses web search a lot. it doesn't only use X.

whole sundial Jul 10, 2025, 6:24 AM

#

aka the arena!

blazing bison Jul 10, 2025, 6:24 AM

#

wind moth hows grok 4 doing for those who have tested it

it's good at math and programming, on par with O3 and 2.5. However, O3 might be better when it comes to actual logic usage

indigo hazel Jul 10, 2025, 6:24 AM

#

blazing bison it's a math model, they are using a lot of spacex and tesla math stuff

a paper i saw a few days ago said that llms which are trained and better at maths are also better in the other areas

blazing bison Jul 10, 2025, 6:25 AM

#

indigo hazel a paper i saw a few days ago said that llms which are trained and better at math...

I don't see it in Grok 4. The writing is terrible

pure anvil Jul 10, 2025, 6:25 AM

#

indigo hazel a paper i saw a few days ago said that llms which are trained and better at math...

It doesn't work in practice because a 32b qwen fine-tune could be better in math than 3.7 sonnet but it will be worlds worse in everything else

blazing bison Jul 10, 2025, 6:26 AM

#

Sonnet is bad at math

#

Claude in general is bad at math

pure anvil Jul 10, 2025, 6:26 AM

#

it's an example

#

math is not correlative to general performance

blazing bison Jul 10, 2025, 6:27 AM

#

pure anvil it's an example

I know it's a good example because they're bad at math but good at other things

elder burrow Jul 10, 2025, 6:27 AM

#

in battle mode?

stuck orchid Jul 10, 2025, 6:28 AM

#

eager mica Current pretraining datasets are crappola. Other AI companies are already thinki...

Yes, it seems that's the issue. Models nowadays are often fixated on the specific text of the question they are asked. They might provide a good answer to task X, but get confused when the same task X appears as part of task Y.
Some tasks require broader reasoning, for instance, in agent modes, to even understand the environment they are operating in. This is likely due to training on unformatted web data.
However, the o3 model really stands out in this regard; while it can be somewhat 'lazy', it sometimes understands the task context better, though it also hallucinates quite significantly

whole sundial Jul 10, 2025, 6:28 AM

#

elder burrow in battle mode?

i think it's a joke

elder burrow Jul 10, 2025, 6:28 AM

#

scicode is the best

elder burrow Jul 10, 2025, 6:28 AM

#

whole sundial i think it's a joke

sorry

#

😭

blazing bison Jul 10, 2025, 6:28 AM

#

I would use Grok 4 for math-related problems, but I wouldn't switch from ChatGPT to Grok. Sorry, Musk, but you need to do more

#

Maybe heavy grok is good, but $300?

indigo hazel Jul 10, 2025, 6:29 AM

#

basically it's a cycle. like a student who uses ai to cheat, or like he uses calculator in maths tests, grok uses tools to respond correctly

keen fulcrum Jul 10, 2025, 6:30 AM

#

blazing bison I would use Grok 4 for math-related problems, but I wouldn't switch from ChatGPT...

Get outta here

dawn wharf Jul 10, 2025, 6:30 AM

#

indigo hazel basically it's a cycle. like a student who uses ai to cheat, or like he uses cal...

not a good comparison

#

if it's truly smart, it shouldn't need tools

keen fulcrum Jul 10, 2025, 6:30 AM

#

blazing bison Maybe heavy grok is good, but $300?

$300 for code, multimodal and video model is very cheap

#

full context 256k + grok 4 heavy

blazing bison Jul 10, 2025, 6:30 AM

#

keen fulcrum $300 for code, multimodal and video model is very cheap

I know. I pay about $700 every month with APIS

pure anvil Jul 10, 2025, 6:31 AM

#

keen fulcrum $300 for code, multimodal and video model is very cheap

In that case the gemini ultra sub would also be competitive

elder burrow Jul 10, 2025, 6:31 AM

#

have yall seen grok 3 mini reasoning high price to performance

pure anvil Jul 10, 2025, 6:31 AM

#

2.5 pro + veo3

indigo hazel Jul 10, 2025, 6:31 AM

#

dawn wharf if it's truly smart, it shouldn't need tools

in fact it's the right comparison. a student who can do 1+1 using his own mind doesnt need to use calculator to do it. it's the same. is there a benchmark where llms cant use tools?

elder burrow Jul 10, 2025, 6:31 AM

#

elder burrow have yall seen grok 3 mini reasoning high price to performance

20x cheaper than 4, 2nd fastest model, beats 4 in a benchmark or 2

keen fulcrum Jul 10, 2025, 6:32 AM

#

claude max is cheap if you do code

whole sundial Jul 10, 2025, 6:32 AM

#

coming soon: Dork 4.5Vo SuperHeavyDuty Ultra DeepThink - super duper early preview available in SuperDuperDork Pro Max+ for $1,000,000 a month and $10,000,000 a year

#

may get 35% on arc agi 2

elder burrow Jul 10, 2025, 6:32 AM

#

whole sundial coming soon: Dork 4.5Vo SuperHeavyDuty Ultra DeepThink - super duper early previ...

😭

sour spindle Jul 10, 2025, 6:32 AM

#

Is grok 4 heavy only available on the super premium $300

elder burrow Jul 10, 2025, 6:33 AM

#

btw yall seen the last few secs of grok stream? they revealed timeline, coding model in august

sour spindle Jul 10, 2025, 6:33 AM

#

Don’t know if grok 4 then is worth it compared to o3

blazing bison Jul 10, 2025, 6:33 AM

#

GPT-5 is still months away

elder burrow Jul 10, 2025, 6:33 AM

#

sour spindle Is grok 4 heavy only available on the super premium $300

yes

#

not even available in api

sour spindle Jul 10, 2025, 6:34 AM

#

Does seem like all the impressive benchmarks are grok 4 heavy

elder burrow Jul 10, 2025, 6:34 AM

#

blazing bison GPT-5 is still months away

sam said "in a few months" in feb/march im pretty sure

violet adder Jul 10, 2025, 6:34 AM

#

@echo aurora Where did Grok 4 disappear to in Direct Chat mode?

elder burrow Jul 10, 2025, 6:35 AM

#

elder burrow not even available in api

@deep adder mr betterknower am i right

#

mr moreknowing

#

oh damn

blazing bison Jul 10, 2025, 6:35 AM

#

There's no point in discussing this, but GPT-5 is not a July thing

whole sundial Jul 10, 2025, 6:35 AM

#

try the arena battle mode

echo aurora Jul 10, 2025, 6:35 AM

#

violet adder @echo aurora Where did Grok 4 disappear to in Direct Chat mode?

we're working on a fix

elder burrow Jul 10, 2025, 6:35 AM

#

echo aurora we're working on a fix

YESSS

#

YES

#

i thought its intentional

whole sundial Jul 10, 2025, 6:35 AM

#

i'm surprised i got grok 4 on the first try in the battle mode

#

it didn't even reason, it just spat out the correct response

echo aurora Jul 10, 2025, 6:36 AM

#

elder burrow i thought its intentional

nope, just was giving us troubles in the other modes. no ETA on when it'll be available in direct/side-by-side but is something we're working on

blazing bison Jul 10, 2025, 6:37 AM

#

whole sundial i'm surprised i got grok 4 on the first try in the battle mode

I could be wrong, but I think you can't disable Grok 4 reasoning

indigo hazel Jul 10, 2025, 6:37 AM

#

whole sundial it didn't even reason, it just spat out the correct response

bro looked for it on the web, it found the response in the first link and then gave it to you lmao

patent bane Jul 10, 2025, 6:38 AM

#

is grok 4 better than o3 or 2.5 pro?

blazing bison Jul 10, 2025, 6:38 AM

#

patent bane is grok 4 better than o3 or 2.5 pro?

short answer: no

patent bane Jul 10, 2025, 6:38 AM

#

is it just a hype?

blazing bison Jul 10, 2025, 6:38 AM

#

also no

pure anvil Jul 10, 2025, 6:38 AM

#

Did yall see the grok 4 demo livestream? elon is ket'd out

whole sundial Jul 10, 2025, 6:38 AM

#

if it was doing that, we would see Llama 4 Maverick: the sequel

blazing bison Jul 10, 2025, 6:38 AM

#

patent bane is it just a hype?

It's good, but there's no reason to switch

whole sundial Jul 10, 2025, 6:39 AM

#

the response began like 1 to 2 seconds after i hit enter, it did not even have time to search and reason

blazing bison Jul 10, 2025, 6:39 AM

#

Llama will be fine now. They hired very experienced people

whole sundial Jul 10, 2025, 6:39 AM

#

but tools are likely disabled on the arena

whole sundial Jul 10, 2025, 6:39 AM

#

blazing bison Llama will be fine now. They hired very experienced people

ik, they got OpenAI's best minds now and some others as well

#

i was just saying that if xAI did use tools on the model in the arena, something like that would happen again

elder burrow Jul 10, 2025, 6:40 AM

#

echo aurora nope, just was giving us troubles in the other modes. no ETA on when it'll be av...

👍

whole sundial Jul 10, 2025, 6:40 AM

#

but i don't think they are using tools, it's just model output

#

there was a search arena on the old site, likely still usable

#

but this is just the standard arena on the new site

#

i wonder why the Step-1X edit and SeedEdit 3.0 models only appear in Arena mode for Image editing. I assume this is something the model owner set in place? but you can access Seedream 3.0 just fine, so idk

pure anvil Jul 10, 2025, 6:44 AM

#

he's so jittery

rare python Jul 10, 2025, 6:47 AM

#

whole sundial i wonder why the Step-1X edit and SeedEdit 3.0 models only appear in Arena mode ...

Bagel

whole sundial Jul 10, 2025, 6:47 AM

#

and bagel, forgot about that one

rare python Jul 10, 2025, 6:47 AM

#

is bagel SeedEdit 2.0?

#

It's on lmarena right now

whole sundial Jul 10, 2025, 6:50 AM

#

bagel is a separate open source model they made based off of Qwen-2.5 VL 7B (I think)

hollow ocean Jul 10, 2025, 6:51 AM

#

misclick $3000/yr

cedar tide Jul 10, 2025, 6:59 AM

#

We want grok 4 on direct chat

echo aurora Jul 10, 2025, 7:00 AM

#

cedar tide We want grok 4 on direct chat

so do we! just need to work out a few issues first

winged wing Jul 10, 2025, 7:04 AM

#

its not being ran in battle mode either, no?

echo aurora Jul 10, 2025, 7:05 AM

#

winged wing its not being ran in battle mode either, no?

its in battle mode

lilac nimbus Jul 10, 2025, 7:13 AM

#

Grok4 seems great I try

cedar tide Jul 10, 2025, 7:19 AM

#

@echo aurora the site is all bugged

echo aurora Jul 10, 2025, 7:19 AM

#

Yeah we’re looking into it

#

Sorry everyone

echo aurora Jul 10, 2025, 7:20 AM

#

cedar tide @echo aurora the site is all bugged

Thank you for flagging

indigo hazel Jul 10, 2025, 7:20 AM

#

echo aurora Sorry everyone

dont worry, it's only 9 am. take your time

whole sundial Jul 10, 2025, 7:20 AM

#

is it related to grok 4 or is it something that just... happened?

echo aurora Jul 10, 2025, 7:22 AM

#

whole sundial is it related to grok 4 or is it something that just... happened?

Looks unrelated

blazing bison Jul 10, 2025, 7:26 AM

#

grok 4 hacked lmarena

#

💀

echo aurora Jul 10, 2025, 7:26 AM

#

SCsurprised

indigo hazel Jul 10, 2025, 7:27 AM

#

blazing bison grok 4 hacked lmarena

using tools lmao

rapid merlin Jul 10, 2025, 7:27 AM

#

wasn't grok in a controversy like a tiny bit ago?

#

where it "shared its thoughts" on some twitter replies

blazing bison Jul 10, 2025, 7:27 AM

#

the mecha thing

rapid merlin Jul 10, 2025, 7:28 AM

#

yeah

keen fulcrum Jul 10, 2025, 7:28 AM

#

rapid merlin wasn't grok in a controversy like a tiny bit ago?

It got instructed to not shy away from politically incorrect statements if it can provide sources that support its position

winged wing Jul 10, 2025, 7:28 AM

#

@echo aurora does basically every model provider ask for the 1 day heads up thingy?

keen fulcrum Jul 10, 2025, 7:28 AM

#

Its now the second incident where grok ran crazy 🤪

rapid merlin Jul 10, 2025, 7:28 AM

#

Solid cover!

winged wing Jul 10, 2025, 7:29 AM

#

if the llm doesnt seek the truth that you think is correct, you make it. At least this is what elon thinks.

blazing bison Jul 10, 2025, 7:29 AM

#

They said it could go crazy. It's an experiment

ornate stump Jul 10, 2025, 7:30 AM

#

Just woke up and saw all the bench—I'm a little hyped, not gonna lie. But where's Grok 4? Is it just an announcement?

echo aurora Jul 10, 2025, 7:30 AM

#

Site should be working again btw 👍

blazing bison Jul 10, 2025, 7:30 AM

#

@ornate stump battle mode only

keen fulcrum Jul 10, 2025, 7:30 AM

#

winged wing if the llm doesnt seek the truth that you think is correct, you make it. At leas...

Elon uses the same mentality as with spaceX (fail until succeed with rapid testing). Releasing such instructions without testing internally is irresponsible

echo aurora Jul 10, 2025, 7:31 AM

#

winged wing @echo aurora does basically every model provider ask for the 1 day head...

I’m not sure what that is tbh pikaconfused

keen fulcrum Jul 10, 2025, 7:31 AM

#

echo aurora I’m not sure what that is tbh pikaconfused

Do you have a rate limit for grok 4?

echo aurora Jul 10, 2025, 7:33 AM

#

keen fulcrum Do you have a rate limit for grok 4?

Yeah, I’m not sure the limit, can get more info tomorrow

winged wing Jul 10, 2025, 7:33 AM

#

echo aurora I’m not sure what that is tbh pikaconfused

i found it highlighted here

echo aurora Jul 10, 2025, 7:34 AM

#

winged wing i found it highlighted here

Oh gotcha, I’m not actually sure

whole sundial Jul 10, 2025, 7:35 AM

#

now the site is just... slow

winged wing Jul 10, 2025, 7:35 AM

#

mhm i see

whole sundial Jul 10, 2025, 7:35 AM

#

and that thing happens where your chat history gets wiped and have to accept the terms of use again... except you can't!

echo aurora Jul 10, 2025, 7:37 AM

#

Site is struggling again

#

Apologies for the issues, I spoke too soon about it being fixed

whole wagon Jul 10, 2025, 7:49 AM

#

All that grok 4 induced demand Kappa

cedar tide Jul 10, 2025, 7:52 AM

#

Anyone have webdev example from grok 4 ?

burnt pulsar Jul 10, 2025, 8:00 AM

#

Is grok 4 heavy also planned to be available?

blazing bison Jul 10, 2025, 8:01 AM

#

burnt pulsar Is grok 4 heavy also planned to be available?

If the price is reasonable

blazing bison Jul 10, 2025, 8:02 AM

#

burnt pulsar Is grok 4 heavy also planned to be available?

But 99% chance no

burnt pulsar Jul 10, 2025, 8:03 AM

#

Understandable, it's the priciest of the new Grok 4 models.

main gulch Jul 10, 2025, 8:03 AM

#

grok-4-heavy is not available even in the official API yet

sullen finch Jul 10, 2025, 8:04 AM

#

why did they remove the ability to attach files in direct chat

burnt pulsar Jul 10, 2025, 8:06 AM

#

That never worked when I tried to attach files. 🙂

mystic mica Jul 10, 2025, 8:06 AM

#

I can't even make it past the Cloudflare verification

sullen finch Jul 10, 2025, 8:06 AM

#

sullen finch why did they remove the ability to attach files in direct chat

for some ai models

sullen finch Jul 10, 2025, 8:06 AM

#

burnt pulsar That never worked when I tried to attach files. 🙂

everything worked fine for me with claude, but now there is simply no possibility

burnt pulsar Jul 10, 2025, 8:08 AM

#

Maybe there is no disk space left? 😉

sullen finch Jul 10, 2025, 8:10 AM

#

burnt pulsar Maybe there is no disk space left? 😉

but for some models this option is still available

soft kernel Jul 10, 2025, 8:11 AM

#

blazing bison If the price is reasonable

No way it'll be too costly

#

Why they didn't release an image gen?

burnt pulsar Jul 10, 2025, 8:17 AM

#

sullen finch but for some models this option is still available

The AI gods give it, the AI gods take it... 🤖

whole sundial Jul 10, 2025, 8:20 AM

#

soft kernel Why they didn't release an image gen?

because it wasn't ready yet. grok 4 is just 3 with a lot of RL. they are training a new multimodal base model (Grok 4.5?) that will have image gen.

burnt pulsar Jul 10, 2025, 8:21 AM

#

Even more impressive if they achieved these kinds of improvements just with a lot of RL with grok-3 as base model....

hardy oriole Jul 10, 2025, 8:22 AM

#

Grok 4 not generating answers on webdev arena? Got it two times in a row and only the opposite model generated code

verbal nimbus Jul 10, 2025, 8:35 AM

#

Why is Gemini 2.5 Pro so dumb on Github Copilot

#

It can plan an entire 2 week schedule on AI Studio, but can't even swap two time slots within 2 hrs of each other on Copilot. I ask it to swap slots A and B and it forgets to re-add one of them.

#

Tried it in AIStudio and it runs fine

keen ferry Jul 10, 2025, 8:43 AM

#

I just woke up, where can I see the grok 4 demonstration?

torn mantle Jul 10, 2025, 8:44 AM

#

Is grok4 really on lmarena?

#

I couldn't get it once

blazing bison Jul 10, 2025, 8:44 AM

#

torn mantle Is grok4 really on lmarena?

On battle mode only

torn mantle Jul 10, 2025, 8:45 AM

#

blazing bison On battle mode only

Yea ik

blazing bison Jul 10, 2025, 8:45 AM

#

verbal nimbus Why is Gemini 2.5 Pro so dumb on Github Copilot

Because github copilot doesn't use full model context but rag

#

I think only cursor with max mode offer full model context

soft kernel Jul 10, 2025, 8:48 AM

#

whole sundial because it wasn't ready yet. grok 4 is just 3 with a lot of RL. they are trainin...

Yeah but they are far behind in the image gen race

languid crescent Jul 10, 2025, 8:49 AM

#

Can't find grok 4 in direct chat?

civic flame Jul 10, 2025, 8:49 AM

#

torn mantle I couldn't get it once

honestly

#

i've had enough

#

literally had 60 battles

#

it's come up 0 times

keen ferry Jul 10, 2025, 8:53 AM

#

languid crescent Can't find grok 4 in direct chat?

only on battle arena

blazing bison Jul 10, 2025, 8:55 AM

#

civic flame it's come up 0 times

Its rng bro

#

Sorry

torn mantle Jul 10, 2025, 8:55 AM

#

civic flame literally had 60 battles

Sigh

keen ferry Jul 10, 2025, 8:56 AM

#

what chances of getting it

opaque adder Jul 10, 2025, 8:56 AM

#

I just bought grok 4 heavy give me some prompts

keen ferry Jul 10, 2025, 8:56 AM

#

we need grok 4 on direct arena

keen ferry Jul 10, 2025, 8:57 AM

#

opaque adder I just bought grok 4 heavy give me some prompts

how much does the monthly subscription cost? I might get one

opaque adder Jul 10, 2025, 8:57 AM

#

300 usd

keen ferry Jul 10, 2025, 8:57 AM

#

wtf

indigo hazel Jul 10, 2025, 8:57 AM

#

opaque adder 300 usd

it's per year

opaque adder Jul 10, 2025, 8:57 AM

#

no

#

its 3000 usd per year

#

300 per month

indigo hazel Jul 10, 2025, 8:58 AM

#

opaque adder Jul 10, 2025, 8:58 AM

#

if you can read

#

i clearly said heavy?

rare python Jul 10, 2025, 8:58 AM

#

lmarena is extremely laggy when both models are generating code

indigo hazel Jul 10, 2025, 8:58 AM

#

opaque adder 300 per month

.

opaque adder Jul 10, 2025, 8:59 AM

#

indigo hazel .

#

#

how much does the monthly subscription costs I might get one

#

Runo — 09:56
I just bought grok 4 heavy give me some prompts

swen — 09:57
how much does the monthly subscription costs I might get one

indigo hazel Jul 10, 2025, 9:00 AM

#

but both of them are per year.

indigo hazel Jul 10, 2025, 9:00 AM

#

opaque adder Runo — 09:56 I just bought grok 4 heavy give me some prompts swen — 09:57 how m...

he asked monthly, but you said the "per year" cost

opaque adder Jul 10, 2025, 9:00 AM

#

indigo hazel he asked monthly, but you said the "per year" cost

no

#

grok 4 heavy is 300 per month

#

are u braindead

indigo hazel Jul 10, 2025, 9:01 AM

#

opaque adder are u braindead

yes because i didnt see on the top the fact that it also has the "month" section xD

#

sorry

opaque adder Jul 10, 2025, 9:01 AM

#

yeah you are

keen ferry Jul 10, 2025, 9:05 AM

#

@echo aurora can we get grok 4 on direct chat it's just impossible to get it in battle

torn mantle Jul 10, 2025, 9:06 AM

#

Don't make @opaque adder mad

opaque adder Jul 10, 2025, 9:07 AM

#

asura

torn mantle Jul 10, 2025, 9:07 AM

#

Runo

opaque adder Jul 10, 2025, 9:07 AM

#

i see you here every time

#

i come into this discord

#

and i come here once per month

#

i never miss your username

#

i havent checked your message count but i assume you have over 40k

civic flame Jul 10, 2025, 9:07 AM

#

keen ferry @echo aurora can we get grok 4 on direct chat it's just impossible to g...

at this rate I have doubts about it even being in battle

torn mantle Jul 10, 2025, 9:07 AM

#

Your eyes playing tricks on you

civic flame Jul 10, 2025, 9:07 AM

#

or they've deliberately made it ridiculously hard to get

opaque adder Jul 10, 2025, 9:07 AM

#

ok thats surprising

civic flame Jul 10, 2025, 9:07 AM

#

in which case, why even bother

torn mantle Jul 10, 2025, 9:07 AM

#

5k???

opaque adder Jul 10, 2025, 9:07 AM

#

grok 4 is not going into lmarena

torn mantle Jul 10, 2025, 9:08 AM

#

That's a lot

keen ferry Jul 10, 2025, 9:08 AM

#

civic flame at this rate I have doubts about it even being in battle

I only got it once

civic flame Jul 10, 2025, 9:09 AM

#

HOW HAVE I GOT IT 0 TIMES AFTER 80 BATTLES ☠️

keen ferry Jul 10, 2025, 9:09 AM

#

civic flame HOW HAVE I GOT IT 0 TIMES AFTER 80 BATTLES ☠️

I got it only once in 20 battles lmao

civic flame Jul 10, 2025, 9:10 AM

#

still better than me

stuck orchid Jul 10, 2025, 9:10 AM

#

keen ferry I got it only once in 20 battles lmao

How do you know that you are communicating with Grok4 and not Grok3?

rare python Jul 10, 2025, 9:10 AM

#

stuck orchid How do you know that you are communicating with Grok4 and not Grok3?

After voting

#

They reveal the name

keen ferry Jul 10, 2025, 9:11 AM

#

stuck orchid How do you know that you are communicating with Grok4 and not Grok3?

I'm always voting tie and saw grok 4

stuck orchid Jul 10, 2025, 9:11 AM

#

rare python After voting

Hmm. I wish I could chat with Grok4 and be sure that I was chatting with Grok4

rare python Jul 10, 2025, 9:12 AM

#

stuck orchid Hmm. I wish I could chat with Grok4 and be sure that I was chatting with Grok4

Direct Chat coming soon

stuck orchid Jul 10, 2025, 9:12 AM

#

Sometimes Deepseek-r1 says that it is Grok4

rare python Jul 10, 2025, 9:12 AM

#

DeepSeek is the mastermind

keen ferry Jul 10, 2025, 9:12 AM

#

rare python Direct Chat coming soon

if we beg enough it will appear on direct chat lmao

rare python Jul 10, 2025, 9:12 AM

#

It was made from multiple models

torn mantle Jul 10, 2025, 9:13 AM

#

civic flame HOW HAVE I GOT IT 0 TIMES AFTER 80 BATTLES ☠️

Me too

stuck orchid Jul 10, 2025, 9:13 AM

#

keen ferry if we beg enough it will appear on direct chat lmao

Pineapple says it solves one problem that makes Grok4 difficult to add to DirectChat. Apparently, this is related to the use of tools

keen ferry Jul 10, 2025, 9:14 AM

#

stuck orchid Pineapple says it solves one problem that makes Grok4 difficult to add to Direct...

he's online

#

offline*

#

https://discord.com/channels/1340554757349179412/1392796158765699102

rare python Jul 10, 2025, 9:19 AM

#

Why did Gork drop?

tidal schooner Jul 10, 2025, 9:20 AM

#

rare python Why did Gork drop?

gemini 3 is probably gonna cook hard

#

also some people said the demos were meh

indigo hazel Jul 10, 2025, 9:21 AM

#

why is o3 so low if it's comparable with 2.5 pro?

tidal schooner Jul 10, 2025, 9:24 AM

#

indigo hazel why is o3 so low if it's comparable with 2.5 pro?

it’s a tight race tbh

#

grok 4 seemed kinda rushed to me for some reason idk why

#

elon was stuttering half the time during the announcement

rare python Jul 10, 2025, 9:25 AM

#

tidal schooner elon was stuttering half the time during the announcement

awkward silence

burnt pulsar Jul 10, 2025, 9:26 AM

#

As it is a highly competitive landscape, he possibly wanted to get the attention in the summer days.

rare python Jul 10, 2025, 9:26 AM

#

nervous laugh

tidal schooner Jul 10, 2025, 9:27 AM

#

rare python awkward silence

he had to pass on the actual talk in regards to the model onto his engineers tbh

#

ngl bro was just yapping about the stuff in his tweet prior to the livestream

tidal schooner Jul 10, 2025, 9:30 AM

#

rare python DeepSeek is the mastermind

yo speaking of deepseek

#

r2 wen

rare python Jul 10, 2025, 9:31 AM

#

tidal schooner r2 wen

After GPT 5

tidal schooner Jul 10, 2025, 9:31 AM

#

rare python After GPT 5

hope it doesn’t flop like llama 4

rare python Jul 10, 2025, 9:31 AM

#

tidal schooner hope it doesn’t flop like llama 4

better than r1 0528 won't be a flop

#

r1 0528 is a decent model

tidal schooner Jul 10, 2025, 9:34 AM

#

rare python better than r1 0528 won't be a flop

basically r1.5

rare python Jul 10, 2025, 9:35 AM

#

tidal schooner basically r1.5

But like we need v4 first

#

The base for RL training

#

Or they will merge into one model

tidal schooner Jul 10, 2025, 9:35 AM

#

rare python Or they will merge into one model

i wouldn’t mind that ngl 🥶

#

would prob be slower for a lot of tasks tho tbf

fleet pine Jul 10, 2025, 9:37 AM

#

has anyone tried deepseek r1 with 600+ billion parameters?

indigo hazel Jul 10, 2025, 9:38 AM

#

fleet pine has anyone tried deepseek r1 with 600+ billion parameters?

the january version had 671b, the 0528 has 684

keen beacon Jul 10, 2025, 9:44 AM

#

Hey, do you guys know if Grok 4 is ever coming to direct chat?

cedar tide Jul 10, 2025, 9:53 AM

#

@echo aurora grok 4 not working

Screenshot_2025-07-10-11-51-52-448_com.android.chrome-edit.jpg

keen ferry Jul 10, 2025, 9:55 AM

#

cedar tide @echo aurora grok 4 not working

what the hell how did you get it on battle

cedar tide Jul 10, 2025, 9:57 AM

#

Anyone seen a new mystery model that doesn't want to say its name and is good?

cedar tide Jul 10, 2025, 9:57 AM

#

keen ferry what the hell how did you get it on battle

I will dm you

torn mantle Jul 10, 2025, 10:00 AM

#

cedar tide I will dm you

dm me as well

#

i want to try it

#

also just reading some grok 4 outputs i think it wont top lmarena

torn mantle Jul 10, 2025, 10:01 AM

#

rare python Why did Gork drop?

because people have tried it and concluded that its still behind

keen beacon Jul 10, 2025, 10:03 AM

#

cedar tide I will dm you

dm me i want to try it

keen fulcrum Jul 10, 2025, 10:03 AM

#

torn mantle dm me as well

Isn’t it available in arena

cedar tide Jul 10, 2025, 10:04 AM

#

I don't have a real way to use it, I just increase my chances of getting it faster

#

@keen beacon

#

@torn mantle

sacred plaza Jul 10, 2025, 10:05 AM

#

What will be higher the amount of companies that will overpay for a Nazi aligned AI model or the amount of trade deals trump signs with other nations (3 so far)

#

🤡🤡🤡🤡🤡🤡

keen fulcrum Jul 10, 2025, 10:08 AM

#

sacred plaza What will be higher the amount of companies that will overpay for a Nazi aligned...

Can you stop promoting fascism

#

Any model has hiccups of misinformation and wrong beliefs

#

Its common in the industry

cedar tide Jul 10, 2025, 10:14 AM

#

Pikachu by grok 4

Screenshot_2025-07-10-12-13-34-592_com.android.chrome-edit.jpg

rare python Jul 10, 2025, 10:22 AM

#

keen fulcrum Any model has hiccups of misinformation and wrong beliefs

https://tenor.com/view/geese-band-goose-jesse-gliiese710-3d-country-gif-7136779464493144429

stuck orchid Jul 10, 2025, 10:28 AM

#

Elon is focused on training the 256K token model and does not want to increase the context window yet, probably because he wants to first achieve something groundbreaking at this context window length before expanding it to a longer context

ocean vortex Jul 10, 2025, 10:31 AM

#

So Dork4 AGI? PikaOMG

torn mantle Jul 10, 2025, 10:33 AM

#

ocean vortex So Dork4 AGI? PikaOMG

its good on benchmarks

ocean vortex Jul 10, 2025, 10:34 AM

#

torn mantle its good on benchmarks

yeah and this time there doesn't seem to be much manipulation. Everything is surprisingly clear

#

even confirmed by AA

rare python Jul 10, 2025, 10:35 AM

#

ocean vortex yeah and this time there doesn't seem to be much manipulation. Everything is sur...

getting mixed vibe check from people

ocean vortex Jul 10, 2025, 10:35 AM

#

the only way they could have cheated is if they are serving a different model publicly (with safety alignment and whatnot) than the one which was tested, but that's a reach...

#

probably even for Elon

torn mantle Jul 10, 2025, 10:39 AM

#

ocean vortex yeah and this time there doesn't seem to be much manipulation. Everything is sur...

did you try it yet?

#

bruh

#

my brain is fried

#

i messed up my sleep with the release

ocean vortex Jul 10, 2025, 10:40 AM

#

torn mantle did you try it yet?

only 2 prompts yet. The way they are hiding reasoning is... interesting.

alpine coral Jul 10, 2025, 10:40 AM

#

it thinks for so fkn long

torn mantle Jul 10, 2025, 10:40 AM

#

ocean vortex only 2 prompts yet. The way they are hiding reasoning is... interesting.

what was this service called again?

alpine coral Jul 10, 2025, 10:40 AM

#

at least on OR

ocean vortex Jul 10, 2025, 10:40 AM

#

torn mantle what was this service called again?

openrouter

alpine coral Jul 10, 2025, 10:40 AM

#

weirdly underwhelming so far.. like compared to the screenshots of the evals here - was expecting way more

#

but only just started playing around

ocean vortex Jul 10, 2025, 10:42 AM

#

alpine coral weirdly underwhelming so far.. like compared to the screenshots of the evals her...

it did pass this. The only models able to solve this were either insanely dumb which didn't know the original riddle (Amazon), or Opus4 doing it properly. 2.5Pro and o3-pro fail. It knows the original version since it did mention it:

torn mantle Jul 10, 2025, 10:43 AM

#

ocean vortex openrouter

should i keep trying?

#

its not working for me

#

😦

rare python Jul 10, 2025, 10:43 AM

#

ocean vortex it did pass this. The only models able to solve this were either insanely dumb w...

test it on simple bench question 10

alpine coral Jul 10, 2025, 10:44 AM

#

yeah it's definitely solid - but i suspect it excels at single prompt, exam/eval-style questions.. i've tried a few questions (non-riddles) that require multiple steps of knowledge recall, which only 2.5 pro and o3 get right, and it fails kinda terribly

alpine coral Jul 10, 2025, 10:45 AM

#

torn mantle its not working for me

i'm using it atm on OR - does it throw an error or like?

hazy quest Jul 10, 2025, 10:45 AM

#

On the Wes Roth live testing, he asked how many times a basketball dropped from 100m would bounce if no air friction, and Grok 4 Heavy, after almost 5min of thinking, answered "infinitely many times", which is an absurdly wrong answer to a simple question

#

Worrying that it gets it that wrong

alpine coral Jul 10, 2025, 10:46 AM

#

it kinda feels like o1/3-pro

#

how long it thinks

#

which can lead to both brilliant and ridiculous responses

torn mantle Jul 10, 2025, 10:52 AM

#

alpine coral i'm using it atm on OR - does it throw an error or like?

error

torn mantle Jul 10, 2025, 10:53 AM

#

alpine coral i'm using it atm on OR - does it throw an error or like?

This request requires more credits, or fewer max_tokens. You requested up to 230367 tokens, but can only afford 2349. To increase, visit https://openrouter.ai/settings/credits and upgrade to a paid account

#

is it telling me im poor?

keen beacon Jul 10, 2025, 10:56 AM

#

hardy oriole Jul 10, 2025, 10:56 AM

#

Grok has lost every single "battle" so far in webdev arena for me, probably 6 times total... There's something wrong with the model or idk what's going on lmao

It's producing worse results than even 2.5 flash

keen beacon Jul 10, 2025, 10:56 AM

#

Damn

#

Google may also release gemini 3

#

Those guys can't stop cooking

torn mantle Jul 10, 2025, 10:57 AM

#

keen beacon

nobody knows

keen beacon Jul 10, 2025, 10:57 AM

#

OpenAI better step up their game

torn mantle Jul 10, 2025, 10:57 AM

#

hardy oriole Grok has lost every single "battle" so far in webdev arena for me, probably 6 ti...

yea its bad at coding

#

seems like they wanted to focus on other things and make a separate coding model

keen beacon Jul 10, 2025, 10:58 AM

#

Yeah they have a separate coding model for that reason. Interesting choice.

hazy quest Jul 10, 2025, 10:58 AM

#

The unhinged voice mode is pretty cool

#

But every company could do that, they just chose not to for now

keen beacon Jul 10, 2025, 10:58 AM

#

Im excited for the open source o3-mini like model next week

hardy oriole Jul 10, 2025, 10:58 AM

#

torn mantle seems like they wanted to focus on other things and make a separate coding model

Makes sense

burnt pulsar Jul 10, 2025, 11:05 AM

#

I'd love to see Gemini 3 getting rid of some character encoding issues that are very annoying in my coding tasks.

cedar tide Jul 10, 2025, 11:10 AM

#

discord clone by grok 4

alpine coral Jul 10, 2025, 11:10 AM

#

torn mantle is it telling me im poor?

ahah what shows up on Credits ? I think you need at least like $1 balance (or dial back the max tokens a bunch or something)

ocean vortex Jul 10, 2025, 11:11 AM

#

torn mantle This request requires more credits, or fewer max_tokens. You requested up to 230...

yeah it's not very cheap. To run arc-agi it cost the same as Opus4

hardy oriole Jul 10, 2025, 11:12 AM

#

cedar tide discord clone by grok 4

It's doing much better than in my tries, are you using it on the grok website?

ocean vortex Jul 10, 2025, 11:12 AM

#

Slightly more even

rare python Jul 10, 2025, 11:13 AM

#

cedar tide Jul 10, 2025, 11:13 AM

#

r1 0528 in comparison

keen ferry Jul 10, 2025, 11:14 AM

#

cedar tide discord clone by grok 4

he even added your pfp into it

cedar tide Jul 10, 2025, 11:14 AM

#

hardy oriole It's doing much better than in my tries, are you using it on the grok website?

lm arena

keen ferry Jul 10, 2025, 11:14 AM

#

why does everyone keep testing it in html and css and not like python or c++

hardy oriole Jul 10, 2025, 11:15 AM

#

Grok 4

#

Wolfstride

#

Just using the pre made prompts in lmarena

cedar tide Jul 10, 2025, 11:17 AM

#

cedar tide discord clone by grok 4

grok 4 is very similar to grok 3 0224

hardy pecan Jul 10, 2025, 11:19 AM

#

grok4's geoguessing abilities aren't so great