#general
It sometimes randomly has nazi outbursts it's not just political questions
a livestream
guys did it start yet
like what many companies do
No
Like the other day it was flaming literally random Jewish ppl on X. Since they had Jewish surnames
yea agree they should just go full hackathon mode and just replay things
not actually use it
Without any political prompt
supergrok heavy is not sold out anymore
(don't worry, i don't have $3,000 to spend. also, "SuperGrokPro"? I thought this was SuperGrok Heavy!)
they must have changed the name in the last second
Bets on it starting on the hour?
which hour of the night can it start at to minimize viewership
that's where my money is
You can just downgrade it and it'll refund the diff scaled by days
ggs
Maybe they really did want it to start at nein
FINALLY
it started eventually
We were all wrong
🥳
Introducing Grok 4, the world's most powerful AI model. Watch the livestream now: https://t.co/59iDX5s2ck
That's a lot of viewers damn
as predicted, one hour and one minute late
It's climbing rapidly
or is that views
At 60k rn
how do you see how much people are watching
now we're being baited
Loading screen for another hour
oh this?
80k
we have cool music now tho
: )
"worlds most powerful ai model" hmm....
LOL
over/under elon musk happy or sad on stream
do you know what the current record is?
wym? the most watched is well into the millions
I mean watched live ofc
yeah, live
146k for me rn
152 now
140k live will end up being millions of viewers
do you mean the most watched AI related thing
yooo
i wonder who's the voice
it's AI generated
Who's talking
it's deadass AI
When is it gonna be on lmarena leaderboard
lol "AI is advancing faster than any human"
so this is going to be better than kingfall?
fr?
oh wait what
progression wise yeah ofc
238k
college exams were solved long ago man we are already on phd questions
get to the good stuff
Elon is so low energy rn lol
"Grok 4 is smarter than almost all graduate students in all disciplines simultaneously."
blud needs to sleep
bullish
order of magnitude is big
is elon trying to emulate jensen
yes
compute used for RL step
gwak 4, based
they didn't confirm that explicitly
oh nvm there is sneaky orange colours to cause confusion
how is that confusing
?
they're highlighting the RL difference
400k viewers sheesh
in the next 10 years, will google win the ai race or get defeated?
18
25
1
win : )
yeah it's not explicit but yeah i see what you mean
it's likely it's basically the same base
i wonder how it does in math research
rly want to see my math phd friend try this
Elon doesn't even believe himself
26.9%
no tool
41% with tool
"we put the tools in training"
lmao
is that good?
i think so?
ooo
holy it doubled
Elon is so out of it lol
he seems depressed
fr..
why
did he just say "idk"
705k is crazy.......
he needs to stop yapping
I'm sure there will be
is it gonna be on arena?
ye
:0
they trained it on puzzle type things i bet
Ik openAI and Google feeling pretty good rn
"closing the loop on reality" makes me worried about paperclips
wait, text-only subset?
the image stuff isnt ready yet
wdym by your question?
maybe they didn't finish the multimodal part yet?
o i see.
how do other models do on text-only subset?
oh, test-time compute is Grok 4 Heavy
Consc@1024
lol
They seem to be hiding reasoning
test time compute is just using more compute when answering a question. Reasoning tokens are an example of TTC, and running multiple instances in parallel and voting or collaborating is also TTC. It just uses GPUs a lot while answering the question, without training or updating the model at all
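A minimal sketch of the "run multiple in parallel and vote" flavour of TTC described above. `noisy_solver` is a hypothetical stand-in for one model sample (not any real API), and the 40% per-sample accuracy is an assumed number for illustration only.

```python
import random
from collections import Counter


def noisy_solver(rng, correct="42", p_correct=0.4):
    """Hypothetical stand-in for one model sample: right with probability p_correct."""
    if rng.random() < p_correct:
        return correct
    # Wrong answers are spread over several distractors, so they rarely agree.
    return rng.choice(["41", "43", "44", "45", "46"])


def majority_vote(answers):
    """Return the most common answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def solve_with_voting(n_samples, seed=0):
    """Spend more inference compute by sampling n times and voting; no weights change."""
    rng = random.Random(seed)
    return majority_vote([noisy_solver(rng) for _ in range(n_samples)])


def accuracy(n_samples, trials=200):
    """Fraction of seeded trials where the voted answer is the correct one."""
    hits = sum(solve_with_voting(n_samples, seed=t) == "42" for t in range(trials))
    return hits / trials
```

Comparing `accuracy(1)` with `accuracy(64)` shows the same effect as the TTC charts: the "model" never changes, only the number of samples per question.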
they tryna hypnotize us lmao
Like I said, Google and openAI feeling pretty good rn
their graph was showing how adding more TTC increases the performance. OpenAI has shown similar charts when they released o1 and o3
ye
what are the chances of everyone there getting fired
they're not showing anything impressive
what's going on lol
they're showing what makes o3 cool
"wen grok 5"
The HLE numbers are impressive. Just the presentation and likely drugged Elon couldn't show it
they're saying grok 4 is doing all this research to get the simulation correct
latex was wrong lel
we haven't seen other labs do what they've done with HLE tho, Google could have even higher numbers and we'd never know
I wonder if they switched off the qwq cold start lmfao
so grok 4 is currently able to look at images
Training on tool use could be big
its not that wild man
They have so many resources. They should make their own XD and actually innovate
no
imo there's a lot they're not revealing
38.6% with tools, 44.4% heavy
Is the gemini 2.5 hle score w/o tools
25.4% without tools
all they say is "RL", the devil is in the details
bs why is the discrepancy so low
Wonder grok 4 superheavy vs o3 pro vs gemini 2.5 pro deepthink
just because they're still using RL doesn't mean there isn't any innovation lol
reich4 looks interesting
bro what in a few weeks
that's like saying GPT-3 was not innovative because it was just scaling the same method up
(whereas in this case we don't even know what methods they used)
Gemini no tools does better than grok 4 no tools on HLE
back
61.9% in USAMO
what did i miss
some demos
is grok 4 very very good?
Okay finally more benchmarks
For 10 minutes
Maybe not this month
I completely didn't retain how the reasoning works with grok heavy
isnt gpt 5 this month or nah
just parallel
its simple man
they just do the parallel thing
same as o3 pro
holy cringe
lol they know their audience
bruh
diet coke
my stomach hurts
how are they not cringing
this is so ex machina coded
Oh it failed
that was weary weary cringe
Ye
nice voice tho
agreed
it didnt fail he didnt realise it retained the context
now advertising openai
lol
Openai?
RIP
did openai fail
no
openai die : (
do they have a hitler voice, based
oh thats what they were showing the speed
oh nice api
Yeah previous sota was 8%
holy 2x
I wonder about SWE bench score
Now we are getting to the real stuff
its arc agi 2 score isn't proportional
Those benchmarks look crazy ngl
Just text
Text
Yea
1.5M viewers damn
grok 4 0709 is the current version
what the hell is vending bench
money : )
nobody knows
Running a small shop or something
Grok downloads about to skyrocket
how long do yall think google and openai will take to catch up
how is bro gonna add it
cuz gemini 3 and gpt 5 are coming
Arent you one of lmarena staff team?
probably by their next releases
@deep adder deepthink already dead
oo
Okay~
few weeks
dang
views not live viewers
rlly fast ig
how does it do in IMO
so dumb they dont show live count
not much coming tbh
Is grok that good? lol
Didnt watch the stream
Last 20 mins was way better than the beginning
not trying it out until leaderboard results come out
Is grok 4 in arena?
no
ah damn
hmm no o3 pro
don't think they tested it for imo then
still extremely impressive
better than i could ever do
doesn't imo literally start today? lol
i am lowkey a little worried
wow that is crazy
i hope we don't get paperclipped
within the next 5 years
grok 4 rate limit: same as grok 3 free, 20 per 2 hours for supergrok subscribers (not available for free rn)
someone try out grok 4 and tell me how good it really is
my p(doom) has been climbing
which deepthink?
Gemini
@small haven middle column
no, as in kingfall or 2.5 pro base model
Grok 4 creating the shader (no errors).
QRT: emollick
o3-pro does by far the best so far at my benchmark (scroll quote tweet thread for others): "create a visually interesting shader that can run in twigl app make it like the ocean in a storm" It did take 21 minutes for o3-pro to think (and another 19 to fix a small shader error) https://t.co/KqzmuHm5Zf
grok 4 is free?
it's like 3$ input 15$ output
huh
no
wait where are we getting this from, I'm only seeing 3 in 15 out
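A rough sketch of what "3 in 15 out" means per request. It assumes the prices are USD per 1M tokens, which the chat does not state explicitly, and the function name is made up for illustration.

```python
# Assumed prices from the chat: $3 input / $15 output, taken here as USD per 1M tokens.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def request_cost(input_tokens, output_tokens):
    """Estimated USD cost of one request under the assumed per-million prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Example: a 10k-token prompt with a 2k-token answer.
example_cost = request_cost(10_000, 2_000)
```

Under these assumptions the prompt and the answer cost the same here (about 3 cents each), so a model that thinks for a long time could be dominated by output cost if its reasoning tokens are billed as output.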
if it's actually that expensive then all the hype for me is squashed
It's a typo
id thought so
The pricing below that is correct
ye see
Do they still give you reasoning 'summaries'?
Can u show if you don't mind I can't check rn
Oh I mean regular grok 4
Oh lol they actually summarize now
Thanks
Grok 4 achieves SOTA
Did they update the cut off or is it literally just rl on grok 3 lol
Yeah I didn't really watch it lmao
@echo aurora when can we expect grok 4 to be added to lmarena? ArtificialAnalysis received early access
Ultra should beat it though
It would be great if grok 4 heavy can be tested by us as well
grok 4 heavy is the truth
so when grok 4 is coming in arena?
sorry to say I don't have any details to share atm
looks like its available
yeah
@leaden palm gave me a story it wrote
and I don't need any other examples
ts BUNS at writing
what llm is best at writing so far
So gpt 4.5 is not best at anything
if you give it an environment that communicates with the other models it'll do really well
but it's not that good by itself
Why do you think that?
@echo aurora yay
and it's gone again?
i only trust scicode for coding
look at 4 opus on livecodebench..
its behind nemotron
this is accurate
grok 4 gone from arena bruh
IT WAS THERE?
for a few mins
bruh
lmao
why lol
no idea
I managed to get 1 prompt to grok 4 in lmarena then it disappeared and chat switched to chatgpt
Yep
I tried scrolling 100 comments to understand grok4 is SOTA or not... I am still not sure.
Is it SOTA or not?
I think they aren't paid for users using it yet
i can do better on ms paint
2 Treys
lol
i've noticed currently frontend dev seems really poor. i am giving it the benefit of the doubt because i am only able to use it through X where it has a system prompt and all of that applied, but still..
Why u tag me twice
you think it's contaminated for HLE, USAMO, etc.?
SuperGrok Heavy: $300/month (Multi Agent Version)
Dayum! Is it really that good?
wtf is stonebloom
@deep adder time for you to buy $300/month version and tell us the truth
Just tell it not to search
Tell it in the prompt
The API will not randomly call tools
Without it being enabled
There are benchmark numbers without tools
Waiting for Grok 4 on LMArena to vote
seems to be bad at coding, cant edit my code without bugging
it thinks a lot
No I hope
I think Grok 4 should be something like o3 in terms of reasoning and cognition.
o3 is lazy and also worse at coding than Gemini, especially in web coding, but according to my tests, o3 is the most powerful model for researchers.
Grok 4 should reason "from first principles."
they hide reasoning
same base model as grok 3?
whn will you guys update the leaderboard ?
really i totally missed that
grok-4 should be around 1500.
-# my assumption.
gd damn good thing i got out break even
When Grok 4 appears in the arena
yes
u manage to catch any of the swings on HLE?
basically like a student who cheats during the test at school
yes i've gathered
but right now i've only tried via Grok on X
grok on X is different
alr
so i'm waiting for lmarena to add grok 4
It might have some sauce with the scaffolding. Idk I wouldnt be so quick to judge. Its def gonna get blown out of the water by the end of the month tho
lol we're still waiting on R2
yep, but nice benchmarks
seems to also suffer from being too succinct when writing code
trained off of same base model as grok 3, they are just now training the new base model
no wonder it's mid
Bro stop yapping, for god sake
They should have kept the 3.5 naming scheme
wolfstride vs stonebloom, which one is better at web design?
so i guess there will be a Grok 4.5 with image gen? Gemini 3/GPT-5 image gen will be better
Because you're poor, don't even have access to the model and are yapping around
Yes you are
can barely tell the difference tbh
So stop yapping
children, stop arguing
This is the mass user model, the true model is api only
A luxury sports-car is traveling with open windows in the direction opposite of the south at 30km/h
what we thinking
search was off
im thinking 0.00025km long is very small
If you use o3, sonnet, everything on their frontend is dumber than api
the bridge is 25cm
check prompt
okay maybe it isn't too stupid
is the original question, having the bridge 25cm long?
nah it's not stupid but it's not smart like 2.5 pro, or even o3
It's not even secret, just search on X and you gonna see openai team talking about it. You're a yapping machine bro
doesn't catch anything im saying
lmao
it just got crushed in a debate with 2.5 pro too
what makes you think it's very contaminated?
oh yea terminator svg benchmark, i forgot
maybe grok would have even higher bench scores if it didn't use brave search, which has their own index so it kinda sucks. a sample search/deep research across gemini 2.5pro, grok deepsearch, duckduckgo, and perplexity shows that grok is the clear loser. brave's index is so bad that when i asked the question a few days ago, it finally found a relevant source while all previous attempts were just... bad. idk why they can't just license from duckduckgo lol
august
august
yeah this is disappointing
september for "multimodal agent" a.k.a. new image gen
so i was waiting for the best model ever, but it's worse than o3 and 2.5 pro?
to be fair grok was the first major ai that i know of that had native image gen. all the others did it this spring. they had it last winter.
it's just jimmy ba and tony wu the co-founders lol
so what
just researchers
i think grok 4 was rushed, it's all hype
yo
Can't wait to see MechaHitler 4 on the charts
ok lets see it
@deep adder
It was definitely rushed but is it better than 2.5 pro is the question
wanted to try my "basic" knowledge question that only OpenAI models (o1, o3, GPT-4.5, maybe 4o/4.1?) get right. Claude Opus fails it. Gemini 2.5 Pro, stonebloom (Ultra) gets it wrong. Grok 3 got kinda close, but I don't think it will get it right if it uses same base model. Might get it right with reasoning.
what's the question?
I'll wait for it to be on arena, thank you though
grok 5 will invent new physics
alright
have you tried it with wolfstride?
then let you know if it answers correctly or not pls
nah in my testing
if you find different results lmk
i will
this model has the same issue as grok 3 with Improving through a context
direct chat
lol ok
so it got that high result in arc agi 2 by using tools?
Is Grok 4 similar in communication style to o3?
not in direct chat and i haven't seen it in battle in the 10 rounds i've gone thru
grok 4 from what i heard was available on the arena for a few minutes earlier
yeah but it seems to have disappeared
cc @echo aurora
it's in Battle Mode
i've gone through about 15 battle rounds now and not seen it
either i'm unlucky or something's up
there are issues with putting it in Direct & side-by-side, but we're working on it
Is it still in Battle mode?
it's really just grok 3 but the very smoky Colossus is pumping more power into it to make it reason better
yeah should be
grok 4 sucks
I haven't heard otherwise but will flag if more folks are saying they're not getting it
on what platform
i'm an idiot for doubting Dork 4
this was the question lol. Grok 4's response is fully correct. First non-OpenAI model to get it correct!
memphis in general is being screwed over by a bunch of ai-related developments that aren't adequately planned (imo)
Why then did Elon say that Grok 4 could rewrite the entire current dataset for AI training?
I think Grok 4 has hidden potential. If it's not very good in other areas, it must be good for some specific type of task. We need to try to find that area.
grok 3
oh i have a more niche one of these knowledge Qs
Esl
well i guess all of that pumped up energy did something to Dork 4. Not paying $300 a month to see what "Heavy" is like, though.
like?
Current pretraining datasets are crappola. Other AI companies are already thinking of this, but I don't think it's being done at scale yet.
AI overview handles it better.. AI overview probably uses like gemini super small model
this benchmark is for offline models only, no cheating here!
https://arxiv.org/abs/2506.04689
From Meta.
Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has been relying on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. F...
after even more testing I can say that it's not smart but it's knowledgeable ASF and can pinpoint things through a context well + good tool usage + brute forces puzzles really well
2.5 pro and o3 >>>
nvm it's not as niche to current models apparently
how are benchmark number so high then? Are they pulling some shady stuff like Meta/Llama?
grok 3 mini benchmarks
should tell you everything
lmao
more fails from Claude 4 Opus Thinking and Gemini 2.5 Pro (the Ultra anon models fail here)
somehow these CEOs characters are reflected in these models... shady CEOs == shade models
the tools they use bump up the numbers, but grok 4 offline is better than grok 3 offline
yep
this
it ends there tho tbh
it got crushed in a debate with o3 too
that's crazy
one thing Ive noticed tho
it's a math model, they are using a lot of spacex and tesla math stuff
weren't you glazing elon before the announcement?
is that it's not as dogmatic
as other models
I have to ask it not to hold back and be efficient etc etc
he is just a professional yapper
grok 4 is such a meme and a legend
Is the Grok 4 model in the arena with thinking and is it super heavy
got shut down after 5min
in my math tests it actually crushed them
believe due to rate limit
professional yapper and good at navigating c suite in startups
hows grok 4 doing for those who have tested it
yeah but I don't care about that stuff
Interesting, I just was able to message it a minute ago.
that's redundant overall
it uses a lot web search. it doesnt only use X.
aka the arena!
it's good at math and programming, on par with O3 and 2.5. However, O3 might be better when it comes to actual logic usage
a paper i saw a few days ago said that llms which are trained and better at maths are also better in the other areas
I don't see it in Grok 4. The writing is terrible
It doesn't work in practice because a 32b qwen fine-tune could be better in math than 3.7 sonnet but it will be worlds worse in everything else
I know it's a good example because they're bad at math but good at other things
in battle mode?
Yes, it seems that's the issue. Models nowadays are often fixated on the specific text of the question they are asked. They might provide a good answer to task X, but get confused when the same task X appears as part of task Y.
Some tasks require broader reasoning, for instance, in agent modes, to even understand the environment they are operating in. This is likely due to training on unformatted web data.
However, the o3 model really stands out in this regard; while it can be somewhat 'lazy', it sometimes understands the task context better, though it also hallucinates quite significantly
i think it's a joke
scicode is the best
I would use Grok 4 for math-related problems, but I wouldn't switch from ChatGPT to Grok. Sorry, Musk, but you need to do more
Maybe heavy grok is good, but $300?
basically it's a cycle. like a student who uses ai to cheat, or like he uses calculator in maths tests, grok uses tools to respond correctly
Get outta here
not a good comparison
if it's truly smart, it shouldn't need tools
$300 for code, multimodal and video model is very cheap
- full context 256k + grok 4 heavy
I know. I pay about $700 every month with APIS
In that case the gemini ultra sub would also be competitive
have yall seen grok 3 mini reasoning high price to performance
2.5 pro + veo3
in fact it's the right comparison. a student who can do 1+1 using his own mind doesnt need to use calculator to do it. it's the same. is there a benchmark where llms cant use tools?
20x cheaper than 4, 2nd fastest model, beats 4 in a benchmark or 2
claude max is cheap if you do code
coming soon: Dork 4.5Vo SuperHeavyDuty Ultra DeepThink - super duper early preview available in SuperDuperDork Pro Max+ for $1,000,000 a month and $10,000,000 a year
may get 35% on arc agi 2
Is grok 4 heavy only available on the super premium $300
btw yall seen the last few secs of grok stream? they revealed timeline, coding model in august
Don't know if grok 4 then is worth it compared to o3
GPT-5 is still months away
yes
not even available in api
Does seem like all the impressive benchmarks are grok 4 heavy
sam said "in a few months" in feb/march im pretty sure
@echo aurora Where did Grok 4 disappear to in Direct Chat mode?
@deep adder mr betterknower am i right
mr moreknowing
oh damn
There's no point in discussing this, but GPT-5 is not a July thing
try the arena battle mode
we're working on a fix
i'm surprised i got grok 4 on the first try in the battle mode
it didn't even reason, it just spat out the correct response
nope, just was giving us troubles in the other modes. no ETA on when it'll be available in direct/side-by-side but is something we're working on
I could be wrong, but I think you can't disable Grok 4 reasoning
bro looked for it on the web, it found the answer in the first link and then gave it to you lmao
is grok 4 better than o3 or 2.5 pro?
short answer: no
is it just a hype?
also no
Did yall see the grok 4 demo livestream? elon is ket'd out
if it was doing that, we would see Llama 4 Maverick: the sequel
It's good, but there's no reason to switch
the response began like 1 to 2 seconds after i hit enter, it did not even have time to search and reason
Llama will be fine now. They hired very experienced people
but tools are likely disabled on the arena
ik, they got OpenAI's best minds now and some others as well
i was just saying that if xAI did use tools on the model in the arena, something like that would happen again
but i don't think they are using tools, it's just model output
there was a search arena on the old site, likely still usable
but this is just the standard arena on the new site
i wonder why the Step-1X edit and SeedEdit 3.0 models only appear in Arena mode for Image editing. I assume this is something the model owner set in place? but you can access Seedream 3.0 just fine, so idk
he's so jittery
Bagel
and bagel, forgot about that one
bagel is a separate open source model they made based off of Qwen-2.5 VL 7B (I think)
misclick $3000/yr
We want grok 4 on direct chat
so do we! just need to work out a few issues first
its not being run in battle mode either, no?
its in battle mode
Grok4 seems great I try
@echo aurora the site is all bugged
Thank you for flagging
dont worry, it's only 9 am. take your time
is it related to grok 4 or is it something that just... happened?
Looks unrelated

using tools lmao
wasn't grok in a controversy like a tiny bit ago?
where it "shared its thoughts" on some twitter replies
the mecha thing
yeah
It got instructed to not shy away from politically incorrect statements if it can provide sources that support its position
@echo aurora does basically every model provider ask for the 1 day heads up thingy?
It's now the second incident where grok ran crazy 🤪
Solid cover!
if the llm doesnt seek the truth that you think is correct, you make it. At least this is what elon thinks.
They said it could go crazy. It's an experiment
Just woke up and saw all the bench. I'm a little hyped, not gonna lie. But where's Grok 4? Is it just an announcement?
Site should be working again btw
@ornate stump battle mode only
Elon uses the same mentality as with spaceX (fail until succeed with rapid testing). Releasing such instructions without testing internally is irresponsible
I'm not sure what that is tbh
Do you have a rate limit for grok 4?
Yeah, I'm not sure the limit, can get more info tomorrow
i found it highlighted here
Oh gotcha, I'm not actually sure
now the site is just... slow
mhm i see
and that thing happens where your chat history gets wiped and have to accept the terms of use again... except you can't!
Site is struggling again
Apologies for the issues, I spoke too soon about it being fixed
All that grok 4 induced demand 
Anyone have webdev example from grok 4 ?
Is grok 4 heavy also planned to be available?
If the price is reasonable
But 99% chance no
Understandable, it's the priciest of the new Grok 4 models.
grok-4-heavy is not available even in the official API yet
why did they remove the ability to attach files in direct chat
That never worked when I tried to attach files.
I can't even make it past the Cloudflare verification
for some ai models
everything worked fine for me with claude, but now there is simply no possibility
Maybe there is no disk space left?
but for some models this option is still available
No way it'll be too costly
Why they didn't release an image gen?
The AI gods give it, the AI gods take it...
because it wasn't ready yet. grok 4 is just 3 with a lot of RL. they are training a new multimodal base model (Grok 4.5?) that will have image gen.
Even more impressive if they achieved these kinds of improvements just with a lot of RL with grok-3 as base model....
Grok 4 not generating answers on webdev arena? Got it two times in a row and only the opposite model generated code
Why is Gemini 2.5 Pro so dumb on Github Copilot
It can plan an entire 2 week schedule on AI Studio, but can't even swap two time slots within 2 hrs of each other on Copilot. I ask it to swap slots A and B and it forgets to re-add one of them.
Tried it in AIStudio and it runs fine
I just woke up, where can I see the grok 4 demonstration?
On battle mode only
Yea ik
Because github copilot doesn't use full model context but rag
I think only cursor with max mode offers full model context
Yeah but they are far behind in the image gen race
Can't find grok 4 in direct chat?
honestly
i've had enough
literally had 60 battles
it's come up 0 times
only on battle arena
Sigh
what are the chances of getting it
I just bought grok 4 heavy give me some prompts
we need grok 4 on direct arena
how much does the monthly subscription costs I might get one
300 usd
wtf
it's per year
lmarena is extremely laggy when both models are generating code
.
how much does the monthly subscription costs I might get one
Runo - 09:56
I just bought grok 4 heavy give me some prompts
swen - 09:57
how much does the monthly subscription costs I might get one
but both of them are per year.
he asked monthly, but you said the "per year" cost
no
grok 4 heavy is 300 per month
are u braindead
yes because i didnt see on the top the fact that it also has the "month" section xD
sorry
yeah you are
@echo aurora can we get grok 4 on direct chat it's just impossible to get it in battle
Don't make @opaque adder mad
asura
Runo
i see you here every time
i come into this discord
and i come here once per month
i never miss your username
i havent checked your message count but i assume you have over 40k
at this rate I have doubts about it even being in battle
Your eyes playing tricks on you
or they've deliberately made it ridiculously hard to get
ok thats surprising
in which case, why even bother
5k???
grok 4 is not going into lmarena
That's a lot
I only got it once
HOW HAVE I GOT IT 0 TIMES AFTER 80 BATTLES
I got it only once in 20 battles lmao
still better than me
How do you know that you are communicating with Grok4 and not Grok3?
After voting
They reveal the name
I'm always voting tie and saw grok 4
Hmm. I wish I could chat with Grok4 and be sure that I was chatting with Grok4
Direct Chat coming soon
Sometimes Deepseek-r1 says that it is Grok4
DeepSeek is the mastermind
if we beg enough it will appear on direct chat lmao
It was made from multiple models
Me too
Pineapple says it solves one problem that makes Grok4 difficult to add to DirectChat. Apparently, this is related to the use of tools
he's online
offline*
Why did Gork drop?
gemini 3 is probably gonna cook hard
also some people said the demos were meh
why is o3 so low if it's comparable with 2.5 pro?
it's a tight race tbh
grok 4 seemed kinda rushed to me for some reason idk why
elon was stuttering half the time during the announcement
awkward silence
As it is a highly competitive landscape, he possibly wanted to get the attention in the summer days.
nervous laugh
he had to pass on the actual talk in regards to the model onto his engineers tbh
ngl bro was just yapping about the stuff in his tweet prior to the livestream
After GPT 5
hope it doesn't flop like llama 4
better than r1 0528 won't be a flop
r1 0528 is a decent model
basically r1.5
But like we need v4 first
The base for RL training
Or they will merge into one model
i wouldn't mind that ngl 🥶
would prob be slower for a lot of tasks tho tbf
has anyone tried deepseek r1 with 600+ billion parameters?
the january version had 671b, the 0528 has 684
Hey, do you guys know if Grok 4 is ever coming to direct chat?
@echo aurora grok 4 not working
what the hell how did you get it on battle
Anyone seen a new mystery model that doesn't want to say its name and is good?
I will dm you
dm me as well
i want to try it
also just reading some grok 4 outputs i think it wont top lmarena
because people have tried it and concluded that its still behind
dm me i want to try it
Isn't it available in arena
I dont have a real way to use it, just I increase my chances of getting it faster
@keen beacon
@torn mantle
What will be higher the amount of companies that will overpay for a Nazi aligned AI model or the amount of trade deals trump signs with other nations (3 so far)
🤡🤡🤡🤡🤡🤡
Can you stop promoting fascism
Any model has hiccups of misinformation and wrong beliefs
Its common in the industry
Pikachu by grok 4
Elon is focused on training the 256K token model and does not want to increase the context window yet, probably because he wants to first achieve something groundbreaking at this context window length before expanding it to a longer context
So Dork4 AGI? 
its good on benchmarks
yeah and this time there doesn't seem to be much manipulation. Everything is surprisingly clear
even confirmed by AA
getting mixed vibe check from people
the only way they could have cheated is if they are serving a different model publicly (with safety alignment and whatnot) than the one which was tested, but that's a reach...
probably even for Elon
did you try it yet?
bruh
my brain is fried
i messed up my sleep with the release
only 2 prompts yet. The way they are hiding reasoning is... interesting.
it thinks for so fkn long
what was this service called again?
at least on OR
openrouter
weirdly underwhelming so far.. like compared to the screenshots of the evals here - was expecting way more
but only just started playing around
it did pass this. The only models able to solve this were either insanely dumb which didn't know the original riddle (Amazon), or Opus4 doing it properly. 2.5Pro and o3-pro fail. It knows the original version since it did mention it:
test it on simple bench question 10
yeah it's definitely solid - but i suspect it excels at single prompt, exam/eval-style questions.. i've tried a few questions (non-riddles) that require multiple steps of knowledge recall, which only 2.5 pro and o3 get right, and it fails kinda terribly
i'm using it atm on OR - does it throw an error or like?
On the Wes Roth live testing, he asked how many times a basketball dropped from 100m would bounce if no air friction, and Grok 4 Heavy, after almost 5min of thinking, answered "infinitely many times", which is an absurdly wrong answer to a simple question
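Whether "infinitely many" is absurd depends on the model you pick. A minimal sketch, assuming the usual textbook idealization that the stream question did not spell out: no air drag and a constant coefficient of restitution `e` with 0 <= e < 1 (both are assumptions here, as are the function names). Rebound heights then shrink geometrically, so there are infinitely many bounces, but they all fit in a finite time, and only a few are tall enough to notice.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def total_bounce_time(h0, e):
    """Time until the ball is at rest: first fall plus a geometric series of flights."""
    if not 0 <= e < 1:
        raise ValueError("e must satisfy 0 <= e < 1 for the series to converge")
    t_fall = math.sqrt(2 * h0 / G)  # time of the first drop from height h0
    # Flight n lasts 2 * e**n * t_fall, so the total is t_fall * (1 + e) / (1 - e).
    return t_fall * (1 + e) / (1 - e)


def bounces_above(h0, e, h_min):
    """Count bounces whose rebound peak is at least h_min (peak n is h0 * e**(2n))."""
    if h_min <= 0:
        raise ValueError("h_min must be positive, otherwise the count is unbounded")
    count = 0
    peak = h0 * e * e
    while peak >= h_min:
        count += 1
        peak *= e * e
    return count
```

For a 100 m drop with an assumed e = 0.5, `bounces_above(100, 0.5, 1.0)` counts the rebounds that still reach 1 m, while `total_bounce_time(100, 0.5)` gives the finite time in which the whole infinite sequence plays out.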
Worrying that it gets it that wrong
it kinda feels like o1/3-pro
how long it thinks
which can lead to both brilliant and ridiculous responses
error
This request requires more credits, or fewer max_tokens. You requested up to 230367 tokens, but can only afford 2349. To increase, visit https://openrouter.ai/settings/credits and upgrade to a paid account
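The error above is a budget check on `max_tokens`. A rough sketch of working out a smaller value that fits the balance, assuming output is billed at $15 per 1M tokens and ignoring input cost; this is not OpenRouter's actual accounting, and both function names are hypothetical.

```python
def affordable_output_tokens(credits_usd, output_usd_per_m=15.0):
    """How many output tokens the balance covers at the assumed per-million price."""
    # Multiply before dividing so round dollar amounts stay exact in floating point.
    return int(credits_usd * 1_000_000 / output_usd_per_m)


def clamp_max_tokens(requested, credits_usd, output_usd_per_m=15.0):
    """Lower the requested max_tokens to what the balance can pay for."""
    return max(0, min(requested, affordable_output_tokens(credits_usd, output_usd_per_m)))


# Example: the request asked for up to 230367 tokens; with an assumed $1.50 balance
# the cap comes out far lower than that.
capped = clamp_max_tokens(230_367, 1.5)
```

Either topping up credits or passing the clamped value as `max_tokens` should get past this kind of check, though a long-thinking model may then be cut off mid-answer.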
is it telling me im poor?
Grok has lost every single "battle" so far in webdev arena for me, probably 6 times total... There's something wrong with the model or idk what's going on lmao
It's producing worse results than even 2.5 flash
nobody knows
OpenAI better step up their game
yea its bad at coding
seems like they wanted to focus on other things and make a separate coding model
Yeah they have a separate coding model for that reason. Interesting choice.
The unhinged voice mode is pretty cool
But every company could do that, they just chose not to for now
Im excited for the open source o3-mini like model next week
Makes sense
I'd love to see Gemini 3 getting rid of some character encoding issue that are very annoying in my coding tasks.
discord clone by grok 4
ahah what shows up on Credits ? I think you need at least like $1 balance (or dial back the max tokens a bunch or something)
yeah it's not very cheap. To run arc-agi it cost the same as Opus4
It's doing much better than in my tries, are you using on the grok website?
Slightly more even
r1 0528 in comparison
he even added your pfp into it
lm arena
why does everyone keep testing it in html and css and not like python or c++
grok 4 its very similar to grok 3 0224
grok4's geoguessing abilities aren't so great