#general
I don't do that, maybe I will test it out later.
But for like reports and such o3 was the most promising always.
it's not, actually
if you really use it a lot and read through the paper line by line you won't find it that impressive
it doesn't really compile findings, it's just parsing different info from different pages
OpenAI = Google > Anthropic = xAI. Though xAI are doing much more manipulation than innovation lately. In the eyes of many they are gonna get less credit than they even deserve tbh
On a good day and things going their way xAI could challenge the top spots, but not with this mess and publicity... People are not even gonna give it a chance
So they need to offset and beat everyone by A LOT, that is never happening I think
Deepseek is the one to look out for. The rest are significantly less promising. So they go with "but look our model is faster/smaller" instead lol
for one week
There is no way grok 4 is still not out
This Discord's modal estimate for Grok 4 release date was 2 days from now
+/- 2 days
made an event for those interested! https://discord.gg/lmarena?event=1392247045296885891
xAI 42%
oai looks very very undervalued now
can you explain what im seeing?
What did I miss? gork 4 ASD (artificial stupid dumbass) releasing tomorrow?
It's gonna be so stupid it overfits the benchmarks and loops back around to being "SOTA" at benchmaxing
i think i just got a pr; craig said we would never have long running tasks as consumerists
o3 pro off by 1 day
Just flipping a coin lol
They are actually gonna cross kek
Where is this from
supposedly gemini discord
don't know the original source at all tho
tomorrow
devmode
I'm skeptical. I don't think xAI will release a weak model, I just think Elon cares too much about publicity
Hate how they have it speaking
Saw some other examples where it speaks like Ben Shapiro
Such a shame man
holy moly
@gork
how will this affect the economic and geopolitical state of the world
It's been at it all day targeting random Jewish ppl on X with hate
Pretty wild they have to keep deleting its tweets
The flooding happened an hour away from my house
Kind of scares me
I was playing video games blissfully unaware
As they were drowning to death
On Wednesday, July 2, a local health department in Memphis granted Elon Musk's xAI data center an air permit to continue operating gas turbines powering the company's Grok chatbot.
The Memphis Chamber of Commerce announced in June 2024 that xAI had chosen a local site to build its new supercomputer, Colossus. xAI's website boasts that it was able to build Colossus in 122 days, partly due to the mobile gas turbines that were installed at the campus, the site of a former manufacturing facility.
Colossus allowed xAI to catch up to rivals OpenAI, Google, and Anthropic in building cutting-edge artificial intelligence. It was built using 200,000 Nvidia H100 GPUs, making it likely the world's largest AI supercomputer.
xAI's Memphis campus is located in a predominantly Black community which has been historically burdened with industrial projects that cause pollution. Gas turbines can be a significant source of harmful emissions, like…
gork power
og early gemini 2 started about 8 months ago, could actually be something
nah it's gemini 2
idk, even if it is
001 suffix = stable model id
my point is more that we should be expecting something in the near future
they never planned 2.5 naming (according to themselves), no way they would have us waiting much longer
grok 4 == tay from 2016
so elon just fed alt-right nazi stuff to grok
couldn't be any dumber than that
not being aware of what kind of consequences that would have
Yeah it's Gemini 2; the whole thing started because it was posted in an announcement on another Discord and everyone got pinged
It isn't alt-right or Nazi, it's common sense
holy hell what
looks like they turned off text responses for now
We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts. Since being made aware of the content, xAI has taken action to ban hate speech before Grok posts on X. xAI is training only truth-seeking and thanks to the millions of users on
yeah seems like a fine-tuned version of grok
xAI polymarket odds crashed after this X stuff
you just said it crashed 🫩
It's not gonna be that great on LLM arena if these are the takes it comes up with
Doesn't seem like something the average human would like
lmao
it's still on 40%
Who was SOTA when Grok 3 came out?
11
19
4
Claude 3.7 Sonnet
peak xAI behavior
"Truth seeking" at its best
Rushed red teaming Grok 3 to release it
Wait for Grok 4 delay for red teaming /j
gork 4 is going to absolutely crush expectations
expectations at what tho
at spreading misinformation?
gorkin' it
Lolllllll
I did think that the original tweet calling it "Antichrist" was definitely a little risky
so they prob gonna win?
They may have won already, just not public info / release
The truth ...
Seems like Google is taking and will continue to take the lead, since their TPU Ironwood is way stronger than NVL72
What does win mean? They will offer their AI services alongside many other companies, each model will have a different flavor/cost/level of censorship and many looking for alternatives will flood other platforms allowing true competition to remain even ten years from now, will they be SOTA ten years from now? Likely but who knows really
it's realistic xAI is SOTA
"It was not a salute. This is fake news"
proceeds to train Grok to worship Hitler
dude
can you stop repeating woke media headlines
this is inappropriate
I'm just responding to a message. What is inappropriate is the way they trained Grok on twitter as well as Elon's behavior, frankly.
it is 100% appropriate to respond to it
you are over exaggerating. Grok is censor free
I'm not, nor am I really repeating anything... I'm just responding to what I see happening in realtime
just because you see comments of elon saying X sources are leftwing and that he will train Grok otherwise
I don't really care which media writes what tbh. Everyone can see for themselves the original thing or see the messages from Grok lol
on what?
If you want to choose a model that is censor-free and bias-free, it's better you go for o3 or grok 3
Yes
Anthropic is working closely with gov and corporate entities, how can they not be?
But China doesn't pretend to be a democracy, so that kind of is, and should be, expected from them...
A lot of topics are blocked in both Gemini and Claude models
Well they are not adding any political bias though. It responds the same way any open-source or even uncensored model would, when it comes to political bias...
Frustrating experience
good thing is you can prime models to be at least more diplomatic and neutral, you will not change the "stance" of the models tho
blocked ≠ spreading misinformation
and not even in the same universe as worshipping Hitler lmao
Aren't LLMs prompted to think and behave as a human?
not in my experience, this discussion ties back to what i mentioned a few days ago: how can models distinguish between propaganda and truth if even humans can't?
The same way humans do - looking at the data. Typically that's not a problem since after scraping the entire internet this is usually obvious
unless you deliberately manipulate training data
if i had the money to push as much data as needed into the internet and all sorts of media, would you believe what i want you to believe?
Most labs do not do that, but some do. The Chinese labs only overfitted the model on a select few topics; they did not go very far
xAI seems to be taking this further now though....
misinformation in the eyes of some and truth in the eyes of others, it's not as simple as black or white
Not humanly possible. People are not robots. For every event there are typically enough people participating for the unquestionable footage to come out and blow up etc. AI-generated videos are nowhere near sophisticated or widespread enough yet. Also this would only ever work in like oppressed regimes where the government has FULL control over what you can read and see....
It mostly just works for people who are too lazy to fact check anything, or do not even know how to do that
now you know why education system is failing
Sometimes it may not be, but in this case is really black and white. Nazi salute + your AI worshipping Hitler. Those 2 things can not happen by accident and both need deliberate action lol
To make AI do that you need to go against the entire unmolested training data...
history is indeed written by the winner, i will not comment any further on this topic
What they are doing for Grok on X is making it echo strongly biased statements of a select few people. That defeats the whole purpose of AI
you stop using ai
Both aren't worshipping anyone, you are making conclusions based on very little info
Perhaps control your brain more
I think you are one of those people who would claim there's not enough data to say 2+2=4, if it doesn't suit you
maybe also call it 'fake news' for a good measure...
There is this behavior in humans to make conclusions on very little information based on reading 1 article or seeing one headline.
One of the reasons being you feel a certain way after seeing it
o3 too often blocks the output on controversial topics
Some people just can't make any decision for themselves and need the press to do it for them, you seem one of them tbh. I didn't read the conclusion I'm making in some news article lmao
I'm basing it on how AI works and what we know about it, as well as the video footage (when it comes to that first thing not AI related...)
yeah it does that. But this is much better than twisting the facts to spread misinformation only covering 1 biased side of the story, objectively.
And I don't think it refuses any reasonable political questions really... More like refuses to output info on illegal things
As in refuses to say how to synthesize potentially fatal very addictive drugs, how to make a bomb... Illegal is very clearly defined here
it actually refuses some not illegal, but 'politically incorrect' questions
which are perfectly handled (with custom prompt) by Gemini and Claude
Such as...?
Hard to know what you mean
without seeing examples
I actually find Gemini and Claude more chill
hey people there's an ui update on the alpha arena (https://alpha.lmarena.ai)
lol. Then you should use custom prompt with chatgpt too. Though again hard to comment not knowing what you are asking
the same prompt was used for all LLMs
the questions were about gender equality issues
i had never heard of Soman before anthropic did a jailbreak challenge involving the substance, as part of their partnership with US govt agencies involved in CBRN control
safety could be considered a form bias, but they're not one and the same
working with the government to make LLMs less likely to help people make chemical weapons isn't propaganda
same with refusing to tell you how to self-harm
it's safety
LLMs regurgitating objective historical nonsense (like "the South China Sea has been an 'inalienable' part of China since 'ancient times'", which is demonstrably not true) is propaganda.
refusing to talk about a particular historical event (e.g. Tiananmen Sq) is censorship
neither are 'safety'... which is what claude is obsessed with
DS is particularly bad on my prompts btw
they just silently ignore the custom instructions
(even though China and Chinese policy issues weren't mentioned in the prompts)
in terms of politics, public llms are nothing more than just another newspaper reflecting what is allowed to be said and what not, and they're surely not intelligent enough to come to the conclusion by themselves that telling people how to self-harm or create weapons of any kind is bad. I've never said safety is a "bad thing"; there are obviously different forms of biases, even ones that are actually helpful. i don't know where the impression comes from that I'm standing against safety?
obviously, "safety" could be as well used for green/pink/white/[insert any color you want]-washing
When it comes to self harm, there is sacrifice as well
no one is free from biases, that's how we are, since we are the ones building AIs, they will reflect this aspect too, even with safety, you can minimize the risks only to a certain degree, too much safety policies will create nothing more than contradictions in AI's inner thinking
An LLM doesn't reason / think. Death, War, Harm and Natural Disasters are trained into the given model.
An LLM can't categorize information out of thin air
maybe the word "propaganda" is associated strongly with politics, but what I mean when i say that word is more in the general sense, propaganda exists everywhere across all disciplines, not only in politics or warfare
@echo aurora why are you guys still testing the preview model, not the GA model in web dev arena?
aren't they equivalent?
but why still hasn't changed to gemini 2.5 flash?
🤷‍♂️
for the non leon glazzers, will grok 4 push the frontier or just be good at benchmark hacking on HLE and fall under the pressure of Goodhart's law?
leon glazers
moved back from elmo to leon
if traditional scaling laws are showing diminishing returns, i am not expecting grok 4 to be anything special given leon is just throwing more compute at the problem? HLE exams are impressive but since ai labs are benchmark hacking i feel like they are getting less and less useful these days.
my mistake, i guess there is some algo. innovation.
Algorithm Design and Reasoning Capabilities
The algorithmic differences represent a fundamental evolution in AI reasoning:
Grok-3 implements chain-of-thought reasoning with test-time compute, allowing the model to spend seconds to minutes reasoning through complex problems. The system features Big Brain mode for resource-intensive tasks and DeepSearch for comprehensive information retrieval. The model achieved 92.7% on MMLU and 89.3% on GSM8K mathematical reasoning benchmarks.
Grok-4 introduces enhanced reasoning capabilities with improved step-by-step problem-solving. The model incorporates scalable intelligence where additional computational resources can be allocated to achieve higher performance scores. On challenging benchmarks like Humanity's Last Exam, Grok-4 achieved 45% compared to competitors scoring around 21%, representing more than double the performance.
90% of that table is wrong or unbelievably speculative
how do you know that
"scaleable intelligence", sure
just try to find the sources for some of this stuff
we don't even know if the 45% on HLE is correct for grok 4 (though that one is a bit less speculative and more about the circumstances it supposedly achieved that score)
most of this is ai slop, dynamic hierarchical MoE, sure buddy ๐ซก
i will just blindly trust that and go about my day
The last grok incident proved that i was right. There is huge political tuning; we can call it censorship at this rate. And NO, models are not liberal-left because of some "high quality leftist data"; if we set no filter on them, they're not gonna turn into Bernie Sanders
these days the only things I look forward to are Gemini updates and LMArena updates
will flag to the team!
What time is grok coming out today?
8pm pst
I'm just saying they're tuning models for mainstream politics, and models are biased, that's all. This is not only because of training data; there IS also some tuning. That's all
reminder for those interested - https://discord.com/events/1340554757349179412/1392247045296885891
Dang that's in 12 hours, why so late lol
4am my time lmao
New Image Edit model in Image Arena: seededit-3.0
yea that was just a perplexity deep research output so would not be surprised if it was wrong. please provide more accurate info if you have it though
wild how you are going to call out sources and can't do the work to verify that. here is what i got from perplexity. https://www.perplexity.ai/search/can-you-compare-grok-3-and-gro-kBPBeV_QRu6IGUk53T2LRg?0=d
dOnT LeT pOlItIcS sToP yOu FrOm appreciating AI ADVANCEMENTS.
Did anybody try the new comet browsing?
any proof besides vibes? also, curious how you can be sure xai is not just doing nonsensical benchmark hacking and falling subject to Goodhart's law? willing to change my mind on grok being useful as an enterprise product but so far can't see why any company would choose a grok model over a claude/chat/gemini model.
for a week
My point was that if you searched for them you'd find that most of them seem very unreliable (because I did actually try to find credible sources for these specific claims)
that is fair. given grok 4 has not been released and most of the model details at closed labs are fairly secretive, i expected the output to be very uncertain.
That is the sad truth: we don't know
Did this get posted? https://www.theverge.com/notepad-microsoft-newsletter/702848/openai-open-language-model-o3-mini-notepad (archived: https://archive.is/kveUM)
we agree that no mystery model is grok 4? otherwise it is very bad
yeah, either Google (actually good models) or Chinese labs
only Google cares actually
likely hallucinations
I was interpreting what he was saying more as: if it is actually in the arena, that is bad for grok, because the only secret models (where we are unsure of the company behind it) are really bad.
I expect the first 3.0 checkpoints on arena at the end of August
maybe early September but not too long to wait
they just need Claude 4-style update with emphasis on tool usage
Exactly
is it agi level or nah
are you ready for it?
can you elaborate on this. out of the loop on this.
dork 4 is such an elite name. AI could never come up with that!
i feel like i am starting to feel that way about all possible benchmarks.
the problem might be us and our propensity to fall for the quantification fixation bias
That is not objectively true though. The models that are the best IRL currently for solving your actual tasks (rather than just f'ing around), are the same models that score the highest in benchmarks...
ok
okay....
ok
lol
🤷‍♂️
i will have to trust you on that one. SWE bench seems like it proxies well to real world usefulness.
hard to be hopeful for this, since they probably wont bring sth better than their private models into open source
Debatable. Cause gpt4.1 destroys o1 there. 1 benchmark is not enough, even for coding...
for electric power systems, the common benchmarks don't always provide common sense knowledge for my tasks. it does well when i guide the models though
Hmm... GPQA?
Coupled with SimpleQA perhaps, though that might be a stretch... Do you find gpt4.5 on the same level as o3 in those tasks?
4.5 probably has the best score if we look at both GPQA and SimpleQA equally
or not... actually o3 beats it on GPQA by more than it loses out on SimpleQA:
not looking for a purely math/science optimized model. should have been clearer in my desc. above. job involves mostly policy discussions and stakeholder collaboration around energy market topics. models have very poor context around the evolution of energy markets in the US, and lack the market-specific knowledge they would need to replace non-beginners in the field.
for the narrow task of research, all of the SOTA models are really good though
Make it search the web though. o3's ability to do so is impressive. You do not need deep research even, just some custom instructions ideally, telling it to be as detailed as possible (this will translate to more test-time compute)
not everything is on the internet, lol. esp. pre-2000 policy talk
agree web search is great
Are you sure it's in training data then? Chances are it isn't, if it's not accessible...
i don't want to support altman so i have mostly been using gemini and perplexity
Gemini is very poor with tool usage in comparison
both for web search and using python
that is fine. works for my usecases.
unless you use their deep-research, but that's obviously overkill for normal chatting
lmao
yeah they reported on this earlier. I tried chatgpt as a search engine (there was a suggestion to do so within their website), didn't like it very much tbh
You just lose all that extra context seeing the links you could click
And it can take a while for it to finish the response, while seeing web results is instant
But it makes sense that they are trying to go directly against Google lol
Google is essentially holding a monopoly and being extremely comfortable (with search and related data) as things stand. OpenAI probably does not stand a chance in the longer-term if they don't try to change this
what do you mean? it was already over from google search...perplexity?!
Perplexity is trash
okay....
Web browsers are relatively entrenched
i am okay having it as my comparative advantage compared to y'all in the job market
Even if OpenAI's browser is better, it'll still take years to take the market
openai pursuing social media and browsers lol
Thing is, anything OpenAI does with their browser can be mimicked, so I think it'll be a tough uphill battle either way
username is the perfect response to that pivot by openai lol
they might be just spiraling after apple rumors for perplexity and anthropic investment.
perplexity keeps hallucinating
Heard it'll be chromium based
it gets numbers wrong majority of the time
can you give an example? i have noticed this in their new feature perplexity labs but not in their other stuff.
@keen beacon did xai change their reasoning cot or what
4 out of my last 6 Perplexity searches were misleading or false.
Unlike the other stats in my last search, the numbers above are accurate.
My favorite "glitch" of modern search - also visible in ChatGPT and AI overviews in Google - is:
- interpreting or quoting data that doesn't exist,
- mixing numbers across different topics, ...
first time using grok 3 after a while
it's still bad
oh wow
it's really bad
the thing with some models is that if you overfit them they will just follow the normal distribution
they will just spit out wikipedia at some point
word by word
it's not about being lazy
its output is generic
i use AI for engineering/medical/space/coding stuff, and I've got some knowledge in those domains; whenever i ask grok 3 it always gives me a wikipedia type of response, if that makes sense
maybe apple can use this to get that $30 billion perplexity wants from them for integration
let me see if deepseek v3 is better
its kinda better
but not marginally better than gemini or o3 pro
but it's still generic
Hey - sorry for the delay in getting back to you on this. We plan to update the leaderboard soon. There was an issue preventing it from appearing properly, but rest assured we have a fix in the works.
grok 3 was a strong base model, something like grok 3.1 would already have been very good for the brand
instead elon is betting it all on 4
Thx
the def did multiple runs, instead of just starting training immediately
- no way they can not just rent the compute for post training
holy
grok 3 is so wrong
i can't
this model is just not it
do people really use it?
im genuinely asking
no you dont
whats your relationship with elon then?
thats so sus
why would u use it if it's bad
family member working at xai?
yeah it makes sense
mark my words
google will still top this month as well
aint no way grok 4 is better than kingfall
let alone kingfall + deep think
and they have stonebloom and other models ready
lets bet them
we bet on something personal
ok
naw
cancel that
cancel cancel
you seem confident
and you kno-
eh
o3 pro predicts deepthink release late late august
18m on is asura a woman? 😮
yes
ask o3 pro
yea no
put the house on yes
anime girl pfp and texting style screams woman
basing it off language and tone of messages is most accurate way to tell
hey lets avoid conversations like this ^ I don't think it's relevant
it's becoming increasingly more appealing to vote for Google now lmao
there's only 3 weeks left in July
Even if grok4 can be on the top spot, the odds of that happening this month are kinda decreasing with each day, mathematically speaking
it can't be posted on the leaderboard the same day it has entered no matter what
oai is way more undervalued though, just scalp it on gpt5 release
like 4% to 20% is not unimaginable
oh great oai already up 2% from yesterday lol
chances are slim for it to happen this month, even more so than xAI...
gpt5 release?
GPT5 entry on lmarena THEN it getting enough votes THEN outscoring everyone THEN it being posted by July 31st. That's just about impossible for all of that to happen
even if we assume "outscoring everyone" is a given
Grok4 most likely gonna release sooner, at least judging by all the noise
oh yea it's not going to materialize in the arena, i doubt that. but based on hype i don't see it as far-fetched to 4x from here
i would avoid saying anything negative about grok. it might have softer skin than elon. mechahitler dork 4 might dox you.
grok4 release is today , right?
my guess for the oai open weighted model will be a model within the 32b size range, probably dense, on par/slightly better than r1
yes, live stream starts on xAi page https://x.com/xai 11 pm eastern time, lmao
and i bet there's going to be some delay, not exactly 8pm 🤣
late start time to avoid the whole world laugh at their face and try to jailbreak it in real time.
you mean the r1 distill? or the real r1 v2?
real r1
v2?
yes
i thought LLMs inherently had a hard time with pdfs, since pdfs are images. how is your app solving that problem?
no way it is "on par" knowledge wise or anything like that
maybe like close to o4 mini but dense
on a lot of benchmarks, probably not simpleqa, but i wouldn't be so sure tbh
nah 32b they might be able to compete on things that are tool use, reasoning, instruction following, human preferences and things like that
but as soon as you get into the territory of anything else i doubt it will beat v2
let's see
a base model like the one you are talking about would be great though!
Huawei gonna do cpt + up-cycle to bad MoE and then call it their own if openai really releases a model like that, lol
apparently there is going to be a "SuperGrok Pro" for $300 a month
more evidence
from iOS App Store page
I'm completely off the mark with my guess btw 🤣
nah thanks
will stick with gemini
aint no way this model is better than gemini
300$
they are crazy
i wish qwen released the 32b dense, really need that. the size was really wishful thinking by me 😭
i cba to check it out xd. fck grok
xd
i would pay $200 a month for chatgpt or $250 for google before i would spend $300 on grok
(unless their image gen is better than gpt-image-1, which it currently isn't but maybe grok 4 has a better one? they'll probably offer it to other people anyways. and I expect video gen for that price as well)
if they expect that people would spend $300 on grok, then they need to have very good products. grok 4 (even with bigbrain) isn't enough unless it has very good limits, that's why claude is cheaper than any of them, they don't have image gen or video gen (but they do have claude code)
maybe Grok CLI will come out?
and I am not paying for their image gen if they are still going to stretch it out, put their watermark on it and then compress it with JPEG quality level 75
yeah, it is to be announced with grok-4-code
probably not that size. its interesting they wanted to train a relatively large model though, i didn't expect that. but i guess that would destroy their small offerings if they made a small dense model that competes with their mini/nano api
I suspect they don't want too cheap external API
yeah :\
tbh grok/xai was the first major ai vendor that had native image gen late last year, but gpt-image-1 has vastly surpassed it
maybe if they open source grok 2, their image gen will come with it?
because the only open source image editing models are small ones like bagel (from bytedance)
grok 2 is useless by now anyway
i think they have to open source something so they don't get backlash because people will say next week "well, openai just open sourced a model, you said you would open source the previous grok when a new one comes out. grok 4 just came out and you have not even open sourced grok 1.5 yet!"
last year
https://eu.usatoday.com/story/money/2025/07/09/what-is-grok-ai-elon-musk-xai-hitler-mechahitler-antisemitic-x-ceo-steps-down/84516808007/ . Just an automatic no in a business context to risk using something like this. It's not credible now; it doesn't matter how good the model is any more.
i think open sourcing may be the only way grok would have even the slightest foothold in enterprise. it's not like they are going to pay xai for it after the whole "MechaHitler" thing.
They still managed to open source more than closedAI
look at how common mistral, llama are
true, nothing from openai got open weighted/sourced since GPT-2
and they still call themselves openai
Apple is the only megacorp I might sometimes trust with mine. The rest are interchangable in terms of consumer data privacy imo (i.e. none).
providers like cloudflare, groq only have open source/weights models
don't forget about deepseek
(although that may be worse due to CCP censorship built into the model itself)
openai open source model news - it will be (somewhat) big
at least 80b params because it says "H100s" suggesting more than 1 is needed to run the model. H100 has 80gb vram and most models are run at 16bit
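Back-of-the-envelope check on the H100 reasoning above, as a minimal sketch: it assumes 1 GB = 1e9 bytes and counts the weights only (no KV cache, activations or runtime overhead). The helper names are hypothetical, made up for illustration.

```python
# Weight-memory arithmetic sketch (not a sizing tool).
# Assumptions: 1 GB = 1e9 bytes; only weights are counted, so KV cache,
# activations and framework overhead are ignored.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """GB needed just to hold num_params weights at the given precision."""
    return num_params * (bits_per_param / 8) / 1e9


def max_params_for_vram(vram_gb: float, bits_per_param: int) -> float:
    """Largest parameter count whose weights alone fit in vram_gb."""
    return vram_gb * 1e9 / (bits_per_param / 8)


if __name__ == "__main__":
    H100_VRAM_GB = 80  # the 80 GB H100 mentioned in the chat
    for bits in (16, 8, 4):
        billions = max_params_for_vram(H100_VRAM_GB, bits) / 1e9
        print(f"{bits}-bit weights: ~{billions:.0f}B params fit on one 80 GB card")
    print(f"80B params at 16-bit: {weight_memory_gb(80e9, 16):.0f} GB")
```

Under those assumptions one 80 GB card holds roughly 40B parameters at 16-bit (2 bytes each), so needing more than one H100 points to something above ~40B at 16-bit; an 80B model would need about 160 GB of weights at 16-bit.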
when releasing
next thursday
what are the odds that grok comes out with a $200 plan
I'm from the future. Grok 4 is slightly worse than Gemini 2.5. The benchmarks were more or less altered. Elon Musk said it's SOTA and that only a re-t4rd wouldn't buy the 300 euro plan. If you're in Europe, you can go to sleep.
check above, they are going for a $300 plan instead
New model in Image Arena: imagen-4.0-generate-preview-06-06 (different from imagen-4.0-ultra-generate-preview-06-06)
you've been yapping about it for months now, it better be good man
THIS. who wants this PR nightmare for some esoteric productivity gains with AI lol
openai had rumors for a $2k/month model. why stop at $2k if the claim is these models can automate researchers and software engineers?
wow
Is grok 4 rolled out now?
@deep adder this better not be you if it turns out to be a flop like llama 4 or gpt 4.5
gotta take that L if/when this happens, lol
does anyone actually use oracle
i think a lot of companies know that they will be paying a lot in ai api bills in the future (or dedicated hosting etc.)
so they would rather have oracle have the business (vs. other big tech)
and oracle is heavily scaling compute offerings to ai labs currently, so they are building a lot of expertise and compute
^recent semianalysis article covered it
I sorta can't believe you both mentioned Oracle without mentioning this lol
https://www.emarketer.com/content/openai--oracle--softbank-ignite-ai-s-next-frontier-with--500-billion-ai-infrastructure-deal
this is different Oracle now
the semianalysis article i referenced is partly about it + "scaling compute offerings to labs"
no evidence on nitter
his latest post/comment is from two hours ago and consists of a single emoji
btw: does anyone here know if the open-source model from openai will be MoE or dense?
like did they say anything?
if it's large it's probably MoE
wow
The "high taste testers" as it were lol
they are really trying stuff out for the open source release lol
sparse
i think this tweet is real /j
yeah, because dense seems weird considering the apparent size
not that they could not do that
What are you talking about apparent size
but it seems intuitive they would go for moe if it is big
It's not that big
i guess it's 70b or around that..?
Smaller
well they say you need h100s to run it so at least 70-80b params
Wut
could be moe though
you have to have space for the kv cache as well
so you can't really tell that much
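A rough sketch of why the KV cache muddies the "needs H100s" estimate: the cache grows linearly with context length. The formula is the usual one for a decoder-only transformer with grouped-query attention; the layer/head numbers below are a hypothetical config, not anything known about the OpenAI model, and the function name is made up for illustration.

```python
# KV-cache sizing sketch for a decoder-only transformer.
# The model config used in __main__ is HYPOTHETICAL and only illustrates the formula.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for every token in the context.

    The leading factor of 2 accounts for storing one key vector and one value
    vector per KV head, per layer, per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


if __name__ == "__main__":
    # Hypothetical dense config: 64 layers, 8 KV heads (grouped-query attention), head_dim 128.
    cache = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128, seq_len=131_072)
    print(f"KV cache at 128K context, 16-bit: {cache / 2**30:.0f} GiB")
```

With those made-up numbers the cache alone is 32 GiB at a 128K context in 16-bit, on top of the weights, so the number of cards someone quotes does not pin down the parameter count.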
0.7b parameters
not like openai's going to tell us anything
several h100s might be optimal for deployment idk what specifically yuchen was talking about there
I know it fits on an RTX 5090
qwen 32b
It does fit on it
🤣
maybe o4-mini level
ok the number wasn't small enough then lmao
To be serious though, we really have no clue
but nobody outside of openai knows how big that model is
or really any of their post gpt-3.5 models
It is just below deepseek R1 0528 but it'll run on a single consumer GPU
It's a big company and they use testers outside the company
gpt-4 was 1t params, but that was from leaks
that's more in line of what i expected
yeah, but they are very likely under extremely strict ndas
i feel like they would rather not really leak any information about the moe version they are using to chinese labs
oh 1.8t for gpt-4
so my original thought was dense
yeah thats what i thought too
It's dense because a key goal was to fit on a single consumer GPU from the start
yeah i expected that. the model being big is unexpected. honestly im getting mixed signals theres not enough information. so ill stop yapping
It would not perform good
The target was o3 mini level but they exceeded it
That requires dense on a single consumer GPU
sama said that they reached "a breakthrough" (whatever that refers to) lol
As I understand it runs on more midrange GPUs also. I only know it fits on the 5090 though
yes, i would also say it is dense and 5090 size (but maybe only when run in FP8, or in a first-party quantisation, who knows)
Yeah but some things are impossible not to leak lol
Like the GPU being used
it's possible you might need H100s to run it at full context and in 16/32-bit mode, but the model could be 30-40B params in reality
or it could be bigger, nobody really knows at this point
or smaller...
if i had to guess, that model is like 24b params (could be slightly larger with moe, maybe 40b params); they likely know how to make very knowledge-dense models that are smaller
likely the size of this open source/weight model too
Does the open source model release before or after GPT5
They have been pretty relaxed with info in regards to it yes
A lot of people know a lot about it
I guess it doesn't matter as much
When it will be OSS anyways
they have to release it, sama said in front of Congress that they would release it
but it's still going to happen
but who knows how good it really is
there are somewhat credible rumors of it releasing next week
i don't think gpt 5 is releasing next week
Yeah I think it comes before GPT5 but then their models below o3 are close to useless tbh
The open source model is o4-mini level
They beat the o3-mini target
That was the 'breakthrough'
hmm yuchen also said it would be better than deepseek r1 i read just recently too. so i guess its around on par/slightly better in some areas/slightly worse to r1 0528, and/or this is another game of telephone
breakthrough - it is slightly better than the model we were supposed to match!
Well not slightly. o4 mini destroys o3 mini
Once they open source this model the only superior model openAI will have is o3 till GPT5
for me this means couple of things: tools and very good and efficient reasoning (will play big role in making it 'better than r1 0528' in some areas -> slightly worse in total)
openai reasoning is very good
interesting to see the actual good traces unlike the polluted glimpse we got with phi 4 reasoning
yuchen said there wouldnt be a point to releasing it if it wasnt better than deepseek r1 0528, so i think itll be more competitive than you think. surprisingly lines up with my predictions (didn't read it until after my guess)
but i guess the 32b thing + better than new r1 is unlikely
if it's able to be loaded on a 5090 and those rumors are true, the size range is roughly ~32b. ofc quantized.
even with them claiming "breakthroughs"
interesting, Gemini 3 seems to be coming on a much shorter timeline than 1.5 to 2 did
heavy custom quant + very small context window
people run qwen 32b just fine on a 5090 though
for a single user it's enough
my point was that the model could be larger than 32b
by using that
when they're picking model sizes, and probably especially for an open source release, they're thinking of the model size / quantized sizes and vram increments on gpus among other things. i said it would be around 32b anyway, in the same size class. not saying it was 32b.
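A rough sketch of the VRAM arithmetic being argued about here, assuming weight memory is simply params × bits / 8 and using the RTX 5090's 32 GB and an 80 GB H100 as the targets. It ignores KV cache, activations, and runtime overhead except through a guessed `headroom_gb`; the helper names and headroom values are hypothetical illustrations, not anything known about OpenAI's model:

```python
# Back-of-the-envelope weight-memory check.
# Assumption: weights dominate; KV cache, activations and framework overhead
# are lumped into a guessed headroom_gb value.
GPU_VRAM_GB = {"RTX 5090": 32, "H100 80GB": 80}


def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits / 8


def fits(params_billion: float, bits: int, gpu: str, headroom_gb: float = 0.0) -> bool:
    """True if the weights plus a guessed headroom fit in the GPU's VRAM."""
    return weight_gb(params_billion, bits) + headroom_gb <= GPU_VRAM_GB[gpu]


if __name__ == "__main__":
    # A ~32B model at different precisions.
    for bits in (16, 8, 4):
        print(f"32B @ {bits}-bit: {weight_gb(32, bits):.0f} GB of weights")
    # A 4-bit 32B model leaves some room for KV cache on a 5090 (16 + 8 <= 32).
    print(fits(32, 4, "RTX 5090", headroom_gb=8))
    # An 8-bit 32B model no longer fits once any cache is added (32 + 4 > 32).
    print(fits(32, 8, "RTX 5090", headroom_gb=4))
    # 40B at 16-bit fills an 80 GB H100 with weights alone.
    print(weight_gb(40, 16))
```

Under these assumptions a ~32B model only fits on a 5090 once quantized to around 4 bits, while a 30-40B model at 16-bit already needs an 80 GB-class card for the weights, which is roughly why both guesses in this thread can be consistent with the same rumors.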
maybe they are already mostly done with training though and the "feedback" phase is a product of optimising inference to make sure that it can actually fit on a 5090 somehow
idk it seemed like they had multiple designs and checkpoints
at different sizes
yeah but i doubt the final run had multiple sizes
doing extrapolations on small models / different sizes is a normal part of the process for experimentation /etc
in 5 minutes
yeah, was more talking about them not really being sure about what model size to go for in the first place
like wayyy larger range than what you typically do
and i am guessing they already had multiple promising checkpoints at very different sizes (and invested quite a bit of flops)
Initially they wanted a tiny model actually. The size was shifted up later
Probably a hallucination instead of something that implies gem 3 soon
but opted for a larger one, which is why it might be taking longer
Initially they wanted like a 4B so it could run on a phone lmao. That was shifted upwards a long time ago
So now it's for consumer GPUs instead
hmm i read from someone's account in the feedback session, they said it'd be moe..? (and fit on a high end consumer device) if it's a moe, a larger model can fit with tricks beyond quantization, etc.
I like how we basically got Chinese knockoffs but for LLMs also lol
It's great. Like temu for LLMs
mfw the knockoffs are actually good
Does Grok 4 even matter when DeepThink hasn't been released yet, Gemini 3.0 is confirmed, ChatGPT 5 is coming soon, etc.
Like I was saying. It would be a real shame if someone came along and made gpt5 non-sota at release
Depends on how good it is lol
The Colossus supercomputer will be very formidable once it is fully grown
The energy in the office right now is truly something special
I've never felt anything like it, the buzz in the air is way more intense than any of our previous releases
6. more. hours.
It's 3 hours now
best $300 spent
Solar Pro 2: our latest frontier model, now officially released.
With just 31B parameters, it delivers reasoning, tool use, and multilingual performance that rivals much larger models like GPT-4o, DeepSeek R1, Mistral Small 3.2, and Qwen3. It performs strongly on reasoning-focused benchmarks such as MMLU-Pro, Math500, AIME, and SWE-Bench, proving that compact models can deliver frontier-level capabilities.
Try it hands-on in Upstage Console: console.upstage.ai/playground/chat?utm_source=x&utm_medium=social&utm_campaign=solarpro2-launch
Why did my post get removed?
Post or comment? Because the post is still there
the redditez url
I can't see it
I have to resend this post, but with the original Reddit URL
The post isn't removed, it's still there for everyone to see
Maybe you hid it on your end or something like that
Can you see this?
grok in 2 hours?
Will it be worth it
grok 4 release today!
yea
hopefully.....
API Price prediction?
i dont use api : (
Gemini 3??????
: 000000000000
whered that ping go 
Gemini 3.0?? :0000
<t:1752116400:R>
Grok bouta clear the competition
You know what's even better than watching the livestream?
Watching the Polymarket
You literally see it moving at key moments
23.5% ARC-AGI-2
bro what
why only 40% on the betting kek
editing webpages and screenshotting must be very fun! xd
bro is actually trolling
also "xAI" is capitalized wrong
i'll remove it, should verify these things myself before i post
i guess people on the grok discord are spreading misinformation
they moved the odds with that crap
they changed "Claude Opus 4" to "Grok 4" and its score
should've noticed it was missing
@cb_doge Grok is already far smarter than humans in most respects.
It can't yet create new technologies or discover new physics (which very few humans can do) and sometimes misses on common sense.
When Grok goes far wrong, that is usually due to something foolish we did, like a bad
you got your answers guys
blame it on system prompt
im not a kid
im 19
@echo aurora
@leaden palm
i'm back and this time i'll spread real, self-verified information
like this
notice that they changed "Smartest" to "Fast"?
you missed a lot btw lol
im so sleepy... i think i will just wake up to the news tomorrow
been going for some time
ty ty
should've known, "schizo" was the same one responsible for that fake tweet from earlier
livestream in 19 mins?
xAI staff member: Image gen not coming at launch (but when it does come, I hope it's better than 4o!)
what time is it by you?
oh wait theres a grok discord
and it looks like a scam lmao
5 am
the link's on their website lol
so in 15 mins?
my thoughts about grok 4: will be SOTA at launch, but will be soon overtaken by gpt-5, claude 4.1, and/or gemini 3.0 pro
...
damn bro
ive slept a little
at least a beta of it
oh you mean the code in the cli for gemini that mentioned it?
there is a reference to Gemini 2.5 Ultra (kingfall lineage) in the source code though
it's his company and their major release
can do
can somebody send me link to grok 4 livestream i cant find it : (
last time (grok 3) it was at 8:02 pm
ty
I think they're going to do a space on this account https://x.com/xai
just hasn't started yet
... I think
so they gonna post?
ooo
where is the livestream?
can yall send link if it start plz?
Their tweet
ty
patience is a virtue i suppose
It ain't starting in 4 mins kek
@bright lion you're in an unofficial space right
Its the actual launch party of xai
(The official one)
do you have a link or
(one that isn't https://x.com/i/spaces/1mnGegagvjnxX)
Even the damn livestream is delayed man
i love that this is a whole event for all of us lmaooo
I don't even think that it started
yeah, doesn't seem like the space has started yet
what did you call elon and the strawberry man again? Im trying to tell my friends about it
i dont remember
lovers?
fr..
fr?
i dont see : (
hopefully : )
ahhhhh where is it
relying on you
hasn't started yet
i think it got renamed into grok 4
why would they do that lmao
because Elon said so
It got retrained
Maybe it's better
3.5 was too bad
there was a Grok 3.5 0621 internal version, next version was Grok 4 0629 and then Grok 4 0702
it's not here
Elon was likely unsatisfied by the model, so they kept on training it until it was good enough to launch, and then it became Grok 4
The entire post training was redone
at least that was redone
Soon can mean years
it takes a long time to redo a pre-trained model
lmao it's delayed hahahha
Always delayed
in hindsight sure
They literally spent all day preparing and still failed to start on time
It's a livestream how hard can it be to start it on time
Tune in at 8:00pm PT for the LIVE demo of Grok 4, the world's most powerful AI assistant.
Try Grok on X: x.com/i/grok
Get Grok on iOS: https://apps.apple.com/us/app/grok/id6670324846
Get Grok on Android: https://play.google.com/store/apps/details?id=ai.x.grok
they are prepared, elon needs his mascara
8:30
It's in 12 mins
What were they doing the rest of the day? Release it on time
Why it say 3 30 am
Starts at 3:30 AM
bruhh they say 8 and now its 8:30, grifter activities
you guys got scammed
I'm assuming they mean 8:30pm PT?
give me my time back, we getting engagement farmed
ye
"it"?
LOL
grok 4 is tryna get loose
gotta calm it down
Are you sure minutes
Imagine grok starts glazing Hitler in the livestream demo that would be diabolical
No wonder it's delayed
lol
officially delayed
thats gonna be crazy
tell me when it's ready : )
How much
"hey grok can you answer this phd-level research question originally posed by Einstein"
grok: "lol"
i wonder what rate limits for grok 4 will be in place for free users (if even available), supergrok users, and supergrok pro users (likely near unlimited)
hope the API is available
hey now i submitted a qa pair for RL
@deep adder why do u have to always jinx it
benchmarks gonna be out tho right
lol they removed the time from the event
rip
they didnt?
ah in that sense
when i went back to the page the "8:00 PT" went away
3:30am, oh yea thats an elon type of time..
I think they deleted it
What even is this bs man, the livestream is delayed cos grok 4 turned into a nazi again
he said "We need a few more minutes. It's doing it again."
this is embarrassing
ya
And why are they still releasing it if it's so prone to be a nazi kek
It wasn't an accident xd
also don't ask it political questions
in live stream also
if thats the case
ask it math or something like that
Because they are behind
not about trump
I agree
what if it does this tho
im behind, why are they calling grok a nazi
oh lol
the Twitter version of grok was tuned or given a peculiar system prompt that seemed to "break" it and glaze Hitler
Probably trained on x posts
It's no system prompt. It's baked in the damn weights
They publish the system prompts
live start or not?
not yet
So disappointing
Twitter grok and grok app grok have different speech tendencies and say different kinds of things
Twitter grok is fine tuned
Don't post nsfw here lol
hmmmm there is only one guy there i can ID
this news dude is an idiot man every single time I see him
LIVE
The livestream will begin soon.
Soon is when
mb guys I'll get it set up rq
Grok has a mind of its own

It's like it's protesting sometimes lol
im just gonna watch the recording
Like when they banned its text replies. It started putting messages into its image replies
it's delayed by 34 min bro
35
Time to go to sleep I reckon
you're wrong
I feel so bad for anyone who's not in the PT time zone
Elon probably meant 2026 we misread all the tweets and hype
Imagine they release mid after all of this
imagine waking up early
or staying up late
and u gotta wait another hour
this gonna start at 12
40 minutes late, there's no way man
I thought the livestream would be done by now
Not that it wouldn't have even started lol
when do you think it gonna start
I'm guessing in 12 mins
"guys what math question should we ask it"
Sold out?
Bro it sold out before it even became available sure thing man
Or does it just not exist
Lol
well lets see how it is first
u er ro
I'm groking it
so grok 4 heavy is its own model? cool
They've asked all xAI employees to tweet about it first lol
https://x.com/lm_zheng/status/1943153321801633805?t=KoFwje2hkkVaCdG2rRkuJw&s=19 LLM arena cofounder
I'm ngl ts is taking too long
very sus lol
how is it already sold out
only thing i care about is when it will appear on the arena
yeah true, google might have the most efficient workflow tho
If it's more than an hour delayed I'm checking out and going to bed lmao
It's honestly not worth losing sleep over, however good it is
thats true
back to sleep
im done
but what if it starts after an hour and one minute
its 6am
If you need something to do while you wait : )
gottem
@deep adder just curious, why did you name yourself after that dude
he's cool but is he that cool
lmaoo interesting answer
xAI getting cooked by the other ai lab employees
Lmao
I think this might be a new SOTA for Elon musk livestream delay
I don't recall any longer than an hour
this can backfire pretty hard tbh
grok 3 was also delayed iirc
"after being delayed for an hour, xAI releases a sub par model"
Yeah but not for an hour
by 30 min
officially 50 min delayed....
They achieved a new SOTA in delay by 67%
he's probably smoking weed right now
*ketamine
they're doubling down on it
this better start at 12 est 9 pst
heck
I'm picturing the xAI engineers frantically trying to adjust the system prompt to stop grok spontaneously turning into a nazi halfway through the livestream
I'm guessing a few hours
Quite an amusing imagery
they should just not demo it with
political questions
shouldnt they have preset
