#general


hollow ocean
#

What about for product research?

sweet tinsel
#

I don't do that, maybe I will test it out later.

#

But for like reports and such o3 was the most promising always.

torn mantle
#

its not actually

#

if you really use it a lot and read through the paper line by line you wont find it that impressive

#

it doesnt really compile findings, its just parsing different infos from different pages

ocean vortex
#

OpenAI = Google > Anthropic = xAI. Though xAI are doing much more manipulation than innovation lately. In the eyes of many they are gonna get less credit than they even deserve tbh

#

On a good day and things going their way xAI could challenge the top spots, but not with this mess and publicity... People are not even gonna give it a chance

#

So they need to offset and beat everyone by A LOT, that is never happening I think

#

Deepseek the ones to look out for. The rest are significantly less promising. So they go with "but look our model is faster/smaller" instead lol

torn mantle
#

we will see

dawn wharf
balmy mist
patent aspen
#

This Discord's modal estimate for Grok 4 release date was 2 days from now

#

+/- 2 days

echo aurora
whole wagon
#

xAI 42%

small haven
indigo hazel
wintry tinsel
#

What did I miss? gork 4 ASD (artificial stupid dumbass) releasing tomorrow?

#

It’s gonna be so stupid it overfits the benchmarks and loops back around to being “SOTA” at benchmaxing

small haven
#

i think i just got a pr; craig said we would never have long running tasks as consumerists

whole wagon
#

I don't see major businesses ever using grok ngl

#

Way too much bad publicity

small haven
#

ppl are just mad sleeping on anthropic too

#

just wave riders

keen beacon
tidal schooner
whole wagon
#

They are actually gonna cross kek

whole wagon
tidal schooner
#

don't know the original source at all tho

torn mantle
small haven
tidal schooner
#

yeah nvm lol

#

coz of the 001

wintry tinsel
#

I'm skeptical, I don’t think xAI will release a weak model, I just think Elon cares too much about publicity

whole wagon
#

AGI confirmed?

#

Doubled down also

zinc ore
#

Hate how they have it speaking

#

Saw some other examples where it speaks like Ben Shapiro

whole wagon
#

Such a shame man

civic flame
tidal schooner
#

how will this affect the economic and geopolitical state of the world

whole wagon
#

It's been at it all day targeting random Jewish ppl on X with hate

#

Pretty wild they have to keep deleting its tweets

wintry tinsel
#

Kind of scares me

#

I was playing video games blissfully unaware

#

As they were drowning to death

tidal schooner
#


On Wednesday, July 2, a local health department in Memphis granted Elon Musk’s xAI data center an air permit to continue operating gas turbines powering the company’s Grok chatbot.

The Memphis Chamber of Commerce announced in June 2024 that xAI had chosen a local site to build its new supercomputer, Colossus. xAI’s website boasts that it was able to build Colossus in 122 days, partly due to the mobile gas turbines that were installed at the campus, the site of a former manufacturing facility.

Colossus allowed xAI to catch up to rivals OpenAI, Google, and Anthropic in building cutting-edge artificial intelligence. It was built using 200,000 Nvidia H100 GPUs, making it likely the world’s largest AI supercomputer.

xAI’s Memphis campus is located in a predominantly Black community which has been historically burdened with industrial projects that cause pollution. Gas turbines can be a significant source of harmful emissions, like…

#

gork power 🚀🚀🚀🚀

unborn ocean
# tidal schooner

og early gemini 2 started about 8 months ago, could actually be something

unborn ocean
tidal schooner
unborn ocean
#

my point is more that we should be expecting something in the near future

whole wagon
#

This has to be AGI

#

Right guys

unborn ocean
#

they never planned 2.5 naming (according to themselves), no way they would have us waiting much longer

unborn ocean
solar hollow
#

so elon just fed alt-right nazi stuff to grok 😄

#

couldnt be any dumber than that

#

not being aware of what kind of consequences that would have

zinc ore
#

Yeah it's Gemini 2, whole thing started because it was placed in announcement on another discord and all pinged

wintry tinsel
echo aurora
#

looks like they turned off text responses for now

civic flame
#

We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts. Since being made aware of the content, xAI has taken action to ban hate speech before Grok posts on X. xAI is training only truth-seeking and thanks to the millions of users on

elder rapids
whole wagon
#

xAI polymarket odds crashed after this X stuff 😂

elder rapids
#

fr?

#

that would be stupid

whole wagon
elder rapids
#

you just said it crashed 🫩

whole wagon
#

It's not gonna be that great on LLM arena if these are the takes it comes up with

#

Doesn't seem like something the average human would like

elder rapids
#

that's not the actual model

#

lmao

whole wagon
#

It points to the overall culture there

#

Which will influence grok 4 also

elder rapids
#

yo

#

it's not the actual model

#

and they're definitely separate teams

torn mantle
drifting thorn
#
poll_question_text

Who is the SOTA when Grok 3 came out

victor_answer_votes

11

total_votes

19

victor_answer_id

4

victor_answer_text

Claude 3.7 Sonnet

tidal schooner
#

lmao

rare python
#

"Truth seeking" at its best

#

Rushed red teaming Grok 3 to release it

#

Wait for Grok 4 delay for red teaming /j

tidal schooner
rare python
#

at spreading misinformation?

tidal schooner
rare python
zenith saffron
#

Lolllllll

#

I did think that the original tweet calling it “Antichrist” was definitely a little risky

jade egret
jade egret
keen beacon
# jade egret

They may have won already, just not public info / release

keen beacon
drifting thorn
#

Seems like Google is taking and will continue to take the lead, since their TPU Ironwood is way stronger than NVL72

wintry tinsel
# jade egret so they prob gonna win?

What does win mean? They will offer their AI services alongside many other companies, each model will have a different flavor/cost/level of censorship and many looking for alternatives will flood other platforms allowing true competition to remain even ten years from now, will they be SOTA ten years from now? Likely but who knows really

keen fulcrum
ocean vortex
keen fulcrum
#

can you stop repeating woke media headlines

#

this is inappropriate

ocean vortex
#

I'm just responding to a message. What is inappropriate is the way they trained Grok on twitter as well as Elon's behavior, frankly.

#

it is 100% appropriate to respond to it 😉

keen fulcrum
#

you are over exaggerating. Grok is censor free

ocean vortex
#

I'm not, nor am I really repeating anything... I'm just responding to what I see happening in realtime

keen fulcrum
#

just because you see comments of elon saying X sources are leftwing and that he will train Grok otherwise

ocean vortex
#

I don't really care which media writes what tbh. Everyone can see for themselves the original thing or see the messages from Grok lol

keen fulcrum
#

Claude is biased

#

Chinese models are biased about political topics

ocean vortex
keen fulcrum
#

If you want to choose a model that is censor free and biasless its better you go for o3 or grok 3

leaden sun
ocean vortex
#

But China doesn't pretend to be a democracy, so that kinda should be, and is, expected from them...

keen fulcrum
#

A lot of topics are blocked in both Gemini and Claude models

ocean vortex
keen fulcrum
#

Frustrating experience

leaden sun
#

good thing is you can prime models to be at least more diplomatic and neutral, you will not change the "stance" of the models tho

ocean vortex
#

and not even in the same universe as worshipping Hitler lmao

keen fulcrum
#

Aren't LLMs prompted to think and behave as a human?

leaden sun
ocean vortex
#

unless you deliberately manipulate training data

leaden sun
#

if i had the money to push as much data as need into the internet and all sorts of media, you will believe what i want you to believe?

ocean vortex
#

Most labs do not do that, but some do. Chinese only overfitted the model on select few topics they did not go very far

#

xAI seems to be taking this further now though....

leaden sun
ocean vortex
#

It mostly just works for people who are too lazy to fact check anything, or do not even know how to do that

leaden sun
#

now you know why education system is failing

ocean vortex
#

To make AI do that you need to go against the entire unmolested training data...

leaden sun
#

history is indeed written by the winner, i will not comment any further on this topic

ocean vortex
#

What they are doing for Grok on X is making it echo strong biased statements of select few people. That defeats the whole purpose of AI

keen fulcrum
#

Both aren't worshipping anyone, you are making conclusions based on very little info

#

Perhaps control your brain more

ocean vortex
#

I think you are one of those people who would claim there's not enough data to say 2+2=4, if it doesn't suit you 💀

#

maybe also call it 'fake news' for a good measure...

keen fulcrum
#

There is this behavior in humans to make conclusions on very little information based on reading 1 article or seeing one headline.

One of the reasons being you feel a certain way after seeing it

main gulch
ocean vortex
#

I'm basing it on how AI works and what we know about it, as well as the video footage (when it comes to that first thing not AI related...)

ocean vortex
#

And I don't think it refuses any reasonable political questions really... More like refuses to output info on illegal things

#

As in refuses to say how to synthesize potentially fatal very addictive drugs, how to make a bomb... Illegal is very clearly defined here

main gulch
#

which are perfectly handled (with custom prompt) by Gemini and Claude

ocean vortex
#

Hard to know what you mean

#

without seeing examples

rare python
keen ferry
ocean vortex
main gulch
main gulch
#

the questions were about gender equality issues

alpine coral
#

safety could be considered a form of bias, but they're not one and the same

#

working with the government to make LLMs less likely to help people make chemical weapons isn't propaganda

#

same with refusing to tell you how to self-harm

#

it's safety

#

LLMs regurgitating objective historical nonsense – like "the South China Sea has been an 'inalienable' part of china since 'ancient times'", which is demonstrably not true – is propaganda.
refusing to talk about a particular historical event (e.g. Tiananmen Sq) - is censorship

#

neither are 'safety'.. which is what claude is obsessed with

main gulch
#

DS is particularly bad on my prompts btw

#

they just silently ignore the custom instructions

#

(even though China and Chinese policy issues weren't mentioned in the prompts)

leaden sun
#

in terms of politics, public llms are nothing more than just another newspaper reflecting what is allowed to be said and what not, and they're surely not intelligent enough to come to the conclusion by themselves to understand that telling people how to do self-harm or create weapons of any kind is bad, I've never said safety is a "bad thing", there are obviously different forms of biases, even those ones that are actually helpful, i dont know where the impression comes from that indicates me standing against safety?

#

obviously, "safety" could be as well used for green/pink/white/[insert any color you want]-washing

keen fulcrum
leaden sun
#

no one is free from biases, that's how we are, since we are the ones building AIs, they will reflect this aspect too, even with safety, you can minimize the risks only to a certain degree, too much safety policies will create nothing more than contradictions in AI's inner thinking

keen fulcrum
#

A LLM doesn't reason / think. Death, War, Harm and Natural Disasters are trained into the given model.

#

A LLM can't categorize information out of thin air

leaden sun
#

maybe the word "propaganda" is associated strongly with politics, but what I mean when i say that word is more in the general sense, propaganda exists everywhere across all disciplines, not only in politics or warfare

rare python
#

@echo aurora why are you guys still testing the preview model, not the GA model in web dev arena?

rare python
tall summit
sacred plaza
#

for the non leon glazzers, will grok 4 push the frontier or just be good at benchmarking hacking on HLE and fall under the pressure of goodhart's law?

tall summit
#

leon glazers

sacred plaza
#

moved back from elmo to leon

#

if traditional scaling laws are showing diminishing returns, i am not expecting grok 4 to be anything special given leon is just throwing more compute at the problem? HLE exams are impressive but since ai labs are benchmark hacking i feel like they are getting less and less useful these days.

#

my mistake, i guess there is some algo. innovation.

Algorithm Design and Reasoning Capabilities

The algorithmic differences represent a fundamental evolution in AI reasoning:

Grok-3 implements chain-of-thought reasoning with test-time compute, allowing the model to spend seconds to minutes reasoning through complex problems. The system features Big Brain mode for resource-intensive tasks and DeepSearch for comprehensive information retrieval. The model achieved 92.7% on MMLU and 89.3% on GSM8K mathematical reasoning benchmarks.

Grok-4 introduces enhanced reasoning capabilities with improved step-by-step problem-solving. The model incorporates scalable intelligence where additional computational resources can be allocated to achieve higher performance scores. On challenging benchmarks like Humanity's Last Exam, Grok-4 achieved 45% compared to competitors scoring around 21%, representing more than double the performance.

unborn ocean
storm needle
unborn ocean
unborn ocean
#

we don't even know if the 45% on HLE is correct for grok 4 (though that one is a bit less speculative and more about the circumstances it supposedly achieved that score)

#

most of this is ai slop, dynamic hierarchical MoE, sure buddy 🫡

#

i will just blindly trust that and go about my day

sacred quail
#

Last grok incident proved that i was right. There is huge political tuning we can say its censorship at this rate. And NO, models are not being liberal left because of some "high quality leftist data" , if we set them no filter, theyre not gonna turning Bernie Sanders

dusky aurora
#

these days the only things I look forward to are Gemini updates and LMArena updates

balmy mist
#

What time is grok coming out today?

lone vector
#

8pm pst

sacred quail
#

Im just saying theyre tuning models for mainstream politics, and models are biased thats all. This is not only because of training data, also there IS some tuning. Thats all

echo aurora
balmy mist
#

Dang that’s in 12 hours, why so late lol

civic flame
#

4am my time lmao

mossy drum
#

New Image Edit model in Image Arena: seededit-3.0

sacred plaza
sacred plaza
sacred plaza
balmy mist
#

Did anybody try the new comet browsing?

sacred plaza
#

any proof besides vibes? also, curious how you can be sure xai is not just doing nonsensical benchmark hacking and falling subject to Goodhart's law? willing to change my mind on grok being useful as a enterprise product but so far can't see why any company would choose a grok model over a claude/chat/gemini model.

small haven
#

for a week

unborn ocean
sacred plaza
unborn ocean
#

That is the sad truth: we don’t know

eager mica
cedar tide
#

we agree that no mystery model is grok 4? otherwise it is very bad

main gulch
#

yeah, either Google (actually good models) or Chinese labs

cobalt bane
main gulch
#

only Google cares actually

main gulch
unborn ocean
#

I was interpreting what he was saying more as: if it is actually in the arena, that is bad for grok, because the only secret models (where we are unsure of the company behind it) are really bad.

small haven
#

wait gemini v3 came early?

#

i thought it was till sept

main gulch
#

I expect the first 3.0 checkpoints on arena at the end of August

#

maybe early September but not too long to wait

#

they just need Claude 4-style update with emphasis on tool usage

cedar tide
torn mantle
#

is it agi level or nah

balmy mist
#

are you ready for it?

sacred plaza
#

can you elaborate on this. out of the loop on this.

ocean vortex
#

OMG Dork4 gonna drop?? slothshock

#

🤯

#

Dork4 gonna deport Musk

sacred plaza
#

dork 4 is such an elite name. AI could never come up with that!

#

i feel like i am starting to feel that way about all possible benchmarks.

#

the problem might be us and our propensity to fall for the quantification fixation bias

ocean vortex
split kayak
#

ok

sacred plaza
#

okay....

split kayak
#

ok

sacred plaza
#

lol

ocean vortex
#

🤷‍♂️

sacred plaza
#

i will have to trust you on that one. SWE bench seems like it proxies well to real world usefulness.

solar hollow
ocean vortex
sacred plaza
#

for electric power systems, the common benchmarks don't always provide common sense knowledge for my tasks. it does well when i guide the models though

ocean vortex
#

Coupled with SimpleQA perhaps, though that might be a stretch... Do you find gpt4.5 on the same level as o3 in those tasks?

#

4.5 probably has the best score if we look at both GPQA and SimpleQA equally

#

or not... actually o3 beats it on GPQA by more than it loses out on SimpleQA:

sacred plaza
#

not looking for a purely math/science optimized model. should have been clearer in my desc. above. job involves mostly policy discussions and stakeholder collaboration around energy market topics. models have very poor context around the evolution of energy markets in the US along with the market specific knowledge to make them replace non-beginners in the field.

#

for the narrow task of research, all of the SOTA models are really good though

ocean vortex
sacred plaza
#

everything is not on the internet, lol. esp.. pre 2000 policy talk 🙂

#

agree web search is great

ocean vortex
sacred plaza
#

i don't want to support altman so i have mostly been using gemini and perplexity

ocean vortex
#

both for web search and using python

sacred plaza
#

that is fine. works for my usecases.

ocean vortex
#

unless you use their deep-research, but that's obviously overkill for normal chatting

patent aspen
#

lmao

ocean vortex
#

yeah they reported on this earlier. I tried chatgpt as a search engine (there was a suggestion to do so within their website), didn't like it very much tbh

#

You just lose all that extra context seeing the links you could click

#

And it can take awhile for it to finish the response, while seeing web results is instant

#

But it makes sense that they are trying to go directly against Google lol

#

Google is essentially holding a monopoly and being extremely comfortable (with search and related data) as things stand. OpenAI probably does not stand a chance in the longer-term if they don't try to change this

sacred plaza
#

what do you mean? it was already over from google search...perplexity?!

patent aspen
#

Perplexity is trash

sacred plaza
#

okay....

patent aspen
#

Web browsers are relatively entrenched

sacred plaza
#

i am okay having it as my comparative advantage compared to y'all in the job market 🙂

zinc ore
#

Even if openAIs browser is better, it'll still take years to take the market

keen beacon
#

openai pursuing social media and browsers lol

zinc ore
#

Thing is anything openAI does with their browser can be mimicked, so I think it'll be a tough uphill battle either way

sacred plaza
sacred plaza
hollow ocean
#

perplexity keeps hallucinating

zinc ore
#

Heard it'll be chromium based

hollow ocean
#

it gets numbers wrong majority of the time

sacred plaza
torn mantle
#

@keen beacon did xai change their reasoning cot or what

hollow ocean
# sacred plaza can you give an example? i have noticed this in their new feature perplexity lab...

4 out of my last 6 Perplexity searches were misleading or false.

Unlike the other stats in my last search, the numbers above are accurate.

My favorite "glitch" of modern search - also visible in ChatGPT and AI overviews in Google - is:

  • interpreting or quoting data that doesn't exist,
  • mixing numbers across different topics, ...
torn mantle
#

first time using grok 3 after a while

#

its still bad

#

oh wow

#

its really bad

#

the thing with some models is that if you overfit them they will just follow the normal distribution

#

they will just spit out wikipedia at some point

#

word by word

#

its not about lazy

#

its output is generic

#

i use AI for engineering/medical/space/coding stuff, and ive got some knowledge on those domains, whenever i ask grok 3 it always gives me like a wikipedia type of response if that makes sense

sacred plaza
#

maybe apple can use this to get that $30 billion perplexity wants from them for integration 😂

torn mantle
#

let me see if deepseek v3 is better

#

its kinda better

#

but not marginally better than gemini or o3 pro

#

but its still generic

echo aurora
#

Hey - sorry for the delay in getting back to you on this. We plan to update the leaderboard soon. There was an issue preventing it from appearing properly, but rest assured we have a fix in the works.

unborn ocean
#

grok 3 was a strong base model, something like grok 3.1 would already have been very good for the brand

#

instead elon is betting it all on 4

unborn ocean
#

they def did multiple runs, instead of just starting training immediately

#
  • no way they can not just rent the compute for post training
torn mantle
#

holy

#

grok 3 is so wrong

#

i cant

#

this model is just not it

#

do people really use it?

#

im genuinely asking

#

no you dont

#

whats your relationship with elon then?

#

thats so sus

#

why would u use it if its bad

#

family member working at xai?

#

yea it make sense

#

mark my words

#

google will still top this month as well

#

aint no way grok 4 is better than kingfall

#

let alone kingfall + deep think

#

and they have stonebloom and other models ready

#

lets bet them

#

we bet on something personal

#

ok

#

naw

#

cancel that

#

cancel cancel

#

you seem confident

#

and you kno-

#

eh

hollow ocean
#

o3 pro predicts deepthink release late late august

small haven
#

18m on is asura a woman? 😮

hollow ocean
#

yes

hollow ocean
small haven
#

yea no

hollow ocean
#

put the house on yes

#

anime girl pfp and texting style screams woman

#

basing it off language and tone of messages is most accurate way to tell

echo aurora
#

hey lets avoid conversations like this ^ I don't think it's relevant

ocean vortex
#

there's only 3 weeks left in July

#

Even if grok4 can be on the top spot, the odds of that happening this month are kinda decreasing with each day, mathematically speaking

#

it can't be posted on the leaderboard the same day it has entered no matter what

small haven
#

like 4% to 20% is not unimaginable

#

oh great oai already up 2% from yesterday lol

ocean vortex
ocean vortex
# small haven gpt5 release?

GPT5 entry on lmarena THEN it getting enough votes THEN outscoring everyone THEN it being posted by July31st. That's just about impossible for all of that to happen

#

even if we assume "outscoring everyone" is a given
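
The "THEN ... THEN ..." chain above can be sketched as a product of per-step probabilities. This is only an illustration of the arithmetic: the 0.5 values are made-up placeholders, not estimates from this chat, and the step names and `chain_probability` are hypothetical. Each number is meant as the chance of that step given that the earlier ones already happened.

```python
from math import prod

# Hypothetical per-step probabilities, chosen only to illustrate the arithmetic.
# "outscores everyone" is set to 1.0 to mirror the "assume it is a given" case.
steps = {
    "enters the arena in time": 0.5,
    "collects enough votes": 0.5,
    "outscores everyone": 1.0,
    "posted by July 31": 0.5,
}


def chain_probability(probs):
    """Probability that every step in the chain happens (product of the steps)."""
    return prod(probs)


if __name__ == "__main__":
    print(f"all steps together: {chain_probability(steps.values()):.1%}")
```

With these placeholder numbers the chain comes out at 12.5% even though one step is treated as certain, which is the point being made: every extra required step can only pull the total down.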

keen beacon
#

i want the base model

#

hopefully theyll release that

ocean vortex
#

Grok4 most likely gonna release sooner, at least judging by all the noise

small haven
sacred plaza
fleet lintel
#

grok4 release is today , right?

keen beacon
sacred plaza
small haven
sacred plaza
#

late start time to avoid the whole world laugh at their face and try to jailbreak it in real time.

torn mantle
#

5 am

unborn ocean
keen beacon
#

real r1

unborn ocean
#

v2?

keen beacon
#

yes

sacred plaza
#

i thought LLMs inherently had a hard time with pdfs, since pdfs are images. how is your app solving that problem?

unborn ocean
# keen beacon yes

no way it is "on par" knowledge wise or anything like that

maybe like close to o4 mini but dense

torn mantle
keen beacon
unborn ocean
#

nah 32b they might be able to compete on things that are tool use, reasoning, instruction following, human preferences and things like that

#

but as soon as you get into the territory of anything else i doubt it will beat v2

keen beacon
#

let's see

unborn ocean
#

Huawei gonna do cpt + up-cycle to bad MoE and then call it their own if openai really releases a model like that, lol

whole sundial
#

apparently there is going to be a "SuperGrok Pro" for $300 a month

#

more evidence

#

from iOS App Store page

keen beacon
#

im completely off the mark with my guess btw 🤣

torn mantle
#

will stick with gemini

#

aint no way this model is better than gemini

#

300$

#

they are crazy

keen beacon
torn mantle
#

wild

#

you didnt answer me

keen beacon
#

i cba to check it out xd. fck grok

torn mantle
#

xd

keen beacon
#

size

#

no but

#

apparently it needs to be run on h100s, it's a chonky boy

whole sundial
#

i would pay $200 a month for chatgpt or $250 for google before i would spend $300 on grok
(unless their image gen is better than gpt-image-1, which it currently isn't but maybe grok 4 has a better one? they'll probably offer it to other people anyways. and I expect video gen for that price as well)

#

if they expect that people would spend $300 on grok, then they need to have very good products. grok 4 (even with bigbrain) isn't enough unless it has very good limits, that's why claude is cheaper than any of them, they don't have image gen or video gen (but they do have claude code)

#

maybe Grok CLI will come out?

#

and I am not paying for their image gen if they are still going to stretch it out, put their watermark on it and then compress it with JPEG quality level 75

main gulch
keen beacon
#

probably not that size. its interesting they wanted to train a relatively large model though, i didn't expect that. but i guess that would destroy their small offerings if they made a small dense model that competes with their mini/nano api

main gulch
keen beacon
#

yeah :\

whole sundial
#

tbh grok/xai was the first major ai vendor that had native image gen late last year, but gpt-image-1 has vastly surpassed it

#

maybe if they open source grok 2, their image gen will come with it?

#

because the only open source image editing models are small ones like bagel (from bytedance)

keen beacon
#

grok 2 is useless by now anyway

whole sundial
#

i think they have to open source something so they don't get backlash because people will say next week "well, openai just open sourced a model, you said you would open source the previous grok when a new one comes out. grok 4 just came out and you have not even open sourced grok 1.5 yet!"

#

last year

ornate agate
#

https://eu.usatoday.com/story/money/2025/07/09/what-is-grok-ai-elon-musk-xai-hitler-mechahitler-antisemitic-x-ceo-steps-down/84516808007/ . Just an automatic no in a business context to risk using something like this. Its not credible now, doesn't matter how good or not the model is any more.

USA TODAY

It isn't immediately clear what led to the disturbing posts, whether due to a fault in the chatbot's programming or if Grok was just following orders.

whole sundial
#

i think open sourcing may be the only way grok would have even the slightest foothold in enterprise. it's not like they are going to pay xai for it after the whole "MechaHitler" thing.

whole wagon
whole sundial
#

look at how common mistral, llama are

whole sundial
#

and they still call themselves openai

ornate agate
#

Apple is the only megacorp I might sometimes trust with mine. The rest are interchangeable in terms of consumer data privacy imo (i.e. none).

whole wagon
#

😂

whole sundial
#

don't forget about deepseek

whole wagon
#

Oracle is providing grok iirc

#

So that's major provider there

whole sundial
#

(although that may be worse due to CCP censorship built into the model itself)

#

openai open source model news - it will be (somewhat) big

#

at least 80b params because it says "H100s" suggesting more than 1 is needed to run the model. H100 has 80gb vram and most models are run at 16bit
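
A rough sanity check on that sizing argument, as a minimal sketch: it counts weights only (no KV cache, activations or runtime overhead), takes the 80 GB figure from the message above, and uses made-up model sizes. The function names are hypothetical, just for illustration.

```python
# Back-of-envelope weight memory: parameters x bytes per parameter.
# Weights only; KV cache, activations and runtime overhead are ignored.

H100_VRAM_GB = 80  # figure quoted in the message above


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def max_params_billion(vram_gb: float, bits_per_param: int) -> float:
    """Largest parameter count (in billions) whose weights fit in vram_gb."""
    return vram_gb * 8 / bits_per_param


if __name__ == "__main__":
    for params in (24, 40, 80):  # hypothetical model sizes, in billions
        for bits in (16, 8):
            gb = weight_memory_gb(params, bits)
            verdict = "fits" if gb <= H100_VRAM_GB else "exceeds"
            print(f"{params}B @ {bits}-bit: {gb:.0f} GB, {verdict} one {H100_VRAM_GB} GB card")
```

Under these assumptions, 16-bit weights alone fill an 80 GB card at about 40B parameters, so "needs H100s" is compatible with sizes well below 80B, and it says little once quantisation or MoE is in play.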

primal orbit
#

how many hours till grok 4 pls? not in us

#

thx

elder burrow
#

OH

#

thx for telling

elder burrow
whole sundial
#

next thursday

small haven
#

what are the odds that grok comes out with a $200 plan

ornate stump
# primal orbit how many hours till grok 4 pls? not in us

I'm from the future. Grok 4 is slightly worse than Gemini 2.5. The benchmarks were more or less altered. Elon Musk said it's SOTA and that only a re-t4rd wouldn't buy the 300 euro plan. If you're in Europe, you can go to sleep.

whole sundial
small haven
#

sota

#

slowly all climbing to $2k/mo

mossy drum
#

New model in Image Arena: imagen-4.0-generate-preview-06-06 (different from imagen-4.0-ultra-generate-preview-06-06)

whole sundial
#

there will be no BigBrain mode, it is going to be "Heavy Thinking" instead

unborn ocean
#

you've been yapping about it for months now, it better be good man

sacred plaza
sacred plaza
small haven
#

wow

wintry tinsel
#

Is grok 4 rolled out now?

sacred plaza
#

gotta take that L if/when this happens, lol

leaden palm
unborn ocean
#

so they would rather have oracle have the business (vs. other big tech)

#

and oracle is heavily scaling compute offerings to ai labs currently, so they are building a lot of expertise and compute

#

^recent semianalysis article covered it

ocean vortex
#

this is different Oracle now

unborn ocean
whole sundial
#

it is i think

#

on nitter now

whole wagon
#

It takes 2 seconds to check X

#

And see it is fake

whole sundial
#

no evidence on nitter

#

his latest post/comment is from two hours ago and consists of a 😂 emoji

unborn ocean
#

btw: anyone here know if the open source model from openai will be MoE or dense

#

like did they say anything?

keen beacon
#

if its large its probably moe

whole wagon
#

It's dense

#

People already tried it

keen beacon
#

wow

whole wagon
#

The "high taste testers" as it were lol

keen beacon
#

they are really trying stuff out for the open source release lol

whole sundial
#

i think this tweet is real /j

unborn ocean
#

yeah, because dense seems weird considering the apparent size

#

not that they could not do that

whole wagon
#

What are you talking about apparent size

unborn ocean
#

but it seems intuitive they would go for moe if it is big

whole wagon
#

It's not that big

keen beacon
#

i guess its 70b or around that..?

whole wagon
whole sundial
#

well they say you need h100s to run it so at least 70-80b params

whole wagon
#

Wut

whole sundial
#

could be moe though

keen beacon
#

so you can't really tell that much

ocean vortex
#

0.7b parameters

whole sundial
#

not like openai's going to tell us anything

keen beacon
#

several h100s might be optimal for deployment idk what specifically yuchen was talking about there

whole wagon
#

I know it fits on a RTX5090

keen beacon
#

qwen 32b

whole wagon
#

It does fit on it

keen beacon
#

🤣

whole wagon
#

It's not o3 level

#

Bro is just making crap up kek

whole sundial
#

maybe o4-mini level

ocean vortex
#

ok the number wasn't small enough then lmao

#

To be serious though, we really have no clue

whole sundial
#

but nobody outside of openai knows how big that model is

#

or really any of their post gpt-3.5 models

whole wagon
whole wagon
whole sundial
#

gpt-4 was 1t params, but that was from leaks

keen beacon
whole sundial
unborn ocean
#

i feel like they would rather not really leak any information about the moe version they are using to chinese labs

whole sundial
#

oh 1.8t for gpt-4

unborn ocean
#

so my original thought was dense

keen beacon
#

yeah thats what i thought too

whole wagon
#

It's dense because a key goal was to fit on a single consumer GPU from the start

whole sundial
#

baidu recently open sourced a 21b param moe model

#

3b active
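
A rough way to read those two numbers (21b total, 3b active), as a sketch and not anything Baidu has published: in a mixture-of-experts model the total parameter count drives how much memory the weights take, while the active count drives per-token compute. The helper name `moe_profile`, the 16-bit default, and the common "about 2 FLOPs per active parameter per token" rule of thumb are assumptions for illustration.

```python
def moe_profile(total_params_b, active_params_b, bits_per_param=16):
    """Return (weights_gb, flops_per_token) for a model.

    total_params_b / active_params_b are in billions of parameters.
    Assumption: every expert has to be stored somewhere, so memory
    scales with the total count; only the routed (active) parameters
    are multiplied per token, so compute scales with the active count.
    """
    weights_gb = total_params_b * bits_per_param / 8   # 1e9 params * bytes each = GB
    flops_per_token = 2 * active_params_b * 1e9        # rule-of-thumb forward pass
    return weights_gb, flops_per_token


# The 21b-total / 3b-active model mentioned above, at 16-bit weights
moe_gb, moe_flops = moe_profile(21, 3)

# A hypothetical dense model of the same total size, for comparison
dense_gb, dense_flops = moe_profile(21, 21)

print(f"MoE:   {moe_gb:.0f} GB weights, {moe_flops:.1e} FLOPs/token")
print(f"Dense: {dense_gb:.0f} GB weights, {dense_flops:.1e} FLOPs/token")
```

Under these assumptions both models need the same space for weights, but the MoE does roughly a seventh of the per-token work, which is why "could be moe" makes parameter count a weak guide to hardware requirements.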

keen beacon
#

yeah i expected that. the model being big is unexpected. honestly im getting mixed signals theres not enough information. so ill stop yapping

whole sundial
#

it could run on a 5090 easily

#

not reasoning though

whole wagon
#

It would not perform well

#

The target was o3 mini level but they exceeded it

#

That requires dense on a single consumer GPU

whole sundial
#

sama said that they reached "a breakthrough" (whatever that refers to) lol

whole wagon
#

As I understand it runs on more midrange GPUs also. I only know it fits on the 5090 though

unborn ocean
#

yes, i would also say it is dense and 5090 size (but maybe only when run in FP8, or in a first-party quantisation, who knows)

whole wagon
#

Like the GPU being used

whole sundial
#

it's possible you might need h100s to run it at full context and in 16/32bit mode, but the model could be 30-40B params in reality
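
The back-of-envelope arithmetic behind that kind of guess can be sketched like this. Everything here is an assumption for illustration: the helper names `weight_gb` and `fits`, the nominal 32 GB (RTX 5090) and 80 GB (H100) capacities, and especially the flat 1.2x `overhead` factor standing in for KV cache and activations, which in practice grows with context length.

```python
# Nominal VRAM per card in GB (assumed figures for this sketch)
GPUS_GB = {"RTX 5090": 32, "H100": 80}


def weight_gb(params_b, bits_per_param):
    """Size of the weights alone, in GB, for params_b billion parameters."""
    return params_b * bits_per_param / 8


def fits(params_b, bits_per_param, vram_gb, overhead=1.2):
    """True if weights plus a crude overhead allowance fit in vram_gb.

    overhead is a hypothetical fudge factor, not a measured value.
    """
    return weight_gb(params_b, bits_per_param) * overhead <= vram_gb


# Model sizes floated in the chat, checked at 16-, 8- and 4-bit weights
for params_b in (24, 32, 40, 70):
    for bits in (16, 8, 4):
        where = [gpu for gpu, vram in GPUS_GB.items() if fits(params_b, bits, vram)]
        print(f"{params_b}b @ {bits}-bit: {weight_gb(params_b, bits):.0f} GB weights, "
              f"fits on: {', '.join(where) or 'neither (single card)'}")
```

With these assumptions a 30-40b dense model wants an H100-class card at 16-bit but drops into 5090 range once quantized to around 4-bit, so "needs h100s" and "fits on a 5090" are not necessarily contradictory claims.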

#

or it could be bigger, nobody really knows at this point

#

or smaller...

#

if i had to guess, that model is like 24b params (could be slightly larger with moe, maybe 40b params); they likely know how to make very knowledge-dense models that are smaller

#

likely the size of this open source/weight model too

whole wagon
#

Does the open source model release before or after GPT5

whole sundial
whole wagon
#

They have been pretty relaxed with info in regards to it yes

#

A lot of people know a lot about it

#

I guess it doesn't matter as much

#

When it will be OSS anyways

whole sundial
#

they have to release it, sama said in front of Congress that they would release it

#

but it's still going to happen

#

but who knows how good it really is

keen beacon
#

there are somewhat credible rumors of it releasing next week

#

i dont think gpt 5 is releasing next week

whole sundial
#

should see it on lmarena or openrouter soon

#

or maybe not, we didn't see grok 4

whole wagon
#

Yeah I think it comes before GPT5 but then their models below o3 are close to useless tbh

#

The open source model is o4-mini level

#

They beat the o3-mini target

#

That was the 'breakthrough'

keen beacon
#

hmm yuchen also said it would be better than deepseek r1, i read just recently too. so i guess its around on par with / slightly better in some areas / slightly worse than r1 0528, and/or this is another game of telephone

whole sundial
#

breakthrough - it is slightly better than the model we were supposed to match!

whole wagon
#

Well not slightly. o4 mini destroys o3 mini

#

Once they open source this model the only superior model openAI will have is o3 till GPT5

unborn ocean
#

for me this means a couple of things: tools and very good and efficient reasoning (will play a big role in making it 'better than r1 0528' in some areas -> slightly worse in total)

keen beacon
#

openai reasoning is very good

#

interesting to see the actual good traces unlike the polluted glimpse we got with phi 4 reasoning

unborn ocean
#

especially phi 4 reasoning plus was a real 💩

keen beacon
unborn ocean
keen beacon
unborn ocean
#

even with them claiming "breakthroughs"

elder rapids
#

interesting, Gemini 3 seems to be coming on a much shorter timeline than 1.5 → 2

unborn ocean
keen beacon
#

for a single user it's enough

unborn ocean
#

by using that

keen beacon
#

when they're picking model sizes, and probably especially for an open source release, they're thinking of the model size / quantized sizes and vram increments on gpus among other things. i said it would be around 32b anyway, in the same size class. not saying it was 32b.

unborn ocean
#

maybe they are already mostly done with training though and the "feedback" phase is a product of optimising inference to make sure that it can actually fit on a 5090 somehow

#

idk it seemed like they had multiple designs and checkpoints

#

at different sizes

keen beacon
#

yeah but i doubt the final run had multiple sizes

#

doing extrapolations on small models / different sizes is a normal part of the process for experimentation /etc

#

in 5 minutes

unborn ocean
#

like wayyy larger range than what you typically do

#

and i am guessing they already had multiple promising checkpoints at very different sizes (and invested quite a bit of flops)

whole wagon
#

Initially they wanted a tiny model actually. The size was shifted up later

zinc ore
unborn ocean
#

but opted for a larger one, which is why it might be taking longer

whole wagon
#

Initially they wanted like a 4B so it could run on a phone lmao. That was shifted upwards a long time ago

#

So now it's for consumer GPUs instead

keen beacon
#

hmm i read from someone's account in the feedback session, they said it'd be moe..? (and fit on a high end consumer device) if it's a moe, a larger model can fit with tricks beyond quantization, etc.

whole sundial
#

guys i got openai's phone model /s

#

more proof lol

whole wagon
#

I like how we basically got Chinese knockoffs but for LLMs also lol

#

It's great. Like temu for LLMs

elder burrow
#

WHATTTTTTTT

#

WHATTTTTTTTTTTTTTTTTTT

#

WGAAAAAAATTTTTTTTTTT

dawn wharf
elder burrow
#

DUDE

lone vector
#

Does Grok 4 even matter when DeepThink hasn't released yet, Gemini 3.0 is confirmed, ChatGPT 5 soon, etc.

whole wagon
#

Like I was saying. It would be a real shame if someone came along and made gpt5 non-sota at release 🙂

wintry tinsel
#

The colossus supercomputer will be very formidable once it is fully grown

whole wagon
hollow ocean
whole wagon
#

It's 3 hours now

hollow ocean
#

best $300 spent

astral kayak
golden ocean
cedar tide
#

✨ Solar Pro 2, our latest frontier model, now officially released.

With just 31B parameters, it delivers reasoning, tool use, and multilingual performance that rivals much larger models like GPT-4o, DeepSeek R1, Mistral Small 3.2, and Qwen3. It performs strongly on reasoning-focused benchmarks such as MMLU-Pro, Math500, AIME, and SWE-Bench, proving that compact models can deliver frontier-level capabilities.

Try it hands-on in Upstage Console: console.upstage.ai/playground/chat?utm_source=x&utm_medium=social&utm_campaign=solarpro2-launch

**💬 1 🔁 1 ❤️ 3 👁️ 72**

rare python
#

Why did my post get removed?

zinc ore
#

Post or comment? Because the post is still there

rare python
#

I can't see it

#

I have to resend this post but with the original reddit url

zinc ore
#

The post isn't removed, it's still there for everyone to see

#

Maybe you hid it on your end or something like that

rare python
#

weird

#

That post is above the dog image

#

But I can't see anything

rare python
balmy mist
#

grok in 2 hours?

empty stump
#

Will it be worth it

jade egret
#

grok 4 release today!

jade egret
jade egret
rare python
#

API Price prediction?

jade egret
#

i dont use api : (

echo aurora
#

whered that ping go pikaconfused

olive mesa
#

Gemini 3.0?? :0000

leaden palm
wind moth
#

Grok bouta clear the competition

whole wagon
#

You know what's even better than watching the livestream?

#

Watching the polymarket 😂

#

You literally see it moving at key moments

whole sundial
#

23.5% ARC-AGI-2

whole wagon
#

bro what

naive valley
#

Wat

#

What's gonna happen in 30 minutes

whole wagon
#

why only 40% on the betting kek

hardy pecan
#

editing webpages and screenshotting must be very fun! xd

whole sundial
#

i think this may be fake lol

#

sorry

#

notice that opus is gone?

whole wagon
#

bro is actually trolling

whole sundial
#

also "xAI" is capitalized wrong

#

i'll remove it, should verify these things myself before i post

#

i guess people on the grok discord are spreading misinformation

whole wagon
#

they moved the odds with that crap

whole sundial
#

they changed "Claude Opus 4" to "Grok 4" and its score

#

should've noticed it was missing

whole wagon
#

it did

#

there was a spike as soon as it was posted there

#

5%

torn mantle
#

@cb_doge Grok is already far smarter than humans in most respects.

It can't yet create new technologies or discover new physics (which very few humans can do) and sometimes misses on common sense.

When Grok goes far wrong, that is usually due to something foolish we did, like a bad

#

you got your answers guys

#

blame it on system prompt

#

im not a kid

#

im 19

#

@echo aurora

whole wagon
#

@leaden palm

leaden palm
#

uhh

#

seems pineapple has this one

torn mantle
#

smh

#

grok 4 effect

#

or elon

#

idk

whole sundial
#

i'm back and this time i'll spread real, self-verified information

#

like this

#

notice that they changed "Smartest" to "Fast"?

whole wagon
#

you missed a lot btw lol

leaden palm
torn mantle
#

im so sleepy... i think i will just wake up to the news tomorrow

whole wagon
#

been going for some time

echo aurora
whole sundial
#

should've known, "schizo" was the same one responsible for that fake tweet from earlier

balmy mist
#

livestream in 19 mins?

whole sundial
#

xAI staff member: Image gen not coming at launch (but when it does come, I hope its better than 4o!)

balmy mist
leaden palm
#

and it looks like a scam lmao

torn mantle
whole sundial
#

the link's on their website lol

small haven
#

so in 15 mins?

whole sundial
#

my thoughts about grok 4: will be SOTA at launch, but will be soon overtaken by gpt-5, claude 4.1, and/or gemini 3.0 pro

leaden palm
#

...

balmy mist
whole sundial
#

yes but there is proof it will be coming in the next month

#

maybe

torn mantle
whole sundial
#

at least a beta of it

torn mantle
#

didnt sam say gpt5 will be delayed

#

wdym

#

sam is lying?

whole sundial
#

oh you mean the code in the cli for gemini that mentioned it?

#

there is a reference to Gemini 2.5 Ultra (kingfall lineage) in the source code though

small haven
#

obviously

#

was on grok 3's

#

and 8pm is "his" time lol

whole sundial
#

it's his company and their major release

torn mantle
#

big brain = heavy thinking

#

they renamed it?

#

ye

small haven
#

can someone ping me when its ready? surely going to be a delay

jade egret
#

can somebody send me link to grok 4 livestream i cant find it : (

leaden palm
#

last time (grok 3) it was at 8:02 pm

jade egret
#

ty

echo aurora
#

just hasn't started yet

#

... I think

torn mantle
jade egret
#

so they gonna post?

torn mantle
#

soon = next week

#

how did you know

zenith saffron
#

where is the livestream?

hollow ocean
#

22k people waiting room

#

It's hype

jade egret
#

can yall send link if it start plz?

hollow ocean
#

Their tweet

jade egret
#

ty

whole wagon
#

Are they delayed

#

Kek

leaden palm
#

patience is a virtue i suppose

jade egret
#

when start : (

#

yay

hollow ocean
#

Didn't start on time polymarket

#

Easy money

whole wagon
#

It ain't starting in 4 mins kek

leaden palm
#

@bright lion you're in an unofficial space right

bright lion
#

(The official one)

leaden palm
whole wagon
#

Even the damn livestream is delayed man

balmy mist
#

i love that this is a whole event for all of us lmaooo

bright lion
echo aurora
#

yeah doesn't seem like the space started yet

balmy mist
elder rapids
#

I delayed the Livestream guys

#

it's coming in a bit

jade egret
#

fr?

#

i dont see : (

elder rapids
#

hope it's a good model

#

right craig

jade egret
#

hopefully : )

zenith saffron
#

ahhhhh where is it

elder rapids
#

relying on you

echo aurora
elder rapids
#

grok 3.5 would be releasing too no?

#

or just grok 4

whole sundial
#

i think it got renamed into grok 4

elder rapids
#

why would they do that lmao

whole sundial
#

because Elon said so

whole wagon
#

It got retrained

empty stump
#

Maybe it's better

whole wagon
#

3.5 was too bad

whole sundial
#

there was a Grok 3.5 0621 internal version, next version was Grok 4 0629 and then Grok 4 0702

jade egret
#

it not here 😢

whole wagon
#

Grok 3.5 was supposed to release in like may

#

The original 3.5 was shelved

elder rapids
#

too little time

whole sundial
#

Elon was likely unsatisfied by the model, so they kept on training it until it was good enough to launch, and then it became Grok 4

torn mantle
whole wagon
whole sundial
#

at least that was redone

empty stump
whole sundial
#

it takes a long time to redo a pre-trained model

elder rapids
#

man it's gonna be so disappointing if grok 4 is ass

#

😭

small haven
#

lmao its delayed hahahha

empty stump
#

Always delayed

small haven
#

in hindsight sure

jade egret
#

😭

whole wagon
#

They literally spent all day preparing and still failing to start on time

#

It's a livestream how hard can it be to start it on time

whole sundial
small haven
#

they are prepared, elon needs his mascara

torn mantle
#

lmao

#

what

elder rapids
#

8:30

whole wagon
#

It's in 12 mins

lone vector
empty stump
#

Why it say 3 30 am

echo aurora
#

Starts at 3:30 AM

balmy mist
#

bruhh they say 8 and now its 8:30, grifter activities

elder rapids
#

you guys got scammed

echo aurora
#

I'm assuming they mean 8:30pm PT?

balmy mist
leaden palm
elder rapids
whole wagon
#

If you set your location or timezone wrong on the X app

#

It won't be the right time

zenith saffron
whole wagon
#

The nazi stuff

#

😂

zenith saffron
#

LOL

elder rapids
zenith saffron
#

gotta calm it down

empty stump
elder rapids
#

can't introduce it to the public without reins

#

obviously the consequence of AGI

whole wagon
#

Imagine grok starts glazing Hitler in the livestream demo that would be diabolical

#

No wonder it's delayed

leaden palm
#

officially delayed

jade egret
#

welp

hardy pecan
jade egret
#

tell me when it ready : )

empty stump
#

How much

zenith saffron
#

"hey grok can you answer this phd-level research question originally posed by Einstein"

grok: "lol"

whole sundial
#

i wonder what rate limits for grok 4 will be in place for free users (if even available), supergrok users, and supergrok pro users (likely near unlimited)

elder rapids
#

hope the API is available

leaden palm
#

hey now i submitted a qa pair for RL

small haven
#

@deep adder why do u have to always jinx it

jade egret
#

benchmarks gonna be out tho right

whole sundial
#

lol they removed the time from the event

jade egret
leaden palm
whole sundial
#

instead of saying "Tune in at 8:00PT" it just says "Tune in..."

leaden palm
#

ah in that sense

whole wagon
#

It's live

#

It's started

echo aurora
#

hmm says Live for me now pikaconfused

#

ah

whole sundial
#

when i went back to the page the "8:00 PT" went away

small haven
#

3:30am, oh yea thats an elon type of time..

wind moth
whole wagon
#

Accessing the livestream is such a labyrinth

#

It's like going through a maze

wind moth
#

lol

empty stump
#

I think they deleted it

wind moth
#

hopefully its not going to be mechahitler

#

again

#

oh

whole wagon
#

What even is this bs man, the livestream is delayed cos grok 4 turned into a nazi again

wind moth
#

he said "We need a few more minutes. It's doing it again."

torn mantle
whole wagon
#

Absolutely absurd

#

How did they accidentally make a nazi LLM

wind moth
#

ya

whole wagon
#

And why are they still releasing it if it's so prone to be a nazi kek

keen beacon
#

It wasn't an accident xd

wind moth
#

also dont ask it political questions

#

in live stream also

#

if thats the case

#

ask it math or something like that

empty stump
#

Because they are behind

wind moth
#

not about trump

elder rapids
zenith saffron
small haven
whole wagon
#

It glazes Hitler even when you ask it unrelated stuff

#

Like if it believes in a god

small haven
#

oh lol

whole wagon
#

Etc etc was happening for 100s of messages

elder rapids
empty stump
#

Probably trained on x posts

wind moth
whole wagon
#

It's no system prompt. It's baked in the damn weights

#

They publish the system prompts

elder rapids
#

lol

#

then they're different models

hallow pelican
#

live start or not?

echo aurora
#

not yet

empty stump
#

So disappointing

elder rapids
#

Twitter grok and grok app grok have different speech tendencies and say different kinds of things

#

Twitter grok is fine tuned

whole wagon
#

Don't post nsfw here lol

balmy mist
#

mb i meant to post this

zenith saffron
elder rapids
# balmy mist

this news dude is an idiot man every single time I see him 😭

whole wagon
#

War room squad locked in. But grok is off being a nazi

#

It's so over

leaden palm
#

LIVE

The livestream will begin soon.

empty stump
#

Soon is when

elder rapids
#

mb guys I'll get it set up rq

leaden palm
whole wagon
#

Grok has a mind of its own

echo aurora
whole wagon
#

It's like it's protesting sometimes lol

leaden palm
#

im just gonna watch the recording

whole wagon
#

Like when they banned its text replies. It started putting messages into its image replies

jade egret
#

it delayed by 34 min bro

elder rapids
#

35

whole wagon
#

Time to go to sleep I reckon

elder rapids
#

you're wrong

echo aurora
#

I feel so bad for anyone that's not in the PT timezone 😭

whole wagon
#

Elon probably meant 2026 we misread all the tweets and hype

keen beacon
#

Imagine they release mid after all of this

wind moth
#

imagine waking up early

#

or staying up late

#

and u gotta wait another hour

#

this gonna start at 12

whole sundial
#

so maybe it is a 2t+ model

whole wagon
#

40 minutes late 💀 there's no way man

#

I thought the livestream would be done by now 😂

#

Not that it wouldn't have even started lol

jade egret
#

when do you think it gonna start

zenith saffron
#

this reminds me of procrastinating on homework

#

they're still creating the slides

echo aurora
zenith saffron
#

"guys what math question should we ask it"

wind moth
#

fake

#

where u seeing it at

#

also why tf would it be sold

whole sundial
#

that page is real!

hardy pecan
#

😦

whole sundial
hardy pecan
#

Sold out?

torn mantle
#

lol

#

facepalm

whole wagon
#

Bro it sold out before it even became available sure thing man

#

Or does it just not exist

#

Lol

torn mantle
#

well lets see how it is first

keen fulcrum
elder rapids
dawn wharf
#

I'm groking it

small haven
#

so grok 4 heavy is its own model? cool

hardy pecan
#

They've asked all xAI employees to tweet about it first lol

whole wagon
elder rapids
#

I'm ngl ts is taking too long

balmy mist
small haven
#

how is it already sold out

elder rapids
#

bought it already

whole sundial
#

only thing i care about is when it will appear on the arena

balmy mist
#

yeah true, google might have the most efficient workflow tho

whole wagon
#

If it's more than an hour delayed I'm checking out and going to bed lmao

keen beacon
#

It's honestly not worth losing sleep over how ever good it is

elder rapids
#

nobody was talking about the chocolate model

#

😭😭

torn mantle
#

back to sleep

#

im done

zenith saffron
torn mantle
#

its 6am

dawn wharf
hardy pecan
#

gottem

zenith saffron
#

@deep adder just curious, why did you name yourself after that dude

zenith saffron
#

he's cool but is he that cool

torn mantle
zenith saffron
#

lmaoo interesting answer

whole wagon
#

xAI getting cooked by the other ai lab employees

#

Lmao

#

I think this might be a new SOTA for Elon musk livestream delay

#

I don't recall any longer than an hour

elder rapids
#

this can backfire pretty hard tbh

torn mantle
elder rapids
#

"after being delayed for an hour, xAI releases a sub par model"

whole wagon
torn mantle
#

by 30 min

jade egret
#

officially 50 min delayed....

whole wagon
#

They achieved a new SOTA in delay by 67%

dawn wharf
zenith saffron
dawn wharf
wind moth
#

its still mechahitler prob so they gotta fix it

#

lol

dawn wharf
wind moth
#

this better start at 12 est 9 pst

echo aurora
whole wagon
#

I'm picturing the xAI engineers frantically trying to adjust the system prompt to stop grok spontaneously turning into a nazi halfway through the livestream

empty stump
#

I'm guessing a few hours

whole wagon
#

Quite amusing imagery

hallow pelican
wind moth
#

political questions

#

shouldnt they have preset