Elon Musk wants to buy 300,000 B200 GPUS | PauseAI | Page 1

wooden fulcrum Jun 3, 2024, 7:21 PM

#

https://www.tomshardware.com/pc-components/gpus/elon-musk-wants-to-purchase-300000-blackwell-b200-nvidia-ai-gpus-hardware-upgrades-to-improve-xs-grok-ai-bot

that is approximately 9 fully populated data centers, each at 27 trillion weights, or 15 times the scale of GPT-4's original model.

One possible way to utilize this much hardware would be to try 9 separate hypotheses for a major model improvement in parallel, per 1-3 month training cycle, or approximately 36-108 hypotheses per year. A modular architecture, reusing subnetworks, might increase the number of hypotheses tried to thousands.

RSI (recursive self-improvement) criticality is some hidden threshold where this begins to work (where, each cycle, the league of models becomes more capable at guessing effective hypotheses and quickly scales to AGI-level capability on the test bench).
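The throughput estimate above can be checked with a quick back-of-envelope sketch. It only restates the figures from the message (300,000 GPUs, about 9 data centers, 27 trillion weights each, 15x GPT-4, 1-3 month training cycles); the function and variable names are illustrative, and it assumes one hypothesis per data center per cycle.

```python
# Back-of-envelope sketch of the estimate above. All inputs come from the
# message; the function and variable names are illustrative only.

TOTAL_GPUS = 300_000
DATA_CENTERS = 9              # "approximately 9 fully populated data centers"
WEIGHTS_PER_CENTER = 27e12    # 27 trillion weights per data center
SCALE_VS_GPT4 = 15            # "15 times the scale of GPT-4's original model"


def hypotheses_per_year(parallel_runs: int, cycle_months: float) -> float:
    """Assumes one hypothesis per data center per training cycle."""
    cycles_per_year = 12 / cycle_months
    return parallel_runs * cycles_per_year


gpus_per_center = TOTAL_GPUS / DATA_CENTERS                 # roughly 33,000 each
implied_gpt4_weights = WEIGHTS_PER_CENTER / SCALE_VS_GPT4   # about 1.8 trillion

slow = hypotheses_per_year(DATA_CENTERS, cycle_months=3)    # 3-month cycles
fast = hypotheses_per_year(DATA_CENTERS, cycle_months=1)    # 1-month cycles

print(f"GPUs per data center: {gpus_per_center:,.0f}")
print(f"Implied GPT-4 size: {implied_gpt4_weights:.2e} weights")
print(f"Hypotheses per year: {slow:.0f} to {fast:.0f}")
```

With 3-month cycles this gives 36 hypotheses per year and with 1-month cycles 108, matching the 36-108 range quoted. The "thousands" figure for a modular architecture is not derived here, since the message gives no numbers for it.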

Tom's Hardware

Elon Musk wants to purchase 300,000 Blackwell B200 Nvidia AI GPUs —...

Nvidia's B200 GPUs will be used to boost the X platform's AI capabilities

mystic dagger Jun 3, 2024, 7:24 PM

#

I'm assuming the largest chunk will be inference compute, probably. But damn, that's a lot of compute, although only half as much as Meta.

https://www.hpcwire.com/2024/01/25/metas-zuckerberg-puts-its-ai-future-in-the-hands-of-600000-gpus/

wooden fulcrum Jun 3, 2024, 7:25 PM

#

mystic dagger I'm assuming the largest chunk will be inference compute, probably. But damn, th...

I think that for finding something that works - this probably is at the compute/weights level that an AGI can exist at - it's bad news when there are many players like this. This increases the breadth of things tried and makes it more likely someone will discover an effective method.

#

GPT-4 was reportedly 1.8T initially; it was then distilled down to probably 1/10 of that and still has most capabilities

#

27T may be enough

#

(especially with a multi-center architecture, such as one 27T network for motion planning and simulating the world near the robot, another one for handling relevant memory storage and recall, etc.)

#

2025-2026 when this equipment comes online?

#

one other note: at ~3 million per rack of 72 GPUs, this is 37.5 billion dollars invested. At a MARR (minimum acceptable rate of return) of 10%, this means any kind of slowdown or pause costs at least 3.75 billion per year. It's probably actually much worse than that: GPUs like this have a very short useful lifespan in an AI race, probably less than 3 years. So it's probably 12.5 billion per year in economic incentive for Grok and Meta to lobby against any kind of restrictions.
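A rough sketch of the arithmetic in the message above, taking its inputs at face value: 300,000 GPUs, 72 GPUs per rack, about $3M per rack, a 10% MARR, and a 3-year useful life. Note that these inputs work out to roughly $12.5 billion of hardware, not the $37.5 billion quoted; the quoted total would correspond to about $9M per rack. Both cases are shown, and all names here are illustrative.

```python
# Sketch of the cost-of-delay estimate. Inputs are the figures quoted in
# the message; function and variable names are illustrative only.

GPUS = 300_000
GPUS_PER_RACK = 72
COST_PER_RACK = 3e6         # "~3 million per rack"
MARR = 0.10                 # minimum acceptable rate of return
USEFUL_LIFE_YEARS = 3       # "probably less than 3 years"
QUOTED_CAPITAL = 37.5e9     # total stated in the message


def capital_from_racks(gpus: float, gpus_per_rack: float, cost_per_rack: float) -> float:
    """Hardware spend, allowing fractional racks for a rough estimate."""
    return gpus * cost_per_rack / gpus_per_rack


def annual_cost_of_delay(capital: float, marr: float) -> float:
    """Return forgone per year while the capital sits idle."""
    return capital * marr


def annual_writeoff(capital: float, useful_life_years: float) -> float:
    """Straight-line loss of value per year over the useful life."""
    return capital / useful_life_years


computed_capital = capital_from_racks(GPUS, GPUS_PER_RACK, COST_PER_RACK)
implied_rack_cost = QUOTED_CAPITAL * GPUS_PER_RACK / GPUS

for label, capital in (("from stated inputs", computed_capital),
                       ("as quoted", QUOTED_CAPITAL)):
    delay = annual_cost_of_delay(capital, MARR)
    writeoff = annual_writeoff(capital, USEFUL_LIFE_YEARS)
    print(f"{label}: capital ${capital / 1e9:.1f}B, "
          f"MARR cost ${delay / 1e9:.2f}B/yr, "
          f"write-off ${writeoff / 1e9:.2f}B/yr")

print(f"Rack cost implied by the quoted total: ${implied_rack_cost / 1e6:.1f}M")
```

Either way the qualitative point holds: the yearly write-off over a short useful life is several times larger than the MARR-based cost of delay.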

royal lichen Jun 3, 2024, 7:32 PM

#

Fwiw, I have not actually seen Elon/Grok lobby against restrictions.

wooden fulcrum Jun 3, 2024, 7:33 PM

#

royal lichen Fwiw, I have not actually seen Elon/Grok lobby against restrictions.

didn't you link a public statement that they will take no safety measures until they reach "SOTA", which seems to be a moving target?

mystic dagger Jun 3, 2024, 7:33 PM

#

royal lichen Fwiw, I have not actually seen Elon/Grok lobby against restrictions.

you've jinxed it now buddy (jk)

wooden fulcrum Jun 3, 2024, 7:33 PM

#

as in, since the competition has more GPUs, they will never reach SOTA

royal lichen Jun 3, 2024, 7:34 PM

#

No, that was Elon talking, and I don't think it was "no safety measures" as much as "no specific plan until we're competitive." Nothing about SOTA.

#

They're quite behind at the moment and I'm not yet too concerned. Sadly or otherwise, they don't really seem to know what they are doing yet.

wooden fulcrum Jun 3, 2024, 7:35 PM

#

royal lichen They're quite behind at the moment and I'm not yet too concerned. Sadly or other...

agree, the reason to be concerned about Musk is that, for engineering, he's reusing an algorithm based around putting engineers in charge and making rapid decisions

royal lichen Jun 3, 2024, 7:35 PM

#

They're the only ones who've had visible model collapse at one point, with Grok strongly insisting that it was ChatGPT, etc.

wooden fulcrum Jun 3, 2024, 7:35 PM

#

which has worked in many domains

#

even for twitter the Community Notes feature is SOTA

royal lichen Jun 3, 2024, 7:36 PM

#

I'll get more concerned once I see whether Grok is actually making real advancements. Right now, to some extent, I feel like Elon is taking money away from more effective players.

#

He did say that he favored Pause even in the last speech btw

wooden fulcrum Jun 3, 2024, 7:37 PM

#

royal lichen He did say that he favored Pause even in the last speech btw

then why is he..

royal lichen Jun 3, 2024, 7:38 PM

#

He said that no one is pausing as he feels the need to have a horse in the race. I feel that it would be more honorable if he invested in both strategies, then.

#

But that is exactly what he said, and iirc added "for the record, I favor pausing."

wooden fulcrum Jun 3, 2024, 7:39 PM

#

royal lichen He said that no one is pausing as he feels the need to have a horse in the race....

as in: you should be trying to stop me!
also: my robots are almost able to build themselves, in the next model update they will, you're screwed!

#

(and with Elon time the second thing won't happen for 5 more years)

#

I think Holly said something similar btw

#

that if the most likely outcome happens

#

China basically says fuck no to any agreements

#

well it's e/acc for everyone

#

(rather, China won't say fuck no, they will say 'let's talk about it next summit'...forever)

royal lichen Jun 3, 2024, 7:41 PM

#

I don't think that's even the case now, as per Schmidt. But there's too many players, it feels like.

#

National Security would in fact want some version of safety, but they're not having a clean victory either.

wooden fulcrum Jun 3, 2024, 7:42 PM

#

royal lichen I don't think that's even the case now, as per Schmidt. But there's too many pla...

good or bad? When I read Meta's paper on Llama 1 I noticed they had pulled techniques from open source models

#

and since there are no noncompetes in California, it looks like staff just flows between all labs

#

and ultimately all techniques known to work will be available to everyone

#

it would just be a delay, like if 1 lab creates a novel technique

#

it might be 3 months before a huge lab gets it

#

and a year before a smaller one

#

simply because of probability/staff numbers

royal lichen Jun 3, 2024, 7:44 PM

#

I just had a conversation with an ML researcher today, and I don't think that's the case. There's a lot more secrecy going on than one thinks.

#

There's a lot of information put out, but it's also questionable how much of it is "real."

wooden fulcrum Jun 3, 2024, 7:44 PM

#

royal lichen I just had a conversation with ML researcher today, and I don't think that's the...

so you're saying if you make 1M a year working for <deepmind, OAI> and then you get a 2M offer at Grok and take it

#

you won't share knowledge? seems unlikely

#

everyone is going to claim they don't

#

but they will?

#

you cannot admit to that, NDA

#

but I know for a fact we do it all the time

#

(slightly different field for myself, but my coworkers are absolutely always saying 'why aren't we doing this, at a past employer...')

royal lichen Jun 3, 2024, 7:45 PM

#

No, it's more that any individual knows only part of the details, and to some extent, no one is fully aware of how a model works. Meanwhile a lot of information is always being provided, with only some of it being useful.

wooden fulcrum Jun 3, 2024, 7:46 PM

#

royal lichen No, its more that any individual knows only part of the details, and to some ext...

I meant in terms of "so actually 2:4 sparsity is a waste of time, we tried a few hundred times with SOTA models"

#

or "so here's the method in a paper, I know it works really well"

#

NDAs will say you can't do that

#

but in practice, it's not provable

#

without copied materials

royal lichen Jun 3, 2024, 7:47 PM

#

'here's this method in a paper, it works really well in the situations we tried, but there were many other factors'

wooden fulcrum Jun 3, 2024, 7:47 PM

#

smoking guns like "here's their source tree", sure

royal lichen Jun 3, 2024, 7:47 PM

#

Stuff like that.

wooden fulcrum Jun 3, 2024, 7:47 PM

#

royal lichen 'here's this method in a paper, it works really well in the situations we tried,...

sure it's inefficient

#

at my firm, a different company hired hundreds of us

#

and still hasn't replicated the tech

#

despite years and years of trying

#

though it's not that they can't get it to work; it's specifically making the tech work in the real world that they haven't been able to copy

#

the other side of the link uses our equipment also and they were developed together

#

no backwards compat reqs for AI though

#

only the cutting edge matters

royal lichen Jun 3, 2024, 7:49 PM

#

But expensive to "try"

wooden fulcrum Jun 3, 2024, 7:49 PM

#

royal lichen But expensive to "try"

yeah well over 10B has been spent

royal lichen Jun 3, 2024, 7:49 PM

#

Yes, like any real-world setup. Anyway, in Elon's case, we will see if he accomplishes anything. I kind of prefer him anyway, but I have a feeling that his personality will impede success.

#

Remember he dismissed OpenAI as "method not working"

wooden fulcrum Jun 3, 2024, 7:50 PM

#

royal lichen Yes, like any real-world setup. Anyway, in Elon's case, we will see if he accomp...

sure and he was semi correct

#

specifically he realized they needed billions and, per the court documents, was an early booster of that

#

he wanted them to team up with a bigger company that could supply the money

#

..which yeah

#

they just picked a bigger and richer daddy company

#

and seem to have the ability to cheat on M$ with Apple

royal lichen Jun 3, 2024, 7:51 PM

#

It seems like he doubted their transformer model, and in general, seems to lose faith in things too quickly.

wooden fulcrum Jun 3, 2024, 7:51 PM

#

royal lichen It seems like he doubted their transformer model, and in general, seems to lose ...

oh sure, not sure who didn't doubt it

#

AI Dungeon was really trash and you had to have a lot of faith that more scale

#

would reach a useful level of output quality

#

why would it generalize, why wouldn't it be an incoherent mess of different voices online matching the same input situation, etc etc

#

I think everyone for decades has thought you need RSI?

#

specifically RSI means the n-1 model intelligently designs networks, each specialized for a cognitive role in the larger machine

#

this happens over and over ofc until you reach one of your limits

royal lichen Jun 3, 2024, 7:54 PM

#

Anyway, I'm not too worried about Elon at the moment but wish that he'd do more lobbying for safety while he does whatever else he does.

wooden fulcrum Jun 3, 2024, 7:54 PM

#

not just "let's slap in a bunch of transformers and go bigger"

#

well you know how a nuke starts to work at a critical scale

#

that's why this matters. talent won't matter, elon won't matter

#

once you hit RSI

#

is 300k GPUs enough

#

probably not, it's probably 1-2 more generations of equipment

#

but I dunno...

#

RSI develops you a brain-like architecture that is efficient and a machine able to do all tasks you test it on

royal lichen Jun 3, 2024, 7:56 PM

#

Pretty sure talent does matter in this case, because you apparently have Google using synthetic data efficiently while Grok believes it is ChatGPT.

#

Seems like something is important there.

wooden fulcrum Jun 3, 2024, 7:56 PM

#

royal lichen Pretty sure talent *does matter* in this case, because you apparently have Googl...

we'll see. I think that hypothesis was falsified already but easy to prove

#

nice thing is that this means one way or another, it's going to be over soon

#

someone uses RSI and develops AGI? no reason to campaign for a pause, go become a safety group

#

or well, the alternative

#

dust eaten by nanomachines doesn't have any worries

royal lichen Jun 3, 2024, 7:57 PM

#

Pausing is by far the best way to get to safety.

#

So yes, not going to stop campaigning.

wooden fulcrum Jun 3, 2024, 7:57 PM

#

royal lichen Pausing is by far the best way to get to safety.

almost everyone says it can't be done

#

so we'll see

#

it has to be done before AGI

#

after it, it's over

#

if that's 6 years away, that's all the time you have

royal lichen Jun 3, 2024, 7:58 PM

#

We're all aware of potentially short timelines.

#

More reason why we need to fight for a Pause and why I do what I do.

wooden fulcrum Jun 3, 2024, 7:59 PM

#

oh sure, again go for it, it's just important to understand the board

#

you're in a corner at the moment and way down on piece count, be ready to be checkmated

#

and hope uh the game continues after for you

#

it's over either at: AGI, or possibly at "too useful to ever stop"

#

the second condition could be met maybe next year or the year after?

#

online learning/replacement of some workers with the model able to improve but still subhuman

#

might be that

royal lichen Jun 3, 2024, 8:01 PM

#

Which will increase the support we have.

#

But anyway, disagreements or general "give up, you have no hope" should be in #disagreements-🗣

#

It's pretty wearisome.
