#Groq Server

pure musk
#

You should work with Groq to get really fast responses. https://groq.com/
I am not affiliated with Groq in any way.

lunar shale
#

with the new partnership on azure, i don't think that's the way

#

also groq deployments aren't really as scalable as you think

#

it's 230 MB of SRAM per accelerator at 20k a pop

#

deployments are extremely expensive and power hungry

#

unless azure gets 100k of those
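
A back-of-envelope sketch of the cost point above, using only the figures quoted in this thread (230 MB of SRAM per accelerator, 20k each, a 70b model). The assumptions are mine and deliberately crude: weights only, one byte per parameter (int8), decimal megabytes, no room for KV cache or activations, and the 20k price taken at face value even though it is disputed later in the thread. The function names are illustrative only.

```python
import math

SRAM_PER_CHIP_BYTES = 230 * 10**6  # 230 MB per accelerator (figure quoted in the thread)
PRICE_PER_CHIP_USD = 20_000        # "20k a pop" (quoted in the thread, disputed later)


def chips_needed(params: int, bytes_per_param: int = 1) -> int:
    """Chips required to hold the weights alone in on-chip SRAM.

    Assumption: int8 weights (1 byte per parameter) by default, and
    nothing else (KV cache, activations) competing for SRAM.
    """
    weight_bytes = params * bytes_per_param
    return math.ceil(weight_bytes / SRAM_PER_CHIP_BYTES)


def hardware_cost_usd(params: int, bytes_per_param: int = 1) -> int:
    """Chip cost only; ignores racks, networking, power and cooling."""
    return chips_needed(params, bytes_per_param) * PRICE_PER_CHIP_USD


if __name__ == "__main__":
    params_70b = 70 * 10**9
    print(chips_needed(params_70b))       # 305
    print(hardware_cost_usd(params_70b))  # 6100000
```

Under these assumptions a single 70b deployment needs about 305 chips, roughly 6.1M in chips alone. A lower per-chip price at volume scales the total down linearly, but the chip count stays the same.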

#

also models have to be compiled separately for the accelerators - pretty much like cuda, but differently

#

the real upside of groq is in different industries

#

say defence / medical / finance

#

or hyper scale

#

it's pretty much what we've done with fpga accelerators like the vc1902 for a few years now

#

just a bigger chip

#

the lower card here in my machine is a vck5k

#

and uses the vc1902 fpga fabric

#

their lpu (groq) is pretty much a systolic array with a lot of sram

#

just asic vs fpga

wooden musk
lunar shale
#

chip cost is a function of wafers ordered

wooden musk
#

that's why I'm saying "depending on the volume"

lunar shale
#

i paid 13k for my vck

#

and it's 1/4 as powerful as the groq

#

that stuff is just that pricey

#

also .. they charge what they want really, as they have the only accelerator in that line except for cerebras

#

but tbh i would rather use the cerebras wafer-scale engine for such an undertaking

#

the unit economics are not customer scale

wooden musk
#

they said themselves that the price is not 20k

#

there's a tweet

lunar shale
#

even if it's 2.5 mil for a 70b model, you get quite a few h100s for that and can actually train on them
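
To put a rough number on "quite a few h100s": a minimal sketch, assuming a per-GPU price that is NOT from this thread. The 30,000 figure below is a placeholder assumption for illustration, not a quoted price; swap in a real quote. It also counts bare cards only, not servers, networking or power.

```python
def gpus_for_budget(budget_usd: int, gpu_price_usd: int) -> int:
    """Whole GPUs a budget buys at a given unit price (cards only)."""
    return budget_usd // gpu_price_usd


BUDGET_USD = 2_500_000           # the "2.5 mil" figure from the message above
ASSUMED_H100_PRICE_USD = 30_000  # placeholder assumption, not from the thread

if __name__ == "__main__":
    print(gpus_for_budget(BUDGET_USD, ASSUMED_H100_PRICE_USD))  # 83
```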

#

this is raw inference

#

again .. mistral doesn't operate its own datacenters - it would be for the datacenters to buy those accelerators and make them available to mistral

#

so really the one you would need to be talking to is ms azure 🙂

#

but awkwardly enough they'd rather buy h100s

#

and it's not like groq appeared overnight either .. they've been around for quite some time

queen axle
lunar shale
#

i mean that's where opinions divide - it still has to be in a datacenter - and the models have to be compiled to run on groq in a very custom way

#

as with any custom accelerator

#

that's usually a longer process

#

also scaling is difficult as you need another full deployment

#

and as for lead times - sure, they are 6 months with h100s

#

but there is a bit more to it than just inference

#

as with groq all you can do is int8 inference

#

cerebras is overall the better bet

#

if one would embark down that route

queen axle
lunar shale
#

it's a risky move as it would be a first for a newer player - ofc they have the hardware

#

but there is no one else on their APIs except for open source / open weight models

#

it's early days - so partnering without the operational hardware would raise questions about scalability .. as of now i see it more as a preview product

#

but for someone to push their production inference onto them is a risky move

queen axle
#

wait and see