#required Hardware
A rough estimate would be multiple high-performance servers or a cloud-based solution with significant computational power and memory capacity to handle the model's processing demands efficiently. The exact hardware requirements depend on factors such as the complexity of the interactions, expected peak usage, and desired response times. So to make a long story short, you need at least an RTX 3000 series card and a decent CPU to get everything to run efficiently, especially if it's a bunch of ppl using it. :p You kinda need to rely on cloud services tbh, or make your own
cloud is not an option, I was already thinking of something like an rtx a6000 and at least like 128 GB, + a pretty decent cpu
2 prompts/h was prolly a big underestimation though
Hmm I guess the a6000 can do that just fine. But if you can it's a good idea to also look for rtx 4000 GPUs, those could also prove useful and run everything pretty smoothly.
I'm not sure though how much just a singular a6000 can take
Cuz I've never really used one b4
Although I doubt that it would struggle much
U need at least 64 gb of ram though and a really fast ssd
DDR5 ofc
Just make sure to have lots of a6000's
Do you by any chance know why RAM? like what does ram exactly do here since i thought the VRAM was the main performance carry here
Also afaik, those "gaming" gpus like the 4000's dont have ECC memory which makes it less secure, and as a software company we def dont want that
"less secure" as in less reliable
To avoid memory errors, you never really know. But yeah vram is super important hence the reason why u need more than one a6000
Yes thats true, but they do offer good performance when running llms and any sort of AI related programs
But tbh compared to the a6000
It isn't really that worth it :p
And really ram can help in caching, data transfer, data preprocessing, parallel processing and system operations
Why more than that
So like it's always a good idea to also have a good amount of ram
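To put rough numbers on the VRAM vs RAM question, here is a back-of-the-envelope sketch. It only counts the model weights; the `overhead` factor of 1.2 for KV cache and activations is an assumption, not a measured value, and real 4-bit quantization formats use slightly more than 0.5 bytes per parameter. Whatever does not fit in VRAM has to sit in system RAM (as far as I know llama.cpp can split layers between the two, at a big speed cost), which is one reason RAM still matters.

```python
# Back-of-the-envelope memory estimate for LLM weights (not a benchmark).
# Bytes per parameter for a few common precisions; "q4" is an idealized 4-bit format.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_in_vram(params_billion: float, precision: str, vram_gb: float,
                 overhead: float = 1.2) -> bool:
    """True if weights plus an ASSUMED overhead factor fit in the given VRAM."""
    return weight_gb(params_billion, precision) * overhead <= vram_gb


if __name__ == "__main__":
    # 48 GB corresponds to a single A6000-class card.
    for size in (7, 13, 34, 70):
        for precision in ("fp16", "q4"):
            print(size, precision, weight_gb(size, precision),
                  fits_in_vram(size, precision, vram_gb=48))
```

Under these assumptions a 70b model at fp16 needs about 140 GB for the weights alone, so it would not fit on one 48 GB card, while a 4-bit 34b model would.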
Umm I'd be scared to run more than one personally but u can try it out and if it errors out just get another one
It won't really harm the gpu so go ahead
Unless u overclock
Which is not a good idea tbh
Nah
Yeah no over clocking is a no no
Its company money, so idc, I just need to provide proper reasons, otherwise we wont get the funding for this project
If it's company money then u should definitely ask for 2. Just for the sake of efficiency
And let's say u want to branch out
U may need more
But in reality u already have 2 gpus so no need to get a new one
Wym I already have 2
What
But make sure u also have a good cpu
Wym = what do you mean lol
I thought u didn't have said gpus that's what I meant
Ye we dont have
But u said u already have 2
I'm a little confused :p bear with me
Ohh u have 2 of em not the company gotcha
Noool where u got that from xd?
We
Dont
Have
Any
Any specific reasons y'all are looking to deploy models locally ?
(e.g. privacy, or custom models)
data protection / privacy, our customers are customers such as Audi, Bugatti, VW, etc. basically gigantic companies and they dont like when their code is fed into the databases/servers of other companies
or to train other models
and we dont wanna get sued soooo yeahhhhhhhhhhh
- like to keep our customers
lol
also we live in germany, data protection law is extremely serious/strict here
Fair. There are european models as a service available, they should comply with the GDPR
either way we host it ourselves hehe
4090s support multi GPU(ing), right? like using two 4090s for example to run an AI chatbot
I need to collect some examples lol, meeting with management is on thursday and till then we need to show them why we need specific hardware
I'm not too knowledgeable on the big-scale deployments, but perhaps renting cloud GPUs to test on would be wise ?
prolly not an option haha
Not for future deployment, but only to collect figures and see which hardware / software suite is appropriate
I can try asking my project owner but im pretty sure that thats not an option and if it is it will take ages to actually try it out (you know the drill with big companies, 50 different people need to approve shit)
and I kinda dont wanna take that route lol
Ah fair 😅
yep
@bright vessel whats a good AMD option for a good workstation gpu?

So with AMD you're essentially gambling
isnt that technically a good option?
looking at the 2nd comparison
to the 6000 ada
Even though their software support has drastically increased in the past year or so, in terms of projects that support AMD hardware accel, your choices are limited
assuming things work smoothly. It's tremendously worse on consumer cards, but enterprise should be fine
why choose an rtx 6000 which costs almost triple the price compared to the w7900?
like on paper (on the screenie) the w7900 looks fire
On paper yeah, when it comes to software support, that's when fun stuff happens
Nvidia's CUDA has been the de facto standard for years now, while AMD's solutions are quite new
this is what the ollama discord server says
They work, but bugs are to be expected
hmmmmm
okay so looking at the cores, amd is lacking
this is what ive written down so far
https://www.tomshardware.com/pc-components/cpus/phisons-new-software-uses-ssds-and-dram-to-boost-effective-memory-for-ai-training-demos-a-single-workstation-running-a-massive-70-billion-parameter-model-at-gtc-2024
4 RTX 6000 Adas running a 70b param model
but thats effectively training the LLMs, not just running them
We aint training the models
we just wanna use them
smaller spec for inference?
wym
Cores don't really mean much, esp not between architectures
Take a look at the advertised "TFlops" and memory bandwidth, those are the relevant numbers
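A rough sketch of why those two numbers matter for inference. For single-stream text generation every new token has to read all the weights once, so memory bandwidth divided by model size gives an upper bound on tokens per second; the compute bound uses the common rule of thumb of about 2 FLOPs per parameter per token. The card figures in the example (768 GB/s, 38.7 single-precision TFLOPS) are what I recall for the A6000 and should be checked against the datasheet. Real throughput will be lower than either bound.

```python
def memory_bound_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if every token reads all weights once from VRAM."""
    return bandwidth_gb_s / model_gb


def compute_bound_tps(tflops: float, params_billion: float) -> float:
    """Upper bound on tokens/s assuming ~2 FLOPs per parameter per token."""
    return (tflops * 1e12) / (2 * params_billion * 1e9)


def upper_bound_tps(bandwidth_gb_s: float, tflops: float,
                    params_billion: float, model_gb: float) -> float:
    """The tighter of the two bounds; an optimistic ceiling, not a prediction."""
    return min(memory_bound_tps(bandwidth_gb_s, model_gb),
               compute_bound_tps(tflops, params_billion))


if __name__ == "__main__":
    # ASSUMED inputs: a 34b model quantized to roughly 17 GB on an A6000-class card.
    print(memory_bound_tps(768, 17))
    print(compute_bound_tps(38.7, 34))
    print(upper_bound_tps(768, 38.7, 34, 17))
```

With these inputs the memory bound is far lower than the compute bound, which is why memory bandwidth tends to be the more telling figure for a single chat stream.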
👍 ill add em as well, yall helping me out a lot, really appreciate it.
Ill let ya know once I got that updated 
Honestly good luck with that lol
Managing to just wing it™️ and land with something that works is a challenge for sure
will need it lmfao, entire management, like 40 managers are there on thursday brrrrr
what flops am I supposed to look at here lmfao
or here
for the a6000
Scary stuff :>
Oh boy, uhh that would depend on the backend you're using
well
we using ollama
So that's llama.cpp under the hood if I remember right
¯_(ツ)_/¯
Yup it is
that meaaaans xd?
what cores am I supposed to look at?
*TFLOPS
There's quite a few libraries that handle running models. llama.cpp is one of them. That's what ollama uses under the hood
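Since the setup here is Ollama, one way to collect real figures instead of datasheet numbers is to ask the Ollama server itself. A minimal sketch, assuming a local Ollama instance on its default port 11434 and its `/api/generate` endpoint, whose non-streaming response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The model name in the usage comment is only a placeholder.

```python
import json
import urllib.request

# Default local Ollama address; adjust host/port for your own server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(result: dict) -> float:
    """Generation speed from Ollama's timing fields (eval_duration is in ns)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def benchmark(model: str, prompt: str) -> float:
    """Send one prompt to a running Ollama server and return tokens/s."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return tokens_per_second(json.load(resp))


# Usage (needs a running server and a pulled model; "llama2:7b" is a placeholder):
#   print(benchmark("llama2:7b", "Explain PCIe lanes in two sentences."))
```

Running the same prompt against different model sizes on whatever hardware is available gives comparable tokens/s numbers to put next to the spec-sheet values.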
we using that chatbot with ollama https://www.chatbotui.com/
obviously selfhosted
Make sure to check, but I'm almost certain you're looking for the normal "single precision" or "half precision" TFlops
👍
my workday is over now, imma continue tmr, thanks so far
ill let u know in case I need more stuff tmr xd cuz me has to do some comparing between cpu's next lol

You might be interested in the koboldai discord server, found over at https://koboldai.org/discord
They're a community of enthusiasts, some of them run bigger servers than others, but I'm sure you'll get great answers there.
Plus they've got some of the people working on the inference backends
👍 will take a look tmr, thanks 
@neat zodiac is there any specific reason why you want to focus on a 70b model instead of a 34 or 7 billion one?
I meant the 34b one
We benchmarked the 7b one and it was quite amazing so I assume the 34b one should be even better, why not use the best one? ^^
I see, i've not tested 34b but a 13b works perfectly on A6000 using Fast API, we had a test run with 200+ users and was pretty smooth in terms of usage
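For sizing against a user count like this, a quick estimate of how many requests are actually in flight at once can help. This sketch uses Little's law (average concurrency = arrival rate × average time per request). The 2 prompts per user per hour comes from the earlier guess in this thread, which was probably an underestimate, and the 20 seconds per response is purely an assumed placeholder.

```python
def concurrent_requests(users: int, prompts_per_user_per_hour: float,
                        seconds_per_response: float) -> float:
    """Average number of requests being generated at the same time (Little's law)."""
    arrivals_per_second = users * prompts_per_user_per_hour / 3600
    return arrivals_per_second * seconds_per_response


if __name__ == "__main__":
    # ASSUMED load: 200 users, 2 prompts per hour each, 20 s per response.
    print(concurrent_requests(200, 2, 20))
    # Same users, but 10x the prompt rate, to see how sensitive the estimate is.
    print(concurrent_requests(200, 20, 20))
```

This is an average, so peak load (everyone asking right after a meeting) can be several times higher.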
Also sorry if you have mentioned it above already but are you looking to host it locally or going for a cloud server?
We are hosting locally
Im in the car now, driving to the office, lemme get there rq and imma get back to u ^^
All good, I'm going to sleep, been up, it's 8 AM here lol. You can post here, I'll be up in like 4-5 hours
now im being paid again for talking in here lmfao
Not really sure, the best I've used was a 6600xt with a ryzen 3600... I really don't know how it scales
hmm its a decent combo although i would go for a stronger cpu
i wouldnt wanna be bottlenecked or anything
cuz usually llms need good cpus too
and a bunch of ppl are gonna be using it as a server all at once so u might run into problems :p
an AMD Ryzen 5 5600X i think would be better, but then again ur gpu matters more
👍
Careful about that, especially if you're getting multiple cards
lets say I want 2 for now
2 4090s
Check the PCIe lanes of the platform you'll be running on
Most consumer boards will halve the bandwidth available to the first card when you have two
and can halve or even quarter the bandwidth available to the second one
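The lane math behind this can be sketched quickly. Per-lane rates are the standard PCIe figures (8, 16 and 32 GT/s for gen 3, 4 and 5, with 128b/130b encoding); the even split in `lanes_per_card` is an assumption for illustration, since actual slot wiring depends on the specific board and CPU, so check the mainboard manual.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING = 128 / 130


def pcie_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a link of the given width."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


def lanes_per_card(gpu_lanes: int, cards: int) -> int:
    """Largest power-of-two link width per card, ASSUMING lanes are shared evenly."""
    width = 16
    while width * cards > gpu_lanes:
        width //= 2
    return width


if __name__ == "__main__":
    # A platform with 16 CPU lanes for graphics, one card vs two cards, PCIe 4.0.
    for cards in (1, 2):
        width = lanes_per_card(16, cards)
        print(cards, "card(s): x%d" % width, round(pcie_gb_s(4, width), 2), "GB/s each")
```

On a platform with only 16 lanes for GPUs, the second card drops each link from x16 to x8 and halves the per-card bandwidth, while a workstation or server platform with more lanes could keep both at x16.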
can u also recommend me a proper mainboard?
Again, I don't do much of the server stuff 
The KoboldAI discord would have answers for sure though
mind sending me the server again?
Right there !
ty ❤️
btw @bright vessel
I legit for the love of god have no clue what tflops im supposed to look at lmfao
Single precision
ahhhh
Even though I'm surprised they don't give half-precision figures
u know just in case
i mean he said around 200 people
thats the "datasheet" I found for the 4090
god dammit why dont they just name everything the same
Thing is; if you're going multi GPU properly, you gotta consider more than raw perf, and also the peripherals you can connect
so wont a stronger cpu help with that
i just wouldnt want any bottle necking to take place u know
Between recent desktop consumer platforms, and some older server/enterprise platforms, you don't end up having the same number of PCIe lanes at all
Hence, why you might just choke your GPUs by simply adding an extra nvme or GPU on a 5600x platform
ohh :p
Techpowerup is usually really reliable, though nvidia has fancy tricks for half precision, which might make the figure irrelevant
https://www.techpowerup.com/gpu-specs/geforce-rtx-4090.c3889 ye thats their website but legit no clue which one is right, they legit all name it differently
u know ur best bet is to ask some other company thats actually running an llm server
they'll probably help a lot more
we dont have anyone that is running an llm server lmfao, thats the first one here xD
im sure there are ppl running llm servers online
and discords just for that
that prolly yes
Im askin around in other servers as well but so far nothing really helpful
@bright vessel bad first impression in the kobold server lmfao

yeah ppl are pretty hostile nowadays
I can tell
ig do more research or gamble :3