#Anyone in need of a H100 lmao

1 messages · Page 1 of 1 (latest)

patent blaze
#

[STILL ACTIVE]
I have a H100 laying around doing nothing. I wanned to ask if members of the Vedal Community want some hours / days / weeks on a H100 for free (it would be nice tho if you cover my eletrical bill for the time, but if you can't afford it's also no big deal).

Server Specs:
CPU: AMD Epyc 7702P (64C/128T)
RAM: 512 GB (ECC)
STORAGE: 38 TB
GPU: 1x H100
NETWORKING: 600 Mbit/s
POWER-CONSUMPTION: Full Load (CPU + GPU) 700 W. But when training usually 500 W. At like 40ct / kWh

Also for all requests please include your vRAM (max. 80GB) requirements. I'm able to split up the H100 into multiple GPUs

rustic jay
#

dang that'll be cool

#

i might wanna pretrain a tiny model for testing purposes in a bit so yeah, that'll be so cool YES

#

as for electricity bills, i can help YES

patent blaze
rustic jay
#

aight i'll text once i would like to start a session YES

#

its 2 am here rn lmao

south talon
#

nice that you're offering it for people! is there any sort of catch or reason, or is it a "why not" kinda thing?

patent blaze
south talon
#

monkaOMEGA you paid HOW MUCH

#

christ man

patent blaze
#

I wanted a H100. So I bought one. Then I was like, oh I probably need a server too, so I paid for a server. And now I have a big pc standing in an empty room doubling my electrical bill (doing nothing)

patent blaze
rustic jay
south talon
#

i can’t compete man

rustic jay
vestal rose
patent blaze
#

Wait, wtf do you need 2 4090s for?

patent blaze
#

I'm planning to use the H100 for like a training for a week soon, so it will only be available now and after that

patent blaze
#

Anyone in need of a H100 lmao

rustic jay
#

(roughly)

patent blaze
#

it has the vram chips on the compute chip, which makes it really fast

patent blaze
#

*vedal unoffically bashing my single H100* 😢

solar mica
patent blaze
#

he dosn't need 300 H100s for interference. Each sucks like 350 Watts

patent blaze
#

ACTUALLY I was thinking getting a B100, which has like 192gb vram, that would be crazy tbh

solar mica
#

Woah neuroShocked

rustic jay
#

in huge batches

patent blaze
#

This thing is 50k to 60k alone, this time I have to think if I want to buy it or not

rustic jay
patent blaze
#

It's not that crazy...

patent blaze
#

Ok after I called my supplier she told me that there is not going to be a B100, nor a B200. At this point nobody knows what the card after the H100 will be.

#

Officially Nvidia is currently telling everybody they are having manufacturing delay, but turns out they are actually not producing the B100 and others at all.

#

pls NDA don't kill me

mystic night
#

Great read on the difficulties of manufacturing such tiny parts

rustic jay
#

that is what happened when they wanted to cram 10x more wattage and compute on the GB200 NVL72 server rack compared to average server rack in deployment nowdays, had to work and develop even more for the B200 chips

patent blaze
#

tbh I would also go with a AMD AI GPU, but my seller is not selling them, and there is no good support for them yet.

solar mica
rustic jay
#

192GB vram, more compute than H100

#

twice as cheap

patent blaze
#

agree, sadly currently it's way harder to use them :(

#

I mean I could try to buy like 4 of these, and try to get a decent server setup this time.

rustic jay
#

nvidia kept taking advantage with cuda

patent blaze
#

yeah that sucks tho

humble valley
#

Is the H100 available or something? I think I would have some use for it to generate stuff with a larger sized LLM for experimenting with stuff (I want to try seeing if I can generate a dataset to finetune a smaller model using a much larger one)

patent blaze
#

yeah sure, dm me a list of dependencies (or repo to clone) and an ed25519 ssh pub key

humble valley
#

I really just need something like llama.cpp server or something compatible with the OpenAI API and an IP and port to connect to, I have code on my end that can use that

#

I don't even know how to use SSH, never used it

humble valley
rustic jay
patent blaze
rustic jay
#

dang inferencing really is bottlenecked by memory bandwidth due to it not taking advantage of 3D tensor compute

#

afaik PCIe version of the H100 only has like 2TB/s bandwidth?

#

which is roughly similar to the A100

patent blaze
#

Yeah, its fine i think

rustic jay
#

H100 is more suited for training imo

#

because training actually took advantage of the entire 3D tensor, although you still need high batch size for it but H100 got all the vram so LULE

patent blaze
#

It is still more efficient than the A100

rustic jay
#

mhm

#

its just during inferencing, PCIe H100 will only slightly outperforms A100

#

but training... H100 my beloved....

#

its more of the problem that inferencing only does single vector input whereas training may took advantage of batch processing which means.... multi vector input!

patent blaze
#

Yeah, yeah. I usually use it for training too

rustic jay
#

what did you train SMOCUS

patent blaze
#

a lot of multimodal language models

rustic jay
#

holy

patent blaze
#

I have my own architecture I'm working on

rustic jay
#

lucky mf got the budget asdhseubfebfbdsdbewbdfwe

vestal rose
#

im going to lose it

patent blaze
#

and instead of buying me a tesla I bought a H100

dapper arrow
#

hi, would be great if I could run a llm finetuning job on your h100, need preferrably 100% of the Vram, 50% would be possibly good enough if not possible.

patent blaze
#

ok.

#

you need to be NixOS confirm

dapper arrow
patent blaze
#

your app needs to be packaged with NixOS. include a flake.nix file instead of a requirements.txt file. since there is just the nix package manager :(

dapper arrow
patent blaze
#

yeah

#

I can explain it to you when I'm not busy playing rocket league xD

dapper arrow
patent blaze
#

thats basically what you need

#
{
  description = "My AI Project";
  inputs = {
    nix-ai.url = "git+https://git.wavelens.io/public/nix-ai";
  };

  outputs = { nix-ai, ... }: nix-ai.lib.mkFlake {
    presets = {
      jupyter = true;
      torch = true;
    };

    packages = [ "jq" ];
    pythonPackages = [ "tqdm" ];
    environmentVariables = {
      HF_HOME = ".cache/huggingface";
    };
  };
}
patent blaze
#

or you join the vc and I explain it to you.