#Can't run a 70B Llama 3.1 model on 2 A100 80 GB GPUs.

steep shard
#

Hey, so I tried running the 70B Llama model on 2 GPUs/worker, but it keeps getting stuck at the same place every time. If I switch to the 8B model on 1 GPU/worker with a 48 GB GPU, it works easily. The issue only comes up with the 70B parameter model on 2 GPUs/worker.

trail acorn
#

Maybe 70B needs 192 GB or something like that

steep shard
#

This blog post said that 2x 80 GB is enough

trail acorn
#

Yeah, I'm not sure about the minimum requirements, let me check

steep shard
#

Alright, also, how much network volume do you think I need for this?

trail acorn
#

Maybe around 150 GB

steep shard
#

alright thanks

#

let me know about the requirements

trail acorn
#

Can you try another GPU setup, 4x?

steep shard
#

alr lemme try that

#

4090?

trail acorn
#

4x 48 gb

#

srry*

steep shard
#

ok

#

np

#

It's always at this place

#

What do you think could be the problem, @trail acorn?

#

It went a bit further now

#

and now it just shifted to a different worker

trail acorn
#

still loading

#

Maybe loading took too long

#

Just stop it first if you feel like it's taking too long

#

what gpu setup are you using?

steep shard
#

It's 4x 48 GB, not Pro, just normal

vast solstice
#

I suggest opting for a GPU setup with 200 GB+ of VRAM. You can try a lower option, but performance may suffer.🥲

trail acorn
#

why is that @vast solstice

steep shard
#

I gave it 196

steep shard
#

@vast solstice how do you think we can fix this?

steep shard
#

But we gave it 196

#

Much more than 140
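
The 140 figure here matches a back-of-the-envelope estimate for the weights alone. A minimal sketch, assuming roughly 70e9 parameters stored at 2 bytes each (fp16/bf16) and ignoring KV cache, activations, and framework overhead, so the real requirement is higher; the function names are hypothetical:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def headroom_gb(num_params: float, gpu_count: int, gpu_gb: float,
                bytes_per_param: int = 2) -> float:
    """Total VRAM left after the weights; KV cache and overhead must fit in this."""
    return gpu_count * gpu_gb - weight_memory_gb(num_params, bytes_per_param)


# Assumption: ~70e9 parameters in 16-bit precision.
print(weight_memory_gb(70e9))    # weights only
print(headroom_gb(70e9, 2, 80))  # 2x 80 GB setup
print(headroom_gb(70e9, 4, 48))  # 4x 48 GB setup
```

By this estimate both setups hold the weights, so a hang during loading is not necessarily an out-of-memory problem.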

vast solstice
#

Would you mind trying more VRAM to see if that helps?

steep shard
#

What do you think I should put?

vast solstice
#

😂 I usually start with the highest memory I can get, and keep reducing it until it doesn't work anymore.

steep shard
#

ValueError: Total number of attention heads (64) must be divisible by tensor parallel size (6).
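
The error above says the head count (64) has to divide evenly by the tensor parallel size, which a 6-GPU worker does not satisfy. A minimal sketch of that divisibility rule, not the serving engine's actual code; `valid_tp_sizes` is a hypothetical helper:

```python
def valid_tp_sizes(num_heads: int, max_gpus: int) -> list[int]:
    """GPU counts up to max_gpus that divide the attention head count evenly."""
    return [tp for tp in range(1, max_gpus + 1) if num_heads % tp == 0]


# 64 attention heads, as reported in the error message.
print(valid_tp_sizes(64, 8))  # 6 is absent because 64 % 6 != 0
```

With 64 heads, GPU counts of 1, 2, 4, or 8 per worker satisfy the check; 6 does not.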

steep shard
#

@vast solstice Now I have 384 GB but it is still getting stuck there

vast solstice
#

Same logs output?

steep shard
#

yea

#

same

#

Using model weights format ['*.safetensors']

#

It's always at this spot that it gets stuck

steep shard
#

@vast solstice it worked

#

I have 4x 48 GB

#

I had to wait 6 minutes the first time

#

and now it's working quickly

#

Nvm it became slow again

#

@vast solstice it becomes slow when it has to load after a while

#

Yeah, it's a cold start issue

vast solstice
#

Cool👍🏻 Try setting 1 active worker. That way there is no cold start while you're testing.

trail acorn
#

I strongly believe that this "creating new workers on model loading" thing has to do with RunPod's autoscaling

steep shard
#

I tried with 48 GB GPUs, and it seems to work

#

4x

#

it just takes very long to load

trail acorn
#

yeah

trail acorn
queen acorn
#

what token speed did you achieve?
what cost per token?