Loading LLama 70b on vllm template serverless cant answer a simple question like "what is your name" | Runpod | Page 1

novel sequoia Jun 5, 2024, 7:05 PM

#

I am loading with 1 worker and 2 GPU's 80g

But the model just cant performance at all, it gives gibrish answers for simple prompts like "what is your name"

frigid thorn Jun 6, 2024, 12:56 AM

#

novel sequoia I am loading with 1 worker and 2 GPU's 80g But the model just cant performance ...

are you using llama 3 70b?

#

I tried it and it works, just long load times

#

What config do you use? Just the default?

novel sequoia Jun 6, 2024, 1:12 AM

#

I am just setting bfloat16 the rest i leave blank/default.

When i load with web-ui, getting completely different responses.

frigid thorn Jun 6, 2024, 1:18 AM

#

Oh you tried with pods too?

frigid thorn Jun 6, 2024, 1:18 AM

#

novel sequoia I am just setting bfloat16 the rest i leave blank/default. When i load with we...

Last time I left all blank only like the fields in the first page of vllm setup

#

And used a network volume

frigid thorn Jun 6, 2024, 1:19 AM

#

novel sequoia I am loading with 1 worker and 2 GPU's 80g But the model just cant performance ...

How do you make request to this then?

white flare Jun 6, 2024, 2:51 AM

#

Is it llama instruct? i think i was told there was a difference between llama 70b and instruct

#

Instruct is more like an actual chat, respond and answer

#

while the llama 70b is like some weird completion thing. i had also gotten gibberish answers in the past

#

making me move to just using openllm

frigid thorn Jun 6, 2024, 2:52 AM

#

white flare Instruct is more like an actual chat, respond and answer

Wait isn't it chat that's like that

#

Instruct is the completion thing?

white flare Jun 6, 2024, 2:52 AM

#

frigid thorn Wait isn't it chat that's like that

Oh

#

Lol 👁️

frigid thorn Jun 6, 2024, 2:52 AM

#

I thought it was only instruct

white flare Jun 6, 2024, 2:53 AM

#

Haha maybe im wrong and to use chat model

frigid thorn Jun 6, 2024, 2:53 AM

#

I didn't see the llama chat version hmm

#

Can you send the link here

#

I wanna see haha

white flare Jun 6, 2024, 2:53 AM

#

Oof I dont remember. let me see if i can find my old post on this where i also asked about gibberish coming out of vllm

frigid thorn Jun 6, 2024, 2:53 AM

#

white flare making me move to just using openllm

What's the open llm ?

white flare Jun 6, 2024, 2:55 AM

#

frigid thorn What's the open llm ?

It’s just another framework to run llm models easily - i prefer to runpod’s vllm solution which i just dont prefer. some reason couldn’t ever get the vllm to work nicely / easily as openllm i felt

https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless

https://github.com/bentoml/OpenLLM

GitHub

GitHub - justinwlin/Runpod-OpenLLM-Pod-and-Serverless: A repo for O...

A repo for OpenLLM to run pod. Contribute to justinwlin/Runpod-OpenLLM-Pod-and-Serverless development by creating an account on GitHub.

GitHub

GitHub - bentoml/OpenLLM: Run any open-source LLMs, such as Llama 2...

Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint in the cloud. - bentoml/OpenLLM

#

and also i could get openllm to work vs ollama which requires a whole background server etc

#

and couldn’t ever get ollama to preload models properly

frigid thorn Jun 6, 2024, 2:56 AM

#

Hmm alright I should try that out haha

white flare Jun 6, 2024, 2:57 AM

#

#1208252068238860329 message

oh oops. my previous question was around mistral being dumb 😅

white flare Jun 6, 2024, 2:58 AM

#

frigid thorn Hmm alright I should try that out haha

Yeah! It pretty good. I have the docker images up for mistral7b, and obvs the repo. I didnt realize how big 70b models are xD and left it running on depot and came up with stupidly above 100gb images lmao

#

Which basically is unusable

frigid thorn Jun 6, 2024, 2:58 AM

#

Unusable? Try it out 😂

white flare Jun 6, 2024, 2:58 AM

#

Thxfully depot gave me free caches 🙏

frigid thorn Jun 6, 2024, 2:59 AM

#

white flare Thxfully depot gave me free caches 🙏

But the subs is paid right

white flare Jun 6, 2024, 2:59 AM

#

frigid thorn Unusable? Try it out 😂

xD i dont wanna wait an hour for a single serverless to load 😂

white flare Jun 6, 2024, 2:59 AM

#

frigid thorn But the subs is paid right

what are subs?

#

Oh yea depot usually cost money

frigid thorn Jun 6, 2024, 2:59 AM

#

The plan you pay for the depot

white flare Jun 6, 2024, 2:59 AM

#

But they gave me a sponsored account

#

So i use it for free lol

frigid thorn Jun 6, 2024, 2:59 AM

#

I c that's cool 👍

#

So, what's the "gibrish" response like? @novel sequoia

novel sequoia Jun 6, 2024, 3:34 AM

#

Im using the instruct version. Just feels like its x10 quantized like the model is very stupid.

frigid thorn Jun 6, 2024, 3:36 AM

#

Yeah its not normal

#Loading LLama 70b on vllm template serverless cant answer a simple question like "what is your name"