Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.safetensors" | Mistral AI | Page 1

real fulcrum May 1, 2024, 8:56 AM

#

Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.safetensors" but how to I run 8x22b (262 GB in size) and SillyTavern? There is no EXE and Mixtral does not recognize 8x22b

forest sail May 1, 2024, 8:59 AM

#

sillytavern is a frontend .. not an inference

#

you downloaded the fp16 of the model .. not sure you have the resources local to run that

reef garden May 1, 2024, 9:02 AM

#

real fulcrum Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.sa...

What you downloaded are the Weights of the model, alone they are just numbers, you need an Inference Engine to run it, but before going deeper into what Engine to use, could you provide what are your specs? Cause as Dragon stated, I dont think you have the hardware requirements to run it.

real fulcrum May 1, 2024, 9:04 AM

#

Specs: nvidia GeForce 3060 -- 12 GB video RAM, 64 GB ram, AMD Ryzen 9 5900x

reef garden May 1, 2024, 9:07 AM

#

Yeah thats not even 10% of what you need, in terms of VRAM (gpu) you would need around 300Gb of VRAM to run full precision, and you cant run a decent quant with that much neither in my opinion, you better of with Mistral7B or Mixtral8x7B quanted to like 4bits GGUF? But would be painfully slow.

You should research a bit more aboud LLMs, how they work, requirements and quantization, I think you underestimated the hardware requirements 😅

real fulcrum May 1, 2024, 9:13 AM

#

oh, I figured the massive 8x22b file might be a little slow to write, but not impossible without 300gb video RAM. So, how do I run Silly Tavern and the two "ICE_Tea RP" safetensors ?

reef garden May 1, 2024, 9:15 AM

#

Im a bit confused since Im not a Silly Tavern pro but, Silly Tavern is only a FrontEnd, its just the interface, you need an endpoint/api/model running somewhere with an Engine, if you want to run Mistral7B you could look into Exllama2 I guess 🤔

forest sail May 1, 2024, 9:48 AM

#

st does not inference your model

#

and your system isnt good enough for that model not the slightest

#

you maybe able to get that in a 2 bit quant in llama.cpp running on your system but thats the max ..

reef garden May 1, 2024, 9:49 AM

#

Is that even worth it 🤔

forest sail May 1, 2024, 9:50 AM

#

thats not for me to decide but generally id say no

real fulcrum May 1, 2024, 5:54 PM

#

I downloaded Exllama 2, but I am missing an EXE for Windows again

reef garden May 1, 2024, 5:55 PM

#

real fulcrum I downloaded Exllama 2, but I am missing an EXE for Windows again

https://github.com/turboderp/exllamav2

#Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.safetensors"