#Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.safetensors"

15 messages · Page 1 of 1 (latest)

real fulcrum
#

Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.safetensors" but how to I run 8x22b (262 GB in size) and SillyTavern? There is no EXE and Mixtral does not recognize 8x22b

forest sail
#

sillytavern is a frontend .. not an inference

#

you downloaded the fp16 of the model .. not sure you have the resources local to run that

reef garden
real fulcrum
#

Specs: nvidia GeForce 3060 -- 12 GB video RAM, 64 GB ram, AMD Ryzen 9 5900x

reef garden
#

Yeah thats not even 10% of what you need, in terms of VRAM (gpu) you would need around 300Gb of VRAM to run full precision, and you cant run a decent quant with that much neither in my opinion, you better of with Mistral7B or Mixtral8x7B quanted to like 4bits GGUF? But would be painfully slow.

You should research a bit more aboud LLMs, how they work, requirements and quantization, I think you underestimated the hardware requirements 😅

real fulcrum
#

oh, I figured the massive 8x22b file might be a little slow to write, but not impossible without 300gb video RAM. So, how do I run Silly Tavern and the two "ICE_Tea RP" safetensors ?

reef garden
#

Im a bit confused since Im not a Silly Tavern pro but, Silly Tavern is only a FrontEnd, its just the interface, you need an endpoint/api/model running somewhere with an Engine, if you want to run Mistral7B you could look into Exllama2 I guess 🤔

forest sail
#

st does not inference your model

#

and your system isnt good enough for that model not the slightest

#

you maybe able to get that in a 2 bit quant in llama.cpp running on your system but thats the max ..

reef garden
#

Is that even worth it 🤔

forest sail
#

thats not for me to decide but generally id say no

real fulcrum
#

I downloaded Exllama 2, but I am missing an EXE for Windows again