#Downloaded mixtral-8x22b and SillyTavern. 8x22b file is called "consolidated.safetensors"
15 messages · Page 1 of 1 (latest)
sillytavern is a frontend .. not an inference
you downloaded the fp16 of the model .. not sure you have the resources local to run that
What you downloaded are the Weights of the model, alone they are just numbers, you need an Inference Engine to run it, but before going deeper into what Engine to use, could you provide what are your specs? Cause as Dragon stated, I dont think you have the hardware requirements to run it.
Specs: nvidia GeForce 3060 -- 12 GB video RAM, 64 GB ram, AMD Ryzen 9 5900x
Yeah thats not even 10% of what you need, in terms of VRAM (gpu) you would need around 300Gb of VRAM to run full precision, and you cant run a decent quant with that much neither in my opinion, you better of with Mistral7B or Mixtral8x7B quanted to like 4bits GGUF? But would be painfully slow.
You should research a bit more aboud LLMs, how they work, requirements and quantization, I think you underestimated the hardware requirements 😅
oh, I figured the massive 8x22b file might be a little slow to write, but not impossible without 300gb video RAM. So, how do I run Silly Tavern and the two "ICE_Tea RP" safetensors ?
Im a bit confused since Im not a Silly Tavern pro but, Silly Tavern is only a FrontEnd, its just the interface, you need an endpoint/api/model running somewhere with an Engine, if you want to run Mistral7B you could look into Exllama2 I guess 🤔
st does not inference your model
and your system isnt good enough for that model not the slightest
you maybe able to get that in a 2 bit quant in llama.cpp running on your system but thats the max ..
Is that even worth it 🤔
thats not for me to decide but generally id say no
I downloaded Exllama 2, but I am missing an EXE for Windows again