#Llama.cpp crashing Text Generation WebUI - Core Dumped


tiny spoke

I've been quite happy with some of the newer models (Llama 3.1, Mistral Nemo, etc.) now that llama.cpp, llama-cpp-python, and Text Generation WebUI have been updated to run them. One thing has started happening with no rhyme or reason, though: every so often, Text Generation WebUI crashes and dumps core.

The error that I get usually looks like this:

Llama.generate: prefix-match hit
Prompt evaluation: 0%| | 0/1 [00:00<?, ?it/s]/home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda/rope.cu:200: GGML_ASSERT(src0->type == GGML_TYPE_F32 || src0->type == GGML_TYPE_F16) failed
Aborted (core dumped)

My setup is as follows:

Ubuntu 22.04, 4x 16GB P100 and 3x 12GB P100. I'm using Text Generation WebUI with the latest updates. Most of what I run are GGUF models - usually in Q8 form.

I haven't come across any similar errors in the llama.cpp, llama-cpp-python, or text-generation-webui repos, so I'm pretty sure this is a "me" problem.
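
When searching the issue trackers for a failure like this, it can help to reduce the log line to the source file, line number, and failed condition. Below is a minimal Python sketch, assuming the log line has the same shape as the one quoted above. The helper name `parse_ggml_assert` is hypothetical and not part of llama.cpp or llama-cpp-python.

```python
import re

# Hypothetical helper for illustration; not part of any llama.cpp tooling.
# Matches "<path>:<line>: GGML_ASSERT(<condition>) failed" anywhere in a line.
ASSERT_RE = re.compile(
    r"(?P<path>\S+?):(?P<line>\d+): GGML_ASSERT\((?P<cond>.*)\) failed"
)

# Assumption: the build path contains this directory, as in the log above.
VENDOR_MARKER = "vendor/llama.cpp/"


def parse_ggml_assert(log_line):
    """Return file, line, and condition of a GGML_ASSERT failure, or None."""
    match = ASSERT_RE.search(log_line)
    if match is None:
        return None
    path = match.group("path")
    # Drop the build-machine prefix (and any progress-bar text fused onto it)
    # so only the path relative to the llama.cpp tree remains.
    idx = path.find(VENDOR_MARKER)
    if idx != -1:
        path = path[idx + len(VENDOR_MARKER):]
    return {
        "file": path,
        "line": int(match.group("line")),
        "condition": match.group("cond"),
    }
```

Searching for the resulting file name together with the condition text is usually more specific than searching for "core dumped" alone.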

Any tips on where to go next would be appreciated. Thanks!

tiny spoke

Ah, okay, will do.

tiny spoke

That did the trick. For my edification, what was streaming_llm doing that broke llama.cpp?