help-forum
Batch Prompt Raw Data?
Windows
Possible to get webui to remember parameters?
I can't start qwen 72b
Pass stopping criteria through API
Is there a way to reduce the output length?
Brand new install of text-generation-webui doesn't support Zephyr (Mistral) GPTQ
Running as a daemon
roBLAS error and CUDA 98. How do I fix those?
Want to try running some local LLMs, getting error when attempting to generate.
Setup
Windows
--listen checkbox in webui doesn't do anything... I have to call it directly with commandline
Windows
DLL load failed while importing flash_attn_2_cuda: The specified module could not be found.
Who is Chiharu Yamada?
Gradio
Windows
ERROR:Failed to load the extension "superbooga"
Windows
Issues hosting local api
Setup
Windows
Starting out, performance woes
Hardware
Setup
Windows
Error with loading Llama 2 on Mac
Gradio
Setup
LLaMA
VPet openai api
Generating nonsense
Prompts & Characters
Can't use API
GPU Usage
Linux
LLM's not behaving like it should when running in an API workflow.
chat memory
How to download only 1 model
Gradio
Setup
VRAM on linux
Linux
Setup
Hardware
How can I run 2 concurrent public APIs?
Can't disable ExLlama when loading with Transformers
Linux
LLaMA
my webui can only output “□□□□□□□□□□□□□□□□□□□□□□□□”
LLaMA
Windows
Llama.cpp limited to 128 layers for n-gpu-layers?
Setup
auto load model on startup
Can't run as OpenAI host
allow outside connections
Windows
Loading model traceback error
Windows
which architectures have problems currently?
Max character card length?
Prompts & Characters
Windows
How do I split memory between CPU and GPU? I'm running a 13B GPTQ model.
Windows
Unable to Load AWQ model
TypeError: Not a string - when loading EM German Leo Mistral (any version)
Linux
trying to train a new lora and it errors out, wizardlm 7b uncensored gptq
Can I use Tesla GPUs to run Ooba?
Why is the bot devolving instantly?
Initialism meanings
API Usage to load and generate output
Windows
Use multiple GPUs when loading a model
How to use the text-generation-webui API in Postman
Best way to be totally sure the model is running on GPU?
Hardware
Windows
Expected all tensors to be on the same device
Linux
Setup
Extremely slow generation with a 4090
Windows
Strange responses from LLM
Linux
Can't see answers
Vectorization Source - Chat memory help
Prompts & Characters
Windows