help-forum
2 tokens/second
KeyError: 'bos_token_id'
Windows
VRAM & "CUDA out of memory."
Hardware
Setup
CUDA error 2
general performance-review
Windows
unsupported tensor dtype in loras LIMARP with Exllama
How to convert ggml to ggml v3?
Can not run llama2 7b parameters on windows webui (updated)
Can't install models with WebUI
Hardware
Setup
Windows
Could not find the quantized model in .pt or .safetensors (Solved, wrong model loader.)
Windows
Loading TheBloke_Llama-2-70B-chat-GPTQ…
Windows
[RESOLVED] OSError: [WinError -1073741795] Windows Error 0xc000001d
Setup
Windows
any recommendations?
All of a sudden my screen now goes off randomly while running ooba
Hardware
I can't figure out what's wrong. It won't Load Models!
In chat-instruct mode where do you add memory of important details about the user LLM is talking to?
Can't get any models to work in Text Generation WebUI on Windows 11 laptop
Windows
How to install GPTQ-for-Llama with venv on Linux?
Linux
Setup
LLaMA
Failed building wheel for quant-cuda
Setup
Windows
.safetensors models produce nonsense output.
524 error with public api extension - any insight?
Windows
New to WebUI, Conda errors
Windows
Auto Installer - Listen on LAN
Linux
Setup
Are there any existing functions to submit prompts in batches?
Linux
Prompts & Characters
Question about Perplexity Results
max_new_tokens increased length
Gibberish/Empty responses when clicking "continue".
LLaMA
Windows
Error when loading model (WizardLM Uncensored Falcon 40B)
Hardware
Setup
Windows
Issues using OpenAI API extension
silero_tts repeating voice clip bug
models won't load
Hi I enabled public_api in Text Generation WebUI from RunPod service and I need the URL
Hardware
Setup
can't start web UI without loading a model, Bug?
Windows
llama.cpp not using GPU despite having BLAS = 1 (Linux, GGML)
Linux
LLaMA
AMD
What are Ooba's default preset values?
What models can I use?
Hardware
Windows
What prompts does Chat mode use instead of Instruct mode?
Linux
Prompts & Characters
I fine-tuned the model and I got CUDA out of memory
Linux
When training a LoRA in Alpaca format, how do I ensure the conversation history is maintained?
KeyError: 'lm_head.weight' when attempting to load Guanaco 33B with loader other than Transformers
Linux
LLaMA
Show prompts in Console?
Expected inference speeds with a 3090Ti / ExLlama setup
Getting error while trying to create a LoRA based on TheBloke_WizardLM-33B-V1-0-Uncensored-SuperHOT-8
Linux
Setup
Windows
Errors when running SUPERHOT models using Exllama
CPU bottleneck
Hardware
How exactly does the "Start reply with" feature work?
Runtime error while trying to boot alpaca
Setup
Windows
CUDA out of memory
Windows
ERROR:Failed to load the model
Conda environment is empty
Setup
Windows