noob question, ran out of tokens. What do I do? How do you get more tokens or how do you avoid that? | Text Generation WebUI

sacred phoenix Feb 28, 2024, 1:24 PM

I wish there were a 101 guide or something that explains basic things like this in layman's terms.

I mean, do tokens regenerate over time?

ionic lance Feb 28, 2024, 1:29 PM

I think you might be misunderstanding what tokens are. They're basically word fragments.
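
A toy sketch of the "word fragments" idea, assuming a tiny made-up vocabulary (`TOY_VOCAB` and `toy_tokenize` are hypothetical names for illustration; real models use learned BPE/SentencePiece vocabularies, so their actual splits will differ). The point is that a token is a piece of text the model reads and writes, not a currency that gets used up or regenerates:

```python
# Toy illustration only: greedy longest-match over a tiny, made-up vocabulary.
# Real tokenizers are learned from data and have tens of thousands of entries.
TOY_VOCAB = {"token", "ization", "s", "un", "believ", "able", " "}

def toy_tokenize(text: str, vocab: set) -> list:
    """Split text into the longest vocabulary pieces, scanning left to right."""
    pieces = []
    i = 0
    longest = max(len(v) for v in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocab:
                pieces.append(candidate)
                i += size
                break
        else:
            # Unknown character: emit it on its own (real tokenizers
            # fall back to byte-level pieces instead).
            pieces.append(text[i])
            i += 1
    return pieces

print(toy_tokenize("tokenization unbelievable", TOY_VOCAB))
# → ['token', 'ization', ' ', 'un', 'believ', 'able']
```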

Not 100% sure what you mean by "ran out". Are you talking about a web-hosted service?

sacred phoenix Feb 28, 2024, 1:40 PM

Nope, oobabooga. Yeah, I did misunderstand; I guess I'll just tweak some settings instead.
I can get it to go for long conversations, but eventually it just does that and doesn't get past it.

ionic lance Feb 28, 2024, 1:43 PM

Yeah, the error message above that says you've run out of memory.

sacred phoenix Feb 28, 2024, 1:44 PM

Yeah, I guess all I can do is tweak settings or switch to a smaller model.

ionic lance Feb 28, 2024, 1:45 PM

Either a smaller model, or reduce the context length (max_seq_len) on the model load page.

sacred phoenix Feb 28, 2024, 1:47 PM

Not an option, but I'll keep it in mind when I try other models. I usually set the context length to the highest possible value since I think it improves quality; I didn't know there was a downside other than performance and resource requirements.

ionic lance Feb 28, 2024, 1:48 PM

The context length determines how much of the context is stored in memory. The longer the length, the more memory it uses.
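
For a rough sense of why memory grows with context length: during generation the model keeps a key/value (KV) cache with an entry for every token in the context, in every layer. A back-of-the-envelope estimate, assuming a Mistral-7B-style shape (32 layers, 8 key/value heads, head size 128) and a 16-bit cache; it ignores the model weights themselves, activations, and loader-specific options such as cache quantization:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate size of the key/value cache for a full context window."""
    # Factor of 2: one key vector and one value vector per head, per layer, per token.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

GIB = 1024 ** 3
# Assumed Mistral-7B-style shape: 32 layers, 8 KV heads, head size 128, fp16 cache.
for ctx in (2048, 4096, 8192, 32768):
    size = kv_cache_bytes(32, 8, 128, ctx)
    print(f"{ctx:>6} tokens -> {size / GIB:.2f} GiB")
```

Under those assumptions the cache is 0.25 GiB at 2048 tokens and 4 GiB at 32768, growing linearly in between, which is why lowering max_seq_len can make a model that crashes on long chats fit again.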

sacred phoenix Feb 28, 2024, 2:28 PM

So basically it's like memory, and the context is the whole conversation.
Is there a way to reduce it, so it just forgets the older parts without deleting the conversation?

ionic lance Feb 28, 2024, 2:29 PM

Context is the amount of the conversation kept in memory.
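
One common way to "forget the older parts" without deleting anything is a sliding window: keep the full history saved, but only send the most recent messages that fit in the context budget. A minimal sketch, assuming a hypothetical `count_tokens` callback (here a crude word count stands in for a real tokenizer); this is an illustration of the idea, not the web UI's actual code:

```python
from typing import Callable, List

def fit_history(messages: List[str], max_tokens: int,
                count_tokens: Callable[[str], int]) -> List[str]:
    """Return the newest messages whose combined token count fits in max_tokens.

    The original list is left untouched; only the returned window
    would be sent to the model as context.
    """
    window: List[str] = []
    used = 0
    # Walk backwards from the newest message, stopping at the first one
    # that would overflow the budget.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        window.append(msg)
        used += cost
    window.reverse()  # restore chronological order
    return window

def word_count(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption for illustration).
    return len(text.split())

history = ["hi there", "hello how are you", "fine thanks and you", "doing well"]
print(fit_history(history, 7, word_count))
# → ['fine thanks and you', 'doing well']
```

The saved `history` still has all four messages; only the window shrinks.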

What type of model are you using? Just GPTQ?

sacred phoenix Feb 28, 2024, 2:30 PM

I am trying a bunch:
TheBloke/PsyMedRP-v1-20B-GGUF psymedrp-v1-20b.Q4_K_M.gguf
mlabonne/NeuralHermes-2.5-Mistral-7B
Open-Orca/Mistral-7B-OpenOrca
TheBloke/WizardCoder-15B-1.0-GPTQ

and many more

I am mainly trying to understand how it works, what's available, and which gives what I think is the best experience.

ionic lance Feb 28, 2024, 2:31 PM

So if you're choosing AutoGPTQ, the settings are pretty generic. I'd recommend picking the loader that matches the model type instead; that should give you more settings, including things like context length.

E.g., for GPTQ, choose Exllamav2_HF.

sacred phoenix Feb 28, 2024, 2:32 PM

ExLlamaV2 doesn't work for this one since it's not based on Llama.

ionic lance Feb 28, 2024, 2:33 PM

Gotcha.

sacred phoenix Feb 28, 2024, 2:35 PM

Do Transformers and AutoGPTQ have an unlimited context size? As in, is there no context size limit?

ionic lance Feb 28, 2024, 2:36 PM

Honestly, I'm not too familiar with either of those. I'd assume it's based on the amount of VRAM you specify.

sacred phoenix Feb 28, 2024, 2:39 PM

Alright, I'll test that theory.

lapis compass Feb 28, 2024, 3:37 PM

The maximum context length is determined by the model, not by the loader you use.

You can just limit it artificially to fit on less capable systems.
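
A minimal sketch of that relationship, assuming a Hugging Face-style `config.json` that stores the model's trained maximum under `max_position_embeddings` (the key used by Llama/Mistral-style configs; other architectures and GGUF files store it elsewhere, and RoPE scaling settings can change the effective limit). `effective_context` is a hypothetical helper: in this sketch the model sets the ceiling, and the setting you choose can only lower it:

```python
import json
from typing import Optional

def effective_context(config_json: str, requested: Optional[int] = None) -> int:
    """Clamp a requested context length to the model's trained maximum."""
    config = json.loads(config_json)
    model_max = int(config["max_position_embeddings"])
    if requested is None:
        return model_max  # no limit chosen: use the model's full context
    # A smaller value saves memory; a larger one is capped at the model limit.
    return min(requested, model_max)

example_config = '{"max_position_embeddings": 4096}'  # hypothetical config
print(effective_context(example_config))        # full model context
print(effective_context(example_config, 2048))  # artificially limited
```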

lapis compass Feb 28, 2024, 3:39 PM

You're using a really odd mix of models and loaders. If you have a fairly capable GPU, just stick to EXL2; otherwise, stick to GGUF and partially offload it to the GPU. There's no reason to use full-precision models unless you're planning to train.
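
A rough way to think about "offload" with GGUF: the llama.cpp loader lets you put some of the model's layers in VRAM (the n-gpu-layers setting) and run the rest from system RAM. A back-of-the-envelope sketch, assuming layers are roughly equal in size and a hypothetical `reserve_gb` left free for the context cache and overhead; real per-layer sizes vary, so treat the result as a starting point and adjust:

```python
def layers_that_fit(model_file_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Estimate how many layers of a quantized model fit in VRAM.

    Assumes every layer takes an equal share of the model file, and keeps
    reserve_gb free for the KV cache and runtime overhead (both assumptions).
    """
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(usable_gb // per_layer_gb))

# Hypothetical numbers: an 8 GB file with 32 layers on a 6 GB card.
print(layers_that_fit(8.0, 32, vram_gb=6.0, reserve_gb=2.0))  # → 16
```

If the result equals the total layer count, the whole model fits on the GPU; a longer context means a bigger reserve and fewer offloaded layers.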

sacred phoenix Feb 29, 2024, 2:51 AM

Is an RTX 3080 with 12GB of VRAM fairly capable?
