#noob question, ran out of tokens. What do I do? How do you get more tokens or how do you avoid that?


sacred phoenix
#

I wish there was a 101 guide or something to explain basic things such as this in layman's terms

I mean, do tokens regenerate over time?

ionic lance
#

I think you might be misunderstanding what tokens are? They're basically word fragments
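
To make "word fragments" concrete, here is a toy sketch of how a word can be split into pieces. It is only an illustration: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for this example, and real tokenizers learn a much larger vocabulary from data and use different algorithms.

```python
# Toy illustration only. Real tokenizers (for example BPE-based ones) learn
# their vocabulary from data; this tiny vocabulary is invented for the example.
TOY_VOCAB = {"token", "iz", "ation", "un", "believ", "able", "s"}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split one word into fragments using greedy longest-match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate fragment first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: emit a single character.
            pieces.append(word[i])
            i += 1
    return pieces


print(toy_tokenize("tokenization"))  # ['token', 'iz', 'ation']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

So a single word can cost several tokens, and the context length counts these fragments rather than words or characters.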

#

not 100% sure what you mean by ran out. Are you talking about on a web-hosted service?

sacred phoenix
#

nope, oobabooga. Yeah I did misunderstand, I guess I'll just tweak some settings instead.
I can get it to go for long conversations but eventually it just does that and it doesn't get past that

ionic lance
#

Yeah the error message above that says you've run out of memory

sacred phoenix
#

yeah, I guess all I can do is tweak settings or change to a smaller model

ionic lance
#

either smaller model, or reduce the context length (max_seq_len) on the model load page

sacred phoenix
#

not an option, but I'll keep it in mind when I try other models. I usually set context length to the highest possible since I think it improves quality. I didn't know there was a downside other than performance/resource requirements

ionic lance
#

the context length affects how much of the context is stored in memory. The longer the length, the more memory it uses

ionic lance
#

context is the amount of the conversation to keep in memory
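
A rough way to see why longer context costs more memory is to estimate the size of the key/value cache a transformer keeps for the context. This is a back-of-the-envelope sketch, not what any particular loader does: the function name `kv_cache_bytes` is made up here, the example numbers (32 layers, 8 key/value heads, head size 128) are assumed values for a Mistral-7B-style model, and real loaders may quantize the cache or allocate it differently. It also ignores the memory taken by the model weights themselves.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate key/value cache size for one sequence.

    Each layer stores one key vector and one value vector per token,
    hence the factor of 2. bytes_per_value=2 assumes 16-bit floats.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3

# Assumed Mistral-7B-style shape; check your model's config for real values.
for seq_len in (2048, 8192, 32768):
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=seq_len)
    print(f"max_seq_len={seq_len}: about {size / GIB:.2f} GiB")
```

Under these assumptions the estimate grows linearly with the context length, so halving max_seq_len roughly halves this part of the memory use.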

#

what type of model are you using? Just GPTQ?

sacred phoenix
#

I am trying a bunch,
TheBloke/PsyMedRP-v1-20B-GGUF psymedrp-v1-20b.Q4_K_M.gguf
mlabonne/NeuralHermes-2.5-Mistral-7B
Open-Orca/Mistral-7B-OpenOrca
TheBloke/WizardCoder-15B-1.0-GPTQ

and many more

I am mainly trying to understand how it works, what's available, and what I think is the best experience

ionic lance
#

So if you're choosing AutoGPTQ, the settings are pretty generic. I'd recommend you pick the actual loader instead for the appropriate model type. That should give you more settings to set things like context length

#

So e.g. for GPTQ, choose Exllamav2_HF

sacred phoenix
#

exllamav2 doesn't work for this one since it's not based on llama

ionic lance
#

gotcha

sacred phoenix
#

do transformers and autogptq have an unlimited context size? As in, they don't have a context size limit?

ionic lance
#

I'm not too familiar with either of those honestly. I'd have to assume it's based on the amount of VRAM that you specify

sacred phoenix
#

alright, I'll test that theory

lapis compass
#

you can just artificially limit it to fit on less capable systems
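
One way a front end could enforce a smaller limit like this is to drop the oldest messages until the rest fits a token budget. The sketch below is an assumption about the general idea, not a description of how oobabooga implements it: `trim_history` and `rough_token_count` are hypothetical names, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def rough_token_count(text):
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def trim_history(messages, max_tokens, count_tokens=rough_token_count):
    """Keep the most recent messages whose combined cost fits max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest so recent messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["hello there friend", "how are you", "fine thanks and you today"]
print(trim_history(history, max_tokens=8))
# ['how are you', 'fine thanks and you today']
```

The trade-off is that the model no longer sees the dropped messages, which is why a smaller limit saves memory but can make long conversations lose earlier details.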
