#wont load model on latest version
Does it load if you use the --loader gptq-for-llama flag?
where do I pass the flag to?
Should be a section at the top of webui.py for it. You can also manually select the gptq-for-llama loader in the webui when loading the model.
Here?
Yes
It may be that the model is simply incompatible with the GPTQ-for-LLaMa version that the webui uses. I saw someone else say that they were able to load that model, but I've yet to be able to myself.
I'll re-download it and see if I can figure out how they did it.
You might also look at this: https://huggingface.co/notstoic/pygmalion-13b-4bit-128g/discussions/2
I'm able to load that model with both gptq-for-llama and AutoGPTQ. You should try the suggestion mentioned in the link from my previous message.
not getting anything again
What about without the --loader gptq-for-llama flag?
It does that with that flag
What about without the flag?
You can also try with --loader autogptq and if that doesn't work, then I don't know what else to do besides using exllama. Setup instructions for exllama here: #general message
And here: https://github.com/oobabooga/text-generation-webui/blob/main/docs/ExLlama.md#installation
how do i disable the autoload for the model?
because with the 7B model i get a different error
--model None
You are supposed to be able to just remove --model, but I think it is bugged.
That's the error i get when loading the 7B model
Probably need to set the correct groupsize for the model. Most are 128 or None.
changed it to none and same error
Can you link the model?
Groupsize None should be the correct setting. AutoGPTQ will detect the settings on its own most of the time. GPTQ-for-LLaMa needs wbits set to 4 and groupsize set to None.
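As a rough aid for picking those two values, here is a minimal sketch that reads wbits and groupsize hints out of a model's folder name. The function name `guess_quant_settings` and the naming patterns it looks for ("4bit", "128g") are assumptions made for illustration, not something the webui does. A name is only a hint: a model with no "128g" in its name usually means groupsize None, but the loader settings still have to match how the model was actually quantized.

```python
import re
from typing import NamedTuple, Optional


class QuantHint(NamedTuple):
    wbits: Optional[int]       # e.g. 4 for "4bit"; None if the name gives no hint
    groupsize: Optional[int]   # e.g. 128 for "128g"; None if the name gives no hint


def guess_quant_settings(model_name: str) -> QuantHint:
    """Guess wbits/groupsize from naming conventions (hypothetical helper)."""
    # "4bit", "4-bit", "4_bit" -> wbits 4
    bits = re.search(r"(\d+)[-_ ]?bit", model_name, re.IGNORECASE)
    # a standalone "128g" token -> groupsize 128 (must not be part of a longer word)
    group = re.search(r"(?<![a-z0-9])(\d+)g(?![a-z0-9])", model_name, re.IGNORECASE)
    return QuantHint(
        wbits=int(bits.group(1)) if bits else None,
        groupsize=int(group.group(1)) if group else None,
    )


if __name__ == "__main__":
    for name in ("pygmalion-13b-4bit-128g", "Pygmalion-7B-GPTQ-4bit.act-order"):
        print(name, guess_quant_settings(name))
```

If the guess comes back with groupsize None, that matches the "wbits 4, groupsize None" setting described above for GPTQ-for-LLaMa.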
If all else fails, exllama may be the only option.
I'll download the model and see what settings work for me.
So AutoGPTQ managed to load it, but it makes gibberish
here is the 13B model btw
https://huggingface.co/notstoic/pygmalion-13b-4bit-128g
now it's simply not replying
Interesting. There is a universal tokenizer that you can download that may fix that. Go to the Model tab and enter this to download it: oobabooga/llama-tokenizer
After it is downloaded, reload the model and the webui should load the tokenizer with it.
just stuck on this
Not sure what that's about, but you can download it manually if you have to. https://huggingface.co/oobabooga/llama-tokenizer
Just make sure to put the files in models\oobabooga_llama-tokenizer\
Just to check, make sure that 7B model only has one model file in its folder. The one you want is Pygmalion-7B-GPTQ-4bit.act-order.safetensors
That should be good.
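For the "only one model file in its folder" check, a small sketch like the following can list what is actually sitting in a model folder. The helper name `find_weight_files` and the set of extensions treated as weight files are assumptions for illustration; adjust them to whatever your downloads contain.

```python
from pathlib import Path
from typing import List

# Assumption: these extensions cover the weight files you are likely to see.
WEIGHT_SUFFIXES = {".safetensors", ".pt", ".bin"}


def find_weight_files(model_dir: Path) -> List[Path]:
    """Return the weight files directly inside model_dir, sorted by name."""
    return sorted(
        p for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in WEIGHT_SUFFIXES
    )


if __name__ == "__main__":
    import sys

    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    files = find_weight_files(folder)
    for f in files:
        print(f.name)
    if len(files) > 1:
        print("More than one weight file found; the loader may pick the wrong one.")
```

Run it with the model folder as the only argument. If more than one file is listed, move the extras out of the folder before loading.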
looks like the 7B model works now
That's good. Not sure why the 13B isn't working.
You might try enabling the auto-devices option when loading it with AutoGPTQ.
Now gptq-for-llama works with the 7B model
let me try loading the 13B
13B still doesn't work
it gives the same memory error
24gb vram
It should definitely not be giving a memory error. You might have to set up exllama. It can load just about any GPTQ model and is the fastest way to run them.
guide on how to install?
Get the compiler listed here: #general message
Then check if exllama is in \text-generation-webui\repositories. If it isn't, then run cmd_windows.bat and enter this command:
git clone https://github.com/turboderp/exllama .\text-generation-webui\repositories\exllama
Another link for the compiler: https://aka.ms/vs/17/release/vs_BuildTools.exe
Make sure to select the C++ option when installing.
Once that is done, you should be able to select the exllama option to load the model in the webui.
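The "check if exllama is in repositories, otherwise clone it" step can be sketched as below. The helper name `exllama_clone_command` is hypothetical, and the code only builds the git command shown above rather than running it; it assumes the install root is the folder that contains text-generation-webui.

```python
from pathlib import Path
from typing import List, Optional

EXLLAMA_URL = "https://github.com/turboderp/exllama"


def exllama_clone_command(install_root: Path) -> Optional[List[str]]:
    """Return the git clone command to run, or None if exllama is already there."""
    target = install_root / "text-generation-webui" / "repositories" / "exllama"
    if target.is_dir():
        return None  # already cloned, nothing to do
    return ["git", "clone", EXLLAMA_URL, str(target)]


if __name__ == "__main__":
    cmd = exllama_clone_command(Path("."))
    print("exllama already present" if cmd is None else " ".join(cmd))
```

Run it from the install root. If it prints a command, that is the same clone command as above and should be entered in the cmd_windows.bat prompt.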
this one?
yes
the full 8GB worth? don't have all that much disk space
Use the other link I gave then: https://aka.ms/vs/17/release/vs_BuildTools.exe
It is 1gb
that's the one I'm using
Damn that sucks. Microsoft doesn't care about disk space.
The exllama devs think it's cool to have to compile their software.
is there any precompiled version?
Nope. It is designed to compile when it is used. Kinda dumb that they made it that way.
there's everything selected to be installed with that link
MSVC and the Windows 11 SDK are really all you need, but I don't know enough about the other stuff to know what is safe to disable.
any other way to install?
Nope. Microsoft doesn't provide any other way of installing it. Build Tools is the smallest install they offer.
Eventually I'll try to redesign the exllama code to make a pre-compiled version. But that will take a while as I've never done something like that before.
that would be super useful
Took a cursory glance at the code, and it doesn't seem too difficult to do.
okay have the MS stuff installed, what next?
seems like it's working, but it's only using ~45% of the GPU's power
I would assume that it is CPU bottlenecked, but I don't know much about how exllama works. It's pretty new.
cpu is running at about ~30-40%
normal ram usage is high though
Well... I managed to compile exllama as an independent module. Just have to get the webui to load it.
Yeah it's pretty bad on RAM usage.
any way to tone it down a bit?
No clue. It might be a bug in how the webui uses exllama. Don't know enough about it to know for sure.
well i have to head to sleep
i'll try other models to see what works
@cedar dust Do you run TavernAI?
I use SillyTavern
ah alr
do you know the difference between TavernAI and SillyTavern?
More features and direct support for text-generation-webui.
Has some cool extensions, not that I've ever used them.
do you know if i can export chats from tavernai to sillytavern?
Not sure. Don't see any mention of it in the docs.
There is a way to import a chat. Don't know if it supports TavernAI's chats or not.
It wouldn't surprise me though, given that SillyTavern started as a fork of TavernAI.
how would i import chats?
Select a character, then press the three-bar button in the bottom left. Press View past chats and you should see a button in the top right of the pop-up to import.
im not using a proxy or anything like that
Probably just have to wait a while and try again. The npmjs servers are probably just having issues.
any other ideas on how to fix?
VPN maybe? Not really sure what to do.
You can try editing the Start.bat script and change the second line to @rem call npm install
Since this is your first time running, it may just fail due to missing packages.
You can also try running this command in cmd: npm config delete proxy
Getting this now
Just remove the @rem and try again later if the npm config delete proxy doesn't fix it. Not much else to do since it is refusing to connect to install the requirements.
are you able to share the node modules with me?
like the folder
thx
@cedar dust is there any way to continue the convo in SillyTavern, as in letting the AI continue talking?
It sometimes doesn't work very well, but you can press the generate button with the input box blank.
It also has a multi-gen mode in the settings that is intended to allow for longer responses. Live text streaming doesn't work with it though.
Does anyone have any idea what the hell this is all about? Last week the model was working. Today I updated the webui and...
bin J:\oobabooga_windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
2023-06-21 16:47:54 INFO:Loading settings from J:\oobabooga_windows\mysettings.yaml...
2023-06-21 16:47:54 INFO:Loading TheBloke_chronos-wizardlm-uc-scot-st-13B-GPTQ...
Traceback (most recent call last):
  File "J:\oobabooga_windows\text-generation-webui\server.py", line 1007, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "J:\oobabooga_windows\text-generation-webui\modules\models.py", line 65, in load_model
    output = load_func_map[loader](model_name)
  File "J:\oobabooga_windows\text-generation-webui\modules\models.py", line 197, in huggingface_loader
    model = LoaderClass.from_pretrained(checkpoint, **params)
  File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 484, in from_pretrained
    return model_class.from_pretrained(
  File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 2449, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory I:\Textmodels\TheBloke_chronos-wizardlm-uc-scot-st-13B-GPTQ.
Press any key to continue . . .
So I cannot load the webui.
OK, it was a compatibility problem. You need to delete or rename config-user.yaml:
https://github.com/oobabooga/text-generation-webui/issues/2795#issuecomment-1600999075
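Renaming is the safer of the two options, since the old settings can be restored later. A minimal sketch of that rename is below; the helper name `back_up_user_config` and the ".bak" suffix are assumptions for illustration, and the path to config-user.yaml has to be supplied by you, since its location depends on your install.

```python
from pathlib import Path
from typing import Optional


def back_up_user_config(config_path: Path) -> Optional[Path]:
    """Rename config-user.yaml to config-user.yaml.bak so the webui ignores it.

    Returns the backup path, or None if there was no file to rename.
    """
    if not config_path.is_file():
        return None
    backup = config_path.with_name(config_path.name + ".bak")
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"{backup} already exists")
    config_path.rename(backup)
    return backup


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        sys.exit("usage: pass the full path to config-user.yaml")
    result = back_up_user_config(Path(sys.argv[1]))
    print("nothing to rename" if result is None else f"renamed to {result}")
```

After the rename, start the webui again and re-select the loader settings for the model.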
sure
kk wait a little
basically
i can load non 4b-128g models
but when i try to load one
it gets stuck on that
Or
it sends me a weird error
here's what happens
basically it doesn't load the model
(and i tried to update peft)
so if you can please help me lmao
this is the error
it would send
before
but i managed to fix it i think
start up here to install exllama
worked for me
I did
but uhh...
nothing changed
it only got worse
reinstall without exllama?
can i get the full error?
it doesn't happen anymore but basically
you have it in the screenshot
it's the full thing
so idk what i did wrong
but this happened after the update.
whats the link to download the model?
and that's what happens now
Traceback (most recent call last):
  File "B:\webui\text-generation-webui\server.py", line 62, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "B:\webui\text-generation-webui\modules\models.py", line 65, in load_model
    output = load_func_map[loader](model_name)
  File "B:\webui\text-generation-webui\modules\models.py", line 271, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "B:\webui\text-generation-webui\modules\AutoGPTQ_loader.py", line 55, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "B:\webui\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 82, in from_quantized
    return quant_func(
  File "B:\webui\installer_files\env\lib\site-packages\auto_gptq\modeling_base.py", line 773, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "B:\webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 1094, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "B:\webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 946, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
  File "B:\webui\installer_files\env\lib\site-packages\safetensors\torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 394.10 MiB already allocated; 4.60 GiB free; 396.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
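The numbers in that out-of-memory message are easier to compare once they are pulled out and put in one unit. The sketch below does that with a regular expression; the function name `parse_cuda_oom` is hypothetical, and the pattern is written against the wording of this particular message, so other PyTorch versions may phrase it differently. For this message it shows a 20 MiB request failing while 4.60 GiB is reported free, which is worth mentioning when asking for help.

```python
import re
from typing import Dict, Optional

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Pattern written against the message wording quoted above (an assumption
# about format; it is not an official PyTorch interface).
_OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) (?P<requested_u>MiB|GiB) "
    r"\(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) (?P<total_u>MiB|GiB) total capacity; "
    r"(?P<allocated>[\d.]+) (?P<allocated_u>MiB|GiB) already allocated; "
    r"(?P<free>[\d.]+) (?P<free_u>MiB|GiB) free; "
    r"(?P<reserved>[\d.]+) (?P<reserved_u>MiB|GiB) reserved"
)


def parse_cuda_oom(message: str) -> Optional[Dict[str, float]]:
    """Extract the sizes from a CUDA OOM message, converted to MiB."""
    m = _OOM_PATTERN.search(message)
    if m is None:
        return None
    sizes = {
        key: float(m.group(key)) * _UNIT_TO_MIB[m.group(key + "_u")]
        for key in ("requested", "total", "allocated", "free", "reserved")
    }
    sizes["gpu"] = float(m.group("gpu"))
    return sizes


if __name__ == "__main__":
    text = (
        "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate "
        "20.00 MiB (GPU 0; 6.00 GiB total capacity; 394.10 MiB already "
        "allocated; 4.60 GiB free; 396.00 MiB reserved in total by PyTorch)"
    )
    print(parse_cuda_oom(text))
```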
idfk why
whats your gpu vram?
6
but before i could run it
easily
with it only using 4.5
so yeah, i can def run it
maybe too many apps running, try closing some
and try enabling auto-devices in the webui.py file
Traceback (most recent call last):
  File "B:\webui\text-generation-webui\server.py", line 62, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "B:\webui\text-generation-webui\modules\models.py", line 65, in load_model
    output = load_func_map[loader](model_name)
  File "B:\webui\text-generation-webui\modules\models.py", line 271, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "B:\webui\text-generation-webui\modules\AutoGPTQ_loader.py", line 55, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "B:\webui\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 82, in from_quantized
    return quant_func(
  File "B:\webui\installer_files\env\lib\site-packages\auto_gptq\modeling_base.py", line 773, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "B:\webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 1094, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "B:\webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 946, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
  File "B:\webui\installer_files\env\lib\site-packages\safetensors\torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 394.10 MiB already allocated; 4.60 GiB free; 396.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
now another one
hehe
...
well i tried
but ig i can try again
since i managed to literally make it load some models
(before it just wouldn't load anything)
but did this start happening to other ppl too after the update?
coz it would work normally for me until that
mine broke after the update on my 12gb gpu
How tf...
I got my P40 bc of it lol