# Won't load model on latest version

pale meteor
#

All I get when trying to start the webui is nothing (as shown in the attached image).
Can anyone help?

cedar dust
#

Does it load if you use the --loader gptq-for-llama flag?

pale meteor
cedar dust
#

There should be a section at the top of webui.py for it. You can also manually select the gptq-for-llama loader in the webui when loading the model.
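A minimal sketch of what that flag line might look like, assuming the one-click installer's webui.py keeps the extra server.py arguments in a single `CMD_FLAGS` string near the top (the variable name and the `--chat --model-menu` flags already in it are assumptions; check your own copy of the file):

```python
import shlex

# Assumption: webui.py passes this string through to server.py as extra arguments.
# The variable name and the existing flags may differ between installer versions.
CMD_FLAGS = '--chat --model-menu --loader gptq-for-llama'


def parse_flags(flags: str) -> list[str]:
    """Split the flag string the way a shell would, so quoted paths stay intact."""
    return shlex.split(flags)


if __name__ == "__main__":
    args = parse_flags(CMD_FLAGS)
    # The loader name is the token that follows --loader.
    print(args[args.index("--loader") + 1])
```

Splitting with `shlex` rather than `str.split` keeps a quoted argument such as a model folder name with spaces as one token.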

pale meteor
cedar dust
#

Yes

pale meteor
cedar dust
#

It may be that the model is simply incompatible with the GPTQ-for-LLaMa version that the webui uses. I saw someone else say that they were able to load that model, but I've yet to be able to myself.

#

I'll re-download it and see if I can figure out how they did it.

cedar dust
# pale meteor

I'm able to load that model with both gptq-for-llama and AutoGPTQ. You should try the suggestion mentioned in the link from my previous message.

pale meteor
#

not getting anything again

cedar dust
pale meteor
cedar dust
pale meteor
#

because with the 7B model I get a different error

cedar dust
#

--model None
You are supposed to be able to just remove --model, but I think it is bugged.

pale meteor
cedar dust
pale meteor
cedar dust
#

Can you link the model?

cedar dust
#

Groupsize None should be the correct setting. AutoGPTQ will detect the settings on its own most of the time. GPTQ-for-LLaMa needs wbits set to 4 and groupsize set to None.
If all else fails, exllama may be the only option.

#

I'll download the model and see what settings work for me.
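As a rough illustration of where those settings come from: many GPTQ repos ship a `quantize_config.json` (the AutoGPTQ convention) with `bits` and `group_size` fields, where `group_size: -1` means no grouping. The sketch below maps that file onto the wbits/groupsize values above; the helper name is hypothetical, and falling back to 4-bit / None when the file is missing simply mirrors the advice in this thread.

```python
import json
from pathlib import Path
from typing import Optional


def gptq_settings(model_dir: str) -> dict:
    """Hypothetical helper: derive GPTQ-for-LLaMa style settings from quantize_config.json."""
    cfg_path = Path(model_dir) / "quantize_config.json"
    if not cfg_path.is_file():
        # No config file: fall back to the values suggested above (4-bit, no groupsize).
        return {"wbits": 4, "groupsize": None}
    cfg = json.loads(cfg_path.read_text())
    group_size: Optional[int] = cfg.get("group_size", -1)
    return {
        "wbits": cfg.get("bits", 4),
        # AutoGPTQ writes -1 for "no grouping", which corresponds to groupsize None here.
        "groupsize": None if group_size in (-1, None) else group_size,
    }
```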

pale meteor
#

So AutoGPTQ managed to load it, but it makes gibberish

pale meteor
cedar dust
#

Interesting. There is a universal tokenizer that you can download that may fix that. Go to the Model tab and enter this to download it: oobabooga/llama-tokenizer
After it is downloaded, reload the model and the webui should load the tokenizer with it.

cedar dust
#

Just make sure to put the files in models\oobabooga_llama-tokenizer\
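The folder name in that example is just the repo name `oobabooga/llama-tokenizer` with the slash replaced by an underscore. A small sketch to confirm the files landed where expected, assuming the default `models` directory; it only lists what is there and makes no assumption about which tokenizer files the repo contains:

```python
from pathlib import Path


def local_folder_name(repo_id: str) -> str:
    """Map a repo id like 'user/name' to the 'user_name' folder naming used above."""
    return repo_id.replace("/", "_")


def tokenizer_files(models_dir: str, repo_id: str) -> list[str]:
    """Return the file names in the expected folder, or an empty list if it is missing."""
    folder = Path(models_dir) / local_folder_name(repo_id)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```

An empty result for `tokenizer_files("models", "oobabooga/llama-tokenizer")` means the download went somewhere else or under a different folder name.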

cedar dust
cedar dust
pale meteor
#

looks like the 7B model works now

cedar dust
#

That's good. Not sure why the 13B isn't working.

#

You might try enabling the auto-devices option when loading it with AutoGPTQ.

pale meteor
#

Now gptq-for-llama works with the 7B model

#

let me try to load the 13B

#

13B still doesn't work

#

it gives the same memory error

cedar dust
#

How much memory do you have?

#

VRAM

pale meteor
#

24gb vram

cedar dust
#

It should definitely not be giving a memory error. You might have to set up exllama. It can load just about any GPTQ model and is the fastest way to run them.
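A back-of-the-envelope check of why 24GB should be plenty: the 4-bit weights of a 13B model take roughly 13e9 × 4 / 8 bytes, about 6.5 GB. The sketch ignores the context cache, activations, and loader overhead, so real usage is higher; the 2 GB `overhead_gb` allowance is an assumed round number, not a measured one.

```python
def weight_gb(n_params: float, bits: int) -> float:
    """Approximate size of the quantized weights alone, in decimal gigabytes."""
    return n_params * bits / 8 / 1e9


def fits(n_params: float, bits: int, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough fit check; overhead_gb is an assumed allowance for cache and buffers."""
    return weight_gb(n_params, bits) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for size in (7e9, 13e9):
        print(f"{size / 1e9:.0f}B @ 4-bit: {weight_gb(size, 4):.1f} GB of weights")
```

This prints 3.5 GB for 7B and 6.5 GB for 13B, which is in line with the 8-9GB total that a 13B 4bit-128g model is said to use later in the thread once overhead is added.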

pale meteor
#

guide on how to install?

cedar dust
# pale meteor guide on how to install?

Get the compiler listed here: #general message
Then check if exllama is in \text-generation-webui\repositories. If it isn't, then run cmd_windows.bat and enter this command:

git clone https://github.com/turboderp/exllama .\text-generation-webui\repositories\exllama
#

Make sure to select the C++ option when installing.

#

Once that is done, you should be able to select the exllama option to load the model in the webui.
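The check-then-clone step described above, as a Python sketch. It assumes it is run from the folder that contains `text-generation-webui` and that `git` is on PATH (as it should be inside cmd_windows.bat); it only clones when `dry_run` is False.

```python
import subprocess
from pathlib import Path

EXLLAMA_URL = "https://github.com/turboderp/exllama"


def exllama_dir(root: str = ".") -> Path:
    """Location of the exllama checkout, matching the path in the git clone command."""
    return Path(root) / "text-generation-webui" / "repositories" / "exllama"


def ensure_exllama(root: str = ".", dry_run: bool = True) -> bool:
    """Return True if exllama is present; otherwise clone it unless dry_run is set."""
    target = exllama_dir(root)
    if target.is_dir():
        return True
    if not dry_run:
        # Same as: git clone https://github.com/turboderp/exllama <target>
        subprocess.run(["git", "clone", EXLLAMA_URL, str(target)], check=True)
        return True
    return False
```

Calling `ensure_exllama()` first just reports whether the folder exists; `ensure_exllama(dry_run=False)` performs the clone.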

pale meteor
cedar dust
#

yes

pale meteor
#

the full 8GB worth? I don't have all that much disk space

cedar dust
#

It is 1gb

pale meteor
#

that's the one I'm using

cedar dust
#

Damn that sucks. Microsoft doesn't care about disk space.

#

The exllama devs think it's cool to have to compile their software.

pale meteor
cedar dust
#

Nope. It is designed to compile when it is used. Kinda dumb that they made it that way.

pale meteor
#

there's everything to install with that link

cedar dust
#

MSVC and the Windows 11 SDK are really all you need, but I don't know enough about the other stuff to know what is safe to disable.

cedar dust
#

Nope. Microsoft doesn't provide any other way of installing it. Build Tools is the smallest install they offer.

#

Eventually I'll try to redesign the exllama code to make a pre-compiled version. But that will take a while as I've never done something like that before.

cedar dust
#

Took a cursory glance at the code, and it doesn't seem too difficult to do.

pale meteor
#

seems like it's working, but it's only using ~45% of the GPU's power

cedar dust
pale meteor
#

normal ram usage is high though

cedar dust
#

Well... I managed to compile exllama as an independent module. Just have to get the webui to load it.

pale meteor
#

alright

#

Does your ram get hit this hard when running exllama?

cedar dust
#

Yeah it's pretty bad on RAM usage.

pale meteor
cedar dust
#

No clue. It might be a bug in how the webui uses exllama. I don't know enough about it to know for sure.

pale meteor
#

well i have to head to sleep
i'll try other models to see what works

pale meteor
#

@cedar dust Do you run TavernAI?

cedar dust
pale meteor
#

ah alr

pale meteor
cedar dust
#

More features and direct support for text-generation-webui.

#

Has some cool extensions, not that I've ever used them.

pale meteor
cedar dust
#

Not sure. Don't see any mention of it in the docs.

cedar dust
cedar dust
# pale meteor how would i import chats?

Select a character, then press the three-bar button in the bottom left. Press View past chats and you should see a button in the top right of the pop-up to import.

pale meteor
cedar dust
#

Probably just have to wait a while and try again. The npmjs servers are probably just having issues.

pale meteor
cedar dust
cedar dust
#

You can also try running this command in cmd: npm config delete proxy

cedar dust
# pale meteor Getting this now

Just remove the @rem and try again later if the npm config delete proxy doesn't fix it. Not much else to do since it is refusing to connect to install the requirements.

pale meteor
pale meteor
pale meteor
#

@cedar dust is there any way to continue the convo in SillyTavern, as in letting the AI continue talking?

cedar dust
#

It also has a multi-gen mode in the settings that is intended to allow for longer responses. Live text streaming doesn't work with it though.

ashen forum
#

Does anyone have any idea what the hell this is all about? Last week the model was working. Today I updated the webui and...
bin J:\oobabooga_windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
2023-06-21 16:47:54 INFO:Loading settings from J:\oobabooga_windows\mysettings.yaml...
2023-06-21 16:47:54 INFO:Loading TheBloke_chronos-wizardlm-uc-scot-st-13B-GPTQ...
Traceback (most recent call last):
  File "J:\oobabooga_windows\text-generation-webui\server.py", line 1007, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "J:\oobabooga_windows\text-generation-webui\modules\models.py", line 65, in load_model
    output = load_func_map[loader](model_name)
  File "J:\oobabooga_windows\text-generation-webui\modules\models.py", line 197, in huggingface_loader
    model = LoaderClass.from_pretrained(checkpoint, **params)
  File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 484, in from_pretrained
    return model_class.from_pretrained(
  File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 2449, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory I:\Textmodels\TheBloke_chronos-wizardlm-uc-scot-st-13B-GPTQ.
Press any key to continue . . .

#

So I cannot load the webui.
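The last line of that traceback is the informative part: the call went through `huggingface_loader` (the plain Transformers path), which looks only for the file names it lists, and a GPTQ folder usually holds a `.safetensors` file instead. That suggests the wrong loader was picked after the update rather than a broken download, though this is an inference, not something confirmed in the thread. A sketch that reproduces the check for a folder; the "needs a GPTQ loader" flag is a rough heuristic:

```python
from pathlib import Path

# File names copied from the OSError message above.
TRANSFORMERS_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)


def weight_report(model_dir: str) -> dict:
    """Report which of the expected files exist and which .safetensors files are present."""
    folder = Path(model_dir)
    found = [name for name in TRANSFORMERS_WEIGHT_FILES if (folder / name).is_file()]
    safetensors = sorted(p.name for p in folder.glob("*.safetensors"))
    return {
        "transformers_files": found,
        "safetensors_files": safetensors,
        # Heuristic: only .safetensors present suggests a quantized (e.g. GPTQ) checkpoint
        # that needs a GPTQ-capable loader rather than the plain Transformers one.
        "likely_needs_gptq_loader": not found and bool(safetensors),
    }
```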

pale meteor
#

@cedar dust keep getting this now

#

ig it times out or something

pale meteor
calm wave
#

can you help me rq ?

pale meteor
calm wave
#

basically

#

i can load non 4b-128g models

#

but when i try to load one

#

it gets stuck on that

#

Or

#

it sends me a weird error

#

here's what happens

#

basically it doesn't load the model

#

(and i tried to update peft)

#

so if you can please help me lmao

calm wave
#

this is the error

#

it would send

#

before

#

but i managed to fix it i think

pale meteor
calm wave
#

but uhh...

#

nothing changed

#

it only got worse

calm wave
pale meteor
#

reinstall without exllama?

calm wave
#

i tried

#

and it just doesn't load

#

the model

pale meteor
#

can i get the full error?

calm wave
#

you have it in the screenshot

#

it's the full thing

#

so idk what i did wrong

#

but this happened after the update.

pale meteor
#

whats the link to download the model?

calm wave
#

and that's what happens now

#

Traceback (most recent call last):
  File "B:\webui\text-generation-webui\server.py", line 62, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "B:\webui\text-generation-webui\modules\models.py", line 65, in load_model
    output = load_func_map[loader](model_name)
  File "B:\webui\text-generation-webui\modules\models.py", line 271, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "B:\webui\text-generation-webui\modules\AutoGPTQ_loader.py", line 55, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "B:\webui\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 82, in from_quantized
    return quant_func(
  File "B:\webui\installer_files\env\lib\site-packages\auto_gptq\modeling\_base.py", line 773, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "B:\webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 1094, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "B:\webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 946, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
  File "B:\webui\installer_files\env\lib\site-packages\safetensors\torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 394.10 MiB already allocated; 4.60 GiB free; 396.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

#

idfk why
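The error's own suggestion can be tried by setting `PYTORCH_CUDA_ALLOC_CONF` before PyTorch initializes CUDA. A sketch, where 128 MB is an arbitrary example value rather than a recommendation from this thread, and this is something to try, not a guaranteed fix. The second helper reports free and total VRAM through `torch.cuda.mem_get_info`, returning None if PyTorch or CUDA is not available.

```python
import os


def set_split_size(mb: int = 128) -> str:
    """Set the allocator option named in the error; must run before torch uses CUDA."""
    value = f"max_split_size_mb:{mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


def free_vram_gib():
    """Return (free, total) memory of GPU 0 in GiB, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info(0)
    return free / 2**30, total / 2**30
```

For the webui itself, the equivalent is setting the variable in the same console before launching it, e.g. `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in cmd.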

pale meteor
#

whats your gpu vram?

calm wave
#

6

#

but before i could run it

#

easily

#

with it only using 4.5

#

so yeah, i can def run it

pale meteor
#

maybe too many apps are running, try closing some
and try enabling auto-devices in the webui.py file
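With auto-devices the model can be split between the GPU and system RAM instead of failing outright. In Hugging Face Accelerate terms that split is bounded by a `max_memory` map like the one this hypothetical helper builds; the 1 GiB of headroom and the CPU limit are assumed example values, and this does not show how the webui builds its own map internally.

```python
def max_memory_map(vram_gib: float, cpu_gib: float, headroom_gib: float = 1.0) -> dict:
    """Hypothetical helper: build an Accelerate-style max_memory dict for one GPU plus CPU."""
    gpu_budget = max(vram_gib - headroom_gib, 0)
    # Key 0 is the first CUDA device and "cpu" is system RAM; values are size strings.
    return {0: f"{gpu_budget:g}GiB", "cpu": f"{cpu_gib:g}GiB"}
```

For a 6GB card, `max_memory_map(6, 16)` gives `{0: "5GiB", "cpu": "16GiB"}`. With Transformers, a map like this is passed as `from_pretrained(..., device_map="auto", max_memory=...)`.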

calm wave
#

now another one

#

hehe

#

...

calm wave
#

but ig i can try again

#

since i managed to literally make it load some models

#

(before it just wouldn't load anything)

#

but did this start happening to other people too after the update?

#

coz it would work normally for me until that

pale meteor
#

mine broke after the update on my 12gb gpu

calm wave
pale meteor
#

I got my P40 bc of it lol

calm wave
#

12gb could run a 13b-4bit-128g easily

#

and the model would only use like 8-9gb vram/13gb

#

i've literally seen a guy running a 30B model

#

on a 16gb gpu

#

or something like that

#

like i can run normal models

#

but 4bit-128g ones I can't