#Can't convert fine-tuned LLM to GGUF file


fiery anchor
#

I fine-tuned Mistral with Unsloth, and these are the files in the Hugging Face repo... I'm using llama.cpp to convert the model to a GGUF file, and when I run convert.py I get this error:

  File "E:\SuperServer.AI\docker-llm\llama.cpp\convert.py", line 1567, in <module>
    main()
  File "E:\SuperServer.AI\docker-llm\llama.cpp\convert.py", line 1499, in main
    model_plus = load_some_model(args.model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\SuperServer.AI\docker-llm\llama.cpp\convert.py", line 1369, in load_some_model
    raise FileNotFoundError(f"Can't find model in directory {path}")
FileNotFoundError: Can't find model in directory E:\SuperServer.AI\docker-llm\docker-llm-v1-hf

It seems my repo doesn't have valid model files. What should I do?

blazing fiber
#

It looks like the model is missing config.json. Copy it from your base model into your fine-tuned model's folder.
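A quick way to see what the folder actually contains before running convert.py is to list which of the expected files are present. This is only a minimal sketch and not what convert.py itself does: the file names are the usual Hugging Face Transformers and PEFT defaults (an assumption about how the repo was saved), and `describe_model_dir` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed file-name conventions (Transformers / PEFT defaults), not taken from convert.py.
WEIGHT_PATTERNS = ("*.safetensors", "pytorch_model*.bin")
ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors", "adapter_model.bin")


def describe_model_dir(path):
    """Report whether a folder looks like a full model, a LoRA adapter, or neither."""
    folder = Path(path)
    names = {p.name for p in folder.iterdir() if p.is_file()}
    has_config = "config.json" in names
    has_adapter = any(name in names for name in ADAPTER_FILES)
    # Full-model weights: any weight file that is not one of the adapter files.
    has_weights = any(
        p.name not in ADAPTER_FILES
        for pattern in WEIGHT_PATTERNS
        for p in folder.glob(pattern)
    )
    if has_config and has_weights:
        kind = "full model"
    elif has_adapter:
        kind = "LoRA adapter only"
    else:
        kind = "unknown"
    return {"config.json": has_config, "weights": has_weights,
            "adapter": has_adapter, "kind": kind}
```

Calling `describe_model_dir(r"E:\SuperServer.AI\docker-llm\docker-llm-v1-hf")` on the folder from the traceback would show whether config.json is the only thing missing or whether the folder holds adapter files rather than full weights.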

supple furnace
#

@fiery anchor oh, you need to use model.push_to_hub_merged to merge to 16-bit

supple furnace
#

You're using QLoRA itself

#

hence the error

#

The repo only has the LoRA adapters, not a merged model
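For reference, the merge step mentioned above might look like the sketch below. It assumes `model` and `tokenizer` are the objects you got from Unsloth's `FastLanguageModel.from_pretrained` and then trained; the repo name and token are placeholders, `push_merged_16bit` is a hypothetical wrapper, and the `save_method="merged_16bit"` argument follows the Unsloth notebooks, so check it against the version you have installed.

```python
def push_merged_16bit(model, tokenizer, repo_id, token):
    """Merge the LoRA adapters into the base weights and upload the 16-bit result.

    `model` is assumed to be an Unsloth-patched model, which is what
    provides push_to_hub_merged; a plain PEFT model will not have it.
    """
    model.push_to_hub_merged(
        repo_id,                     # placeholder, e.g. "your-username/your-model"
        tokenizer,
        save_method="merged_16bit",  # merged float16 weights instead of adapters
        token=token,                 # Hugging Face write token
    )
```

Once the merged weights are in the repo, the folder should contain config.json and full weight files rather than only adapter files, which is the layout convert.py is looking for.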

fiery anchor
#

But now when I run the trainer_stats = trainer.train() cell, I get this error:
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(2, 173, 8, 4, 128) (torch.float16)
     key         : shape=(2, 173, 8, 4, 128) (torch.float16)
     value       : shape=(2, 173, 8, 4, 128) (torch.float16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.0
`flshattF@…` is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    operator does not support BMGHK format
    unsupported embed per head: 128

#

I'm using a T4 GPU. Is it not compatible with xFormers?
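The error text itself hints at the answer. Only the flash-attention operator complains that the GPU is too old; `cutlassF` is rejected solely because this xFormers build has no CUDA support. A small sketch of that reading, with the (8, 0) threshold copied from the message and treated as inclusive (an assumption), and `flash_attention_capable` a hypothetical helper:

```python
# Threshold quoted in the error message for the flash-attention operator.
FLASH_MIN_CAPABILITY = (8, 0)


def flash_attention_capable(capability):
    """Return True if a (major, minor) compute capability meets the threshold.

    Assumption: (8, 0) itself counts as supported.
    """
    return tuple(capability) >= FLASH_MIN_CAPABILITY


t4 = (7, 5)  # capability reported for the GPU in the error above
print(flash_attention_capable(t4))  # False: flash attention is unavailable on this GPU
```

On a live runtime the tuple can be read with `torch.cuda.get_device_capability()`. A False result here does not by itself rule out xFormers, since `cutlassF` is blocked only by the missing CUDA build.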

blazing fiber
#

It works for me. Try xformers<0.0.26

supple furnace
#

Yeah, you have to use the exact new install instructions

#

%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
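After reinstalling, it may be worth confirming that the version pin took effect before rerunning training. A minimal sketch using only the standard library; `release_tuple` and `is_below` are hypothetical helpers, and the comparison looks only at the numeric release part, ignoring suffixes such as `.post1`:

```python
import re
from importlib import metadata


def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.0.25.post1' -> (0, 0, 25)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(installed, limit):
    """True if the installed release is strictly lower than the limit."""
    return release_tuple(installed) < release_tuple(limit)


try:
    installed = metadata.version("xformers")
    print(installed, "ok" if is_below(installed, "0.0.26") else "not below 0.0.26")
except metadata.PackageNotFoundError:
    print("xformers is not installed in this environment")
```

The error message also points at `python -m xformers.info`, which reports which operators the installed build actually provides.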