#gpt-oss-20b-f16
bfloat16. you should be able to, yes
so gpt-oss-20b-bf16
you need to use save_pretrained_merged first to merge the model
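A minimal sketch of that merge step, assuming `model` and `tokenizer` are the objects you fine-tuned with Unsloth. The helper name `merge_for_gguf`, the output directory, and the `save_method="merged_16bit"` default are assumptions for illustration; check the guide for the value that applies to gpt-oss.

```python
from pathlib import Path


def merge_for_gguf(model, tokenizer, out_dir="merged_model", save_method="merged_16bit"):
    """Merge the LoRA adapters into the base weights and save the result.

    Hypothetical helper: `model` is expected to expose Unsloth's
    `save_pretrained_merged`; the `save_method` default is an assumption.
    """
    if not hasattr(model, "save_pretrained_merged"):
        raise TypeError("model has no save_pretrained_merged; was it loaded with Unsloth?")
    out_path = Path(out_dir)
    # Writes a plain Hugging Face checkpoint that llama.cpp's converter can read.
    model.save_pretrained_merged(str(out_path), tokenizer, save_method=save_method)
    return out_path
```

Calling `merge_for_gguf(model, tokenizer)` should leave a merged checkpoint in `merged_model/`, and that directory is what you would then pass to the GGUF converter.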
somewhat unrelated -- has anyone had success loading gpt-oss 20b in bf16 with vllm? Seems like it's not even possible last time i checked:
https://github.com/vllm-project/vllm/issues/22901#issuecomment-3190186828
@lime elm oh great, thank you. and then for gguf conversion, which command should I use?
guide is here: https://docs.unsloth.ai/basics/gpt-oss-how-to-run-and-fine-tune/tutorial-how-to-fine-tune-gpt-oss#save-export-your-model
don't forget to change the model name to bf16
Yes @valid goblet, I have read this, but I wasn't able to do the gguf conversion and I can't run inference on or validate the merged model. Can you help me out, guys?
@sand ledge you need to compile llama.cpp
@lime elm Is there any updated llama.cpp for gpt oss?
wait gimme a sec
It seems that this is not just the new model. I finetuned phi-3.5-mini and have the same problem, can you fix it please?
saving to gguf is non-functional right now. nothing to do with the new model. you have to do it manually
@sand ledge are you on colab or a local machine?
hi, I saved the model manually with
model.save_pretrained_gguf("model", tokenizer)
but it still errors, though it works with vLLM :((
@lime elm in lambda labs (putty)
Which model are you trying to convert to gguf?
@lime elm Can you please explain or share what is needed to do the gguf conversion for the gpt-oss model?
hi, I'm working with phi-3.5-mini-instruct, and here is my colab: https://colab.research.google.com/drive/1N3HRdCE3PeIHp0EefTJnLYm7ARraPFIM?usp=sharing
sorry guys give me 15 mins. I'll tell you exactly the steps to take.
so if you're on a local machine and not colab
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-gguf-split llama-quantize
cd llama.cpp
python3 convert_hf_to_gguf.py path_to_your_model/ --outfile out_model_name.gguf
build/bin/llama-quantize out_model_name.gguf out_model_name-Q8_0.gguf Q8_0
then you can run it with
build/bin/llama-cli --model out_model_name-Q8_0.gguf -p "The meaning to life and the universe is"
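The steps above can be chained in a small script. This is a rough sketch, not an official tool: the function names and default paths are made up for illustration, and it runs a CMake configure step (`cmake -S ... -B ...`) before the build, since `cmake --build` needs a configured build directory. It assumes `git`, `cmake`, and `python3` are on `PATH` and that the converter's Python requirements are already installed.

```python
import subprocess
from pathlib import Path


def gguf_commands(model_dir, out_name, quant="Q8_0", repo="llama.cpp"):
    """Return the command lists for clone, configure, build, convert, quantize."""
    repo = Path(repo)
    build = repo / "build"
    full_gguf = f"{out_name}.gguf"
    quant_gguf = f"{out_name}-{quant}.gguf"
    return [
        ["git", "clone", "https://github.com/ggml-org/llama.cpp", str(repo)],
        ["cmake", "-S", str(repo), "-B", str(build)],
        ["cmake", "--build", str(build), "--config", "Release", "-j",
         "--clean-first", "--target", "llama-cli", "llama-gguf-split", "llama-quantize"],
        ["python3", str(repo / "convert_hf_to_gguf.py"), str(model_dir),
         "--outfile", full_gguf],
        [str(build / "bin" / "llama-quantize"), full_gguf, quant_gguf, quant],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


def looks_like_gguf(path):
    """Cheap sanity check: a GGUF file starts with the 4-byte magic b"GGUF"."""
    with open(path, "rb") as fh:
        return fh.read(4) == b"GGUF"
```

With the default `dry_run=True`, `run_all(gguf_commands("path_to_your_model/", "out_model_name"))` only prints the commands so you can review them first. Pass `dry_run=False` to execute them, then `looks_like_gguf("out_model_name-Q8_0.gguf")` gives a quick check that the quantize step wrote a GGUF file before you load it with `llama-cli`.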
if you're on colab @oak inlet i will ping you in a thread now
Thank you so much @lime elm I will try it today and let you know