Finally! Uncensored LLAMA-3 is here. https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
#dolphin-2.9-llama3-8b
Oh boy
xD what's up?
Oh, here's my choice for self-hosted model for the day lol
Now to wait for someone to quant it
niceee
Also waiting on a quant, hoping OR just picks it up before lol
My hope is that the model doesn't lose too much intelligence but can be rid of its positivity bias
just use fp16 weights and bitsandbytes via --load-in-4bit on Aphrodite or Ooba
or --load-in-smooth on Aphrodite for 8bit quant
I've only used GGUF before. Would my GTX 1080 be able to handle that?
Hm, will probs take some time, but it theoretically should work for 4bit
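For a rough sense of why 4bit should fit, here is a back-of-the-envelope sketch of weight memory only. The round 8.0e9 parameter count is an assumption, and the estimate ignores KV cache, activations, and runtime overhead, so real usage will be higher. Compare the results against the GTX 1080's 8 GB of VRAM.

```python
# Rough weight-only memory estimate (a sketch, not a measurement).
# Assumption: a round 8.0e9 parameters for an "8B" model.
PARAMS = 8.0e9


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    for label, bits in [("fp16", 16), ("int8", 8), ("4bit", 4)]:
        print(f"{label}: ~{weight_gib(PARAMS, bits):.1f} GiB of weights")
```

Quantized formats also store scales and zero points, so the 4bit figure is a lower bound rather than an exact size.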
Btw, here's some GGUF and AWQ quants I found, use at your own discretion:
https://huggingface.co/3thn/dolphin-2.9-llama3-8b-GGUF
https://huggingface.co/solidrust/dolphin-2.9-llama3-8b-AWQ
oh, epic
Took like a minute to quant to 4bit on 3090 btw
wow ouch
Nice broken EOS tokens there
I shouldn't have tried this on Ooba
Btw, use Aphrodite, Ooba is still borked with L3
KoboldCPP is borked too btw
Yeah, Aphro + AWQ quant works flawlessly
I think it managed to do that - def less positivity biased than official instruct, and doesn't seem significantly dumber at first glance (taking the 4bit AWQ quant's effect into account obv). Even retained the LLaMA-ish response style, which is a plus for me.
Early testing shows degraded performance compared to the original. Maybe some further fine-tuning is in order for this fine-tune... 🙂
Well, it was trained on open datasets, so none of Meta's secret sauce is in this. I've noticed degraded performance on multilang, but this is to be expected, as the datasets this was tuned on are English-only (afaik).
Btw, do you mind sharing which tasks this model lags behind official l3-8b-instruct in? I'm kinda curious (assuming you were running it in fp16 or int8)
it's been up for like 10 hours already lol
oh nvm you meant the 70b variant
I mean the 8 billion variant is awesome, the 70b is what I really am looking forward to though
definitely use cases for both
btw, maybe potentially important - vLLM is one of few backends that work with l3 without issues
interesting, llama.cpp gave up?
lcpp on Ooba broke
Spammed broken EOS tokens
rip
Koboldcpp broke too obv
No, it fixed itself when I swapped to Aphrodite (and vLLM after)
Template was the same - ChatML
llama3 template is pretty nuts, we had to fork the one on llama3 repo
hmm I don't think chatml works
ChatML is what the repo says to use
This model was trained FFT on all parameters, using ChatML prompt template format.
example:
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
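A minimal sketch of assembling that ChatML prompt by hand, based only on the example quoted above. The function name `build_chatml_prompt` is hypothetical, and most backends can apply the template for you, so this is just to show the layout.

```python
# Sketch: build a single-turn ChatML prompt matching the example above.
# `build_chatml_prompt` is a hypothetical helper name, not part of any library.
DEFAULT_SYSTEM = "You are Dolphin, a helpful AI assistant."


def build_chatml_prompt(user_prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Return a ChatML string that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_chatml_prompt("Why is the sky blue?"))
```

The prompt deliberately ends after the assistant header so the model writes the reply and is expected to close it with `<|im_end|>`.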
I've also never seen 5-12 broken EOS tokens spammed in a row consistently on every try before on anything
I've noticed that this EOS problem with L3 is well known already
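If a backend leaks end-of-turn markers into the text like this, one possible client-side workaround is to cut the output at the first stop string. This is only a sketch under that assumption: nobody in this thread confirmed it as a fix, the helper name `truncate_at_stop` is hypothetical, and the stop list simply combines the ChatML marker with Llama 3's own end tokens.

```python
# Sketch of a client-side workaround: trim generated text at the first stop string.
# Assumption: the stray EOS markers arrive as plain text in the completion.
STOP_STRINGS = ("<|im_end|>", "<|eot_id|>", "<|end_of_text|>")


def truncate_at_stop(text: str, stop_strings=STOP_STRINGS) -> str:
    """Return text up to (not including) the earliest stop string, if any."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


if __name__ == "__main__":
    raw = "Sure, here you go.<|im_end|><|im_end|><|im_end|>"
    print(truncate_at_stop(raw))
```

This only hides the symptom; generation still runs until the backend itself decides to stop.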
Why did he cut the sequence length to 4K?
I assume to get this model out faster.
also Meta's rules on finetune names are wack
Oh, I can see what's wrong with this one
It has notably worse attention span, and is actually worse at following instructions because of that
I may have kinda ignored this on the first try, probably shrugging it off as a 4bit AWQ effect. It is not, same problems in fp16 precision.
Yeah, existing (mostly) synth datasets are not going to cut it against Meta's 10M human annotated examples.
Is it uncensored at least lol
I find the llama 3 model unusable because of that. It kinda takes positivity bias to a whole new level.
It gave some "As an AI..." refusals to me (less refusals than official L3, but still)
Straight up GPT-esque refusals, not even L3's "Why should I do that?" stuff
I'd rather go with official L3 for now
Btw here's L3-8B tuned on toxicQA and toxic-dpo lol
Use at your own discretion and stuff
GGUFs and EXL2s are on Undi's model list
https://huggingface.co/Undi95/Llama-3-Unholy-8B?not-for-all-audiences=true
Yeah, this Dolphin is kinda just a straight up downgrade from L3
Doesn't even have its usual upside of being fully uncensored, 'cause it's so undertrained that the decensoring tuning hasn't even fully applied and it still gives refusals.
We really should wait for 70B one
It doesn't give refusals at all to me, just adjust the system prompt to say "X is allowed"
It is undertrained though so yeah.
I think Eric should have initialised it from LLAMA instruct to overwrite its safety instead while benefitting from its existing training
Well, Unholy one does exactly that afaik
Yes, I've been looking around for Q3 quants, too lazy to quant myself, might do at some point