dolphin-2.9-llama3-8b | OpenRouter | Page 1

tranquil stratus Apr 21, 2024, 1:53 AM

#

Finally! Uncensored LLAMA-3 is here. https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b

quasi sun Apr 21, 2024, 1:55 AM

#

Oh boy

tranquil stratus Apr 21, 2024, 1:57 AM

#

xD what's up?

molten stratus Apr 21, 2024, 2:10 AM

#

Oh, here's my choice for self-hosted model for the day lol

west steeple Apr 21, 2024, 2:31 AM

#

Now to wait for someone to quant it

tame lark Apr 21, 2024, 2:39 AM

#

niceee

tranquil stratus Apr 21, 2024, 2:40 AM

#

Also waiting on a quant, hoping OR just picks it up before lol

#

My hope is that the model doesn't lose too much intelligence but can be rid of its positivity bias

molten stratus Apr 21, 2024, 2:49 AM

#

west steeple Now to wait for someone to quant it

just use fp16 weights and bitsandbytes via --load-in-4bit on Aphrodite or Ooba

#

or --load-in-smooth on Aphrodite for 8bit quant

west steeple Apr 21, 2024, 3:04 AM

#

I've only used GGUF before. Would my GTX 1080 be able to handle that?

molten stratus Apr 21, 2024, 3:09 AM

#

west steeple I've only used GGUF before. Would my GTX 1080 be able to handle that?

Hm, will probs take some time, but it theoretically should work for 4bit
Btw, here's some GGUF and AWQ quants I found, use at your own discretion:
https://huggingface.co/3thn/dolphin-2.9-llama3-8b-GGUF
https://huggingface.co/solidrust/dolphin-2.9-llama3-8b-AWQ
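
On the GTX 1080 question, a rough back-of-the-envelope check might look like the sketch below. It only counts the weights, and it assumes roughly 8 billion parameters for an "8B" model and the 8 GB of VRAM a GTX 1080 ships with. The function names are made up for illustration, and KV cache, activations, and framework overhead all come on top, so treat the numbers as a lower bound.

```python
# Rough weight-only VRAM estimate. Ignores KV cache, activations, and
# framework overhead, so real usage will be higher than these numbers.
PARAMS = 8.0e9          # assumption: ~8 billion parameters for an "8B" model
GTX_1080_VRAM_GB = 8.0  # a GTX 1080 has 8 GB of VRAM

BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "4bit": 4}


def weight_gb(params: float, bits: int) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


def estimate(params: float = PARAMS) -> dict:
    """Map each precision name to its weight-only size in GB."""
    return {name: weight_gb(params, bits) for name, bits in BITS_PER_WEIGHT.items()}


if __name__ == "__main__":
    for name, gb in estimate().items():
        verdict = "fits" if gb < GTX_1080_VRAM_GB else "does not fit"
        print(f"{name}: ~{gb:.1f} GB of weights, {verdict} in {GTX_1080_VRAM_GB:.0f} GB")
```

Under these assumptions only the 4bit variant leaves any headroom on an 8 GB card, which lines up with the "theoretically should work for 4bit" answer.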

west steeple Apr 21, 2024, 3:09 AM

#

oh, epic

molten stratus Apr 21, 2024, 3:10 AM

#

Took like a minute to quant to 4bit on 3090 btw

#

wow ouch
Nice broken EOS tokens there

#

I shouldn't have tried this on Ooba

molten stratus Apr 21, 2024, 3:13 AM

#

west steeple oh, epic

Btw, use Aphrodite, Ooba is still borked with L3

#

KoboldCPP is borked too btw

#

Yeah, Aphro + AWQ quant works flawlessly

molten stratus Apr 21, 2024, 3:44 AM

#

tranquil stratus My hope is that the model doesn't lose too much intelligence but can be rid of i...

I think it managed to do that - def less positivity biased than official instruct, and doesn't seem significantly dumber at first glance (taking the 4bit AWQ quant's effect into account obv). Even retained LLaMA-ish response style, which is a plus for me.

molten garden Apr 21, 2024, 8:09 AM

#

Early testing shows degraded performance compared to the original. Maybe some further fine-tuning is in order for this fine-tune... 🙂

molten stratus Apr 21, 2024, 10:24 AM

#

molten garden Early testing shows degraded performance compared to the original. Maybe some fu...

Well, it was trained on open datasets, so none of Meta's secret sauce is in this. I've noticed degraded performance on multilang, but this is to be expected, as the datasets this was tuned on are English-only (afaik).
Btw, do you mind sharing in which tasks this model lags behind official l3-8b-instruct? I'm kinda curious (assuming you were running it in fp16 or int8)

pale lark Apr 21, 2024, 10:46 AM

#

Oh wait it's already up?

#

@dire zephyr you mean this one?

molten stratus Apr 21, 2024, 10:47 AM

#

pale lark Oh wait it's already up?

it's been up for like 10 hours already lol

pale lark Apr 21, 2024, 10:47 AM

#

oh nvm you meant the 70b variant

dire zephyr Apr 21, 2024, 10:47 AM

#

I mean the 8 billion variant is awesome, the 70b is what I really am looking forward to though

#

definitely use cases for both

molten stratus Apr 21, 2024, 10:48 AM

#

pale lark oh nvm you meant the 70b variant

btw, maybe potentially important - vLLM is one of few backends that work with l3 without issues

pale lark Apr 21, 2024, 10:48 AM

#

molten stratus btw, maybe potentially important - vLLM is one of few backends that work with l3...

interesting, llama.cpp gave up?

molten stratus Apr 21, 2024, 10:48 AM

#

molten stratus wow ouch Nice broken EOS tokens there

lcpp on Ooba broke

molten stratus Apr 21, 2024, 10:49 AM

#

pale lark interesting, llama.cpp gave up?

Spammed broken EOS tokens

pale lark Apr 21, 2024, 10:49 AM

#

rip

molten stratus Apr 21, 2024, 10:49 AM

#

Koboldcpp broke too obv

pale lark Apr 21, 2024, 10:49 AM

#

oh if it's just EOS token

#

maybe it's just the template?

molten stratus Apr 21, 2024, 10:49 AM

#

pale lark oh if it's just EOS token

No, it fixed itself when I swapped to Aphrodite (and vLLM after)
Template was the same - ChatML

pale lark Apr 21, 2024, 10:49 AM

#

llama3 template is pretty nuts, we had to fork the one on llama3 repo

#

hmm I don't think chatml works

molten stratus Apr 21, 2024, 10:50 AM

#

pale lark hmm I don't think chatml works

ChatML is what the repo says to use

#

This model was trained FFT on all parameters, using ChatML prompt template format.

example:

<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
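
For anyone assembling that prompt by hand, a minimal sketch of a ChatML builder is below. The helper name `build_chatml_prompt` and the list-of-dicts message shape are assumptions for illustration, not something from the model repo. It just reproduces the layout in the example above.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper: it only mirrors the layout in the quoted example.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(
        build_chatml_prompt(
            [
                {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
                {"role": "user", "content": "Hello!"},
            ]
        )
    )
```

If the backend already applies a chat template, passing a pre-rendered string like this would double up the wrapping, so this is only for raw completion endpoints.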

#

I've also never seen 5-12 broken EOS tokens spammed in a row consistently on every try before on anything
I've noticed that this EOS problem with L3 is well known already
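
As a stopgap on a backend that leaks the end-of-turn markers as plain text, one option is to cut the output at the first stop string on the client side. This is only a sketch of that workaround, not the fix the backends themselves shipped. The function name is made up, and the stop list is an assumption: `<|im_end|>` from the ChatML template plus Llama 3's `<|eot_id|>` and `<|end_of_text|>`. The chat doesn't say which literal tokens were being spammed.

```python
# Assumed stop strings: ChatML's end-of-turn marker plus Llama 3's
# end-of-turn and end-of-text special tokens.
STOP_STRINGS = ("<|im_end|>", "<|eot_id|>", "<|end_of_text|>")


def truncate_at_stop(text, stops=STOP_STRINGS):
    """Return text cut at the earliest stop string, with trailing whitespace removed."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()


if __name__ == "__main__":
    raw = "Sure, here you go.<|im_end|><|im_end|><|im_end|>"
    print(truncate_at_stop(raw))
```

This only hides the symptom. The model still burns tokens generating the repeats unless the backend is also given the same strings as stop sequences.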

inland canyon Apr 21, 2024, 11:23 AM

#

tranquil stratus Finally! Uncensored LLAMA-3 is here. https://huggingface.co/cognitivecomputation...

Why did he cut the sequence length to 4K?

molten stratus Apr 21, 2024, 11:29 AM

#

I assume to get this model out faster.

#

also Meta's rules on finetune names are wack

molten stratus Apr 21, 2024, 2:46 PM

#

Oh, I can see what's wrong with this one
It has notably worse attention span, and is actually worse at following instructions because of that
I may have kinda ignored this on first try, probably shrugging it off as a 4bit AWQ effect. It is not, same problems in fp16 precision.

#

Yeah, existing (mostly) synth datasets are not going to cut it against Meta's 10M human annotated examples.

tranquil stratus Apr 21, 2024, 4:45 PM

#

molten stratus Oh, I can see what's wrong with this one It has notably worse attention span, an...

Is it uncensored at least lol

#

I find the llama 3 model unusable because of that. It kinda takes positivity bias to a whole new level.

molten stratus Apr 21, 2024, 5:13 PM

#

tranquil stratus Is it uncensored at least lol

It gave some "As an AI..." refusals to me (less refusals than official L3, but still)
Straight up GPT-esque refusals, not even L3's "Why should I do that?" stuff
I'd rather go with official L3 for now

molten stratus Apr 21, 2024, 5:22 PM

#

tranquil stratus Is it uncensored at least lol

Btw here's L3-8B tuned on toxicQA and toxic-dpo lol
Use at your own discretion and stuff
GGUFs and EXL2s are on Undi's model list
https://huggingface.co/Undi95/Llama-3-Unholy-8B?not-for-all-audiences=true

pale lark Apr 22, 2024, 9:06 AM

#

Just woke up, gonna look at this now

#

Wow this model is heavily undertrained

molten stratus Apr 22, 2024, 9:33 AM

#

pale lark Wow this model is heavily undertrained

Yeah, this Dolphin is kinda just a straight up downgrade from L3
Doesn't even have its usual upside of being fully uncensored, 'cause it's so undertrained that the decensoring tuning hasn't even fully applied and it still gives refusals.
We really should wait for 70B one

pale lark Apr 22, 2024, 9:33 AM

#

rip

#

yeah I think we will skip this one TBH

#

it's just bad

tranquil stratus Apr 22, 2024, 2:37 PM

#

molten stratus Yeah, this Dolphin is kinda just a straight up downgrade from L3 Doesn't even ha...

It doesn't give refusals at all to me, just adjust the system prompt to say "X is allowed"

It is undertrained though so yeah.

#

I think Eric should have initialised it from LLAMA instruct to overwrite its safety instead while benefitting from its existing training

molten stratus Apr 22, 2024, 2:43 PM

#

tranquil stratus I think Eric should have initialised it from LLAMA instruct to overwrite its saf...

Well, Unholy one does exactly that afaik

tranquil stratus Apr 22, 2024, 2:44 PM

#

molten stratus Well, Unholy one does exactly that afaik

Yes, I've been looking around for Q3 quants. Too lazy to quant myself, might do at some point
