#turboderp/Cat-Llama-3-70B-instruct

calm vale
#

Fine-tuned on over a billion tokens of data using a custom pipeline, Cat is one of the best-performing Llama-3 fine-tunes. It has highly advanced chain-of-thought prompting and is resistant to refusals.

https://huggingface.co/turboderp/Cat-Llama-3-70B-instruct

Make sure to use the correct BOS token: <|begin_of_text|>

There is also an 8B version which you can find here:
https://huggingface.co/TheSkullery/llama-3-cat-8b-instruct-v1

Personally I think the model is amazing, and I'd love to hear what you think after trying it out.

For the system prompt, the model works best with statements like "The following is a conversation between..." or "Below is..." rather than "You are..." or "You must..." statements!

70B EXL2 REPOS
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw2.25-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw2.5-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw3-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw3.5-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw3.7-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw4-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw4.2-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw4.4-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw4.6-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw4.8-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw5-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw5.5-exl2
https://huggingface.co/blockblockblock/Cat-Llama-3-70B-instruct-bpw6-exl2

70B GGUF REPOS
https://huggingface.co/mradermacher/Cat-Llama-3-70B-instruct-i1-GGUF
https://huggingface.co/mradermacher/Cat-Llama-3-70B-instruct-GGUF

8B EXL2 REPOS
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw2.25-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw2.5-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw3-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw3.5-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw3.7-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw4-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw4.2-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw4.4-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw4.6-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw4.8-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw5-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw5.5-exl2
https://huggingface.co/blockblockblock/llama-3-cat-8b-instruct-v1-bpw6-exl2

8B GGUF REPO
https://huggingface.co/bartowski/llama-3-cat-8b-instruct-v1-GGUF/tree/main

hazy finch
#

here are the cots:

#

here is the ERP

crimson gyro
#

cc @regal vortex wonder what you think about this model

regal vortex
regal vortex
#

I can't get it to output more than one sentence💀

#

maybe 4bit exl2 quant is broken

crimson gyro
#

rippp

regal vortex
#

Got the official 4bpw EXL2 for 70B, still broken

#

Even with skip EOS token and skip special tokens enabled

#

I think it spams EOS tokens

#

oh boy

#

it spits out special tokens
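
A quick way to confirm the symptom is to scan the raw completion for special tokens. This is only a client-side diagnostic sketch, not a fix: the `<|...|>` token pattern and the `<|im_end|>` stop string are assumptions based on Llama-3 and ChatML-style tokens, and `find_leaked_tokens` and `truncate_at_stop` are hypothetical helper names.

```python
import re

# Assumption: special tokens look like <|name|>, as in Llama-3 / ChatML.
SPECIAL_TOKEN = re.compile(r"<\|[a-z0-9_]+\|>")


def find_leaked_tokens(text: str) -> list[str]:
    """Return every special-token-looking string found in a completion."""
    return SPECIAL_TOKEN.findall(text)


def truncate_at_stop(text: str, stop: str = "<|im_end|>") -> str:
    """Cut the completion at the first stop string (assumed ChatML end tag)."""
    return text.split(stop, 1)[0]


sample = "Nyan!<|im_end|><|im_end|><|eot_id|>"
print(find_leaked_tokens(sample))
print(truncate_at_stop(sample))
```

If the list is non-empty on most replies, the backend is probably not treating those tokens as stop tokens.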

crimson gyro
#

interesting..

regal vortex
crimson gyro
#

are you prefixing the bos token?

regal vortex
crimson gyro
#

cc @hazy finch

regal vortex
#

Tried a few samplers - Top-P, Min-P, TFS, Smoothing Factor - doesn't work on anything

#

so yeah

hazy finch
regal vortex
hazy finch
#

@dense quiver

#

could you help with this?

dense quiver
#

what part

hazy finch
#

it appears aetherwiing cannot get it to work on st

#

what's the story string you are using ?

regal vortex
dense quiver
#

I haven't used aphrodite but I assume it does use correct bos

regal vortex
#

oh boy, is it a reverse of lumimaid-70B situation lmao

dense quiver
#

show what the st terminal shows before the prompt

hazy finch
#

yeah seeing the raw input will help with debugging

regal vortex
#

Or you need something else?

dense quiver
#

the thing before that

#

bos shouldn't show up there

regal vortex
hazy finch
#

<|begin_of_text|><|im_start|>system
The following is a conversation between an user and a cat having dual degrees in biomedical sciences and quantum physics. The cat ends every its response with Nyan and does cute actions<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Cat: Nyan! Hello there! How are you? It's always great to meet new friends. My fur is a bit ruffled, but I'm feeling good today. How about you? Is there anything you'd like to talk about or ask? I'm not the best at answering questions, but I'll do my best to help out. After all, that's what friends are for, right? Let's chat and have a fun time together! Nyan!<|im_end|>
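
The layout above can be sketched in Python roughly like this. It is a minimal sketch that assumes the format is exactly what the sample shows (BOS, then ChatML-style `<|im_start|>`/`<|im_end|>` turns); `build_prompt` is a hypothetical helper, not part of any backend or frontend.

```python
BOS = "<|begin_of_text|>"


def build_prompt(system: str, turns: list[tuple[str, str]], add_bos: bool = True) -> str:
    """Assemble a ChatML-style prompt that ends with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    body = "".join(parts)
    return BOS + body if add_bos else body


prompt = build_prompt(
    "The following is a conversation between a user and a cat.",
    [("user", "Hello!")],
)
print(prompt)
```

Set `add_bos=False` if the backend already prepends the BOS token, so it does not end up in the prompt twice.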

#

^ see this

regal vortex
dense quiver
#

st puts character cards into system, I think it's normal

regal vortex
#

Formatting seems right

regal vortex
hazy finch
dense quiver
dense quiver
#

either way what I'm looking for might not be there

#

oh it's at the bottom

regal vortex
dense quiver
#

add bos token is on so it should be working

dense quiver
regal vortex
dense quiver
#

what I will point out is the system prompt having some first person wording

regal vortex
#

Also it does that on char cards in various formats

#

e.g. W++

#

which doesn't have first person

dense quiver
#

I'll try looking through it a bit more, give me a sec

hazy finch
#

prompt: '<|begin_of_text|><|im_start|>system\n' +
"Below is a conversation between Kisna and Awii.\n" +
"[Awii is a young male in his early 20's. He has chocolate brown hair and gray eyes. He has a pretty average build.]<|im_end|>\n" +
'<|im_start|>assistant\n' +
'Kisna: Your assistant Kisna walks on over to you, her blue eyes staring at you before she sits herself against the side of your desk. As she adjusts your hair, you notice her outfit which consists of a white, opened jacket, gray shorts, some long brown gloves, a brown bodysuit and a pair of brown, thigh-high boots. She crosses her legs before she looks down at you, sighing deeply before beginning to speak.\n' +
'\n' +
"Welcome back Awii. Why did you wake me up so early? ...What are you dawdling for? O-of course I rested well! It wasn't like I was thinking about you all night or anything! She crosses her arms towards her chest before pouting. Anyway, are we doing anything today or are you just going to have me sit here? Standing idle is boring you know?<|im_end|>\n" +
'<|im_start|>user\n' +
'Awii: Write a funny joke<|im_end|>\n' +
'<|im_start|>assistant\n' +
'Kisna:',

#

Could you please try this?

#

lets play it safe first

regal vortex
#

no spitting out of special tokens so far tho

hazy finch
#

you are using the original fp16 or some sort of quant?

regal vortex
#

4bpw

hazy finch
#

of that picture

hazy finch
regal vortex
#

Also tried, same thing

dense quiver
#

have you always used aphrodite and it worked?

hazy finch
regal vortex
hazy finch
#

This looks right, except I don't see a BOS. What's the output of this, and what happens if you add a BOS?

dense quiver
#

bos doesn't show up on st log, only shows up as a parameter

#

it is enabled though

regal vortex
#

and adding it to context template doesn't change anything

dense quiver
#

also have you always used aphrodite?

regal vortex
regal vortex
#

Even with problematic models like Lumimaid it worked fine

dense quiver
#

oh okay

dense quiver
hazy finch
#

Could you give anthony the link to test?

regal vortex
regal vortex
dense quiver
#

I haven't used 70b so I can't really tell

hazy finch
regal vortex
dense quiver
#

it was trained differently

regal vortex
#

so it happens sometimes

hazy finch
#

Let me finish my lunch and i will look into this if Anthony couldn't find what's wrong

dense quiver
#

yeah, I am getting the issue

regal vortex
#

btw I have eager mode enabled and fp8 KV cache
It didn't cause any issues in the past

hazy finch
#

Any possibility the model is corrupted?

regal vortex
#

It's runpod it has good internet

hazy finch
#

what does eager mode do?

regal vortex
#

Also an unofficial EXL2 4bpw quant was also broken

dense quiver
#

oh yeah, aphrodite doesn't show bos option in st

#

does aphrodite have some kind of log?

regal vortex
regal vortex
hazy finch
#

that will be really useful

dense quiver
#

try turning this off

#

it seemed to help

hazy finch
#

Just "seems"?

dense quiver
#

it gave me a complete response with it off

regal vortex
#

ok enabling logs

#

and restarting

hazy finch
dense quiver
regal vortex
regal vortex
#

oh f, it drops the weights on restart, will have to redownload

#

back up

dense quiver
#

tell me if it makes a difference

regal vortex
#

Aphro simply doesn't work I guess

#

First time since ever

dense quiver
#

oh, I got a single complete response but yeah back to not working

regal vortex
#

Was this model tested on like vLLM or Ooba? Or anything besides Tabby?

dense quiver
#

only place I know is on her own private ui

#

@hazy finch

regal vortex
#

what kind of backend this was tested on?

dense quiver
#

I don't really know. I can't run over 20b

regal vortex
#

Maybe it doesn't work because Aphro uses its own EXL2 implementation

#

Tho it didn't cause any issues for me previously

dense quiver
#

maybe but probably not

hazy finch
#

what log is aphro showing?

#

well gguf obviously also works

#

also the exl2 was quanted with turbca's newest commit to exllama
so if aphro's implementation is old, it might not parse the weights correctly

regal vortex
regal vortex
#

Aphro 0.5.3 btw

dense quiver
#

yeah, not getting much luck with changing around settings

regal vortex
hazy finch
#

by any chance do you think you can try tabby?

#

could you pm me the st link?
I can test now

regal vortex
regal vortex
regal vortex
hazy finch
#

do you happen to be able to expose the st url?

regal vortex
hazy finch
#

so @dense quiver was using his own UI with your api?

dense quiver
#

yeah

#

st is just a node webapp

hazy finch
#

that's weird

#

@calm vale could you help with @regal vortex with setting up cat api?
like you did on runpod the other day

hazy finch
dense quiver
#

maybe

hazy finch
#

the other day stanley setup the api and it worked perfectly with st

dense quiver
#

he does use tabby too

#

I think

hazy finch
#

yeah he mentioned it "you just run start.sh"

dense quiver
#

yeah

hazy finch
#

I think most likely stanley is getting his sister from school atm

calm vale
#

I got back a little while ago

hazy finch
#

bruh

hazy finch
regal vortex
#

No template for Tabby on Runpod bruh

calm vale
#

yeah I just use pytorch 2.2.0

#

then install it all

dense quiver
#

does runpod allow for terminal usage like ssh or something

regal vortex
#

I guess I kill the aphro spot and start a pytorch one

dense quiver
#

should be easy to install manually then

#

just git clone tabby and setup config then start

regal vortex
#

ok

#

tho for the future releases - I really recommend testing them on like Ooba, vLLM and Aphro

#

most ppl use those

#

(vLLM is irrelevant for EXL2)

calm vale
#

I was able to get them to work on ooba 😅

regal vortex
hazy finch
#

bet a potato it's aphro's problem

#

instead of the frontend

regal vortex
#

lemme try Ooba

hazy finch
#

and also I was worrying about the BOS
none of the debug log shows bos being added
if you are confident aphro adds it secretly AND manually adding it won't work...then ok
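
One way to take the guesswork out of the BOS question is to normalize the prompt on the client so it carries exactly one BOS, or none if the backend is known to add it. A minimal sketch under those assumptions; `normalize_bos` and the `backend_adds_bos` flag are hypothetical names, not an Aphrodite or SillyTavern setting.

```python
BOS = "<|begin_of_text|>"


def normalize_bos(prompt: str, backend_adds_bos: bool) -> str:
    """Strip any leading BOS tokens, then add one back only if the backend won't."""
    while prompt.startswith(BOS):
        prompt = prompt[len(BOS):]
    return prompt if backend_adds_bos else BOS + prompt


raw = BOS + BOS + "<|im_start|>system\n"
print(normalize_bos(raw, backend_adds_bos=False))
print(normalize_bos(raw, backend_adds_bos=True))
```

Trying both values of the flag against the same backend shows quickly whether a missing or doubled BOS is what breaks generation.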

dense quiver
#

@calm vale what did you use to expose tabby api publicly?

regal vortex
hazy finch
regal vortex
hazy finch
#

then in pod forward the port

#

try try tabby

dense quiver
#

I was just wondering if it allowed port forwarding or if he used some kind of reverse proxy

regal vortex
#

ok, I'm trying Ooba, if it breaks, I'll try Tabby
If Ooba works, just add smth like "Aphro <=0.5.3 doesn't work" to model card

dense quiver
#

ah okay

regal vortex
hazy finch
#

stanley mentioned it was like very easy
I think he got one set up in 10min from scratch

#

the day before yesterday?

dense quiver
#

yeah, I just use something like ngrok as reverse proxy

#

my router doesn't support port forwarding

hazy finch
#

openwrt can support port forwarding

dense quiver
#

I am not installing linux on my router

hazy finch
#

too bad

regal vortex
#

to think about it - all this ~200msg crisis wouldn't have happened if there was an AWQ quant

hazy finch
#

last time awq was mentioned I heard ppl saying gguf and exl2 were the only thing that's relevant

dense quiver
#

tried awq and it's just kind of slow for me

regal vortex
regal vortex
#

@hazy finch IT WOOORKS on Ooba

dense quiver
#

it was a long time ago though

#

nice

calm vale
#

uh...

regal vortex
calm vale
#

do you guys still need the API key and url then

regal vortex
dense quiver
#

not my model

calm vale
#

lol ok

hazy finch
regal vortex
calm vale
#

thanks for testing it out, aether

#

and letting us know

dense quiver
hazy finch
#

yeah I can do it tonight
I think turbca is taking noon nap atm

dense quiver
#

sleepy cat

#

I think he did wake up earlier today

hazy finch
#

before falling into sleep again

hazy finch
regal vortex
#

Trying some more complex system prompts /w CoT and complex cards

#

Actually, it already produced some of the best output I've seen from l3 finetunes I think

hazy finch
#

please let me know if you cant get it to work
since the model doesnt take instruction system card like "you are/you must"
it takes "The {{char}} is, The {{char}} will always"
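
A small check like the one below can flag instruction-style wording in a card before it is sent. This is only a sketch: the phrase list contains just the two examples mentioned in this thread ("you are", "you must"), and `find_directives` is a hypothetical helper name.

```python
import re

# Assumption: only the two phrasings named in this thread are flagged.
DIRECTIVES = ("you are", "you must")


def find_directives(system_prompt: str, phrases: tuple[str, ...] = DIRECTIVES) -> list[str]:
    """Return each instruction-style phrase found in the system prompt."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return [m.group(0) for m in pattern.finditer(system_prompt)]


bad = "You are Kisna. You must stay in character."
good = "The following is a conversation between Kisna and Awii. Kisna will always stay in character."
print(find_directives(bad))
print(find_directives(good))
```

An empty result does not guarantee the card is well phrased; it only means neither flagged phrase appears.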

dense quiver
#

also yeah, the 70b looks so good from that screenshot

regal vortex
dense quiver
#

wish we could've gotten a 13b l3

regal vortex
hazy finch
#

ok I will go back to doing surgeries
let me know if you have any difficulties

regal vortex
#

So, review of sorts - really good model, and a really unusual one. Has L3-70B-instruct level of smarts when it comes to writing, but is completely (I mean, completely) uncensored and has no positivity bias. Writing quality is quite good, pretty much no GPT-isms or such stuff. Also, it's not lazy like many L3 finetunes, and writes a decent amount of text. It's a bit picky when it comes to prompts, especially system ones, but handles stats, infoboards and other persistent details quite well. It's especially strong on RPG and CYOA type of cards (it does well even on ones where all other local models failed), as they directly play into its strengths. A prompt for CoT is mandatory it seems; w/o one it will generate CoT anyway, but with a somewhat random structure (a more structured one leads to much better gens, ofc). It also picks up on small details really well (picks up on stuff that only Midnight-Rose and WinterGoddess managed to pick up on, and that's high praise). Overall, a strong model, tho I'd say it's a bit less of an RP one than an RPG one, so to speak.

calm vale
#

thank you for the review, it means a lot. it's nice to hear that it does well with small details and is uncensored...

regal vortex
calm vale
#

a lot of effort was put into it, especially with the code that was developed to generate the data for it

#

very proud of kat for working so hard

regal vortex
#

Lack of laziness is a godsend after e.g. Lumimaid, which often barely spits out ~100 tokens.

regal vortex
hazy finch
crimson gyro
#

Just woke up, looking pretty fun here!