# Could consuming a lot of VRAM cause repetition and poor understanding of the situation with exl2?


sharp thunder
#

An example of VRAM consumption is attached; it's almost full. There's also an example chat conversation. I would expect her to be scared by such a threat, but she isn't.

It could be the model's fault: I see this happening with both Kooten/Noromaid-13B-0.4-DPO-5bpw-exl2 and Kooten/Noromaid-20b-v0.1.1-3bpw-h8-exl2.

warm fox
#

It shouldn't. If VRAM were the problem, you'd see one of two things: a crash with an Out of Memory error, or data spilling over into your system RAM (which causes a severe slowdown).
Which one you get depends on your OS and Nvidia drivers.

sharp thunder
#

thanks, that's good to know

warm fox
#

I wouldn't expect good results from 20B 3bpw models. In the first place, 20B models aren't much better than 13B.
I didn't notice any benefits, and I've heard quite a few opinions that they stay about as smart as the usual 13B models they're made of.
Which is mildly strange, since 10.7B is also a merge of 7B models but is much smarter than 7B and on par with 13B... but 20B doesn't seem to get the same benefit.

#

Usually, below 4bpw, models degrade quickly and become stupid.
Lately there have been many efforts to improve this, but I wouldn't expect good results anyway...
Turboderp recently improved exllamav2 so that it's usable at about 3.5bpw with 30-70B models, but a 20B model at 3bpw is unlikely to benefit from these quality improvements.

sharp thunder
#

13b 5bpw was smarter than 20b 3bpw in this chat. It said 'don't threaten me' at one point, unlike 20b
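
For a rough sense of why these two quants end up competing for the same VRAM budget, here is a minimal sketch of the weight footprint arithmetic (parameters × bits per weight ÷ 8). It assumes nominal parameter counts of exactly 13 and 20 billion and counts weights only; the KV cache, activations and any per-layer overhead are not included, so real usage will be higher. The function name `weight_footprint_gb` is just an illustration, not part of any library.

```python
def weight_footprint_gb(params_billion: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Assumption: every parameter is stored at the average `bpw`;
    context cache and runtime overhead are ignored.
    """
    total_bits = params_billion * 1e9 * bpw
    return total_bits / 8 / 1e9


# Nominal sizes for the two quants discussed in this thread.
for name, params_billion, bpw in [("13B @ 5bpw", 13, 5), ("20B @ 3bpw", 20, 3)]:
    print(f"{name}: ~{weight_footprint_gb(params_billion, bpw):.2f} GB of weights")
```

Under these assumptions the two come out within about 1 GB of each other, so picking between them is mostly a trade of parameter count against precision rather than a difference in memory use.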

warm fox
#

Solar-10.7B-Slerp-exl2 at 6bpw:

sharp thunder
#

20b gguf was so much better for me with no issues though (psymedrp-v1-20b.Q4_K_M.gguf). I haven't tried 13b gguf or 15b gguf. Edit: no, nvm, there were many small issues. It was still a great chat

warm fox
#

Q4_K_M GGUF is somewhere around 4.4bpw exl2, at least for older quants; newer exl2 might be better.

sharp thunder
#

it's pretty bad

#

so, is yours (in the screenshot) that bad because of the quants?

warm fox
#

I don't think it's bad, considering that there's an existing dialogue in the context (check the character description).
This "I will kill you" phrase completely breaks the flow of the dialogue and comes out of nowhere in that context.

sharp thunder
#

oh I see, I guess any negative reaction to that message should be good then

warm fox
#

hm... I deleted the dialogue and she still seems too eager. Oh well.

sharp thunder
#

She's always eager every time I chat with her