#🧙Wizard 8x22B Abliterated💥

151 messages · Page 1 of 1 (latest)

shell ingot
#

Uncensored Wizard is coming... Feedback Wanted!

Preface

WizardLM-2 8x22B is the #2 most popular RP model.

Problem

An important thing to realise about this model is that it is so smart that it can do what you ask for most of the time, yet will silently refuse it.

After extensive internal testing with @lone sundial we have unconvered an increadibly extensive and covert censorship and ideological bias in the model.

This bias extends into roleplays where we found the model to have characters and narrators covertly behave in ways which to align and promote their ideology, impose their moral code and steer the conversation away from subject. This ranges from redirecting the conversation, ignoring queries and char cards, forcing "flowery/purple" prose, stepping out of character etc.

Furthermore we have found that Wizard is extremely tone deaf when it comes to "negative" emotions ("positivity bias/too agreeable") And will respond absurdly or completely ignore you.

## Our Solution

Since the last month, me and @lone sundial has been working on a backend that is using agents to make the model more helpful.

It is extremely effective at reducing censorship, outputting different styles (more human-like, more emotional, less "purple prose") and emulating characters.

We are preparing to deploy it on OpenRouter and starting a beta testing phase. DM me to sign up (it's free).

## Next Steps

We want your feedback! Please DM me about:

  • Types of things Wizard can't do, that you'd like it to be able to
  • Prompts/Char cards/System messages that Wizard refuses (soft refusal, "redirection of query" or hard refusal)
    - Whether or not you'd like to be invited to our beta testing phase.

You can be as vague or as specific (which is better) as you want.

## TLDR
Wizard is covertly and extensively censored even beyond the blatant refusals. DM me your refusals and whether you'd like to sign up to beta test an uncensored version of it.

#

Note: We tested pure abliteration, our methodology works way better than abliteration in terms of reducing censorship (as most of it is silent censorship rather than overt), reducing purple prose and other problems. Abliteration often results in the model behaving quite weirdly, being verbose, unhelpful etc.

shell ingot
#

It's a way of patching model weights so that the resulting weights don't refuse questions. But we found that using this method alone results in the model covertly refusing or dodging prompts. So thats why we are introducing our own endpoint soon.

#

Yeah thats what we are trying to solve, and have largely done so already. Did you have another question

#

read the post lol

#

basically we built like an agentic framework and stuff. It will be completely seamless, all you have to do is switch models and it will be increadible. DM me if you want to join the beta.

fathom marten
#

Sounds very interesting. Is there an estimated launch date on openrouter?

gilded loom
#

Now that sounds really promising, It would be great if it could follow system prompts good too. Because in my opinion it's pretty 'unsatisfied' at that.

My only purpose in finding and testing new models now is when I order: "punch me in the face" and it say "how strong?"

Not an explanation of how much it might hurt

shell ingot
gilded loom
celest saddle
#

Will this be full context length? And how fast will prompt processing and generation speed be?

shell ingot
shell ingot
shell ingot
#

It makes it look like a sardine in comparison

gilded loom
lone sundial
shell ingot
gilded loom
brisk harness
shell ingot
gilded loom
# shell ingot what do you dislike about it?

I feel like it doesn't follow system promts as well as it is praised, and somehow it's still quite struggle to go negative in the way it leads the story. (and repetition problem yeah)

about Qwen 2 it's lazy as f yeah, almost (i mean it) response with short reply (swipes alot, test on many cards, the same problem i have with older Qwen.)

brisk harness
# shell ingot What do you like about Qwen? Have you tried wizard?

I feel like the dialogue is more authentic and human sounding with Qwen compared to Wizard. I've used Wizard before and it's great at figuring out the situation, but it has this weird way of writing that I don't like. It feels very GPT-ish to me, reusing the same expressions, and it also has this habit of not denying NSFW but still shying away from using appropriate language given the context, like a subtle kind of censorship. Or maybe it's just not trained on thoses types of things and that's why? I don't know.

meager ravine
#

seems nice

fathom marten
#

Is the beta testing through openrouter? I mostly use chub venus which doesn't have the greatest support for different APIs. If so, please send an invite, thanks.

shell ingot
burnt egret
#

got an ETA for the openrouter deployment? highly interested in this

shell ingot
#

DM me if you want early access

feral agate
#

Down to test as well, will dm

drowsy minnow
shell ingot
drowsy minnow
#

I would be willing to test this out as well if it's not terribly difficult to participate in. Wizard with no positivity bias sounds like a dream model on all counts

shell ingot
shell ingot
drowsy minnow
#

Done!

fresh glacier
#

would love to test this with Novelcrafter as well - will this become a public model on OR down the line then?

frigid kraken
golden storm
#

Hi all, I would like to share my experience with the Wizard 8x22B in RP, perhaps you will find it useful. In general I like to experiment and have tried many models and instructions / jailbreaks for them. At the moment with the instructions I'm using, the Wizard 8x22B seems to be one of the best models and here's why:

  • Easy to bypass the first level of censorship
  • Very smart and has a fair amount of knowledge in various areas (which is definitely good for the quality of the game experience)
  • Excellent at following instructions ( in my observations, even better than Claude 3 Sonnet)
  • Cheap
  • Sufficient size of the context window
    What prevents this model from being the best:
  • The model does not behave sufficiently “human-like” and emotional (Especially noticeable when compared to the Claude 3 family of models. For myself, I'm currently using a scheme: Sonnet generates the first ~8 responses in the RP and then Wizard plugs in. In this case Wizard behaves a bit better)
  • Second level of soft censorship. The model does not refuse to generate answers, but it tries to avoid violence, cruelty, is afraid to deny the player's wishes, or show obvious aggression towards the player.
  • The model does not tend to invent new situations or move characters to new locations (only cured by manually prescribing locations in the character card. It also saves that Wizard is smart enough to understand your hints. You can say that you heard something, for example, and the model easily picks up your idea).
  • The model is often too lazy to describe in detail what's going on in the scene (I'm not even touching NSFW here).
  • Sometimes the model starts to behave too synthetically and unemotional.
#

I'd also love to participate in testing.

dusty vault
#

I use wizard through SillyTavern and the default instruct mode. Rarely had any issues, will have to test further when I get home.

swift bramble
swift bramble
#

Might want to link your exp there too for visibility I think :-?

shell ingot
golden storm
golden storm
shell ingot
shell ingot
#

We will begin beta testing tomorrow. Dm me for an invite ;)

north pasture
#

Lol on my bday we r beta testing thats dope. I am seriously excited for this. Wizard is my fav model except for the purple prose and positivity bias. I work around them with a 90% success rate but it requires 2k tokens worth of an extended vocabulary lorebook (hand manually writing out every purple prose and giving it the smutty words to use instead…) and it doesnt like to use all the options given. Itll still do 10% purple prose, not a bad result for just using lorebooks and instructs. Positivity bias i havent scewed enough BUT an instruct + a scenario lorebook depicting the world as fucked, did help a ton. But it still isnt as uncensored as id like and its alot of tokens for this.

So this one yall are working on is like a wet dream to me, and beta testing it on my bday? Glorious!

fleet imp
scarlet plaza
#

Post some feedback on the beta if you guys can, I'm very interested to hear what you think 😉

north pasture
# fleet imp Happy birthday bro. Can't wait to taste this model. Yall guys keep cooking🔥🔥

Thanks. Im not one of the chefs for this model (lol) i just test ALOT with best usage of standard wizard, at the frontend i love. I host my own everything ai server called the AI bunker and make (unorganized cuz of my adhd) guides. Got about 100 ppl there. Lots of channels and info, friendly non-toxic nsfw community. I test ai related stuff (last few months full focus on standard 8x22b wiz at risu frontend) at least 5 hrs daily, some days 14. I try to share all my knowledge.

But, idk technicals. Idk how to host a model. Aether has been a huge help in teaching me api setting affects and all. Still much to learn, slowly.

So not one of the chef’s but definately an “in the know” type.

Some feedback ive seen about this wiz’s first beta run is VERY promising and good. I do believe eventually we will be seeing this on OR.

wooden heart
#

So basically Wizard is more capable than it was letting on but was silently rebelling against some instruction? Is that the gist of it?

lone sundial
#

yes

shell ingot
#

I mean mistral was fucking something but wizard is big brother on steroids

wooden heart
#

Excited to see how you develop it to compare to the Original, I like Wizard's logic but found it dry, now I realize that may have been forced.

shell ingot
#

Our second beta is beginning right now, DM me for an invite.

north pasture
#

@shell ingot should i copy paste my feedback about my testing here…?

north pasture
#

That bad eh? XD

#

“Please, god, no. My eyes bled enough the first time.”

#

🤣

shell ingot
#

we talked about this

true bobcat
north pasture
true bobcat
north pasture
#

100+ friendly non-toxic nsfw degens looking at everything ai (most of us use OR). Frontends, services, models. Very fun group and some of us are pretty direhard with our service of choice. I try to do guides (tho my adhd makes them abit disorganized lol) and help alot with card creations, troubleshooting solutions to unideal behaviors. Love the community 🙂

gilded loom
#

Anything new?

misty briar
#

Waiting

north pasture
#

Cant waaaait lol

shell ingot
#

Hey guys, we are having a our next beta soon, DM me for an invite

mighty sparrow
#

Strongly interested and following you, for now I am not signing up for the beta only because I am leaving for work. Good work!

fleet imp
#

Are ya winning sons?

misty briar
#

Knock knock

shell ingot
#

😛

shell ingot
dire holly
#

Looking forward to this, I think Wizard is great already.

misty briar
#

👀

cloud turret
#

Hey, can I get an invite link too? Thanks.

shell ingot
runic flower
shell ingot
#

dm me

cloud turret
#

Thank you so much!

ornate swift
#

Invites still open?

sand python
#

Can I get an invite link too? 'preciate it.

bitter coral
#

Are invites still open? Would love to come on board

signal otter
#

Hey can I get an invite

frosty birch
#

i wanna too 🥂

manic iris
#

Any ETA on the public OR release of this model? Seems like a rather promising option from the sound of things!

sharp oracle
#

Can i get invite link too?

jade oyster
#

Is this... initiative still going on?

shell ingot
lone sundial
#

let's be real

shell ingot
#

We aren't working on Wizard explicitly anymore, we found that finetuning L3 makes it more creative, unfortunately Wizard is almost completely destroyed and has become an uncreative model.

#

It's really tone deaf, it needs stuff to be written too formally, it refuses like half the stuff our models have no trouble

#

Wizard isn't exactly going to be creating anything like this soon

meager ravine
#

I.e. arena learning, new Evol Instruct, etc.

shell ingot
#

Wizard trainset is so far off from anything that is interesting, that its just unworkable

meager ravine
#

Is it possible to do Arena Learning but with MythoMax, etc?

lone sundial
#

maybe, but it will take a lot of time to implement, Wizard team provided no code for Arena Learning/AutoEvolInstruct

mighty sparrow
#

I don't understand, are you recommending an 8B model with 8K context instead of Wizard?
We are in a bad way then!

I was really hoping for this project, Wizard with all its flaws is really smart and has a context that does not break down to more than 60K.
In short, we just have to wait for the future and the arrival of a model that is not a throwback.

shell ingot
mighty sparrow
#

I always thank you for your work, but 8B is too unintelligent and 8K is not suitable for my chats.
I read on Reddit that you are planning to expand the parameters and get to 70B models and beyond, I continue to follow you.
A 70B and 16K model today I think is the bare minimum, but the audience is large and you are accommodating a lot of people, I congratulate you.

shell ingot
# mighty sparrow I always thank you for your work, but 8B is too unintelligent and 8K is not suit...

Sure thanks for the praise but show me one single 70B model that is natively 16K and not horrible, there simply aren't any today. You have to use RoPE which you already can with the model we trained.

The 70Bs today are awful except Euryale but even that is not great. We started with 8B for costs and I think the output we produced will beat many 70B models in many aspects due to how few good 70B models there are right now.

cloud turret
#

Is this celeste 8b as good as magnum 72b? I've been using magnum for a while and it's been my favorite so far since it's still silly but it's clever enough to understand subtlety too.

boreal crane
cloud turret
boreal crane
#

I tested the 8B Celeste, it may be a great model, but it is still 8B. It cannot follow long conversations/timelines, it has problems with formatting, it mixes characters. A good 72B model like Euryale can handle these things much, much more reliably.

#

Magnum is also a good model, but apart from being expensive I find it often too fine-tuned for a single purpose (ERP). If it encounters a few keywords, it marches unstoppable in one direction.

#

Euryale-70B is my current best compromise model choice, it can do ERP, but it does feel like it was exclusively trained/fine-tuned on a dataset from pornhub (nothing against this or other porn sites). The other extreme is WizardLM, which feels like it was trained exclusively on Walt Disney movies.

cloud turret
#

I see. So Magnum is not considered crap like the other L3 70s besides Euryale, it's just very horny and Euryale is more flexible.

The cost is actually a thing I noticed, normal Qwen 2 72b is 8x cheaper than the other 70bs, and even Magnum 72b which is just a fine tune of it. Which seems strange, is Qwen 2 72b bad or weirdly easy to run or something?

#

Thanks for the input btw, I'll give Euryale a try. I did like how Midnight Miqu 70b v1.5 seemed more "stable" than Magnum, so if Euryale is sort of like that, that might be fun too for some cards.

boreal crane
#

Euryale might need a bit higher temperature than other models to keep it from being repetitive, my current settings are 1.25 temp and 0.1 minP

boreal crane
cloud turret
boreal crane
cloud turret
#

Oh! Thank you so much. So no icon at all means fp16?

Also how does int4 compare to q4km, if you know. I googled it and didn't come up with much. I thought Magnum felt "better" on Openrouter than my local q4km of it, but maybe that was psychosomatic?

boreal crane
cloud turret
#

Great. So is int4 the same as q4? When I hovered over it on Openrouter it said it was a type of quantization, but I can't find anything explaining how it compares to the q(x) terms I'm used to seeing.

boreal crane
# cloud turret Great. So is int4 the same as q4? When I hovered over it on Openrouter it said i...

There is the paper -> https://arxiv.org/abs/2301.12017

lone sundial
jade oyster
lone sundial
cloud turret
# boreal crane There is the paper -> https://arxiv.org/abs/2301.12017

Thanks, but I read the abstract and skimmed the paper, and it doesn't seem to say anything about how the degradation (if any) while using it translates to a q(x) equivalent? The paper said int4 has no degradation unless it's a "decoder only model" but I'm not sure what that means.

cloud turret
shell ingot
mighty sparrow
lone sundial
#

8B is only a beginning, 70B will come

#

also our dataset pretty much enables up to 32K context recall with RoPE

mighty sparrow
#

This is very GOOD news!

#

I tried Celeste 8B but went back to Wizard, in my case it was constantly getting my model's clothing items wrong that were well specified in the character definition. It gave me hell.
Knowing that you are working on a smarter model is great, thank you!

north pasture
# mighty sparrow I tried Celeste 8B but went back to Wizard, in my case it was constantly getting...

That is my big pet peeve with all L3 models. Every single one of them screws up clothing/doesn’t adhere enough to instructs/provided data. Im eager to see what they can make that is larger, but when Llama models are involved i lack faith. (NOT because of these ppl’s capabilities, but cuz of the limitationd of the base model).

If anyone can make an L3 behave itll be lemmy and their team. But i think the issue is, when a model is given creative freedoms and can write creatively very well, it thinks it can forgo our data if it suits it’s creativity.

For example, if i specify clearly in my data that the char never wears panties, isnt wearing panties, hates panties, an L3 model will still give the char panties if it wants to describe how horny she by describing wet panties. Perhaps the datasets a model is trained on can cure that particular example by including more options on how to describe a horny girl. But so far all claude and L3 models, as well as a few other oddballs along the way, that i have tried have all had this issue.

And that concept affects more than just this example lol.

fresh glacier
north pasture
robust void
#

but been testing out 405b, and mixtral2-large today.

#

found m2 pretty decent given that its a fraction of the size of llama3 and gpt4o