#Sao10K/L3-70B-Euryale-v2.1

fiery oxide
woeful mist
#

This has my vote. This is the model i want to try most next to Awiz (abliterated)

fiery oxide
#

This model is currently hosted by infermatic's community, and people are really loving it

slim cipher
#

Is it uncensored?

fiery oxide
formal steeple
fiery oxide
slim cipher
#

Let's goo, add it pls

craggy crow
#

+1

grand marsh
#

Rope to 16k and add!, it's a very competent model. Easily best I've used for L3 70.

latent sigil
#

+1

red pagoda
#

+1

#

This model is currently my favorite

#

Very good at story writing, sometimes its answers are better than Claude Opus for me

twin siren
#

+1
Tested it yesterday. Very good for both roleplay and story writing.

woeful mist
#

Those in my server trying it at infermatic are liking it a lot except for (Infermatic's issue, not the model) extremely long response times, waiting over a minute.

I really hope we see this roped to 16k context cuz 8k is just not enough.

fiery oxide
woeful mist
fiery oxide
#

and it's overloaded
vLLM can batch well, but even it has its limits

woeful mist
#

Apparently most models are pretty slow. Astoria isn't i guess tho. One of my members loved astoria cuz it's filthy. But i don't think (idk i never tried it) it follows instructs well

#

But back to topic i hope they add euryale so i can finally see what the fuss is about

fiery oxide
#

Well, it's the second most requested model here after UnleashedWiz (soon!)
OR definitely should look into this

tribal dagger
#

anywhere i can try it?

devout crow
#

id love to try it

#

id sub to infermatic but i already added some credits to my OR account a day ago damn

river granite
#

+1

hollow nest
#

Seems to be hosted on novita https://novita.ai/pricing

fiery oxide
formal steeple
#

(with 16k context if possible, ty)

hollow nest
#

sadly I think Novita only do 8k

devout crow
#

man ive been spoiled with 32k context

subtle phoenix
#

Yup working with them to add it. They said earlier today that there was some kink that needs to be ironed out, but I got the PR up already

#

Should I just merge it lol?

subtle phoenix
#

Merged, should be up in 5 mins

#

Note the responses might be gibberish

#

(That's what they told us xd)

sonic merlin
#

It's up, First test came through fine.

fiery oxide
#

bruh Novita doesn't have minp which is a must have for this model

subtle phoenix
fiery oxide
#

This model is at its peak on temp 1.5 and min-p 0.1

#

It like adores high temp with min-p
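For anyone wondering why min-p matters so much at temp 1.5, here is a rough sketch of min-p filtering as it is usually described: after temperature is applied, any token whose probability is below `min_p` times the top token's probability is dropped and the rest is renormalized. The logits are made-up toy values, and applying temperature before min-p is an assumption (backends differ on sampler order), so treat this as an illustration rather than what Infermatic or vLLM does internally.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Toy logits (hypothetical): two plausible tokens, one borderline, two junk.
toy_logits = [5.0, 4.0, 2.0, 0.0, -3.0]
hot_probs = softmax(toy_logits, temperature=1.5)  # high temp flattens the tail
filtered = min_p_filter(hot_probs, min_p=0.1)     # min-p cuts the junk tail off
```

The cutoff scales with how confident the model is, so a high temperature can spread probability across the plausible tokens without letting the junk tail through. That would be one explanation for why the model is hard to tame with only temp and top_p.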

sonic merlin
#

This model seems to love Markdown. It even spits out crazy formatted text in the middle of a text only role-play chat. Not exactly gibberish, but a bit strange nonetheless.

sonic merlin
fiery oxide
#

hm lemme test

#

on OR

#

and compare to Infer's host

subtle phoenix
fiery oxide
subtle phoenix
#

ooh it's sub

fiery oxide
#

they are too slow and ratelimited for you prob

sonic merlin
#

for 500 token responses with high latency

fiery oxide
subtle phoenix
fiery oxide
#

Unlimited

#

They are just kinda slow

#
  • default plan has 2 concurrent req limit
sonic merlin
#

each response seems capped to 500 tokens or so if I read it correctly, that is a bit annoying, esp with high latency.

fiery oxide
#

and 1500+ from their Euryale and Stheno

sonic merlin
#

Hmm: -> "512 token responses, 86,400 requests per day." for $15/month

#

On their landing page.

fiery oxide
#

Their site is kinda unfinished and outdated atm

#

There are no limits, as far as my personal experience goes

fiery oxide
sonic merlin
#

Okay, I only have gotten the markdown treatment once in about 10 tries, this seems an acceptable level of annoyance.

fiery oxide
sonic merlin
#

I am using temp 1 currently, yes, I forgot to mention this.

fiery oxide
#

well, I mostly used it with temp 1.5 on Infer lol

sonic merlin
#

As long as I don't complete replies in French or Spanish like Dolphin now I am fine.

fiery oxide
#

it really needs min_p tho

#

I hope Novita adds it soon

fiery oxide
#

tho this is more ideal, but again no min_p

sonic merlin
#

(the first settings from you I am referring to, of course)

#

I usually stick to temp 1 and do not get high as models tend to freak out/produce gibberish (the only other models I tried with very high temperature and that did not completely go bonkers right away were GPT-3.5/4, but I last used them 6 months ago or so)

fiery oxide
#

Damn it's really hard to tame with just temp and top_p

daring shale
#

I’m here

daring shale
fiery oxide
#

or are we a tad bit too slow for that lol

daring shale
#

Only on the discord

daring shale
#

We're working on that

fiery oxide
daring shale
#

Nah, on the output speed

#

I didn't know we could be associated tho

fiery oxide
daring shale
#

How does OpenRouter work with other companies?

subtle phoenix
daring shale
subtle phoenix
daring shale
subtle phoenix
#

cc @idle grotto

daring shale
subtle phoenix
daring shale
#

Sure!

subtle phoenix
#

Thanks @fiery oxide for the intro kek

fiery oxide
mortal cove
#

Hmm definitely experiencing the gibberish responses warned about above.
Very excited to try it out once that's ironed out though!

cosmic yew
#

Would be awesome if Infermatic and OR did work together. Tried the former, couldn't figure out how to get it working on TypingMind so staying here even though I roleplay primarily with TypingMind and Infermatic seems to excel with the RP models. Anyway, looking forward to trying out this new holy grail of models.

fiery oxide
subtle phoenix
fiery oxide
subtle phoenix
#

we doing messages -> prompt

#

and I'm pretty sure most of infer model do prompt right

#

(we actually do both tbh xd, it's wonky but... work thus far)

fiery oxide
#

hmm, Infer has some problems with system role not being supported (or at least had) and some with strict user->assistant order too

fiery oxide
subtle phoenix
fiery oxide
#

They don't do any formatting besides what vLLM and Aphro do

livid violet
#

I don't wanna hijack the conversation or anything, but I can see the model is already available on OR through novitaAI. Thing is, when I try to run it through ST it spits out error404. Overloaded servers?

fiery oxide
#

Yea it went offline wtf

daring fox
#

Is Infermatic not slow as fuck anymore? Back when I used it there was consistently 30 seconds to first token at a minimum.

#

And that was on the 70B models, their 120B was like 60 seconds. Idk if times like that are acceptable for OR.

#

Though, being able to access those models without paying $15 up front would be nice

fiery oxide
#

Infer sped up somewhat, and Svak said they are still working on better speed

#

Astoria is like 15t/s usually (4th by usage)

daring fox
#

Is that latency with empty context?

fiery oxide
daring fox
#

I remember it getting really slow when I pushed past 8k, but that was a couple months ago

fiery oxide
#

+L3s are faster than L2s

daring shale
#

Midnight/Euryale

#

And miquliz it's way better than before

#

I swear __

fiery oxide
grand marsh
#

16k on Openrouter a reality? I know Infermatic got the extension.

daring fox
#

Yeah that'd be pretty epic.

daring shale
fiery oxide
#

Nobody even noticed that on my hosts

#

and I always do fp8

fiery oxide
daring shale
#

I still have nightmares of supra asking for evidence on the fp16

daring fox
#

Is Supra finally gone? He's the reason I left the server

fiery oxide
fiery oxide
#

two times

#

I'm the only techdev role now lol

#

Infer server has been supremely friendly since Supra got kicked lol

daring shale
#

Ur free to come back now

#

xd

fiery oxide
fiery oxide
daring shale
fiery oxide
daring fox
#

Qwen is on Infermatic? Isn't that like super censored?

fiery oxide
#

it's kinda censored tho
Hope Magnum-72B will kick it out in the next poll

daring shale
#

Why dont replace it with llama3?

daring fox
#

Yeah that's the one I tried, extreme positivity bias. Reminds me of Mistral 7B tunes

fiery oxide
daring shale
#

Yeah

daring fox
#

Oh you finally fixed the website! No wonder I couldn't find the list on Discord.

fiery oxide
daring shale
#

Llama3 -> Qwen and Qwen -> Magnum or the one that wins the poll

daring shale
daring shale
#

XD

#

It's still lacking some things, but we'll get through them

hallow geyser
#

I just posted my review of euryale in the feedback section of Infermatic discord. TLDR: It's fun. I enjoy it. But for regular RP I'll be sticking to wizard, and maybe midnight for a few cards.

daring shale
#

#1200053136082079845 message

lofty stratus
#

Uhh this started happening on fourth response, everything was normal. 😨
~2.2k context, OpenRouter, NovitaAI.

fiery oxide
#

BRUH

#

Last time I saw this it was on Wiz 7B on DeepInfra after 8k

fiery oxide
#

like, does it go away with swipes?

#

hm works fine at 5K

lofty stratus
#

I started a new chat and it's still bricked...

fiery oxide
#

try like temp 0.87 top_p 0.81

lofty stratus
#

Normal settings. Going crazy regardless of settings. Tried switching to text completion to see if anything different.

#

maybe an intern at NovitaAI tripped a cable or something

#

I was having a working chat earlier today.

#

restarted ST 😓

livid violet
#

The gibberish responses are because the servers are dying

#

some problem at NovitaAI

livid violet
lofty stratus
livid violet
#

oh well

fiery oxide
livid violet
#

The limited sampler settings are the one thing that tempts me to just rent a cloud computing unit and set up oobabooga in the cloud.

fiery oxide
#

Ooba sucks, i use Aphro

livid violet
#

whats wrong with ooba?

fiery oxide
#

like, it's just not worth to use

livid violet
#

welp, haven't had any problems personally

#

only used it with 21b models at most

#

I fully switched to using OR anyways

#

because of the need for bigger models and not having multiple GPU's :c

fiery oxide
#

I do that

livid violet
#

I have an rtx4080

#

not gonna buy an A6000, sry xd

fiery oxide
#

I don't own a A6000 too sadly lmao

livid violet
#

exactly why I thought about renting a cloud service

#

it's quite cheap

fiery oxide
#

Or MI300X

#

both around 4$/hour

livid violet
#

I'll hit you up if I ever need help setting up a runpod unit, aight?

livid violet
#

The model is back up again btw

daring fox
#

Ooba does have the new DRY sampler though, I wonder if it's any good.

fiery oxide
#

Quite popular on Infer

livid violet
#

Ooba has it as well

#

I think

fiery oxide
#

I wish vLLM had more samplers

#

but it at least has min_p lol

#

beam search is cool too

mortal cove
subtle phoenix
#

Ah crap... I mistook this model with #1248338089663926313 :d

#

We only asked @bold sphinx for permission to route to Stheno, but not this one yet

mortal cove
#

Lol. So this one on OR is actually Stheno? xD

subtle phoenix
#

no it's Euryale

#

But didn't get the author's blessing yet :d

fiery oxide
woeful mist
#

I haven't seen gibberish or a lot of the issues i see in here. Wonder what I'm doing “right”. Not without issue, but no errors or gibberish lol.

cosmic yew
daring shale
tribal dagger
#

i know Response might be silly but I don't expect that silly, like bunch of random words that have no meaning?

something wrong with provider?

devout crow
#

hey all, whats the consensus on this one?

#

is it any good?

fiery oxide
devout crow
#

damn

fiery oxide
#

We have to wait until either Novita fixes it, or another provider (Infermatic) gets added

devout crow
#

im this 🤏 close to finding a cloud gpu host to run whatever i want

tribal dagger
atomic mountain
#

just discovered this model like 20 minutes ago and naturally as soon as I'm enjoying it it starts throwing 404 errors

Chat Completion API
{"code":404,"reason":"MODEL_NOT_FOUND","message":"model not found","metadata":{"reason":"model: sao10k/l3-70b-euryale-v2.1 is not available"}}

EDIT: Seems to have recovered

woeful mist
#

I'm toying with it a bit for now but really holding out for infermatic to provide it, with their roped 16k context, and stability. I just pray they can get response times better. The few i know using it at infermatic already the day it landed there said 200+ seconds for a response 😭

devout crow
#

urgh that context is killing me

#

i need at least 16k

#

but imma try this anyways

pulsar edge
#

Blank blank blank

devout crow
#

okay now im just getting 405 errors

#

damn it

devout crow
#

okay i cant lie, this is actually very nice

#

though the constant errors are really annoying

devout crow
surreal anchor
#

Yikes

devout crow
#

babe wake up new p parameter just dropped

tribal dagger
slim cipher
#

Damn... the response is almost human, almost as good as opus. I love the writing of this model. The 8k context is unfortunate but it's damn well good enough.

lilac hamlet
#

I had plenty of gibberish, too, until I removed the system prompt that comes with the instruct preset. Could it be related to markdown? Maybe it's just a lucky coincidence.

tribal dagger
#

creative, smart, I really like this model, I wish it have a larger context

#

why is my Logit Bias not sent? (it is still sent when using wizardlm2-8x22b)

sonic merlin
tribal dagger
#

well, lepton maybe

sonic merlin
#

(also extremely tricky to get these right, as they need 100% match the correct tokens)

tribal dagger
#

because with wizardLm2-8x22b it's still work

devout crow
lilac hamlet
devout crow
#

how is it now? is it any better, or just different

#

regardless, this model is an absolute blast

#

if somehow, someway a 16k context variant can happen, ill die happy

devout crow
#

Prayge doing the lords work

devout crow
#

need it ASAP for openrouter lol

#

a 32k variant will make me pass immediately

fiery oxide
devout crow
#

nuts

#

keep us updated

daring shale
#

Yep yep

devout crow
#

quality slowly but surely just degraded

#

taking this bot back to when the tower of babel fell 😔

#

also damn novita errors a LOT

sonic merlin
#

Last two requests: "504 Gateway Time-out"

devout crow
#

yeah its done that a ton today

sonic merlin
silk drift
#

smh this model always goes offline like every other request when I need to use it

sonic merlin
#

yep, it's gone for now -> 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"

#

Now I got a reply again

silk drift
#

overloaded I’m assuming; not sure what other issues would cause 3-5m intermittent blackouts

sonic merlin
#

what happens when you are not google or amazon and only have limited resources

sonic merlin
#

But now it seems overload is more likely, just got another gateway timeout
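If the 404s and gateway timeouts really are short overload blips, a client-side retry with backoff is one way to ride them out. The helper below is a hypothetical sketch: `call_with_retry`, the retryable status set, and the delays are all assumptions made up for illustration, and the fake `send` just replays the kinds of errors reported in this thread instead of touching the network.

```python
import time

# Statuses treated as transient here. 404 is normally permanent, but this
# thread reports it coming and going, so it is included as an assumption.
RETRYABLE = {404, 504}

def call_with_retry(send, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send must return a (status_code, body) tuple. Delays double each time.
    """
    status, body = None, None
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body

# Fake transport replaying two failures and then a success.
responses = iter([(504, "Gateway Time-out"), (404, "model not found"), (200, "ok")])
delays = []  # record the sleeps instead of actually waiting
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Passing `sleep=delays.append` only keeps the demo instant; a real client would leave the default `time.sleep`.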

woeful mist
outer bluff
#

To say that this model is currently unstable is an understatement.

grand marsh
#

The model is great but yeah the provider isn't stable.

silk lodge
#

quick question since I never got it set up with my own presets: could one of you share the preset they're using and having good luck with?

#

whenever I use one of my own configs meant for more traditional llms I just get garbage results

sonic merlin
silk lodge
#

Thanks. I'll try that! :D

woeful wedge
#

Model still dead. Providers are missing out, it's a great model, guaranteed to be a money maker

subtle phoenix
#

Adding infermatic now!

#

It's coming in ~10 mins

woeful wedge
#

Cheers

outer bluff
#

Since the provider of the model seems to be completely down (at least when it comes to requests for this model), I wonder why the little status blip is still showing green next to it on the site.

subtle phoenix
#

Tho PR is up, not 10 mins but it's getting there

outer bluff
#

Okay, it's just more obvious than usual since the thing's been down for hours 🙂

fiery oxide
subtle phoenix
#

Just euryale for now!

silk lodge
#

well this is an odd one.
seems the model selector in Silly is broken

fiery oxide
#

bruh I hope that won't kill latency
Tho Svak did some optimizations today

fiery oxide
silk lodge
#

Ah

#

aw :(

outer bluff
#

Check the availability tab, it's been down for some time

subtle phoenix
#

Infermatic is deployed

silk lodge
#

for me the uptime page is completely empty lol

fiery oxide
#

yeah same for me (Firefox)

sonic merlin
subtle phoenix
#

oh wat

silk lodge
#

if it helps

outer bluff
#

Still only seeing NovitaAI in the providers list. Did a forced reload

sonic merlin
#

Yeah Firefox is a bit too strict for this

subtle phoenix
silk lodge
#

same thing on Chrome 126

subtle phoenix
silk lodge
#

somehow it DOES work on GNOME Web / Epiphany

sonic merlin
#

Safari works too

silk lodge
#

So Chrome + Firefox broken but WebKit (Epiphany) works.

#

yeah GNOME Web is as close to Safari as one can get

#

without a mac

fiery oxide
#

So only WebKit works

#

Luakit works

#

so ye

subtle phoenix
#

works on brave for me :d

silk lodge
#

Hmm...

outer bluff
#

I'm using Vivaldi, which is Chromium-based like Brave, and it works fine

silk lodge
#

my chrome is fully default

#

odd

subtle phoenix
#

chrome works too:

silk lodge
#

weird

fiery oxide
silk lodge
#

same

woeful wedge
#

Model is still toast unfortunately

subtle phoenix
#

interesting...

#

maybe an iframe issue?

silk lodge
fiery oxide
silk lodge
#

X11, GNOME

#

GNOME 46 here specifically

#

using Chrome + FF through Flatpak

#

Nvidia Proprietary drivers

fiery oxide
#

GNOME 45
Radeon driver, FF installed from RPM Fusion

sonic merlin
#

I always assumed this was an iframe permissions problem, Firefox is much stricter than other browsers

#

Model is up, it seems.

#

Nah, still 404 😦

#

But no instant rejection anymore

subtle phoenix
#

It's up

fiery oxide
#

Nice

sonic merlin
#

Maybe SillyTavern needs a restart/reload to pickup the new provider

fiery oxide
subtle phoenix
sonic merlin
#

Hmm. I still get 404 via API/SillyTavern, even after hard restart

subtle phoenix
#

(tho the router is not relying on the cache to route)

outer bluff
#

Aaand working

silk lodge
#

works

sonic merlin
#

still

  error: {
    message: "{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}",
    code: 404,
  },
silk lodge
#

rebooted Silly, re-opened the tab and boom

#

:D

fiery oxide
subtle phoenix
#

Yup

#

I will just fix it as-is for now, will move it to another variant when this model has more provider I think?

fiery oxide
#

@subtle phoenix Infer doesn't log, terms were updated today

rapid juniper
subtle phoenix
#

But basically when we include the privacy policy URL, we show that tag for ppl to visit.

sonic merlin
#

Still 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"

subtle phoenix
sonic merlin
#

I try from the console with curl

#

Same with curl -> {"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}

#

It took about 30 secs and produced two pages full of newlines though until the error popped up

#

Precisely:

$ time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'
[~100 newlines omitted]
{"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}

real    0m41,330s
user    0m0,034s
sys    0m0,023s
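The error body above is JSON wrapped inside JSON: the outer `error.message` is itself a JSON string carrying the upstream provider's error. A small sketch of unwrapping it, using the exact body pasted above; the helper name `unwrap_error` is made up here, and the assumption is simply that the outer shape is always `{"error": {"message": ..., "code": ...}}`.

```python
import json

# The response body from the paste above, kept verbatim in a raw string.
raw = r'{"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}'

def unwrap_error(body):
    """Return (outer status code, decoded inner error or None)."""
    outer = json.loads(body)["error"]
    try:
        # The message is a JSON document encoded as a string.
        inner = json.loads(outer["message"])
    except (json.JSONDecodeError, TypeError):
        inner = None  # plain-text message, nothing to unwrap
    return outer["code"], inner

code, inner = unwrap_error(raw)
upstream_reason = inner["metadata"]["reason"]
```

The inner `metadata.reason` names `sao10k/l3-70b-euryale-v2.1` even though the request asked for `sao10k/l3-euryale-70b`, so it looks like the upstream provider's model name rather than the OR id.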
subtle phoenix
#

oh FYI, the model id on OR is: sao10k/l3-euryale-70b

#

Will add an alias one sec

fiery oxide
sonic merlin
daring shale
subtle phoenix
#

oh lol

fiery oxide
#

lab's exposing OR's guts XD

subtle phoenix
daring shale
silk lodge
fiery oxide
silk lodge
#

whenever you ask the AI something I go and google it

daring shale
#

XD

subtle phoenix
#

lolol

sonic merlin
#

New error: "{"error":{"message":"{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.","type":null,"param":null,"code":408}}","code":408}}" with

time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order: ["Infermatic"] },
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'
fiery oxide
#

bruh
@daring shale

daring shale
#

checking

#

The model it's up and running

#

there must be something wrong with the request

sonic merlin
#

Still the same error "{"error":{"message":"{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.","type":null,"param":null,"code":408}}","code":408}}

real 0m40,706s
user 0m0,032s
sys 0m0,025s" with the above command.

fiery oxide
#

yeah, hit https://api.totalgpt.ai/v1 endpoint directly rn (Infer endpoint btw)
1s latency, all good

#

OR issue?

subtle phoenix
silk lodge
#

Issues aside I feel like this model is still very "dry"
It's flowery but dry.

#

big step up from older OWMs

sonic merlin
sonic merlin
#

Literally straight from the OpenRouter webpage, only added the provider preference:

time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order: ["Infermatic"] },
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'
#

That should work, shouldn't it?

sonic merlin
daring shale
#

Does it work now?

sonic merlin
#

But I still get the same litellm error with the " added

subtle phoenix
#

hmmmmm

daring shale
#

If that doesn't works try streaming: true

subtle phoenix
#

let me try

sonic merlin
# daring shale If that doesn't works try streaming: true

Like this?

time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order": ["Infermatic"] },
  "streaming": true,
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'

still produces: {"error":{"message":"{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.","type":null,"param":null,"code":408}}","code":408}}

subtle phoenix
#

should be "stream": true

#

(not streaming)
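Hand-written JSON inside a shell string makes quoting slips easy, so here is a sketch of building the same request body in Python and letting `json.dumps` do the quoting. The URL, model id, `provider.order`, `stream` field and `OPENROUTER_API_KEY` variable are taken from the commands in this thread; `build_request` is a hypothetical helper, and nothing is sent unless you uncomment the last line.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, provider="Infermatic", stream=True):
    """Build (but do not send) a chat completion request pinned to one provider."""
    payload = {
        "model": "sao10k/l3-euryale-70b",
        "provider": {"order": [provider]},  # provider preference, as in the curl above
        "stream": stream,                   # the field is "stream", not "streaming"
        "messages": [{"role": "user", "content": prompt}],
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + os.environ.get("OPENROUTER_API_KEY", ""),
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

req = build_request("What is the meaning of life?")
# To actually send it: urllib.request.urlopen(req, timeout=60)
```

Serializing a dict guarantees the body is valid JSON, which rules out a malformed payload as the cause of a timeout.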

sonic merlin
#

Or maybe 1000 chunks, still streaming...

subtle phoenix
#

hmm, try non stream but with a small max_token

fiery oxide
#

Works in ST

daring shale
#

@sonic merlin try again

sonic merlin
daring shale
#

You can disable streaming too if you want

#

ping me if anything happens

sonic merlin
daring shale
#

Okey, good to know

lofty stratus
#

Wtf every paragraph from Magnum starts with stuff like "He X, Y'ing", "She blinks, taken aback by", or "She looks away, her cheeks flushing".

fiery oxide
#

also I prob should create a Magnum thread lmao

sonic merlin
#

FYI: Infermatic needs to be added to SillyTavern as an OpenRouter provider in public/scripts/textgen-models.js (just mentioning it here as a quick fix)

#

(to hard route and avoid the 404 errors)

lofty stratus
#

Second one isn't so bad.

lofty stratus
fiery oxide
#

bruh Novita is in 4bit lmao?

#

did they quant Wizard then, Im curious

subtle phoenix
#

Talking to them, they might do int8 or fp16 eventually

fiery oxide
subtle phoenix
tribal dagger
#

SillyTavern don't have an infer provider option yet, how can I switch to a preferred provider on OR?

fiery oxide
#

ST will add it soon enough prob

sonic merlin
sonic merlin
fiery oxide
devout crow
#

wow. this is a first lol

sonic merlin
# fiery oxide hm all reqs go to Infer for me

Didn't work for me at that point. Anyway, the patch/diff is brain dead simple:

diff --git a/public/scripts/textgen-models.js b/public/scripts/textgen-models.js
index d8f36cf4..01743e0c 100644
--- a/public/scripts/textgen-models.js
+++ b/public/scripts/textgen-models.js
@@ -39,6 +39,7 @@ const OPENROUTER_PROVIDERS = [
     'Novita',
     'Lynn',
     'Lynn 2',
+    'Infermatic',
 ];
 
 export async function loadOllamaModels(data) {
devout crow
#

i dont even know if im being routed to infermatic because this is all i get in the activity page lol

#

the infamous Shadow Provider

fiery oxide
subtle phoenix
devout crow
#

yeah lol

#

its just blank

#

if i hover over the blank space it says "Unknown Provider"

fiery oxide
#

it's there for me, weird

subtle phoenix
#

works for me :d

tribal dagger
#

work for me too lol

devout crow
#

mmmmmmmmmm

#

let me try a new browser ig

#

nvm lol all good ig

sonic merlin
#

FWIW I see the correct provider (Infermatic) in my usage

devout crow
#

damn

tribal dagger
#

@fiery oxide btw, recommended parameter settings please?

fiery oxide
tribal dagger
fiery oxide
#

0.1

tribal dagger
#

damn, kinda high, i usually just set 0,02

fiery oxide
tribal dagger
#

i just use 1

fiery oxide
tribal dagger
fiery oxide
fiery oxide
tribal dagger
#

does infer quant it? cuz Novita seems to and that's why it sucks

fiery oxide
#

Also Novita has straight up broken quant

#

bc AWQ 4bit Euryale shouldn't be that bad

daring shale
devout crow
#

dang, not gonna lie, these generation times are pretty slow

#

like on average 30-40 seconds

#

im a patient lad though

tribal dagger
devout crow
#

yup lol

#

is this the classic Infermatic Is So Slow?!???!!!111/// thing ive been reading on reddit or

fiery oxide
#

slow, but sure

#

At least we are getting you the good stuff

tribal dagger
#

yeah Infer's problem is they slow

#

but sure good quality

fiery oxide
tribal dagger
#

as long as it doesn't take a few minutes, it's acceptable

fiery oxide
#

(also, yes, I'm sort of an informal Infer rep there)

fiery oxide
devout crow
#

i can handle this speed

#

i handled the dark era of Pygmalion on google colab getting fuckin 0.5t/s

#

pepesalute godspeed fellas

fiery oxide
#

Seeing downtime, Svak is looking

devout crow
#

yeah quality slowly went down

#

till it just. well, died

fiery oxide
#

Should soon be fixed

#

We are back up!

fiery oxide
devout crow
#

on it boss AmeliaSaluteSmol

fiery oxide
#

Context window temporarily capped at 8K

devout crow
#

awwwwww

fiery oxide
#

bc vLLM keeps crashing

devout crow
#

should i just select infermatic only? idk wtf is happening with novita

fiery oxide
#

Novita has a broken quant

devout crow
#

gotcha

#

keep us updated for when its back to 16k

#

ty

tribal dagger
#

so when will OR bring larger context variant?

fiery oxide
#

It was at 16K, actually

#

before it started crashing

#

bruh why can't any inference engine just work

daring shale
#

fr

fiery oxide
#

bruh downtime again

#

vLLM, can you be normal for once?!

fiery oxide
devout crow
#

okay i cant get enough of this model i swear to god

#

these chats are hype af

tribal dagger
#

responses feel weird now, im sure got it from Infer

devout crow
#

hm. they do feel slightly different huh

daring shale
#

what do you mean?

devout crow
#

idk they just feel kinda off in a way

#

maybe im just seeing shit, ill chat more

daring shale
#

okey, lmk

daring shale
#

Context fixed, now back to 16K

devout crow
#

nature is healing

tribal dagger
#

why my sillytavern still max at 8k? I usually don't need to unlock context to max context using chat comp

sonic merlin
devout crow
#

i unlocked mine anyways

tribal dagger
devout crow
#

hold on

#

er... shit, it seems like its not

#

@daring shale hate to ping you man but is there a delay for the 16k update?

#

its still capped at 8k

#

wont budge here

#

even at unlocked mode and set to 16k

daring shale
#

what

#

let me see

#

My st is working with the 16k tokens

devout crow
#

hmmmmmmmmmmm

#

i even restarted twice

daring shale
#

aaand it's down again

#

It'll be up in a minute

devout crow
tribal dagger
vapid hornet
#

Amateurs

woeful wedge
#

The price was doubled?

devout crow
#

seems expensive af now yeah

#

rip my credits

woeful wedge
#

Still worth it, but yeah, a quiet price bump is kind of a low blow

daring shale
#

It's working now

#

and the context it's working too

woeful wedge
#

And if you're bumping it, it better work at least

daring shale
#

Idk why it won't let you access

devout crow
#

infermatic isnt even on the provider list for me lol

#

unless thats intended

daring shale
#

lel

daring shale
#

I can't help with that one

subtle phoenix
#

cc @wet marten

devout crow
#

darn.

woeful wedge
woeful wedge
#

Ah, I see. Don't know who you are, but I hope yours doesn't die every half hour

wet marten
#

If you can give an API for that, that'll be better

sonic merlin
devout crow
#

sigh im just gonna stop chatting for now

#

im burning through credits and getting some real bad hallucinations lol

wet marten
#

Maybe it's you

devout crow
#

did manage to get through 8k though

#

still dont know why i cant do 16k

#

ill probably do a reinstall idk

daring shale
#

What version of st are you using?

devout crow
#

latest

daring shale
#

staging?

devout crow
#

nope shouldnt be

#

standard 1.12.1

daring shale
#

well try again and lmk

devout crow
#

roughly stuck around this token count

daring shale
#

why don't you try something that isn't ST to test if it's your api key or ST?

devout crow
#

funny enough i just made a new OR key just to test

#

gimmie a while

daring shale
#

alr

devout crow
#

mmm

#

not working on venus either

#

i must be subconsciously doing a big oopsie lol

#

hmmmmmmmmmmmmm

#

funny enough even the list here still says 8k context

#

im gonna stop replying now cause im burning a hole thru my credits now lol

sonic merlin
devout crow
#

damn.

woeful wedge
#

The replies are much smaller with Infermatic

#

And it rushes to complete the instruction/story as well. At that price tag, one should expect more, not less and worse.

#

Novita kept crashing, which was annoying, but it output a lot more and the overall quality and coherence was better than now.

sonic merlin
woeful wedge
#

Even at temp of 1 it goes fully unhinged for no reason, far beyond what one would call creative

#

Just comes up with the most random shit for no reason, while yesterday its creativity was stellar with same settings

woeful wedge
#

Whatever was done recently, it had a very bad effect on the model's output length and coherence.

tribal dagger
#

hell nah endpoint started pointing to Novita now

woeful wedge
#

Damn can we have Novita back, can't believe I'm asking for it 😭

subtle phoenix
woeful wedge
#

It's marked as yellow, does that mean it's down?

subtle phoenix
#

No, just "degraded" (since it was 404 a tons earlier)

woeful wedge
#

Is it a complicated process to change it in ST?

#

Is Infermatic running a quant of Euryale? I just can't understand why it is so ass compared to yesterday's Novita

daring shale
#

Maybe it's a settings thing, from all the users you are the one having issues. If you want settings recommendations feel free to join the discord and tweak with them #1253005075064819844 message

#

We are not running a quant on Euryale, as I already said we are full FP16

woeful wedge
#

That's why I understand it even less. I'm using the settings recommended to me here, which worked wonderfully on Novita (which I'm guessing is 4 bit?). And no, I'm not the only one, at least one more user complained below my insight. The system prompt is pulled from the model's Hugging Face page, which also worked great with Novita.

#

Like come on, I'm paying double the price and getting half of what I did before and it hallucinates like crazy? That's terrible.

wet marten
tribal dagger
#

it keeps pointing to Novita fu*k

limber basin
wet marten
# limber basin If the person doing is dev, it shouldn't come as a surprise lol

I'm not concerned about devs. The problem is that every now and then there are support cases with merge conflicts that regular users got from random Reddit/Discord patches. A dev should ideally submit the patch as a pull request upstream if it is something valuable, otherwise it will backfire on me later. Hope I made that clear.

faint olive
#

Novita seems to work fine, I didn't notice any quality differences to infermatic.

Just make sure you are using this formatting and these sampling settings.

fiery oxide
#

eh, Novita hallucinates hard after 4k

faint olive
devout crow
#

welp, its a new day so

#

i hope that with a new chat i can break through the 8k barrier mark now lol

devout crow
#

think its dead again

#

nvm back up now

#

also, i went ahead and installed a new fresh copy of sillytavern and it still displays 8k context

sonic merlin
devout crow
#

yeah but im also capped at 8k AOURGH

#

like it wont go beyond 8.5k tokens for whatever shitass reason

#

pissing me off

#

i figured it was just a visual glitch at first

faint olive
subtle phoenix
#

Novita updated to fp8 and also with 16k extended context tokens fyi

faint olive
subtle phoenix
faint olive
#

Oh alright.

sonic merlin
subtle phoenix
#

updated for me:

#

(the base model is still 8k fyi, but a provider can set their own max output via RoPE/YaRN etc...)

faint olive
#

I hope that I can implement my API at some point. I'm working on a framework that might be able to give ridiculous amounts of context, but that's slightly off topic

sonic merlin
devout crow
#

Interesting.

#

this is novita btw lol

#

sigh as much as i love euryale i really gotta stop, my credits are sucking into a black hole

sonic merlin
devout crow
#

i cant select infermatic as my sole provider lol fml

sonic merlin
devout crow
#

mmmmm but wont this cause a merge conflict for future updates?

#

for when the update gets pulled or did i read wrong a couple days ago

#

ehhhh i guess i could just back up the js file and restore it for when an update happens

sonic merlin
#

Abusing git and then telling others not to modify their code is a bit crazy IMHO.

fiery oxide
#

Infermatic's price for Euryale was reduced to 1.5$/M, and the precision still stays at bf16

woeful wedge
#

Does Infermatic enforce shorter responses on their side for Euryale? I can't combat it no matter how hard I try

#

The reason is always 'stop'

woeful wedge
#

And it's extremely varied too. Sometimes it will give me a few hundred tokens, still short by a good margin, and sometimes it will just rush the completion in 50-100 tokens.

fiery oxide
#

longer response length is an artifact of quantization, likely
fp8 was tested internally today, and while this tends to give lengthier output, it loses significantly in coherence and instruction following, so we decided against even trying it

#

(fp8 never was on public endpoint, to be clear)

woeful wedge
#

So the model encourages taking it bit by bit, in a way?

#

I wouldn't mind the shorter replies if it left potential for continuation, which it often doesn't. Is it because my instruct is too direct maybe?

fiery oxide
fiery oxide
woeful wedge
#

That's the one I'm using. Worked wonders with Novita, which was a 4 bit I think. But as you said, the length is an artifact of quant?

fiery oxide
fiery oxide
woeful wedge
fiery oxide
fiery oxide
woeful wedge
#

As in tell it directly how long I want it to be?

#

How should that be phrased? Word count/token length/paragraph wise?

fiery oxide
#

also, I don't recommend using repetition penalty with this model, seems to cause weird artifacts
I recommend using presence penalty instead
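
A minimal sketch of what that sampler swap could look like in a request body, assuming an OpenAI-style chat completions payload like the one OR accepts. The model slug, the temperature and the 0.4 penalty are placeholder assumptions for illustration, not tested recommendations.

```python
import json


def build_sampler_body(prompt, presence_penalty=0.4):
    """Build a chat-completions style body that uses presence penalty only."""
    return {
        "model": "sao10k/l3-euryale-70b",  # assumed slug, check the model page
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # placeholder value
        "presence_penalty": presence_penalty,
        # repetition_penalty is deliberately not set, per the advice above
    }


body = build_sampler_body("Continue the scene.")
print(json.dumps(body, indent=2))
```

In ST the same idea is just setting the repetition penalty slider to its neutral value and raising presence penalty instead.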

woeful wedge
#

I'll try your tips out, thank you!

sonic merlin
# fiery oxide It's definitely possible Try instruct preset made by creator of the model if you...

This configuration seems to include a very long, restrictive system prompt, with optional identifiers that don't get used, sure this is the best thing since sliced bread?

    "system_prompt": "Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.\n\n<Guidelines>\n• Maintain the character persona but allow it to evolve with the story.\n• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.\n• All types of outputs are encouraged; respond accordingly to the narrative.\n• Include dialogues, actions, and thoughts in each response.\n• Utilize all five senses to describe scenarios within {{char}}'s dialogue.\n• Use emotional symbols such as \"!\" and \"~\" in appropriate contexts.\n• Incorporate onomatopoeia when suitable.\n• Allow time for {{user}} to respond with their own input, respecting their agency.\n• Act as secondary characters and NPCs as needed, and remove them when appropriate.\n• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.\n</Guidelines>\n\n<Forbidden>\n• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.\n• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.\n• Repetitive and monotonous outputs.\n• Positivity bias in your replies.\n• Being overly extreme or NSFW when the narrative context is inappropriate.\n</Forbidden>\n\nFollow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>."
fiery oxide
sonic merlin
faint olive
#

Infermatic and novita are same cost, damn price wars

fiery oxide
fiery oxide
faint olive
#

Hopefully

woeful wedge
#

Is something wrong on the provider end? I got charged for 3 blank responses in a row, and it's a fully SFW scenario

woeful wedge
fiery oxide
woeful wedge
#

Ah, I see, I'll try that instead

devout crow
#

the hell? lol

surreal anchor
#

😔

woeful wedge
#

It just keeps stopping the gen whenever it feels like it

devout crow
#

i had to edit some messages to get rid of it

#

this only happens with some characters

devout crow
#

okay nvm its pretty prominent now wtf

#

stopping now cuz im tired of wasting my credits

#

dunno if this is a card issue or a provider issue

subtle phoenix
#

oh nvm is the very last one some kind of retry?

sonic merlin
#

This might be just a feeling, but I have the impression that many of the fine-tuned/abliterated models like this one are a bit too 'frankensteiny', too unstable.

bold sphinx
devout crow
#

like i swapped to a new character, got that, rubbed my eyes cuz i thought i was hallucinating

#

but yes im getting really short replies, it sucks

woeful wedge
#

I have to fall back to Wizard until (or if) this gets sorted. And I swear this model's coherence is a little questionable as it is right now, but it could just be my rose-tinted glasses from when I tried it out for the first time with Novita

devout crow
#

okay i think i figured out the problem

#

it was the example messages, if they're short, then it wont generate anything longer than the example message no matter what you do

woeful wedge
#

How short are we talking? Did you test what happens if there are no example messages at all?

devout crow
#

yes, they became noticeably longer

woeful wedge
#

Hmm, I'll try this out too.

devout crow
#

go for it, it helped for me

woeful wedge
#

It kind of works, but it's not a reliable fix. I noticed an improvement however. Now another issue I experience is the repetition. I have my presence at 0.45

devout crow
#

same. very annoying. i also noticed a really weird issue where it just wont format dialogue with quotation marks

#

even though i explicitly ask it to

woeful mist
surreal anchor
#

😭

woeful wedge
# woeful mist Lmk if it isnt the same. I swear yesterday sometbing broke wit wiz.

I still feel like the 'logic' of Euryale is..off. It just doesn't make connections like it used to when it was introduced. I know everyone keeps telling me that this is the full model as it was intended, yet I still feel like it never topped the quant by Novita? But since I got to use so little of it I just can't distinguish if it's my nostalgia bias or if that was actually the case. Wizard has been off too recently, though even in its current weird state, I feel like Euryale beats it in terms of dialogue. Feels more natural, but it still has that ultra frustrating tendency of becoming formal real quick

woeful mist
#

If wiz keeps being trash ima try euryale and get heavy into forming instructs and proper settings like i had for wiz. If i can control euryale's short response issue and get a decent quality going ill share what instructs i come up with

#

Also its been recommended to use infermatic provider for euryale.

woeful wedge
#

My activity tab shows that I've been thrown around like 3-4 providers of Wizard, like literally one reply in between, so it's a little hard to say exactly who's doing a poor job and who's doing well unless I really keep a tab on it

woeful mist
#

I think its the default now. But if u r using it where u can select provider, select infer and see if its any better/worse. Its logic has been an issue, as is the case with any L3 model.

woeful mist
woeful wedge
#

How do you select it in ST?

woeful mist
#

Lepton went to total shit. Novita seems less affected by the issues but is still handling its context very poorly for wiz

woeful mist
fiery oxide
pulsar edge
#

Novita version is trash

surreal anchor
#

😔

sonic merlin
# pulsar edge Novita version is trash

I've excluded Novita for this model for some time now, gave too many short, broken answers. Infermatic seems to be much more reliable, though it produces 504 responses quite often.

pulsar edge
tribal dagger
#

wait until SillyTavern adds Infermatic as a provider (anyway, is there a way to get the latest provider list through the OR API?)

sonic merlin
tribal dagger
#

what, still don't see it

#

just pulled the staging ver yesterday

tribal dagger
#

noice, grinding Infermatic for now

tribal dagger
sonic merlin
tribal dagger
sonic merlin
sonic merlin
tribal dagger
#

do u know how to clone a git repo with a specific branch?

#

i hate using termux lol

sonic merlin
tribal dagger
#

oh

sonic merlin
#

You can also ask your LLM to write a 5k token explanation why git should NOT be used as a distribution tool to end users and send it to ST devs, if you feel like it. I mean, they complain HERE that I should not show a patch because it might generate support on their side. Now I am doing support here for their ill choice of abusing git. /rant 🤯

slim cipher
#

I hope you guys can add some features for users to choose providers or some sort in OR like on playground... I don't use Sillytavern tho, which is sad, I only use Venus.

subtle phoenix
tribal dagger
pulsar edge
#

So much trash............ Sometimes I use Silly Tavern, but when I use other apps, Novita comes first. I have to endure all kinds of alien languages! AAHAHHHHHHHHHH F***

tribal dagger
fiery oxide
#

@subtle phoenix can seed param be added for Infer? It's supported (bc vLLM supports it)

pulsar edge
#

Novita...

tribal dagger
#

Sometimes Euryale on Infermatic gets so excited that it keeps writing until it reaches the max response length (usually just 5-6 paragraphs)

faint olive
#

Something weird with Euryale in general. It's like L3, longer contexts can often lead to weird or incoherent stuff.

I mean like 4+ turns.

However sometimes it works fine?

It feels quite random albeit it's probably that there are some things that aren't in the training dataset and the model forgot how to handle it.

woeful wedge
#

I had some good success with the Divine Intellect preset in ST. It got a good deal more intelligent and coherent, and some char cards felt more alive too, though it still prefers a good instruction or two to build off of for best results.

hazy rock
#

Tried Euryale but it does not seem to me to be a model on the same level as WizardLM-2 8x22B, which I find to be smarter and better at following instructions.
I also tried a group chat and WizardLM-2 8x22B did not miss a beat, Euryale sometimes gets confused and strange tags and characters appear from time to time.

tribal dagger
#

There's something wrong with Euryale, it's worse than when first brought to the OR

pulsar edge
#

Every word breaks.

fiery oxide
#

unable to reproduce, either through OR or directly through Infer
Coherent and decent quality for me

#

Infer didn't change anything about this model

errant vault
#

How much money should I put towards euryale for it to last long

surreal anchor
#

1 million

#

😈

errant vault
#

😭😭

fiery oxide
errant vault
errant vault
fiery oxide
#

(automod didn't let me send discord link directly lmfao)

errant vault
#

Do I have to create an account. First?

fiery oxide
errant vault
#

Every time I click on it, it brings me back to this server

#

I got it

#

Thankssssss

errant vault
#

Did Euryale get its price raised?

sonic merlin
devout crow
#

is there a way to fix euryale following the example dialogue a little too much? if an example dialogue in ST is short, every single reply will be the same length unless i delete it entirely, which isn't really ideal since example messages are pretty important

#

even if the intro message is long as shit, it'll just compress the reply length based on the example message, its really obnoxious honestly

livid violet
#

I'm only able to push 8k tokens into the prompt no matter what settings I use in ST, even though on the page it says it has 16k context. Any ideas why?

#

My activity shows 8k context use in every single prompt

#

is this because this model is roped to increase context size?

devout crow
livid violet
#

but activity shows something like this

#

makes me wonder, if OR cuts my prompt in half

devout crow
#

yeahhhhhhhh i saw this myself idk myself and many others brought this up and it wasnt addressed i think

#

so idk

#

kind of annoying, i think this has been a problem for 1.5 weeks

livid violet
#

Oh well, back to claude for the time being if that's the case.

devout crow
#

yeah im going back to wizard

errant vault
#

Does Euryale read example dialogues?

tribal dagger
errant vault
#

Is there a specific way to make character cards for Euryale

#

So the model can understand it better

surreal anchor
#

If the context problem ever gets fixed, this'll probably be my main model, as I really like it 😄

tribal dagger
#

Command R+ is no better than Euryale in roleplay in my opinion. But Cmd R+ follows instructions far better.

So yeah, i'm gonna go back to Cmd R+. Cuz while Euryale struggles to follow a simple instruction like writing 3 paragraphs (sometimes it writes longer or shorter, even when the instruction is in the last prefix of the instruct prompt), Cmd R+ can follow it well and i see no issues.

pulsar edge
errant vault
#

Is Euryale good with group chat?

strong tusk
#

I am getting random garbage with this model: it's speaking crap that isn't English, filled with symbols

sonic merlin
civic shore
sonic merlin
woeful wedge
#

How does one set up Infermatic as the only one allowed in ST? It is not on the list

#

I checked my activity and with nothing specified I see that I get both Infermatic and Novita mixed in

fiery oxide
woeful wedge
#

That did the trick, thank you

woeful wedge
#

Is infermatic having issues?

#

Can't connect to the model

idle grotto
#

it was, but they're rolling back a change and fixing it

atomic mountain
#

seeing a huge increase in nonsense responses without changing any parameters, doesn't seem to be affecting any other models that I can tell

sonic merlin
atomic mountain
#

ah alright, will do

sonic merlin
#

TL;DR: If you use SillyTavern and have set Infermatic as your sole provider, you now need to set a new flag, it's already in ST -> #arc-feedback message

#

Otherwise, thanks to this feature -> #announcements message, the other provider, which quantizes, will get sent requests too and return that garbage
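
For anyone calling OR directly instead of through ST, here is a rough sketch of the same idea as a request body. It assumes the flag in question corresponds to OR's `provider.allow_fallbacks` routing option (my guess, the linked messages would confirm it), and the model slug is an assumption as well. Nothing is sent; the snippet only builds the JSON.

```python
import json


def build_pinned_body(prompt, provider="Infermatic"):
    """Build a request body that pins routing to one provider, no fallbacks."""
    return {
        "model": "sao10k/l3-euryale-70b",  # assumed slug, check the model page
        "messages": [{"role": "user", "content": prompt}],
        "provider": {
            "order": [provider],       # try this provider first
            "allow_fallbacks": False,  # and do not fall back to any other
        },
    }


pinned = build_pinned_body("Continue the scene.")
print(json.dumps(pinned, indent=2))
```

With fallbacks disabled, a request should fail when the pinned provider is down rather than quietly landing on a different endpoint.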

atomic mountain
#

All sorted now, thanks for your help!

woeful mist
# devout crow is there a way to fix euryale following the example dialogue a little too much? ...

Silly idea but maybe make your example messages longer then? I make mine exactly how i want my responses to be in every way possible, from formatting to length to char personality and vocabulary - typically i start by having the model sorta make them then heavily edit them.

And i do believe on ST they are temp tokens, so once context fills they drop and ur chat hist becomes the new examples.

devout crow
#

i usually do that but some days i cant be bothered

fiery oxide
#

Infermatic drops Euryale's precision to fp8 (dynamic activation)

fiery oxide
#

Infermatic's team, the community and I personally tested dynamic fp8 quantization on vLLM and found quality degradation to be minimal, pretty much invisible.
Though, if you experience major output quality degradation, please report it to me, I will pass it on to Infermatic's team

#

@idle grotto can you please mark Infermatic's endpoint as fp8?

sonic merlin
# fiery oxide Infermatic's team, community and me personally tested dynamic fp8 quantization o...

I cannot confirm that the degradation is minimal or even invisible. Instead of following instructions and producing long outputs, the same cards now produce superficial and short replies, without anything changed on my side. I cannot test this deeply (only ~10 generations), as I have no time for this now, but I know that when I have time again, I'll have to look for another preferred model. This does not work for me anymore.

fiery oxide
#

Apparently the FP8 quant we used was a static one, Svak is making a dynamic one right now
Moral: never trust fp8 quants on HF, we will make our own in the future
Euryale is the only model receiving reports of degraded quality, Daybreak and Magnum are a-OK (both use first-party dynamic fp8 quants, made by me)

fiery oxide
#

should be fixed now

sonic merlin
fiery oxide
sonic merlin
fiery oxide
#

Tbh, fp8 is not ideal, but 60s+ latency was becoming too much, Infermatic just has limited amount of resources compared to bigger providers

sonic merlin
fiery oxide
hot crane
errant vault
sonic merlin
# errant vault What does FP8 mean?

Precision of (most of) the weights, i.e. the quantization: FP8 = 8-bit floating point, FP16/BF16 = 16-bit, see also -> https://en.wikipedia.org/wiki/Minifloat

In computing, minifloats are floating-point values represented with very few bits. Predictably, they are not well suited for general-purpose numerical calculations. They are used for special purposes such as

Computer graphics, where iterations are small and precision has aesthetic effects.
Machine learning, which can be relatively insensitive t...

errant vault
sonic merlin
#

From a math perspective, high precision (16/32 bit) is only necessary for training, where weights get accumulated, but during inference most of this precision doesn't matter, as high values are more important than tiny fractions for the results. Those high values get preserved during quantization, so that even 4-bit weights still work pretty well.
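
A toy illustration of that point, as a sketch only: this is plain absmax integer quantization to 4 bits on a made-up weight list, not the FP8 format any provider actually serves. It just shows large weights surviving the round trip while tiny ones collapse toward zero.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integer codes using one shared scale."""
    levels = 2 ** (bits - 1) - 1              # 7 positive levels for 4 bits
    scale = max(abs(w) for w in weights) / levels
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Turn integer codes back into approximate floats."""
    return [c * scale for c in codes]


weights = [0.91, -0.75, 0.02, -0.01, 0.33]    # invented example values
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

for original, approx in zip(weights, restored):
    print(f"{original:+.2f} -> {approx:+.2f}")
```

The rounding error per weight is bounded by half a quantization step, so the largest weights keep most of their relative precision and the smallest ones lose it.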

fiery oxide
fiery oxide
#

@subtle phoenix why is max output on Infermatic's endpoint set to 8192?
model is still RoPEd to 16384

subtle phoenix
faint olive
#

Hehe I know the feeling

#

I personally end up rewriting entire codebase 💀 xD

tribal dagger
#

V2.2?

woeful wedge
#

My wallet's ready

woeful wedge
#

@subtle phoenix Can we expect for OR to pick it up in the nearest future?

subtle phoenix
fiery oxide
#

Maybe we can also make a poll on Infer to update 2.1 to 2.2

#

imo 2.2 is a major improvement

woeful wedge
fiery oxide
#

About Infer - I should have some news tomorrow, 2.2 was very warmly received on community cloud

#

so it might be either polled or just swapped

#

but people seem to want it over 2.1

fiery oxide
#

Seems to be no news atm, which is bit weird
Svak told me that either a poll or a swap should have happened on Monday, along with the swap from v1 to v2 for Magnum
Neither happened yet

#

Should be tomorrow

sonic merlin
simple smelt
#

so looks like it's on Infer?

subtle phoenix
#

I got swarmed by some other stuffs xd

woeful wedge
#

Does anyone know if any provider will do the FP16 quant of the model or is the loss minimal on FP8?

visual oasis
#

i can barely notice any difference between FP4 and FP8. FP8 is enough

devout crow
#

holy moly, euryale update

surreal anchor
#

😍

woeful wedge
#

Hope it's a few hours away 🙏

subtle phoenix
errant vault
#

Is it uncensored

tall mirage
#

As far as I can tell, but there's still a slight bit of positivity bias like you see in other llama 3 models. It's probably the best one based on it I've tried so far though.

sonic merlin
#

This model will certainly be a fine companion for most role play settings.

errant vault
#

So it’s less hornh?

#

Horny?

tall mirage
#

Off topic but I don't know what they did to make Mistral Nemo 12B Starcannon so good or just because it's the bf16 quant being hosted, but if they can do that with a larger parameter model so it's smarter, we will be eating good.

tall mirage
sonic merlin
#

But I'd recommend to generate a few replies to the same request with this version and the old (if it is still available) to get a feel how they are a bit different.

woeful wedge
#

Euryale 2.2 can cook. Love it

#

And the cut-off problem of 2.1 seems to be gone. It's pumping out word after word like there's no tomorrow.

iron drift
#

Is there any reason why it is at 8k context? It's 16k on Infer itself

sonic merlin
#

8k is what the model spec is suggesting, provider can offer more (sometimes less) e.g. through RoPE tricks, the real context window is in the provider tab as max output.
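
One of those RoPE tricks, sketched under assumptions: linear position scaling, where positions are divided by a factor so a 16k sequence reuses the rotation angles the model saw up to 8k. Which method each provider actually uses (linear, NTK, YaRN, ...) is not stated anywhere in this thread, and the head size and base below are illustrative defaults.

```python
import math


def rope_angles(position, head_dim=128, base=500000.0, scale=1.0):
    """Rotary angles for one token position; scale > 1 compresses positions."""
    scaled_position = position / scale
    return [
        scaled_position * base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


native = rope_angles(8000)                  # inside the trained 8k window
stretched = rope_angles(16000, scale=2.0)   # 16k position squeezed by 2x

same = all(math.isclose(a, b) for a, b in zip(native, stretched))
print("16k position maps onto trained range:", same)
```

The trade-off is that neighbouring tokens end up closer together in angle space, which is one plausible reason quality can drift at long context without further tuning.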

stray flame
#

The update feels like such a downgrade, feels like its much harder to get decent responses now that incorporate good dialogue

subtle phoenix
stray flame
#

I remember there were a couple different providers for it before, Infermatic being the better one and the others using a quantized version ThinkDrooling

#

Are either of these comparable to Infermatic on 2.1 previously?

tall mirage
stray flame
sinful sparrow
#

(If at all, it was unnoticeable from tests)

errant vault
#

I’m liking starcannon

woeful wedge
#

Has anything been done to Euryale for Llama 3.1 hosted by Infermatic? Is it correlated with the massive price drop? It performed quite awfully in recent gens. Gibberish generations, lacking creativity, endless adjectives thrown at you with no real coherence. Settings were untouched, just an inexplicable stream of bad generations.

#

If performance was sacrificed to reduce cost, I'd much rather pay more for a sane, consistent model.

visual oasis
woeful wedge
#

Very barebones description, but anything Euryale related is usually good.

subtle phoenix
unborn monolith
#

idk he said it was untouched

atomic mountain
#

DeepInfra seems to be churning out garbage, for both 2.1 and 2.2

#

other providers on both 2.1 and 2.2 seem fine

woeful wedge
subtle phoenix
subtle phoenix
#

kk will derank deepinfra

#

and ping them

atomic mountain
#

yeah exact same everything

woeful wedge
#

Not until the price drop announcement

subtle phoenix
#

I think internally it's prob Deepinfra serving it

#

but weirdly...

#

shouldn't deepinfra be the 1st endpoint it tries?....

#

(so it should log deepinfra regardless...)

#

There's a bug I'm trying to track regarding how fallback providers are not being logged properly

#

but... if it's Deepinfra serving the model and that it's the 1st host.... it should have been logged, NOT infermatic xd

#

ugh....

woeful wedge
woeful wedge
#

Nope

subtle phoenix
#

then yeah I'm pretty sure your request hit Infermatic

subtle phoenix
#

I double checked our commit history - the last refactor to the endpoint filtering system was ~4 days ago

woeful wedge
#

Yeah. Whatever was done since that announcement, somehow resulted in whatever is happening now

atomic mountain
#

I have novita and infermatic enabled for .1 and .2 respectively and everything works, as soon as I enable DeepInfra it all goes to hell

#

I'll admit I don't know much about the technical side but I think something similar happened in the past when a provider was quantizing prompts?

woeful wedge
#

Yesterday I was using it quite a lot at 1.55 rep pen and it worked swimmingly

#

The typical good Euryale stuff on all cards

#

I think Deepinfra should be removed. It's broken as hell and it's possible that somehow it's causing this too.

subtle phoenix
#

kk DeepInfra should be deranked now

unborn monolith
#

but anyways, isnt 2.2 just poorly received compared to 2.1 in general?

woeful wedge
subtle phoenix
#

it's only "deranked" - meaning if you call the model without a specified provider, it will not be picked as a candidate

woeful wedge
#

Aha, I see. So, I'm seeing that somehow Text Completion is causing issues, but I have no idea why. Chat Completion 2.2 from Infermatic works good

#

In text completion it either does not respond or is utter garbage

bold sphinx
#

But people have varying opinions so

unborn monolith
#

the man himself

#

sorry brother. I was going off the ugi leaderboard

woeful mist
#

If they dont respond maybe @fiery oxide can shed some light, i know he was a fan of this model.

iron drift
#

I have been using Euryale 2.2 with 1.17 temp, 0.075 minp and 1.05 rep pen

fiery oxide
#

I've noticed right away, even before Infer officially picked this model up, when it was on community cloud, that Euryale 2.2 prefers lower temp than 2.1