L3 installment of Euryale, one of the best (if not the best) RP models. Engaging prose, very good adherence to character cards, very creative, almost zero slop.
https://huggingface.co/Sao10K/L3-70B-Euryale-v2.1
#Sao10K/L3-70B-Euryale-v2.1
This has my vote. This is the model i want to try most next to Awiz (abliterated)
This model is currently hosted by infermatic's community, and people are really loving it
Is it uncensored?
Completely
wow, even more interesting. Now the last question: context size?
8k, it's L3
RoPEable to 16k as far as I tried
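For context on what "RoPEing" to 16k means: one common approach is linear position interpolation, where every position index is divided by a scale factor so a 16k window is squeezed back into roughly the 8k range the model saw in training. The thread does not say which scaling method was actually used, so this is only a minimal sketch under that assumption; `rope_angles`, `head_dim=128` and `base=500000.0` are illustrative values, not taken from the chat.

```python
def rope_angles(position, head_dim=128, base=500000.0, scale=1.0):
    """Rotation angles for one token position under (optionally scaled) RoPE."""
    scaled_pos = position / scale  # linear interpolation: shrink the position index
    return [scaled_pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


trained_ctx = 8192   # native Llama 3 context
target_ctx = 16384   # the 16k people are asking for
scale = target_ctx / trained_ctx

# With scale 2.0, position 100 gets the angles that position 50 had natively.
same_angles = rope_angles(100, scale=scale) == rope_angles(50)
```

Other schemes (NTK-aware, YaRN, dynamic scaling) change the frequency base instead of the position index; which one a given host uses is something to ask the host.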
Let's goo, add it pls
+1
Rope to 16k and add!, it's a very competent model. Easily best I've used for L3 70.
+1
+1
This model is currently my favorite
Very good at story writing, sometimes its answers are better than Claude Opus for me
+1
Tested it yesterday. Very good for both roleplay and story writing.
Those in my server trying it at Infermatic are liking it a lot, except for extremely long response times (Infermatic's issue, not the model's), waiting over a minute.
I really hope we see this roped to 16k context cuz 8k is just not enough.
Infermatic's server turned into Sao's fanclub in less than a week lmao
Yeah. I cant get over how bad their response times are. 264 seconds for some euryale responses! Like whaaat. I dont tolerate more than 15 seconds. Lmao
they probably have just one 4x A6000 node for Euryale rn
and it's overloaded
vLLM can batch well, but even it has its limits
Apparently most models are pretty slow. Astoria isn't, I guess, tho. One of my members loved Astoria cuz it's filthy. But I don't think (idk, I never tried it) it follows instructions well
But back to topic i hope they add euryale so i can finally see what the fuss is about
Well, it's the second most requested model here after UnleashedWiz (soon!)
OR definitely should look into this
anywhere i can try it?
id love to try it
id sub to infermatic but i already added some credits to my OR account a day ago damn
+1
Seems to be hosted on novita https://novita.ai/pricing
@subtle phoenix can you add it, please?
(with 16k context if possible, ty)
sadly I think Novita only do 8k
man ive been spoiled with 32k context
Yup working with them to add it. They said earlier today that there was some kink that needs to be ironed out, but I got the PR up already
Should I just merge it lol?
Merged, should be up in 5 mins
Note the responses might be gibberish
(That's what they told us xd)
It's up, First test came through fine.
bruh Novita doesn't have minp which is a must have for this model
will ping them about that
This model is at its peak on temp 1.5 and min-p 0.1
It like adores high temp with min-p
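A rough sketch of why high temperature and min-p pair well: temperature flattens the distribution, and min-p then drops every token whose probability is below `min_p` times the top token's probability, so the long tail that high temperature would otherwise promote gets cut. The order of the two steps differs between backends; applying temperature first here is an assumption, and the function names and example logits are made up for illustration.

```python
import math
import random


def min_p_probs(logits, temperature=1.5, min_p=0.1):
    """Softmax with temperature, then keep only tokens with p >= min_p * p_max."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]  # renormalise the survivors


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    probs = min_p_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


example_logits = [5.0, 4.0, 1.0, -2.0]
filtered = min_p_probs(example_logits)  # the two weakest tokens end up at 0.0
```

Because the cutoff is relative to the top token, it tightens when the model is confident and loosens when many continuations are plausible.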
This model seems to love Markdown. It even spits out crazy formatted text in the middle of a text only role-play chat. Not exactly gibberish, but a bit strange nonetheless.
It sure likes XML too
Feels like role-playing with a coding model
how is their pricing btw?
15$/month
ooh it's sub
they are too slow and ratelimited for you prob
for 500 token responses with high latency
You prob can talk to Svak, they have enterprise tier
wait $15 for 500 tokens?
no
Unlimited
They are just kinda slow
- default plan has 2 concurrent req limit
each response seems capped to 500 tokens or so if I read it correctly, that is a bit annoying, esp with high latency.
It's not capped, i've got 3000+ from their Wiz
and 1500+ from their Euryale and Stheno
Then I read it wrong, let me check.
Hmm: -> "512 token responses, 86,400 requests per day." for $15/month
On their landing page.
ehhh, Svak messed up a bit
Their site is kinda unfinished and outdated atm
There are no limits, as far as my personal experience goes
Noted.
anyway you can message @daring shale and ask
Okay, I only have gotten the markdown treatment once in about 10 tries, this seems an acceptable level of annoyance.
I think acceptable temp starts with 1 on this one lol
Lower than this and it starts to lose coherence
I am using temp 1 currently, yes, I forgot to mention this.
well, I mostly used it with temp 1.5 on Infer lol
As long as I don't get complete replies in French or Spanish like with Dolphin now, I am fine.
those settings seem to work fine for it
tho this is more ideal, but again no min_p
Thanks, those are basically my settings too, very neutral, only changing them when absolutely required.
(the first settings from you I am referring to, of course)
I usually stick to temp 1 and do not get high as models tend to freak out/produce gibberish (the only other models I tried with very high temperature and that did not completely go bonkers right away were GPT-3.5/4, but I last used them 6 months ago or so)
Euryale is like the only model series I know to consistently prefer high temp for some reason
Damn it's really hard to tame with just temp and top_p
Those are only on the UI, we don’t cap at the API. At the moment the request limits for the API are 18/minute and 3 in parallel
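For anyone scripting against limits like 18 requests per minute and 3 in parallel, a client-side throttle keeps you under them. This is a minimal sketch, not anything the provider ships: `ApiThrottle` and its parameters are hypothetical names, and the 60-second sliding window is an assumption about how the per-minute limit is counted.

```python
import threading
import time
from collections import deque


class ApiThrottle:
    """At most `parallel` requests in flight and `per_minute` request starts per 60 s."""

    def __init__(self, per_minute=18, parallel=3, clock=time.monotonic, sleep=time.sleep):
        self._slots = threading.BoundedSemaphore(parallel)
        self._lock = threading.Lock()
        self._starts = deque()  # start times of requests inside the current window
        self._per_minute = per_minute
        self._clock = clock
        self._sleep = sleep

    def _reserve(self):
        """Record a request start if the window has room, else return seconds to wait."""
        with self._lock:
            now = self._clock()
            while self._starts and now - self._starts[0] >= 60.0:
                self._starts.popleft()
            if len(self._starts) < self._per_minute:
                self._starts.append(now)
                return 0.0
            return 60.0 - (now - self._starts[0])

    def __enter__(self):
        self._slots.acquire()  # blocks while `parallel` requests are already running
        while True:
            wait = self._reserve()
            if wait <= 0.0:
                return self
            self._sleep(wait)

    def __exit__(self, exc_type, exc, tb):
        self._slots.release()
        return False
```

Usage would be `with throttle: send_request()` around each call, sharing one `ApiThrottle` instance across worker threads.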
So, is Infermatic on OR possible lol?
or are we a tad bit too slow for that lol
Only on the discord 
chatting with OR team already?
hm, A100s finally lol?
How does OpenRouter work with other companies?
Hey howdy!
Euryale is already on some H100s
We route to providers and pay per tokens pricing
Heyo!
cc @idle grotto
naizu, so are you interested in Euryale?
I can make a DM group so we can discuss further
Sure!
Thanks @fiery oxide for the intro kek
my pleasure lmao
Hmm definitely experiencing the gibberish responses warned about above.
Very excited to try it out once that's ironed out though!
Would be awesome if Infermatic and OR did work together. Tried the former, couldn't figure out how to get it working on TypingMind so staying here even though I roleplay primarily with TypingMind and Infermatic seems to excel with the RP models. Anyway, looking forward to trying out this new holy grail of models.
yea, TypingMind uses chat completions and those are a bit wonky on Infer atm
Svak and team are working on that
If we route to Infer, that'd solve the problem right?
You are doing prompt -> message transform on your end, right?
we doing messages -> prompt
and I'm pretty sure most of infer model do prompt right
(we actually do both tbh xd, it's wonky but... works thus far)
hmm, Infer has some problems with system role not being supported (or at least had) and some with strict user->assistant order too
So I think some of your workarounds can come in handy
yeah a lot of the jinja template enforces that
yeah, and they do run on vLLM and Aphro, so pure jinja formatting there
They don't do any formatting besides what vLLM and Aphro do
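To make the "messages -> prompt" step concrete, here is a small sketch that flattens chat messages into the Llama 3 Instruct format and optionally enforces the strict user/assistant alternation that many Jinja templates insist on. It is an illustration only, not OpenRouter's or Infermatic's actual transform; `to_llama3_prompt` and the `strict` flag are hypothetical names.

```python
def to_llama3_prompt(messages, strict=True):
    """Render [{'role': ..., 'content': ...}] as a Llama 3 Instruct prompt string."""
    if strict:
        # Ignore system messages, then require user, assistant, user, ...
        turns = [m for m in messages if m["role"] != "system"]
        for i, m in enumerate(turns):
            expected = "user" if i % 2 == 0 else "assistant"
            if m["role"] != expected:
                raise ValueError(f"turn {i} should be {expected!r}, got {m['role']!r}")
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content'].strip()}<|eot_id|>"
        )
    # Leave an open assistant header so the model writes the next turn.
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"


chat = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]
rendered = to_llama3_prompt(chat)
```

A backend that rejects the system role or out-of-order turns would fail at the `strict` check rather than silently producing a malformed prompt, which is the kind of case the workarounds mentioned above have to paper over.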
I don't wanna hijack the conversation or anything, but I can see the model is already available on OR through novitaAI. Thing is, when I try to run it through ST it spits out error404. Overloaded servers?
Yea it went offline wtf
prob Novita fixing stuff
Is Infermatic not slow as fuck anymore? Back when I used it there was consistently 30 seconds to first token at a minimum.
And that was on the 70B models, their 120B was like 60 seconds. Idk if times like that are acceptable for OR.
Though, being able to access those models without paying $15 up front would be nice
7t/s on Euryale (top 1 or 2 by usage), with ~3s latency
Infer sped up somewhat, and Svak said they are still working on better speed
Astoria is like 15t/s usually (4th by usage)
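Those numbers translate into wall-clock wait roughly as latency plus tokens divided by throughput. A quick back-of-the-envelope sketch, assuming a 250-token reply and assuming Astoria has the same ~3s latency (the chat only gives its throughput):

```python
def response_seconds(output_tokens, tokens_per_second, first_token_latency):
    """Rough time to receive a full reply: wait for the first token, then stream the rest."""
    return first_token_latency + output_tokens / tokens_per_second


euryale_wait = response_seconds(250, 7.0, 3.0)   # a bit under 39 seconds
astoria_wait = response_seconds(250, 15.0, 3.0)  # a bit under 20 seconds
```

Longer replies scale linearly, so a 1000-token answer at 7t/s would take well over two minutes.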
Is that latency with empty context?
with like 2-3K
I remember it getting really slow when I pushed past 8k, but that was a couple months ago
things def improved since then
+L3s are faster than L2s
We did indeed improve the speed of the models, now we are focused on decreasing the response time of the most used ones
Midnight/Euryale
And miquliz is way better than before
I swear 
btw, maybe consider fp8 KV Cache
It should give a perf boost and memory usage reduction w/o much (if any) quality loss
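The memory side of that claim is easy to estimate. This sketch assumes the usual Llama 3 70B shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128); those numbers are not from the chat, and the result is per sequence, before any batching.

```python
def kv_cache_bytes(tokens, bytes_per_value, layers=80, kv_heads=8, head_dim=128):
    """KV cache size for one sequence: keys and values for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 1024 ** 3
fp16_16k = kv_cache_bytes(16384, 2) / GIB  # 5.0 GiB per 16k-token sequence
fp8_16k = kv_cache_bytes(16384, 1) / GIB   # 2.5 GiB per 16k-token sequence
```

Halving the cache per sequence means roughly twice as many concurrent 16k sequences fit in the same VRAM, which is where the batching benefit would come from.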
16k on Openrouter a reality? I know Infermatic got the extension.
Yeah that'd be pretty epic.
Wouldn't that be a great difference? fp16 -> fp8
Stay tuned
very small diff, esp for fp8 (and not int8)
Barely noticeable
Nobody even noticed that on my hosts
and I always do fp8
even full model weights in fp8 don't lose much compared to bf16/fp16
Supra is insane
Is Supra finally gone? He's the reason I left the server
He prob can't tell the difference, he just pretends
he got the boot
two times
I'm the only techdev role now lol
Infer server has been supremely friendly since Supra got kicked lol
Yeah
Ur free to come back now
xd
so btw, maybe do a test run on fp8 KV cache like you did with RoPE?
fr
he couldn't tell Wiz from a 8B model lmao
we can make a test for qwen
okie
Qwen is on Infermatic? Isn't that like super censored?
it's kinda censored tho
Hope Magnum-72B will kick it out in the next poll
Why dont replace it with llama3?
Yeah that's the one I tried, extreme positivity bias. Reminds me of Mistral 7B tunes
as a generalist? and swap something else?
Yeah
Oh you finally fixed the website! No wonder I couldn't find the list on Discord.
btw dmed how to do fp8 kv cache
Llama3 -> Qwen and Qwen -> Magnum or the one that wins the poll
arigato
Yeah (finally)
XD
It's still lacking some things, but we'll get through them
I just posted my review of euryale in the feedback section of Infermatic discord. TLDR: It's fun. I enjoy it. But for regular RP I'll be sticking to wizard, and maybe midnight for a few cards.
#1200053136082079845 message
Uhh this started happening on fourth response, everything was normal. 😨
~2.2k context, OpenRouter, NovitaAI.
Is it consistent?
like, does it go away with swipes?
hm works fine at 5K
I started a new chat and it's still bricked...
what kind of settings do you have?
try like temp 0.87 top_p 0.81
Normal settings. Going crazy regardless of settings. Tried switching to text completion to see if anything different.
maybe an intern at NovitaAI tripped a cable or something
I was having a working chat earlier today.
restarted ST 😓
The gibberish responses are because the servers are dying
some problem at NovitaAI
best to use the sampler settings provided by the author I reckon, i.e.
Temperature - 1.17
min_p - 0.075
Repetition Penalty - 1.10
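As a sketch, those three values could be attached to a chat completion request body like this. The field names `min_p` and `repetition_penalty` are assumptions about what the backend accepts (and as noted in the chat, a provider without min_p support will not honour it); `build_request` is a hypothetical helper.

```python
import json

# Sampler values quoted above from the model author.
AUTHOR_SAMPLERS = {
    "temperature": 1.17,
    "min_p": 0.075,
    "repetition_penalty": 1.10,
}


def build_request(messages, model="sao10k/l3-euryale-70b", samplers=AUTHOR_SAMPLERS):
    """Return a JSON request body with the sampler settings merged in."""
    body = {"model": model, "messages": messages}
    body.update(samplers)
    return json.dumps(body)


payload = build_request([{"role": "user", "content": "Hello"}])
```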
👀
oh well
those work decently yeah
Problem is that Novita doesn't have min_p
So we have to wait until Infer will be added to OR as a provider
The limited sampler settings are the one thing that tempts me to just rent a cloud computing unit and set up oobabooga in the cloud.
Ooba sucks, i use Aphro
whats wrong with ooba?
breaks models often, no batching, AWQ and GPTQ are broken
like, it's just not worth to use
welp, haven't had any problems personally
only used it with 21b models at most
I fully switched to using OR anyways
because of the need for bigger models and not having multiple GPU's :c
i mean you can host 70B on A6000 in 4bit
I do that
A6000 is like 0.34/hour on Runpod
I don't own an A6000 either sadly lmao
I mean as long as you don't go into 70B in bf16 territory
It's 2xA100
Or MI300X
both around 4$/hour
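The arithmetic behind those hardware picks, as a rough sketch. It counts weights only (no KV cache or runtime overhead), treats "4bit" as exactly 4 bits per weight, and assumes a 48 GB A6000, 80 GB A100s and a 192 GB MI300X; real quant formats and cards may differ.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


four_bit = weight_gb(70, 4)   # 35.0 GB, leaves headroom on a 48 GB card
bf16 = weight_gb(70, 16)      # 140.0 GB, needs 2x 80 GB or one 192 GB card

fits_a6000_4bit = four_bit < 48
fits_2xa100_bf16 = bf16 < 2 * 80
fits_mi300x_bf16 = bf16 < 192
```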
I'll hit you up if I ever need help setting up a runpod unit, aight?
The model is back up again btw
Ooba does have the new DRY sampler though, I wonder if it's any good.
Aphro has fan favorite Smoothing Curve tho
Quite popular on Infer
I wish vLLM had more samplers
but it at least has min_p lol
beam search is cool too
Yeah this is also the issue I was having too. Tried different settings, re-rolling responses etc, and same thing. Seems it may just take some time for it to get sorted
Ah crap... I mistook this model with #1248338089663926313 :d
We only asked @bold sphinx for permission to route to Stheno, but not this one yet
Lol. So this one on OR is actually Stheno? xD
To be fair, they share a dataset :p
I haven't seen gibberish or a lot of the issues I see in here. Wonder what I'm doing “right”. Not without issue, but no errors or gibberish lol.
I enjoy using OR. It is where I use most LLM's except for OAI and Gemini. I usually load about $20-$50 credit each month, depending on my mood, but never use them all so just accumulating them like Halloween candy.
We have it :p
i know Response might be silly but I don't expect that silly, like bunch of random words that have no meaning?
something wrong with provider?
Model is very good, but Novita, the provider, is having troubles with it, so it underperforms on OR rn
damn
We have to wait until either Novita fixes it, or another provider (Infermatic) gets added
im this 🤏 close to finding a cloud gpu host to run whatever i want
It's just weird when I use chat comp and force instruct formatting (legacy mode, Llama3 instruct and instruct name). When I let OR format the prompt, it doesn't output random words anymore, but the answers are still mid
just discovered this model like 20 minutes ago and naturally as soon as I'm enjoying it it starts throwing 404 errors
Chat Completion API
{"code":404,"reason":"MODEL_NOT_FOUND","message":"model not found","metadata":{"reason":"model: sao10k/l3-70b-euryale-v2.1 is not available"}}
EDIT: Seems to have recovered
Even before i toyed with it to make quality better, i didnt have this issue? Hmm.
My larger cards it isnt handling well, but my smaller one its handling amaz-balls. So perhaps ur card data is too complicated/much for it. Its only got an 8k context. That or ur api settings and overall preset arent ideal.
Im toying with it abit for now but really holding out for infermatic to provide it, with their roped 16k context, and stability. I just pray they can get response times better. The few i kno using it at infermatic already the day it landed there said 200+ seconds for a response 😭
Blank blank blank
okay i cant lie, this is actually very nice
though the constant errors are really annoying
oh. 
Yikes
babe wake up new p parameter just dropped
maybe use chat comp and untick Legacy
Damn... the response is almost human, almost on par with Opus. I love the writing of this model. The 8k context is unfortunate, but it's still damn good.
I had plenty of gibberish, too, until I removed the system prompt that comes with the instruct preset. Could it be related to markdown? Maybe it's just a lucky coincidence.
creative, smart, I really like this model, I wish it had a larger context
why is my Logit Bias not sent? (it is still sent when using wizardlm2-8x22b)
Does any provider other than OpenAI even support Logit Bias?
well, lepton maybe
(also extremely tricky to get these right, as they need 100% match the correct tokens)
because with wizardLm2-8x22b it still works
did the quality of the responses change after removing the prompt?
Yeah, I'm using my own prompt in plain English, put in the lorebook – system role at depth 1.
how is it now? is it any better, or just different
regardless, this model is an absolute blast
if somehow, someway a 16k context variant can happen, ill die happy
requesting it
doing the lords work
Stay tuned!
idk about 32K, but 16K are definitely possible
Bc Infermatic has that, and Svak confirmed to me that talks about bringing Infer to OR are going well
Yep yep
uh oh
quality slowly but surely just degraded
taking this bot back to when the tower of babel fell 😔
also damn novita errors a LOT
Last two requests: "504 Gateway Time-out"
yeah its done that a ton today
Worked fine for a few hours, now it seems to break down again or get overloaded -> https://openrouter.ai/models/sao10k/l3-euryale-70b/uptime
smh this model always goes offline like every other request when I need to use it
yep, it's gone for now -> 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"
Now I got a reply again
overloaded I’m assuming; not sure what other issues would cause 3-5m intermittent blackouts
that could also be general work on the system, restarting/reloading etc
what happens when you are not google or amazon and only have limited resources
But now it seems overload is more likely, just got another gateway timeout
Cuz the host doesnt support logit bias
To say that this model is currently unstable is an understatement.
The model is great but yeah the provider isn't stable.
quick question since I never got it set up with my own presets: could one of you share the preset they're using and having good luck with?
whenever I use one of my own configs meant for more traditional llms I just get garbage results
see -> #1250867165737914569 message
I meant the prompt as well
Thanks. I'll try that! :D
Model still dead. Providers are missing out, it's a great model, guaranteed to be a money maker
Cheers
Since the provider of the model seems to be completely down (at least when it comes to requests for this model), I wonder why the little status blip is still showing green next to it on the site.
It's cached and we're not purging it fast enough I think
Tho PR is up, not 10 mins but it's getting there
Okay, it's just more obvious than usual since the thing's been down for hours 🙂
btw Infer only for Euryale now?
Or other models (Noromaid, Wizard, etc) also will get added?
Just euryale for now!
well this is an odd one.
seems the model selector in Silly is broken
bruh I hope that won't kill latency
Tho Svak did some optimizations today
No, provider went down
Check the availability tab, it's been down for some time
Infermatic is deployed
yeah same for me (Firefox)
Use a different browser, Safari works fine.
oh wat
Still only seeing NovitaAI in the providers list. Did a forced reload
Yeah Firefox is a bit too strict for this
same
Deployment takes about 5 min xd
Ah I meant to say merged
Safari works too
So Chrome + Firefox broken but WebKit (Epiphany) works.
yeah GNOME Web is as close to Safari as one can get
without a mac
works on brave for me :d
Hmm...
I'm using Vivaldi, which is Chromium-based like Brave, and it works fine
chrome works too:
weird
hmmm maybe OS level bug
I'm using Linux (Fedora)
same
Model is still toast unfortunately
Are you on X11 or Wayland
X11, GNOME

X11, GNOME
GNOME 46 here specifically
using Chrome + FF through Flatpak
Nvidia proprietary drivers
GNOME 45
Radeon driver, FF installed from RPM Fusion
I always assumed this was an iframe permissions problem, Firefox is much stricter than other browsers
Model is up, it seems.
Nah, still 404 😦
But no instant rejection anymore
Nice
Maybe SillyTavern needs a restart/reload to pick up the new provider
Still can't see it in provider list tho
will have to wait till that cache purges itself I think
Hmm. I still get 404 via API/SillyTavern, even after hard restart
(tho the router is not relying on the cache to route)
Aaand working
works
still
error: {
message: "{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}",
code: 404,
},
btw maybe extended variant for Infer?
We have 16K there
Yup
I will just fix it as-is for now, will move it to another variant when this model has more providers I think?
@subtle phoenix Infer doesn't log, terms were updated today
Im on Windows 10 - Firefox too and i cant see the provider uptime either so i think its a browser issue
cc @idle grotto - some update to that will be added soon
But basically when we include the privacy policy URL, we show that tag for ppl to visit.
Still 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"
which frontend are you using?
SillyTavern
I try from the console with curl
Same with curl -> {"error":{"message":"{"code":404,"reason":"MODEL_NOT_FOUND","message":"model not found","metadata":{"reason":"model: sao10k/l3-70b-euryale-v2.1 is not available"}}","code":404}}
It took about 30 secs and produced two pages full of newlines though until the error popped up
Precisely:
$ time curl https://openrouter.ai/api/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENROUTER_API_KEY" -d '{
"model": "sao10k/l3-euryale-70b",
"messages": [
{"role": "user", "content": "What is the meaning of life?"}
]
}'
[~100 newlines omitted]
{"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}
real 0m41,330s
user 0m0,034s
sys 0m0,023s
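The error above is JSON wrapped inside JSON: the outer `message` field is itself a JSON string from the upstream provider. A small sketch for unwrapping it when scripting against the API; `unwrap_error` is a hypothetical helper, and the extra unwrapping under an inner `error` key is an assumption about how some upstreams nest their errors.

```python
import json


def unwrap_error(raw):
    """Return (code, detail) from an error body whose `message` may itself be a JSON string."""
    err = json.loads(raw)["error"]
    try:
        inner = json.loads(err["message"])
    except (TypeError, ValueError):
        # Plain-text message: nothing to unwrap.
        return err.get("code"), err.get("message")
    if not isinstance(inner, dict):
        return err.get("code"), err.get("message")
    inner = inner.get("error", inner)  # some upstreams wrap once more under "error"
    detail = inner.get("reason") or inner.get("message")
    return inner.get("code", err.get("code")), detail


raw_404 = (
    r'{"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",'
    r'\"message\":\"model not found\",\"metadata\":{\"reason\":'
    r'\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}'
)
code, detail = unwrap_error(raw_404)
```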
try to add provider: {order: ["Infermatic"]}
I'll try, thanks.
localhost xd
oh lol
lab's exposing OR's guts XD
See provider status and make a load-balanced request to L3-70B-Euryale-v2.1 - A model focused on creative roleplay from Sao10k.
- Better prompt adherence.
- Better anatomy / spatial awareness.
- Adapts much better to unique and custom formatting / reply formats.
- Very creative, lots of unique swipes.
- Is not restri...
OpenRouter is actually all just running on labs laptop.
the entire thing! /s
And Infermatic is powered by Svak's horde of hamsters! /s
it's actually me manually typing out the results
whenever you ask the AI something I go and google it
XD
lolol
New error: "{"error":{"message":"{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.","type":null,"param":null,"code":408}}","code":408}}" with
time curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "sao10k/l3-euryale-70b",
"provider": { "order: ["Infermatic"] },
"messages": [
{"role": "user", "content": "What is the meaning of life?"}
]
}'
bruh
@daring shale
checking
The model is up and running
there must be something wrong with the request
Still the same error "{"error":{"message":"{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.","type":null,"param":null,"code":408}}","code":408}}
real 0m40,706s
user 0m0,032s
sys 0m0,025s" with the above command.
yeah, hit https://api.totalgpt.ai/v1 endpoint directly rn (Infer endpoint btw)
1s latency, all good
OR issue?
Does it work on playground? https://openrouter.ai/playground?models=sao10k%2Fl3-euryale-70b
yes, 1s latency
works
Issues aside I feel like this model is still very "dry"
It's flowery but dry.
big step up from older OWMs
wait litellm?...
Playground works, but API still gives me the timeout error
That comes from the endpoint, I am using the curl command from above
Literally straight from the OpenRouter webpage, only added the provider preference:
time curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "sao10k/l3-euryale-70b",
"provider": { "order: ["Infermatic"] },
"messages": [
{"role": "user", "content": "What is the meaning of life?"}
]
}'
That should work, shouldn't it?
"order"
You lost a "
Correct.
Does it work now?
But I still get the same litellm error with the " added
hmmmmm
If that doesn't work try streaming: true
let me try
Like this?
time curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "sao10k/l3-euryale-70b",
"provider": { "order": ["Infermatic"] },
"streaming": true,
"messages": [
{"role": "user", "content": "What is the meaning of life?"}
]
}'
still produces: {"error":{"message":"{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.","type":null,"param":null,"code":408}}","code":408}}
That seems to work, producing ~100 chunks instead of newlines.
Or maybe 1000 chunks, still streaming...
hmm, try non stream but with a small max_token
Works in ST
@sonic merlin try again
OK, now it works with SillyTavern too, thanks.
Works without streaming, too (I never turned it on, I prefer complete replies for some reason).
Okey, good to know
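For scripting the same test without hand-written JSON (which is how the missing quote crept in), the request can be built from a dict. A minimal sketch using only the standard library: the `provider.order` field and model slug are the ones used in the curl commands above, while `build_chat_request` is a hypothetical helper and the `stream` and `max_tokens` field names follow the OpenAI-style convention, which is an assumption worth checking against the OpenRouter docs.

```python
import json
import os
import urllib.request


def build_chat_request(prompt, provider="Infermatic", stream=False, max_tokens=None):
    """Build (but do not send) a chat completion request pinned to one provider."""
    body = {
        "model": "sao10k/l3-euryale-70b",
        "provider": {"order": [provider]},
        "messages": [{"role": "user", "content": prompt}],
    }
    if stream:
        body["stream"] = True
    if max_tokens is not None:
        body["max_tokens"] = max_tokens  # small values make timeouts easier to rule out
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ.get("OPENROUTER_API_KEY", ""),
        },
        method="POST",
    )


request = build_chat_request("What is the meaning of life?", max_tokens=64)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```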
Wtf every paragraph from Magnum starts with stuff like "He X, Y'ing", "She blinks, taken aback by", or "She looks away, her cheeks flushing".
Something with your settings perhaps?
Magnum didn't have this problem when I hosted it today for Infer's community
I used temp 0.87, tfs 0.3 and rep pen 1.08
also I prob should create a Magnum thread lmao
FYI: Infermatic needs to be added to SillyTavern as an OpenRouter provider in public/scripts/textgen-models.js (just mentioning it here as a quick fix)
(to hard route and avoid the 404 errors)
Second one isn't so bad.
I guess that one wasn't a fair comparison since I was using another model before switching which devolved into Action "Speech" mini paragraphs.
Talking to them, they might do int8 or fp16 eventually
is all their big stuff in int4 or just Euryale?
Confirmed for just euryale yeah
SillyTavern don't have an infer provider option yet, how can I switch to a preferred provider on OR?
Requests should default to Infer for now
ST will add it soon enough prob
Hand code it for now -> #1250867165737914569 message
doesn't seem to be necessary
Without it I can't force the provider to Infermatic in SillyTavern and get 404 errors (at least at that time)
hm all reqs go to Infer for me
wow. this is a first lol
Didn't work for me at that point. Anyway, the patch/diff is brain dead simple:
diff --git a/public/scripts/textgen-models.js b/public/scripts/textgen-models.js
index d8f36cf4..01743e0c 100644
--- a/public/scripts/textgen-models.js
+++ b/public/scripts/textgen-models.js
@@ -39,6 +39,7 @@ const OPENROUTER_PROVIDERS = [
'Novita',
'Lynn',
'Lynn 2',
+ 'Infermatic',
];
export async function loadOllamaModels(data) {
i dont even know if im being routed to infermatic because this is all i get in the activity page lol
the infamous Shadow Provider
Claude training data moment
wait wat
yeah lol
its just blank
if i hover over the blank space it says "Unknown Provider"
it's there for me, weird
void provider
work for me too lol
FWIW I see the correct provider (Infermatic) in my usage
damn
@fiery oxide btw, recommended parameter settings please?
temp at 1.25 or 1.5 and min_p 0.1
And maybe some presence penalty like 0.3-0.5
0,01 or 0,10? that high?
damn, kinda high, i usually just set 0,02
You prob don't use temp 1.5 usually
i just use 1
This model performs better at high temps imho
ok i'll try it
basically, what I'm recommending is Universal Light or Universal Creative presets in ST
this too?
yeah
does Infer quantize it? cuz Novita seems to and that's why it sucks
Infermatic does not quant models
All models are in their native precision
So Euryale is in bf16
Also Novita has straight up broken quant
bc AWQ 4bit Euryale shouldn't be that bad

dang, not gonna lie, these generation times are pretty slow
like on average 30-40 seconds
im a patient lad though
30-50 sec, but it's good enough for me to sit and wait lol
yup lol
is this the classic Infermatic Is So Slow?!???!!!111/// thing ive been reading on reddit or
speeding up is a pain in the a
but work on it is ongoing
as long as it doesn't stretch to a few minutes, it's acceptable
(also, yes, I'm sort of an informal Infer rep there)
It's pretty much never that long unless you are genning 3000+ tokens
i can handle this speed
i handled the dark era of Pygmalion on google colab getting fuckin 0.5t/s
godspeed fellas
Seeing downtime, Svak is looking
You are prob getting Novita rn
Should soon be fixed
We are back up!
try now
on it boss 
Context window temporarily capped at 8K
awwwwww
bc vLLM keeps crashing
should i just select infermatic only? idk wtf is happening with novita
yeah you should
Novita has a broken quant
so when will OR bring larger context variant?
when we fix it
hopefully soon
It was at 16K, actually
before it started crashing
bruh why can't any inference engine just work
fr
seems to be fine now
responses feel weird now, I'm sure I got them from Infer
hm. they do feel slightly different huh
what do you mean?
okey, lmk
Context fixed, now back to 16K
nature is healing
why does my SillyTavern still max out at 8k? I usually don't need to unlock context to get max context using chat comp
OpenRouter might still clamp the context size to 8k?
i unlocked mine anyways
work?
hold on
er... shit, it seems like its not
@daring shale hate to ping you man but is there a delay for the 16k update?
its still capped at 8k
wont budge here
even at unlocked mode and set to 16k

medic!
Amateurs
The price was doubled?
Still worth it, but yeah, a quiet price bump is kind of a low blow
And if you're bumping it, it better work at least
Idk why it won't let you access
well not exactly bumping it, that was a miscalculation. NovitaAI is giving you int4 and 8k tokens, charging you 0.75. We're giving you double the context and 4x the precision for 1.8
lel
@subtle phoenix
I can't help with that one
looks like ST
cc @wet marten
darn.
So you're a new provider of the model?
Ah, I see. Don't know who you are, but I hope yours doesn't die every half hour
The list is hardcoded
If you can give an API for that, that'll be better
It is a really simple patch -> #1250867165737914569 message
sigh im just gonna stop chatting for now
im burning through credits and getting some real bad hallucinations lol
Maybe it's you
did manage to get through 8k though
still dont know why i cant do 16k
ill probably do a reinstall idk
What version of st are you using?
latest
staging?
well try again and lmk
roughly stuck around this token count
why don't you try something that isn't ST to test if it's your api key or ST?
alr
mmm
not working on venus either
i must be subconsciously doing a big oopsie lol
hmmmmmmmmmmmmm
funny enough even the lst here still says 8k context
im gonna stop replying now cause im burning a hole thru my credits now lol
That number comes from OpenRouter, and it stays the same as long as the context size in their config does not get changed (hard if providers offer different context sizes)
damn.
The replies are much smaller with Infermatic
And it rushes to complete the instruction/story as well. At that price tag, one should expect more, not less and worse.
Novita kept crashing, which was annoying, but it output a lot more and the overall quality and coherence was better than now.
I tend to agree after more than a hundred replies now, though such comparisons are still very hard with probability based systems like LLMs. But at least at the shorter replies part I think I can see that in numbers in my activity list.
Even at temp of 1 it goes fully unhinged for no reason, far beyond what one would call creative
Just comes up with the most random shit for no reason, while yesterday its creativity was stellar with same settings
Whatever was done recently, it had a very bad effect on the model's output length and coherence.
hell nah endpoint started pointing to Novita now
Damn can we have Novita back, can't believe I'm asking for it 😭
it's still there fyi
It's marked as yellow, does that mean it's down?
No, just "degraded" (since it was 404ing a ton earlier)
Is it a complicated process to change it in ST?
Is Infermatic running a quant of Euryale? I just can't understand why it is so ass compared to yesterday's Novita
Maybe it's a settings thing, from all the users you are the one having issues. If you want settings recommendations feel free to join the discord and tweak with them #1253005075064819844 message
We are not running a quant on Euryale, as I already said we are full FP16
That's why I understand it even less. I'm using the settings recommended to me here, which worked wonderfully on Novita (which I'm guessing is 4 bit?). And no, I'm not the only one, at least one more user complained below my insight. The system prompt is pulled from the model's Hugging Face page, which also worked great with Novita.
Like come on, I'm paying double the price and getting half of what I did before and it hallucinates like crazy? That's terrible.
Casual reminder that applying unofficial patches WILL cause merge conflicts on pull
it keeps pointing to Novita fu*k
If the person doing it is a dev, it shouldn't come as a surprise lol
I'm not concerned about devs. The problem is that every now and then there are support cases with merge conflicts usual peeps got from random Reddit/Discord patches. Dev should ideally do patch in a pull request to upstream if it is something valuable, otherwise it will backfire at me later. Hope I made it clear.
Novita seems to work fine, I didn't notice any quality differences to infermatic.
Just make sure you are using this formatting and these sampling settings.
eh, Novita hallucinates hard after 4k
I take it back, Aetherwiing is correct
welp, its a new day so
i hope that with a new chat i can break through the 8k barrier mark now lol
think its dead again
nvm back up now
also, i went ahead and installed a new fresh copy of sillytavern and it still displays 8k context
Expected. The context size comes verbatim from OpenRouter's API.
yeah but im also capped at 8k 
like it wont go beyond 8.5k tokens for whatever shitass reason
pissing me off
i figured it was just a visual glitch at first
Novita upgraded to fp8, I think it should be the same as Infermatic. Infermatic hallucinates as well, it's the checkpoint.
Novita updated to fp8 and also with 16k extended context tokens fyi
btw they literally upgraded after I asked them, pretty nice work
It's been in the works for the past 2 days, but yeah they're great
Oh alright.
OpenRouter API still says 8k though, but I guess that will change soon?
prob cached :d
updated for me:
(the base model is still 8k fyi, but a provider can set their own max output via RoPE/YaRN etc...)
I hope that I can implement my API at some point, I'm working on a framework that might be able to give ridiculous amounts of context but thats slightly off topic
https://openrouter.ai/api/v1/models -> 8k for Euryale for me
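A rough sketch of how one could check what that models endpoint reports, instead of trusting a possibly cached UI. It assumes the response is a JSON object with a "data" list whose entries carry "id" and "context_length" fields; the sample body below is a trimmed, made-up stand-in, not a real response.

```python
import json


def context_length_for(models_payload, model_id):
    """Return the context length listed for model_id, or None if it is absent."""
    for entry in models_payload.get("data", []):
        if entry.get("id") == model_id:
            return entry.get("context_length")
    return None


# Hypothetical, trimmed response body (field names are an assumption).
sample = json.loads(
    '{"data": [{"id": "sao10k/l3-euryale-70b", "context_length": 8192}]}'
)
reported = context_length_for(sample, "sao10k/l3-euryale-70b")
print(reported)
```

To run it against the live API, fetch the URL above with any HTTP client and pass the parsed JSON in place of `sample`.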
Ah, ok.
Interesting.
this is novita btw lol
sigh as much as i love euryale i really gotta stop, my credits are sucking into a black hole
Novita has stopped working for me for some reason, with exactly the same settings Infermatic is fine
i cant select infermatic as my sole provider lol fml
That is a one line fix in public/scripts/textgen-models.js
mmmmm but wont this cause a merge conflict for future updates?
for when the update gets pulled or did i read wrong a couple days ago
ehhhh i guess i could just back up the js file and restore it for when an update happens
then either learn a bit of git (e.g. git stash), or wait, or ask the ST devs to create proper, installable build artifacts that can be modified without producing git merge conflicts every time. git is not a consumer deployment scheme
Abusing git and then telling others not to modify their code is a bit crazy IMHO.
Infermatic's price for Euryale was reduced to 1.5$/M, and the precision still stays at bf16
Does Infermatic enforce shorter responses on their side for Euryale? I can't combat it no matter how hard I try
The reason is always 'stop'
no, we do not
And it's extremely varied too. Sometimes it will give me a few hundred, still short by a good margin, and sometimes it will just rush the completion in 50-100 tokens.
longer response length is an artifact of quantization, likely
fp8 was tested internally today, and while this tends to give lengthier output, it loses significantly in coherence and instruction following, so we decided against even trying it
(fp8 never was on public endpoint, to be clear)
So the model encourages to take it bit by bit in a way?
I wouldn't mind the shorter replies if it left potential for continuation, which it often doesn't. Is it because my instruct is too direct maybe?
hmm, possibly
But weird thing is that my average response length is 300+ tokens
It's definitely possible
Try instruct preset made by creator of the model if you haven't already
https://huggingface.co/Sao10K/L3-70B-Euryale-v2.1/blob/main/Euryale-v2.1-Llama-3-Instruct.json
I tend to get decently long, detailed output with it and temp 1.25 min_p 0.1
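For anyone wondering what temp 1.25 with min_p 0.1 actually does to the token distribution, here is a minimal sketch. It assumes temperature is applied before the min_p cutoff (backends differ on sampler order) and is not the code of any particular inference engine.

```python
import math


def min_p_filter(logits, temperature=1.25, min_p=0.1):
    """Apply temperature, drop tokens below min_p * top probability, renormalise."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


# Toy logits: two plausible tokens and one unlikely tail token.
probs = min_p_filter([5.0, 4.0, 0.0])
print(probs)
```

The higher temperature flattens the distribution among plausible tokens, while min_p removes the long tail that would otherwise produce random garbage.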
That's the one I'm using. Worked wonders with Novita, which was a 4 bit I think. But as you said, the length is an artifact of quant?
the thing about all L3-based models I've seen is that they get lazy on some cards tho
Yeah, that's a possibility
Model generated more on fp8, but was less coherent
Anything to be on the lookout for? Maybe my cards need work
lengthy example dialogue usually helps to mitigate it
Longer first messages do too
Tho I still haven't figured out what exactly causes it
I also recommend appending some output length instructions to the last assistant prefix
As in tell it directly how long I want it to be?
How should that be phrased? Word count/token length/paragraph wise?
Yeah, something akin to "Your next response should be three paragraphs long"
also, I don't recommend using repetition penalty with this model, seems to cause weird artifacts
I recommend using presence penalty instead
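To show how the two penalties differ, here is a sketch of the commonly used formulations: a multiplicative repetition penalty and an additive presence penalty. The exact implementation on any given provider is not stated in this thread, so treat the function names and details as illustrative assumptions.

```python
def apply_repetition_penalty(logits, seen_ids, penalty=1.05):
    """Multiplicative: shrink positive logits and push negative ones lower."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def apply_presence_penalty(logits, seen_ids, penalty=0.45):
    """Additive: subtract a flat amount once per token that already appeared."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] -= penalty
    return out


logits = [2.0, -2.0, 1.0]
seen = [0, 1, 1]  # token 1 appeared twice, token 2 never
rep = apply_repetition_penalty(logits, seen, penalty=2.0)
pres = apply_presence_penalty(logits, seen, penalty=0.5)
print(rep, pres)
```

The multiplicative version scales with the size of the logit, so confident tokens get hit harder; the additive version shifts every seen token by the same fixed amount.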
I'll try your tips out, thank you!
This configuration seems to include a very long, restrictive system prompt, with optional identifiers that don't get used, sure this is the best thing since sliced bread?
"system_prompt": "Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.\n\n<Guidelines>\n• Maintain the character persona but allow it to evolve with the story.\n• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.\n• All types of outputs are encouraged; respond accordingly to the narrative.\n• Include dialogues, actions, and thoughts in each response.\n• Utilize all five senses to describe scenarios within {{char}}'s dialogue.\n• Use emotional symbols such as \"!\" and \"~\" in appropriate contexts.\n• Incorporate onomatopoeia when suitable.\n• Allow time for {{user}} to respond with their own input, respecting their agency.\n• Act as secondary characters and NPCs as needed, and remove them when appropriate.\n• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.\n</Guidelines>\n\n<Forbidden>\n• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.\n• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.\n• Repetitive and monotonous outputs.\n• Positivity bias in your replies.\n• Being overly extreme or NSFW when the narrative context is inappropriate.\n</Forbidden>\n\nFollow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>."
I've tested it with this, and with just llama-3-instruct names preset, tbh personally I like output with this more
There's no thing such as 100% optimal prompt, but this seems to work
You can try some other presets from our server, some people like them more
https://discord.com/channels/1115287912385351730/1253005075064819844
At a second look and some experimentation I have to revise, this prompt does make sense. I was wrong about the unused identifiers, sorry for that.
Infermatic and novita are same cost, damn price wars
imagine paying 1.5$ for fp8, when you can get bf16 for the same price
Like 5 times slower
we'll see if this can be improved
Hopefully
Is something wrong on the provider end? I've got charged for 3 blanks in a row and it's a fully SFW scenario
This might be cause of the four consecutive blanks I've experienced, as removing it now fixed the issue
oh my bad, I forgot to say it needs to be formatted like a system message with assistant message start at the end
Ah, I see, I'll try that instead
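A sketch of what "formatted like a system message with assistant message start at the end" could look like in the Llama 3 instruct format. The helper names are hypothetical, and frontends such as ST assemble this from the instruct preset rather than from code like this.

```python
def llama3_turn(role, content):
    """Wrap one message in Llama 3 instruct header and end-of-turn tokens."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"


def build_prompt(system_prompt, history, length_hint):
    """history is a list of (role, text) pairs; length_hint goes in last."""
    parts = ["<|begin_of_text|>", llama3_turn("system", system_prompt)]
    parts += [llama3_turn(role, text) for role, text in history]
    # The length instruction is its own system turn, placed right before
    # the open assistant header so the model reads it last.
    parts.append(llama3_turn("system", length_hint))
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_prompt(
    "Currently, your role is {{char}}.",
    [("user", "Hello there.")],
    "Your next response should be three paragraphs long.",
)
print(prompt)
```

Putting the hint as plain text after the assistant header instead would make the model treat it as the start of its own reply, which may be why the blanks appeared.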
😔
Yeah, keeps happening to me too and nothing seems to fix it
It just keeps stopping the gen whenever it feels like it
okay nvm its pretty prominent now wtf
stopping now cuz im tired of wasting my credits
dunno if this is a card issue or a provider issue
interesting.... that's way below its context size
oh nvm is the very last one some kind of retry?
This might be just a feeling, but I have the impression that many of fine-tuned/abliterated models like this one are bit too 'frankensteiny', too unstable.

no, that was a whole reply
like i swapped to a new character, got that, rubbed my eyes cuz i thought i was hallucinating
but yes im getting really short replies, it sucks
I have to fallback to Wizard until or if this gets sorted. And I swear this model's coherence is a little questionable as it is right now, but it could be just my pink glasses when I tried it out for the first time with Novita
okay i think i figured out the problem
it was the example messages, if they're short, then it wont generate anything longer than the example message no matter what you do
How short are we talking? Did you test what happens if there are no example messages at all?
yes, they became noticeably longer
Hmm, I'll try this out too.
go for it, it helped for me
It kind of works, but it's not a reliable fix. I noticed an improvement however. Now another issue I experience is the repetition. I have my presence at 0.45
same. very annoying. i also noticed a really weird issue where it just wont format dialogue with quotation marks
even though i explicitly ask it to
Lmk if it isn't the same. I swear yesterday something broke with wiz.
😭
I still feel like the 'logic' of Euryale is..off. It just doesn't make connections like it used to when it was introduced. I know everyone keeps telling me that this is the full model as it was intended, yet I still feel like it never topped the quant by Novita? But since I got to use so little of it I just can't distinguish if it's my nostalgia bias or if that was actually the case. Wizard has been off too recently, though even in its current weird state, I feel like Euryale beats it in terms of dialogue. Feels more natural, but it still has that ultra frustrating tendency of becoming formal real quick
Euryale def beats wiz's dialogue. Wiz's weakness is lifelike dialogue. I haven't gotten deep into trying euryale. Been focused on trying to figure out wtf is going on with wizard atm. If u go to test wizard at all and notice differences, plz share in the wizard channel. Maybe read what aether and i have noticed and see if ur also noticing similar behaviors.
If wiz continues to be trash ima try euryale and get heavy into forming instructs and proper settings like i had for wiz. If i can control euryale's short response issue and get a decent quality going ill share what instructs i come up with
Also its been recommended to use infermatic provider for euryale.
My activity tab shows that I've been thrown around like 3-4 providers of Wizard, like literally one reply in between, so it's a little hard to say exactly who's doing a poor job and who's doing well unless I really keep a tab on it
I think it's the default now. But if u r using it where u can select a provider, select infer and see if it's any better/worse. Its logic has been an issue, as is the case with any L3 model.
Oh i use a frontend that lets u select. Risu, ST, and OR's playground (i think) let u select providers
How do you select it in ST?
Lepton went to total shit. Novita seems less affected by the issues but is still handling its context very poorly for wiz
I don't use ST and never have so idk, i just know they added that feature not too long ago.
Novita version is trash
😔
I've excluded Novita for this model for some time now, gave too many short, broken answers. Infermatic seems to be much more reliable, though it produces 504 responses quite often.
I agree completely. Too many broken words, grammar errors, language mix-ups, etc...
wait until SillyTavern adds Infermatic as a provider (anyway, is there a way to fetch the latest provider list through the OR API?)
The staging branch of SillyTavern has this fix already.
noice, grinding Infermatic for now
umm... i cloned it and still don't see it
Ask in the SillyTavern Discord -> https://discord.gg/sillytavern
can u use that provider yet?
Yes.
You don't. You clone the repo and then switch to a branch. Ask your favorite LLM for help with git commands, they are really good at this.
oh
You can also ask your LLM to write a 5k token explanation why git should NOT be used as a distribution tool to end users and send it to ST devs, if you feel like it. I mean, they complain HERE that I should not show a patch because it might generate support on their side. Now I am doing support here for their ill choice of abusing git. /rant 🤯
I hope you guys can add some features for users to choose providers or some sort in OR like on playground... I don't use Sillytavern tho, which is sad, I only use Venus.
You should def ping venus/chub devs
oh yeah, the problem is i used this command
git clone -b staging https://github.com/Cohee1207/SillyTavern
And it didn't work.
instead i just changed the username to:
git clone -b staging https://github.com/SillyTavern/SillyTavern
and it worked lol, it's not really different but termux is weird as fuck.
So much trash............ Sometimes I use Silly Tavern, but when I use other apps, Novita comes first. I have to endure all kinds of alien languages! AAHAHHHHHHHHHH F***
blame their interface for not being good enough.
@subtle phoenix can seed param be added for Infer? It's supported (bc vLLM supports it)
On it
Novita...
Sometimes Euryale on Infermatic gets so excited that it keeps writing until it reaches max response length (usually just 5-6 paragraphs)
Something weird with Euryale in general. It's like L3, longer contexts can often lead to weird or incoherent stuff.
I mean like 4+ turns.
However sometimes it works fine?
It feels quite random albeit it's probably that there are some things that aren't in the training dataset and the model forgot how to handle it.
I had some good success with the Divine Intellect preset in ST. It got a good share more intelligent and coherent, some char cards felt more alive too, though it still prefers a good instruction or two to build off of for best results.
Tried Euryale but it does not seem to me to be a model on the same level as WizardLM-2 8x22B, which I find to be smarter and better at following instructions.
I also tried a group chat and WizardLM-2 8x22B did not miss a beat, Euryale sometimes gets confused and strange tags and characters appear from time to time.
There's something wrong with Euryale, it's worse than when first brought to the OR
Every word breaks.
unable to reproduce, neither through OR or directly through Infer
Coherent and decent quality for me
Infer didn't change anything about this model
How much money should I put towards euryale for it to last long
😭😭
wdym Euryale isn't going anywhere, it's doing quite well in terms of traffic rn
Do you know some good setting for euryale
It says I don’t have access to the link
Oh it's on Infermatic's server, you'll need to join it to see
link to server is on https://infermatic.ai/
(automod didn't let me send discord link directly lmfao)
Do I have to create an account. First?
nah, just click on join Discord
oh automod now lets me lmao
Did Euryale get its price raised?
Not recently (as in the last days), see here for all changes I recorded -> https://orw.karleo.net/model?id=sao10k/l3-euryale-70b
is there a way to fix euryale following the example dialogue a little too much? if an example dialogue in ST is short, every single reply will be the same length unless i delete it entirely, which isn't really ideal since example messages are pretty important
even if the intro message is long as shit, it'll just compress the reply length based on the example message, its really obnoxious honestly
I'm only able to push 8k tokens into the prompt no matter what settings I use in ST, even though on the page it says it has 16k context. Any ideas why?
My activity shows 8k context use in every single prompt
is this because this model is roped to increase context size?
have you unlocked the max token counter? i was able to get it to 9.8k tokens before i had to stop
Well, yeah. ST even tries to push the full prompt into the API as shown here
but activity shows something like this
makes me wonder, if OR cuts my prompt in half
yeahhhhhhhh i saw this myself idk myself and many others brought this up and it wasnt addressed i think
so idk
kind of annoying, i think this has been a problem for 1.5 weeks
Oh well, back to claude for the time being if that's the case.
yeah im going back to wizard
Does Euryale read example dialogues?
what you mean? if you send then Euryale will read. the point here is the model smart enough to not rely too much on it.
Is there a specific way to make character cards for Euryale
So the model can understand it better
If the context problem ever gets fixed, this'll probably be my main model, as I really like it 😄
Command R+ is no better than Euryale in roleplay in my opinion. But Cmd R+ follows instructions far better.
So yeah, I'm gonna go back to Cmd R+. While Euryale struggles to follow a simple instruction to write 3 paragraphs (sometimes it writes longer or shorter, even when using the last prefix of the instruct prompt), Cmd R+ follows it well with no issues.
I believe that the dataset for RP had an impact on the intelligence of the model.
Is Euryale good with group chat?
Mine is doing it too
I am getting random garbage with this model speaking crap that isnt english filled with symbols
Set your accepted provider list to "Infermatic" only, "NovitaAI" uses a crappy quantized version of this model that likes to produce garbage. Worked for me the last weeks (and accidentally tested several times as SillyTavern resets the provider list on every reload)
Besides the other great answer, lower the temperature to ~0.8
With Infermatic as your provider you can push this model easily to temperature 1.25 + MinP 0.1. Zero (total) garbage for me in the last few hundred generations with these settings (h/t Auri, scroll way up for several discussions of these settings)
How does one set up Infermatic as the only one allowed in ST? It is not on the list
I checked my activity and with nothing specified I see that I get both Infermatic and Novita mixed in
you'll need to update ST, it was added to the provider list semi-recently
That did the trick, thank you
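For those calling the API directly instead of through ST, a rough sketch of a request body that pins a single provider. The "provider" object with "order" and "allow_fallbacks" follows OpenRouter's provider routing options as I understand them; verify against the current docs before relying on it. The helper name is made up, and nothing is sent here.

```python
import json


def build_request(messages, provider="Infermatic"):
    """Build a chat completion body that only allows one provider."""
    return {
        "model": "sao10k/l3-euryale-70b",
        "messages": messages,
        "temperature": 1.25,  # settings quoted earlier in this thread
        "min_p": 0.1,
        # Assumed routing fields: try only this provider, never fall back.
        "provider": {"order": [provider], "allow_fallbacks": False},
    }


body = build_request([{"role": "user", "content": "Hello"}])
print(json.dumps(body, indent=2))
```

Without the fallback switch turned off, a request can still land on another provider when the preferred one is busy or erroring, which matches the mixed activity logs described above.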
it was, but they're rolling back a change and fixing it
seeing a huge increase in nonsense responses without changing any parameters, doesn't seem to be affecting any other models that I can tell
You might want read through this -> #arc-feedback message
ah alright, will do
TLDR: If you use SillyTavern and have set Infermatic as your sole provider, you now need to set a new flag; it's already in ST -> #arc-feedback message
Otherwise, thanks to this feature -> #announcements message, the other provider, which quantizes, will get sent requests too and return that garbage
All sorted now, thanks for your help!
Silly idea but maybe make your example messages longer then? I make mine exactly how i want my responses to be in every way possible, from formatting to length to char personality and vocabulary - typically i start by having the model sorta make them then heavily edit them.
And i do believe on ST they are temp tokens, so once context fills they drop and ur chat hist becomes the new examples.
i usually do that but some days i cant be bothered
Infermatic drops Euryale's precision to fp8 (dynamic activation)
Infermatic's team, community and me personally tested dynamic fp8 quantization on vLLM and found quality degradation to be minimal, pretty much invisible.
Though, if you experience major output quality degradation, please report it to me, I will pass it on to Infermatic's team
@idle grotto can you please mark Infermatic's endpoint as fp8?
I cannot confirm that the degradation is minimal or even invisible. Instead of following instructions and producing long outputs, the same cards produce now superficial and short replies, without changing anything. I cannot test this deeply (only ~10 generations), as I have no time for this now, but I know, when I have time again, I'll have to look for another preferred model. This does not work for me anymore.
Apparently FP8 quant we used was a static one, Svak is making dynamic one right now
Moral: never trust fp8 quants on HF, we will make our own in the future
Euryale is the only model receiving reports of degraded quality, Daybreak and Magnum are a-OK (both use first-party dynamic fp8 quants, made by me)
should be fixed now
Seems much better now (quick 2 generation test)
Glad to hear that! Really sorry for inconvenience
No problem, good that this model exist in this quality, it's a bit of a gem, hopefully others can enjoy it as much as I do (when I have some time).
Tbh, fp8 is not ideal, but 60s+ latency was becoming too much, Infermatic just has limited amount of resources compared to bigger providers
I think you are absolutely right in principle, for inference FP8 should not matter much, just that it did not seem to work for some reason.
I think calibrating stuff on 512 rows from Ultrachat-2k dataset is very far from ideal
Dynamic might be a bit slower and bigger, but provides more consistent quality
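A toy illustration of why a scale fixed at calibration time can misfit real inputs, while a scale computed per input cannot. It uses a uniform integer grid as a stand-in, which is an assumption made for simplicity: real FP8 is a floating-point format, and vLLM and AutoFP8 work differently in detail.

```python
def quantize(values, scale, levels=127):
    """Round to a symmetric grid and clip; a uniform stand-in, not real FP8."""
    out = []
    for v in values:
        q = max(-levels, min(levels, round(v / scale)))
        out.append(q * scale)
    return out


def max_abs_scale(values, levels=127):
    """Pick the scale so the largest magnitude maps to the top grid level."""
    return max(abs(v) for v in values) / levels


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


calibration = [0.5, -1.0, 0.8, 1.2]  # hypothetical small calibration sample
activations = [0.4, -0.9, 6.0]       # runtime input with an unseen outlier

static_scale = max_abs_scale(calibration)   # fixed ahead of time
dynamic_scale = max_abs_scale(activations)  # recomputed for this input

static_err = max_error(activations, quantize(activations, static_scale))
dynamic_err = max_error(activations, quantize(activations, dynamic_scale))
print(static_err, dynamic_err)
```

The static scale clips the outlier because nothing that large appeared during calibration, while the dynamic scale pays a little extra compute on every input to avoid that.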
How do you dynamic quant FP8? Is it using nvammo or something different?
What does FP8 mean?
Precision of (most) of the weights / quantization, FP8 = floating-point 8 bit, FP16/BF16 16bit, see also -> https://en.wikipedia.org/wiki/Minifloat
I’m so sorry but does this affect role playing?
It shouldn't much, but other parameters like very high temperature may increase this small effect so it becomes noticeable, also some inference engines seem to have trouble with FP8, apparently
From math perspective high precision (16/32 bit) is only necessary for training, where weights gets accumulated, but during inference most of this precision doesn't matter as high values are more important than tiny fractions for results. Those high values get preserved during quantization, so that even 4 bit weights still work pretty well.
It was made using AutoFP8
vLLM docs have instructions how to do it
https://docs.vllm.ai/en/stable/quantization/fp8.html
@subtle phoenix why max output on Infermatic's endpoint is set to 8192?
model is still RoPEd to 16384
Thanks for the flag - just pushed an update to fix this. I refactored the context thingy recently to clear up some tech debts
V2.2?
Hell yeah, we need this
My wallet's ready
@subtle phoenix Can we expect for OR to pick it up in the nearest future?
is it on featherless yet?
I requested it, likely it will arrive soon
Maybe we can also make a poll on Infer to update 2.1 to 2.2
imo 2.2 is a major improvement
That's very promising to hear. 2.1 was already awesome.
About Infer - I should have some news tomorrow, 2.2 was very warmly received on community cloud
so it might be either polled or just swapped
but people seem to want it over 2.1
Any news?
🤔
Seems to be no news atm, which is bit weird
Svak told me that either a poll or swap should have been on monday, along swap from v1 to v2 for Magnum
Neither happened yet
Should be tomorrow
Technically there are ~32 minutes left in Monday in California
so look like it's on Infer?
yup, updating it very soon
I got swarmed by some other stuffs xd
Does anyone know if any provider will host the FP16 version of the model, or is the loss minimal on FP8?
i can barely notice any difference between FP4 and FP8. FP8 is enough
holy moly, euryale update
😍
Hope it's a few hours away 🙏
Is it uncensored
As far as I can tell, but there's still a slight bit of positivity bias like you see in other llama 3 models. It's probably the best one based on it I've tried so far though.
More so than the previous Llama 3 / Euryale 2.1 IMHO, also I have seen very short (e.g. one sentence responses) to requests from this new model, while the old model would go on without problems. Without getting nailed I'd say this is a bit more restricted, but I would not call it "censored".
This model will certainly be a fine companion for most role play settings.
Off topic but I don't know what they did to make Mistral Nemo 12B Starcannon so good or just because it's the bf16 quant being hosted, but if they can do that with a larger parameter model so it's smarter, we will be eating good.
From my experience it won't push into that direction randomly like some of the other models, but can handle it okay.
It will do ERP, no problem.
But I'd recommend to generate a few replies to the same request with this version and the old (if it is still available) to get a feel how they are a bit different.
Euryale 2.2 can cook. Love it
And the cut-off problem of 2.1 seems to be gone. It's pumping out word after word like there's no tomorrow.
Is there any reason why it is at 8k context? It's 16k on Infer itself
8k is what the model spec is suggesting, provider can offer more (sometimes less) e.g. through RoPE tricks, the real context window is in the provider tab as max output.
The update feels like such a downgrade, feels like its much harder to get decent responses now that incorporate good dialogue

the older version is still available fyi
I remember there were a couple different providers for it before, Infermatic being the better one and the others using a quantized version 
Are either of these comparable to Infermatic on 2.1 previously?
The old one hosted by Infermatic was also fp8 IIRC.
Noted ty
Quantized differently though, Dynamic quantization hurts models' performance less
(If at all, it was unnoticeable from tests)
I’m liking starcannon
TIL
Has anything been done to Euryale for Llama 3.1 hosted by Infermatic? Is it correlated with the massive price drop? It performed quite awfully in recent gens. Gibberish generations, lacking creativity, endless adjectives thrown at you with no real coherence. Settings were untouched, just an inexplicable stream of bad generations.
If performance was sacrificed to reduce cost, I'd much rather pay more for a sane, consistent model.
https://huggingface.co/Sao10K/L3.1-70B-Hanami-x1/
Hope someone adds this to OR. Successor of euryale 2.2
Very barebones description, but anything Euryale related is usually good.
Check what you're using for rep pen and freqpen
idk he said it was untouched
DeepInfra seems to be churning out garbage, for both 2.1 and 2.2
other providers on both 2.1 and 2.2 seem fine
1.1 and 0.9 now. Used to be at 1.55 and 0.9, but again, before introduction of deepinfra (Which was today?) It worked perfectly fine.
With the exact same parameter?
oh wow
kk will derank deepinfra
and ping them
yeah exact same everything
Weird thing is though that OR lists Infermatic as provider for these broken messages even though I've never observed a behavior like that from Infermatic's 2.2
Not until the price drop announcement
that's a bug
I think internally it's prob Deepinfra serving it
but weirdly...
shouldn't deepinfra be the 1st endpoint it tries?....
(so it should log deepinfra regardless...)
There's a bug I'm trying to track regarding how fallback providers are not being logged properly
but... if it's Deepinfra serving the model and that it's the 1st host.... it should have been logged, NOT infermatic xd
ugh....
But I'm not allowing fallbacks, I've unchecked it on ST. Will it still somehow go for Deepinfra anyways?
ooh so no falback at all?
Nope
then yeah I'm pretty sure your request hit Infermatic
This happens only after the price drops right?
I double checked our commit history - the last refactor to the endpoint filtering system was ~4 days ago
Yeah. Whatever was done since that announcement, somehow resulted in whatever is happening now
I have novita and infermatic enabled for .1 and .2 respectively and everything works, as soon as I enable DeepInfra it all goes to hell
I'll admit I don't know much about the technical side but I think something similar happened in the past when a provider was quantizing prompts?
Yesterday I was using it quite a lot at 1.55 rep pen and it worked swimmingly
The typical good Euryale stuff on all cards
I think Deepinfra should be removed. It's broken as hell and it's possible that somehow it's causing this too.
kk DeepInfra should be deranked now
but anyways, isn't 2.2 just poorly received compared to 2.1 in general?
Can still send requests to it through ST
yeah
it's only "deranked" - meaning if you call the model without a specified provider, it will not be picked as a candidate
Aha, I see. So, I'm seeing that somehow Text Completion is causing issues, but I have no idea why. Chat Completion 2.2 from Infermatic works good
In text completion it either does not respond or is utter garbage
Not on Infermatic's side? Users there prefer 2.2
But people have varying opinions so
If they dont respond maybe @fiery oxide can shed some light, i know he was a fan of this model.
I have been using Euryale 2.2 with 1.17 temp, 0.075 minp and 1.05 rep pen
I've noticed right away, even before Infer officially picked this model up, when it was on community cloud, that Euryale 2.2 prefers lower temp than 2.1
