Sao10K/L3-70B-Euryale-v2.1 | OpenRouter | Page 1

fiery oxide Jun 13, 2024, 5:39 PM

#

L3 installment of Euryale, one of the best (if not the best) RP models. Engaging prose, very good adherence to character cards, very creative, almost zero slop.
https://huggingface.co/Sao10K/L3-70B-Euryale-v2.1

woeful mist Jun 13, 2024, 6:41 PM

#

This has my vote. This is the model i want to try most next to Awiz (abliterated)

fiery oxide Jun 14, 2024, 10:26 AM

#

This model is currently hosted by infermatic's community, and people are really loving it

slim cipher Jun 14, 2024, 10:53 AM

#

Is it uncensored?

fiery oxide Jun 14, 2024, 11:07 AM

#

slim cipher Is it uncensored?

Completely

formal steeple Jun 14, 2024, 3:46 PM

#

fiery oxide Completely

wow, even more interesting. Now the last question: context size?

fiery oxide Jun 14, 2024, 3:47 PM

#

formal steeple wow, even more interesting. Now the last question: context size?

8k, it's L3
RoPEable to 16k as far as I tried
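
A minimal sketch of what "RoPE to 16k" can mean, assuming plain linear position scaling (the thread does not say which scaling method was actually used). `rope_angles` and the `scale` parameter are illustrative names; `base=500000.0` is Llama 3's published RoPE theta.

```python
def rope_angles(position, head_dim=128, base=500000.0, scale=1.0):
    """Rotation angles for one token position.

    scale > 1 compresses positions (linear RoPE scaling), so a model
    trained on 8k positions sees a 16k window as if it were 8k.
    This is an assumed method, not necessarily what any host runs.
    """
    pos = position / scale
    return [pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# With scale=2.0, position 16000 maps onto the angles of position 8000.
native = rope_angles(8000)
stretched = rope_angles(16000, scale=2.0)
```

The trade-off is that neighbouring positions end up closer together, which is why quality usually has to be checked after stretching.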

slim cipher Jun 14, 2024, 3:57 PM

#

Let's goo, add it pls

craggy crow Jun 14, 2024, 4:46 PM

#

+1

grand marsh Jun 14, 2024, 9:19 PM

#

Rope to 16k and add!, it's a very competent model. Easily best I've used for L3 70.

latent sigil Jun 15, 2024, 12:53 AM

#

+1

red pagoda Jun 15, 2024, 1:29 AM

#

+1

#

This model is currently my favorite

#

Very good at story writing, sometimes its answer is better than Claude Opus for me

twin siren Jun 15, 2024, 10:19 AM

#

+1
Tested it yesterday. Very good for both roleplay and story writing.

woeful mist Jun 15, 2024, 7:39 PM

#

Those in my server trying it at Infermatic are liking it a lot except for (Infermatic's issue, not the model's) extremely long response times, waiting over a minute.

I really hope we see this roped to 16k context cuz 8k is just not enough.

fiery oxide Jun 15, 2024, 7:40 PM

#

woeful mist Those in my server trying it at infermatic are liking it alot except for (inferm...

Infermatic's server turned into Sao's fanclub in less than a week lmao

woeful mist Jun 15, 2024, 7:41 PM

#

fiery oxide Infermatic's server turned into Sao's fanclub in less than a week lmao

Yeah. I cant get over how bad their response times are. 264 seconds for some euryale responses! Like whaaat. I dont tolerate more than 15 seconds. Lmao

fiery oxide Jun 15, 2024, 7:42 PM

#

woeful mist Yeah. I cant get over how bad their response times are. 264 seconds for some eur...

they probably have just one 4x A6000 node for Euryale rn

#

and it's overloaded
vLLM can batch well, but even it has its limits

woeful mist Jun 15, 2024, 7:45 PM

#

Apparently most models are pretty slow. Astoria isn't, I guess, tho. One of my members loved Astoria cuz it's filthy. But I don't think (idk, I never tried it) it follows instructs well

#

But back to topic i hope they add euryale so i can finally see what the fuss is about

fiery oxide Jun 15, 2024, 7:50 PM

#

Well, it's the second most requested model here after UnleashedWiz (soon!)
OR definitely should look into this

tribal dagger Jun 16, 2024, 3:13 AM

#

anywhere i can try it?

devout crow Jun 16, 2024, 4:15 AM

#

id love to try it

#

id sub to infermatic but i already added some credits to my OR account a day ago damn

river granite Jun 17, 2024, 6:43 PM

#

+1

hollow nest Jun 18, 2024, 10:49 AM

#

Seems to be hosted on novita https://novita.ai/pricing

fiery oxide Jun 18, 2024, 1:09 PM

#

hollow nest

@subtle phoenix can you add it, please?

formal steeple Jun 18, 2024, 3:04 PM

#

(with 16k context if possible, ty)

hollow nest Jun 18, 2024, 3:04 PM

#

sadly I think Novita only does 8k

devout crow Jun 18, 2024, 3:19 PM

#

man ive been spoiled with 32k context

subtle phoenix Jun 18, 2024, 4:04 PM

#

Yup working with them to add it. They said earlier today that there was some kink that needs to be ironed out, but I got the PR up already

#

Should I just merge it lol?

subtle phoenix Jun 18, 2024, 4:52 PM

#

Merged, should be up in 5 mins

#

Note the responses might be gibberish

#

(That's what they told us xd)

sonic merlin Jun 18, 2024, 5:00 PM

#

It's up, First test came through fine.

fiery oxide Jun 18, 2024, 5:00 PM

#

bruh Novita doesn't have minp which is a must have for this model

subtle phoenix Jun 18, 2024, 5:00 PM

#

fiery oxide bruh Novita doesn't have minp which is a must have for this model

will ping them about that

fiery oxide Jun 18, 2024, 5:00 PM

#

This model is at its peak on temp 1.5 and min-p 0.1

#

It like adores high temp with min-p
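
A rough sketch of why high temperature plus min-p can work together: temperature flattens the distribution, and min-p then drops every token whose probability is below `min_p` times the top token's probability. The function names are illustrative, and applying temperature before min-p is an assumption (backends differ on sampler order).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Indices of tokens whose probability is at least min_p * top probability."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample(logits, temperature=1.5, min_p=0.1, rng=random):
    """Sample one token index using temperature, then min-p (assumed order)."""
    probs = softmax(logits, temperature)
    kept = min_p_filter(probs, min_p)
    weights = [probs[i] for i in kept]  # choices() renormalises the weights
    return rng.choices(kept, weights=weights, k=1)[0]
```

Because the cutoff is relative to the top token, the filter is strict when the model is confident and permissive when it is not, which top_p alone does not give you.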

sonic merlin Jun 18, 2024, 5:13 PM

#

This model seems to love Markdown. It even spits out crazy formatted text in the middle of a text only role-play chat. Not exactly gibberish, but a bit strange nonetheless.

fiery oxide Jun 18, 2024, 5:14 PM

#

sonic merlin This model seems to love Markdown. It even spits out crazy formatted text in the...

It sure likes XML too

sonic merlin Jun 18, 2024, 5:14 PM

#

fiery oxide It sure likes XML too

Feels like role-playing with a coding model

fiery oxide Jun 18, 2024, 5:15 PM

#

hm lemme test

#

on OR

#

and compare to Infer's host

subtle phoenix Jun 18, 2024, 5:15 PM

#

fiery oxide and compare to Infer's host

how is their pricing btw?

fiery oxide Jun 18, 2024, 5:15 PM

#

subtle phoenix how is their pricing btw?

15$/month

subtle phoenix Jun 18, 2024, 5:15 PM

#

ooh it's sub

fiery oxide Jun 18, 2024, 5:15 PM

#

they are too slow and ratelimited for you prob

sonic merlin Jun 18, 2024, 5:16 PM

#

for 500 token responses with high latency

fiery oxide Jun 18, 2024, 5:16 PM

#

subtle phoenix ooh it's sub

You prob can talk to Svak, they have enterprise tier

subtle phoenix Jun 18, 2024, 5:16 PM

#

sonic merlin for 500 token responses with high latency

wait $15 for 500 tokens?

fiery oxide Jun 18, 2024, 5:16 PM

#

subtle phoenix wait $15 for 500 tokens?

no

#

Unlimited

#

They are just kinda slow

#

default plan has 2 concurrent req limit

sonic merlin Jun 18, 2024, 5:17 PM

#

each response seems capped to 500 tokens or so if I read it correctly, that is a bit annoying, esp with high latency.

fiery oxide Jun 18, 2024, 5:17 PM

#

sonic merlin each response seems capped to 500 tokens or so if I read it correctly, that is a...

It's not capped, i've got 3000+ from their Wiz

#

and 1500+ from their Euryale and Stheno

sonic merlin Jun 18, 2024, 5:17 PM

#

fiery oxide It's not capped, i've got 3000+ from their Wiz

Then I read it wrong, let me check.

#

Hmm: -> "512 token responses, 86,400 requests per day." for $15/month

#

On their landing page.

fiery oxide Jun 18, 2024, 5:18 PM

#

sonic merlin Hmm: -> "512 token responses, 86,400 requests per day." for $15/month

ehhh, Svak messed up a bit

#

Their site is kinda unfinished and outdated atm

#

There are no limits, as far as my personal experience goes

sonic merlin Jun 18, 2024, 5:19 PM

#

fiery oxide There are no limits, as far as my personal experience goes

Noted.

fiery oxide Jun 18, 2024, 5:21 PM

#

subtle phoenix how is their pricing btw?

anyway you can message @daring shale and ask

sonic merlin Jun 18, 2024, 5:31 PM

#

Okay, I only have gotten the markdown treatment once in about 10 tries, this seems an acceptable level of annoyance.

fiery oxide Jun 18, 2024, 5:32 PM

#

sonic merlin Okay, I only have gotten the markdown treatment once in about 10 tries, this see...

I think acceptable temp starts with 1 on this one lol
Lower than this and it starts to lose coherence

sonic merlin Jun 18, 2024, 5:32 PM

#

I am using temp 1 currently, yes, I forgot to mention this.

fiery oxide Jun 18, 2024, 5:33 PM

#

well, I mostly used it with temp 1.5 on Infer lol

sonic merlin Jun 18, 2024, 5:33 PM

#

As long as I don't get complete replies in French or Spanish like with Dolphin, I am fine.

fiery oxide Jun 18, 2024, 5:34 PM

#

it really needs min_p tho

#

I hope Novita adds it soon

fiery oxide Jun 18, 2024, 5:34 PM

#

sonic merlin As long as I don't complete replies in French or Spanish like Dolphin now I am f...

those settings seem to work fine for it

#

tho this is more ideal, but again no min_p

sonic merlin Jun 18, 2024, 5:38 PM

#

fiery oxide tho *this* is more ideal, but again no min_p

Thanks, those are basically my settings too, very neutral, only changing them when absolutely required.

#

(the first settings from you I am referring to, of course)

#

I usually stick to temp 1 and do not go higher, as models tend to freak out/produce gibberish (the only other models I tried with very high temperature that did not completely go bonkers right away were GPT-3.5/4, but I last used them 6 months ago or so)

fiery oxide Jun 18, 2024, 5:44 PM

#

sonic merlin I usually stick to temp 1 and do not get high as models tend to freak out/produc...

Euryale is like the only model series I know to consistently prefer high temp for some reason

#

Damn it's really hard to tame with just temp and top_p

daring shale Jun 18, 2024, 7:09 PM

#

I’m here

#

BastardSmile

daring shale Jun 18, 2024, 7:10 PM

#

sonic merlin Hmm: -> "512 token responses, 86,400 requests per day." for $15/month

Those are only on the UI, we don’t cap at the API. At the moment the request limits for the API are 18/minute and 3 in parallel
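
A client-side sketch of staying inside limits like the ones quoted (18 requests per minute, 3 in parallel). `RequestBudget` is a hypothetical helper, not part of any provider SDK, and how the server actually counts its window is an assumption.

```python
from collections import deque


class RequestBudget:
    """Tracks a sliding 60-second window plus a cap on in-flight requests."""

    def __init__(self, per_minute=18, parallel=3):
        self.per_minute = per_minute
        self.parallel = parallel
        self.sent = deque()  # start times of requests in the last 60 s
        self.in_flight = 0

    def try_start(self, now):
        """Return True and record the request if both limits allow it."""
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # forget requests older than the window
        if self.in_flight >= self.parallel or len(self.sent) >= self.per_minute:
            return False
        self.sent.append(now)
        self.in_flight += 1
        return True

    def finish(self):
        """Call when a request completes, successfully or not."""
        self.in_flight = max(0, self.in_flight - 1)
```

In real use `now` would come from `time.monotonic()`; it is passed in here so the behaviour is easy to check.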

fiery oxide Jun 18, 2024, 7:11 PM

#

daring shale I’m here

So, is Infermatic on OR possible lol?

#

or are we a tad bit too slow for that lol

daring shale Jun 18, 2024, 7:13 PM

#

Only on the discord

daring shale Jun 18, 2024, 7:13 PM

#

fiery oxide or are we a tad bit too slow for that lol

prob

#

We're working on that

fiery oxide Jun 18, 2024, 7:13 PM

#

daring shale We're working on that

chatting with OR team already?

daring shale Jun 18, 2024, 7:13 PM

#

Nah, on the output speed

#

I didn't know we could be associated tho

fiery oxide Jun 18, 2024, 7:14 PM

#

daring shale Nah, on the output speed

hm, A100s finally lol?

daring shale Jun 18, 2024, 7:14 PM

#

How does OpenRouter work with other companies?

subtle phoenix Jun 18, 2024, 7:14 PM

#

daring shale I didn't know we could be associated tho

Hey howdy!

daring shale Jun 18, 2024, 7:15 PM

#

fiery oxide hm, A100s finally lol?

Euryale is already on some H100s

subtle phoenix Jun 18, 2024, 7:15 PM

#

daring shale How does OpenRouter works with other companies?

We route to providers and pay per-token pricing

daring shale Jun 18, 2024, 7:15 PM

#

subtle phoenix Hey howdy!

Heyo!

subtle phoenix Jun 18, 2024, 7:15 PM

#

cc @idle grotto

daring shale Jun 18, 2024, 7:16 PM

#

subtle phoenix We route to providers and pay per tokens pricing

naizu, so are you interested in Euryale?

subtle phoenix Jun 18, 2024, 7:16 PM

#

daring shale Heyo!

I can make a DM group so we can discuss further

daring shale Jun 18, 2024, 7:16 PM

#

Sure!

subtle phoenix Jun 18, 2024, 7:16 PM

#

Thanks @fiery oxide for the intro kek

fiery oxide Jun 18, 2024, 7:16 PM

#

subtle phoenix Thanks @fiery oxide for the intro kek

my pleasure lmao

mortal cove Jun 18, 2024, 7:17 PM

#

Hmm definitely experiencing the gibberish responses warned about above.
Very excited to try it out once that's ironed out though!

cosmic yew Jun 18, 2024, 7:22 PM

#

Would be awesome if Infermatic and OR did work together. Tried the former, couldn't figure out how to get it working on TypingMind so staying here even though I roleplay primarily with TypingMind and Infermatic seems to excel with the RP models. Anyway, looking forward to trying out this new holy grail of models.

fiery oxide Jun 18, 2024, 7:26 PM

#

cosmic yew Would be awesome if Infermatic and OR did work together. Tried the former, could...

yea, TypingMind uses chat completions and those are a bit wonky on Infer atm
Svak and team are working on that

subtle phoenix Jun 18, 2024, 7:26 PM

#

fiery oxide yea, TypingMind uses chat completions and those are a bit wonky on Infer atm Sva...

If we route to Infer, that'd solve the problem right?

fiery oxide Jun 18, 2024, 7:27 PM

#

subtle phoenix If we route to Infer, that'd solve the problem right?

You are doing prompt -> message transform on your end, right?

subtle phoenix Jun 18, 2024, 7:27 PM

#

we doing messages -> prompt

#

and I'm pretty sure most of Infer's models do prompt, right?

#

(we actually do both tbh xd, it's wonky but... work thus far)

fiery oxide Jun 18, 2024, 7:28 PM

#

hmm, Infer has some problems with system role not being supported (or at least had) and some with strict user->assistant order too

fiery oxide Jun 18, 2024, 7:28 PM

#

subtle phoenix and I'm pretty sure most of infer model do prompt right

So I think some of your workarounds can come in handy

subtle phoenix Jun 18, 2024, 7:28 PM

#

fiery oxide hmm, Infer has some problems with system role not being supported (or at least h...

yeah a lot of the jinja templates enforce that

fiery oxide Jun 18, 2024, 7:29 PM

#

subtle phoenix yeah a lot of the jinja template enforces that

yeah, and they do run on vLLM and Aphro, so pure jinja formatting there

#

They don't do any formatting besides what vLLM and Aphro do
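
A hedged sketch of the messages -> prompt step discussed above, using the Llama 3 Instruct special tokens. `fold_system` is a hypothetical workaround for backends whose template rejects a system role or requires strict user/assistant alternation; it is not OpenRouter's or Infermatic's actual code.

```python
def fold_system(messages):
    """Merge system text into the first user turn and join consecutive
    same-role turns so that roles strictly alternate."""
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    out = []
    for m in messages:
        if m["role"] == "system":
            continue
        if out and out[-1]["role"] == m["role"]:
            out[-1]["content"] += "\n\n" + m["content"]
        else:
            out.append({"role": m["role"], "content": m["content"]})
    if system:
        if out and out[0]["role"] == "user":
            out[0]["content"] = system + "\n\n" + out[0]["content"]
        else:
            out.insert(0, {"role": "user", "content": system})
    return out


def llama3_prompt(messages):
    """Render chat messages into a Llama 3 Instruct prompt string."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content'].strip()}<|eot_id|>"
        )
    # Leave the assistant header open so the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Usage would be `llama3_prompt(fold_system(messages))` when the backend is strict, or `llama3_prompt(messages)` when it accepts a system turn.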

livid violet Jun 18, 2024, 7:33 PM

#

I don't wanna hijack the conversation or anything, but I can see the model is already available on OR through novitaAI. Thing is, when I try to run it through ST it spits out error404. Overloaded servers?

fiery oxide Jun 18, 2024, 7:34 PM

#

Yea it went offline wtf

fiery oxide Jun 18, 2024, 7:34 PM

#

livid violet I don't wanna hijack the conversation or anything, but I can see the model is al...

prob Novita fixing stuff

daring fox Jun 18, 2024, 7:44 PM

#

Is Infermatic not slow as fuck anymore? Back when I used it there was consistently 30 seconds to first token at a minimum.

#

And that was on the 70B models, their 120B was like 60 seconds. Idk if times like that are acceptable for OR.

#

Though, being able to access those models without paying $15 up front would be nice

fiery oxide Jun 18, 2024, 7:52 PM

#

daring fox Is Infermatic not slow as fuck anymore? Back when I used it there was consistent...

7t/s on Euryale (top 1 or 2 by usage), with ~3s latency

#

Infer sped up somewhat, and Svak said they are still working on better speed

#

Astoria is like 15t/s usually (4th by usage)
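
Rough arithmetic for what those numbers mean for a whole reply, assuming the latency figure is time to first token and throughput stays steady (both are assumptions; the 500-token reply length is only an example).

```python
def response_seconds(tokens, tokens_per_second, latency_seconds):
    """Estimated wall-clock time for a reply of the given length."""
    return latency_seconds + tokens / tokens_per_second


# A 500-token reply at 7 t/s with ~3 s latency, as quoted for Euryale.
euryale_estimate = response_seconds(500, 7, 3)
```

At those rates a long reply takes over a minute even with low latency, so throughput matters more than first-token delay for roleplay-length outputs.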

daring fox Jun 18, 2024, 7:58 PM

#

Is that latency with empty context?

fiery oxide Jun 18, 2024, 7:58 PM

#

daring fox Is that latency with empty context?

with like 2-3K

daring fox Jun 18, 2024, 7:59 PM

#

I remember it getting really slow when I pushed past 8k, but that was a couple months ago

fiery oxide Jun 18, 2024, 8:00 PM

#

daring fox I remember it getting really slow when I pushed past 8k, but that was a couple m...

things def improved since then

#

+L3s are faster than L2s

daring shale Jun 18, 2024, 8:03 PM

#

daring fox Is Infermatic not slow as fuck anymore? Back when I used it there was consistent...

We did indeed improve the speed of the models; now we are focused on decreasing the response time of the most used ones

#

Midnight/Euryale

#

And miquliz, it's way better than before

#

I swear

fiery oxide Jun 18, 2024, 8:05 PM

#

daring shale We indeed improve on the speed of the models, now we are focused on decreasing t...

btw, maybe consider fp8 KV Cache
It should give a perf boost and memory usage reduction w/o much (if any) quality loss

grand marsh Jun 18, 2024, 8:06 PM

#

16k on Openrouter a reality? I know Infermatic got the extension.

daring fox Jun 18, 2024, 8:11 PM

#

Yeah that'd be pretty epic.

daring shale Jun 18, 2024, 8:11 PM

#

fiery oxide btw, maybe consider fp8 KV Cache It should give a perf boost and memory usage re...

Wouldn't that be a great difference? fp16 -> fp8

daring shale Jun 18, 2024, 8:11 PM

#

grand marsh 16k on Openrouter a reality? I know Infermatic got the extension.

Stay tuned

fiery oxide Jun 18, 2024, 8:11 PM

#

daring shale Wouldn't that be a great difference? fp16 -> fp8

very small diff, esp for fp8 (and not int8)
Barely noticeable

#

Nobody even noticed that on my hosts

#

and I always do fp8

fiery oxide Jun 18, 2024, 8:12 PM

#

daring shale Wouldn't that be a great difference? fp16 -> fp8

even full model weights in fp8 don't lose much compared to bf16/fp16
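
A back-of-the-envelope sketch of the memory side of an fp8 KV cache. The layer count, KV head count and head dimension are Llama 3 70B's published config values, not figures from this thread, and the calculation covers one sequence only.

```python
def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes of KV cache for one sequence (keys and values, all layers)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


fp16_bytes = kv_cache_bytes(8192)                    # 2 bytes per value
fp8_bytes = kv_cache_bytes(8192, bytes_per_value=1)  # 1 byte per value

fp16_gib = fp16_bytes / 2**30
fp8_gib = fp8_bytes / 2**30
```

Halving the cache per sequence is what lets a batching server fit roughly twice as many concurrent contexts in the same VRAM; the quality claim in the thread is the poster's own experience and is not something this arithmetic shows.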

daring shale Jun 18, 2024, 8:12 PM

#

I still have nightmares of supra asking for evidence on the fp16

#

fiery oxide Jun 18, 2024, 8:12 PM

#

daring shale I still have nightmares of supra asking for evidence on the fp16

Supra is insane

daring fox Jun 18, 2024, 8:13 PM

#

Is Supra finally gone? He's the reason I left the server

fiery oxide Jun 18, 2024, 8:13 PM

#

fiery oxide Supra is insane

He prob can't tell the difference, he just pretends

fiery oxide Jun 18, 2024, 8:13 PM

#

daring fox Is Supra finally gone? He's the reason I left the server

he got the boot

#

two times

#

I'm the only techdev role now lol

#

Infer server has been supremely friendly since Supra got kicked lol

daring shale Jun 18, 2024, 8:17 PM

#

daring fox Is Supra finally gone? He's the reason I left the server

Yeah

#

Ur free to come back now

#

xd

fiery oxide Jun 18, 2024, 8:17 PM

#

daring shale Yeah

so btw, maybe do a test run on fp8 KV cache like you did with RoPE?

daring shale Jun 18, 2024, 8:17 PM

#

fiery oxide He prob can't tell the difference, he just pretends

fr

fiery oxide Jun 18, 2024, 8:17 PM

#

daring shale fr

he couldn't tell Wiz from a 8B model lmao

daring shale Jun 18, 2024, 8:19 PM

#

fiery oxide so btw, maybe do a test run on fp8 KV cache like you did with RoPE?

we can make a test for qwen

fiery oxide Jun 18, 2024, 8:21 PM

#

daring shale we can make a test for qwen

okie

daring fox Jun 18, 2024, 8:21 PM

#

Qwen is on Infermatic? Isn't that like super censored?

daring shale Jun 18, 2024, 8:22 PM

#

Not this one

#

https://huggingface.co/Qwen/Qwen2-72B-Instruct

fiery oxide Jun 18, 2024, 8:22 PM

#

it's kinda censored tho
Hope Magnum-72B will kick it out in the next poll

daring shale Jun 18, 2024, 8:23 PM

#

Why not replace it with llama3?

daring fox Jun 18, 2024, 8:23 PM

#

Yeah that's the one I tried, extreme positivity bias. Reminds me of Mistral 7B tunes

daring shale Jun 18, 2024, 8:23 PM

#

daring fox Qwen is on Infermatic? Isn't that like super censored?

https://infermatic.ai/docs/models-list/

fiery oxide Jun 18, 2024, 8:23 PM

#

daring shale Why dont replace it with llama3?

as a generalist? and swap something else?

daring shale Jun 18, 2024, 8:23 PM

#

Yeah

daring fox Jun 18, 2024, 8:23 PM

#

Oh you finally fixed the website! No wonder I couldn't find the list on Discord.

fiery oxide Jun 18, 2024, 8:24 PM

#

daring shale Yeah

btw dmed how to do fp8 kv cache

daring shale Jun 18, 2024, 8:24 PM

#

Llama3 -> Qwen and Qwen -> Magnum or the one that wins the poll

daring shale Jun 18, 2024, 8:24 PM

#

fiery oxide btw dmed how to do fp8 kv cache

arigato

daring shale Jun 18, 2024, 8:25 PM

#

daring fox Oh you finally fixed the website! No wonder I couldn't find the list on Discord.

Yeah (finally)

#

XD

#

It's still lacking some things, but we'll get through them

hallow geyser Jun 18, 2024, 8:30 PM

#

I just posted my review of euryale in the feedback section of Infermatic discord. TLDR: It's fun. I enjoy it. But for regular RP I'll be sticking to wizard, and maybe midnight for a few cards.

daring shale Jun 18, 2024, 8:32 PM

#

#1200053136082079845 message

lofty stratus Jun 18, 2024, 8:50 PM

#

Uhh this started happening on fourth response, everything was normal. 😨
~2.2k context, OpenRouter, NovitaAI.

fiery oxide Jun 18, 2024, 8:51 PM

#

BRUH

#

Last time I saw this it was on Wiz 7B on DeepInfra after 8k

fiery oxide Jun 18, 2024, 8:51 PM

#

lofty stratus Uhh this started happening on fourth response, everything was normal. 😨 ~2.2k ...

Is it consistent?

#

like, does it go away with swipes?

#

hm works fine at 5K

lofty stratus Jun 18, 2024, 8:54 PM

#

I started a new chat and it's still bricked...

fiery oxide Jun 18, 2024, 8:54 PM

#

lofty stratus I started a new chat and it's still bricked...

what kind of settings do you have?

#

try like temp 0.87 top_p 0.81

lofty stratus Jun 18, 2024, 8:59 PM

#

Normal settings. Going crazy regardless of settings. Tried switching to text completion to see if anything different.

#

maybe an intern at NovitaAI tripped a cable or something

#

I was having a working chat earlier today.

#

restarted ST 😓

livid violet Jun 18, 2024, 9:09 PM

#

The gibberish responses are because the servers are dying

#

some problem at NovitaAI

#

livid violet Jun 18, 2024, 9:11 PM

#

fiery oxide try like temp 0.87 top_p 0.81

best to use the sampler settings provided by the author I reckon, i.e.
Temperature - 1.17
min_p - 0.075
Repetition Penalty - 1.10
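
A sketch of how those author settings might be put into an OpenAI-style chat completions body. The field names `min_p` and `repetition_penalty` and the model slug (taken from the uptime URL posted in this thread) are assumptions, and a provider without min_p support may simply ignore that field.

```python
import json


def build_request(messages, model="sao10k/l3-euryale-70b"):
    """Build a request body with the sampler values quoted above."""
    return {
        "model": model,  # assumed slug, check the model page
        "messages": messages,
        "temperature": 1.17,
        "min_p": 0.075,  # assumed field name, may be ignored
        "repetition_penalty": 1.10,  # assumed field name
    }


body = build_request([{"role": "user", "content": "Hello"}])
payload = json.dumps(body)  # what would be sent as the POST body
```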

lofty stratus Jun 18, 2024, 9:12 PM

#

👀

livid violet Jun 18, 2024, 9:12 PM

#

oh well

fiery oxide Jun 18, 2024, 9:12 PM

#

livid violet best to use the sampler settings provided by the author I reckon, i.e. Temperatu...

those work decently yeah
Problem is that Novita doesn't have min_p
So we have to wait until Infer will be added to OR as a provider

livid violet Jun 18, 2024, 9:13 PM

#

The limited sampler settings are the one thing that tempts me to just rent a cloud computing unit and set up oobabooga in the cloud.

fiery oxide Jun 18, 2024, 9:13 PM

#

Ooba sucks, i use Aphro

livid violet Jun 18, 2024, 9:13 PM

#

whats wrong with ooba?

fiery oxide Jun 18, 2024, 9:13 PM

#

livid violet whats wrong with ooba?

breaks models often, no batching, AWQ and GPTQ are broken

#

like, it's just not worth using

livid violet Jun 18, 2024, 9:14 PM

#

welp, haven't had any problems personally

#

only used it with 21b models at most

#

I fully switched to using OR anyways

#

because of the need for bigger models and not having multiple GPU's :c

fiery oxide Jun 18, 2024, 9:15 PM

#

livid violet because of the need for bigger models and not having multiple GPU's :c

i mean you can host 70B on A6000 in 4bit

#

I do that

livid violet Jun 18, 2024, 9:15 PM

#

I have an rtx4080

#

not gonna buy an A6000, sry xd

fiery oxide Jun 18, 2024, 9:15 PM

#

livid violet I have an rtx4080

A6000 is like 0.34/hour on Runpod

#

I don't own an A6000 either sadly lmao

livid violet Jun 18, 2024, 9:15 PM

#

exactly why I thought about renting a cloud service

#

it's quite cheap

fiery oxide Jun 18, 2024, 9:16 PM

#

livid violet it's quite cheap

I mean as long as you don't go into 70B in bf16 territory
It's 2xA100

#

Or MI300X

#

both around 4$/hour
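
The hardware claims above follow from simple weight-size arithmetic. This sketch assumes a round 70 billion parameters and ignores KV cache, activations and quantization overhead, so real usage is somewhat higher.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight size in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


four_bit = weight_gb(70, 4)   # fits a single 48 GB A6000 with room for cache
bf16 = weight_gb(70, 16)      # needs 2x 80 GB A100 or one 192 GB MI300X

fits_a6000 = four_bit < 48
fits_two_a100 = bf16 < 2 * 80
fits_mi300x = bf16 < 192
```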

livid violet Jun 18, 2024, 9:17 PM

#

I'll hit you up if I ever need help setting up a runpod unit, aight?

fiery oxide Jun 18, 2024, 9:17 PM

#

livid violet I'll hit you up if I ever need help setting up a runpod unit, aight?

ok

livid violet Jun 18, 2024, 9:17 PM

#

The model is back up again btw

daring fox Jun 18, 2024, 9:17 PM

#

Ooba does have the new DRY sampler though, I wonder if it's any good.

fiery oxide Jun 18, 2024, 9:18 PM

#

daring fox Ooba does have the new DRY sampler though, I wonder if it's any good.

Aphro has fan favorite Smoothing Curve tho

#

Quite popular on Infer

livid violet Jun 18, 2024, 9:18 PM

#

Ooba has it as well

#

I think

fiery oxide Jun 18, 2024, 9:22 PM

#

I wish vLLM had more samplers

#

but it at least has min_p lol

#

beam search is cool too

mortal cove Jun 18, 2024, 9:38 PM

#

lofty stratus Uhh this started happening on fourth response, everything was normal. 😨 ~2.2k ...

Yeah this is also the issue I was having too. Tried different settings, re-rolling responses etc, and same thing. Seems it may just take some time for it to get sorted

subtle phoenix Jun 18, 2024, 9:40 PM

#

Ah crap... I mistook this model for #1248338089663926313 :d

#

We only asked @bold sphinx for permission to route to Stheno, but not this one yet

mortal cove Jun 18, 2024, 9:42 PM

#

Lol. So this one on OR is actually Stheno? xD

subtle phoenix Jun 18, 2024, 9:42 PM

#

no it's Euryale

#

But didn't get the author's blessing yet :d

fiery oxide Jun 18, 2024, 9:53 PM

#

subtle phoenix Ah crap... I mistook this model with <#1248338089663926313> :d

To be fair, they share a dataset :p

woeful mist Jun 18, 2024, 11:15 PM

#

I haven't seen gibberish or a lot of the issues I see in here. Wonder what I'm doing “right”. Not without issue, but no errors or gibberish lol.

cosmic yew Jun 18, 2024, 11:52 PM

#

subtle phoenix If we route to Infer, that'd solve the problem right?

I enjoy using OR. It is where I use most LLM's except for OAI and Gemini. I usually load about $20-$50 credit each month, depending on my mood, but never use them all so just accumulating them like Halloween candy.

daring shale Jun 19, 2024, 12:23 AM

#

subtle phoenix But didn't get the author's blessing yet :d

We have it :p

tribal dagger Jun 19, 2024, 12:35 AM

#

I know responses might be silly, but I didn't expect this silly, like a bunch of random words that have no meaning?

something wrong with provider?

devout crow Jun 19, 2024, 12:37 AM

#

hey all, whats the consensus on this one?

#

is it any good?

fiery oxide Jun 19, 2024, 12:41 AM

#

devout crow is it any good?

Model is very good, but Novita the provider is having trouble with it, so it underperforms on OR rn

devout crow Jun 19, 2024, 12:41 AM

#

damn

fiery oxide Jun 19, 2024, 12:41 AM

#

We have to wait until either Novita fixes it, or another provider (Infermatic) gets added

devout crow Jun 19, 2024, 12:41 AM

#

im this 🤏 close to finding a cloud gpu host to run whatever i want

tribal dagger Jun 19, 2024, 1:14 AM

#

fiery oxide Model is very good, but Novita the provider is having troubles with it, so it's ...

It's just weird when I use chat comp with forced instruct formatting (legacy mode, Llama3 instruct and instruct names). When I let OR format the prompt, it doesn't output random words anymore, but the answers are still mid

atomic mountain Jun 19, 2024, 1:39 AM

#

just discovered this model like 20 minutes ago and naturally as soon as I'm enjoying it it starts throwing 404 errors

Chat Completion API
{"code":404,"reason":"MODEL_NOT_FOUND","message":"model not found","metadata":{"reason":"model: sao10k/l3-70b-euryale-v2.1 is not available"}}

EDIT: Seems to have recovered
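
Since errors like the one above cleared up on their own, a client can retry with backoff. This is a sketch only: `send` is a hypothetical callable returning `(status, body)`, and treating 404 as retryable is specific to the transient MODEL_NOT_FOUND seen here, not a general rule.

```python
import time

# 404 is included only because this thread shows it as a transient error.
RETRYABLE = {404, 502, 503, 504}


def call_with_retry(send, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = None, None
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return status, body
```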

woeful mist Jun 19, 2024, 4:40 AM

#

tribal dagger i know Response might be silly but I don't expect that silly, like bunch of rand...

Even before i toyed with it to make quality better, i didnt have this issue? Hmm.

My larger cards it isnt handling well, but my smaller one its handling amaz-balls. So perhaps ur card data is too complicated/much for it. Its only got an 8k context. That or ur api settings and overall preset arent ideal.

#

Im toying with it abit for now but really holding out for infermatic to provide it, with their roped 16k context, and stability. I just pray they can get response times better. The few i kno using it at infermatic already the day it landed there said 200+ seconds for a response 😭

devout crow Jun 19, 2024, 4:56 AM

#

urgh that context is killing me

#

i need at least 16k

#

but imma try this anyways

pulsar edge Jun 19, 2024, 4:57 AM

#

Blank blank blank

devout crow Jun 19, 2024, 5:14 AM

#

okay now im just getting 405 errors

#

damn it

devout crow Jun 19, 2024, 5:33 AM

#

okay i cant lie, this is actually very nice

#

though the constant errors are really annoying

devout crow Jun 19, 2024, 6:18 AM

#

oh. thinkeyes

surreal anchor Jun 19, 2024, 6:19 AM

#

Yikes

devout crow Jun 19, 2024, 6:22 AM

#

babe wake up new p parameter just dropped

tribal dagger Jun 19, 2024, 7:43 AM

#

devout crow oh. thinkeyes

maybe use chat comp and untick Legacy

slim cipher Jun 19, 2024, 7:53 AM

#

Damn... the response is almost human, almost on par with Opus. I love the writing of this model. Unfortunate about the 8k context, but it's damn well good enough.

lilac hamlet Jun 19, 2024, 7:58 AM

#

I had plenty of gibberish, too, until I removed the system prompt that comes with the instruct preset. Could it be related to markdown? Maybe it's just a lucky coincidence.

tribal dagger Jun 19, 2024, 8:26 AM

#

creative, smart, I really like this model, I wish it had a larger context

#

why is my Logit Bias not sent? (it is still sent when using wizardlm2-8x22b)

sonic merlin Jun 19, 2024, 9:06 AM

#

tribal dagger why my Logit Bias not sent? (it is still sent when using wizardlm2-8x22b

Does any provider other than OpenAI even support Logit Bias?

tribal dagger Jun 19, 2024, 9:06 AM

#

well, lepton maybe

sonic merlin Jun 19, 2024, 9:07 AM

#

(also extremely tricky to get these right, as they need to 100% match the correct tokens)
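
A small sketch of what logit bias does and why exact token matching matters: the bias is keyed by token ID, so it only affects the IDs listed. The token IDs here are made up for illustration and `apply_logit_bias` is not any provider's API.

```python
def apply_logit_bias(logits, bias):
    """Add a per-token-ID bias to raw logits before sampling."""
    return [value + bias.get(token_id, 0.0) for token_id, value in enumerate(logits)]


# Suppress hypothetical token ID 1; IDs 0 and 2 are untouched.
biased = apply_logit_bias([1.0, 2.0, 3.0], {1: -100.0})
```

If a word is split into several tokens, or appears with and without a leading space, each variant has its own ID and needs its own entry.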

tribal dagger Jun 19, 2024, 9:07 AM

#

because with wizardLm2-8x22b it still works

devout crow Jun 19, 2024, 10:34 AM

#

lilac hamlet I had plenty of gibberish, too, until I removed the system prompt that comes wit...

did the quality of the responses change after removing the prompt?

lilac hamlet Jun 19, 2024, 10:42 AM

#

devout crow did the quality of the responses change after removing the prompt?

Yeah, I'm using my own prompt in plain English, put in the lorebook – system role at depth 1.

devout crow Jun 19, 2024, 10:43 AM

#

how is it now? is it any better, or just different

#

regardless, this model is an absolute blast

#

if somehow, someway a 16k context variant can happen, ill die happy

subtle phoenix Jun 19, 2024, 1:22 PM

#

devout crow if somehow, someway a 16k context variant can happen, ill die happy

requesting it

devout crow Jun 19, 2024, 1:23 PM

#

Prayge doing the lords work

fiery oxide Jun 19, 2024, 1:40 PM

#

devout crow if somehow, someway a 16k context variant can happen, ill die happy

Stay tuned!

devout crow Jun 19, 2024, 1:42 PM

#

need it ASAP for openrouter lol

#

a 32k variant will make me pass immediately

fiery oxide Jun 19, 2024, 1:45 PM

#

devout crow a 32k variant will make me pass immediately

idk about 32K, but 16K are definitely possible
Bc Infermatic has that, and Svak confirmed to me that talks about bringing Infer to OR are going well

devout crow Jun 19, 2024, 1:46 PM

#

nuts

#

keep us updated

daring shale Jun 19, 2024, 2:58 PM

#

Yep yep

devout crow Jun 19, 2024, 3:33 PM

#

uh oh

#

quality slowly but surely just degraded

#

taking this bot back to when the tower of babel fell 😔

#

also damn novita errors a LOT

sonic merlin Jun 19, 2024, 3:43 PM

#

Last two requests: "504 Gateway Time-out"

devout crow Jun 19, 2024, 3:43 PM

#

yeah its done that a ton today

sonic merlin Jun 19, 2024, 3:45 PM

#

Worked fine for a few hours, now it seems to break down again or get overloaded -> https://openrouter.ai/models/sao10k/l3-euryale-70b/uptime

#

silk drift Jun 19, 2024, 3:50 PM

#

smh this model always goes offline like every other request when I need to use it

sonic merlin Jun 19, 2024, 3:51 PM

#

yep, it's gone for now -> 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"

#

Now I got a reply again

silk drift Jun 19, 2024, 3:58 PM

#

overloaded I’m assuming; not sure what other issues would cause 3-5m intermittent blackouts

sonic merlin Jun 19, 2024, 4:00 PM

#

silk drift overloaded I’m assuming; not sure what other issues would cause 3-5m intermitten...

that could also be general work on the system, restarting/reloading etc

#

what happens when you are not google or amazon and only have limited resources

sonic merlin Jun 19, 2024, 4:19 PM

#

But now it seems overload is more likely, just got another gateway timeout

woeful mist Jun 19, 2024, 5:06 PM

#

tribal dagger why my Logit Bias not sent? (it is still sent when using wizardlm2-8x22b

Cuz the host doesnt support logit bias

outer bluff Jun 19, 2024, 7:40 PM

#

To say that this model is currently unstable is an understatement.

grand marsh Jun 19, 2024, 7:41 PM

#

The model is great but yeah the provider isn't stable.

silk lodge Jun 19, 2024, 8:36 PM

#

quick question since I never got it set up with my own presets: could one of you share the preset they're using and having good luck with?

#

whenever I use one of my own configs meant for more traditional llms I just get garbage results

sonic merlin Jun 19, 2024, 8:40 PM

#

silk lodge quick question since I never got it set up with my own presets: could one of you...

see -> #1250867165737914569 message

silk lodge Jun 19, 2024, 8:41 PM

#

sonic merlin see -> https://discord.com/channels/1091220969173028894/1250867165737914569/1252...

I meant the prompt as well

limber basin Jun 19, 2024, 8:46 PM

#

silk lodge I meant the prompt as well

https://huggingface.co/Sao10K/L3-70B-Euryale-v2.1/blob/main/Euryale-v2.1-Llama-3-Instruct.json

silk lodge Jun 19, 2024, 8:55 PM

#

Thanks. I'll try that! :D

woeful wedge Jun 19, 2024, 9:35 PM

#

Model still dead. Providers are missing out, it's a great model, guaranteed to be a money maker

subtle phoenix Jun 19, 2024, 9:36 PM

#

Adding infermatic now!

#

It's coming in ~10 mins

woeful wedge Jun 19, 2024, 9:36 PM

#

Cheers

outer bluff Jun 19, 2024, 10:03 PM

#

Since the provider of the model seems to be completely down (at least when it comes to requests for this model), I wonder why the little status blip is still showing green next to it on the site.

subtle phoenix Jun 19, 2024, 10:05 PM

#

outer bluff Since the provider of the model seems to be completely down (at least when it co...

It's cached and we're not purging it fast enough I think

#

Tho PR is up, not 10 mins but it's getting there

outer bluff Jun 19, 2024, 10:05 PM

#

Okay, it's just more obvious than usual since the thing's been down for hours 🙂

fiery oxide Jun 19, 2024, 10:05 PM

#

subtle phoenix Adding infermatic now!

btw Infer only for Euryale now?
Or other models (Noromaid, Wizard, etc) also will get added?

subtle phoenix Jun 19, 2024, 10:06 PM

#

Just euryale for now!

silk lodge Jun 19, 2024, 10:06 PM

#

well this is an odd one.
seems the model selector in Silly is broken

fiery oxide Jun 19, 2024, 10:06 PM

#

bruh I hope that won't kill latency
Tho Svak did some optimizations today

fiery oxide Jun 19, 2024, 10:06 PM

#

silk lodge well this is an odd one. seems the model selector in Silly is broken

No, provider went down

silk lodge Jun 19, 2024, 10:07 PM

#

Ah

#

aw :(

outer bluff Jun 19, 2024, 10:07 PM

#

Check the availability tab, it's been down for some time

sonic merlin Jun 19, 2024, 10:07 PM

#

outer bluff Since the provider of the model seems to be completely down (at least when it co...

-> https://openrouter.ai/models/sao10k/l3-euryale-70b/uptime

subtle phoenix Jun 19, 2024, 10:07 PM

#

Infermatic is deployed

silk lodge Jun 19, 2024, 10:08 PM

#

for me the uptime page is completely empty lol

fiery oxide Jun 19, 2024, 10:08 PM

#

yeah same for me (Firefox)

sonic merlin Jun 19, 2024, 10:08 PM

#

silk lodge

Use a different browser, Safari works fine.

subtle phoenix Jun 19, 2024, 10:08 PM

#

oh wat

silk lodge Jun 19, 2024, 10:09 PM

#

if it helps

outer bluff Jun 19, 2024, 10:09 PM

#

Still only seeing NovitaAI in the providers list. Did a forced reload

sonic merlin Jun 19, 2024, 10:09 PM

#

Yeah Firefox is a bit too strict for this

fiery oxide Jun 19, 2024, 10:09 PM

#

outer bluff Still only seeing NovitaAI in the providers list. Did a forced reload

same

subtle phoenix Jun 19, 2024, 10:10 PM

#

outer bluff Still only seeing NovitaAI in the providers list. Did a forced reload

Deployment takes about 5 min xd

silk lodge Jun 19, 2024, 10:10 PM

#

same thing on Chrome 126

subtle phoenix Jun 19, 2024, 10:10 PM

#

subtle phoenix Infermatic is deployed

Ah I meant to say merged

silk lodge Jun 19, 2024, 10:10 PM

#

somehow it DOES work on GNOME Web / Epiphany

sonic merlin Jun 19, 2024, 10:11 PM

#

Safari works too

silk lodge Jun 19, 2024, 10:11 PM

#

So Chrome + Firefox broken but WebKit (Epiphany) works.

#

yeah GNOME Web is as close to Safari as one can get

#

without a mac

fiery oxide Jun 19, 2024, 10:11 PM

#

So only WebKit works

#

Luakit works

#

so ye

subtle phoenix Jun 19, 2024, 10:12 PM

#

works on brave for me :d

silk lodge Jun 19, 2024, 10:12 PM

#

Hmm...

outer bluff Jun 19, 2024, 10:12 PM

#

I'm using Vivaldi, which is Chromium-based like Brave, and it works fine

silk lodge Jun 19, 2024, 10:12 PM

#

my chrome is fully default

#

odd

subtle phoenix Jun 19, 2024, 10:12 PM

#

chrome works too:

silk lodge Jun 19, 2024, 10:12 PM

#

weird

fiery oxide Jun 19, 2024, 10:13 PM

#

subtle phoenix chrome works too:

hmmm maybe OS level bug
I'm using Linux (Fedora)

silk lodge Jun 19, 2024, 10:13 PM

#

same

woeful wedge Jun 19, 2024, 10:13 PM

#

Model is still toast unfortunately

subtle phoenix Jun 19, 2024, 10:13 PM

#

interesting...

#

maybe an iframe issue?

silk lodge Jun 19, 2024, 10:13 PM

#

fiery oxide hmmm maybe OS level bug I'm using Linux (Fedora)

Are you on X11 or Wayland

fiery oxide Jun 19, 2024, 10:13 PM

#

silk lodge Are you on X11 or Wayland

X11, GNOME

silk lodge Jun 19, 2024, 10:14 PM

#

myman

#

X11, GNOME

#

GNOME 46 here specifically

#

using Chrome + FF through Flatpak

#

Nvidia Proprietary drivers

fiery oxide Jun 19, 2024, 10:16 PM

#

GNOME 45
Radeon driver, FF installed from RPM Fusion

sonic merlin Jun 19, 2024, 10:16 PM

#

I always assumed this was an iframe permissions problem, Firefox is much stricter than other browsers

#

Model is up, it seems.

#

Nah, still 404 😦

#

But no instant rejection anymore

subtle phoenix Jun 19, 2024, 10:19 PM

#

It's up

fiery oxide Jun 19, 2024, 10:20 PM

#

Nice

sonic merlin Jun 19, 2024, 10:20 PM

#

Maybe SillyTavern needs a restart/reload to pickup the new provider

fiery oxide Jun 19, 2024, 10:20 PM

#

subtle phoenix

Still can't see it in provider list tho

subtle phoenix Jun 19, 2024, 10:21 PM

#

fiery oxide Still can't see it in provider list tho

will have to wait till that cache purges itself I think

sonic merlin Jun 19, 2024, 10:22 PM

#

Hmm. I still get 404 via API/SillyTavern, even after hard restart

subtle phoenix Jun 19, 2024, 10:22 PM

#

(tho the router is not relying on the cache to route)

outer bluff Jun 19, 2024, 10:23 PM

#

Aaand working

silk lodge Jun 19, 2024, 10:23 PM

#

works

sonic merlin Jun 19, 2024, 10:24 PM

#

still

  error: {
    message: "{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}",
    code: 404,
  },

silk lodge Jun 19, 2024, 10:24 PM

#

rebooted Silly, re-opened the tab and boom

#

:D

fiery oxide Jun 19, 2024, 10:25 PM

#

subtle phoenix (tho the router is not relying on the cache to route)

btw maybe extended variant for Infer?
We have 16K there

subtle phoenix Jun 19, 2024, 10:25 PM

#

Yup

#

I will just fix it as-is for now, will move it to another variant when this model has more providers I think?

fiery oxide Jun 19, 2024, 10:27 PM

#

@subtle phoenix Infer doesn't log, terms were updated today

rapid juniper Jun 19, 2024, 10:27 PM

#

fiery oxide hmmm maybe OS level bug I'm using Linux (Fedora)

Im on Windows 10 - Firefox too and i cant see the provider uptime either so i think its a browser issue

subtle phoenix Jun 19, 2024, 10:28 PM

#

fiery oxide @subtle phoenix Infer doesn't log, terms were updated today

cc @idle grotto - some update to that will be added soon

#

But basically when we include the privacy policy URL, we show that tag for ppl to visit.

sonic merlin Jun 19, 2024, 10:28 PM

#

Still 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"

subtle phoenix Jun 19, 2024, 10:29 PM

#

sonic merlin Still 404 "model: sao10k/l3-70b-euryale-v2.1 is not available"

which frontend are you using?

sonic merlin Jun 19, 2024, 10:29 PM

#

subtle phoenix which frontend are you using?

SillyTavern

#

I try from the console with curl

#

Same with curl -> {"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}

#

It took about 30 secs and produced two pages full of newlines though until the error popped up

#

Precisely:

$ time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'
[~100 newlines omitted]
{"error":{"message":"{\"code\":404,\"reason\":\"MODEL_NOT_FOUND\",\"message\":\"model not found\",\"metadata\":{\"reason\":\"model: sao10k/l3-70b-euryale-v2.1 is not available\"}}","code":404}}

real    0m41,330s
user    0m0,034s
sys    0m0,023s

subtle phoenix Jun 19, 2024, 10:41 PM

#

oh FYI, the model id on OR is: sao10k/l3-euryale-70b

#

http://localhost:3000/models/sao10k/l3-euryale-70b/status

#

Will add an alias one sec

fiery oxide Jun 19, 2024, 10:42 PM

#

sonic merlin Precisely: ```shell $ time curl https://openrouter.ai/api/v1/chat/completions ...

try to add provider: {order: ["Infermatic"]}
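
For reference, a minimal sketch of a request body carrying that provider preference, written in Python instead of curl. The `build_payload` helper is a hypothetical name used only for illustration; the model id, the `provider` / `order` field and the prompt are the ones quoted in this thread.

```python
import json


def build_payload(prompt, providers):
    """Build a chat-completions body that lists preferred providers in order."""
    return {
        "model": "sao10k/l3-euryale-70b",
        # Provider preference as suggested above; names are tried in list order.
        "provider": {"order": list(providers)},
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_payload("What is the meaning of life?", ["Infermatic"])
# Serializing with json.dumps guarantees every key is quoted correctly.
body = json.dumps(payload, indent=2)
print(body)
```

Generating the body from a dict rather than typing it by hand avoids quoting slips in the JSON.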

sonic merlin Jun 19, 2024, 10:42 PM

#

fiery oxide try to add `provider: {order: ["Infermatic"]}`

I'll try, thanks.

daring shale Jun 19, 2024, 10:42 PM

#

subtle phoenix http://localhost:3000/models/sao10k/l3-euryale-70b/status

localhost xd

subtle phoenix Jun 19, 2024, 10:43 PM

#

oh lol

fiery oxide Jun 19, 2024, 10:43 PM

#

lab's exposing OR's guts XD

subtle phoenix Jun 19, 2024, 10:43 PM

#

https://openrouter.ai/models/sao10k/l3-euryale-70b/status

silk lodge Jun 19, 2024, 10:45 PM

#

fiery oxide lab's exposing OR's guts XD

:mitaBallsniff: OpenRouter is actually all just running on lab's laptop.
the entire thing! /s

fiery oxide Jun 19, 2024, 10:46 PM

#

silk lodge :mitaBallsniff: OpenRouter is actually all just running on ...

And Infermatic is powered by Svak's horde of hamsters! /s

silk lodge Jun 19, 2024, 10:46 PM

#

fiery oxide And Infermatic is powered by Svak's horde of hamsters! /s

it's actually me manually typing out the results

#

whenever you ask the AI something I go and google it

daring shale Jun 19, 2024, 10:46 PM

#

XD

subtle phoenix Jun 19, 2024, 10:47 PM

#

lolol

sonic merlin Jun 19, 2024, 10:47 PM

#

New error: "{"error":{"message":"{\"error\":{\"message\":\"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.\",\"type\":null,\"param\":null,\"code\":408}}","code":408}}" with

time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order: ["Infermatic"] },
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'

fiery oxide Jun 19, 2024, 10:47 PM

#

bruh
@daring shale

daring shale Jun 19, 2024, 10:47 PM

#

checking

#

The model is up and running

#

there must be something wrong with the request

sonic merlin Jun 19, 2024, 10:48 PM

#

Still the same error "{"error":{"message":"{\"error\":{\"message\":\"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.\",\"type\":null,\"param\":null,\"code\":408}}","code":408}}

real 0m40,706s
user 0m0,032s
sys 0m0,025s" with the above command.

fiery oxide Jun 19, 2024, 10:49 PM

#

yeah, hit https://api.totalgpt.ai/v1 endpoint directly rn (Infer endpoint btw)
1s latency, all good

#

OR issue?

subtle phoenix Jun 19, 2024, 10:50 PM

#

Does it work on playground? https://openrouter.ai/playground?models=sao10k%2Fl3-euryale-70b

fiery oxide Jun 19, 2024, 10:50 PM

#

subtle phoenix Does it work on playground? https://openrouter.ai/playground?models=sao10k%2Fl3-...

yes, 1s latency

#

works

silk lodge Jun 19, 2024, 10:51 PM

#

Issues aside I feel like this model is still very "dry"
It's flowery but dry.

#

big step up from older OWMs

subtle phoenix Jun 19, 2024, 10:51 PM

#

sonic merlin New error: "{"error":{"message":"{\"error\":{\"message\":\"litellm.Timeout: APIT...

wait litellm?...

sonic merlin Jun 19, 2024, 10:51 PM

#

subtle phoenix Does it work on playground? https://openrouter.ai/playground?models=sao10k%2Fl3-...

Playground works, but API still gives me the timeout error

sonic merlin Jun 19, 2024, 10:51 PM

#

subtle phoenix wait litellm?...

That comes from the endpoint, I am using the curl command from above

#

Literally straight from the OpenRouter webpage, only added the provider preference:

time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order: ["Infermatic"] },
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'

#

That should work, shouldn't it?

fiery oxide Jun 19, 2024, 10:54 PM

#

sonic merlin Literally straight from the OpenRouter webpage, only added the provider preferen...

"order"
You lost a "
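
A quick way to catch this kind of slip before sending anything is to run the body through a JSON parser first. The sketch below uses the body exactly as pasted above; `is_valid_json` is a hypothetical helper name, not part of any tool mentioned in the thread.

```python
import json

# Request body as posted above, with the closing quote after "order" missing.
broken_body = '''{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order: ["Infermatic"] },
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'''

# Same body with the missing quote restored.
fixed_body = broken_body.replace('"order:', '"order":')


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print("broken body valid:", is_valid_json(broken_body))
print("fixed body valid:", is_valid_json(fixed_body))
```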

sonic merlin Jun 19, 2024, 10:55 PM

#

fiery oxide "order" You lost a "

Correct.

daring shale Jun 19, 2024, 10:56 PM

#

Does it work now?

sonic merlin Jun 19, 2024, 10:56 PM

#

But I still get the same litellm error with the " added

subtle phoenix Jun 19, 2024, 10:56 PM

#

hmmmmm

daring shale Jun 19, 2024, 10:56 PM

#

If that doesn't work try streaming: true

subtle phoenix Jun 19, 2024, 10:56 PM

#

let me try

sonic merlin Jun 19, 2024, 10:58 PM

#

daring shale If that doesn't work try streaming: true

Like this?

time curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
  "model": "sao10k/l3-euryale-70b",
  "provider": { "order": ["Infermatic"] },
  "streaming": true,
  "messages": [
    {"role": "user", "content": "What is the meaning of life?"}
  ]
}'

still produces: {"error":{"message":"{\"error\":{\"message\":\"litellm.Timeout: APITimeoutError - Request timed out. \nerror_str: Request timed out.\",\"type\":null,\"param\":null,\"code\":408}}","code":408}}

subtle phoenix Jun 19, 2024, 10:59 PM

#

should be "stream": true

#

(not streaming)

sonic merlin Jun 19, 2024, 11:00 PM

#

subtle phoenix should be `"stream": true`

That seems to work, producing ~100 chunks instead of newlines.

#

Or maybe 1000 chunks, still streaming...
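
For anyone who wants to read those chunks as text, here is a minimal sketch of a parser. It assumes the stream follows the OpenAI-style server-sent-events layout: `data: {...}` lines carrying `choices[].delta.content`, a final `data: [DONE]`, and comment lines starting with `:` that can be ignored. The `iter_stream_text` name and the sample lines are made up for illustration and are not captured output.

```python
import json


def iter_stream_text(lines):
    """Yield the text fragments carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and SSE comments (lines starting with ':').
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample input, shaped like the assumed format.
sample_lines = [
    ": keep-alive comment",
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_stream_text(sample_lines))
print(reply)
```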

subtle phoenix Jun 19, 2024, 11:01 PM

#

hmm, try non stream but with a small max_token

fiery oxide Jun 19, 2024, 11:02 PM

#

Works in ST

daring shale Jun 19, 2024, 11:03 PM

#

@sonic merlin try again

sonic merlin Jun 19, 2024, 11:05 PM

#

daring shale @sonic merlin try again

OK, now it works with SillyTavern too, thanks.

daring shale Jun 19, 2024, 11:06 PM

#

You can disable streaming too if you want

#

ping me if anything happens

sonic merlin Jun 19, 2024, 11:07 PM

#

daring shale You can disable streaming too if you want

Works without streaming, too (I never turned it on, I prefer complete replies for some reason).

daring shale Jun 19, 2024, 11:07 PM

#

Okey, good to know

lofty stratus Jun 19, 2024, 11:11 PM

#

Wtf every paragraph from Magnum starts with stuff like "He X, Y'ing", "She blinks, taken aback by", or "She looks away, her cheeks flushing".

fiery oxide Jun 19, 2024, 11:13 PM

#

lofty stratus Wtf every paragraph from Magnum starts with stuff like "He X, Y'ing", "She blink...

Something with your settings perhaps?
Magnum didn't have this problem when I hosted it today for Infer's community
I used temp 0.87, tfs 0.3 and rep pen 1.08

#

also I prob should create a Magnum thread lmao

sonic merlin Jun 19, 2024, 11:30 PM

#

FYI: Infermatic needs to be added to SillyTavern as an OpenRouter provider in public/scripts/textgen-models.js (just mentioning it here as a quick fix)

#

(to hard route and avoid the 404 errors)

lofty stratus Jun 19, 2024, 11:35 PM

#

Second one isn't so bad.

lofty stratus Jun 19, 2024, 11:39 PM

#

lofty stratus Wtf every paragraph from Magnum starts with stuff like "He X, Y'ing", "She blink...

I guess that one wasn't a fair comparison since I was using another model before switching which devolved into Action "Speech" mini paragraphs.

fiery oxide Jun 20, 2024, 12:17 AM

#

bruh Novita is in 4bit lmao?

#

did they quant Wizard then, Im curious

subtle phoenix Jun 20, 2024, 12:18 AM

#

Talking to them, they might do int8 or fp16 eventually

fiery oxide Jun 20, 2024, 12:19 AM

#

subtle phoenix Talking to them, they might do int8 or fp16 eventually

is all their big stuff in int4 or just Euryale?

subtle phoenix Jun 20, 2024, 12:19 AM

#

fiery oxide is all their big stuff in int4 or just Euryale?

Confirmed for just euryale yeah

tribal dagger Jun 20, 2024, 12:19 AM

#

SillyTavern don't have an infer provider option yet, how can I switch to a preferred provider on OR?

fiery oxide Jun 20, 2024, 12:20 AM

#

tribal dagger SillyTavern don't have an infer provider option yet, how can I switch to a prefe...

Requests should default to Infer for now

#

ST will add it soon enough prob

sonic merlin Jun 20, 2024, 12:20 AM

#

tribal dagger SillyTavern don't have an infer provider option yet, how can I switch to a prefe...

Hand code it for now -> #1250867165737914569 message

fiery oxide Jun 20, 2024, 12:20 AM

#

sonic merlin Hand code it for now -> https://discord.com/channels/1091220969173028894/1250867...

doesn't seem to be necessary

sonic merlin Jun 20, 2024, 12:21 AM

#

fiery oxide doesn't seem to be necessary

Without it I can't force the provider to Infermatic in SillyTavern and get 404 errors (at least at that time)

fiery oxide Jun 20, 2024, 12:23 AM

#

sonic merlin Without it I can't force the provider to Infermatic in SillyTavern and get 404 e...

hm all reqs go to Infer for me

devout crow Jun 20, 2024, 12:24 AM

#

wow. this is a first lol

sonic merlin Jun 20, 2024, 12:25 AM

#

fiery oxide hm all reqs go to Infer for me

Didn't work for me at that point. Anyway, the patch/diff is brain-dead simple:

diff --git a/public/scripts/textgen-models.js b/public/scripts/textgen-models.js
index d8f36cf4..01743e0c 100644
--- a/public/scripts/textgen-models.js
+++ b/public/scripts/textgen-models.js
@@ -39,6 +39,7 @@ const OPENROUTER_PROVIDERS = [
     'Novita',
     'Lynn',
     'Lynn 2',
+    'Infermatic',
 ];
 
 export async function loadOllamaModels(data) {

devout crow Jun 20, 2024, 12:25 AM

#

i dont even know if im being routed to infermatic becuase this is all i get in the activity page lol

#

the infamous Shadow Provider

fiery oxide Jun 20, 2024, 12:26 AM

#

devout crow wow. this is a first lol

Claude training data moment

subtle phoenix Jun 20, 2024, 12:26 AM

#

devout crow

wait wat

devout crow Jun 20, 2024, 12:26 AM

#

yeah lol

#

its just blank

#

if i hover over the blank space it says "Unknown Provider"

fiery oxide Jun 20, 2024, 12:26 AM

#

it's there for me, weird

subtle phoenix Jun 20, 2024, 12:26 AM

#

hmm

#

works for me :d

tribal dagger Jun 20, 2024, 12:26 AM

#

devout crow if i hover over the blank space it says "Unknown Provider"

void provider

#

work for me too lol

devout crow Jun 20, 2024, 12:27 AM

#

mmmmmmmmmm

#

let me try a new browser ig

#

nvm lol all good ig

sonic merlin Jun 20, 2024, 12:27 AM

#

FWIW I see the correct provider (Infermatic) in my usage

devout crow Jun 20, 2024, 12:28 AM

#

damn

tribal dagger Jun 20, 2024, 12:30 AM

#

@fiery oxide btw, recommended parameter settings please?

fiery oxide Jun 20, 2024, 12:35 AM

#

tribal dagger @fiery oxide btw, recommended parameter settings please?

temp at 1.25 or 1.5 and min_p 0.1
And maybe some presence penalty like 0.3-0.5
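
To show why a min_p of 0.1 pairs with a high temperature, here is a minimal sketch of min-p filtering: a token survives only if its probability is at least `min_p` times the probability of the most likely token. The logits are made-up numbers, and applying temperature before the min-p cut is an assumption; samplers differ on the order.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Hypothetical logits for five candidate tokens.
logits = [5.0, 4.0, 2.0, 0.0, -3.0]
probs = softmax(logits, temperature=1.5)   # high temperature flattens the distribution
filtered = min_p_filter(probs, min_p=0.1)  # cut the tail the temperature boosted
loose = min_p_filter(probs, min_p=0.02)    # a lower min_p lets more of the tail through

print("kept at min_p=0.10:", sum(1 for p in filtered if p > 0))
print("kept at min_p=0.02:", sum(1 for p in loose if p > 0))
```

With these example numbers the higher min_p trims more of the low-probability tail that the high temperature would otherwise let through.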

tribal dagger Jun 20, 2024, 12:36 AM

#

fiery oxide temp at 1.25 or 1.5 and min_p 0.1 And maybe some presence penalty like 0.3-0.5

0,01 or 0,10? that high?

fiery oxide Jun 20, 2024, 12:36 AM

#

tribal dagger 0,01 or 0,10? that high?

yes

#

0.1

tribal dagger Jun 20, 2024, 12:37 AM

#

damn, kinda high, i usually just set 0,02

fiery oxide Jun 20, 2024, 12:37 AM

#

tribal dagger damn, kinda high, i usually just set 0,02

You prob don't use temp 1.5 usually

tribal dagger Jun 20, 2024, 12:37 AM

#

i just use 1

fiery oxide Jun 20, 2024, 12:37 AM

#

tribal dagger i just use 1

This model performs better at high temps imho

tribal dagger Jun 20, 2024, 12:38 AM

#

fiery oxide You prob don't use temp 1.5 usually

ok i'll try it

fiery oxide Jun 20, 2024, 12:38 AM

#

tribal dagger ok i'll try it

basically, what I'm recommending is Universal Light or Universal Creative presets in ST

tribal dagger Jun 20, 2024, 12:40 AM

#

fiery oxide basically, what I'm recommending is Universal Light or Universal Creative preset...

this too?

fiery oxide Jun 20, 2024, 12:41 AM

#

tribal dagger this too?

yeah

tribal dagger Jun 20, 2024, 12:47 AM

#

does Infer quantize it? cuz Novita seems to, and that's why it sucks

fiery oxide Jun 20, 2024, 12:58 AM

#

tribal dagger does Infer quantize it? cuz Novita seems to, and that's why it sucks

Infermatic does not quant models
All models are in their native precision
So Euryale is in bf16

#

Also Novita has straight up broken quant

#

bc AWQ 4bit Euryale shouldn't be that bad

daring shale Jun 20, 2024, 1:02 AM

#

Infermatic upvote

devout crow Jun 20, 2024, 2:28 AM

#

dang, not gonna lie, these generation times are pretty slow

#

like on average 30-40 seconds

#

im a patient lad though

tribal dagger Jun 20, 2024, 2:36 AM

#

devout crow like on average 30-40 seconds

30-50 sec, but it's good enough for me to sit and wait lol

devout crow Jun 20, 2024, 2:37 AM

#

yup lol

#

is this the classic Infermatic Is So Slow?!???!!!111/// thing ive been reading on reddit or

fiery oxide Jun 20, 2024, 2:37 AM

#

slow, but sure

#

At least we are getting you the good stuff

tribal dagger Jun 20, 2024, 2:37 AM

#

yeah Infer's problem is they slow

#

but sure good quality

fiery oxide Jun 20, 2024, 2:38 AM

#

tribal dagger yeah Infer's problem is they slow

speeding up is a pain in the a
but work on it is ongoing

tribal dagger Jun 20, 2024, 2:38 AM

#

as long as it doesn't stretch to a few minutes, it's acceptable

fiery oxide Jun 20, 2024, 2:38 AM

#

(also, yes, I'm sort of an informal Infer rep there)

fiery oxide Jun 20, 2024, 2:39 AM

#

tribal dagger as long as it doesn't stretch to a few minutes, it's acceptable

It's pretty much never that long unless you are genning 3000+ tokens

devout crow Jun 20, 2024, 2:43 AM

#

i can handle this speed

#

i handled the dark era of Pygmalion on google colab getting fuckin 0.5t/s

#

pepesalute godspeed fellas

fiery oxide Jun 20, 2024, 3:51 AM

#

Seeing downtime, Svak is looking

devout crow Jun 20, 2024, 3:53 AM

#

yeah quality slowly went down

#

till it just. well, died

fiery oxide Jun 20, 2024, 3:54 AM

#

devout crow yeah quality slowly went down

You are prob getting Novita rn

#

Should soon be fixed

#

We are back up!

fiery oxide Jun 20, 2024, 4:02 AM

#

devout crow yeah quality slowly went down

try now

devout crow Jun 20, 2024, 4:02 AM

#

on it boss AmeliaSaluteSmol

fiery oxide Jun 20, 2024, 4:02 AM

#

Context window temporarily capped at 8K

devout crow Jun 20, 2024, 4:02 AM

#

awwwwww

fiery oxide Jun 20, 2024, 4:02 AM

#

bc vLLM keeps crashing

devout crow Jun 20, 2024, 4:03 AM

#

should i just select infermatic only? idk wtf is happening with novita

fiery oxide Jun 20, 2024, 4:03 AM

#

devout crow should i just select infermatic only? idk wtf is happening with novita

yeah you should

#

Novita has a broken quant

devout crow Jun 20, 2024, 4:05 AM

#

gotcha

#

keep us updated for when its back to 16k

#

ty

tribal dagger Jun 20, 2024, 4:07 AM

#

so when will OR bring larger context variant?

fiery oxide Jun 20, 2024, 4:07 AM

#

tribal dagger so when will OR bring larger context variant?

when we fix it
hopefully soon

#

It was at 16K, actually

#

before it started crashing

#

bruh why can't any inference engine just work

daring shale Jun 20, 2024, 4:10 AM

#

fr

fiery oxide Jun 20, 2024, 4:59 AM

#

bruh downtime again

#

vLLM, can you be normal for once?!

fiery oxide Jun 20, 2024, 5:08 AM

#

fiery oxide bruh downtime again

seems to be fine now

devout crow Jun 20, 2024, 11:32 AM

#

okay i cant get enough of this model i swear to god

#

these chats are hype af

tribal dagger Jun 20, 2024, 1:18 PM

#

responses feel weird now, and I'm sure I got them from Infer

devout crow Jun 20, 2024, 1:28 PM

#

hm. they do feel slightly different huh

daring shale Jun 20, 2024, 1:31 PM

#

what do you mean?

devout crow Jun 20, 2024, 1:33 PM

#

idk they just feel kinda off in a way

#

maybe im just seeing shit, ill chat more

daring shale Jun 20, 2024, 1:35 PM

#

okey, lmk

daring shale Jun 20, 2024, 2:04 PM

#

Context fixed, now back to 16K

devout crow Jun 20, 2024, 2:07 PM

#

nature is healing

tribal dagger Jun 20, 2024, 2:27 PM

#

why my sillytavern still max at 8k? I usually don't need to unlock context to max context using chat comp

sonic merlin Jun 20, 2024, 2:28 PM

#

tribal dagger

OpenRouter might still clamp the context size to 8k?

devout crow Jun 20, 2024, 2:40 PM

#

i unlocked mine anyways

tribal dagger Jun 20, 2024, 2:43 PM

#

devout crow i unlocked mine anyways

work?

devout crow Jun 20, 2024, 2:44 PM

#

hold on

#

er... shit, it seems like its not

#

@daring shale hate to ping you man but is there a delay for the 16k update?

#

its still capped at 8k

#

wont budge here

#

even at unlocked mode and set to 16k

daring shale Jun 20, 2024, 2:49 PM

#

what

#

let me see

#

My st is working with the 16k tokens

devout crow Jun 20, 2024, 2:51 PM

#

hmmmmmmmmmmm

#

i even restarted twice

daring shale Jun 20, 2024, 2:52 PM

#

aaand it's down again

#

It'll be up in a minute

devout crow Jun 20, 2024, 2:52 PM

#

AOURGH

tribal dagger Jun 20, 2024, 2:52 PM

#

daring shale It'll be up in a minute

medic!

vapid hornet Jun 20, 2024, 2:57 PM

#

Amateurs

woeful wedge Jun 20, 2024, 2:59 PM

#

The price was doubled?

devout crow Jun 20, 2024, 2:59 PM

#

seems expensive af now yeah

#

rip my credits

woeful wedge Jun 20, 2024, 2:59 PM

#

Still worth it, but yeah, a quiet price bump is kind of a low blow

daring shale Jun 20, 2024, 2:59 PM

#

It's working now

#

and the context is working too

woeful wedge Jun 20, 2024, 2:59 PM

#

And if you're bumping it, it better work at least

daring shale Jun 20, 2024, 3:00 PM

#

Idk why it won't let you access

devout crow Jun 20, 2024, 3:00 PM

#

infermatic isnt even on the provider list for me lol

#

unless thats intended

daring shale Jun 20, 2024, 3:01 PM

#

woeful wedge And if you're bumping it, it better work at least

well not exactly bumping it, that was a miscalculation. NovitAI is giving you int4 and 8k tokens charging you 0.75. We're giving you double context and 4x precision for 1.8
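
A quick worked comparison of those two figures. The thread does not state the unit; this sketch assumes both numbers are prices in USD per million tokens, and the 8,000-token request size is an arbitrary example.

```python
NOVITA_PRICE = 0.75      # figure quoted above for the int4, 8k-context endpoint
INFERMATIC_PRICE = 1.8   # figure quoted above for the 16k-context endpoint


def request_cost(tokens, price_per_million):
    """Cost of a request, assuming the price is charged per million tokens."""
    return tokens / 1_000_000 * price_per_million


price_ratio = INFERMATIC_PRICE / NOVITA_PRICE
novita_cost = request_cost(8_000, NOVITA_PRICE)
infermatic_cost = request_cost(8_000, INFERMATIC_PRICE)

print(f"price ratio: {price_ratio:.2f}x")
print(f"8k-token request: {novita_cost:.4f} vs {infermatic_cost:.4f}")
```

Under that assumption the quoted price works out to 2.4 times Novita's rather than exactly double.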

#

lel

daring shale Jun 20, 2024, 3:01 PM

#

devout crow infermatic isnt even on the provider list for me lol

@subtle phoenix

#

I can't help with that one

subtle phoenix Jun 20, 2024, 3:02 PM

#

devout crow infermatic isnt even on the provider list for me lol

looks like ST

#

cc @wet marten

devout crow Jun 20, 2024, 3:02 PM

#

darn.

woeful wedge Jun 20, 2024, 3:02 PM

#

daring shale well not exactly bumping it, that was a miscalculation. NovitAI is giving you in...

So you're a new provider of the model?

daring shale Jun 20, 2024, 3:02 PM

#

woeful wedge So you're a new provider of the model?

we're Infermatic.ai

#

Infermatic

woeful wedge Jun 20, 2024, 3:03 PM

#

Ah, I see. Don't know who you are, but I hope yours doesn't die every half hour

wet marten Jun 20, 2024, 3:03 PM

#

subtle phoenix looks like ST

The list is hardcoded

#

If you can give an API for that, that'll be better

sonic merlin Jun 20, 2024, 3:04 PM

#

subtle phoenix looks like ST

It is a really simple patch -> #1250867165737914569 message

devout crow Jun 20, 2024, 3:05 PM

#

sigh im just gonna stop chatting for now

#

im burning through credits and getting some real bad hallucinations lol

wet marten Jun 20, 2024, 3:05 PM

#

Maybe it's you

devout crow Jun 20, 2024, 3:05 PM

#

did manage to get through 8k though

#

still dont know why i cant do 16k

#

ill probably do a reinstall idk

daring shale Jun 20, 2024, 3:06 PM

#

What version of st are you using?

devout crow Jun 20, 2024, 3:06 PM

#

latest

daring shale Jun 20, 2024, 3:06 PM

#

staging?

devout crow Jun 20, 2024, 3:06 PM

#

nope shouldnt be

#

standard 1.12.1

daring shale Jun 20, 2024, 3:08 PM

#

well try again and lmk

devout crow Jun 20, 2024, 3:11 PM

#

roughly stuck around this token count

daring shale Jun 20, 2024, 3:13 PM

#

why don't you try something that isn't ST to test if it's your api key or ST?

devout crow Jun 20, 2024, 3:13 PM

#

funny enough i just made a new OR key just to test

#

gimmie a while

daring shale Jun 20, 2024, 3:21 PM

#

alr

devout crow Jun 20, 2024, 3:23 PM

#

mmm

#

not working on venus either

#

i must be subconsciously doing a big oopsie lol

#

hmmmmmmmmmmmmm

#

funny enough even the list here still says 8k context

#

im gonna stop replying now cause im burning a hole thru my credits now lol

sonic merlin Jun 20, 2024, 3:27 PM

#

devout crow funny enough even the list here still says 8k context

That number comes from OpenRouter; it stays at 8k unless the context size in their config gets changed (hard when providers offer different context sizes)
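
A minimal sketch of how a frontend could look that number up. It assumes a model listing shaped like `{"data": [{"id": ..., "context_length": ...}]}`; the `advertised_context` helper and the sample data are hypothetical and only illustrate the lookup, not actual values returned by OpenRouter.

```python
def advertised_context(models_response, model_id):
    """Return the context length listed for model_id, or None if it is not listed."""
    for entry in models_response.get("data", []):
        if entry.get("id") == model_id:
            return entry.get("context_length")
    return None


# Hypothetical listing; the 8192 mirrors the 8k cap people report seeing above.
sample_response = {
    "data": [
        {"id": "sao10k/l3-euryale-70b", "context_length": 8192},
    ]
}

limit = advertised_context(sample_response, "sao10k/l3-euryale-70b")
print("advertised context:", limit)
```

If the listing reports a single value per model, a frontend that trusts it will clamp at that value even when one provider serves a longer context.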

devout crow Jun 20, 2024, 3:27 PM

#

damn.

woeful wedge Jun 20, 2024, 3:43 PM

#

The replies are much smaller with Infermatic

#

And it rushes to complete the instruction/story as well. At that price tag, one should expect more, not less and worse.

#

Novita kept crashing, which was annoying, but it output a lot more and the overall quality and coherence was better than now.

sonic merlin Jun 20, 2024, 4:01 PM

#

woeful wedge Novita kept crashing, which was annoying, but it output a lot more and the overa...

I tend to agree after more than a hundred replies now, though such comparisons are still very hard with probability based systems like LLMs. But at least at the shorter replies part I think I can see that in numbers in my activity list.

woeful wedge Jun 20, 2024, 4:02 PM

#

Even at temp of 1 it goes fully unhinged for no reason, far beyond what one would call creative

#

Just comes up with the most random shit for no reason, while yesterday its creativity was stellar with same settings

woeful wedge Jun 20, 2024, 4:18 PM

#

Whatever was done recently, it had a very bad effect on the model's output length and coherence.

tribal dagger Jun 20, 2024, 4:38 PM

#

hell nah, endpoint started pointing to Novita now

woeful wedge Jun 20, 2024, 4:48 PM

#

Damn can we have Novita back, can't believe I'm asking for it 😭

subtle phoenix Jun 20, 2024, 4:49 PM

#

woeful wedge Damn can we have Novita back, can't believe I'm asking for it 😭

it's still there fyi

woeful wedge Jun 20, 2024, 4:50 PM

#

It's marked as yellow, does that mean it's down?

subtle phoenix Jun 20, 2024, 4:50 PM

#

No, just "degraded" (since it was 404 a ton earlier)

woeful wedge Jun 20, 2024, 4:51 PM

#

Is it a complicated process to change it in ST?

#

Is Infermatic running a quant of Euryale? I just can't understand why it is so ass compared to yesterday's Novita

daring shale Jun 20, 2024, 4:54 PM

#

Maybe it's a settings thing, from all the users you are the one having issues. If you want settings recommendations feel free to join the discord and tweak with them #1253005075064819844 message

#

We are not running a quant of Euryale; as I already said, we are full FP16

woeful wedge Jun 20, 2024, 4:57 PM

#

That's why I understand it even less. I'm using the settings recommended to me here, which worked wonderfully on Novita (which I'm guessing is 4-bit?). And no, I'm not the only one; at least one more user complained below my insight. The system prompt is pulled from the model's Hugging Face page, which also worked great with Novita.

#

Like come on, I'm paying double the price and getting half of what I did before, and it hallucinates like crazy? That's terrible.

wet marten Jun 20, 2024, 5:19 PM

#

sonic merlin Didn't work for me at that point.Anyway, the patch/diff is brain dead simple: ``...

Casual reminder that applying unofficial patches WILL cause merge conflicts on pull

tribal dagger Jun 20, 2024, 5:36 PM

#

it keeps pointing to Novita fu*k

limber basin Jun 20, 2024, 9:06 PM

#

wet marten Casual reminder that applying unofficial patches WILL cause merge conflicts on p...

If the person doing it is a dev, it shouldn't come as a surprise lol

wet marten Jun 20, 2024, 9:20 PM

#

limber basin If the person doing it is a dev, it shouldn't come as a surprise lol

I'm not concerned about devs. The problem is that every now and then there are support cases with merge conflicts that regular users got from random Reddit/Discord patches. A dev should ideally submit the patch as a pull request to upstream if it is something valuable, otherwise it will backfire on me later. Hope I made it clear.

faint olive Jun 20, 2024, 10:37 PM

#

Novita seems to work fine, I didn't notice any quality differences to infermatic.

Just make sure you are using this formatting and these sampling settings.

fiery oxide Jun 20, 2024, 10:53 PM

#

eh, Novita hallucinates hard after 4k

faint olive Jun 20, 2024, 10:55 PM

#

faint olive Novita seems to work fine, I didn't notice any quality differences to infermatic...

I take it back, Aetherwiing is correct

devout crow Jun 21, 2024, 12:17 AM

#

welp, its a new day so

#

i hope that with a new chat i can break through the 8k barrier mark now lol

devout crow Jun 21, 2024, 7:19 AM

#

think its dead again

#

nvm back up now

#

also, i went ahead and installed a new fresh copy of sillytavern and it still displays 8k context

sonic merlin Jun 21, 2024, 7:35 AM

#

devout crow also, i went ahead and installed a new fresh copy of sillytavern and it still di...

Expected. The context size comes verbatim from OpenRouter's API.

devout crow Jun 21, 2024, 7:35 AM

#

yeah but im also capped at 8k AOURGH

#

like it wont go beyond 8.5k tokens for whatever shitass reason

#

pissing me off

#

i figured it was just a visual glitch at first

faint olive Jun 21, 2024, 11:38 AM

#

fiery oxide eh, Novita hallucinates hard after 4k

Novita upgraded to fp8, I think it should be the same as Infermatic. Infermatic hallucinates as well, it's the checkpoint.

subtle phoenix Jun 21, 2024, 11:38 AM

#

Novita updated to fp8 and also with 16k extended context tokens fyi

faint olive Jun 21, 2024, 11:39 AM

#

subtle phoenix Novita updated to fp8 and also with 16k extended context tokens fyi

btw they literally upgraded after I asked them, pretty nice work

subtle phoenix Jun 21, 2024, 11:40 AM

#

faint olive btw they literally upgraded after I asked them, pretty nice work

It's been in the work for the past 2 days, but yeah they're great

faint olive Jun 21, 2024, 11:42 AM

#

Oh alright.

sonic merlin Jun 21, 2024, 11:43 AM

#

subtle phoenix Novita updated to fp8 and also with 16k extended context tokens fyi

OpenRouter API still says 8k though, but I guess that will change soon?

subtle phoenix Jun 21, 2024, 11:43 AM

#

sonic merlin OpenRouter API still says 8k though, but I guess that will change soon?

prob cached :d

#

updated for me:

#

(the base model is still 8k fyi, but providers can extend the context on their own via RoPE scaling/YaRN etc...)

faint olive Jun 21, 2024, 11:45 AM

#

I hope that I can implement my API at some point, I'm working on a framework that might be able to give ridiculous amounts of context but thats slightly off topic

sonic merlin Jun 21, 2024, 11:46 AM

#

subtle phoenix prob cached :d

https://openrouter.ai/api/v1/models -> 8k for Euryale for me
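
For anyone who wants to check that number themselves, here is a minimal Python sketch that reads the advertised context size out of a parsed `/api/v1/models` response. It assumes the payload has a top-level `data` list whose entries carry `id` and `context_length`. The helper names, the model id spelling, and the sample payload (including its values) are illustrative assumptions, not something confirmed in this thread.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"


def context_length_for(models_payload, model_id):
    """Return the advertised context_length for model_id, or None if absent."""
    for entry in models_payload.get("data", []):
        if entry.get("id") == model_id:
            return entry.get("context_length")
    return None


def fetch_models(url=MODELS_URL):
    """Download and parse the public model list (performs a network call)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


# Illustrative payload only; the real response carries many more fields,
# and the numbers here are made up for the example.
SAMPLE_PAYLOAD = {
    "data": [
        {"id": "sao10k/l3-euryale-70b", "context_length": 8192},
        {"id": "example/other-model", "context_length": 32768},
    ]
}

if __name__ == "__main__":
    print(context_length_for(SAMPLE_PAYLOAD, "sao10k/l3-euryale-70b"))
```

Swapping `SAMPLE_PAYLOAD` for `fetch_models()` would query the live list, which is also the value a frontend reading this endpoint would display.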

sonic merlin Jun 21, 2024, 11:46 AM

#

subtle phoenix (the base model is still 8k fyi, but provider can do their own max output via Ro...

Ah, ok.

devout crow Jun 21, 2024, 11:52 AM

#

#

Interesting.

#

this is novita btw lol

#

sigh as much as i love euryale i really gotta stop, my credits are sucking into a black hole

sonic merlin Jun 21, 2024, 11:54 AM

#

devout crow this is novita btw lol

Novita has stopped working for me for some reason, with exactly the same settings Infermatic is fine

devout crow Jun 21, 2024, 11:54 AM

#

i cant select infermatic as my sole provider lol fml

sonic merlin Jun 21, 2024, 11:56 AM

#

devout crow i cant select infermatic as my sole provider lol fml

That is a one line fix in public/scripts/textgen-models.js

devout crow Jun 21, 2024, 11:56 AM

#

mmmmm but wont this cause a merge conflict for future updates?

#

for when the update gets pulled or did i read wrong a couple days ago

#

ehhhh i guess i could just back up the js file and restore it for when an update happens

sonic merlin Jun 21, 2024, 11:59 AM

#

devout crow mmmmm but wont this cause a merge conflict for future updates?

then either learn a bit of git (e.g. git stash), or wait, or ask the ST devs to create proper, installable build artifacts that can be modified without producing git merge conflicts every time. git is not a consumer deploy scheme

#

Abusing git and then telling others not to modify their code is a bit crazy IMHO.

fiery oxide Jun 21, 2024, 4:24 PM

#

Infermatic's price for Euryale was reduced to 1.5$/M, and the precision still stays at bf16

woeful wedge Jun 21, 2024, 4:43 PM

#

Does Infermatic enforce shorter responses on their side for Euryale? I can't combat it no matter how hard I try

#

The reason is always 'stop'

fiery oxide Jun 21, 2024, 4:43 PM

#

woeful wedge Does Infermatic enforce shorter responses on their side for Euryale? I can't com...

no, we do not

woeful wedge Jun 21, 2024, 4:45 PM

#

And it's extremely varied too. Sometimes it will give me a few hundred, still short by a good margin, and sometimes it will just rush the completion in 50-100 tokens.

fiery oxide Jun 21, 2024, 4:45 PM

#

longer response length is an artifact of quantization, likely
fp8 was tested internally today, and while this tends to give lengthier output, it loses significantly in coherence and instruction following, so we decided against even trying it

#

(fp8 never was on public endpoint, to be clear)

woeful wedge Jun 21, 2024, 4:46 PM

#

So the model encourages to take it bit by bit in a way?

#

I wouldn't mind the shorter replies if it left potential for continuation, which it often doesn't. Is it because my instruct is too direct maybe?

fiery oxide Jun 21, 2024, 4:47 PM

#

woeful wedge So the model encourages to take it bit by bit in a way?

hmm, possibly
But weird thing is that my average response length is 300+ tokens

fiery oxide Jun 21, 2024, 4:48 PM

#

woeful wedge I wouldn't mind the shorter replies if it left potential for continuation, which...

It's definitely possible
Try instruct preset made by creator of the model if you haven't already
https://huggingface.co/Sao10K/L3-70B-Euryale-v2.1/blob/main/Euryale-v2.1-Llama-3-Instruct.json
I tend to get decently long, detailed output with it and temp 1.25 min_p 0.1

Euryale-v2.1-Llama-3-Instruct.json · Sao10K/L3-70B-Euryale-v2.1 at ...

woeful wedge Jun 21, 2024, 4:49 PM

#

That's the one I'm using. Worked wonders with Novita, which was a 4 bit I think. But as you said, the length is an artifact of quant?

fiery oxide Jun 21, 2024, 4:49 PM

#

fiery oxide It's definitely possible Try instruct preset made by creator of the model if you...

the thing about all L3 based models I see is that they get lazy on some cards tho

fiery oxide Jun 21, 2024, 4:50 PM

#

woeful wedge That's the one I'm using. Worked wonders with Novita, which was a 4 bit I think....

Yeah, that's a possibility
Model generated more on fp8, but was less coherent

woeful wedge Jun 21, 2024, 4:50 PM

#

fiery oxide the thing about all L3 based model I see is that they get lazy on some cards tho

Anything to be on the lookout for? Maybe my cards need work

fiery oxide Jun 21, 2024, 4:51 PM

#

woeful wedge Anything to be on the lookout for? Maybe my cards need work

lengthy example dialogue usually helps to mitigate it
Longer first messages do too
Tho I still haven't figured out what exactly causes it

fiery oxide Jun 21, 2024, 4:52 PM

#

woeful wedge Anything to be on the lookout for? Maybe my cards need work

I also recommend trying appending some output length instructions to last assistant prefix

woeful wedge Jun 21, 2024, 4:53 PM

#

As in tell it directly how long I want it to be?

#

How should that be phrased? Word count/token length/paragraph wise?

fiery oxide Jun 21, 2024, 4:54 PM

#

woeful wedge As in tell it directly how long I want it to be?

Yeah, something akin to "Your next response should be three paragraphs long"

#

also, I don't recommend using repetition penalty with this model, seems to cause weird artifacts
I recommend using presence penalty instead
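
As a rough illustration of these sampler suggestions, the sketch below builds a chat-completions request body with temp 1.25 and min_p 0.1 (the values mentioned earlier in this thread), a caller-chosen presence penalty, and no repetition penalty at all. It assumes the endpoint accepts `temperature`, `min_p`, and `presence_penalty` as top-level fields; the function name and the model id spelling are hypothetical, and nothing is sent over the network.

```python
import json


def build_request_body(messages, presence_penalty, model="sao10k/l3-euryale-70b"):
    """Build a request body using the sampler values discussed in the thread."""
    return {
        "model": model,  # assumed model id, adjust to what your frontend shows
        "messages": messages,
        "temperature": 1.25,  # value suggested in the thread
        "min_p": 0.1,  # value suggested in the thread
        "presence_penalty": presence_penalty,  # caller picks; no value is implied here
        # "repetition_penalty" is deliberately left out, per the advice above.
    }


if __name__ == "__main__":
    body = build_request_body(
        [{"role": "user", "content": "Hello"}],
        presence_penalty=0.3,  # arbitrary placeholder, not a recommendation
    )
    print(json.dumps(body, indent=2))
```

Leaving the repetition penalty key out entirely, rather than setting it to a neutral value, keeps the request independent of whatever default a provider applies.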

woeful wedge Jun 21, 2024, 5:01 PM

#

I'll try your tips out, thank you!

sonic merlin Jun 21, 2024, 6:45 PM

#

fiery oxide It's definitely possible Try instruct preset made by creator of the model if you...

This configuration seems to include a very long, restrictive system prompt, with optional identifiers that don't get used, sure this is the best thing since sliced bread?

    "system_prompt": "Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.\n\n<Guidelines>\n• Maintain the character persona but allow it to evolve with the story.\n• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.\n• All types of outputs are encouraged; respond accordingly to the narrative.\n• Include dialogues, actions, and thoughts in each response.\n• Utilize all five senses to describe scenarios within {{char}}'s dialogue.\n• Use emotional symbols such as \"!\" and \"~\" in appropriate contexts.\n• Incorporate onomatopoeia when suitable.\n• Allow time for {{user}} to respond with their own input, respecting their agency.\n• Act as secondary characters and NPCs as needed, and remove them when appropriate.\n• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.\n</Guidelines>\n\n<Forbidden>\n• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.\n• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.\n• Repetitive and monotonous outputs.\n• Positivity bias in your replies.\n• Being overly extreme or NSFW when the narrative context is inappropriate.\n</Forbidden>\n\nFollow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>."

fiery oxide Jun 21, 2024, 6:49 PM

#

sonic merlin This configuration seems to include a very long, restrictive system prompt, with...

I've tested it with this, and with just llama-3-instruct names preset, tbh personally I like output with this more
There's no thing such as 100% optimal prompt, but this seems to work
You can try some other presets from our server, some people like them more
https://discord.com/channels/1115287912385351730/1253005075064819844

sonic merlin Jun 21, 2024, 7:07 PM

#

fiery oxide I've tested it with this, and with just llama-3-instruct names preset, tbh perso...

At a second look and some experimentation I have to revise, this prompt does make sense. I was wrong about the unused identifiers, sorry for that.

faint olive Jun 21, 2024, 7:23 PM

#

Infermatic and novita are same cost, damn price wars

fiery oxide Jun 21, 2024, 7:52 PM

#

faint olive Infermatic and novita are same cost, damn price wars

imagine paying 1.5$ for fp8, when you can get bf16 for the same price

faint olive Jun 21, 2024, 7:52 PM

#

fiery oxide imagine paying 1.5$ for fp8, when you can get bf16 for the same price

Like 5 times slower

fiery oxide Jun 21, 2024, 7:53 PM

#

faint olive Like 5 times slower

we'll see if this can be improved

faint olive Jun 21, 2024, 7:53 PM

#

Hopefully

woeful wedge Jun 21, 2024, 9:16 PM

#

#

Is something wrong on the provider end? I've got charged for 3 blanks in a row and it's a fully SFW scenario

woeful wedge Jun 21, 2024, 9:20 PM

#

fiery oxide Yeah, something akin to "Your next response should be three paragraphs long"

This might be cause of the four consecutive blanks I've experienced, as removing it now fixed the issue

fiery oxide Jun 21, 2024, 9:24 PM

#

woeful wedge This might be cause of the four consecutive blanks I've experienced, as removing...

oh my bad, I forgot to say it needs to be formatted like a system message with assistant message start at the end
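
A minimal sketch of what that could look like, assuming the standard Llama 3 Instruct special tokens: the length instruction is wrapped as its own system turn, and the string ends with an opened assistant turn so the model continues from there. The function name is hypothetical, and the exact whitespace your frontend expects may differ.

```python
def last_assistant_prefix(length_instruction):
    """Wrap an instruction as a system turn, then open the assistant turn."""
    return (
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{length_instruction}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


if __name__ == "__main__":
    print(last_assistant_prefix("Your next response should be three paragraphs long."))
```

Pasting the bare sentence into the prefix without the system header and the assistant start is the variant that seemed to produce blank replies above.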

woeful wedge Jun 21, 2024, 9:25 PM

#

Ah, I see, I'll try that instead

devout crow Jun 22, 2024, 8:10 AM

#

the hell? lol

#

surreal anchor Jun 22, 2024, 8:23 AM

#

😔

woeful wedge Jun 22, 2024, 9:05 AM

#

devout crow the hell? lol

Yeah, keeps happening to me too and nothing seems to fix it

#

It just keeps stopping the gen whenever it feels like it

devout crow Jun 22, 2024, 9:23 AM

#

i had to edit some messages to get rid of it

#

this only happens with some characters

devout crow Jun 22, 2024, 9:41 AM

#

okay nvm its pretty prominent now wtf

#

stopping now cuz im tired of wasting my credits

#

dunno if this is a card issue or a provider issue

subtle phoenix Jun 22, 2024, 9:55 AM

#

devout crow stopping now cuz im tired of wasting my credits

interesting.... that's way below it's context size

#

oh nvm is the very last one some kind of retry?

sonic merlin Jun 22, 2024, 9:58 AM

#

This might be just a feeling, but I have the impression that many of fine-tuned/abliterated models like this one are bit too 'frankensteiny', too unstable.

bold sphinx Jun 22, 2024, 9:58 AM

#

huhcat

devout crow Jun 22, 2024, 10:21 AM

#

subtle phoenix oh nvm is the very last one some kind of retry?

no, that was a whole reply

#

like i swapped to a new character, got that, rubbed my eyes cuz i thought i was hallucinating

#

but yes im getting really short replies, it sucks

woeful wedge Jun 22, 2024, 1:14 PM

#

I have to fallback to Wizard until or if this gets sorted. And I swear this model's coherence is a little questionable as it is right now, but it could be just my pink glasses when I tried it out for the first time with Novita

devout crow Jun 22, 2024, 1:28 PM

#

okay i think i figured out the problem

#

it was the example messages, if they're short, then it wont generate anything longer than the example message no matter what you do

woeful wedge Jun 22, 2024, 1:28 PM

#

How short are we talking? Did you test what happens if there are no example messages at all?

devout crow Jun 22, 2024, 1:29 PM

#

yes, they became noticeably longer

woeful wedge Jun 22, 2024, 1:29 PM

#

Hmm, I'll try this out too.

devout crow Jun 22, 2024, 1:30 PM

#

go for it, it helped for me

woeful wedge Jun 22, 2024, 2:30 PM

#

It kind of works, but it's not a reliable fix. I noticed an improvement however. Now another issue I experience is the repetition. I have my presence at 0.45

devout crow Jun 22, 2024, 3:17 PM

#

same. very annoying. i also noticed a really weird issue where it just wont format dialogue with quotation marks

#

even though i explicitly ask it to

woeful mist Jun 22, 2024, 5:24 PM

#

woeful wedge I have to fallback to Wizard until or if this gets sorted. And I swear this mode...

Lmk if it isnt the same. I swear yesterday something broke with wiz.

surreal anchor Jun 22, 2024, 8:09 PM

#

😭

woeful wedge Jun 22, 2024, 8:44 PM

#

woeful mist Lmk if it isnt the same. I swear yesterday something broke with wiz.

I still feel like the 'logic' of Euryale is..off. It just doesn't make connections like it used to when it was introduced. I know everyone keeps telling me that this is the full model as it was intended, yet I still feel like it never topped the quant by Novita? But since I got to use so little of it I just can't distinguish if it's my nostalgia bias or if that was actually the case. Wizard has been off too recently, though even in its current weird state, I feel like Euryale beats it in terms of dialogue. Feels more natural, but it still has that ultra frustrating tendency of becoming formal real quick

woeful mist Jun 22, 2024, 8:46 PM

#

woeful wedge I still feel like the 'logic' of Euryale is..off. It just doesn't make connectio...

Euryale def beats wiz’s dialogue. Wiz’s weakness is lifelike dialogue. I havent gotten deep into trying euryale. Been focused on trying to figure out wtf is going on with wizard atm. If u go to test wizard at all, and notice differences, plz share in the wizard channel. Maybe read what i aether have noticed and see if ur also noticing similar behaviors.

#

If wiz keeps being trash ima try euryale and get heavy into forming instructs and proper settings like i had for wiz. If i can control euryale’s short response issue and get a decent quality going ill share what instructs i come up with

#

Also its been recommended to use infermatic provider for euryale.

woeful wedge Jun 22, 2024, 8:48 PM

#

My activity tab shows that I've been thrown around like 3-4 providers of Wizard, like literally one reply in between, so it's a little hard to say exactly who's doing a poor job and who's doing well unless I really keep a tab on it

woeful mist Jun 22, 2024, 8:48 PM

#

I think its the default now. But if u r using it where u can select provider, select infer and see if its any better/worse. Its logic has been an issue, as is the case with any L3 model.

woeful mist Jun 22, 2024, 8:49 PM

#

woeful wedge My activity tab shows that I've been thrown around like 3-4 providers of Wizard,...

Oh i use a frontend that lets u select. Risu, ST, and ORs playground (i think) let u select providers

woeful wedge Jun 22, 2024, 8:49 PM

#

How do you select it in ST?

woeful mist Jun 22, 2024, 8:49 PM

#

Lepton went to total shit. Novita seems less affected by the issues but is still handling its context very poorly for wiz

woeful mist Jun 22, 2024, 8:49 PM

#

woeful wedge How do you select it in ST?

I dont use ST and never have so idk, i just know they added that feature not too long ago.

fiery oxide Jun 22, 2024, 8:53 PM

#

woeful wedge How do you select it in ST?

pulsar edge Jun 24, 2024, 8:46 AM

#

Novita version is trash

surreal anchor Jun 24, 2024, 8:46 AM

#

😔

sonic merlin Jun 24, 2024, 8:48 AM

#

pulsar edge Novita version is trash

I've excluded Novita for this model for some time now, gave too many short, broken answers. Infermatic seems to be much more reliable, though it produces 504 responses quite often.

pulsar edge Jun 24, 2024, 8:49 AM

#

sonic merlin I've excluded Novita for this model for some time now, gave too many short, brok...

I agree completely. Too many broken words, grammar errors, language mix-ups, etc...

tribal dagger Jun 24, 2024, 9:01 AM

#

wait until SillyTavern adds Infermatic as a provider (anyway, is there a way to get the latest provider list through the OR API?)

sonic merlin Jun 24, 2024, 9:02 AM

#

tribal dagger wait until SillyTavern update infermatic at a provider (anyway, is there a way t...

The staging branch of SillyTavern has this fix already.

tribal dagger Jun 24, 2024, 9:08 AM

#

what, still don't see it

#

just pulled the staging ver yesterday

sonic merlin Jun 24, 2024, 9:14 AM

#

tribal dagger just pull staging ver yesterday

It's there -> https://github.com/SillyTavern/SillyTavern/blob/staging/public/scripts/textgen-models.js#L43

#

Since 2 days -> https://github.com/SillyTavern/SillyTavern/commit/473e11c773de10685022afc7a9ba17cc263407b0

tribal dagger Jun 24, 2024, 9:24 AM

#

noice, grinding Infermatic for now

tribal dagger Jun 24, 2024, 12:13 PM

#

sonic merlin It's there -> https://github.com/SillyTavern/SillyTavern/blob/staging/public/scr...

umm...i cloned it and still don't see it

sonic merlin Jun 24, 2024, 12:15 PM

#

tribal dagger umm...i clone it and still don't see it

Ask in the SillyTavern Discord -> https://discord.gg/sillytavern

tribal dagger Jun 24, 2024, 12:18 PM

#

sonic merlin Ask in the SillyTavern Discord -> https://discord.gg/sillytavern

can u use that provider yet?

sonic merlin Jun 24, 2024, 12:20 PM

#

tribal dagger can u use that provider yet?

Yes.

sonic merlin Jun 24, 2024, 12:21 PM

#

tribal dagger can u use that provider yet?

tribal dagger Jun 24, 2024, 12:21 PM

#

do u know how to clone a git with specific branch?

#

i hate using termux lol

sonic merlin Jun 24, 2024, 12:22 PM

#

tribal dagger do u know how to clone a git with specific branch?

You don't. You clone the repo and then switch to a branch. Ask your favorite LLM for help with git commands, they are really good at this.

tribal dagger Jun 24, 2024, 12:23 PM

#

oh

sonic merlin Jun 24, 2024, 12:30 PM

#

You can also ask your LLM to write a 5k token explanation why git should NOT be used as a distribution tool to end users and send it to ST devs, if you feel like it. I mean, they complain HERE that I should not show a patch because it might generate support on their side. Now I am doing support here for their ill choice of abusing git. /rant 🤯

slim cipher Jun 24, 2024, 12:36 PM

#

I hope you guys can add some features for users to choose providers or some sort in OR like on playground... I don't use Sillytavern tho, which is sad, I only use Venus.

subtle phoenix Jun 24, 2024, 12:36 PM

#

slim cipher I hope you guys can add some features for users to choose providers or some sort...

You should def ping venus/chub devs

tribal dagger Jun 24, 2024, 12:42 PM

#

sonic merlin You don't. You clone the repo and then switch to a branch. Ask your favorite LLM...

oh yeah, the problem is i use this command

git clone -b staging https://github.com/Cohee1207/SillyTavern

And it didn't work.
Instead i just changed the username to:

git clone -b staging https://github.com/SillyTavern/SillyTavern

and it worked lol, it's not really different but termux is weird as fuck.

pulsar edge Jun 24, 2024, 1:20 PM

#

So much trash............ Sometimes I use Silly Tavern, but when I use other apps, Novita comes first. I have to endure all kinds of alien languages! AAHAHHHHHHHHHH F***

tribal dagger Jun 24, 2024, 1:43 PM

#

pulsar edge So much trash............ Sometimes I use Silly Tavern, but when I use other app...

blame their interface not good enough.

pulsar edge Jun 24, 2024, 7:25 PM

#

tribal dagger blame their interface not good enough.

https://tenor.com/view/cat-kitty-gif-25340141

Tenor

fiery oxide Jun 24, 2024, 7:37 PM

#

@subtle phoenix can seed param be added for Infer? It's supported (bc vLLM supports it)

subtle phoenix Jun 24, 2024, 10:25 PM

#

fiery oxide @subtle phoenix can `seed` param be added for Infer? It's supported (bc vL...

On it

pulsar edge Jun 25, 2024, 4:39 AM

#

Novita...

tribal dagger Jun 25, 2024, 9:49 AM

#

Sometimes Euryale on Infermatic gets so excited that it keeps writing until it reaches the max response length (usually just 5-6 paragraphs)

faint olive Jun 25, 2024, 11:43 AM

#

Something weird with Euryale in general. It's like L3, longer contexts can often lead to weird or incoherent stuff.

I mean like 4+ turns.

However sometimes it works fine?

It feels quite random albeit it's probably that there are some things that aren't in the training dataset and the model forgot how to handle it.

woeful wedge Jun 26, 2024, 5:53 AM

#

I had some good success with the Divine Intellect preset in ST. It got a good share more intelligent and coherent, some char cards felt more alive too, though it still prefers a good instruction or two to build off of for best results.

hazy rock Jun 26, 2024, 8:04 PM

#

Tried Euryale but it does not seem to me to be a model on the same level as WizardLM-2 8x22B, which I find to be smarter and better at following instructions.
I also tried a group chat and WizardLM-2 8x22B did not miss a beat, Euryale sometimes gets confused and strange tags and characters appear from time to time.

tribal dagger Jun 27, 2024, 1:13 PM

#

There's something wrong with Euryale, it's worse than when first brought to the OR

pulsar edge Jun 27, 2024, 3:28 PM

#

Every word breaks.

fiery oxide Jun 27, 2024, 3:35 PM

#

unable to reproduce, neither through OR nor directly through Infer
Coherent and decent quality for me

#

Infer didn't change anything about this model

errant vault Jun 28, 2024, 3:52 AM

#

How much money should I put towards euryale for it to last long

surreal anchor Jun 28, 2024, 5:47 AM

#

1 million

#

😈

errant vault Jun 28, 2024, 2:08 PM

#

😭😭

fiery oxide Jun 30, 2024, 9:14 AM

#

errant vault How much money should I put towards euryale for it to last long

wdym Euryale isn't going anywhere, it's doing quite well in terms of traffic rn

errant vault Jun 30, 2024, 9:21 AM

#

fiery oxide wdym Euryale isn't going anywhere, it's doing quite well in terms of traffic rn

Do you know some good setting for euryale

fiery oxide Jun 30, 2024, 9:21 AM

#

errant vault Do you know some good setting for euryale

Try those: https://discord.com/channels/1115287912385351730/1253005075064819844

errant vault Jun 30, 2024, 9:24 AM

#

fiery oxide Try those: https://discord.com/channels/1115287912385351730/1253005075064819844

It says I don’t have access to the link

fiery oxide Jun 30, 2024, 9:25 AM

#

errant vault It says I don’t have access to the link

Oh it's on Infermatic's server, you'll need to join it to see
link to server is on https://infermatic.ai/

#

(automod didn't let me send discord link directly lmfao)

errant vault Jun 30, 2024, 9:30 AM

#

Do I have to create an account. First?

fiery oxide Jun 30, 2024, 9:30 AM

#

errant vault Do I have to create an account. First?

nah, just click on join Discord

errant vault Jun 30, 2024, 9:30 AM

#

Every time I click on it it brings me back to this server

#

I got it

#

Thankssssss

fiery oxide Jun 30, 2024, 9:31 AM

#

errant vault Every time I click on it it brings me back to this server

https://discord.com/invite/9GUXmDx9GF

#

oh automod now lets me lmao

errant vault Jul 1, 2024, 7:43 AM

#

Did Euryale get its price raised

sonic merlin Jul 1, 2024, 7:50 AM

#

Not recently (as in the last days), see here for all changes I recorded -> https://orw.karleo.net/model?id=sao10k/l3-euryale-70b

devout crow Jul 2, 2024, 3:25 AM

#

is there a way to fix euryale following the example dialogue a little too much? if an example dialogue in ST is short, every single reply will be the same length unless i delete it entirely, which isn't really ideal since example messages are pretty important

#

even if the intro message is long as shit, it'll just compress the reply length based on the example message, its really obnoxious honestly

livid violet Jul 2, 2024, 4:25 PM

#

I'm only able to push 8k tokens into the prompt no matter what settings I use in ST, even though on the page it says it has 16k context. Any ideas why?

#

My activity shows 8k context use in every single prompt

#

is this because this model is roped to increase context size?

devout crow Jul 2, 2024, 5:17 PM

#

livid violet I'm only able to push 8k tokens into the prompt no matter what settings I use in...

have you unlocked the max token counter? i was able to get it to 9.8k tokens before i had to stop

livid violet Jul 2, 2024, 5:22 PM

#

devout crow have you unlocked the max token counter? i was able to get it to 9.8k tokens bef...

Well, yeah. ST even tries to push the full prompt into the API as shown here

#

but activity shows something like this

#

makes me wonder, if OR cuts my prompt in half

devout crow Jul 2, 2024, 5:23 PM

#

yeahhhhhhhh i saw this myself idk myself and many others brought this up and it wasnt addressed i think

#

so idk

#

kind of annoying, i think this has been a problem for 1.5 weeks

livid violet Jul 2, 2024, 5:24 PM

#

Oh well, back to claude for the time being if that's the case.

devout crow Jul 2, 2024, 5:37 PM

#

yeah im going back to wizard

errant vault Jul 3, 2024, 6:40 AM

#

Does Euryale read example dialogues

tribal dagger Jul 3, 2024, 7:46 AM

#

errant vault Does Euryale read example dialogues

what do you mean? if you send them then Euryale will read them. the point here is for the model to be smart enough not to rely on them too much.

errant vault Jul 3, 2024, 2:01 PM

#

Is there a specific way to make character cards for Euryale

#

So the model can understand it better

surreal anchor Jul 4, 2024, 10:24 AM

#

If the context problem ever gets fixed, this'll probably be my main model, as I really like it 😄

tribal dagger Jul 6, 2024, 2:52 PM

#

Command R+ is no better than Euryale in roleplay in my opinion. But Cmd R+ follows instructions far better.

So yeah, i'm gonna go back to Cmd R+. Cuz while Euryale struggles to follow a simple instruction to write 3 paragraphs (sometimes it writes longer or shorter, even when using the last prefix of the prompt instruct), Cmd R+ follows it well and I see no issues.

pulsar edge Jul 7, 2024, 12:02 PM

#

tribal dagger Command R+ no better than Euryale in roleplay in my opinion. But cmd R+ follow i...

I believe that the dataset for RP had an impact on the intelligence of the model.

errant vault Jul 9, 2024, 1:48 AM

#

Is Euryale good with group chat

errant vault Jul 10, 2024, 12:59 AM

#

devout crow is there a way to fix euryale following the example dialogue a little too much? ...

Mine is doing it too

strong tusk Jul 12, 2024, 11:53 PM

#

I am getting random garbage with this model speaking crap that isnt english filled with symbols

sonic merlin Jul 13, 2024, 4:09 AM

#

strong tusk I am getting random garbage with this model speaking crap that isnt english fill...

Set your accepted provider list to "Infermatic" only, "NovitaAI" uses a crappy quantized version of this model that likes to produce garbage. Worked for me the last weeks (and accidentally tested several times as SillyTavern resets the provider list on every reload)

civic shore Jul 13, 2024, 3:38 PM

#

strong tusk I am getting random garbage with this model speaking crap that isnt english fill...

Besides the other great answer, lower the temperature to ~0.8

sonic merlin Jul 13, 2024, 4:02 PM

#

civic shore Besides the other great answer, lower the temperature to ~0.8

With Infermatic as your provider you can push this model easily to temperature 1.25 + MinP 0.1. Zero (total) garbage for me in the last few hundred generations with these settings (h/t Auri, scroll way up for several discussions of these settings)

woeful wedge Jul 13, 2024, 9:51 PM

#

How does one set up Infermatic as the only one allowed in ST? It is not on the list

#

I checked my activity and with nothing specified I see that I get both Infermatic and Novita mixed in

fiery oxide Jul 13, 2024, 10:10 PM

#

woeful wedge How does one set up Infermatic as the only one allowed in ST? It is not on the l...

you'll need to update ST, it was added to the provider list semi-recently

woeful wedge Jul 13, 2024, 10:14 PM

#

That did the trick, thank you

woeful wedge Jul 15, 2024, 9:12 PM

#

Is infermatic having issues?

#

Can't connect to the model

idle grotto Jul 15, 2024, 11:01 PM

#

it was, but they're rolling back a change and fixing it

atomic mountain Jul 20, 2024, 12:11 PM

#

seeing a huge increase in nonsense responses without changing any parameters, doesn't seem to be affecting any other models that I can tell

sonic merlin Jul 20, 2024, 12:15 PM

#

atomic mountain seeing a huge increase in nonsense responses without changing any parameters, do...

You might want to read through this -> #arc-feedback message

atomic mountain Jul 20, 2024, 12:17 PM

#

ah alright, will do

sonic merlin Jul 20, 2024, 12:17 PM

#

TL;DR: If you use SillyTavern and have set Infermatic as your sole provider, you now need to set a new flag, which is already in ST -> #arc-feedback message

#

Otherwise, thanks to this feature -> #announcements message, the other provider, which quantizes, will get sent requests too and return that garbage

atomic mountain Jul 20, 2024, 12:50 PM

#

All sorted now, thanks for your help!

woeful mist Jul 24, 2024, 5:47 AM

#

devout crow is there a way to fix euryale following the example dialogue a little too much? ...

Silly idea but maybe make your example messages longer then? I make mine exactly how i want my responses to be in every way possible, from formatting to length to char personality and vocabulary - typically i start by having the model sorta make them then heavily edit them.

And i do believe on ST they are temp tokens, so once context fills they drop and ur chat hist becomes the new examples.

devout crow Jul 24, 2024, 5:49 AM

#

i usually do that but some days i cant be bothered

fiery oxide Jul 31, 2024, 12:25 PM

#

Infermatic drops Euryale's precision to fp8 (dynamic activation)

fiery oxide Jul 31, 2024, 12:52 PM

#

Infermatic's team, the community, and I personally tested dynamic fp8 quantization on vLLM and found the quality degradation to be minimal, pretty much invisible.
Though, if you experience major output quality degradation, please report it to me, I will pass it on to Infermatic's team

#

@idle grotto can you please mark Infermatic's endpoint as fp8?

sonic merlin Jul 31, 2024, 1:05 PM

#

fiery oxide Infermatic's team, community and me personally tested dynamic fp8 quantization o...

I cannot confirm that the degradation is minimal or even invisible. Instead of following instructions and producing long outputs, the same cards now produce superficial and short replies, without me changing anything. I cannot test this deeply (only ~10 generations), as I have no time for this now, but I know that when I have time again, I'll have to look for another preferred model. This does not work for me anymore.

fiery oxide Jul 31, 2024, 4:48 PM

#

Apparently the FP8 quant we used was a static one; Svak is making a dynamic one right now
Moral: never trust fp8 quants on HF, we will make our own in the future
Euryale is the only model receiving reports of degraded quality; Daybreak and Magnum are a-OK (both use first-party dynamic fp8 quants, made by me)

fiery oxide Jul 31, 2024, 7:07 PM

#

should be fixed now

sonic merlin Jul 31, 2024, 7:13 PM

#

fiery oxide should be fixed now

Seems much better now (quick 2 generation test)

fiery oxide Jul 31, 2024, 7:14 PM

#

sonic merlin Seems much better now (quick 2 generation test)

Glad to hear that! Really sorry for inconvenience

sonic merlin Jul 31, 2024, 7:15 PM

#

fiery oxide Glad to hear that! Really sorry for inconvenience

No problem, good that this model exists in this quality, it's a bit of a gem, hopefully others can enjoy it as much as I do (when I have some time).

fiery oxide Jul 31, 2024, 7:15 PM

#

Tbh, fp8 is not ideal, but 60s+ latency was becoming too much, Infermatic just has limited amount of resources compared to bigger providers

sonic merlin Jul 31, 2024, 7:16 PM

#

fiery oxide Tbh, fp8 is not ideal, but 60s+ latency was becoming too much, Infermatic just h...

I think you are absolutely right in principle, for inference FP8 should not matter much, just that it did not seem to work for some reason.

fiery oxide Jul 31, 2024, 7:17 PM

#

sonic merlin I think you are absolutely right in principle, for inference FP8 should not matt...

I think calibrating stuff on 512 rows from Ultrachat-2k dataset is very far from ideal
Dynamic might be a bit slower and bigger, but provides more consistent quality
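
The static-versus-dynamic difference can be shown with a toy example. This is a simplified sketch, not what vLLM or AutoFP8 actually do internally: it only models the scale factor and the clamp to the FP8 E4M3 range (maximum finite value 448) and ignores mantissa rounding. The calibration and runtime values are invented.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format

def scale_for(values):
    """Scale that maps the largest magnitude in `values` onto the FP8 range."""
    return max(abs(v) for v in values) / FP8_E4M3_MAX

def fake_quantize(values, scale):
    """Scale, clamp to the FP8 range, and scale back (mantissa rounding is ignored)."""
    out = []
    for v in values:
        q = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale))
        out.append(q * scale)
    return out

calibration = [0.5, -1.0, 2.0]    # hypothetical activations seen during calibration
runtime = [0.5, -1.0, 12.0]       # hypothetical runtime activations with an outlier

static_scale = scale_for(calibration)   # fixed once, offline
dynamic_scale = scale_for(runtime)      # recomputed for each input at inference time

static_out = fake_quantize(runtime, static_scale)    # the outlier is clamped
dynamic_out = fake_quantize(runtime, dynamic_scale)  # the outlier survives
print(static_out)
print(dynamic_out)
```

If the calibration set never contains activations like the ones a long roleplay prompt produces, the static scale clips them, which fits the complaint about calibrating on a small general-purpose sample.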

hot crane Aug 3, 2024, 9:23 AM

#

fiery oxide Apparently FP8 quant we used was a static one, Svak is making dynamic one right ...

How do you dynamic quant FP8? Is it using nvammo or something different?

errant vault Aug 3, 2024, 4:39 PM

#

fiery oxide Infermatic drops Euryale's precision to fp8 (dynamic activation)

What does FP8 mean?

sonic merlin Aug 3, 2024, 4:41 PM

#

errant vault What does FP8 mean?

Precision of (most of) the weights / quantization: FP8 = 8-bit floating point, FP16/BF16 = 16-bit, see also -> https://en.wikipedia.org/wiki/Minifloat

Minifloat

In computing, minifloats are floating-point values represented with very few bits. Predictably, they are not well suited for general-purpose numerical calculations. They are used for special purposes such as

Computer graphics, where iterations are small and precision has aesthetic effects.
Machine learning, which can be relatively insensitive t...

errant vault Aug 3, 2024, 4:42 PM

#

sonic merlin Precision of (most) of the weights / quantization, FP8 = floating-point 8 bit, F...

I’m so sorry but does this affect role playing?

sonic merlin Aug 3, 2024, 4:44 PM

#

errant vault I’m so sorry but does this affect role playing?

It shouldn't much, but other parameters like very high temperature may increase this small effect so it becomes noticeable, also some inference engines seem to have trouble with FP8, apparently

#

From a math perspective, high precision (16/32 bit) is only necessary for training, where weights get accumulated, but during inference most of this precision doesn't matter, as high values are more important than tiny fractions for the results. Those high values get preserved during quantization, so that even 4-bit weights still work pretty well.
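
A toy illustration of that point, using symmetric absmax rounding to a 4-bit integer grid. This is a simplification chosen for clarity: real 4-bit schemes typically work on small groups of weights with extra tricks, and the weight values here are made up.

```python
def quantize_absmax(weights, bits=4):
    """Symmetric absmax quantization: map the largest magnitude to the top integer level."""
    levels = 2 ** (bits - 1) - 1                  # 7 for 4 bits
    scale = max(abs(w) for w in weights) / levels
    codes = [round(w / scale) for w in weights]   # integer codes in [-levels, levels]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate weight values."""
    return [c * scale for c in codes]

weights = [0.9, -0.6, 0.02, -0.01]                # hypothetical weights
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
print(codes)
print(restored)
```

The two large weights come back close to their original values, while the two tiny ones collapse to zero, which matches the claim that large magnitudes are what quantization preserves.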

fiery oxide Aug 3, 2024, 7:58 PM

#

hot crane How do you dynamic quant FP8? Is it using nvammo or something different?

It was made using AutoFP8
vLLM docs have instructions on how to do it
https://docs.vllm.ai/en/stable/quantization/fp8.html

fiery oxide Aug 12, 2024, 6:36 PM

#

@subtle phoenix why is max output on Infermatic's endpoint set to 8192?
The model is still RoPEd to 16384

subtle phoenix Aug 12, 2024, 10:46 PM

#

fiery oxide @subtle phoenix why max output on Infermatic's endpoint is set to 8192? mo...

Thanks for the flag - just pushed an update to fix this. I refactored the context thingy recently to clear up some tech debts

faint olive Aug 13, 2024, 8:06 PM

#

Hehe I know the feeling

#

I personally end up rewriting entire codebase 💀 xD

tribal dagger Aug 25, 2024, 3:31 PM

#

V2.2?

grand marsh Aug 25, 2024, 3:57 PM

#

tribal dagger V2.2?

It dropped

https://huggingface.co/Sao10K/L3.1-70B-Euryale-v2.2

Sao10K/L3.1-70B-Euryale-v2.2 · Hugging Face

woeful wedge Aug 25, 2024, 7:19 PM

#

grand marsh It dropped https://huggingface.co/Sao10K/L3.1-70B-Euryale-v2.2

Hell yeah, we need this

#

My wallet's ready

woeful wedge Aug 25, 2024, 8:10 PM

#

@subtle phoenix Can we expect OR to pick it up in the near future?

subtle phoenix Aug 25, 2024, 8:11 PM

#

woeful wedge @subtle phoenix Can we expect for OR to pick it up in the nearest future?

is it on featherless yet?

fiery oxide Aug 25, 2024, 8:32 PM

#

subtle phoenix is it on featherless yet?

I requested it, likely it will arrive soon

#

Maybe we can also make a poll on Infer to update 2.1 to 2.2

#

imo 2.2 is a major improvement

woeful wedge Aug 25, 2024, 8:49 PM

#

fiery oxide imo 2.2 is a major improvement

That's very promising to hear. 2.1 was already awesome.

fiery oxide Aug 26, 2024, 12:29 AM

#

About Infer - I should have some news tomorrow, 2.2 was very warmly received on community cloud

#

so it might be either polled or just swapped

#

but people seem to want it over 2.1

woeful wedge Aug 26, 2024, 10:16 PM

#

fiery oxide About Infer - I should have some news tomorrow, 2.2 was very warmly received on ...

Any news?

simple smelt Aug 27, 2024, 6:23 AM

#

fiery oxide About Infer - I should have some news tomorrow, 2.2 was very warmly received on ...

🤔

fiery oxide Aug 27, 2024, 6:25 AM

#

Seems to be no news atm, which is bit weird
Svak told me that either a poll or swap should have been on monday, along swap from v1 to v2 for Magnum
Neither happened yet

#

Should be tomorrow

sonic merlin Aug 27, 2024, 6:27 AM

#

fiery oxide Seems to be no news atm, which is bit weird Svak told me that either a poll or s...

Technically there are ~32 minutes left in Monday in California

simple smelt Aug 27, 2024, 3:07 PM

#

so looks like it's on Infer?

subtle phoenix Aug 27, 2024, 4:25 PM

#

simple smelt so look like it's on Infer?

yup, updating it very soon

#

I got swarmed by some other stuffs xd

woeful wedge Aug 27, 2024, 5:11 PM

#

Does anyone know if any provider will host the FP16 version of the model, or is the loss minimal on FP8?

visual oasis Aug 27, 2024, 5:14 PM

#

I can barely notice any difference between FP4 and FP8. FP8 is enough

devout crow Aug 27, 2024, 6:04 PM

#

holy moly, euryale update

surreal anchor Aug 27, 2024, 6:40 PM

#

😍

woeful wedge Aug 27, 2024, 7:39 PM

#

Hope it's a few hours away 🙏

subtle phoenix Aug 27, 2024, 7:40 PM

#

It's up: https://openrouter.ai/models/sao10k/l3.1-euryale-70b

Llama 3.1 Euryale 70B v2.2 - API, Providers, Stats

Euryale L3.1 70B v2. Run Llama 3.1 Euryale 70B v2.2 with API

errant vault Aug 28, 2024, 1:05 PM

#

Is it uncensored

tall mirage Aug 28, 2024, 1:23 PM

#

As far as I can tell, but there's still a slight bit of positivity bias like you see in other llama 3 models. It's probably the best one based on it I've tried so far though.

sonic merlin Aug 28, 2024, 1:25 PM

#

tall mirage As far as I can tell, but there's still a slight bit of positivity bias like you...

More so than the previous Llama 3 / Euryale 2.1 IMHO, also I have seen very short (e.g. one sentence responses) to requests from this new model, while the old model would go on without problems. Without getting nailed I'd say this is a bit more restricted, but I would not call it "censored".

#

This model will certainly be a fine companion for most role play settings.

errant vault Aug 28, 2024, 1:29 PM

#

So it’s less hornh?

#

Horny?

tall mirage Aug 28, 2024, 1:29 PM

#

Off topic but I don't know what they did to make Mistral Nemo 12B Starcannon so good or just because it's the bf16 quant being hosted, but if they can do that with a larger parameter model so it's smarter, we will be eating good.

tall mirage Aug 28, 2024, 1:30 PM

#

errant vault So it’s less hornh?

From my experience it won't push into that direction randomly like some of the other models, but can handle it okay.

sonic merlin Aug 28, 2024, 1:30 PM

#

errant vault So it’s less hornh?

It will do ERP, no problem.

#

But I'd recommend to generate a few replies to the same request with this version and the old (if it is still available) to get a feel how they are a bit different.

woeful wedge Aug 28, 2024, 1:39 PM

#

Euryale 2.2 can cook. Love it

#

And the cut-off problem of 2.1 seems to be gone. It's pumping out word after word like there's no tomorrow.

iron drift Aug 28, 2024, 2:11 PM

#

Is there any reason why it is at 8k context? It's 16k on Infer itself

sonic merlin Aug 28, 2024, 2:13 PM

#

8k is what the model spec is suggesting, provider can offer more (sometimes less) e.g. through RoPE tricks, the real context window is in the provider tab as max output.

stray flame Aug 28, 2024, 2:45 PM

#

The update feels like such a downgrade, feels like it's much harder now to get decent responses that incorporate good dialogue

#

Sadge

subtle phoenix Aug 28, 2024, 2:46 PM

#

stray flame The update feels like such a downgrade, feels like its much harder to get decent...

the older version is still available fyi

stray flame Aug 28, 2024, 2:50 PM

#

I remember there were a couple different providers for it before, Infermatic being the better one and the others using a quantized version

#

#

Are either of these comparable to Infermatic on 2.1 previously?

tall mirage Aug 28, 2024, 4:19 PM

#

stray flame I remember there were a couple different providers for it before, Infermatic bei...

The old one hosted by Infermatic was also fp8 IIRC.

stray flame Aug 28, 2024, 4:20 PM

#

tall mirage The old one hosted by informatic was also fp8 IIRC.

Noted ty

sinful sparrow Aug 28, 2024, 4:21 PM

#

tall mirage The old one hosted by informatic was also fp8 IIRC.

Quantized differently though, Dynamic quantization hurts models' performance less

#

(If at all, it was unnoticeable from tests)

errant vault Aug 28, 2024, 4:35 PM

#

I’m liking starcannon

tall mirage Aug 28, 2024, 11:55 PM

#

sinful sparrow Quantized differently though, Dynamic quantization hurts models' performance les...

TIL

woeful wedge Sep 13, 2024, 9:29 PM

#

Has anything been done to Euryale for Llama 3.1 hosted by Infermatic? Is it correlated with the massive price drop? It performed quite awfully in recent gens. Gibberish generations, lacking creativity, endless adjectives thrown at you with no real coherence. Settings were untouched, just an inexplicable stream of bad generations.

#

If performance was sacrificed to reduce cost, I'd much rather pay more for a sane, consistent model.

visual oasis Sep 13, 2024, 10:04 PM

#

https://huggingface.co/Sao10K/L3.1-70B-Hanami-x1/

Hope someone adds this to OR. Successor of euryale 2.2

Sao10K/L3.1-70B-Hanami-x1 · Hugging Face

woeful wedge Sep 13, 2024, 10:07 PM

#

Very barebones description, but anything Euryale related is usually good.

subtle phoenix Sep 13, 2024, 10:22 PM

#

woeful wedge Has anything been done to Euryale for Llama 3.1 hosted by infermatic? Is it core...

Check what you're using for rep pen and freqpen

unborn monolith Sep 13, 2024, 10:23 PM

#

idk he said it was untouched

atomic mountain Sep 13, 2024, 10:23 PM

#

DeepInfra seems to be churning out garbage, for both 2.1 and 2.2

#

other providers on both 2.1 and 2.2 seem fine

woeful wedge Sep 13, 2024, 10:24 PM

#

subtle phoenix Check what you're using for rep pen and freqpen

1.1 and 0.9 now. Used to be at 1.55 and 0.9, but again, before introduction of deepinfra (Which was today?) It worked perfectly fine.

subtle phoenix Sep 13, 2024, 10:24 PM

#

atomic mountain other providers on both 2.1 and 2 2 seem fine

With the exact same parameter?

subtle phoenix Sep 13, 2024, 10:24 PM

#

woeful wedge 1.1 and 0.9 now. Used to be at 1.55 and 0.9, but again, before introduction of d...

oh wow

#

kk will derank deepinfra

#

and ping them

atomic mountain Sep 13, 2024, 10:25 PM

#

yeah exact same everything

woeful wedge Sep 13, 2024, 10:26 PM

#

subtle phoenix kk will derank deepinfra

Weird thing is, though, that OR lists Infermatic as the provider for these broken messages, even though I've never observed behavior like that from Infermatic's 2.2

#

Not until the price drop announcement

subtle phoenix Sep 13, 2024, 10:26 PM

#

woeful wedge Weird thing is though that OR lists Infermatic as provider for these broken mess...

that's a bug

#

I think internally it's prob Deepinfra serving it

#

but weirdly...

#

shouldn't deepinfra be the 1st endpoint it tries?....

#

(so it should log deepinfra regardless...)

#

There's a bug I'm trying to track regarding how fallback providers are not being logged properly

#

but... if it's Deepinfra serving the model and that it's the 1st host.... it should have been logged, NOT infermatic xd

#

ugh....

woeful wedge Sep 13, 2024, 10:27 PM

#

subtle phoenix but... if it's Deepinfra serving the model and that it's the 1st host.... it sho...

But I'm not allowing fallbacks, I've unchecked it on ST. Will it still somehow go for Deepinfra anyways?

subtle phoenix Sep 13, 2024, 10:27 PM

#

woeful wedge But I'm not allowing fallbacks, I've unchecked it on ST. Will it still somehow g...

ooh so no fallback at all?

woeful wedge Sep 13, 2024, 10:27 PM

#

Nope

subtle phoenix Sep 13, 2024, 10:28 PM

#

then yeah I'm pretty sure your request hit Infermatic

subtle phoenix Sep 13, 2024, 10:29 PM

#

woeful wedge Has anything been done to Euryale for Llama 3.1 hosted by infermatic? Is it core...

This happens only after the price drops right?

#

I double checked our commit history - the last refactor to the endpoint filtering system was ~4 days ago

woeful wedge Sep 13, 2024, 10:29 PM

#

Yeah. Whatever was done since that announcement, somehow resulted in whatever is happening now

atomic mountain Sep 13, 2024, 10:29 PM

#

I have novita and infermatic enabled for .1 and .2 respectively and everything works, as soon as I enable DeepInfra it all goes to hell

#

I'll admit I don't know much about the technical side but I think something similar happened in the past when a provider was quantizing prompts?

woeful wedge Sep 13, 2024, 10:29 PM

#

Yesterday I was using it quite a lot at 1.55 rep pen and it worked swimmingly

#

The typical good Euryale stuff on all cards

#

I think Deepinfra should be removed. It's broken as hell and it's possible that somehow it's causing this too.

subtle phoenix Sep 13, 2024, 10:34 PM

#

kk DeepInfra should be deranked now

unborn monolith Sep 13, 2024, 10:38 PM

#

but anyways, isn't 2.2 just poorly received compared to 2.1 in general?

woeful wedge Sep 13, 2024, 10:40 PM

#

subtle phoenix kk DeepInfra should be deranked now

Can still send requests to it through ST

subtle phoenix Sep 13, 2024, 10:41 PM

#

woeful wedge Can still send requests to it through ST

yeah

#

it's only "deranked" - meaning if you call the model without a specified provider, it will not be picked as a candidate
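
As a reading aid, the "deranked" behavior described above could look something like the sketch below. This is purely illustrative and not OpenRouter's actual routing code; the function name, the endpoint dictionaries, and the `deranked` key are all hypothetical.

```python
def candidate_providers(endpoints, requested=None):
    """Return the provider names eligible to serve a request.

    endpoints: list of dicts with "name" and "deranked" keys (hypothetical shape).
    requested: optional list of provider names the caller explicitly asked for.
    """
    if requested:
        # An explicit request bypasses deranking: the caller gets what they asked for.
        wanted = set(requested)
        return [e["name"] for e in endpoints if e["name"] in wanted]
    # No provider specified: deranked endpoints are left out of the candidate pool.
    return [e["name"] for e in endpoints if not e["deranked"]]

endpoints = [
    {"name": "Infermatic", "deranked": False},
    {"name": "DeepInfra", "deranked": True},
]
print(candidate_providers(endpoints))
print(candidate_providers(endpoints, requested=["DeepInfra"]))
```

This matches what was observed in the thread: requests that name DeepInfra explicitly (for example from SillyTavern's provider list) still reach it after deranking.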

woeful wedge Sep 13, 2024, 10:43 PM

#

Aha, I see. So, I'm seeing that somehow Text Completion is causing issues, but I have no idea why. Chat Completion 2.2 from Infermatic works well

#

In text completion it either does not respond or is utter garbage

bold sphinx Sep 14, 2024, 5:11 AM

#

unborn monolith but anyways, isnt 2.2 just poorly recieved compared to 2.1 in general?

Not on Infermatic's side? Users there prefer 2.2

#

But people have varying opinions so

unborn monolith Sep 14, 2024, 5:12 AM

#

the man himself

#

sorry brother. I was going off the ugi leaderboard

woeful mist Sep 30, 2024, 5:25 PM

#

If they dont respond maybe @fiery oxide can shed some light, i know he was a fan of this model.

iron drift Sep 30, 2024, 6:00 PM

#

I have been using Euryale 2.2 with 1.17 temp, 0.075 minp and 1.05 rep pen
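
Those settings, written out as a request body for anyone using the API directly. A minimal sketch, assuming OpenRouter's `temperature`, `min_p`, and `repetition_penalty` request parameters; the slug comes from the model URL posted earlier in the thread, and the helper name is invented for the example.

```python
import json

def euryale_22_request(prompt: str) -> dict:
    """Build a chat-completions body with the sampler settings quoted above (hypothetical helper)."""
    return {
        "model": "sao10k/l3.1-euryale-70b",   # slug taken from the OpenRouter model URL
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.17,
        "min_p": 0.075,
        "repetition_penalty": 1.05,
    }

payload = euryale_22_request("Continue the scene.")
print(json.dumps(payload, indent=2))
```

Not every provider honors every sampler parameter, so it is worth confirming on the provider tab that the endpoint supports `min_p` before relying on it.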

fiery oxide Oct 1, 2024, 2:15 AM

#

I've noticed right away, even before Infer officially picked this model up, when it was on community cloud, that Euryale 2.2 prefers lower temp than 2.1

#Sao10K/L3-70B-Euryale-v2.1