Limited outputs and cut offs | OpenRouter | Page 1

copper knot Sep 23, 2023, 3:30 PM

#

Hi, I'm using OpenRouter with Venus Chub, but with ANY model the output gets cut off at about 200 tokens. What do I do??

tawny acorn Sep 23, 2023, 10:02 PM

#

Does Venus have an option to increase max_tokens? Likely under advanced params

copper knot Sep 24, 2023, 1:14 AM

#

tawny acorn Does Venus have an option to increase max_tokens? Likely under advanced params

I did increase it but doesnt work😭😭

tawny acorn Sep 24, 2023, 1:33 AM

#

copper knot I did increase it but doesnt work😭😭

How much did you increase it to BTW?

copper knot Sep 24, 2023, 1:33 AM

#

tawny acorn How much did you increase it to BTW?

I tried 500, and all the way up to 1000

tawny acorn Sep 24, 2023, 1:34 AM

#

and it still output only about 200 tokens??

#

I tried with this, got 314 tokens:

copper knot Sep 24, 2023, 1:35 AM

#

tawny acorn I tried with this, got 314 tokens:

Wait now im getting this💀

#

tawny acorn Sep 24, 2023, 1:36 AM

#

Tried again, and got 500-ish tokens

#

Yeah I think something's wrong on chub + openrouter integration - everyone's seeing it

copper knot Sep 24, 2023, 1:37 AM

#

Hmm okay I’ll tell the chub discord about it

copper knot Sep 24, 2023, 4:51 AM

#

tawny acorn Yeah I think something's wrong on chub + openrouter integration - everyone's see...

they said it seems like a problem on openrouter's side- any updates on this error?

tawny acorn Sep 24, 2023, 5:30 AM

#

copper knot they said it seems like a problem on openrouter's side- any updates on this erro...

Can you try it again

#

Just pushed a hotfix that might have fixed this

copper knot Sep 24, 2023, 5:32 AM

#

tawny acorn Can you try it again

Working again!!

#

thank youuu

tawny acorn Sep 24, 2023, 5:32 AM

#

Awesome!

copper knot Sep 24, 2023, 5:36 AM

#

tawny acorn Awesome!

buuut the cutoffs aren't fixed yet, but it's fine!

tawny acorn Sep 24, 2023, 5:55 AM

#

@copper knot for the cutoff - it's weird. Can you ask what venus is passing down for the max_tokens?

copper knot Sep 24, 2023, 5:56 AM

#

tawny acorn @copper knot for the cutoff - it's weird. Can you ask what venus is pas...

wdym?

tawny acorn Sep 24, 2023, 5:56 AM

#

Do they have a slider somewhere that lets you configure the max number of tokens generated?

copper knot Sep 24, 2023, 5:56 AM

#

tawny acorn Do they have a slider somewhere that let you configure the max amount of token g...

yup

tawny acorn Sep 24, 2023, 5:56 AM

#

By default, if they pass nothing down, we default to 256

#

But I'm not sure what chub's actually passing down

copper knot Sep 24, 2023, 5:58 AM

#

tawny acorn By default, if they pass nothing down, we default to 256

default is 0, which means unlimited

#

i think thats what you mean? right?

tawny acorn Sep 24, 2023, 6:05 AM

#

copper knot default is 0, which means unlimited

0 will be normalized to 1 token (if that's what they are sending) btw - it has to be "undefined" or set to a valid number
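
A minimal client-side sketch of the rule described in this message, assuming the behavior stated in the thread (an omitted max_tokens falls back to a server default, and 0 is not treated as unlimited). The function name `build_payload` and the model ID in the usage lines are hypothetical and are not taken from Chub's or OpenRouter's code.

```python
from typing import Optional


def build_payload(model: str, messages: list, max_tokens: Optional[int] = None) -> dict:
    """Build a chat-completion request body.

    Assumption from the thread: max_tokens must be absent ("undefined") or a
    valid positive number; a slider value of 0 must not be forwarded as 0.
    """
    payload = {"model": model, "messages": messages}
    # Treat None, 0, and negative values as "not set" and omit the key,
    # so the server applies its own default instead of normalizing 0 to 1.
    if max_tokens is not None and max_tokens > 0:
        payload["max_tokens"] = int(max_tokens)
    return payload


msgs = [{"role": "user", "content": "hi"}]
print("max_tokens" in build_payload("example/model", msgs, 0))   # False
print(build_payload("example/model", msgs, 1000)["max_tokens"])  # 1000
```

With this shape, a client whose slider uses 0 for "unlimited" sends no max_tokens at all rather than a literal 0.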

copper knot Sep 24, 2023, 6:06 AM

#

tawny acorn 0 will be normalized to 1 tokens (if that's what they are sending) btw - it has ...

right now, Im moving the slider up and down but nothing seems to work so...idk what the error issss😔

tawny acorn Sep 24, 2023, 6:07 AM

#

Can you try debugging with the network tab? I.e., press F12 and look at the network tab

#

can you send me a screenshot of what Chub's sending OpenRouter

copper knot Sep 24, 2023, 6:07 AM

#

tawny acorn Can you try debugging with the network tab? I.e., press F12 and look at the network tab

does this work the same on mac?

#

im using mac rn

tawny acorn Sep 24, 2023, 6:08 AM

#

Yeah f12 should open up inspector for the venus web app

#

then, navigate to the network tab

#

then try making a request

#

you should see a new item in the network - one of them should be openrouter's completion endpoint

#

check what chub's sending us for max_tokens

copper knot Sep 24, 2023, 6:09 AM

#

tawny acorn Sep 24, 2023, 6:09 AM

#

that looks right to me lol

#

how cut-off is it btw? Like it doesn't finish all the way?

#

Try asking it to give you a long essay and see if it's actually "cut-off" or if your prompt can only generate that much.
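
One way to tell the two cases apart, sketched under the assumption that the response follows the OpenAI-style chat completion shape (a `choices` list whose items carry a `finish_reason`). In that format `"length"` means the token limit was hit, while `"stop"` means the model ended on its own. The helper name `hit_token_limit` is hypothetical, and the sample responses are hand-written rather than real API output.

```python
def hit_token_limit(response: dict) -> bool:
    """Return True if any choice stopped because the token limit was reached."""
    choices = response.get("choices", [])
    return any(choice.get("finish_reason") == "length" for choice in choices)


# Illustrative, hand-written responses (not real API output).
cut_off = {"choices": [{"message": {"content": "Once upon a"}, "finish_reason": "length"}]}
finished = {"choices": [{"message": {"content": "The end."}, "finish_reason": "stop"}]}

print(hit_token_limit(cut_off))   # True
print(hit_token_limit(finished))  # False
```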

copper knot Sep 24, 2023, 6:12 AM

#

tawny acorn how cut-off is it btw? Like it doesn't finish all the way?

like it stops mid sentence after about two paragraphs

#

consistently

#

also, I checked the payload to messages, and noticed that every model only remembers about 20 messages

#

no matter how much context you have

#

my prompt isn't that long so..

tawny acorn Sep 24, 2023, 6:15 AM

#

copper knot also, I checked the payload to messages, and noticed that every model only remem...

That's chub's own configuration - it's what they send our API

pine plover Sep 24, 2023, 6:15 AM

#

copper knot like it stops mid sentence after about two paragraphs

Hi, which model are you using? Cuz I think that the same happens to me.

copper knot Sep 24, 2023, 6:16 AM

#

pine plover Hi, which model are you using? Cuz I think that the same happens to me.

i think its the same for every model

#

at least for me

eternal cedar Sep 24, 2023, 6:16 AM

#

Wow I came on here with the exact same question

#

No matter the model I use, it always gets cut off early. Maybe after about 200 words. Same thing every model in playground

tawny acorn Sep 24, 2023, 6:17 AM

#

@eternal cedar on playground, you will need to set the max_tokens up to see it do more:

pine plover Sep 24, 2023, 6:17 AM

#

Because it sometimes automatically switches your model from the filtered to the unfiltered one. Mythomax in my case, and this model specifically cuts off any answers unless you play with the max tokens settings

#

Maybe you could check in your activity in open router?

tawny acorn Sep 24, 2023, 6:18 AM

#

eternal cedar No matter the model I use, it always gets cut off early. Maybe after about 200 ...

Bump that to 1000 and you will find it's up

tawny acorn Sep 24, 2023, 6:18 AM

#

pine plover Because it sometimes automatically switches your model from the filtered to the ...

Oooh that's likely the case yeah

pine plover Sep 24, 2023, 6:18 AM

#

Even if you choose one model, it might sometimes switch to another

tawny acorn Sep 24, 2023, 6:19 AM

#

Try using Mythalion

pine plover Sep 24, 2023, 6:20 AM

#

copper knot i think its the same for every model

Just kind of check if the model you chose in venus is the same that generated the answer in your activity tab in open router

copper knot Sep 24, 2023, 6:20 AM

#

pine plover Just kind of check if the model you chose in venus is the same that generated th...

you mean the default model?

copper knot Sep 24, 2023, 6:21 AM

#

pine plover Just kind of check if the model you chose in venus is the same that generated th...

OHHH

#

bruh

#

its all mythomax

pine plover Sep 24, 2023, 6:21 AM

#

Oh yeah

copper knot Sep 24, 2023, 6:21 AM

#

wait how

pine plover Sep 24, 2023, 6:21 AM

#

That’s the case

tawny acorn Sep 24, 2023, 6:22 AM

#

because something in the prompt was marked as nsfw/flagged by your original model

pine plover Sep 24, 2023, 6:22 AM

#

It usually switches to Mythomax if you do nsfw or anything close to that

tawny acorn Sep 24, 2023, 6:22 AM

#

^ this will make it fallback to mythomax yeah

copper knot Sep 24, 2023, 6:23 AM

#

damn

#

i mean I just want the cutoff to be fixed

#

desperately

pine plover Sep 24, 2023, 6:23 AM

#

Then it’s the model’s issue unfortunately 😔

#

I usually increase the max tokens to 500-600 with Mythomax, and the cutoffs are usually rare

tawny acorn Sep 24, 2023, 6:24 AM

#

pine plover I usually increase the max tokens to 500-600 with Mythomax, and the cutoffs are ...

You can't go above 250 with Mythomax tho

pine plover Sep 24, 2023, 6:25 AM

#

Oh

tawny acorn Sep 24, 2023, 6:25 AM

#

But with Mythalion you should be able to I think

pine plover Sep 24, 2023, 6:25 AM

#

Weird, but I guess try playing with other models

eternal cedar Sep 24, 2023, 6:26 AM

#

tawny acorn Bump that to 1000 and you will find it's up

Just tried this and it appears to work with GPT4 but thats about it

#

Phind code llama still outputting 200 wordsish
Code llama broke and is only outputting [INST] no matter what it is prompted

#

x.x

tawny acorn Sep 24, 2023, 6:30 AM

#

#

eternal cedar Sep 24, 2023, 6:31 AM

#

Also token count is wildly different in playground vs activity

tawny acorn Sep 24, 2023, 6:31 AM

#

The output is indeed a bit ghetto doh

tawny acorn Sep 24, 2023, 6:32 AM

#

eternal cedar Also token count is wildly different in playground vs activity

yeah - on playground, it's normalized to GPT token, whereas on activity, it's counted by the token native to the model

#

The Llama2 family of models uses its own tokenizer, so on the activity page you will see Llama2 tokens instead of GPT tokens

#

Llama2 vocab size is 32k vs GPT vocab is 100k

eternal cedar Sep 24, 2023, 6:34 AM

#

Codellama is totally broken for me now and only outputs [INST]. Tried changing tokens back to 300, tried clearing the browser cache, tried deleting and making a new character

#

But it appears Phind is now able to display over 200 words. Lol

tawny acorn Sep 24, 2023, 6:36 AM

#

What's the last message you sent codellama before it outputs [INST]?

eternal cedar Sep 24, 2023, 6:37 AM

#

Woahhhh okay heres a hint

#

Seems to be something in the browser window, after I tabbed out and back into the playground it truncated the response I got from like 800 characters back down to 200

eternal cedar Sep 24, 2023, 6:37 AM

#

tawny acorn What's the last message you sent codellama before it outputs [INST]?

I changed the max token count

#

"Write a snake game using html and javascript. Only output entire code files"

Was the prompt, but that was unchanged when it broke. Had used the same prompt before

#

Think I figured out the problem, at least with Phind.... It appears to be a browser issue (using brave).

If I stay on the window through the entire generation, it will generate the whole thing. If I don't, it will truncate down to about 200 words when I go back to the window (but still charge for the full API usage and show the full token count in activity).

Basically, it appears I have to keep the window active during generation or it will fail/truncate.

Once it is generated fully, it stays even if I go elsewhere

tawny acorn Sep 24, 2023, 6:49 AM

#

Pushing a fix to the inactive window issue - should be up in a bit

eternal cedar Sep 24, 2023, 6:51 AM

#

🚀 Sweet! Ty

tawny acorn Sep 24, 2023, 6:58 AM

#

eternal cedar 🚀 Sweet! Ty

should be up now, can you pls try it out

eternal cedar Sep 24, 2023, 7:04 AM

#

tawny acorn should be up now, can you pls try it out

Yeah it doesnt appear to be cutting off anymore, it also seems to display text a lot quicker too

copper knot Sep 24, 2023, 11:45 AM

#

tawny acorn ^ this will make it fallback to mythomax yeah

hey any way to fix this?

#

cuz I paid in order to use gpt 4

#

I tried to delete everything from my bot that is 'nsfw' but its not working..

tawny acorn Sep 24, 2023, 7:27 PM

#

copper knot hey any way to fix this?

The flagging is done by OpenAI so we have no control over it. Try removing jailbreaks and system prompt. (i.e, your character card must also not include nsfw stuff)

copper knot Sep 25, 2023, 1:45 AM

#

pine plover Because it sometimes automatically switches your model from the filtered to the ...

can I just ask how you played with the max token settings?

#

how do I get it to stop cutting off

pine plover Sep 25, 2023, 2:08 AM

#

copper knot can I just ask how you played with the max token settings?

I literally just adjust it every time when I get a cutoff - increase or decrease it. But maybe you could try the new model synthia? I heard it’s pretty good and unfiltered

timber wedge Sep 27, 2023, 12:33 PM

#

I'm confused at the cut off... I tried Pygmalion, MythoMax L2 13B (beta), both got cut off at 250 tokens. Although I setup max_tokens to 8192.

woeful gate Sep 28, 2023, 3:46 PM

#

i have the same problem

tawny acorn Sep 29, 2023, 1:08 AM

#

cc @vast moss

tropic vigil Sep 29, 2023, 3:28 AM

#

can confirm this behavior even though max_tokens are set

velvet parcel Sep 30, 2023, 1:55 AM

#

This happens with most models, with or without max_tokens set. Makes most models unusable for my use case (chatbot). Would really appreciate if it's fixed.

woeful gate Sep 30, 2023, 8:43 AM

#

Yes for me too

velvet parcel Sep 30, 2023, 5:20 PM

#

@vast moss any way to fix this? Or is it a limit of your models? I tried the same model in other platforms (hface, deepinfra) and the limitation isn't present

vast moss Sep 30, 2023, 5:36 PM

#

We’re working on a fix for this @velvet parcel - right now some open source models have an output limitation in order to support extremely long (eg 8k token) prompts.

#

We’re doing a refactor of some internal stuff cc @tawny acorn and then doing this next

velvet parcel Sep 30, 2023, 5:38 PM

#

vast moss We’re working on a fix for this @velvet parcel - right now some open sour...

Thanks. Right now the model that I want to use, which only outputs up to 250~280 tokens, is jondurbin/airoboros-l2-70b (many others do too), and the one that I also use but that doesn't have that limitation is nousresearch/nous-hermes-llama2-13b.
Both show the same context length in the docs, but for some reason one is limited and the other isn't.

velvet parcel Sep 30, 2023, 5:38 PM

#

vast moss We’re doing a refactor of some internal stuff cc @tawny acorn and then ...

That's good to hear. Any ETA?

vast moss Sep 30, 2023, 5:39 PM

#

Likely a week or so. Just curious, is the deepinfra cost for airoboros higher? I can’t tell from their site

velvet parcel Sep 30, 2023, 5:41 PM

#

I'm just doing some tests with it right now. I'd prefer to only use openrouter. About the price, it shows $0.001 per thousand tokens on the model's page. (though it's not the same version of the model as the one in openrouter)

vast moss Oct 1, 2023, 5:30 PM

#

Will post an update here when we're closer to launching this

tawny acorn Oct 2, 2023, 9:33 PM

#

@here the cutoff for Mythomax should now be resolved. Airoboros is not fixed yet.

tropic vigil Oct 2, 2023, 9:34 PM

#

tawny acorn @here the cutoff for Mythomax should now be resolved. Airoboros is not fixed yet...

what about synthia?

tawny acorn Oct 2, 2023, 9:35 PM

#

Synthia is also not fixed yet

copper knot Oct 12, 2023, 11:56 AM

#

tawny acorn Synthia is also not fixed yet

Synthia cuts off after like 30 tokens

vast moss Oct 13, 2023, 7:20 PM

#

it should have a 300 token cutoff right now

velvet parcel Oct 15, 2023, 4:45 PM

#

vast moss it should have a 300 token cutoff right now

Hi, when will this cutoff issue be fixed? Or at least in the meantime, could you show in a column beside each model's info what the token cutoff is? Otherwise one has to manually try all the models to see which are limited and which aren't, which isn't a very reliable method.

Thanks in advance.

vast moss Oct 20, 2023, 3:56 AM

#

velvet parcel Hi, when will this cutoff issue be fixed? Or at least in the meanwhile, could yo...

Deploying a change now to the /api/v1/models endpoint that will at least allow you to get the top_provider's max_completion_tokens, for each model

velvet parcel Oct 20, 2023, 4:30 AM

#

vast moss Deploying a change now to the `/api/v1/models` endpoint that will at least allow...

Thanks. Is there an ETA for the limit increase?

vast moss Oct 21, 2023, 5:19 PM

#

Not at the moment, unfortunately

wary stump Oct 23, 2023, 6:51 PM

#

I was getting the same issue in the playground a few weeks back with Claude. So I'm looking forward to seeing it fixed, as otherwise it is not usable for me. BTW - is it only an issue in the playground, and if I use Python will there be no problems with cutoff in Claude?

woeful gate Oct 23, 2023, 7:07 PM

#

What did you set max_tokens to for Claude in the playground? The default is, I think, 300 tokens.

wary stump Oct 23, 2023, 7:10 PM

#

woeful gate What did you set max_token to for Claude in the playground? The default is I th...

Yeah, I increased it now and am experimenting. Thanks!

woeful gate Oct 23, 2023, 7:12 PM

#

Same in Python. You must set a value for max_tokens :)
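
A rough sketch of what setting max_tokens from Python could look like, using only the standard library. The endpoint URL is assumed from OpenRouter's OpenAI-compatible API; the helper name `make_request`, the model ID, and the placeholder key are illustrative assumptions. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint for OpenRouter's OpenAI-compatible chat completions API.
API_URL = "https://openrouter.ai/api/v1/chat/completions"


def make_request(api_key: str, model: str, prompt: str, max_tokens: int) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with an explicit max_tokens."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # set explicitly rather than relying on a default
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and example model ID; replace with your own values.
req = make_request("YOUR_API_KEY", "anthropic/claude-2", "Write a long essay.", 1000)
# To send it: urllib.request.urlopen(req), then json-decode the response body.
```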

plain raven Oct 24, 2023, 11:14 AM

#

Just checking in, cause i am having issues with GPT-4 cutoffs as well. I see that the GPT-4 model has very low prompt and completion token limits. is that what you're all referring to? I'm worried about it basically becoming unusable with such a low context size.

Here's what the models endpoint returns:

{
  "id": "openai/gpt-4-0314",
  "pricing": {
    "prompt": "0.00003",
    "completion": "0.00006",
    "discount": 0
  },
  "context_length": 8191,
  "top_provider": {
    "max_completion_tokens": 8191
  },
  "per_request_limits": {
    "prompt_tokens": "666",
    "completion_tokens": "333"
  }
},
{
  "id": "openai/gpt-4-32k",
  "pricing": {
    "prompt": "0.00006",
    "completion": "0.00012",
    "discount": 0
  },
  "context_length": 32767,
  "top_provider": {
    "max_completion_tokens": 32767
  },
  "per_request_limits": {
    "prompt_tokens": "333",
    "completion_tokens": "166"
  }
},
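
A rough sketch of how these fields could be read to find the smallest completion cap advertised for a model. It assumes the entry shape shown above; the helper name `completion_cap` is hypothetical, and note that the per_request_limits values arrive as strings.

```python
def completion_cap(model_entry: dict) -> int:
    """Return the smallest completion limit advertised for one model entry."""
    caps = [int(model_entry["context_length"])]
    # Either nested object may be missing or null, so fall back to an empty dict.
    provider_cap = (model_entry.get("top_provider") or {}).get("max_completion_tokens")
    if provider_cap is not None:
        caps.append(int(provider_cap))
    request_cap = (model_entry.get("per_request_limits") or {}).get("completion_tokens")
    if request_cap is not None:
        caps.append(int(request_cap))  # a string in the payload, e.g. "333"
    return min(caps)


# Entry copied from the output above (pricing omitted for brevity).
entry = {
    "id": "openai/gpt-4-0314",
    "context_length": 8191,
    "top_provider": {"max_completion_tokens": 8191},
    "per_request_limits": {"prompt_tokens": "666", "completion_tokens": "333"},
}
print(completion_cap(entry))  # 333
```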

vast moss Oct 24, 2023, 8:19 PM

#

@plain raven this is a function of your account balance. if you increase it, your limits go up. see openrouter.ai/docs#limits. is that helpful?

#

(There has to be a limit here otherwise ppl go negative)

plain raven Oct 24, 2023, 8:19 PM

#

i have 20 USD in my acc. balance. wouldn't that be enough? (just checked: Credits: $27.385)

vast moss Oct 24, 2023, 8:19 PM

#

hm

#

That doesn't seem right then

#

looking

plain raven Oct 24, 2023, 8:21 PM

#

i can check with different api keys if that'd be helpful. maybe i got a "bad" one 😅

vast moss Oct 24, 2023, 8:21 PM

#

oh is this using a limited api key?

#

if it is, then that key's limit is what matters

plain raven Oct 24, 2023, 8:21 PM

#

no, it shouldn't be. but let me check real quick

#

nope it's not. sk-or-v1-db7...ded is the one in my account giving me these results

vast moss Oct 24, 2023, 8:38 PM

#

ah found the issue, working on a fix

#

thanks for flagging

#

damn, it's been showing the free-tier limits to everyone (in /models)

#

in just that endpoint

plain raven Oct 24, 2023, 8:39 PM

#

okay, so does that only apply to the API? or generation too?

vast moss Oct 24, 2023, 8:39 PM

#

not when you actually make completion requests

plain raven Oct 24, 2023, 8:39 PM

#

gotcha!

vast moss Oct 24, 2023, 8:39 PM

#

then it uses the correct limits

vast moss Oct 24, 2023, 10:03 PM

#

This should be fixed now!

vast moss Oct 30, 2023, 4:56 AM

#

@velvet parcel fyi, we re-routed airoboros to DeepInfra, which improves the context length and uptime

velvet parcel Nov 5, 2023, 6:55 AM

#

@vast moss i'm noticing a lot of random cutoff in the AI responses. Happens with open hermes llama 13b which I've been using for a long time (which btw is very slow currently, was very fast), and also with open hermes mistral 7b.
The messages cut off at random lengths, could be 70 tokens or more than 200, but are mostly on the shorter side.
It has never happened before.

#

#

It's happening with the other model i'm using too (nousresearch/nous-hermes-llama2-70b). Strange.

#

Happens in the playground too

#

(that was open hermes mistral 7b with max tokens 4096)

#

Also I noticed that open hermes llama 13b started appending <s/> to responses. Both in my app and in the Playground.

tropic vigil Nov 5, 2023, 9:14 PM

#

the cutoffs make coding impossible. Even if I tell it to continue, it will write different code, and then get cut off again.

velvet parcel Nov 6, 2023, 2:33 AM

#

@vast moss can you confirm if you're looking into this?

vast moss Nov 6, 2023, 3:12 AM

#

@tawny acorn can you see if you can reproduce this? I couldn't on my end

#

Ah just repro'd

#

@velvet parcel Can you try Mistral now? Deploying a change that might fix it

#

Should also double the tokens per second!

quiet swallow Nov 6, 2023, 3:32 AM

#

Deep into a chat in SillyTavern, using XWin model, I get the following error message despite using the attached prompt settings. I am under the impression the LLM is set up to return 300 tokens no matter the setting I use.

#

Ah, just read some earlier messages that this might be intentional.

vast moss Nov 6, 2023, 8:35 AM

#

Yeah Xwin is intentionally limited to 300

#

@velvet parcel all the models you mentioned should be fixed

#

including codellama and phind, @tropic vigil . thanks for flagging this
