#Limited outputs and cut offs

copper knot
#

Hi, I'm using OpenRouter with Venus Chub, but with ANY model the output gets cut off at about 200 tokens. What do I do??

tawny acorn
#

Does Venus have an option to increase max_tokens? It's likely under advanced params

copper knot
tawny acorn
copper knot
tawny acorn
#

and it still outputs only about 200 tokens??

#

I tried with this, got 314 tokens:

copper knot
tawny acorn
#

Tried again, and got 500-ish tokens

#

Yeah I think something's wrong on chub + openrouter integration - everyone's seeing it

copper knot
#

Hmm okay I’ll tell the chub discord about it

copper knot
tawny acorn
#

Just pushed a hotfix that might have fixed this

copper knot
#

thank youuu

tawny acorn
#

Awesome!

copper knot
tawny acorn
#

@copper knot for the cutoff - it's weird. Can you ask what venus is passing down for the max_tokens?

tawny acorn
#

Do they have a slider somewhere that lets you configure the max number of tokens generated?

tawny acorn
#

By default, if they pass nothing down, we default to 256

#

But I'm not sure what chub's actually passing down
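
A minimal sketch of what passing max_tokens explicitly looks like, assuming an OpenAI-style chat completion request body. The helper name `build_chat_request` and the 1024 default are hypothetical choices for illustration; the model slug is one mentioned later in this thread.

```python
import json

def build_chat_request(model: str, user_message: str, max_tokens: int = 1024) -> dict:
    """Build a chat completion request body with an explicit max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        # Sent explicitly so the server-side default is never relied on.
        "max_tokens": max_tokens,
    }

body = build_chat_request(
    "nousresearch/nous-hermes-llama2-70b",
    "Write a long essay about rivers.",
)
print(json.dumps(body, indent=2))
```

If a frontend omits the `max_tokens` key entirely, the default described above would apply, which is why checking the outgoing payload matters.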

copper knot
#

i think thats what you mean? right?

tawny acorn
copper knot
tawny acorn
#

Can you try debugging with the network tab? I.e., press F12 and look at the network tab

#

can you send me a screenshot of what Chub's sending OpenRouter

copper knot
#

im using mac rn

tawny acorn
#

Yeah f12 should open up inspector for the venus web app

#

then, navigate to the network tab

#

then try making a request

#

you should see a new item in the network - one of them should be openrouter's completion endpoint

#

check what chub's sending us for max_tokens
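
Once the request body has been copied out of the network tab, something like the following could summarize the fields that matter for cutoffs. This is only a sketch: `summarize_payload` is a hypothetical helper, the sample body is made up, and it assumes the frontend sends an OpenAI-style JSON body with `max_tokens` and `messages` keys.

```python
import json

def summarize_payload(raw_body: str) -> dict:
    """Report the fields relevant to cutoffs from a copied request body."""
    payload = json.loads(raw_body)
    return {
        "model": payload.get("model"),
        # None here means the client did not send max_tokens at all.
        "max_tokens": payload.get("max_tokens"),
        # How many chat messages the client actually included.
        "message_count": len(payload.get("messages", [])),
    }

# Made-up body standing in for what was copied from the browser.
copied = '{"model": "example/model", "max_tokens": 300, "messages": [{"role": "user", "content": "hi"}]}'
print(summarize_payload(copied))
```

The `message_count` field also helps check how much chat history the frontend is really sending.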

copper knot
tawny acorn
#

that looks right to me lol

#

how cut-off is it btw? Like it doesn't finish all the way?

#

Try asking it to give you a long essay and see if it's actually "cut-off" or if your prompt can only generate that much.

copper knot
#

consistently

#

also, I checked the payload to messages, and noticed that every model only remembers about 20 messages

#

no matter how much context you have

#

my prompt isn't that long so..

tawny acorn
pine plover
copper knot
#

at least for me

eternal cedar
#

Wow I came on here with the exact same question

#

No matter the model I use, it always gets cut off early. Maybe after about 200 words. Same thing with every model in the playground

tawny acorn
#

@eternal cedar on playground, you will need to set the max_tokens up to see it do more:

pine plover
#

Because it sometimes automatically switches your model from the filtered to the unfiltered one. Mythomax in my case, and this model specifically cuts off any answers unless you play with the max tokens settings

#

Maybe you could check in your activity in open router?

tawny acorn
tawny acorn
pine plover
#

Even if you choose one model, it might sometimes switch to another

tawny acorn
#

Try using Mythalion

pine plover
pine plover
#

Oh yeah

copper knot
#

wait how

pine plover
#

That’s the case

tawny acorn
#

because something in the prompt was marked as nsfw/flagged by your original model

pine plover
#

It usually switches to Mythomax if you do nsfw or anything close to that

tawny acorn
#

^ this will make it fall back to mythomax yeah

copper knot
#

damn

#

i mean I just want the cutoff to be fixed

#

desperately

pine plover
#

Then it’s the model’s issue unfortunately 😔

#

I usually increase the max tokens to 500-600 with Mythomax, and the cutoffs are usually rare

tawny acorn
pine plover
#

Oh

tawny acorn
#

But with Mythalion you should be able to I think

pine plover
#

Weird, but I guess try playing with other models

eternal cedar
#

Phind code llama still outputting 200 wordsish
Code llama broke and is only outputting [INST] no matter what it is prompted

tawny acorn
eternal cedar
#

Also token count is wildly different in playground vs activity

tawny acorn
#

The output is indeed a bit janky though

tawny acorn
#

The Llama2 family of models uses its own tokenizer, so in the activity page you will see Llama2 tokens instead of GPT tokens

#

Llama2's vocab size is 32k vs. about 100k for GPT
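
A toy illustration of why the same text can produce different token counts under different vocabularies. This is not the real Llama2 or GPT tokenizer (both use learned subword vocabularies); `greedy_tokenize` and both vocabularies are invented here purely to show the general tendency that a smaller vocabulary needs more tokens for the same text.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocab entry that matches.

    Single characters are always accepted as a fallback.
    """
    tokens = []
    i = 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

# Invented vocabularies: the large one contains longer pieces.
small_vocab = {"to", "ken", "iz"}
large_vocab = small_vocab | {"token", "tokenizer", " count"}

text = "tokenizer count"
print(len(greedy_tokenize(text, small_vocab)), len(greedy_tokenize(text, large_vocab)))
```

The same string is counted very differently by the two vocabularies, which is the kind of gap that can show up between playground and activity counts.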

eternal cedar
#

Codellama is totally broken for me now and only outputs [INST]. Tried changing tokens back to 300, tried clearing the browser cache, tried deleting and making a new character

#

But it appears Phind is now able to display over 200 words. Lol

tawny acorn
#

What's the last message you sent codellama before it outputs [INST]?

eternal cedar
#

Woahhhh okay here's a hint

#

Seems to be something in the browser window, after I tabbed out and back into the playground it truncated the response I got from like 800 characters back down to 200

eternal cedar
#

"Write a snake game using html and javascript. Only output entire code files"

Was the prompt, but that was unchanged when it broke. Had used the same prompt before

#

Think I figured out the problem, at least with Phind.... It appears to be a browser issue (using brave).

If I stay on the window through the entire generation, it will generate the whole thing. If I don't, it will truncate down to about 200 words when I go back to the window (but still charge for the full API usage and show the token count in activity).

Basically, it appears I have to keep the window active during generation or it will fail/truncate.

Once it is generated fully, it stays even if I go elsewhere

tawny acorn
#

Pushing a fix to the inactive window issue - should be up in a bit

eternal cedar
#

🚀 Sweet! Ty

tawny acorn
eternal cedar
copper knot
#

cuz I paid in order to use gpt 4

#

I tried to delete everything from my bot that is 'nsfw' but it's not working..

tawny acorn
# copper knot hey any way to fix this?

The flagging is done by OpenAI so we have no control over it. Try removing jailbreaks and system prompt. (i.e, your character card must also not include nsfw stuff)

copper knot
#

how do I get it to stop cutting off

pine plover
timber wedge
#

I'm confused about the cutoff... I tried Pygmalion and MythoMax L2 13B (beta), and both got cut off at 250 tokens, even though I set max_tokens to 8192.

woeful gate
#

i have the same problem

tawny acorn
#

cc @vast moss

tropic vigil
#

can confirm this behavior even though max_tokens are set

velvet parcel
#

This happens with most models, with or without max_tokens set. Makes most models unusable for my use case (chatbot). Would really appreciate if it's fixed.

woeful gate
#

Yes for me too

velvet parcel
#

@vast moss any way to fix this? Or is it a limit of your models? I tried the same model in other platforms (hface, deepinfra) and the limitation isn't present

vast moss
#

We’re working on a fix for this @velvet parcel - right now some open source models have an output limitation in order to support extremely long (eg 8k token) prompts.

#

We’re doing a refactor of some internal stuff cc @tawny acorn and then doing this next
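
A rough sketch of the general trade-off between prompt length and output length inside a fixed context window. This is not OpenRouter's actual limiting logic, which is not described in this thread; `completion_budget` and the numbers are hypothetical and only illustrate that prompt and completion share one window.

```python
def completion_budget(context_length: int, prompt_tokens: int, requested_max_tokens: int) -> int:
    """Largest completion that still fits in the context window."""
    remaining = max(context_length - prompt_tokens, 0)
    return min(requested_max_tokens, remaining)

# A very long prompt leaves little room for output.
print(completion_budget(8192, 8000, 1024))
# A short prompt leaves the full requested amount.
print(completion_budget(8192, 500, 1024))
```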

velvet parcel
velvet parcel
vast moss
#

Likely a week or so. Just curious, is the deepinfra cost for airoboros higher? I can’t tell from their site

velvet parcel
#

I'm just doing some tests with it right now. I'd prefer to only use openrouter. About the price, it shows $0.001 per thousand tokens on the model's page. (though it's not the same version of the model as the one in openrouter)

vast moss
#

Will post an update here when we're closer to launching this


tawny acorn
#

@here the cutoff for Mythomax should now be resolved. Airoboros is not fixed yet.

tawny acorn
#

Synthia is also not fixed yet

copper knot
vast moss
#

it should have a 300 token cutoff right now

velvet parcel
# vast moss it should have a 300 token cutoff right now

Hi, when will this cutoff issue be fixed? Or at least in the meantime, could you show the token cutoff in a column beside each model's info? Otherwise one has to manually try all the models to see which are limited and which aren't, which isn't a very reliable method.

Thanks in advance.

vast moss
velvet parcel
vast moss
#

Not at the moment, unfortunately

wary stump
#

I was getting the same issue in the playground a few weeks back with Claude. So I'm looking forward to seeing it fixed, as otherwise it is not usable for me. BTW, is it only an issue in the playground, and if I use Python will there be no problems with cutoffs in Claude?

woeful gate
#

What did you set max_tokens to for Claude in the playground? The default is I think 300 tokens.

wary stump
woeful gate
#

Same in Python. You must set a value for max_tokens :)
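
To tell a real cutoff from a model that simply stopped on its own, one option is to look at `finish_reason` in the response. A minimal sketch, assuming an OpenAI-style response shape where `"length"` marks a completion that hit the token limit; `was_truncated` and the sample response are hypothetical.

```python
def was_truncated(response: dict) -> bool:
    """True if any choice stopped because it hit the token limit."""
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )

# Made-up response standing in for a parsed JSON reply.
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Once upon a"},
            "finish_reason": "length",
        }
    ]
}
print(was_truncated(sample))
```

If this returns true, raising max_tokens is the first thing to try.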

plain raven
#

Just checking in, cause i am having issues with GPT-4 cutoffs as well. I see that the GPT-4 model has very low prompt and completion token limits. is that what you're all referring to? I'm worried about it basically becoming unusable with such a low context size.

Here's what the models endpoint returns:

{
  "id": "openai/gpt-4-0314",
  "pricing": {
    "prompt": "0.00003",
    "completion": "0.00006",
    "discount": 0
  },
  "context_length": 8191,
  "top_provider": {
    "max_completion_tokens": 8191
  },
  "per_request_limits": {
    "prompt_tokens": "666",
    "completion_tokens": "333"
  }
},
{
  "id": "openai/gpt-4-32k",
  "pricing": {
    "prompt": "0.00006",
    "completion": "0.00012",
    "discount": 0
  },
  "context_length": 32767,
  "top_provider": {
    "max_completion_tokens": 32767
  },
  "per_request_limits": {
    "prompt_tokens": "333",
    "completion_tokens": "166"
  }
},
vast moss
#

@plain raven this is a function of your account balance. if you increase it, your limits go up. see openrouter.ai/docs#limits. is that helpful?

#

(There has to be a limit here otherwise ppl go negative)

plain raven
#

i have 20 USD in my acc. balance. wouldn't that be enough? (just checked: Credits: $27.385)

vast moss
#

hm

#

That doesn't seem right then

#

looking

plain raven
#

i can check with different api keys if that'd be helpful. maybe i got a "bad" one 😅

vast moss
#

oh is this using a limited api key?

#

if it is, then that key's limit is what matters

plain raven
#

no, it shouldn't be. but let me check real quick

#

nope it's not. sk-or-v1-db7...ded is the one in my account giving me these results

vast moss
#

ah found the issue, working on a fix

#

thanks for flagging

#

damn, it's been showing the free-tier limits to everyone (in /models)

#

in just that endpoint

plain raven
#

okay, so does that only apply to the API? or generation too?

vast moss
#

not when you actually make completion requests

plain raven
#

gotcha!

vast moss
#

then it uses the correct limits
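
A small sketch for reading the per-request limits out of a models listing, using the field names and values from the JSON pasted above. `list_request_limits` is a hypothetical helper, and the surrounding list structure is an assumption since the paste only shows two entries.

```python
import json

# Trimmed copy of the two entries pasted earlier in the thread.
MODELS_JSON = """
[
  {"id": "openai/gpt-4-0314", "context_length": 8191,
   "per_request_limits": {"prompt_tokens": "666", "completion_tokens": "333"}},
  {"id": "openai/gpt-4-32k", "context_length": 32767,
   "per_request_limits": {"prompt_tokens": "333", "completion_tokens": "166"}}
]
"""

def list_request_limits(models: list[dict]) -> list[tuple[str, int, int, int]]:
    """Return (id, context_length, prompt_limit, completion_limit) per model."""
    rows = []
    for model in models:
        limits = model.get("per_request_limits") or {}
        rows.append((
            model["id"],
            model["context_length"],
            # The limits arrive as strings in the paste, so convert them.
            int(limits.get("prompt_tokens", 0)),
            int(limits.get("completion_tokens", 0)),
        ))
    return rows

for row in list_request_limits(json.loads(MODELS_JSON)):
    print(row)
```

Limits far below `context_length`, as in these two entries, are what made the listing look wrong here.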

vast moss
#

This should be fixed now!

vast moss
#

@velvet parcel fyi, we re-routed airoboros to DeepInfra, which improves the context length and uptime

velvet parcel
#

@vast moss i'm noticing a lot of random cutoff in the AI responses. Happens with open hermes llama 13b which I've been using for a long time (which btw is very slow currently, was very fast), and also with open hermes mistral 7b.
The messages cut off at random lengths, could be 70 tokens or more than 200, but are mostly on the shorter side.
It has never happened before.

#

It's happening with the other model i'm using too (nousresearch/nous-hermes-llama2-70b). Strange.

#

Happens in the playground too

#

(that was open hermes mistral 7b with max tokens 4096)

#

Also I noticed that open hermes llama 13b started appending <s/> to responses. Both in my app and in the Playground.

tropic vigil
#

the cutoffs make coding impossible. Even if I tell it to continue, it will write different code, and then get cut off again.

velvet parcel
#

@vast moss can you confirm if you're looking into this?

vast moss
#

@tawny acorn can you see if you can reproduce this? I couldn't on my end

#

Ah just repro'd

#

@velvet parcel Can you try Mistral now? Deploying a change that might fix it

#

Should also double the tokens per second!

quiet swallow
#

Deep into a chat in SillyTavern, using XWin model, I get the following error message despite using the attached prompt settings. I am under the impression the LLM is set up to return 300 tokens no matter the setting I use.

#

Ah, just read some earlier messages that this might be intentional.

vast moss
#

Yeah Xwin is intentionally limited to 300

#

@velvet parcel all the models you mentioned should be fixed

#

including codellama and phind, @tropic vigil . thanks for flagging this