Question about model billing patterns | OpenRouter | Page 1

unborn egret May 6, 2024, 8:13 AM

#

I'm seeing a lot of cases, across a variety of models, where a single request (e.g. a single response from SillyTavern) produces two rows on the billing page. The second row typically appears about 1 sec after the original, and its "response" is typically only 1 token (occasionally 2 or 0).
Because the input tokens appear to be counted again on these second rows, they come close to doubling the cost.
Any ideas what would be causing this and how to prevent it?
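One way to quantify the pattern described above is to scan an export of the billing rows for tiny completions that closely follow a normal one. The following is a minimal sketch, not an OpenRouter tool: the `BillingRow` fields, the 3-second window, and the 2-token threshold are all assumptions chosen to match the description in this post, and the example rows are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BillingRow:
    # Hypothetical shape of one billing row; field names are assumptions.
    timestamp: datetime
    model: str
    prompt_tokens: int
    completion_tokens: int


def find_suspect_followups(rows, max_gap=timedelta(seconds=3), max_completion_tokens=2):
    """Return (original, follow_up) pairs where a near-empty completion
    lands shortly after a normal completion on the same model."""
    ordered = sorted(rows, key=lambda r: r.timestamp)
    suspects = []
    for prev, cur in zip(ordered, ordered[1:]):
        same_model = prev.model == cur.model
        close_in_time = cur.timestamp - prev.timestamp <= max_gap
        tiny_reply = cur.completion_tokens <= max_completion_tokens
        prev_was_real = prev.completion_tokens > max_completion_tokens
        if same_model and close_in_time and tiny_reply and prev_was_real:
            suspects.append((prev, cur))
    return suspects


def rebilled_prompt_tokens(suspects):
    """Total prompt tokens charged on the suspect follow-up rows."""
    return sum(follow_up.prompt_tokens for _, follow_up in suspects)


# Made-up example rows, for illustration only.
t0 = datetime(2024, 5, 6, 8, 0, 0)
example_rows = [
    BillingRow(t0, "some/model", 3000, 250),
    BillingRow(t0 + timedelta(seconds=1), "some/model", 3250, 1),
    BillingRow(t0 + timedelta(seconds=60), "some/model", 3300, 200),
]
example_suspects = find_suspect_followups(example_rows)
print(len(example_suspects), rebilled_prompt_tokens(example_suspects))
```

Comparing the re-billed prompt tokens against the total prompt tokens gives a rough measure of how close the extra rows come to doubling the input cost.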

shy trench May 6, 2024, 8:20 AM

#

Likely ST having a retrying pattern?

#

Looks pretty consistent

unborn egret May 6, 2024, 8:27 AM

#

shy trench Likely ST having a retrying pattern?

Any idea what would be causing that sort of behaviour? The model response is coming through ok, so this second one seems to follow the successful reply.

shy trench May 6, 2024, 8:28 AM

#

unborn egret Any idea what would be causing that sort of behaviour? The model response is com...

Can you try hitting with a similar request on the playground: https://openrouter.ai/playground

unborn egret May 6, 2024, 8:30 AM

#

shy trench Can you try hitting with a similar request on the playground: https://openrouter...

Not entirely sure how I would do that to be honest? (you've seen the ST logs for how it is assembling the various bits across a character and LB)

shy trench May 6, 2024, 8:30 AM

#

Oh, what I mean by that is try to see if you can replicate the issue on the playground

#

(with the same model)

#

It seems to work for me (i.e., no duplicate response)

#

So might be a feature within ST :-?... @barren nymph - is there some kind of retry feature in ST?

unborn egret May 6, 2024, 8:32 AM

#

shy trench Oh what I mean by that is try to see if you can replicate the issue on the playg...

ok - just ran a quick prompt on the playground against that model and there is no issue at all with that.

shy trench May 6, 2024, 8:33 AM

#

Yeah so not related to your networking. If the router does any retry, it'll be on the same transaction (you don't get charged extra if we retry with another provider)

unborn egret May 6, 2024, 8:35 AM

#

shy trench Yeah so not related to your networking. If the router does any retry, it'll be o...

gotcha, thanks.
Strange thing is these 0-2 token ones seem to follow immediately on the heels of a good response (and are happening while I am still reading what it has sent back - i.e. within 1-2 seconds of drafting its response)

barren nymph May 6, 2024, 8:53 AM

#

Auto-retry exists if your model does not produce anything but an EOS token or otherwise empty reply and you don't have streaming enabled

unborn egret May 6, 2024, 8:55 AM

#

I have streaming enabled for these, and the empty/single token replies are coming within 1-2 seconds of the successfully generated reply

#

I have objectives and summarise switched off so AFAIK there should not be anything making a separate API call in tandem

unborn egret May 7, 2024, 6:25 PM

#

@shy trench just a quick follow-up on this to note that although the example I posted above is 1:1 good:problematic replies, other LLMs are exhibiting a different frequency of the issue. You can see that I had a patch of solid responses, but another patch of alternating problem ones. No change at all in ST settings in that window - this was all a continuation of a single roleplay with the same character.

unborn egret May 7, 2024, 7:00 PM

#

barren nymph Auto-retry exists if your model does not produce anything but an EOS token or ot...

Hi Cohee, is there any way I can turn off the auto-retry to confirm that it is having no impact here? I do have streaming enabled, and these blank hits always follow immediately on solid responses (without any further input from me). I cannot think of anything else in ST that could be driving this "double-tap", but because it is essentially doubling all billing on OR I am keen to resolve it. Hope you don't mind the ping on this, but I wanted to double-check before I resort to an uninstall/reinstall of ST to see if that fixes it.

barren nymph May 7, 2024, 7:18 PM

#

Just disable everything that sends background requests

#

Summary is the likely offender.

unborn egret May 7, 2024, 7:21 PM

#

barren nymph Summary is the likely offender.

AFAIK anything non-core is disabled, incl. Summary and Objectives (I use almost nothing bar these two)
One last question - if ST were in any way responsible for this, would it show in the console?

barren nymph May 7, 2024, 7:48 PM

#

You will see all the requests there

unborn egret May 8, 2024, 3:30 AM

#

Thank you, very much appreciated.
@shy trench I have triple-checked that anything in ST that could cause that is turned off. Will rerun, confirm I still see the issues and send you the log in case you can spot anything that you think could cause this

shy trench May 8, 2024, 4:55 AM

#

@barren nymph OR received 2 requests with the following diff:

https://www.diffchecker.com/FZ7W1tIO/

Looks like the 2nd request included more than just the completion of the first request. The diff shows there's injected context in the middle :-?... Does it look familiar to you?

#

(Both are prompts btw)

#

Is it possible that ST is incrementally sending prompt + completion to extract as much generation out of a single run as possible, and that the 2nd one was stopped because the model deemed it the end of the convo?
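The comparison described above (two prompts, the second containing the first plus extra material) can also be done locally with Python's standard `difflib`. This is only a sketch under stated assumptions: the function name and the sample prompt strings are hypothetical, and the real diff in this thread was produced with Diffchecker, not with this code.

```python
import difflib


def inserted_lines(first_prompt, second_prompt):
    """Return the lines that appear in second_prompt but not at the
    matching position in first_prompt (insertions and replacements)."""
    a = first_prompt.splitlines()
    b = second_prompt.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return added


# Hypothetical prompts standing in for the two requests that were received.
request_1 = "system: instructions\nuser: hi"
request_2 = "system: instructions\n[injected note]\nuser: hi\nassistant: hello"

for line in inserted_lines(request_1, request_2):
    print("+", line)
```

If the second request were purely "first prompt plus the completion", all added lines would sit at the end; lines added in the middle point to something injecting context between the two calls.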

unborn egret May 8, 2024, 6:16 AM

#

It looks as though something is causing the "Continue" command to autofire, but I cannot for the life of me work out why that would auto-trigger in parallel to the first request being sent.

hidden meadow May 8, 2024, 6:18 AM

#

unborn egret It looks as though something is causing the "Continue" command to autofire, but ...

Hm, I had a similar problem, not with the auto-continue option but with the "Request Token Probabilities" option: it fired regardless of whether it was enabled or disabled, and added logprobs: 10 to the request, causing a 400 error on the Aphro backend
it was on ST 1.11.8

unborn egret May 8, 2024, 6:21 AM

#

hidden meadow Hm, I had a similar problem happen but not with auto-continue option, but with "...

Also on ST 1.11.8

hidden meadow May 8, 2024, 6:22 AM

#

unborn egret Also on ST 1.11.8

In the case of my logprobs issue, it got fixed on its own when I switched to the staging branch

unborn egret May 8, 2024, 8:04 AM

#

ok, thanks, good to know

barren nymph May 8, 2024, 10:02 AM

#

unborn egret It looks as though something is causing the "Continue" command to autofire, but ...

There's an “auto continue” function in advanced settings. But you have to explicitly enable it for chat comps

unborn egret May 8, 2024, 3:45 PM

#

barren nymph There's an “auto continue” function in advanced settings. But you have to explicitly...

Strange... this was disabled; however, enabling it, then disabling it and restarting appears to have fixed the issue. I need to test a little more extensively, but neither of the first couple of tests reproduced the problem, whereas it was occurring on every message with this LLM (but not all LLMs). I wonder whether the inconsistency of the occurrence could be driven by how often responses from different models get cut off.

Thank you all again for the pointers, patience and troubleshooting here - very much appreciated.
