#slow Response


minor palm
#

I’ve been using PaLM 2 Code Chat and normally get a response within 5 to 7 seconds, but for a while now it’s been 20 to 30 seconds. I tried other models too and they’re just as slow, so I guess it’s on OpenRouter’s side.

frank scaffold
#

Yes, same here. Responses are pretty slow with all models but worst with OpenAI models

frank scaffold
#

Yeah, it must be an issue on OpenRouter’s end. Calls to OpenAI API directly are fine and quick.
Hopefully this is fixed soon!

exotic ingot
#

Same here

dapper oyster
#

Can you try again? We just made a change that should have sped things up a lot

orchid junco
#

@dapper oyster I've been noticing the same. Extremely long waits before a response even starts, more than 10 seconds. It happens with all the models I use, including Mistral 7B OpenHermes.

#

Can we know the cause?

frank scaffold
#

I hope this is fixed!

glad vapor
#

just waiting it out is the best

#

or just pushing through with a 10 minute conversation with 3 messages

dapper oyster
#

Working on adding new providers so we have more options for the same models. Right now, we can’t really control the “Host” times on the activity page except by deploying the model ourselves, and when we do, it faces its own load issues

#

Will be scaling up our own deployments to help

frank scaffold
#

Getting pretty slow responses with MythoMax 13B.

formal saddle
#

Mistral 7B

#

Meta Llama-2 13B

#

GPT 3.5 16k

#

In all of them my query was "Hello there"

#

From the OpenRouter activity page, the speed seems fine though

#

My internet is fast, and I'm using the openai npm package to make requests with streaming on.

Here is the difference when using a direct call to OpenAI with the same parameters:

#

Please let me know if you need any more info that might be useful to debug this, and thanks for the support.
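
For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the time to first chunk and the total time could be measured on a streamed response. The names `createStreamTimer` and `timeStream` are made up for illustration, and the clock is injectable only so the timer can be checked without real waiting; this is not how the figures in this thread were actually taken.

```typescript
// Hypothetical helper names; nothing here comes from the openai package itself.
type Clock = () => number;

interface StreamTiming {
  firstChunkMs: number | null; // time until the first chunk arrived (null if none did)
  totalMs: number;             // time until the stream ended
  chunks: number;              // number of chunks received
}

// Starts timing immediately; call onChunk() per chunk and finish() at the end.
function createStreamTimer(now: Clock = () => Date.now()) {
  const start = now();
  let firstChunkMs: number | null = null;
  let chunks = 0;
  return {
    onChunk(): void {
      if (firstChunkMs === null) firstChunkMs = now() - start;
      chunks += 1;
    },
    finish(): StreamTiming {
      return { firstChunkMs, totalMs: now() - start, chunks };
    },
  };
}

// Consumes any async iterable (for example a streamed completion) and times it.
// Note: timing starts when this function is called, so to include the wait for
// the server's first byte, start a timer before issuing the request instead.
async function timeStream<T>(stream: AsyncIterable<T>, now?: Clock): Promise<StreamTiming> {
  const timer = createStreamTimer(now);
  for await (const _chunk of stream) timer.onChunk();
  return timer.finish();
}
```

Assuming v4 of the openai package, where `chat.completions.create({ ..., stream: true })` resolves to an async iterable, the result of that call could be passed to `timeStream`. Running the same request against OpenRouter and against OpenAI directly then gives two comparable `firstChunkMs` values.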

dapper oyster
# formal saddle Mistral 7B

This is wild. We're working on it - launching a few experiments today. Part of the issue is that the main Mistral and Llama providers are going down more than normal. OpenAI is also showing massive latency increases, but it's very sporadic. And if OpenAI returns a 502, we redirect to Azure, which seems to be happening more now.

#

And the "Host" column in Activity currently shows only the latency for the one successful host. Other latency metrics are on the way.
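
To illustrate why the latency of the one successful host can look fine while the caller waits much longer, here is a rough sketch of a 502 fallback that records every attempt. It is not OpenRouter's actual code: `Provider`, `callWithFallback` and `summarize` are hypothetical names, and the rule "retry on 502 only" is just an assumption taken from the message above.

```typescript
// One upstream attempt: which host was tried, what it returned, how long it took.
interface Attempt {
  host: string;
  status: number;
  ms: number;
}

// Hypothetical provider shape: a named function returning an HTTP-like result.
interface Provider {
  host: string;
  call: () => Promise<{ status: number; body: string }>;
}

// Try providers in order and move to the next one when a host answers 502.
// Assumption: any other status is returned as-is, and thrown errors are not handled.
async function callWithFallback(
  providers: Provider[],
  now: () => number = () => Date.now(),
): Promise<{ body: string | null; attempts: Attempt[] }> {
  const attempts: Attempt[] = [];
  for (const provider of providers) {
    const start = now();
    const res = await provider.call();
    attempts.push({ host: provider.host, status: res.status, ms: now() - start });
    if (res.status !== 502) return { body: res.body, attempts };
  }
  return { body: null, attempts }; // every host returned 502
}

// Latency of the host that finally answered vs. what the caller actually waited.
function summarize(attempts: Attempt[]): { hostMs: number | null; totalWaitMs: number } {
  const last = attempts[attempts.length - 1];
  const totalWaitMs = attempts.reduce((sum, a) => sum + a.ms, 0);
  return {
    hostMs: last !== undefined && last.status !== 502 ? last.ms : null,
    totalWaitMs,
  };
}
```

With a first attempt that fails slowly and a second that succeeds quickly, `hostMs` reports only the second attempt while `totalWaitMs` includes both, which would match a fast-looking activity page alongside a slow client.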

formal saddle
# dapper oyster This is wild. We're working on it - launching a few experiments today. Part of t...

Thanks for the quick reply and for looking into this @dapper oyster !

I just want to point out that this issue is not limited to these models.

I just tried the same query with the following models, and here are the times for each:

Claude-1:
Waiting for server response: 15s
total: 36s

Claude-2:
Waiting for server response: 50.61s
total: 51s

Zephyr:
Waiting for server response: 26s
total: 54s

PaLM-2:
Waiting for server response: 50s
total: 50s

I can't conclude anything definite, but it seems to be an issue on OpenRouter's servers themselves. Is it possible to check the CPU and RAM usage/limits to verify?

dapper oyster
#

Yeah we are definitely doing something wrong - possibly when autoscaling. We're switching to new infra to try to fix this

formal saddle
#

Cross reference with host time from activity if you need it