#High granularity average API output speed for popular models


crude cypress
#

(This is a bit more of a wish)
I'd really love to see OR use its massive usage data to graph output speed at high granularity (hourly or sub-hourly) for popular models (say, 1B+ tokens per day). This would really help with monitoring the state of things. For example, GPT-5 has recently had pretty significant speed fluctuations.

crude cypress
#

@last compass

last compass
#

cc @warm ice - we're doing pretty fine-grained analytics on this now

#

will check but i think we should have been able to show this

warm ice
#

I was actually just about to ship this today or tomorrow

#

ah i see the specific stats you're asking about now

#

yeah so the thing i'm shipping soon is for 1h/1d/1m views on your user activity charts

#

we've been laying the foundation for all the other charts (global token counts, throughput, latency) to show minute & hour granularity as well, and have em in our internal dashboards already

just need to polish them up a little more and figure out good spots to show them

thin hazel
#

That would be great. I'm seeing massive variance in tok/s on some models, and it would be really helpful if OR showed which providers have consistent speed.

graceful gale
#

I'd also like to see the 1% lows and the 1% highs (basically, the lowest TPS recorded and the highest TPS recorded)
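
For anyone who wants to approximate these stats from their own request logs in the meantime, here is a rough sketch. It assumes each sample is a hypothetical `(unix_timestamp, output_tokens, generation_seconds)` tuple that you record yourself; it is not based on any OR API or on how OR computes its charts. Note that the 1st and 99th percentiles (`p1`, `p99`) are usually not the same as the absolute lowest and highest TPS, so the sketch reports both.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, quantiles


def hourly_tps_stats(samples):
    """Group (unix_ts, output_tokens, generation_seconds) samples by UTC hour.

    The tuple layout is an assumption for illustration only.
    Returns {hour_start: {"count", "mean", "p1", "p99", "min", "max"}}.
    """
    buckets = defaultdict(list)
    for ts, tokens, seconds in samples:
        if seconds <= 0:
            continue  # skip samples with no usable duration
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0
        )
        buckets[hour].append(tokens / seconds)  # tokens per second

    stats = {}
    for hour, tps in sorted(buckets.items()):
        if len(tps) >= 2:
            # 99 cut points; first is the 1st percentile, last is the 99th
            cuts = quantiles(tps, n=100, method="inclusive")
            p1, p99 = cuts[0], cuts[-1]
        else:
            p1 = p99 = tps[0]  # a single sample has no spread
        stats[hour] = {
            "count": len(tps),
            "mean": mean(tps),
            "p1": p1,
            "p99": p99,
            "min": min(tps),
            "max": max(tps),
        }
    return stats


if __name__ == "__main__":
    demo = [(0, 100, 1.0), (60, 50, 1.0), (3600, 30, 2.0)]
    for hour, row in hourly_tps_stats(demo).items():
        print(hour.isoformat(), row)
```

Changing the bucket key (for example, truncating to 5-minute boundaries instead of the hour) would give sub-hourly granularity with the same approach.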