# Latency endpoint

verbal wren

In addition to the :nitro endpoint, can you add a latency-optimized-only endpoint? For example, I have a new project that classifies entries in real time as they are added to a database. Since the classification output is only one token, all I really care about is TTFT (time to first token).

obtuse arch