# Custom transformer reranker endpoint: offload to RunPod serverless?


regal merlin
#

I have a VM where I serve products. I want to rerank them with a custom personalized transformer (personalized to user history); this means sending the user history + the 500 items that need to be reranked to the RunPod server.

I understand this will add latency, since my CPU-only VM is hosted on OVH and would have to query a RunPod API endpoint to do inference. Any thoughts on my architecture?
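For illustration, the round trip described above (user history plus 500 candidates sent to a remote endpoint, scores coming back) could look roughly like the sketch below. It only builds the request and applies returned scores; nothing is sent. The `/runsync` URL shape and Bearer header are assumed from RunPod's serverless endpoint API, while the `user_history` and `candidates` field names and both helper functions are hypothetical and depend entirely on how the worker's handler is written.

```python
import json
import urllib.request

# Assumed base URL for RunPod serverless endpoints.
RUNPOD_API_BASE = "https://api.runpod.ai/v2"


def build_rerank_request(endpoint_id, api_key, user_history, candidate_ids):
    """Build (but do not send) a synchronous rerank request.

    The keys inside "input" are hypothetical; the worker's handler
    decides which schema it actually accepts.
    """
    payload = {"input": {"user_history": user_history, "candidates": candidate_ids}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{RUNPOD_API_BASE}/{endpoint_id}/runsync",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def apply_ranking(candidate_ids, scores):
    """Return candidates sorted by descending score (one score per candidate)."""
    if len(candidate_ids) != len(scores):
        raise ValueError("one score per candidate is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [candidate_ids[i] for i in order]
```

Sending item IDs rather than full item features keeps the payload small, but that assumes the worker can look the features up itself; if it cannot, the request body grows with all 500 items and so does the transfer time.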

strange escarp
#

This looks good to me! The added latency should be made up for by how much faster the inference will be in the long term.

strange escarp
#

It depends on how much traffic you get. Active workers help if you're getting a lot of users pretty steadily, because you can guarantee a certain amount of capacity and still scale up dynamically.

regal merlin
#

Maybe 1 request every few seconds, and each request should only take 5-20 ms. Are we billed for idle time in between requests on either? Thanks @strange escarp
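To put rough numbers on that traffic pattern, here is a small sketch of the duty cycle. "Every few seconds" is assumed to mean one request every 3 seconds; the function names are made up for illustration. It says nothing about how any provider bills the gaps, it only shows how little of the time a worker would actually be computing.

```python
SECONDS_PER_DAY = 86_400


def busy_fraction(compute_ms, interval_s):
    """Fraction of wall-clock time spent on inference."""
    return (compute_ms / 1000.0) / interval_s


def busy_seconds_per_day(compute_ms, interval_s):
    """Total inference seconds per day at a steady request rate."""
    requests_per_day = SECONDS_PER_DAY / interval_s
    return requests_per_day * compute_ms / 1000.0


if __name__ == "__main__":
    interval_s = 3.0  # assumption: "every few seconds" taken as 3 s
    for compute_ms in (5, 20):
        frac = busy_fraction(compute_ms, interval_s)
        total = busy_seconds_per_day(compute_ms, interval_s)
        print(f"{compute_ms} ms/request: busy {frac:.2%} of the time, {total:.0f} s/day")
```

Under these assumptions the worker is busy well under 1% of the time, so whether idle time, keep-alive windows, and cold starts are billed matters far more to the cost than the inference time itself.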

#

Also, idk where to ask this, but our repo has a ton of branches and the list is getting truncated, so I'm not able to find the branch I'm working on.

clear crow
#

can you search (type) on the box?

regal merlin
#

yea i created a bug ticket

regal merlin
#

serverless was pretty bad lol

#

idek how im getting these errors

#

was getting 50 ms on-prem on CPU

clear crow
#

Wondering what kind of errors they are?