I have a VM where I serve products. I want to rerank them with a custom transformer personalized to each user's history, which means sending the user history plus the 500 items to be reranked to a RunPod server.
I understand this adds latency, since my CPU-only VM is hosted on OVH and has to call a RunPod API endpoint for inference. Any thoughts on this architecture?
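For concreteness, here is a rough Python sketch of the flow I have in mind. Everything named in it is hypothetical: `build_payload`, `rerank`, the payload fields, the 50-event history cap, and the 300 ms timeout are placeholders, and `score_fn` stands in for whatever client actually calls the RunPod endpoint. It assumes only IDs are sent over the wire and that the VM falls back to the original order if the remote call is slow or fails.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical signature: takes the request payload, returns one score per candidate.
ScoreFn = Callable[[dict], List[float]]

# Shared pool so each request does not pay thread start-up cost.
_pool = ThreadPoolExecutor(max_workers=8)


def build_payload(user_history: Sequence[int],
                  candidate_ids: Sequence[int],
                  max_history: int = 50) -> dict:
    """Keep the request small: most recent history events plus candidate IDs."""
    return {
        "history": list(user_history[-max_history:]),
        "candidates": list(candidate_ids),
    }


def rerank(user_history: Sequence[int],
           candidate_ids: Sequence[int],
           score_fn: ScoreFn,
           timeout_s: float = 0.3) -> List[int]:
    """Return candidates sorted by remote score, or unchanged on timeout/error."""
    payload = build_payload(user_history, candidate_ids)
    future = _pool.submit(score_fn, payload)
    try:
        scores = future.result(timeout=timeout_s)
    except Exception:
        # Timeout or scorer failure: serve the original order.
        # Note: the worker thread is not cancelled, it just finishes unobserved.
        return list(candidate_ids)
    if len(scores) != len(candidate_ids):
        # Malformed response: do not trust it.
        return list(candidate_ids)
    order = sorted(range(len(candidate_ids)),
                   key=lambda i: scores[i], reverse=True)
    return [candidate_ids[i] for i in order]
```

In this sketch the product page never waits longer than `timeout_s` on the GPU side; the worst case is an unpersonalized ordering.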