Hi, I have a small, custom transformer model used for reranking user recommendations (trained in PyTorch -> exported to ONNX -> served by a Go server with serverless load balancing).
Wanted to share my feedback: it's not looking too hot given this architecture. I'm going to try a non-serverless setup (pods) and see if that reduces my error rates.