How can I stream results out of my RunPod serverless worker?
Some background:
I am **NOT** using vLLM
I **am** using `yield` in my handler
Hitting the `v2/{endpoint_id}/stream/{job_id}` endpoint gives me back JSON containing the partial response chunks and "consumes" them from the queue. Am I expected to poll the `/stream` endpoint? If so, I'm going to poll it every 50 ms, since it doesn't appear to be rate limited.
What I was expecting was a ReadableStream. Did I misconfigure something? Should I mark my generator handler as `async`? Does setting `return_aggregate_stream: True` return a ReadableStream under `/run`?
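For concreteness, here is a rough sketch of the polling loop I would end up writing if polling is the intended approach. It assumes the `/stream` response is a JSON object with a `status` string and a `stream` list whose items carry an `output` field; those field names and the terminal status values are my reading of the responses, not something I have confirmed. `fetch` is a hypothetical callable that does one GET on `v2/{endpoint_id}/stream/{job_id}` and returns the decoded body, so the loop can be tried without any network access.

```python
import time
from typing import Any, Callable, Dict, Iterator

# Statuses after which no further chunks are expected (assumed names).
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def poll_stream(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 0.05,
    max_polls: int = 10_000,
) -> Iterator[Any]:
    """Yield chunk outputs by calling fetch() repeatedly until the job ends.

    fetch is a hypothetical callable that performs one GET on
    v2/{endpoint_id}/stream/{job_id} and returns the decoded JSON body.
    """
    for _ in range(max_polls):
        body = fetch()
        # Each poll drains whatever chunks are queued, so a chunk is seen once.
        for chunk in body.get("stream", []):
            yield chunk.get("output")
        if body.get("status") in TERMINAL_STATUSES:
            return
        time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal status")


# Stand-in for real HTTP responses, in the assumed shape.
_fake_responses = iter([
    {"status": "IN_PROGRESS", "stream": [{"output": "Hel"}]},
    {"status": "IN_PROGRESS", "stream": []},
    {"status": "COMPLETED", "stream": [{"output": "lo"}]},
])

chunks = list(poll_stream(lambda: next(_fake_responses), interval_s=0))
print("".join(chunks))
```

In real use `fetch` would wrap an authenticated HTTP GET, and `interval_s=0.05` matches the 50 ms interval I mentioned. If a push-style stream is available, I would much rather use that than this loop.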