#How to stream via OPENAI BASE URL?

26 messages · Page 1 of 1 (latest)

full dew
#

Does the OPENAI BASE URL support Server-sent events (SSE) type of streaming?

I was working previously with Ooba streaming was working fine. Since we switched to vLLM/Serverless it is no longer working.

If this is not done via SSE, Is there perhaps any tutorial you could recommend how to achieve streaming, please?

full dew
full dew
#

Actually I think this was a false alarm. I was looking at this documentation, which I believe is outdated:

https://doc.runpod.io/reference/llama2-13b-chat

It is no longer needed to do a loop. It seems the OpenAI Base Url: https://api.runpod.ai/v2/vllm-{{endpoint_id}}/openai/v1 with stream:true is already supporting SSE specification.

Amazing.

chrome lily
#

It's a different endpoint sorry

chrome lily
#

Check out the docs for the right openai endpoint usage

full dew
#

Sorry, maybe I wasn't able to find the most up-to-date docs. The ones I found keep mentioning Run endpoint with loop.

RunPod need to do better with Docs and explain the OpenAI endpoint better. In fact the whole Run and SyncRun are not needed when using OpenAI endpoints. Docs need to highlight that.

chrome lily
#

This one

full dew
#

Yes, but if you look at the Streaming section it assumes the user is utilising Python with the OpenAI library.

And it suggests to do a loop:

for response in response_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

This is not needed, because the OpenAI endpoint is already supporting SSE. An SSE implementation in any programming language will take care of looping automatically.

It would be good to mention that.
Thanks

chrome lily
full dew
#

Sure 🙂

#

Done

chrome lily
#

noice thanks bro 🙂

#

thats great

chrome lily
#

can you send it here? i wanna see it

chrome lily
#

@full dew

full dew
#

Sure.

Server-sent events (SSE) clients simplify the process of handling asynchronous data. Unlike traditional methods where a POST request and repeated polling are necessary to retrieve and monitor job data until completion, SSE automates this process by managing asynchronous events behind the scenes.

For Swift implementations, you can use https://github.com/launchdarkly/swift-eventsource. This library allows replacing semaphores with a more modern async/continuation pattern, eliminating the need for polling. The client listens to the incoming data stream and triggers onClosed() when the stream ends. Code example: https://github.com/launchdarkly/swift-eventsource/issues/75#issuecomment-2032533650

Although I have limited experience with Python's SSE clients, I have tested the sseclient-py package available at https://github.com/mpetazzoni/sseclient. Another notable Python SSE client is detailed at https://github.com/btubbs/sseclient with a useful guide here: https://maxhalford.github.io/blog/flask-sse-no-deps/.

I am not sure if Python's implementation is non-blocking as I have not extensively used it. However, ideally, it should be asynchronous to serve its purpose effectively. In Swift, I can confirm that it functions perfectly without blocking.

chrome lily
#

that looks fine already ( the for loop ) and doesn't seem to be need to be updated unless there is some other great alternative right

#

for an example

full dew
#

Sure, I don't mind. The ones above are SSE libraries. I just thought I point it out. But yeah you can keep it as it is, just mention that SSE is supported and maybe it would be beneficial to mention the difference between OpenAPI Base URL and Run/ SyncRun. But it's really up to you.

chrome lily