How to stream via OPENAI BASE URL? | Runpod | Page 1

full dew May 4, 2024, 3:26 PM

#

Does the OPENAI BASE URL support Server-sent events (SSE) type of streaming?

I was working previously with Ooba streaming was working fine. Since we switched to vLLM/Serverless it is no longer working.

If this is not done via SSE, Is there perhaps any tutorial you could recommend how to achieve streaming, please?

full dew May 4, 2024, 4:03 PM

#

SSE is the recommended way for an openai (compatible) API: https://platform.openai.com/docs/api-reference/streaming

I have a bad feeling RunPod doesm't support this yet. If not, please put this on the roadmap. Thanks

full dew May 5, 2024, 8:34 AM

#

Actually I think this was a false alarm. I was looking at this documentation, which I believe is outdated:

https://doc.runpod.io/reference/llama2-13b-chat

It is no longer needed to do a loop. It seems the OpenAI Base Url: https://api.runpod.ai/v2/vllm-{{endpoint_id}}/openai/v1 with stream:true is already supporting SSE specification.

Amazing.

chrome lily May 5, 2024, 9:45 AM

#

It's a different endpoint sorry

chrome lily May 5, 2024, 9:45 AM

#

full dew Actually I think this was a false alarm. I was looking at this documentation, wh...

Yes it's the right endpoint for openai, supports the client library well

#

Check out the docs for the right openai endpoint usage

full dew May 5, 2024, 11:29 AM

#

Sorry, maybe I wasn't able to find the most up-to-date docs. The ones I found keep mentioning Run endpoint with loop.

RunPod need to do better with Docs and explain the OpenAI endpoint better. In fact the whole Run and SyncRun are not needed when using OpenAI endpoints. Docs need to highlight that.

chrome lily May 5, 2024, 1:58 PM

#

https://docs.runpod.io/category/vllm-endpoint

vLLM Endpoint | RunPod Documentation

Run any LLM model with RunPod's vLLM Worker.

#

This one

#

https://docs.runpod.io/serverless/workers/vllm/get-started

Get started | RunPod Documentation

RunPod provides a simple way to run large language models (LLMs) as a Serverless Endpoint.

full dew May 5, 2024, 2:49 PM

#

Yes, but if you look at the Streaming section it assumes the user is utilising Python with the OpenAI library.

And it suggests to do a loop:

for response in response_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

This is not needed, because the OpenAI endpoint is already supporting SSE. An SSE implementation in any programming language will take care of looping automatically.

It would be good to mention that.
Thanks

chrome lily May 5, 2024, 5:40 PM

#

full dew Yes, but if you look at the Streaming section it assumes the user is utilising P...

Ohh yeah yeah

chrome lily May 5, 2024, 5:40 PM

#

full dew Yes, but if you look at the Streaming section it assumes the user is utilising P...

would you mind helping me by creating a #1185337232517759028 on that? dont forget to also mention the docs link please 🙂

full dew May 5, 2024, 7:35 PM

#

Sure 🙂

#

Done

chrome lily May 5, 2024, 7:54 PM

#

noice thanks bro 🙂

#

thats great

chrome lily May 6, 2024, 5:48 AM

#

full dew Yes, but if you look at the Streaming section it assumes the user is utilising P...

Im curious what will the code look like if we use the looping that is automatically handled by the code

#

can you send it here? i wanna see it

chrome lily May 6, 2024, 8:12 AM

#

@full dew

full dew May 6, 2024, 8:47 AM

#

Sure.

Server-sent events (SSE) clients simplify the process of handling asynchronous data. Unlike traditional methods where a POST request and repeated polling are necessary to retrieve and monitor job data until completion, SSE automates this process by managing asynchronous events behind the scenes.

For Swift implementations, you can use https://github.com/launchdarkly/swift-eventsource. This library allows replacing semaphores with a more modern async/continuation pattern, eliminating the need for polling. The client listens to the incoming data stream and triggers onClosed() when the stream ends. Code example: https://github.com/launchdarkly/swift-eventsource/issues/75#issuecomment-2032533650

Although I have limited experience with Python's SSE clients, I have tested the sseclient-py package available at https://github.com/mpetazzoni/sseclient. Another notable Python SSE client is detailed at https://github.com/btubbs/sseclient with a useful guide here: https://maxhalford.github.io/blog/flask-sse-no-deps/.

I am not sure if Python's implementation is non-blocking as I have not extensively used it. However, ideally, it should be asynchronous to serve its purpose effectively. In Swift, I can confirm that it functions perfectly without blocking.

chrome lily May 6, 2024, 10:59 AM

#

full dew Yes, but if you look at the Streaming section it assumes the user is utilising P...

i feel like thats a nice way already to demonstrate how to receive the stream, but is there some other way that you can think of ? like without using any library

#

that looks fine already ( the for loop ) and doesn't seem to be need to be updated unless there is some other great alternative right

#

for an example

full dew May 6, 2024, 2:23 PM

#

Sure, I don't mind. The ones above are SSE libraries. I just thought I point it out. But yeah you can keep it as it is, just mention that SSE is supported and maybe it would be beneficial to mention the difference between OpenAPI Base URL and Run/ SyncRun. But it's really up to you.

chrome lily May 6, 2024, 2:25 PM

#

full dew Sure, I don't mind. The ones above are SSE libraries. I just thought I point it ...

Yeah that difference has to be further explained

#How to stream via OPENAI BASE URL?