#video description
Hi @oak sierra. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
GPT-4's image capabilities take images as input, not video files, so to get a description of a video you first need to extract frames from it. This can be done with OpenCV, a popular library for computer vision tasks. Here's an example of how you can do this:
```python
import base64

import cv2

# Open the video file
video = cv2.VideoCapture("data/bison.mp4")

base64Frames = []
while video.isOpened():
    success, frame = video.read()
    if not success:
        # No more frames (or the read failed), so stop reading
        break
    # Encode the raw frame as JPEG, then store it as a base64 string
    _, buffer = cv2.imencode(".jpg", frame)
    base64Frames.append(base64.b64encode(buffer).decode("utf-8"))

video.release()
print(len(base64Frames), "frames read.")
```
In this example, the video is read frame by frame; each frame is encoded as a JPEG, converted to a base64 string, and stored in a list.
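Storing every frame can take a lot of memory for longer clips, and only a subset is sent to the model anyway. As a minimal sketch (the helper name `frame_indices` and the 2-second interval are illustrative assumptions, not part of the original answer), the frames to keep can be chosen from the frame rate so that roughly one frame is kept every few seconds:

```python
def frame_indices(total_frames: int, fps: float, interval_s: float) -> list[int]:
    """Return indices of frames to keep, roughly one every `interval_s` seconds.

    Hypothetical helper for illustration. With OpenCV, total_frames and fps
    could be read via video.get(cv2.CAP_PROP_FRAME_COUNT) and
    video.get(cv2.CAP_PROP_FPS).
    """
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    # Number of frames between kept frames; never less than 1
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


# Example: a 10-second clip at 30 fps, keeping one frame every 2 seconds
print(frame_indices(300, 30.0, 2.0))  # [0, 60, 120, 180, 240]
```

In the reading loop, a running frame counter can be checked against a set of these indices before calling `cv2.imencode`, so that only the selected frames are encoded and stored.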
Once you have the frames, you can craft a prompt and send a request to GPT-4. You don't need to send every frame for GPT-4 to understand what's going on, so this example sends only every 50th frame:
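```python
from openai import OpenAI

# Text instruction plus every 50th frame, using the documented Chat Completions
# content format (text parts and image_url parts with base64 data URLs).
# The original "resize": 768 hint is not part of this format; frames can be
# downscaled before encoding instead if needed.
PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
            },
            *(
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}
                for frame in base64Frames[0::50]
            ),
        ],
    },
]

params = {
    "model": "gpt-4-vision-preview",  # model name from the original answer
    "messages": PROMPT_MESSAGES,
    "max_tokens": 200,
}

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
result = client.chat.completions.create(**params)
print(result.choices[0].message.content)
```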
In this example, a prompt is built from the text instruction plus every 50th base64-encoded frame, a request is sent to GPT-4 to generate a description of the video, and the generated text is printed.
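Because `max_tokens` is set to 200, a longer description can be cut off mid-sentence. In a Chat Completions response, a choice's `finish_reason` is `"length"` when generation stopped at the token limit (and `"stop"` when the model finished on its own). A minimal sketch of checking for this follows; the helper name `description_or_warning` and the warning text are hypothetical, and `SimpleNamespace` objects stand in for a real API response so the snippet runs offline.

```python
from types import SimpleNamespace


def description_or_warning(result) -> str:
    """Return the generated text, flagged if it hit the max_tokens limit.

    `result` is expected to have the shape of a chat completion response:
    result.choices[0].message.content and result.choices[0].finish_reason.
    """
    choice = result.choices[0]
    text = choice.message.content or ""
    # "length" means generation stopped because the token limit was reached
    if choice.finish_reason == "length":
        return text + " [truncated: consider raising max_tokens]"
    return text


# Offline stand-in for a response that was cut off by max_tokens
fake_result = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="A lone bison crosses"),
        )
    ]
)
print(description_or_warning(fake_result))
```

With a real response, `description_or_warning(result)` could be printed in place of `result.choices[0].message.content`.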
Please note that, according to the source this answer is based on, the image input feature was at that time a research model not available for public use, and there was no official timeline for its release (source (https://community.openai.com/t/309294)).